Tools: Stop Copy-Pasting from Images: Build a Universal Screen Translator with Python

Source: Dev.to

Lingo-Live started with a frustration I’m sure you’ve felt too. Have you ever tried copying text from a YouTube video? Or translating a Japanese error message inside a game? Yeah. You can’t. Because it’s not text, it’s just pixels.

Most of us end up doing one of two things:

- painfully typing everything by hand, or
- pulling out our phones and using Google Lens, holding it up to the screen like it’s 2010.

It’s clunky. It breaks focus. And honestly, we can do better. So I built Lingo-Live, a sleek desktop app that lets you translate anything you see on your screen instantly.

## The Superpower We Wanted

I didn’t want just another translation app. I wanted something that felt like a superpower.

## That meant it had to be:

- Invisible: runs quietly in the background
- Instant: hit a hotkey, select an area, get a translation
- Modern: glassy UI, dark mode, blur effects, no Windows-95 vibes

Press Ctrl + Alt + T, drag over any part of your screen, and boom: translated text appears on top of whatever you’re doing. Python made this possible. It’s basically a Swiss Army knife for building tools like this.

## The Secret Sauce: How Lingo-Live Works

Here’s how everything comes together.

### 1. The “Glass” Overlay

The trickiest part was creating a window that stays on top without being annoying. I used CustomTkinter to build a frameless, translucent overlay that feels light and modern. The overlay is:

- Always on top, so translations stay visible
- Semi-transparent, so you can still see the context underneath
- Frameless: no ugly title bar, custom drag-and-drop instead

The result feels less like an app and more like a layer on your desktop.

### 2. Capture and OCR

When you trigger the hotkey, Lingo-Live doesn’t try to “read the screen.” Instead, it:

- Lets you select a region
- Takes a screenshot of just that area
- Sends it to Tesseract OCR to extract text from the pixels

Conceptually, it looks like this:

```python
from PIL import ImageGrab  # provided by Pillow

# (x1, y1, x2, y2) is the region the user dragged over
screenshot = ImageGrab.grab(bbox=(x1, y1, x2, y2))
text = ocr_engine.extract_text(screenshot)  # conceptual OCR step (Tesseract)
```

That’s where the magic starts: turning images into actual text.

### 3. The Brain (Translation)

Once OCR gives us something like こんにちは, we need a translation that actually makes sense. This is where Lingo.dev comes in. Instead of doing raw dictionary swaps, it takes context into account, which makes a huge difference, especially for UI text, error messages, and game dialogue. The result feels natural, not robotic.

### 4. The Voice (Text-to-Speech)

Sometimes you don’t want to read. You just want to hear it. So I added Edge TTS, which uses the same high-quality voices found in Microsoft Edge. Now Lingo-Live can read translations out loud, which is great for pronunciation or just staying hands-free.

> “Fish are vertebrate animals that live in water…”

### 5. Leveling Up: AI Summarization

Full translations are great, but sometimes you just want the gist. So I added a Summarize button powered by Google Gemini:

- The translated text is sent to Gemini
- It returns a clean, one-sentence summary
- You get the point instantly

Perfect for skimming foreign articles, long error messages, or RPG dialogue dumps.

### 6. Make It Yours: Settings That Actually Matter

I didn’t want Lingo-Live to feel rigid, so I built a full settings system backed by JSON. You can:

- Change the hotkey (Alt + Z? Sure.)
- Switch themes (dark mode is the correct choice)
- Pick different fonts (Roboto > Segoe UI, fight me)

Best part? All changes apply instantly. No restarts, no reloads.

## Conclusion

Building your own tools is one of the most satisfying parts of being a developer. Lingo-Live solves a problem I run into constantly: text that’s trapped inside images, videos, and games. Instead of working around it, I built something that feels fast, modern, and genuinely useful. If you’ve ever rage-typed a foreign error message at 2 AM, this app is for you.

Lingo.dev makes localization feel effortless, turning a painful, error-prone task into a smooth, developer-friendly experience.

Check out the repo at https://github.com/Samar-365/lingo_live, clone the code, and stop copy-pasting from pixels.

Special thanks to @sumitsaurabh927 and @maxprilutskiy for their continuous guidance throughout the hackathon, and for giving us this great opportunity.
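One detail the capture step glosses over: a user can drag the selection from any corner, while `ImageGrab.grab(bbox=...)` expects a `(left, top, right, bottom)` box. Here is a minimal sketch of how the two drag points might be tidied up before the screenshot. The function name `normalize_bbox` and the `min_size` threshold are my own assumptions for illustration, not something taken from the Lingo-Live repo.

```python
def normalize_bbox(start, end, min_size=2):
    """Turn two drag corners into a (left, top, right, bottom) box.

    Returns None when the selection is smaller than min_size pixels on
    either axis, which usually means an accidental click, not a drag.
    """
    (x0, y0), (x1, y1) = start, end
    # Sorting each axis makes the result independent of drag direction.
    left, right = sorted((x0, x1))
    top, bottom = sorted((y0, y1))
    if right - left < min_size or bottom - top < min_size:
        return None
    return (left, top, right, bottom)
```

A drag from the bottom-right corner up to the top-left, such as `normalize_bbox((300, 200), (100, 50))`, gives `(100, 50, 300, 200)`, which can be passed straight to `ImageGrab.grab(bbox=...)`.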
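For the JSON-backed settings, here is one way "changes apply instantly" could work: keep the values in memory, write them to disk on every change, and notify any interested part of the UI through callbacks. This is a rough sketch under my own assumptions. The `Settings` class, the `DEFAULTS` keys, and the callback signature are all hypothetical and may not match how Lingo-Live actually does it.

```python
import json
from pathlib import Path

# Hypothetical defaults; the real app's keys and values may differ.
DEFAULTS = {"hotkey": "ctrl+alt+t", "theme": "dark", "font": "Roboto"}


class Settings:
    """JSON-backed settings that notify listeners on every change."""

    def __init__(self, path):
        self.path = Path(path)
        self._listeners = []
        self.values = dict(DEFAULTS)
        if self.path.exists():
            try:
                stored = json.loads(self.path.read_text(encoding="utf-8"))
            except json.JSONDecodeError:
                stored = {}  # corrupt file: fall back to defaults
            if isinstance(stored, dict):
                # Only accept known keys so a stale file cannot add junk.
                self.values.update(
                    {k: v for k, v in stored.items() if k in DEFAULTS}
                )

    def subscribe(self, callback):
        """Register callback(key, value), called after each change."""
        self._listeners.append(callback)

    def set(self, key, value):
        if key not in DEFAULTS:
            raise KeyError(key)
        self.values[key] = value
        # Persist first, then notify, so listeners see saved state.
        self.path.write_text(json.dumps(self.values, indent=2), encoding="utf-8")
        for callback in self._listeners:
            callback(key, value)
```

With this shape, the overlay could subscribe once and re-register the hotkey or restyle itself whenever `settings.set("hotkey", "alt+z")` is called, with no restart needed.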