Real-time sign language to speech translator with plug-and-play gesture database. MediaPipe landmark detection, dual online/offline TTS, and virtual audio routing for video calls.
Over 70 million deaf people worldwide rely on interpreters for daily communication. Existing solutions are expensive, cloud-dependent, or limited to ASL. This project takes a universal, plug-and-play approach: any sign language can be added by anyone.
The core insight was that hand signs vary across languages: ISL, ASL, and BSL all use different gesture patterns, so training a YOLO model would require a massive dataset per language, which is impractical.
MediaPipe solves this elegantly: it extracts the coordinates of 21 hand landmarks without any gesture-specific training data. The system becomes plug-and-play: users can record new gestures in minutes, save them to SQLite, and share the database files. A community could build out an entire sign language in days.
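A minimal sketch of that recording flow, assuming MediaPipe's Python `Hands` solution and a simple two-column SQLite schema (the project's actual schema isn't specified here):

```python
import json
import sqlite3

import cv2
import mediapipe as mp

conn = sqlite3.connect("gestures.db")
conn.execute("CREATE TABLE IF NOT EXISTS gestures (label TEXT, landmarks TEXT)")

hands = mp.solutions.hands.Hands(max_num_hands=1)
cap = cv2.VideoCapture(0)

ok, frame = cap.read()
if ok:
    result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if result.multi_hand_landmarks:
        # 21 landmarks x (x, y, z) -> a 63-value feature vector
        lm = result.multi_hand_landmarks[0].landmark
        features = [c for p in lm for c in (p.x, p.y, p.z)]
        conn.execute("INSERT INTO gestures VALUES (?, ?)",
                     ("hello", json.dumps(features)))  # "hello": example label
        conn.commit()

cap.release()
conn.close()
```

Because the database is a single file, sharing a gesture set is just copying `gestures.db`.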
We chose Tkinter for the UI because the goal was to ship the app as a standalone .exe, so non-technical deaf users can run it without installing Python.
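For context, a standalone Windows executable for a Tkinter app is typically produced with PyInstaller; the entry-point name below is a placeholder, not taken from this project:

```
pyinstaller --onefile --windowed translator.py
```

`--onefile` bundles everything into a single .exe and `--windowed` suppresses the console window.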
- MediaPipe + SQLite gesture DB (chosen): No training needed; the 21-point landmarks serve directly as features; anyone can create and share gesture sets. True plug-and-play (see the matching sketch after this list).
- YOLO-based hand sign detection (rejected): Requires massive per-language training datasets, and every sign language would need its own model. Not scalable for community use.
- Web-based UI with Flask/React (rejected): Requires server hosting, while the goal was an offline-first desktop app distributable as a .exe for non-technical users.
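The list above treats the 21 landmarks as features but doesn't name a matching algorithm; one simple approach consistent with that design is nearest-neighbor search over the stored vectors, with each vector re-centered on the wrist and rescaled so matching tolerates hand position and size. A sketch under those assumptions, reusing the hypothetical schema from the recording example:

```python
import json
import math
import sqlite3

def normalize(features):
    # Re-center on the wrist (landmark 0) and rescale by the farthest
    # landmark so matching is position- and size-invariant.
    pts = [features[i:i + 3] for i in range(0, len(features), 3)]
    wx, wy, wz = pts[0]
    pts = [(x - wx, y - wy, z - wz) for x, y, z in pts]
    scale = max(math.sqrt(x * x + y * y + z * z) for x, y, z in pts) or 1.0
    return [c / scale for p in pts for c in p]

def classify(live_features, db_path="gestures.db", threshold=0.5):
    # Nearest-neighbor search: compare the live hand against every stored
    # gesture and return the closest label, or None if nothing is close.
    live = normalize(live_features)
    best_label, best_dist = None, float("inf")
    with sqlite3.connect(db_path) as conn:
        for label, blob in conn.execute("SELECT label, landmarks FROM gestures"):
            dist = math.dist(live, normalize(json.loads(blob)))
            if dist < best_dist:
                best_label, best_dist = label, dist
    return best_label if best_dist < threshold else None
```

The threshold keeps unknown hand shapes from being forced onto the nearest stored gesture; 0.5 is an arbitrary starting point to tune.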
- MediaPipe Landmark Detection
- Plug & Play Gesture Database
- Dual TTS Pipeline (see the sketch after this list)
- Virtual Audio Routing
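The README doesn't name the TTS engines; a common way to get the dual online/offline behavior (assumed here) is gTTS when the network is up, with the offline pyttsx3 engine as fallback:

```python
def speak(text: str) -> None:
    # Hypothetical sketch: gTTS and pyttsx3 are assumptions, not the
    # project's confirmed engines.
    try:
        from gtts import gTTS            # online: Google TTS, needs internet
        from playsound import playsound  # minimal cross-platform playback
        gTTS(text=text, lang="en").save("speech.mp3")
        playsound("speech.mp3")
    except Exception:
        import pyttsx3                   # offline: local engine, no network
        engine = pyttsx3.init()
        engine.say(text)
        engine.runAndWait()
```

For video calls, the virtual audio routing feature presumably plays this output into a virtual device (e.g., VB-Audio Cable) that the call software selects as its microphone.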