```
Device Microphone → PCM Audio (16kHz) → WebSocket → Cloud Run → Gemini Live API
                                                                       ↓
                               Haptic + Visual Alerts ← Function Call (trigger_alert)
```

The `trigger_alert` tool declaration:

```typescript
const triggerTool: FunctionDeclaration = {
  name: 'trigger_alert',
  description: 'Call this when an environmental sound or keyword matches alert categories.',
  parameters: {
    type: Type.OBJECT,
    properties: {
      alert_id: {
        type: Type.STRING,
        description: 'The ID of the alert to trigger.'
      },
      context: {
        type: Type.STRING,
        description: 'Short summary of what was heard.'
      }
    },
    required: ['alert_id']
  }
};
```

The Cloud Build deployment pipeline (`cloudbuild.yaml`):

```yaml
steps:
  - name: 'gcr.io/cloud-builders/docker'
    args: ['build', '-t', 'gcr.io/$PROJECT_ID/silentear-backend', '.']
  - name: 'gcr.io/cloud-builders/docker'
    args: ['push', 'gcr.io/$PROJECT_ID/silentear-backend']
  - name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
    entrypoint: gcloud
    args: ['run', 'deploy', 'silentear-backend', '--image=gcr.io/$PROJECT_ID/silentear-backend', ...]
```
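The 16kHz PCM leg of the pipeline requires converting the browser's Float32 audio samples (which typically arrive at 44.1 or 48 kHz) into 16-bit integer PCM before they go over the WebSocket. A minimal sketch of that conversion; the function names are illustrative, not from the actual codebase, and a production version would low-pass filter before downsampling:

```typescript
// Convert Web Audio float samples (range -1..1) to 16-bit PCM,
// the sample format expected for streaming audio input.
function floatTo16BitPCM(samples: Float32Array): Int16Array {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1], then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm;
}

// Naive downsample from the AudioContext rate (often 48 kHz) to 16 kHz
// by taking every Nth sample; real code should anti-alias filter first.
function downsampleTo16k(samples: Float32Array, inputRate: number): Float32Array {
  const ratio = inputRate / 16000;
  const out = new Float32Array(Math.floor(samples.length / ratio));
  for (let i = 0; i < out.length; i++) {
    out[i] = samples[Math.floor(i * ratio)];
  }
  return out;
}
```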
- Gemini Live API — for continuous bidirectional audio streaming with function calling
- Gemini 3 Flash — for scene intelligence, smart transcript refinement, and trigger auto-discovery
- Frontend: A responsive React 19 + TypeScript PWA styled with Tailwind CSS, integrating the Web Audio API for local FFT sound classification to reduce latency for high-frequency alarms.
- Backend: A Node.js + Express backend deployed on Google Cloud Run. I implemented a WebSocket proxy to manage the bidirectional audio streaming the Gemini Live API requires.
- AI Integration: Using the @google/genai SDK, I stream live audio into Gemini with defined tools (function calling) such as trigger_alert. When Gemini detects critical sounds or phrases, it issues function calls that are pushed as events over the WebSocket directly to the frontend. Gemini 3 Flash via REST handles transcript refinement, environmental context interpretation, trigger grouping, and Voice Deck predictions.
- Data & Media: Supabase (PostgreSQL + Realtime + Storage) manages user profiles, custom SignMoji libraries, triggers, and caregiver dashboard synchronization. Cloud Firestore stores alert history, device status, and trigger configurations.
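When the backend receives a `trigger_alert` function call from the Live API, the call's arguments have to be translated into the alert payload pushed over the WebSocket to the frontend. A hedged sketch of that translation step; the `AlertEvent` shape and the `VIBRATION_PATTERNS` table are illustrative assumptions, not the project's actual schema:

```typescript
// Shape of the event pushed to the frontend over the WebSocket.
// (Illustrative; the real payload schema may differ.)
interface AlertEvent {
  type: 'alert';
  alertId: string;
  context: string;
  vibrationPattern: number[]; // millisecond pattern for navigator.vibrate()
  timestamp: number;
}

// Example per-alert vibration patterns (assumed values).
const VIBRATION_PATTERNS: Record<string, number[]> = {
  doorbell: [200, 100, 200],
  fire: [500, 100, 500, 100, 500],
  baby: [100, 50, 100],
};

// Translate the trigger_alert function-call arguments into the
// event object the frontend consumes.
function toAlertEvent(args: { alert_id: string; context?: string }): AlertEvent {
  return {
    type: 'alert',
    alertId: args.alert_id,
    context: args.context ?? '',
    vibrationPattern: VIBRATION_PATTERNS[args.alert_id] ?? [300], // default buzz
    timestamp: Date.now(),
  };
}
```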
- Scene Analysis: Periodically summarizes what's happening around the user ("Two people are having a conversation nearby. Someone mentioned your name.")
- Transcript Refinement: Takes noisy, choppy speech fragments and reconstructs them into clean, readable sentences
- Trigger Auto-Discovery: Analyzes ambient audio patterns and suggests new alert categories the user might want
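Unlike the live audio session, transcript refinement is a plain REST call. A sketch of how the `generateContent` request body might be assembled; the prompt wording is illustrative and the endpoint/model shown in the comment are assumptions, not the project's exact configuration:

```typescript
// Build a generateContent request body asking Gemini to rewrite
// choppy speech fragments into clean, readable sentences.
function buildRefinementRequest(fragments: string[], language: string) {
  const prompt =
    `Rewrite these speech fragments as clean, readable ${language} sentences. ` +
    `Preserve meaning; do not invent content.\n\n` +
    fragments.join('\n');
  return {
    contents: [{ role: 'user', parts: [{ text: prompt }] }],
  };
}

// The body would then be POSTed to the Gemini REST endpoint, e.g.:
// fetch(`https://generativelanguage.googleapis.com/v1beta/models/${MODEL}:generateContent?key=${API_KEY}`,
//   { method: 'POST', headers: { 'Content-Type': 'application/json' },
//     body: JSON.stringify(buildRefinementRequest(fragments, 'English')) });
```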
- Customizable Triggers — Users define their own alert words (doorbell, fire, baby, their name) with unique vibration patterns and colors
- Sign Language Videos — Alerts can include ASL, BSL, or PSL sign language video demonstrations
- SignMoji — A companion sign language library where users can record, search, or link sign videos with AI-generated icons, synced across devices
- Voice Deck — A text-to-speech tool with AI-powered phrase suggestions, letting deaf users "speak" through their device
- Caregiver Dashboard — Family members can monitor alerts in real time via Supabase real-time subscriptions
- Offline Mode — Falls back to the browser Speech Recognition API when the cloud connection isn't available
- Multi-Language — Supports 10 languages for transcript processing
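In offline mode, trigger matching has to happen client-side against the browser Speech Recognition transcript rather than through Gemini function calling. A minimal sketch of that matching step, assuming triggers are stored as keyword lists; the `Trigger` shape and function name are illustrative:

```typescript
// Illustrative client-side trigger record (assumed shape).
interface Trigger {
  id: string;
  keywords: string[]; // e.g. ['doorbell', 'door bell']
}

// Return the IDs of all triggers whose keywords appear in the
// transcript, using case-insensitive whole-word matching.
function matchTriggers(transcript: string, triggers: Trigger[]): string[] {
  const text = transcript.toLowerCase();
  return triggers
    .filter(t =>
      t.keywords.some(k => {
        // Escape regex metacharacters in the user-defined keyword.
        const escaped = k.toLowerCase().replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
        return new RegExp(`\\b${escaped}\\b`).test(text);
      })
    )
    .map(t => t.id);
}
```

Whole-word matching avoids false positives such as "fire" matching inside "firefly"; a production version would also handle multi-word phrases split across recognition results.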