Building Architect: Real-time AI Interior Design With Gemini Live...
This post is an entry to the Gemini Live Agent Challenge hackathon. #GeminiLiveAgentChallenge
ARCHITECT is a real-time AI interior design assistant. You point your phone camera at any room, talk to the agent naturally, and it generates photorealistic redesigns — all in real time, all through voice.
The core premise: what if you had a talented interior designer who could literally see your room, understand your style preferences from a conversation, and instantly show you a reimagined version? That's ARCHITECT.
Most AI voice assistants are turn-based: you speak, you wait, it responds. Gemini's Live API is different — it's a persistent bidirectional stream where audio, video frames, and tool calls all flow simultaneously. This enabled an interaction pattern that wasn't possible before:
The single WebSocket carries everything: 16kHz PCM audio in, 24kHz PCM audio out, JPEG frames in, JSON events, and binary image payloads out. There's no "please hold while I process" — it's genuinely live.
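The asymmetric sample rates matter for buffer sizing on each side of the stream. A minimal sketch, assuming 16-bit mono PCM and a 20 ms chunk duration (both assumptions, not stated in the post):

```python
BYTES_PER_SAMPLE = 2  # assumed 16-bit PCM samples

def chunk_bytes(sample_rate_hz: int, chunk_ms: int) -> int:
    """Bytes in one PCM audio chunk of chunk_ms milliseconds."""
    return sample_rate_hz * chunk_ms // 1000 * BYTES_PER_SAMPLE

mic_chunk = chunk_bytes(16_000, 20)      # uplink: 16 kHz mic audio -> 640 bytes
speaker_chunk = chunk_bytes(24_000, 20)  # downlink: 24 kHz model audio -> 960 bytes
```

So for the same wall-clock duration, the playback path moves 1.5× the bytes of the capture path — worth knowing when sizing ring buffers on the frontend.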
The agent is built with Google's ADK (LlmAgent) wrapping Gemini 2.0 Flash Live as the underlying model. ADK handles the agent loop; Gemini handles multimodal understanding and tool call orchestration.
ADK's docstring-based schema inference is underrated — you write a clear docstring and it generates the JSON schema for tool calling automatically. No manual tools array.
The interesting architectural detail is the binary framing. Everything goes over one WebSocket:
- Audio frames: header `{"type":"audio"}`, payload raw PCM.
- Camera frames: header `{"type":"frame"}`, payload JPEG bytes.
- Server-to-client audio: the same protocol in reverse.
This lets the frontend handle audio, video, and events all in one onmessage handler without multiplexing connections.
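The post doesn't spell out how the JSON header and binary payload share one message, so here is one common scheme as a sketch: a length-prefixed header. The 4-byte big-endian length prefix and the function names are assumptions, not ARCHITECT's actual wire format.

```python
import json
import struct

def pack_message(header: dict, payload: bytes) -> bytes:
    """Frame one WebSocket binary message: 4-byte big-endian header
    length, then the UTF-8 JSON header, then the raw payload."""
    header_bytes = json.dumps(header).encode("utf-8")
    return struct.pack(">I", len(header_bytes)) + header_bytes + payload

def parse_message(message: bytes) -> tuple[dict, bytes]:
    """Inverse of pack_message: split a framed message back into
    (header, payload), as a single onmessage-style handler would."""
    (header_len,) = struct.unpack(">I", message[:4])
    header = json.loads(message[4 : 4 + header_len])
    return header, message[4 + header_len :]

# Dispatch on header["type"] ("audio", "frame", ...) to route each
# payload to the audio player, video canvas, or event handler.
```

Length-prefixing keeps the parse trivial and avoids escaping binary payloads into JSON, which is why a single handler can demultiplex audio, video, and events without extra connections.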
Source: Dev.to