Live dialogue

Fluid and natural live dialogue and translation capabilities, for powerful voice-first applications.

Our most advanced audio models push new frontiers with intuitive inputs, intelligent understanding and natural expressiveness.

Best for low-latency conversation with a fluid, natural vocal rhythm. Solves complex tasks while recognizing vocal nuances such as pitch and pace.

Best for real-time speech-to-speech translation. Overcomes language barriers across 70+ languages while maintaining the speaker’s natural tone and rhythm.

Best for directing intonation and inflection. Intuitive audio tags give you granular command over style, pace, and tone with unprecedented precision.

Natural and powerful audio models that help people communicate, developers build, and enterprises manage their business.

Talk in real time. Control with precision. Understand every nuance.

Fluid and natural live dialogue and translation capabilities, for powerful voice-first applications.

Craft anything from short snippets to long-form narratives, with granular control over style, pace, delivery and performance.

Go beyond simple transcription, with models that identify who’s talking and understand the intent behind the words.

Our audio models generate natural vocals at speed and scale for different developer workflows.

Explore what you can do with Gemini Audio

Showcasing Gemini Flash Live

Holds fluid, natural, low-latency conversations while calling functions to manage complex, multi-step tasks at scale.

Showcasing Gemini Flash TTS

Best for directing intonation and inflection. Intuitive audio tags give you granular command over style, pace, and tone with unprecedented precision.

Showcasing 3.5 Live Translate

Translates multiple languages in a single session, while preserving each speaker’s original intonation, pacing and pitch.

Building with responsibility at the core

We’ve proactively assessed potential risks during every stage of the development process for these native audio features, using what we’ve learned to inform our mitigation strategies. We validate these measures through rigorous internal and external safety evaluations, including comprehensive red teaming for responsible deployment.

All audio outputs from our models are marked with SynthID, our advanced watermarking technology, allowing you to detect whether speech has been created or edited using Google AI.
