Google DeepMind — June 2026

Fluid, natural voice translation with Gemini 3.5 Live Translate

Gemini 3.5 Live Translate is our latest audio model, delivering near real-time speech-to-speech translation in over 70 languages — preserving intonation, pacing, and pitch.

Rolling out across Google products and developer platforms

Gemini 3.5 Live Translate is available in public preview via the Gemini Live API and Google AI Studio, in private preview for Google Meet, and rolling out globally in Google Translate on Android and iOS.

Google AI Studio & Gemini Live API

Public preview for developers. Build real-time voice translation apps with the Gemini Live API. Integrate in minutes.

Google Meet

Speech translation in video meetings across 70+ languages and 2000+ language combinations. Private preview for Workspace customers.

Google Translate (Android & iOS)

Rolling out globally. Connect headphones for seamless tone-preserving translation. New listening mode on Android.

Developer Ecosystem

Platforms like Agora, Fishjam, LiveKit, Pipecat, and Vision Agents enable developers to build custom voice translation apps.

A continuous stream, not turn-by-turn

Unlike traditional translation systems that wait for a speaker to finish, Gemini 3.5 Live Translate processes speech as it is streamed — detecting languages automatically, handling multilingual input, and staying just seconds behind the speaker.
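To make the contrast concrete, here is a minimal sketch of how long a listener waits before hearing any translated audio under each approach. The function names and all three numbers (a 20-second remark, 2 seconds of processing, a 3-second streaming lag) are illustrative assumptions, not measurements of Gemini 3.5 Live Translate.

```python
def turn_by_turn_delay(utterance_s: float, processing_s: float) -> float:
    """Seconds from when the speaker starts until translated audio begins,
    for a system that waits for the whole utterance before translating."""
    return utterance_s + processing_s


def streaming_delay(lag_s: float) -> float:
    """Seconds from when the speaker starts until translated audio begins,
    for a system whose output trails the speaker by a roughly constant lag."""
    return lag_s


# Hypothetical values chosen only for illustration.
utterance_s = 20.0   # length of the speaker's remark
processing_s = 2.0   # translation time after the remark ends
lag_s = 3.0          # how far a streaming system trails the speaker

print(turn_by_turn_delay(utterance_s, processing_s))  # 22.0
print(streaming_delay(lag_s))                         # 3.0
```

Under these assumptions the turn-by-turn wait grows with the length of the remark, while the streaming wait does not.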

1. Streaming audio capture

The model ingests live audio in real time, automatically detecting the source language from 70+ supported languages without any manual configuration.

2. Real-time semantic translation

Gemini 3.5 Live Translate processes meaning and intent, not just words. It preserves intonation, pacing, and pitch so the translated speech sounds natural and expressive.

3. Synthetic speech generation

The model generates smooth, natural-sounding translated speech output that maintains the speaker's vocal character — all within seconds of the original utterance.

4. Watermarked with SynthID

All generated audio is imperceptibly watermarked with SynthID, so AI-generated content remains detectable, which helps prevent misuse.
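On the client side, the streaming audio capture step usually means sending audio in small fixed-duration chunks rather than as one finished recording. The sketch below shows only that chunking, using the standard library. The 16 kHz sample rate, 16-bit mono PCM format, 100 ms chunk size, and the name pcm_chunks are assumptions for illustration; this page does not specify the input format the model expects, and the call that would send each chunk to the Gemini Live API is deliberately left out.

```python
from typing import Iterator

SAMPLE_RATE_HZ = 16_000   # assumption: 16 kHz mono input
BYTES_PER_SAMPLE = 2      # assumption: 16-bit PCM samples
CHUNK_MS = 100            # assumption: 100 ms of audio per chunk


def pcm_chunks(pcm: bytes, chunk_ms: int = CHUNK_MS) -> Iterator[bytes]:
    """Yield consecutive fixed-duration slices of a raw PCM buffer.

    The final slice may be shorter if the buffer length is not a
    multiple of the chunk size.
    """
    chunk_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * chunk_ms // 1000
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


# One second of silence stands in for live microphone input.
one_second = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
chunks = list(pcm_chunks(one_second))
print(len(chunks), len(chunks[0]))  # 10 3200
```

In a real client, each chunk would be forwarded as soon as it is captured, which is what lets the translation trail the speaker instead of waiting for the end of the utterance.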

What makes 3.5 Live Translate different

From automatic language detection to noise robustness, the model is built for real-world voice translation scenarios.

Real-time streaming

Processes speech as it's streamed, staying just a few seconds behind the speaker — no awkward pauses.

70+ languages

Automatic detection and translation across 70+ languages without manual language selection.

Tone preservation

Maintains the speaker's intonation, pacing, and pitch — translation that sounds like you.

Noise robust

Built for loud environments. Handles background noise and multiple speakers with ease.

SynthID watermarking

Every output carries an imperceptible SynthID watermark for responsible AI deployment.

Developer API

Public preview on the Gemini Live API. Integrate real-time translation into any application.

2000+ combinations

In Google Meet, supports translation between any pair of the 70+ supported languages, not just to and from English.

Listening mode

On Android, hold the phone to your ear as you would on a call for private translation without headphones.
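As a quick check on the "2000+ combinations" figure, the arithmetic below counts language pairs for exactly 70 languages. It assumes that "combinations" means pairs of distinct languages; this page does not say how the figure is counted, and since the page says "70+", 70 is a lower bound.

```python
from math import comb

languages = 70  # lower bound: the page says "70+"

# Unordered pairs, e.g. {French, Japanese} counted once.
unordered_pairs = comb(languages, 2)

# Ordered pairs, e.g. French to Japanese and Japanese to French counted separately.
ordered_pairs = languages * (languages - 1)

print(unordered_pairs)  # 2415
print(ordered_pairs)    # 4830
```

Either way of counting gives more than 2000 pairs, so the figure is consistent with translation between any two supported languages rather than only to and from English (which would give 69 pairs).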

What industry leaders are saying

Partners and developers testing Gemini 3.5 Live Translate share their first impressions.

"The ability to auto-detect multiple languages and produce accurate, low-latency translation is a breakthrough for multilingual communication."
Philipp Kandal
Chief Product Officer, Grab
"Gemini 3.5 Live Translate makes multilingual voice effortless. The latency and naturalness are remarkable."
Jesse Hall
Staff Developer Advocate, LiveKit
"Blown away by the speed, accuracy, and liveliness. This is what real-time translation should feel like."
Nash Ramdial
Director, Vision Agents
"Sets a new frontier for real-time multimedia streaming. The quality is exceptional."
Maciej Rys
VP Engineering, Software Mansion
"SOTA results with low latency and high accuracy that set a new bar for speech translation."
Mason Adams
Developer Evangelist, Agora
"The translation quality for global audiences is really promising. We are excited to partner with Google DeepMind."
Bella Baek
Chief AI Officer, CJ ENM

Frequently asked questions

Quick answers about Gemini 3.5 Live Translate, its capabilities, and availability.

What is Gemini 3.5 Live Translate?
Gemini 3.5 Live Translate is Google DeepMind's latest audio model for near real-time speech-to-speech translation. It processes live audio, detects 70+ languages automatically, and generates natural-sounding translated speech that preserves the speaker's intonation, pacing, and pitch — all within seconds of the original utterance.
How is this different from existing translation tools?
Unlike turn-by-turn translation systems that wait for a speaker to finish before translating, 3.5 Live Translate generates speech continuously, staying just seconds behind the speaker. It also preserves vocal tone, maintains natural pacing, and automatically handles multilingual input without manual language configuration.
What languages are supported?
Gemini 3.5 Live Translate supports over 70 languages for automatic detection and translation. In Google Meet, this enables over 2000 language combinations in a single meeting — covering translation between any supported language pair, not just to and from English.
Where can I use it today?
It is available in public preview for developers via the Gemini Live API and Google AI Studio. For enterprises, it is entering private preview in Google Meet. In the Google Translate app on Android and iOS, 3.5 Live Translate support is rolling out globally.
Is the output watermarked?
Yes. All audio generated by Gemini 3.5 Live Translate is watermarked with SynthID, an imperceptible watermark woven into the audio output. This ensures AI-generated content remains detectable and helps prevent misinformation.
What hardware do I need to run it?
For end users, any smartphone or computer with the Google Translate app, Google Meet, or Google AI Studio will work. Developers can access the model via the Gemini Live API. Inference runs in the cloud, so no specialized local hardware is required.