Gemini 3.1 Flash Live: Making audio AI more natural and reliable

Original source: hackernews · Summary and analysis by Genesis Park

Summary

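Google has released Gemini 3.1 Flash Live, its highest-quality audio model to date for next-generation voice-first AI, and says it is rolling the model out to developers, enterprises and everyday users. The model scores 90.8% on ComplexFuncBench Audio, a benchmark of multi-step function calling, which Google presents as evidence of stronger reasoning and task execution. It is also described as better at recognizing users' frustration or confusion, and in Gemini Live it can follow the thread of a conversation for twice as long as the previous model. In addition, Search Live is expanding to more than 200 countries and territories for real-time conversation, and all audio the model generates carries a SynthID watermark so that AI-generated audio can be identified.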

Full text

Today, we’re advancing Gemini’s real-time dialogue capabilities with Gemini 3.1 Flash Live, our highest-quality audio and voice model yet. It delivers the speed and natural rhythm needed for the next generation of voice-first AI, offering a more intuitive experience for developers, enterprises and everyday users.

3.1 Flash Live is available across Google products:
- For developers in preview via the Gemini Live API in Google AI Studio
- For enterprises in Gemini Enterprise for Customer Experience
- For everyone via Search Live and Gemini Live

For developers: Robust reasoning and task execution

We’ve improved 3.1 Flash Live’s overall quality, making it more reliable for developers and enterprises to build voice-first agents that can complete complex tasks at scale.

On ComplexFuncBench Audio, a benchmark that captures multi-step function calling with various constraints, it leads with a score of 90.8% compared to our previous model. On Scale AI’s Audio MultiChallenge, Gemini 3.1 Flash Live leads with a score of 36.1% with “thinking” on. The benchmark specifically tests complex instruction following and long-horizon reasoning amidst the interruptions and hesitations typical of real-world audio.

3.1 Flash Live also has improved tonal understanding to deliver more natural dialogue. In Gemini Enterprise for Customer Experience, it’s even more effective at recognizing acoustic nuances like pitch and pace than 2.5 Flash Native Audio. It’s also better at dynamically adjusting its response to users’ expressions of frustration or confusion.

[Caption: 3.1 Flash Live lets you build voice-ready agents that handle complex tasks in noisy environments. Illustrative demonstration built with Gemini 3.1 Pro, powered by Gemini 3.1 Flash Live.]

[Caption: 3.1 Flash Live lets you use your voice to vibe code and quickly iterate. Illustrative demonstration built with Gemini 3.1 Pro, powered by Gemini 3.1 Flash Live.]
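Editor's illustration: the "multi-step function calling with various constraints" that ComplexFuncBench Audio is said to measure can be pictured with a small, self-contained sketch. It is not the Gemini Live API and not the benchmark's harness. The tool names (`find_store`, `check_stock`), the `$step.field` reference syntax and the `run_plan` helper are all hypothetical, invented here only to show how a later call can depend on an earlier call's result while honoring a constraint such as a price ceiling.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class ToolCall:
    """One function call a voice agent might emit (hypothetical shape)."""
    name: str
    args: Dict[str, Any]


# Hypothetical tools with canned data; a real agent would call live services.
def find_store(zip_code: str) -> Dict[str, Any]:
    return {"store_id": "S-" + zip_code, "open": True}


def check_stock(store_id: str, item: str, max_price: float) -> Dict[str, Any]:
    price = 19.99  # fixed illustrative price
    return {
        "store_id": store_id,
        "item": item,
        "price": price,
        "in_budget": price <= max_price,  # the constraint being enforced
    }


TOOLS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "find_store": find_store,
    "check_stock": check_stock,
}


def resolve(value: Any, results: List[Dict[str, Any]]) -> Any:
    """Replace a '$<step>.<field>' reference with an earlier step's output."""
    if isinstance(value, str) and value.startswith("$"):
        step, field = value[1:].split(".", 1)
        return results[int(step)][field]
    return value


def run_plan(plan: List[ToolCall]) -> List[Dict[str, Any]]:
    """Execute calls in order, feeding earlier results into later arguments."""
    results: List[Dict[str, Any]] = []
    for call in plan:
        if call.name not in TOOLS:
            raise ValueError(f"unknown tool: {call.name}")
        args = {key: resolve(val, results) for key, val in call.args.items()}
        results.append(TOOLS[call.name](**args))
    return results


plan = [
    ToolCall("find_store", {"zip_code": "94043"}),
    ToolCall(
        "check_stock",
        {"store_id": "$0.store_id", "item": "drill bit", "max_price": 25.0},
    ),
]
results = run_plan(plan)
print(results[-1])
```

In this sketch the second call cannot be issued correctly until the first has returned, which is the kind of dependency a multi-step benchmark has to exercise. A spoken request adds the further difficulty that the arguments must first be recovered from audio.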
Companies like Verizon, LiveKit and The Home Depot have given positive feedback on 3.1 Flash Live in their workflows, highlighting its improved, natural conversation.

For everyone: More natural and intuitive interactions

In Gemini Live and Search Live, the 3.1 Flash Live model delivers more helpful and natural responses, whether you’re asking quick daily questions or engaging in more complex conversations.

With the 3.1 Flash Live model under the hood, Gemini Live delivers faster responses compared to the previous model and it can follow the thread of your conversation for twice as long, keeping your train of thought intact during longer brainstorms.

[Caption: 3.1 Flash Live makes Gemini Live faster and more helpful]

3.1 Flash Live is also inherently multilingual, which enables this week’s global expansion of Search Live. With this launch, people in more than 200 countries and territories can now have real-time, multimodal conversations with Search in their preferred language.

[Caption: Get real-time troubleshooting help using 3.1 Flash Live in Search Live]

Try Gemini 3.1 Flash Live

All audio generated by 3.1 Flash Live is watermarked with SynthID. This imperceptible watermark is interwoven directly into the audio output, allowing the reliable detection of AI-generated content to help prevent misinformation. For more information on our approach to safety and responsibility, see the model card.

Experience the naturalness and reliability of 3.1 Flash Live, starting today. We look forward to seeing how you interact and build with it.

This analysis was written by the Genesis Park editorial team with the help of AI. The original article is available through the source link.
