Google

Gemini 2.5 Flash Native Audio

Google · gemini-2.5-flash-native-audio-preview-12-2025

Realtime conversational model with 30 selectable voices and 24 output languages. WebSocket-based, with ephemeral-token authentication. Native audio understanding and generation in a single round trip.

Best for: Conversational voice agents, multilingual chat, voice picker UX

At a glance

Price: $0.0389 / min (per minute of audio processed; minimum billable: 1 minute)
Speed: fast
Quality tier: frontier open
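To make the pricing concrete, here is a minimal sketch of a per-request cost estimate. The rate and the 1-minute minimum come from the listing; everything else is an assumption. In particular, the listing does not say whether audio beyond the first minute is pro-rated or rounded up to whole minutes, so this sketch assumes pro-rated billing. The function name `estimate_cost_usd` is hypothetical.

```python
PRICE_PER_MINUTE_USD = 0.0389  # rate stated in the listing
MIN_BILLABLE_MINUTES = 1.0     # minimum billable duration stated in the listing


def estimate_cost_usd(audio_seconds: float) -> float:
    """Estimate the cost of one request from its audio duration.

    Assumption (not confirmed by the listing): usage above the
    1-minute minimum is billed pro rata, not rounded up per minute.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    # Apply the minimum billable duration, then multiply by the rate.
    billable_minutes = max(audio_seconds / 60.0, MIN_BILLABLE_MINUTES)
    return round(billable_minutes * PRICE_PER_MINUTE_USD, 6)


print(estimate_cost_usd(20))   # 20 s is billed as the 1-minute minimum -> 0.0389
print(estimate_cost_usd(150))  # 2.5 minutes under the pro-rata assumption -> 0.09725
```

If billing turns out to round up to whole minutes, replacing the `max(...)` line with `max(math.ceil(audio_seconds / 60.0), 1)` would model that instead.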

Capabilities

Audio output

Returns audio bytes (mp3/wav) via the /v1/audio/* endpoints

Audio input

Accepts audio uploads for transcription or scene understanding

Quick start

curl https://kymaapi.com/v1/chat/completions \
  -H "Authorization: Bearer $KYMA_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-flash-native-audio-preview-12-2025",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
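For readers who prefer Python, the same request can be sketched with the standard library alone. The URL, headers, model ID, and body mirror the curl command above. The function names `build_request` and `send_request` are hypothetical helpers, and because the response schema is not documented here, the sketch returns the raw response body rather than parsing it.

```python
import json
import urllib.request

API_URL = "https://kymaapi.com/v1/chat/completions"            # from the curl example
MODEL_ID = "gemini-2.5-flash-native-audio-preview-12-2025"     # from the curl example


def build_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build the same POST request as the curl quick start (hypothetical helper)."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_request(req: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send the request and return the raw body; the response format is not assumed."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

A call would look like `send_request(build_request(os.environ["KYMA_KEY"], "Hello"))` after `import os`, with the key read from the same `KYMA_KEY` environment variable the curl example uses.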

Details

Creator: Google
Model ID: gemini-2.5-flash-native-audio-preview-12-2025
Quality tier: frontier open
Cost tier: balanced
Input modality: Audio
Output modality: Audio

Try Gemini 2.5 Flash Native Audio now

$0.50 free credits on signup. No credit card required.