Gemini 2.5 Flash Native Audio
Google
gemini-2.5-flash-native-audio-preview-12-2025
Real-time conversational audio with 30 selectable voices and 24 output languages. WebSocket-based, with ephemeral-token authentication. Native audio understanding and generation in a single round trip.
Best for: conversational voice agents, multilingual chat, voice-picker UX
At a glance
Price: $0.0389 / min
Speed: fast
Quality tier: frontier open
Billed per minute of audio processed; the minimum billable duration is 1 minute.
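The listing gives a flat per-minute rate with a 1-minute minimum. The sketch below estimates the charge for a single clip, assuming (this is not stated in the listing) that clips shorter than a minute are billed as one minute and longer clips are prorated by the second. `estimate_cost` is a hypothetical helper name, not part of any API.

```shell
#!/bin/sh
# Hypothetical helper: estimate the charge for one clip of audio.
# Rate ($0.0389/min) and the 1-minute minimum come from the listing.
# ASSUMPTION: time past the minimum is prorated by the second; the
# listing does not say whether billing rounds up to whole minutes.
estimate_cost() {
  awk -v s="$1" 'BEGIN {
    rate = 0.0389        # USD per minute of audio processed
    s = s + 0            # force numeric seconds
    if (s < 60) s = 60   # minimum billable duration: 1 minute
    printf "%.4f\n", (s / 60) * rate
  }'
}

estimate_cost 20    # prints 0.0389 (billed as the 1-minute minimum)
estimate_cost 180   # prints 0.1167 (3 minutes)
```

If billing instead rounds up to whole minutes, a 150-second clip would be charged as 3 minutes rather than 2.5, so treat the output as an estimate.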
Capabilities
Audio output: returns audio bytes (MP3 or WAV) via the /v1/audio/* endpoints
Audio input: accepts audio uploads for transcription or scene understanding
Quick start
curl https://kymaapi.com/v1/chat/completions \
-H "Authorization: Bearer $KYMA_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gemini-2.5-flash-native-audio-preview-12-2025",
"messages": [{"role": "user", "content": "Hello"}]
}'
Details
Creator: Google
Model ID: gemini-2.5-flash-native-audio-preview-12-2025
Quality tier: frontier open
Cost tier: balanced
Input modality: Audio
Output modality: Audio
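The quick-start request hard-codes the prompt "Hello". A minimal bash sketch for sending an arbitrary text prompt, assuming the same endpoint, model ID, and `KYMA_KEY` variable as the quick start. `json_escape`, `build_body`, and `ask` are hypothetical helper names. The escaper covers only backslash, double quote, newline, carriage return, and tab, so a dedicated JSON tool such as jq is safer for untrusted input. Like the quick start, this sends text only and does not cover audio upload or the WebSocket flow.

```shell
#!/usr/bin/env bash
# Hypothetical helper: escape the common JSON special characters.
# Other control characters are NOT handled.
json_escape() {
  local s="$1"
  s=${s//\\/\\\\}        # backslash (must be escaped first)
  s=${s//\"/\\\"}        # double quote
  s=${s//$'\n'/\\n}      # newline
  s=${s//$'\r'/\\r}      # carriage return
  s=${s//$'\t'/\\t}      # tab
  printf '%s' "$s"
}

# Build the same chat-completions body as the quick start, with a custom prompt.
build_body() {
  local prompt
  prompt="$(json_escape "$1")"
  printf '{"model": "%s", "messages": [{"role": "user", "content": "%s"}]}' \
    "gemini-2.5-flash-native-audio-preview-12-2025" "$prompt"
}

# Hypothetical wrapper: POST the body to the quick-start endpoint.
ask() {
  curl -sS https://kymaapi.com/v1/chat/completions \
    -H "Authorization: Bearer ${KYMA_KEY:?set KYMA_KEY first}" \
    -H "Content-Type: application/json" \
    -d "$(build_body "$1")"
}
```

Usage: `ask 'Say "hello" in Spanish'` prints the raw JSON response. Passing the prompt to `printf` as an argument (not inside the format string) keeps any `%` characters in the prompt from being interpreted.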