# Kyma API > LLM API gateway for open-source and frontier models. One endpoint, 73 models — 29 language/reasoning, 16 image, 11 video, 17 audio. OpenAI- and Anthropic-SDK compatible, multi-route redundancy with automatic failover. Pay per token for language models, per image for image generation, per second or per clip for video, and per character / minute / track / generation for audio. Cached prompt input is billed at 10% of the input rate. $0.50 free credit on signup, no card required. Base URL: `https://kymaapi.com/v1` ## Docs - [Introduction](https://docs.kymaapi.com/introduction): What Kyma API is and why it exists - [Quickstart](https://docs.kymaapi.com/quickstart): Get an API key and make your first request in 30 seconds - [Pricing](https://kymaapi.com/pricing): Per-model pricing for language, image, video, and audio - [Model Recommendations](https://docs.kymaapi.com/models/recommended): Which model to use for which task - [Model Aliases](https://docs.kymaapi.com/guides/model-aliases): Use "best", "fast", "code", "cheap" instead of exact model IDs - [FAQ](https://docs.kymaapi.com/faq): Common questions and answers ## SDK and Integration Guides - [Any OpenAI Client](https://docs.kymaapi.com/guides/any-openai-client): Drop-in replacement for any OpenAI-compatible SDK or tool - [Python](https://docs.kymaapi.com/guides/python): Full Python guide with streaming, async, error handling - [JavaScript](https://docs.kymaapi.com/guides/javascript): Node.js and browser integration - [Anthropic SDK](https://docs.kymaapi.com/guides/anthropic): Use Kyma with the Anthropic Messages API - [cURL](https://docs.kymaapi.com/guides/curl): Raw HTTP examples - [LangChain](https://docs.kymaapi.com/guides/langchain): LangChain ChatOpenAI integration - [Vercel AI SDK](https://docs.kymaapi.com/guides/vercel-ai-sdk): Next.js and Vercel AI SDK provider ## AI Coding Agent Setup - [Universal Agent Setup](https://docs.kymaapi.com/guides/agent-setup): One guide for all agents. Auto-config via /v1/config endpoint - [Cline](https://kymaapi.com/for/cline): Cline (VS Code) setup with Kyma - [Roo Code](https://kymaapi.com/for/roo-code): Roo Code setup with Kyma - [Cursor](https://kymaapi.com/for/cursor): Cursor IDE setup with Kyma - [Claude Code](https://kymaapi.com/for/claude-code): Claude Code setup with Kyma - [OpenClaw](https://kymaapi.com/for/openclaw): OpenClaw setup with Kyma - [Windsurf](https://docs.kymaapi.com/guides/windsurf): Windsurf setup with Kyma - [Aider](https://docs.kymaapi.com/guides/aider): Aider CLI setup with Kyma - [LangChain](https://kymaapi.com/for/langchain): LangChain ChatOpenAI setup with Kyma - [n8n](https://kymaapi.com/for/n8n): n8n no-code automation setup with Kyma - [Open WebUI](https://kymaapi.com/for/openwebui): Open WebUI connection setup with Kyma ## Features - [Streaming](https://docs.kymaapi.com/guides/streaming): SSE streaming for all models - [Prompt Caching](https://docs.kymaapi.com/guides/prompt-caching): 90% discount on cached input tokens - [Tool Calling](https://docs.kymaapi.com/guides/tool-calling): Function calling across all supported models - [Structured Outputs](https://docs.kymaapi.com/guides/structured-outputs): JSON mode and response_format - [Authentication](https://docs.kymaapi.com/guides/authentication): API keys (ky-) and session tokens (ks-) - [Error Handling](https://docs.kymaapi.com/guides/error-handling): Error codes, retry logic, rate limit headers ## Site Pages - [Pricing](https://kymaapi.com/pricing): Full pricing for every model - [Models](https://kymaapi.com/models): Browse all models with live performance data - [Image Generation API](https://kymaapi.com/image-generation): FLUX, Ideogram, Recraft, Imagen - [Video Generation API](https://kymaapi.com/video-generation): Kling, Seedance, Hailuo, Veo - [Text-to-Speech API](https://kymaapi.com/text-to-speech): ElevenLabs and MiniMax voices - [Speech-to-Text API](https://kymaapi.com/transcription): Whisper transcription - [Compare](https://kymaapi.com/compare): How Kyma compares to other gateways and direct APIs - [Rankings](https://kymaapi.com/rankings): Top models by tokens, speed, and uptime (live) - [Status](https://kymaapi.com/status): Live per-model availability - [Blog](https://kymaapi.com/blog): Guides on free LLM APIs, coding agents, image/video/audio - [Glossary](https://kymaapi.com/glossary): LLM gateway, router, prompt caching, failover, tool calling defined ## API Reference - [POST /v1/chat/completions](https://docs.kymaapi.com/api-reference/chat-completions): OpenAI-compatible chat completions - [GET /v1/models](https://docs.kymaapi.com/api-reference/models-list): List all models with pricing and capabilities - [POST /v1/images/generations](https://docs.kymaapi.com/api-reference/images-generations): Image generation (async) - [POST /v1/videos/generations](https://docs.kymaapi.com/api-reference/videos-generations): Video generation (async) - [POST /v1/audio/transcriptions](https://docs.kymaapi.com/api-reference/audio-transcriptions): Speech-to-text - [POST /v1/audio/speech](https://docs.kymaapi.com/api-reference/audio-speech): Text-to-speech - [POST /v1/auth/register](https://docs.kymaapi.com/api-reference/auth-register): Create account and get API key ## Discovery Endpoints (no auth required) - `GET /v1/models` — Full model list with pricing, context windows, capabilities - `GET /v1/models/recommend?usecase=coding` — Model recommendation by use case - `GET /v1/models/recommend?agent=cline` — Agent-specific recommendation with config example - `GET /v1/config` — Auto-configuration for agents (base_url, models, endpoints) - `GET /v1/capabilities` — Gateway capabilities summary - `GET /v1/credits/pricing` — Per-model pricing table - `GET /v1/status` — Service status - `GET /v1/models/uptime` — 7-day per-model uptime percentages ## Models All language models use the OpenAI chat completions format. Counts and prices below are generated from the live catalog. ### Language and reasoning models (per 1M tokens) - `sonar-pro` — Sonar Pro (Perplexity), 200K context (vision). $4.05 in / $20.25 out per 1M, cached input 10%. Perplexity's pro web-search model. Deeper multi-step search, 200K context, longer cited answers. Per-request search fee on top of tokens. - `qwen-3.7-max` — Qwen 3.7 Max (Alibaba), 1M context (tools, reasoning). $3.38 in / $10.13 out per 1M, cached input 10%. Alibaba's newest closed-weight flagship. 1M context, top reasoning + multilingual. - `deepseek-v4-pro` — DeepSeek V4 Pro (DeepSeek), 1M context (tools, reasoning). $2.35 in / $4.70 out per 1M, cached input 10%. 1.6T MoE flagship. 1M context. Top reasoning tier. - `gemini-3.5-flash` — Gemini 3.5 Flash (Google), 1M context (tools, vision, reasoning). $2.02 in / $12.15 out per 1M, cached input 10%. Newest Gemini Flash. 1M context, multimodal input. - `glm-5.1` — GLM 5.1 (Zhipu AI), 203K context (tools, reasoning). $1.89 in / $5.94 out per 1M, cached input 10%. #1 SWE-Bench Pro open-weight. 8-hour agentic runs. - `grok-4.3` — Grok 4.3 (xAI), 1M context (tools, reasoning). $1.69 in / $3.38 out per 1M, cached input 10%. xAI's frontier model. 1M context, strong reasoning + tool use. - `grok-build` — Grok Build (xAI), 256K context (tools, reasoning). $1.35 in / $2.70 out per 1M, cached input 10%. xAI's coding-specialized model. Fast, tool-native, built for agentic dev. - `sonar` — Sonar (Perplexity), 127K context (vision). $1.35 in / $1.35 out per 1M, cached input 10%. Perplexity's live web-search model. Returns current, cited answers — grounded in a real-time search of the web. Bills a small per-request search fee on top of tokens. - `kimi-k2.6` — Kimi K2.6 (Moonshot AI), 262K context (tools, vision, reasoning). $1.28 in / $5.40 out per 1M. Moonshot's newest. Agentic + vision + reasoning. 262K context. - `llama-3.3-70b` — Llama 3.3 70B (Meta), 128K context (tools, reasoning). $1.19 in / $1.19 out per 1M, cached input 10%. Most popular open model. Great all-rounder. - `deepseek-v3` — DeepSeek V3 (DeepSeek), 160K context (tools, reasoning). $0.81 in / $2.29 out per 1M. Previous-gen flagship. Stable, proven. - `deepseek-r1` — DeepSeek R1 (DeepSeek), 64K context (tools, reasoning). $0.743 in / $2.90 out per 1M. Top reasoning model. 96% cheaper than o1. - `qwen-3.6-plus` — Qwen 3.6 Plus (Alibaba), 131K context (tools, reasoning). $0.675 in / $4.05 out per 1M, cached input 10%. Alibaba's newest flagship. #1 on Kyma. - `qwen-3-coder` — Qwen 3 Coder (Alibaba), 131K context (tools, reasoning). $0.675 in / $2.16 out per 1M, cached input 10%. Purpose-built for code generation. - `gemini-3-flash` — Gemini 3 Flash (Google), 1M context (tools, vision, reasoning). $0.675 in / $4.05 out per 1M, cached input 10%. Newest Gemini. 1M context. - `nemotron-3-ultra-550b` — Nemotron 3 Ultra 550B (NVIDIA), 1M context (tools, reasoning). $0.675 in / $3.38 out per 1M, cached input 10%. NVIDIA's strongest US open-weight. 550B MoE (55B active), hybrid Mamba-Transformer. 1M context, 300+ tok/s. - `kimi-k2.5` — Kimi K2.5 (Moonshot AI), 262K context (tools, vision, reasoning). $0.675 in / $3.78 out per 1M. Multimodal agentic. 262K context. - `qwen-3.7-plus` — Qwen 3.7 Plus (Alibaba), 1M context (tools, vision, reasoning). $0.54 in / $2.16 out per 1M, cached input 10%. Alibaba's newest Plus flagship. 1M context, vision input, top agentic + reasoning. - `minimax-m2.5` — MiniMax M2.5 (MiniMax), 197K context (tools, reasoning). $0.405 in / $1.62 out per 1M, cached input 10%. SWE-bench 80.2%. Top agentic coding. - `minimax-m3` — MiniMax M3 (MiniMax), 1M context (tools, vision, reasoning). $0.405 in / $1.62 out per 1M, cached input 10%. MSA sparse attention. SWE-Bench Pro 59%, Terminal-Bench 66%. Agentic coding, 1M context, multimodal input. - `gemini-2.5-flash` — Gemini 2.5 Flash (Google), 1M context (tools, vision, reasoning). $0.405 in / $3.38 out per 1M, cached input 10%. Google's fastest. 1M context. - `minimax-m2.7` — MiniMax M2.7 (MiniMax), 205K context (tools, reasoning). $0.405 in / $1.62 out per 1M. Next-gen agentic productivity. - `qwen-3-32b` — Qwen 3 32B (Alibaba), 33K context (tools, reasoning). $0.392 in / $0.81 out per 1M, cached input 10%. Top coding model. Ultra fast inference. - `step-3.7-flash` — Step 3.7 Flash (StepFun), 256K context (tools, vision, reasoning). $0.27 in / $1.55 out per 1M, cached input 10%. StepFun's fast flash tier. 256K context, multimodal input, tool calling. Cheap throughput. - `gpt-oss-120b` — GPT-OSS 120B (OpenAI), 128K context (tools). $0.203 in / $0.81 out per 1M, cached input 10%. OpenAI's open source. 120B parameters. - `gemma-4-31b` — Gemma 4 31B (Google), 128K context (tools, vision). $0.189 in / $0.54 out per 1M, cached input 10%. Google's newest open model. Multimodal. - `deepseek-v4-flash` — DeepSeek V4 Flash (DeepSeek), 1M context (tools, reasoning). $0.189 in / $0.378 out per 1M, cached input 10%. 284B MoE. 1M context. Fast + cheap V4 tier. - `glm-4.5-air` — GLM 4.5 Air (Zhipu AI), 131K context (tools, reasoning). $0.176 in / $1.15 out per 1M, cached input 10%. Cheap agentic MoE (106B/12B active). Fast with implicit caching. - `glm-4.7-flash` — GLM 4.7 Flash (Zhipu AI), 203K context (tools, reasoning). $0.081 in / $0.54 out per 1M, cached input 10%. Ultra cheap. 200K context. Fast. ### Image generation - `minimax-image-01` — MiniMax Image 01 (MiniMax). $0.005 / image. Sub-cent image generation. Cheapest tier on Kyma — $0.005 per image flat regardless of resolution. 5 aspect ratios (1:1, 16:9, 9:16, 4:3, 3:4). Best for high-volume / budget workflows. - `imagen-4-fast` — Imagen 4 Fast (Google). $0.027 / image. Google Imagen 4 fast tier — quickest gen, lower fidelity. Photoreal, multi-style. - `flux-2-pro` — FLUX.2 Pro (Black Forest Labs). $0.0405 / image. BFL's 32B flagship (3× larger than Flux 1.1). Photoreal, multi-reference (up to 10 sources), unified gen+edit, ~60% accurate text-in-image. $0.03/MP base + $0.015 per extra MP. - `nano-banana` — Nano Banana (Google). $0.046 / image. Google Gemini image-gen. Native edit-mode (image-in + prompt → image-out). 3 size tiers (512/1K/2K). Cheapest at small sizes. - `nano-banana-3-flash` — Nano Banana 3 Flash (preview) (Google). $0.046 / image. Newer Gemini 3.1 image-gen, preview tier. Same edit-mode + tier pricing as stable; sharper output. Routed to Vertex global region. - `flux-kontext-pro` — FLUX.1 Kontext Pro (Black Forest Labs). $0.054 / image. Image-to-image edit and refinement. Mask + inpaint. - `recraft-v4` — Recraft V4 (Recraft). $0.054 / image. Top of HF Arena (#1, beats Midjourney V8 / DALL-E 3 / FLUX). Design-aware composition, lighting, textures. - `recraft-v3` — Recraft V3 (Recraft). $0.054 / image. Legacy. Recommend recraft-v4 for new projects (same price, top of HF Arena). - `imagen-4` — Imagen 4 (Google). $0.054 / image. Google Imagen 4 standard. Photoreal humans, sharp text, rich composition. Default quality tier. - `flux-1.1-ultra` — FLUX 1.1 Pro Ultra (Black Forest Labs). $0.081 / image. Legacy. Recommend flux-2-pro for new projects (cheaper at 1MP, higher quality, multi-reference). - `imagen-4-ultra` — Imagen 4 Ultra (Google). $0.081 / image. Google Imagen 4 highest fidelity. Best detail, slowest gen. Use for hero / print-ready assets. - `gpt-image-2` — GPT Image 2 (OpenAI). $0.081 / image. OpenAI's flagship image model (Apr 2026). Near-perfect text-in-image (multilingual), reasoning-augmented composition, photorealism. Quality tiers low/medium/high — picker default medium. 1024×1024 / 1024×1536 / 1536×1024 / 2048×2048. - `ideogram-v3` — Ideogram V3 (Ideogram). $0.108 / image. Text-in-image specialist. Best for typography, packaging, logos. - `recraft-v4-vector` — Recraft V4 Vector (Recraft). $0.108 / image. Native SVG output — actual paths + structured layers, edit in Figma/Illustrator. Only model on the market that ships true vector files. - `recraft-v4-pro` — Recraft V4 Pro (Recraft). $0.3375 / image. Recraft V4 at 4MP for print-ready / large-scale assets. Same design taste as V4, higher resolution. - `recraft-v4-vector-pro` — Recraft V4 Vector Pro (Recraft). $0.405 / image. Native SVG at 4MP for print-ready vector assets. Same as V4 Vector with higher detail / scale. ### Video generation - `kling-2.5-pro` — Kling 2.5 Pro (Kuaishou). $0.0945 / second. Cinematic 5-10s video. Photoreal humans, smooth motion. Cheapest Kling tier. T2V or I2V via image_url. - `kling-3-pro` — Kling 3 Pro (Kuaishou). $0.1512 / second. Flagship Kling. Photoreal humans, smooth motion, sharper than 2.5. T2V or I2V via image_url. For native audio, use kling-3-pro-audio. - `kling-3-pro-audio` — Kling 3 Pro (Audio) (Kuaishou). $0.2268 / second. Kling 3 Pro with native audio (ambient + dialogue). Same visuals as kling-3-pro plus synchronized sound. ~50% premium for audio. - `seedance-2-pro` — Seedance 2 Pro (ByteDance). $0.4096 / second. ByteDance flagship video. Multi-shot, native audio bundled, dynamic camera moves. T2V or I2V via image_url. 720p. - `seedance-2-fast` — Seedance 2 Fast (ByteDance). $0.3266 / second. Seedance 2 fast tier — quicker generation, ~20% cheaper than Pro. Native audio bundled. Best for short social clips. - `veo-3-fast` — Veo 3 Fast (Google). $0.135 / second. Google Veo 3 fast tier — 720p, no audio. Cheapest Veo. Balanced quality and speed for social/drafts. - `veo-3` — Veo 3 (Google). $0.54 / second. Google Veo 3 flagship — 1080p with native audio (dialogue + ambient + lip-sync). Top-quality cinematic clips. - `elevenlabs-music` — ElevenLabs Music (ElevenLabs). $0.135 / second. Prompt-driven music generation. Lyrics, instrumental, configurable duration up to 5 min. - `hailuo-02-512p` — Hailuo 02 (512p) (MiniMax). $0.14 / clip. MiniMax Hailuo 02 at 512p — cheapest video tier on Kyma. Flat $0.140 per clip (6s or 10s). T2V or I2V via image_url. Best for social shorts, rapid iteration, budget motion. - `hailuo-02-768p` — Hailuo 02 (768p) (MiniMax). $0.42 / clip. Hailuo 02 at 768p — mid tier balanced quality vs cost. Flat $0.420 per clip. T2V or I2V via image_url. - `hailuo-02-1080p` — Hailuo 02 (1080p) (MiniMax). $0.78 / clip. Hailuo 02 at 1080p — premium tier, full HD output. Flat $0.780 per clip. T2V or I2V via image_url. ### Audio — speech, transcription, music, sound - `whisper-v3-turbo` — Whisper Large v3 Turbo (OpenAI). $0.0009 / minute. Speech-to-text. 228x realtime inference. Transcripts with timestamps + language detect. - `gpt-4o-mini-transcribe-2025-12-15` — GPT-4o mini Transcribe (OpenAI). $0.004 / minute. Speech-to-text. OpenAI's premium quality STT — best real-world accuracy on conversational audio, noisy backgrounds, and code-switching (Vi/En etc). - `gemini-3-flash-audio` — Gemini 3 Flash (Audio) (Google). $0.0026 / minute. Audio understanding. Hears tone, music, SFX, language, speaker emotion — beyond pure transcription. Inline payload up to 30 min. - `gpt-realtime-translate` — GPT Realtime Translate (OpenAI). $0.0459 / minute. Native audio-to-audio translation with voice cloning. Preserves original speaker tone. 13 target languages (es/pt/fr/ja/ru/zh/de/ko/hi/id/vi/it/en). - `gemini-2.5-flash-native-audio-preview-12-2025` — Gemini 2.5 Flash Native Audio (Google). $0.0389 / minute. Conversational realtime with 30 pickable voices + 24 output languages. WebSocket-based, ephemeral token auth. Native audio understanding + generation in one round-trip. - `gemini-3.5-live-translate-preview` — Gemini 3.5 Live Translate (Google). $0.0635 / minute. Low-latency audio-to-audio speech translation. Near real-time speech-to-speech across 70+ languages, preserving the speaker's intonation, pacing, and pitch. WebSocket-based realtime session. - `eleven-multilingual-v2` — ElevenLabs Multilingual v2 (ElevenLabs). $0.405 / 1K characters. Hero-quality multilingual TTS. 29 languages, expressive voices, brand-safe consistent delivery. - `eleven-v3` — ElevenLabs v3 (ElevenLabs). $0.405 / 1K characters. Most expressive TTS. Emotional range, audio tags, and lifelike delivery across 70+ languages. - `eleven-flash-v2-5` — ElevenLabs Flash v2.5 (ElevenLabs). $0.2025 / 1K characters. Ultra-low-latency TTS, ~75ms time-to-first-byte. Half the per-char cost of Multilingual v2. 32 languages. - `eleven-turbo-v2-5` — ElevenLabs Turbo v2.5 (ElevenLabs). $0.2025 / 1K characters. Balanced TTS — quicker than Multilingual, better quality than Flash. Half cost vs Multilingual. 32 languages. - `elevenlabs-sfx` — ElevenLabs Sound Effects (ElevenLabs). $0.027 / generation. Generates non-speech audio (whoosh, explosion, rain) from a text prompt. Flat $0.027 per generation, 0.5-22 sec. - `minimax-speech-hd` — MiniMax Speech HD (MiniMax). $0.14 / 1K characters. MiniMax HD voice. Multilingual, expressive, ~2.9× cheaper than ElevenLabs Multilingual v2 at the same production quality tier. - `minimax-speech-turbo` — MiniMax Speech Turbo (MiniMax). $0.09 / 1K characters. MiniMax low-latency voice. Multilingual, ~2.2× cheaper than ElevenLabs Flash v2.5. Best for bulk TTS, real-time voice agents, conversational AI. - `minimax-music` — MiniMax Music (MiniMax). $0.045 / track. Lyrics-driven music generation. Music-2.0 family. Up to 5 minutes per call, ~90× cheaper than ElevenLabs Music for non-hero use cases. - `minimax-music-pro` — MiniMax Music Pro (MiniMax). $0.21 / track. Music-2.6 (latest pro family). Higher fidelity than Music-2.0, richer arrangements. Still ~19× cheaper than ElevenLabs Music for production-tier output. - `minimax-voice-clone` — MiniMax Voice Clone (MiniMax). $2.10 / generation. Clone a voice from a 10s-5min reference recording. Returns a voice_id usable in /v1/audio/speech with any MiniMax HD/Turbo SKU. Flat one-time charge per cloned voice. - `minimax-voice-design` — MiniMax Voice Design (MiniMax). $4.20 / generation. Generate a synthesized voice profile from a natural-language description (no reference audio needed). Returns a voice_id usable in /v1/audio/speech with any MiniMax HD/Turbo SKU. Flat one-time charge per designed voice. ## Model Aliases Send these as the model name and Kyma resolves to the best current model: - `best` → `qwen-3.6-plus` - `fast` → `qwen-3-32b` - `code` → `qwen-3-coder` - `cheap` → `gemini-2.5-flash` - `long-context` → `gemini-2.5-flash` - `vision` → `gemma-4-31b` - `reasoning` → `deepseek-r1` - `agent` → `kimi-k2.6` - `best-agent` → `kimi-k2.6` - `balanced` → `llama-3.3-70b` - `glm-flagship` → `glm-5.1` - `search` → `sonar` - `transcribe` → `whisper-v3-turbo` - `transcribe-quality` → `gpt-4o-mini-transcribe-2025-12-15` - `audio-understand` → `gemini-3-flash-audio` ## Key Facts - OpenAI SDK compatible: set base_url to https://kymaapi.com/v1 and use a ky- API key - Anthropic Messages API compatible: POST /v1/messages - Image generation: POST /v1/images/generations (async, poll GET /v1/jobs/{id}) - Video generation: POST /v1/videos/generations (async, poll GET /v1/jobs/{id}) - Audio: transcription (POST /v1/audio/transcriptions), text-to-speech (POST /v1/audio/speech), understanding (POST /v1/audio/understand) - Multi-route redundancy: automatic failover keeps user-facing errors near zero - Prompt caching: 90% discount on cached input tokens, automatic where supported - Tool calling and JSON structured outputs: supported on language models - Rate limits: tier-based, scaling with credits purchased - Free tier: $0.50 signup credit, no credit card required - Signup: https://kymaapi.com or POST /v1/auth/register ## Optional - [Changelog](https://docs.kymaapi.com/changelog): Recent updates - [Use Cases](https://docs.kymaapi.com/guides/use-cases/chatbot): Chatbot, coding agent, RAG, data extraction, automation - [MCP Server](https://docs.kymaapi.com/guides/mcp-server): Model Context Protocol server for Claude Desktop and Cursor - [Kyma Agent](https://docs.kymaapi.com/guides/agent): Terminal-first coding agent (@kyma-api/agent) - [Kyma Ter](https://kymaapi.com/ter): Local multi-agent workspace