How to Build an AI Agent Backend with Open-Source Models

April 12, 2026 · 3 min read

What Makes an Agent Different from a Chatbot

A chatbot responds to messages. An agent takes actions.

The key difference is the tool loop: the model calls functions, gets results, and decides what to do next. This lets you build systems that browse the web, query databases, execute code, and make decisions autonomously.

The Architecture

User Request
    ↓
LLM (with tools defined)
    ↓
┌─ If tool_call → Execute tool → Feed result back → Loop
└─ If stop → Return final response

Every agent needs:

  1. An LLM that supports function calling
  2. A set of tools (functions the LLM can call)
  3. A loop that handles tool calls until the LLM is done

Choosing the Right Model

Model         | Strength          | Cost/1K reqs | Best For
kimi-k2.5     | Best tool calling | ~$1.09       | Complex multi-step agents
deepseek-v3   | Great value       | ~$0.86       | Simple agents, budget builds
qwen-3.6-plus | Best overall      | ~$0.75       | General agents
qwen-3-32b    | Fastest           | ~$0.36       | Speed-critical agents

We recommend kimi-k2.5 for agents that need reliable function calling.

Building the Agent (Python)

from openai import OpenAI
import json

client = OpenAI(
    base_url="https://kymaapi.com/v1",
    api_key="ky-your-key"
)

# Define tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "search",
            "description": "Search for information",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"}
                },
                "required": ["query"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "calculate",
            "description": "Evaluate a math expression",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {"type": "string"}
                },
                "required": ["expression"]
            }
        }
    }
]

# Tool implementations
def execute_tool(name: str, args: dict) -> str:
    if name == "search":
        return f"Results for '{args['query']}': [your search results here]"
    if name == "calculate":
        return str(eval(args["expression"]))  # use a safe math parser in production
    return "Unknown tool"

# The agent loop
def run_agent(task: str, max_steps: int = 5) -> str:
    messages = [
        {"role": "system", "content": "You are a helpful agent. Use tools when needed. Think step by step."},
        {"role": "user", "content": task}
    ]

    for step in range(max_steps):
        response = client.chat.completions.create(
            model="kimi-k2.5",
            messages=messages,
            tools=tools,
        )

        choice = response.choices[0]
        messages.append(choice.message)

        # Agent is done
        if choice.finish_reason == "stop":
            return choice.message.content

        # Process tool calls
        if choice.message.tool_calls:
            for call in choice.message.tool_calls:
                args = json.loads(call.function.arguments)
                result = execute_tool(call.function.name, args)
                messages.append({
                    "role": "tool",
                    "tool_call_id": call.id,
                    "content": result
                })

    # The last message may be a tool-result dict rather than a message object
    last = messages[-1]
    text = last.get("content") if isinstance(last, dict) else last.content
    return text or "Max steps reached"

# Run it
answer = run_agent("What is the square root of 144 plus 25?")
print(answer)

Production Tips

Error handling. Wrap tool execution in try/except. If a tool fails, return the error message as the tool result so the LLM can adapt.
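A minimal sketch of that pattern, wrapping the `execute_tool` helper from the example above (a stub implementation is inlined here so the snippet stands alone):

```python
def execute_tool(name: str, args: dict) -> str:
    # Stand-in for the article's implementation
    if name == "calculate":
        return str(eval(args["expression"]))  # use a safe math parser in production
    return "Unknown tool"

def execute_tool_safely(name: str, args: dict) -> str:
    """Run a tool and return its error text instead of raising,
    so the model sees what went wrong and can retry or adapt."""
    try:
        return execute_tool(name, args)
    except Exception as exc:
        return f"Tool '{name}' failed: {exc}"
```

Feed the returned string back as the tool message's `content` exactly as in the main loop; the model treats the error text as an observation.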

Max steps. Always set a limit. Without it, a confused model can loop forever. 5-10 steps covers most use cases.

Conversation history. Store messages in a database for multi-turn agent sessions. Use gemini-2.5-flash (1M context) if history gets long.
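As a stand-in for a real database, here is a file-based sketch of session persistence (the JSON-file layout and function names are illustrative assumptions, not a library API):

```python
import json
from pathlib import Path

def save_session(session_id: str, messages: list, store: Path = Path("sessions")) -> None:
    """Persist a session's message list as JSON, one file per session."""
    store.mkdir(exist_ok=True)
    (store / f"{session_id}.json").write_text(json.dumps(messages))

def load_session(session_id: str, store: Path = Path("sessions")) -> list:
    """Load a session's messages, or start fresh if none exist."""
    path = store / f"{session_id}.json"
    return json.loads(path.read_text()) if path.exists() else []
```

Load the history before each `run_agent` call, append the new turns, and save it back; swap the file I/O for your database of choice in production.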

Cost control. Each step is a new API call. A 5-step agent run with kimi-k2.5 costs about $0.03 — roughly 33 agent runs per dollar.

Streaming. For real-time UIs, stream the final response with stream=True. Note that finish_reason isn't known until a call completes, so either stream every call and buffer any tool-call deltas, or make the final call without tools and stream that one.

Cost Comparison

Provider  | Model             | Cost for 1K agent runs (5 steps each)
Kyma      | kimi-k2.5         | ~$30
Kyma      | deepseek-v3       | ~$25
OpenAI    | gpt-4o            | ~$150
Anthropic | claude-3.5-sonnet | ~$120

Open-source models via Kyma are 4-5x cheaper for agent workloads.

Next Steps