# Function Calling (tool use primitive)
## Watch or read first
- OpenAI function calling guide: https://platform.openai.com/docs/guides/function-calling
- Anthropic tool use guide: https://docs.anthropic.com/claude/docs/tool-use
- Daily Dose DS, "Building blocks of AI Agents - Tools" in the AI Engineering Guidebook (2025, paid): https://www.dailydoseofds.com/ai-engineering-guidebook/
## TL;DR
Function calling is the primitive that turns an LLM into something that can act. You describe tools as JSON schemas. The LLM decides when to call them and with what arguments, returning a structured function call you execute in your code. The result goes back to the LLM. This is the atom of every agent.
## The historical problem
Before function calling, making an LLM call an API meant:
- Prompt-engineer the model to emit a format like "CALL: search(query=X)"
- Parse that format with regex
- Hope the model stays on format
- Handle failures constantly
It worked in demos but broke at scale: the format drifted, and error handling was brittle.
OpenAI introduced function calling in June 2023. Anthropic followed in 2024. Google, Cohere, Mistral, open-source models adopted it. By 2026, function calling is a standard feature of every serious LLM API.
## How it works

### The three-step dance
1. You define tools (name, description, JSON schema for arguments).
2. You call the LLM with your prompt and the tools list. The LLM replies with either:
   - a text answer, OR
   - a function call (tool name + arguments)
3. If a function call: you execute it in your code, send the result back to the LLM, and loop until it returns a text answer.
### Example: OpenAI API shape

```python
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather in a given city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"}
            },
            "required": ["city"]
        }
    }
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Weather in Paris?"}],
    tools=tools,
)
# Model responds with a tool_call:
# { name: "get_weather", arguments: '{"city": "Paris"}' }
```
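One easy-to-miss detail when reading the call off the response: the OpenAI SDK returns `arguments` as a JSON *string*, not a dict, so you decode it yourself. A sketch, with the assistant message simulated as a plain dict (the `call_abc123` id is made up):

```python
import json

# Simulated shape of the assistant message carrying a tool call,
# written as a plain dict for illustration.
message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {
            "name": "get_weather",
            "arguments": '{"city": "Paris"}',  # a JSON string, not a dict
        },
    }],
}

for call in message["tool_calls"]:
    # Decode before use; json.loads raises on malformed arguments.
    args = json.loads(call["function"]["arguments"])
    print(call["function"]["name"], args)  # get_weather {'city': 'Paris'}
```

Validating the decoded dict against your schema before executing the tool catches hallucinated or missing fields early.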
### Example: Anthropic API shape

```python
import anthropic

client = anthropic.Anthropic()

tools = [{
    "name": "get_weather",
    "description": "Get the current weather in a given city.",
    "input_schema": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"}
        },
        "required": ["city"]
    }
}]

response = client.messages.create(
    model="claude-haiku-4-5",
    max_tokens=1024,  # required by the Messages API
    messages=[{"role": "user", "content": "Weather in Paris?"}],
    tools=tools,
)
# response.content is a list of blocks. If tool_use, you get:
# { type: "tool_use", name: "get_weather", input: { "city": "Paris" } }
```
Anthropic calls it `tool_use`. OpenAI calls it `tool_calls`. Same mechanism.
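The result message differs in shape too: OpenAI uses a dedicated `tool` role, while Anthropic wraps a `tool_result` block inside a `user` message. A sketch of both shapes, with made-up ids standing in for the ones the model returns:

```python
# Anthropic: the follow-up user message carrying the tool result.
anthropic_tool_result = {
    "role": "user",
    "content": [{
        "type": "tool_result",
        "tool_use_id": "toolu_01A",  # hypothetical id copied from the tool_use block
        "content": "18°C, partly cloudy",
    }],
}

# OpenAI: the equivalent is a "tool" role message.
openai_tool_result = {
    "role": "tool",
    "tool_call_id": "call_abc123",  # hypothetical id copied from the tool_call
    "content": "18°C, partly cloudy",
}
```

In both cases the id must match the call it answers; that matching is what makes parallel tool calls unambiguous.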
Full round-trip loop
messages = [{"role": "user", "content": "Weather in Paris, in Celsius?"}]
while True:
response = llm.complete(messages, tools=tools)
if response.is_text():
return response.text
for tool_call in response.tool_calls:
result = execute_tool(tool_call.name, tool_call.arguments)
messages.append({"role": "assistant", "content": None, "tool_calls": [tool_call]})
messages.append({"role": "tool", "tool_call_id": tool_call.id, "content": result})
# loop
This is the heart of every ReAct agent. See react pattern.
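The `execute_tool` helper in the loop can be a plain dispatch table that also guards against unknown tool names and bad arguments. A minimal sketch (`get_weather` here is a stub, not a real weather API):

```python
import json

def get_weather(city: str) -> str:
    # Stub: a real implementation would call a weather API.
    return json.dumps({"city": city, "temp_c": 18})

TOOLS = {"get_weather": get_weather}

def execute_tool(name, arguments):
    # OpenAI sends arguments as a JSON string; Anthropic as a dict.
    if isinstance(arguments, str):
        arguments = json.loads(arguments)
    fn = TOOLS.get(name)
    if fn is None:
        # Report the error as an observation so the model can recover.
        return f"Error: unknown tool '{name}'"
    try:
        return fn(**arguments)
    except TypeError as exc:  # missing or unexpected arguments
        return f"Error: bad arguments for '{name}': {exc}"
```

Returning errors as strings rather than raising keeps the loop alive and lets the model retry with corrected arguments.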
### Parallel tool calls
Since late 2023, OpenAI and Anthropic support parallel tool calls: the model returns multiple function calls in one turn. You execute them in parallel and return all results together.

```python
# Response contains:
[
    ToolCall(name="get_weather", args={"city": "Paris"}),
    ToolCall(name="get_weather", args={"city": "Tokyo"}),
]
# You execute both concurrently, feed both results back.
```

This cuts latency for multi-tool tasks significantly.
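Concurrent execution can be as simple as a thread pool. A sketch, with `ToolCall` and `execute_tool` as stand-ins for your own types:

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass
class ToolCall:
    name: str
    args: dict

def execute_tool(call: ToolCall) -> str:
    # Placeholder: pretend every city is sunny.
    return f"{call.args['city']}: sunny"

calls = [
    ToolCall(name="get_weather", args={"city": "Paris"}),
    ToolCall(name="get_weather", args={"city": "Tokyo"}),
]

with ThreadPoolExecutor() as pool:
    # map preserves input order, so results line up with their calls.
    results = list(pool.map(execute_tool, calls))
```

Threads are fine for I/O-bound tools (HTTP calls); truly dependent calls (tool B needs tool A's output) must stay sequential, which is the dependency-management pitfall noted later.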
### Tool choice control
You can force the model's behavior:
- `auto` (default): model decides
- `required` (OpenAI) / `any` (Anthropic): model must call a tool
- `none`: model must not call a tool
- Specific tool: model must call THIS tool
Useful for structured extraction: define a single tool whose schema matches your output, force it.
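The request parameter shapes differ slightly per provider. A sketch of the `tool_choice` values as documented at the time of writing:

```python
# OpenAI: force a specific function...
openai_tool_choice = {"type": "function", "function": {"name": "get_weather"}}
# ...or pass "auto" / "required" / "none" as a plain string.

# Anthropic: force a specific tool...
anthropic_tool_choice = {"type": "tool", "name": "get_weather"}
# ...or {"type": "auto"} / {"type": "any"} to let it pick / require a tool.
```

For the structured-extraction trick, you would pass the specific-tool form and read your output straight from the forced call's arguments.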
### Function calling vs MCP
Function calling is the API-level primitive. MCP (Model Context Protocol, Anthropic, Nov 2024) is a protocol for exposing tools.

Without MCP:

```
Your app -- embeds tool logic --> LLM
```

Each framework redefines tools its own way.

With MCP:

```
MCP server -- exposes tools --> MCP client -- converts to function calls --> LLM
```

Any app can use any MCP server.

Under the hood, the LLM still uses function calling. MCP is the transport/discovery/auth layer above it. See [[../06-mcp/README]] and agent protocols.
## Relevance today (2026)

### Function calling is a commodity
Every serious LLM in 2026 supports function calling:
- OpenAI (GPT-4o, o1, o3)
- Anthropic (Claude 3/4/4.5)
- Google (Gemini 2.5)
- Cohere (Command R+)
- Mistral (Large, Medium)
- Open-source: Llama 3.1+, Qwen 2.5+, DeepSeek
Quality varies. OpenAI and Anthropic set the bar; open-source models often need extra prompt engineering.
### Structured outputs blurred the line
Both OpenAI and Anthropic offer structured output modes with grammar-constrained decoding. These use the same JSON schema machinery as tool calls. For pure extraction tasks, structured outputs are cleaner; for actions, tools remain the primitive. See structured outputs.
### MCP changes how tools are shared
Before MCP: each team wrote its tools as Python/TS functions specific to its framework. After MCP: tools become network-accessible services. One MCP server for Gmail, one for GitHub, one for Stripe. Any agent can use any of them.
By late 2026, the MCP marketplace has hundreds of servers. Teams expose their internal tools via MCP for their own agents.
### Multi-turn tool loops are cheap
With prompt caching (see prompt caching), the conversation history cost stays flat across a long agent loop. Tool-heavy agents went from expensive to routine.
### Pitfalls have not disappeared
- The LLM still hallucinates tool arguments
- It still chooses the wrong tool sometimes
- Parallel tool calls need dependency management
- Tool errors must be surfaced, not buried
Function calling is mature, not magical.
## Critical questions
- Why not just parse free-form "call this function" text? (Function calling uses grammar-constrained decoding in many providers; the JSON is guaranteed valid. Free-form parsing is brittle.)
- What if the LLM calls a non-existent tool? (Most providers reject it at generation. Some leak. Your code must handle unknown tool names.)
- Should descriptions be long or short? (Specific, example-rich, 1-3 sentences. Tool name alone is often ambiguous.)
- How do you decide if your app needs tools or structured outputs? (Structured outputs for "extract fields from this text". Tools for "take an action based on the user's intent".)
- Can you combine tool use with reasoning models? (Yes, on OpenAI o3 and Claude Opus 4.5 thinking. The model thinks before calling tools. Slower, more reliable.)
- Why do some providers require you to wrap tool_call IDs in responses? (So the LLM can match results to calls when multiple tools fired in parallel.)
## Production pitfalls
- Too many tools. Accuracy drops past 5-10 tools. Cluster by agent, or use dynamic tool selection (describe top-k relevant tools only).
- Ambiguous descriptions. LLM picks the wrong tool. Disambiguate with examples.
- Missing required arguments. The LLM can still hallucinate missing fields. Validate strictly.
- Tool output too long. A 50KB HTML page as tool output blows the context. Summarize or truncate tool results.
- No retry / timeout. External APIs fail. Wrap with retry + backoff; report failures to the LLM as observations, not exceptions.
- Destructive tools with no confirmation. `delete_user` accessible in a chat agent = disaster. Add a confirm step.
- No tool-level guardrails. Model calls `send_email` with content you never approved. Add middleware that checks every tool call against policy.
- Schema drift between LLM providers. Porting from OpenAI to Anthropic requires schema translation. MCP helps standardize.
- Hallucinated tool names after long conversations. Periodically re-inject the tool list in the system prompt if the conversation is very long.
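Several of these pitfalls (oversized output, destructive tools, surfacing errors as observations) can be handled in one thin middleware layer around tool execution. A minimal sketch with hypothetical policy values:

```python
MAX_RESULT_CHARS = 4000          # truncate oversized tool output
DESTRUCTIVE = {"delete_user"}    # hypothetical tools requiring confirmation

def guarded_execute(name, args, execute, confirmed=False):
    """Wrap a raw `execute(name, args)` callable with policy checks."""
    if name in DESTRUCTIVE and not confirmed:
        return f"Refused: '{name}' requires explicit user confirmation."
    try:
        result = execute(name, args)
    except Exception as exc:
        # Surface the failure as an observation the model can react to,
        # instead of crashing the agent loop.
        return f"Tool '{name}' failed: {exc}"
    if len(result) > MAX_RESULT_CHARS:
        result = result[:MAX_RESULT_CHARS] + "\n[truncated]"
    return result
```

Retry with backoff would wrap the `execute(...)` call inside the `try`; the key design choice is that nothing past this function ever raises into the loop.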
## Alternatives / Comparisons
| Approach | When |
|---|---|
| Prompt-only ("say CALL: search(X)") | Pre-2023 hack. Do not use. |
| Function calling (native API) | Default for 2026 agents |
| Structured outputs | Pure extraction, no action |
| MCP over function calling | Multi-agent, reusable tools |
| Python REPL as a tool | Code-heavy tasks (pandas, math) |
| Direct code generation (level 5) | Advanced agents, more risk |
## Mental parallels (non-AI)
- Remote control for appliances: the LLM is the couch user. The tool schema is the remote's button layout. Pressing a button produces a defined effect. Without the remote (tools), the user can only complain. With it, they can act.
- CLI vs GUI: function calling is the CLI for LLMs. Structured, typed, discoverable. Raw text prompting is the GUI - flexible, less precise.
- REST APIs in a web app: each tool is an endpoint. The schema is the contract. The LLM is the client.
- Delegation: like when you tell a junior engineer "use these 5 tools exactly; any other way, come back to me". Function calling enforces this.
## Mini-lab
`labs/function-calling/` (to create):
- Build a small agent with three tools:
  - `search_web` (Tavily)
  - `read_url` (fetch + strip)
  - `save_note` (append to SQLite)
- Give it a task: "Research the current state of MCP adoption and save a one-page summary."
- Use OpenAI first, then port to Anthropic. Log schema differences.
- Expose the same tools via an MCP server. Rewrite the agent to consume them via MCP. Compare code size and reusability.
Stack: `uv`, `openai`, `anthropic`, `tavily-python`, and the official MCP Python SDK.
## Further reading

### Canonical
- OpenAI function calling guide - https://platform.openai.com/docs/guides/function-calling
- Anthropic tool use guide - https://docs.anthropic.com/claude/docs/tool-use
- Gemini function calling - https://ai.google.dev/gemini-api/docs/function-calling
### Related in this KB
- what is an agent
- agent building blocks
- react pattern
- agent protocols
- json prompting
- structured outputs
- [[../06-mcp/README]]
### Frameworks
- LangChain `@tool` decorator: https://python.langchain.com/docs/how_to/custom_tools/
- LlamaIndex `FunctionTool`: https://docs.llamaindex.ai/en/stable/module_guides/deploying/agents/tools/
- CrewAI `BaseTool`: https://docs.crewai.com/concepts/tools
- PydanticAI `@agent.tool`: https://ai.pydantic.dev/tools/
- OpenAI Agents SDK: https://openai.github.io/openai-agents-python/