LLM Function Calling: Giving Language Models a Way to Act
Function calling (also called tool use) is a capability that lets an LLM output structured commands — like get_weather(location='Cairo') — which your own code then executes, bridging the gap between what the model says and what it can do.
Why This Matters
A raw LLM is a text generator with a hard ceiling: it knows nothing about the world after its training cutoff, it cannot run computations, it cannot press buttons, and it cannot look things up in real time. Ask a stock LLM "What's the weather in Tokyo?" and it either guesses from memory (often wrong) or apologizes and says it can't browse the internet.
Function calling removes that ceiling. It turns the LLM from a talk-only oracle into a reasoning dispatcher that can decide when to call out to external tools, which tool to use, and what arguments to pass — then incorporate the result back into a coherent answer. This is the single most important mechanism behind every modern AI agent: Browser Use, code assistants, automation pipelines, and research agents all depend on it.
Prerequisites
Prerequisites
You should have a basic sense of what a large language model is (a neural network trained to predict and generate text). No deep ML knowledge needed. Also helpful: the idea that software can communicate with other software through structured requests and responses (APIs). Function calling is essentially an LLM speaking the language of APIs.
Core Idea
Imagine you have a very smart assistant who is locked in a soundproof room and can only send you written notes. You can ask them anything, but they can't actually do anything — no phone, no computer, no access to the outside world.
Function calling is like sliding a set of buttons and dials under the door. The assistant can now write back: "Please press the blue button labeled 'weather' and dial in 'Tokyo'." You press it, get the result, and slide it back in. The assistant reads it and gives you a complete, accurate answer.
The LLM does not execute the function. The LLM requests the function call. Your application code is the one that actually runs it.
sequenceDiagram
autonumber
participant User as 👤 User
participant App as 📱 Your Application
participant LLM as 🧠 LLM
participant Tool as ☁️ External Tool/API
rect rgb(212, 168, 83, 0.1)
User->>App: "What's the weather in Tokyo?"
App->>LLM: Prompt + function definitions
end
rect rgb(167, 139, 250, 0.1)
LLM->>App: Call get_weather(location="Tokyo")
App->>Tool: Execute get_weather("Tokyo")
end
rect rgb(52, 211, 153, 0.1)
Tool-->>App: {"temp": 22, "condition": "cloudy"}
App->>LLM: Function result + original prompt
LLM->>App: "It's 22°C and cloudy in Tokyo."
App->>User: "It's 22°C and cloudy in Tokyo."
end
How It Actually Works
The Three-Step Dance
Function calling follows a fixed choreography:
Step 1 — Declare the tools. You provide the LLM with JSON schemas describing each available function: its name, description, and expected parameters. For example:
{
"name": "get_weather",
"description": "Get current temperature and conditions for a city",
"parameters": {
"type": "object",
"properties": {
"location": { "type": "string", "description": "City name" },
"unit": { "type": "string", "enum": ["celsius", "fahrenheit"] }
},
"required": ["location"]
}
}
The LLM does not run this code. It receives the schema as part of its prompt (a system message describing what tools are available).
Step 2 — The LLM decides to call. When the user asks "What's the weather in Tokyo?", the LLM recognizes that a tool matches the intent. Instead of generating a plain text answer, it outputs a structured object — typically JSON — naming the function and its arguments:
{
"function": "get_weather",
"arguments": { "location": "Tokyo", "unit": "celsius" }
}
This is the critical distinction: the LLM doesn't execute anything. It produces a structured request for execution.
Step 3 — Your code executes and returns the result. Your application intercepts this structured output, calls the real API (e.g., a weather service), and sends the result back to the LLM as a follow-up message. The LLM then incorporates the tool's output into a final natural language response.
What Makes This Possible
LLMs are not inherently good at function calling — they have to be fine-tuned for it. The major model providers (OpenAI with GPT-4, Anthropic with Claude, Google with Gemini) all trained their models on datasets that pair natural language prompts with structured function call outputs. The model learns patterns like: when the user asks about real-time data, output a function call instead of a textual guess.
The fine-tuning also teaches the model to handle:
- Ambiguity: "What's the weather?" — the model asks for a location via a parameter.
- Simultaneous calls: Some tasks need multiple functions (search for flights, then check weather at the destination).
- Error handling: If a function returns an error, the model can decide to retry, ask for clarification, or fall back to a different tool.
Worked Example: Browser Use
In Browser Use (covered in another Q4bits article), every time the agent decides to click a button or type into a field, it's using function calling under the hood. The LLM receives a snapshot of the page (a numbered list of interactive elements) and a set of function declarations like:
| Function Name | Description | Example Call |
|---|---|---|
click_element | Click an element by its index | click_element(index=7) |
type_text | Type text into an input field | type_text(index=3, text="hello") |
scroll | Scroll the page | scroll(direction="down") |
extract_content | Get text from the page | extract_content(goal="find prices") |
The LLM looks at the page state, reasons: "I see a search bar at index 3 and I need to search for 'OLED TV'," then outputs:
{ "function": "type_text", "arguments": { "index": 3, "text": "65-inch OLED TV" } }
The Browser Use library catches this call, executes it against the real Chromium browser, takes a new snapshot, and loops back to the LLM for the next decision.
Common Misconceptions
"The LLM 'runs' the function."
It does not. The LLM proposes a function call; your code executes it. The model never sends network requests, opens files, or runs computations on its own. All it does is output structured text that describes what it wants done.
"Function calling is the same as plugins."
Plugins are a product abstraction built on top of function calling. The mechanism is the same: declare tools, let the model choose, execute, return results. Plugins add discoverability (users install them from a store) but the underlying architecture is identical.
"You need a complex framework to use function calling."
The core pattern is just: send a prompt with tool definitions, check if the response contains a function call, execute it, send the result back. This can be done in about 30 lines of Python with the OpenAI or Anthropic SDK. Frameworks like LangChain add orchestration and memory but the fundamental loop is simple.
"Function calling makes LLMs reliable."
It helps enormously, but the model can still hallucinate arguments (request
get_weatherwith a parameter that doesn't exist) or call the wrong function. Your code needs to validate function calls before executing them — never blindly trust the output.
Key Takeaways
- Function calling lets an LLM output structured commands (function name + arguments) instead of just free text.
- The LLM proposes the call; your code executes it and feeds the result back to the model.
- It requires models to be fine-tuned for this capability — not every LLM supports it. The major providers (OpenAI GPT-4, Anthropic Claude, Google Gemini) all do.
- This mechanism is the foundation of AI agents: browser automation (click, type, scroll), coding assistants (read file, run tests), research agents (web search, extract), and virtually every tool-using AI.
- Your application must validate function calls before executing them — the model can still hallucinate incorrect arguments.
References
- Prompt Engineering Guide — "Function Calling with LLMs" — promptingguide.ai
- Google Gemini Docs — "Introduction to Function Calling" — docs.cloud.google.com
- Apideck — "An Introduction to Function Calling and Tool Use" — apideck.com
- OpenAI Docs — "Function Calling" — platform.openai.com
- Anthropic Docs — "Tool Use" — docs.anthropic.com
Related
Agent Loops: Autonomous AI Coding Workflows
An agent loop is a system where an AI coding agent repeatedly works toward a specified goal without waiting for human approval between each step — you define the trigger and the stopping condition, and the agent runs autonomously.
Browser Use: Making Websites Accessible to AI Agents
Browser Use is a family of techniques and tools — led by the open-source Python library browser-use — that lets AI agents control a real web browser the way a human would, all described in natural language.
Headless Browsers: Browsers Without a Window
A headless browser is a full web browser (typically Chromium) that runs without any graphical user interface — no window, no tabs, no address bar — controllable entirely through code.