LLM Function Calling: Giving Language Models a Way to Act

Why This Matters

A raw LLM is a text generator with a hard ceiling: it knows nothing about the world after its training cutoff, it cannot run computations, it cannot press buttons, and it cannot look things up in real time. Ask a stock LLM "What's the weather in Tokyo?" and it either guesses from memory (often wrong) or apologizes and says it can't browse the internet.

Function calling removes that ceiling. It turns the LLM from a talk-only oracle into a reasoning dispatcher that can decide when to call out to external tools, which tool to use, and what arguments to pass — then incorporate the result back into a coherent answer. This is the single most important mechanism behind every modern AI agent: Browser Use, code assistants, automation pipelines, and research agents all depend on it.

Prerequisites

Prerequisites
You should have a basic sense of what a large language model is (a neural network trained to predict and generate text). No deep ML knowledge needed. Also helpful: the idea that software can communicate with other software through structured requests and responses (APIs). Function calling is essentially an LLM speaking the language of APIs.

Core Idea

Imagine you have a very smart assistant who is locked in a soundproof room and can only send you written notes. You can ask them anything, but they can't actually do anything — no phone, no computer, no access to the outside world.

Function calling is like sliding a set of buttons and dials under the door. The assistant can now write back: "Please press the blue button labeled 'weather' and dial in 'Tokyo'." You press it, get the result, and slide it back in. The assistant reads it and gives you a complete, accurate answer.

The LLM does not execute the function. The LLM requests the function call. Your application code is the one that actually runs it.

sequenceDiagram
    autonumber
    participant User as 👤 User
    participant App as 📱 Your Application
    participant LLM as 🧠 LLM
    participant Tool as ☁️ External Tool/API

    rect rgb(212, 168, 83, 0.1)
        User->>App: "What's the weather in Tokyo?"
        App->>LLM: Prompt + function definitions
    end
    rect rgb(167, 139, 250, 0.1)
        LLM->>App: Call get_weather(location="Tokyo")
        App->>Tool: Execute get_weather("Tokyo")
    end
    rect rgb(52, 211, 153, 0.1)
        Tool-->>App: {"temp": 22, "condition": "cloudy"}
        App->>LLM: Function result + original prompt
        LLM->>App: "It's 22°C and cloudy in Tokyo."
        App->>User: "It's 22°C and cloudy in Tokyo."
    end

How It Actually Works

The Three-Step Dance

Function calling follows a fixed choreography:

Step 1 — Declare the tools. You provide the LLM with JSON schemas describing each available function: its name, description, and expected parameters. For example:

{
  "name": "get_weather",
  "description": "Get current temperature and conditions for a city",
  "parameters": {
    "type": "object",
    "properties": {
      "location": { "type": "string", "description": "City name" },
      "unit": { "type": "string", "enum": ["celsius", "fahrenheit"] }
    },
    "required": ["location"]
  }
}

The LLM does not run this code. It receives the schema as part of its prompt (a system message describing what tools are available).

Step 2 — The LLM decides to call. When the user asks "What's the weather in Tokyo?", the LLM recognizes that a tool matches the intent. Instead of generating a plain text answer, it outputs a structured object — typically JSON — naming the function and its arguments:

{
  "function": "get_weather",
  "arguments": { "location": "Tokyo", "unit": "celsius" }
}

This is the critical distinction: the LLM doesn't execute anything. It produces a structured request for execution.

Step 3 — Your code executes and returns the result. Your application intercepts this structured output, calls the real API (e.g., a weather service), and sends the result back to the LLM as a follow-up message. The LLM then incorporates the tool's output into a final natural language response.

What Makes This Possible

LLMs are not inherently good at function calling — they have to be fine-tuned for it. The major model providers (OpenAI with GPT-4, Anthropic with Claude, Google with Gemini) all trained their models on datasets that pair natural language prompts with structured function call outputs. The model learns patterns like: when the user asks about real-time data, output a function call instead of a textual guess.

The fine-tuning also teaches the model to handle:

Ambiguity: "What's the weather?" — the model asks for a location via a parameter.
Simultaneous calls: Some tasks need multiple functions (search for flights, then check weather at the destination).
Error handling: If a function returns an error, the model can decide to retry, ask for clarification, or fall back to a different tool.

Worked Example: Browser Use

In Browser Use (covered in another Q4bits article), every time the agent decides to click a button or type into a field, it's using function calling under the hood. The LLM receives a snapshot of the page (a numbered list of interactive elements) and a set of function declarations like:

Function Name	Description	Example Call
`click_element`	Click an element by its index	`click_element(index=7)`
`type_text`	Type text into an input field	`type_text(index=3, text="hello")`
`scroll`	Scroll the page	`scroll(direction="down")`
`extract_content`	Get text from the page	`extract_content(goal="find prices")`

The LLM looks at the page state, reasons: "I see a search bar at index 3 and I need to search for 'OLED TV'," then outputs:

{ "function": "type_text", "arguments": { "index": 3, "text": "65-inch OLED TV" } }

The Browser Use library catches this call, executes it against the real Chromium browser, takes a new snapshot, and loops back to the LLM for the next decision.

Common Misconceptions

"The LLM 'runs' the function."
It does not. The LLM proposes a function call; your code executes it. The model never sends network requests, opens files, or runs computations on its own. All it does is output structured text that describes what it wants done.

"Function calling is the same as plugins."
Plugins are a product abstraction built on top of function calling. The mechanism is the same: declare tools, let the model choose, execute, return results. Plugins add discoverability (users install them from a store) but the underlying architecture is identical.

"You need a complex framework to use function calling."
The core pattern is just: send a prompt with tool definitions, check if the response contains a function call, execute it, send the result back. This can be done in about 30 lines of Python with the OpenAI or Anthropic SDK. Frameworks like LangChain add orchestration and memory but the fundamental loop is simple.

"Function calling makes LLMs reliable."
It helps enormously, but the model can still hallucinate arguments (request get_weather with a parameter that doesn't exist) or call the wrong function. Your code needs to validate function calls before executing them — never blindly trust the output.

Key Takeaways

Function calling lets an LLM output structured commands (function name + arguments) instead of just free text.
The LLM proposes the call; your code executes it and feeds the result back to the model.
It requires models to be fine-tuned for this capability — not every LLM supports it. The major providers (OpenAI GPT-4, Anthropic Claude, Google Gemini) all do.
This mechanism is the foundation of AI agents: browser automation (click, type, scroll), coding assistants (read file, run tests), research agents (web search, extract), and virtually every tool-using AI.
Your application must validate function calls before executing them — the model can still hallucinate incorrect arguments.

References

Prompt Engineering Guide — "Function Calling with LLMs" — promptingguide.ai
Google Gemini Docs — "Introduction to Function Calling" — docs.cloud.google.com
Apideck — "An Introduction to Function Calling and Tool Use" — apideck.com
OpenAI Docs — "Function Calling" — platform.openai.com
Anthropic Docs — "Tool Use" — docs.anthropic.com

LLM Function Calling: Giving Language Models a Way to Act

Why This Matters

Prerequisites

Core Idea

How It Actually Works

The Three-Step Dance

What Makes This Possible

Worked Example: Browser Use

Common Misconceptions

Key Takeaways

References

Related

Agent Loops: Autonomous AI Coding Workflows

Browser Use: Making Websites Accessible to AI Agents

Headless Browsers: Browsers Without a Window