Mastering AI Agents: Tool Use and Function Calling
1. Concept Introduction
At its core, an AI agent is a system that perceives its environment and takes actions to achieve its goals. Early AI agents were often limited to a predefined set of actions within a simulated world. However, the power of modern agents, especially those built on Large Language Models (LLMs), comes from their ability to use tools.
In simple terms: Imagine you ask a very smart assistant to tell you the weather. The assistant doesn’t inherently know the weather; it’s just a language expert. To answer your question, it needs to look at a weather app. That “weather app” is a tool. Tool use is the ability of an AI agent to select and use external resources—like APIs, databases, or other code functions—to acquire information or perform actions that it cannot do on its own.
Technically speaking: “Function calling” is the mechanism that enables this. An LLM, when prompted with a user request, can determine that it needs to execute a function to fulfill the request. Instead of just generating a text response, the model outputs a structured JSON object containing the name of the function to call and the arguments to pass to it. The agent’s host environment then executes this function, gets a result, and feeds that result back to the LLM to generate the final, informed response. This turns the LLM from a passive text generator into an active reasoner that can orchestrate external capabilities.
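As a rough illustration (the exact schema differs between providers, and the function and argument names here are made up), the structured output for a weather request might be parsed on the host side like this:

```python
import json

# Illustrative shape of a model-emitted tool call (OpenAI-style): the function
# name plus a JSON string of arguments. "get_weather", "city", and "unit" are
# hypothetical names used only for this example.
tool_call = {
    "name": "get_weather",
    "arguments": '{"city": "Boston", "unit": "celsius"}',
}

args = json.loads(tool_call["arguments"])  # -> {"city": "Boston", "unit": "celsius"}
# The host environment would now run something like: result = get_weather(**args)
```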
2. Historical & Theoretical Context
The idea of AI using external tools is not new. It has roots in classical AI and robotics:
- Symbolic AI & Expert Systems: In the 1970s and 80s, expert systems separated a “knowledge base” (the rules) from an “inference engine” (the reasoner). That split between reasoning and external, swappable knowledge foreshadows today’s split between an LLM’s reasoning and the tools it calls.
- Situated Cognition: In robotics, the concept of “situatedness” emphasizes that intelligence arises from the interaction between an agent and its environment. A robot using its sensors and effectors is, in a sense, using tools to perceive and act.
- The General Problem Solver (GPS): Developed by Newell, Shaw, and Simon in the late 1950s, GPS was an early AI program that solved problems by defining goals and a set of operators (actions) to transform the current state into the goal state. These operators are analogous to modern tools.
The recent explosion in this area is due to the remarkable reasoning capabilities of modern LLMs. Models like GPT-4 are trained not just on text, but also on code, which gives them an implicit understanding of function signatures and data structures. This makes them exceptionally good at determining when to call a function and what to pass to it.
3. Algorithms & Flow
The function calling process follows a clear, predictable loop. It’s less of a mathematical algorithm and more of a state-driven control flow.
Pseudocode for a Function-Calling Agent Loop:
```
function handle_user_request(request):
    // 1. Initial call to the LLM with the user request and the available tools
    response = llm.generate(
        prompt = request,
        tools = [get_weather, get_stock_price]
    )

    // 2. Check if the model wants to call a tool
    if response.has_tool_call():
        // 3. Execute the tool call
        tool_name = response.tool_call.name
        tool_args = response.tool_call.arguments

        // Find the actual function (available_tools maps names to functions) and run it
        function_to_run = available_tools[tool_name]
        tool_result = function_to_run(**tool_args)

        // 4. Feed the result back to the LLM
        final_response = llm.generate(
            prompt = request,
            previous_context = [response, tool_result]
        )
        return final_response.text
    else:
        // The model answered directly
        return response.text
```
This loop is the foundation of many agent architectures.
4. Design Patterns & Architectures
Function calling is a key component in several agent design patterns:
- ReAct (Reasoning and Acting): This pattern, introduced by researchers at Princeton and Google Research, has the LLM generate a “thought” (the reasoning step) and then an “action” (often a tool call). The result of the action is observed, and the loop continues. Function calling is the “acting” part of ReAct; a minimal sketch of this loop follows the list below.
- Planner-Executor Loop: In this pattern, a “planner” LLM breaks down a complex goal into a series of steps. An “executor” agent then carries out each step, often using tools. For example, the plan might be: 1. Search for top AI papers in 2023. 2. Summarize the top 3. 3. Write a blog post. The executor would use a search tool, a summarization tool, and a writing tool in sequence.
- Multi-Agent Systems: In systems with multiple agents (like CrewAI or AutoGen), agents often communicate and delegate tasks by calling functions that are exposed by other agents. One agent might have a “web_search” tool, which is actually another agent specialized in browsing the web.
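To make the ReAct pattern concrete, here is a minimal, framework-free sketch of a thought/action/observation loop. The `llm_step` and `run_tool` helpers are hypothetical placeholders supplied by the caller, not part of any specific library:

```python
def react_agent(task, llm_step, run_tool, max_steps=5):
    """Minimal ReAct-style loop (sketch). `llm_step` returns a dict with a
    "thought" and either an "action" (tool name + args) or a "final_answer"."""
    history = [f"Task: {task}"]
    for _ in range(max_steps):
        step = llm_step(history)                  # model produces a thought and an action
        history.append(f"Thought: {step['thought']}")
        if "final_answer" in step:                # the model can answer without a tool
            return step["final_answer"]
        action = step["action"]                   # e.g. {"name": "get_weather", "args": {"city": "Boston"}}
        history.append(f"Action: {action['name']}({action['args']})")
        observation = run_tool(action["name"], action["args"])
        history.append(f"Observation: {observation}")
    return "Stopped: step limit reached without a final answer."
```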
5. Practical Application
Let’s see a simple Python example using the openai library.
```python
import openai
import json

# Assume you have your OpenAI API key set up
client = openai.OpenAI()

# 1. Define the tool (a simple function)
def get_current_weather(location, unit="celsius"):
    """Get the current weather in a given location."""
    weather_info = {
        "location": location,
        "temperature": "22",
        "unit": unit,
        "forecast": ["sunny", "windy"],
    }
    return json.dumps(weather_info)

# 2. Make the first call to the model
messages = [{"role": "user", "content": "What's the weather like in Boston?"}]
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="gpt-4-1106-preview",
    messages=messages,
    tools=tools,
    tool_choice="auto",
)
response_message = response.choices[0].message

# 3. Check if the model decided to call the tool
if response_message.tool_calls:
    # 4. Execute the function
    available_functions = {
        "get_current_weather": get_current_weather,
    }
    function_name = response_message.tool_calls[0].function.name
    function_to_call = available_functions[function_name]
    function_args = json.loads(response_message.tool_calls[0].function.arguments)
    function_response = function_to_call(
        location=function_args.get("location"),
        unit=function_args.get("unit", "celsius"),  # fall back to the default if the model omitted it
    )

    # 5. Send the info back to the model to get a natural language response
    messages.append(response_message)  # extend conversation with assistant's reply
    messages.append(
        {
            "tool_call_id": response_message.tool_calls[0].id,
            "role": "tool",
            "name": function_name,
            "content": function_response,
        }
    )
    second_response = client.chat.completions.create(
        model="gpt-4-1106-preview",
        messages=messages,
    )
    print(second_response.choices[0].message.content)
```
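One practical refinement: newer models can return more than one entry in `tool_calls` in a single turn (see the note on parallel function calling in Section 7). A sketch of how the block above generalizes, reusing the same objects and simply iterating over every call:

```python
# Sketch: handling several tool calls from one assistant turn.
# Assumes `response_message`, `messages`, `available_functions`, and `client`
# are defined as in the example above.
if response_message.tool_calls:
    messages.append(response_message)
    for tool_call in response_message.tool_calls:
        fn = available_functions[tool_call.function.name]
        args = json.loads(tool_call.function.arguments)
        result = fn(**args)
        messages.append(
            {
                "tool_call_id": tool_call.id,  # ties each result back to its request
                "role": "tool",
                "name": tool_call.function.name,
                "content": result,
            }
        )
    second_response = client.chat.completions.create(
        model="gpt-4-1106-preview",
        messages=messages,
    )
```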
Frameworks like LangGraph are built almost entirely around this concept, representing the agent’s flow as a graph where nodes are functions and edges are the conditional logic that decides which tool to call next.
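To illustrate that graph view without depending on any particular framework’s API (this is not LangGraph code, just the underlying idea), here is a toy sketch in which nodes are plain functions over a shared state and a routing function plays the role of the conditional edge:

```python
# Toy graph runner, for intuition only (not LangGraph's API).
# Nodes are functions that take a state dict and return an updated state dict;
# the routing function inspects the state and names the next node (or None to stop).
def run_graph(nodes, route, state, entry="agent"):
    current = entry
    while current is not None:
        state = nodes[current](state)      # execute the current node
        current = route(current, state)    # conditional edge: pick the next node
    return state

def route(node, state):
    # After the agent node, branch to the tool node only if a tool call is pending.
    if node == "agent":
        return "tool" if state.get("pending_tool_call") else None
    return "agent"                         # after running a tool, return to the agent
```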
6. Comparisons & Tradeoffs
| Method | Pros | Cons | Best For |
|---|---|---|---|
| Function Calling | Access to real-time, external data. Can perform actions. High reliability for structured tasks. | Higher latency (multiple LLM calls). Costlier. Requires coding the tools. | Tasks requiring up-to-date info, calculations, or interaction with other systems (e.g., booking a flight). |
| Fine-Tuning | Deeply embeds knowledge. Fast inference. Can change model’s style and tone. | Expensive to train. Can become outdated (static knowledge). Doesn’t enable actions. | Teaching an LLM a new, stable domain of knowledge (e.g., medical terminology, a specific coding style). |
| RAG (Retrieval-Augmented Generation) | Access to external knowledge. Cheaper than fine-tuning. Can be updated easily. | “Dumb” retrieval; no reasoning about the data source. Cannot perform actions. | Question-answering over a large corpus of documents (e.g., a customer support bot for your product docs). |
7. Latest Developments & Research
The field is moving fast:
- Toolformer (Meta AI, 2023): A model that teaches itself to use tools in a self-supervised way: it annotates text with candidate API calls, keeps the calls that actually improve its predictions, and is then fine-tuned on those self-generated examples.
- Gorilla (UC Berkeley, 2023): An LLM that is specifically fine-tuned to be a better tool-caller. It can write API calls for a massive number of APIs (over 1,600) with high accuracy, reducing hallucination.
- AutoGen (Microsoft, 2023): This framework focuses on creating “conversable” agents that can solve tasks together. A key mechanism is one agent calling a “tool” that is actually another agent, enabling complex, multi-step workflows.
- OpenAI’s Parallel Function Calling: A recent update allows some models to call multiple functions in a single turn, which can significantly speed up complex tasks that require gathering information from multiple sources.
An open problem is tool discovery: how can an agent find and learn to use a new tool it has never seen before? Another is robustness: ensuring agents fail gracefully when a tool call doesn’t work as expected.
8. Cross-Disciplinary Insight
The concept of tool use in AI has a fascinating parallel in cognitive science: affordance theory. Proposed by psychologist James J. Gibson, an “affordance” is what the environment offers an individual. A chair affords sitting; a knob affords turning.
When we provide an LLM with a set of tools, we are defining its “digital affordances.” The model’s reasoning process involves perceiving the user’s request and mapping it to the affordances provided by its tools. A well-designed toolset gives the agent the right affordances to effectively solve problems in its environment.
9. Daily Challenge / Thought Exercise
Your 30-Minute Challenge:
Write a simple Python agent that has two tools:
- get_current_time(timezone): Returns the current time in a specified timezone.
- perform_calculation(expression): Takes a string like “5*8” and returns the result.
Your agent should be able to answer questions like:
- “What time is it in New York?”
- “What is 1024 divided by 64?”
Use the OpenAI function calling API or a similar library. Focus on the logic of selecting the right tool based on the user’s prompt.
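If you want a head start on the tools themselves (the agent loop and tool selection are still up to you), here is one possible pair of implementations; the timezone handling and the character whitelist are just illustrative choices:

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # standard library, Python 3.9+

def get_current_time(timezone="UTC"):
    """Return the current time in the given IANA timezone, e.g. 'America/New_York'."""
    return datetime.now(ZoneInfo(timezone)).strftime("%Y-%m-%d %H:%M:%S %Z")

def perform_calculation(expression):
    """Evaluate a simple arithmetic expression like '5*8'.
    eval() is used here only for brevity; restrict or sandbox it in real code."""
    allowed = set("0123456789+-*/(). ")
    if not set(expression) <= allowed:
        raise ValueError("Unsupported characters in expression")
    return str(eval(expression))
```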
10. References & Further Reading
- Paper: Toolformer: Language Models That Teach Themselves to Use Tools
- Paper: Gorilla: Large Language Model Connected with Massive APIs
- Blog Post: OpenAI Function Calling Documentation
- GitHub Repo: LangGraph - A library for building stateful, multi-agent applications with LLMs.
- GitHub Repo: Microsoft AutoGen