What is the cheapest LLM for AI agents?

Claude Haiku (~$0.25/million input tokens) and GPT-4o Mini are the most cost-efficient. Haiku can run thousands of agent loops per dollar.

Why does my agent call the same tool repeatedly?

Usually caused by a weak model or too-broad task. Fix by being more specific in your system prompt about when to stop searching.

How to Build an AI Agent from Scratch in Python (2026)

Quick Answer: To build an AI agent in Python: (1) install LangGraph + an LLM SDK, (2) define tools as Python functions with clear docstrings, (3) bind tools to your LLM with bind_tools(), (4) create the ReAct loop with create_react_agent(), and (5) call agent.invoke(). Full working code below — takes about 15 minutes.

Building an AI agent is simpler than most tutorials make it look. You need an LLM for reasoning, tools for the agent to call, and a loop connecting them. This tutorial builds a real research agent — from zero to a fully functional agent that searches the web, extracts information, and saves a report.

Prerequisites: Python 3.10+, basic Python knowledge, Anthropic API key (free tier works), Tavily API key (free at tavily.com).

Table of Contents

What You’ll Build

A research agent that: (1) takes a topic as input, (2) searches the web for current information, (3) extracts key facts from results, (4) writes a structured, cited summary, and (5) saves the output to a file. A real agent — not a toy example.

Step 1 — Install Dependencies

pip install langgraph langchain-anthropic langchain-community tavily-python

Set your API keys as environment variables:

export ANTHROPIC_API_KEY="sk-ant-..."
export TAVILY_API_KEY="tvly-..."

Step 2 — Define Your Tools

Tools are Python functions the agent calls. The LLM reads each function’s docstring to understand what it does and when to use it. Clear docstrings = correct tool selection.

Python AI agent tools — web search, file save, and API call functions with decorators — AI agent tools are Python functions — the LLM reads the docstring to decide when and how to call each tool

from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_core.tools import tool
from datetime import datetime
import os

# Tool 1: Web search
search = TavilySearchResults(max_results=5, search_depth="advanced")

# Tool 2: Get current date
@tool
def get_current_date() -> str:
    """Returns today's date. Use when you need to know the current date."""
    return datetime.now().strftime("%B %d, %Y")

# Tool 3: Save output to file
@tool
def save_to_file(filename: str, content: str) -> str:
    """
    Save text content to a file. Use to preserve research output.
    Args:
        filename: Name of the file (e.g., 'report.md')
        content: The text content to save
    Returns: Confirmation with file path
    """
    os.makedirs("./output", exist_ok=True)
    with open(f"./output/{filename}", "w") as f:
        f.write(content)
    return f"Saved to ./output/{filename}"

tools = [search, get_current_date, save_to_file]

Step 3 — Set Up Your LLM

from langchain_anthropic import ChatAnthropic

llm = ChatAnthropic(model="claude-sonnet-4-6", max_tokens=4096)

# bind_tools tells Claude what tools exist and how to call them
llm_with_tools = llm.bind_tools(tools)

bind_tools() is the critical step. Without it, the LLM only generates text. With it, Claude outputs structured “tool calls” that your code intercepts and executes.

Step 4 — Build the Agent Loop (ReAct Pattern)

The create_react_agent function implements the full Reasoning + Acting loop — it calls the LLM, parses tool calls, executes tools, feeds results back, and repeats until the task is complete.

LangGraph create_react_agent — complete ReAct agent loop Python implementation — create_react_agent handles the full loop: LLM call → tool detection → tool execution → feed results back → repeat

from langgraph.prebuilt import create_react_agent
from langchain_core.messages import SystemMessage

system_prompt = """You are a research assistant. For any topic:
1. Get today's date first
2. Search at least 3 different sources
3. Extract key facts and data points
4. Write a structured report with citations
5. Save the report as '{topic}-research.md'
Be thorough and cite specific sources."""

agent = create_react_agent(
    llm_with_tools,
    tools,
    state_modifier=SystemMessage(content=system_prompt)
)

Step 5 — Run Your Agent

from langchain_core.messages import HumanMessage

result = agent.invoke({
    "messages": [HumanMessage(content=
        "Research the top AI agent frameworks in 2026 and summarize each."
    )]
})

print(result["messages"][-1].content)

Step 6 — Add Streaming (See Agent Thinking Live)

for event in agent.stream(
    {"messages": [HumanMessage("Research LangGraph 2026")]},
    stream_mode="values"
):
    last_msg = event["messages"][-1]
    if hasattr(last_msg, "tool_calls") and last_msg.tool_calls:
        for tc in last_msg.tool_calls:
            print(f"Calling tool: {tc['name']} | Input: {tc['args']}")
    elif hasattr(last_msg, "name"):
        print(f"Tool result: {str(last_msg.content)[:150]}...")
    else:
        print(f"Agent: {last_msg.content[:200]}...")

Step 7 — Add Persistent Memory

from langgraph.checkpoint.sqlite import SqliteSaver

memory = SqliteSaver.from_conn_string("./agent_memory.db")

agent_with_memory = create_react_agent(
    llm_with_tools, tools,
    state_modifier=SystemMessage(content=system_prompt),
    checkpointer=memory
)

config = {"configurable": {"thread_id": "session_001"}}

# First run — researches the topic
agent_with_memory.invoke({"messages": [HumanMessage("Research CrewAI")]}, config)

# Second run — agent remembers prior session
agent_with_memory.invoke({"messages": [HumanMessage("Summarize what we found")]}, config)

Complete Working Agent — Copy-Paste Ready

import os
from datetime import datetime
from langchain_anthropic import ChatAnthropic
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_core.tools import tool
from langchain_core.messages import HumanMessage, SystemMessage
from langgraph.prebuilt import create_react_agent
from langgraph.checkpoint.sqlite import SqliteSaver

os.environ["ANTHROPIC_API_KEY"] = "your-key"
os.environ["TAVILY_API_KEY"] = "your-key"

search = TavilySearchResults(max_results=5, search_depth="advanced")

@tool
def get_current_date() -> str:
    """Returns today's date."""
    return datetime.now().strftime("%B %d, %Y")

@tool
def save_to_file(filename: str, content: str) -> str:
    """Save text to a file. Args: filename, content."""
    os.makedirs("./output", exist_ok=True)
    with open(f"./output/{filename}", "w") as f:
        f.write(content)
    return f"Saved: ./output/{filename}"

tools = [search, get_current_date, save_to_file]
llm = ChatAnthropic(model="claude-sonnet-4-6", max_tokens=4096)
llm_with_tools = llm.bind_tools(tools)

SYSTEM = """Research assistant. For any topic:
1. Check today's date  2. Search 3+ sources
3. Extract facts + data points  4. Write structured report
5. Save as topic-research.md"""

memory = SqliteSaver.from_conn_string("./memory.db")
agent = create_react_agent(llm_with_tools, tools,
    state_modifier=SystemMessage(content=SYSTEM), checkpointer=memory)

def research(topic: str, session: str = "default") -> str:
    config = {"configurable": {"thread_id": session}}
    result = agent.invoke({"messages": [HumanMessage(f"Research: {topic}")]}, config)
    return result["messages"][-1].content

if __name__ == "__main__":
    topic = input("Research topic: ")
    print(research(topic, session=f"sess_{topic[:20]}"))

Common Mistakes to Avoid

Vague tool docstrings: The LLM decides which tool to call based on the docstring. “Search the internet.” fails. “Search the web for current facts, news, and statistics. Returns top 5 results with URLs.” works.
No iteration limits: Always set max_iterations=15 — without limits, agents can loop forever and burn API credits.
Too many tools: 3–7 focused tools outperform 20 vague ones. Each tool is a decision the LLM must make correctly.
Not handling tool errors: Always return a string from tools, even on failure. Raising exceptions breaks the agent loop.
Using weak models: Never use small LLMs for complex agentic workflows. Claude Sonnet is the minimum for production research agents.

Frequently Asked Questions

How long does it take to build a working AI agent?

A basic working agent takes 15–30 minutes following this tutorial. A production-ready agent with error handling, memory, and proper testing takes 1–2 weeks.

Do I need LangGraph to build an AI agent in Python?

No. You can build agents with CrewAI, AutoGen, OpenAI Agents SDK, or even pure Python. LangGraph gives maximum control but isn’t required for simpler agents.

What is the cheapest LLM for running AI agents?

Claude Haiku (~$0.25/million input tokens) and GPT-4o Mini are the most cost-efficient capable models. For very simple agents, Haiku can run thousands of loops per dollar.

Why does my agent keep calling the same tool repeatedly?

Usually caused by a weak model, too-broad task definition, or a tool returning ambiguous results. Fix: be more specific in your system prompt about when the agent should stop searching.

Can AI agents run without internet access?

Yes. Use local LLMs (Ollama + Llama 3) and local tools (filesystem, local database). A fully local agent has no API costs but requires significant compute — GPU recommended for larger models.

Continue the AI Agents series:
What Are AI Agents? Complete Guide →
Best AI Agent Frameworks 2026 →
Claude MCP Tutorial →