The Anatomy of an Agent
Modern AI agents often appear mysterious. You give them a goal such as: “Research the impact of quantum computing on cryptography and produce a report.”
And somehow the system:
- searches the web
- reads documents
- calls tools
- writes code
- synthesizes information
But internally, the architecture of an agent is surprisingly structured.
A useful mental model is this:
| Component | Analogy |
|---|---|
| LLM | CPU |
| Agent Runtime | Operating System |
| Tools | System Calls |
| Memory | RAM + Storage |
| Planner | Scheduler |
| Environment | External World |
Understanding this architecture is essential if you want to:
- build your own agents
- debug agent failures
- optimize performance
- design safe autonomous systems
The Core Insight
The most important conceptual leap is this:

Agent = LLM + Runtime + Tools + Memory + Environment

In other words: the LLM provides the reasoning ability, and the runtime provides control and execution.
The LLM as a CPU
Large Language Models function as the reasoning processor of an agent.
Just as a CPU executes machine instructions, an LLM executes reasoning instructions encoded in text.
What the LLM Actually Does
At every step of an agent loop, the LLM performs several tasks:
- Interpret the current state
- Reason about the goal
- Choose the next action
- Produce structured output
The output may include:
- a plan
- a tool call
- a final answer
- a request for more information
For example, the LLM might produce structured output like:
```json
{
  "thought": "I should search for recent research papers",
  "action": "web_search",
  "arguments": { "query": "post quantum cryptography NIST progress" }
}
```

This output becomes an instruction for the runtime.
The Limits of the LLM
Despite their intelligence, LLMs have important limitations.
They cannot:
- access the internet
- run code
- query databases
- store persistent memory
- control external systems
They only generate tokens.
Therefore something else must execute the real work.
That component is the agent runtime.
The Agent Runtime as an Operating System
If the LLM is the CPU, then the agent runtime is the operating system.
It orchestrates everything.
Responsibilities of the runtime include:
- managing the agent loop
- executing tools
- maintaining state
- enforcing guardrails
- storing memory
- preventing infinite loops
A typical runtime loop looks like this:
```python
while not done:
    observation = environment.get_state()
    reasoning = LLM(prompt + observation)
    action = parse(reasoning)
    result = execute(action)
    environment.update(result)
```

This simple loop is the heartbeat of every autonomous agent.
A Minimal Agent Loop
Let us examine a minimal example.
Code Examples
Python:

```python
from typing import Any, Callable

from ollama import chat
from pydantic import BaseModel

MODEL = "qwen3.5:9b"
MAX_STEPS = 6


class Action(BaseModel):
    type: str  # "tool" | "final"
    name: str | None = None
    args: dict[str, Any] = {}
    answer: str | None = None


class Agent:
    def __init__(self, tools: dict[str, Callable[..., str]]):
        self.tools = tools
        self.history: list[dict[str, str]] = []

    def build_messages(self, observation: str) -> list[dict[str, str]]:
        system_prompt = """You are an agent runtime planner.
Return ONLY JSON with schema:
{
  "type": "tool" | "final",
  "name": string | null,
  "args": object,
  "answer": string | null
}
Use "tool" when you need external data. Use "final" only when done."""
        return [
            {"role": "system", "content": system_prompt},
            *self.history,
            {"role": "user", "content": observation},
        ]

    def step(self, observation: str) -> tuple[str, bool]:
        response = chat(
            model=MODEL,
            messages=self.build_messages(observation),
            options={"temperature": 0.2},
        )
        raw = response.message.content
        action = Action.model_validate_json(raw)
        self.history.append({"role": "assistant", "content": raw})

        if action.type == "final":
            return action.answer or "No answer provided.", True

        if action.type == "tool":
            if not action.name or action.name not in self.tools:
                tool_result = f"Tool '{action.name}' not available."
            else:
                try:
                    tool_result = self.tools[action.name](**action.args)
                except Exception as exc:
                    tool_result = f"Tool error: {exc}"
            self.history.append({"role": "tool", "content": tool_result})
            return tool_result, False

        return f"Unknown action type: {action.type}", False

    def run(self, goal: str) -> str:
        observation = goal
        for step_no in range(1, MAX_STEPS + 1):
            observation, done = self.step(observation)
            print(f"[step {step_no}/{MAX_STEPS}] completed")
            if done:
                return observation
        return "Stopped: reached max iterations without final answer."


def web_search(query: str) -> str:
    return f"[mock] top search results for: {query}"


agent = Agent(tools={"web_search": web_search})
final_answer = agent.run("Research the current state of MCP and summarize key updates.")
print(final_answer)
```

Rust:

```rust
use std::collections::HashMap;

use ollama_rs::{
    generation::chat::{request::ChatMessageRequest, ChatMessage},
    Ollama,
};
use serde::{Deserialize, Serialize};
use serde_json::Value;

const MODEL: &str = "qwen3.5:9b";
const MAX_STEPS: usize = 6;

#[derive(Debug, Deserialize, Serialize)]
struct Action {
    #[serde(rename = "type")]
    action_type: String, // "tool" | "final"
    name: Option<String>,
    #[serde(default)]
    args: Value,
    answer: Option<String>,
}

type ToolFn = fn(Value) -> Result<String, String>;

struct Agent {
    ollama: Ollama,
    tools: HashMap<String, ToolFn>,
    history: Vec<ChatMessage>,
}

impl Agent {
    fn new(tools: HashMap<String, ToolFn>) -> Self {
        Self { ollama: Ollama::default(), tools, history: vec![] }
    }

    async fn step(&mut self, observation: String) -> Result<(String, bool), Box<dyn std::error::Error>> {
        let system = ChatMessage::system(
            "Return ONLY JSON: {\"type\":\"tool|final\",\"name\":string|null,\"args\":object,\"answer\":string|null}".to_string(),
        );
        let mut messages = vec![system];
        messages.extend(self.history.clone());
        messages.push(ChatMessage::user(observation));

        let request = ChatMessageRequest::new(MODEL.to_string(), messages);
        let response = self.ollama.send_chat_messages(request).await?;
        let raw = response.message.content;
        let action: Action = serde_json::from_str(&raw)?;
        self.history.push(ChatMessage::assistant(raw.clone()));

        if action.action_type == "final" {
            return Ok((action.answer.unwrap_or_else(|| "No answer provided.".to_string()), true));
        }

        if action.action_type == "tool" {
            let tool_name = action.name.unwrap_or_default();
            let tool_result = if let Some(tool) = self.tools.get(&tool_name) {
                tool(action.args).unwrap_or_else(|e| format!("Tool error: {e}"))
            } else {
                format!("Tool '{tool_name}' not available.")
            };
            self.history.push(ChatMessage::user(format!("Tool result: {tool_result}")));
            return Ok((tool_result, false));
        }

        Ok((format!("Unknown action type: {}", action.action_type), false))
    }

    async fn run(&mut self, goal: &str) -> Result<String, Box<dyn std::error::Error>> {
        let mut observation = goal.to_string();
        for step_no in 1..=MAX_STEPS {
            let (next_observation, done) = self.step(observation).await?;
            observation = next_observation;
            println!("[step {step_no}/{MAX_STEPS}] completed");
            if done {
                return Ok(observation);
            }
        }
        Ok("Stopped: reached max iterations without final answer.".to_string())
    }
}

fn web_search(args: Value) -> Result<String, String> {
    let query = args.get("query").and_then(Value::as_str).unwrap_or("unknown query");
    Ok(format!("[mock] top search results for: {query}"))
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut tools: HashMap<String, ToolFn> = HashMap::new();
    tools.insert("web_search".to_string(), web_search);

    let mut agent = Agent::new(tools);
    let final_answer = agent
        .run("Research the current state of MCP and summarize key updates.")
        .await?;
    println!("{final_answer}");
    Ok(())
}
```

Even this tiny program already contains the essential ingredients of an agent.
Agent Architecture Overview
Modern agent systems typically include several layers.
```
          +--------------------+
          |     User Goal      |
          +---------+----------+
                    |
                    v
         +----------------------+
         |    Agent Runtime     |
         |    (Control Loop)    |
         +----------+-----------+
                    |
   +----------------+----------------+
   |                                 |
   v                                 v
+---------------+          +----------------+
|     LLM       |          |     Tools      |
|   Reasoning   |          |  APIs / Code   |
+-------+-------+          +-------+--------+
        |                          |
        v                          v
+---------------+          +----------------+
|  Working Mem  |          |  Environment   |
|  Scratchpad   |          |  External Data |
+---------------+          +----------------+
```

In the next few articles we will examine each component in detail.
Responsibilities of the Agent Runtime
A production agent runtime performs many complex tasks.
1 — State Management
Agents maintain internal state such as:
- conversation history
- memory retrieval results
- task progress
- intermediate reasoning
Without state, agents cannot perform multi-step tasks.
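The state items listed above can be sketched as a single container that the runtime threads through the loop. This is an illustrative sketch, not the article's runtime: the `AgentState` class and its field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AgentState:
    """Everything the runtime must carry between steps (hypothetical shape)."""
    goal: str
    history: list[dict[str, str]] = field(default_factory=list)  # conversation history
    retrieved: list[str] = field(default_factory=list)           # memory retrieval results
    step_count: int = 0                                          # task progress
    scratchpad: list[str] = field(default_factory=list)          # intermediate reasoning


state = AgentState(goal="Summarize recent MCP updates")
state.history.append({"role": "user", "content": state.goal})
state.step_count += 1
```

Because the loop mutates this one object, every multi-step decision can see what came before it.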
2 — Tool Execution
The runtime executes external capabilities such as:
- web search
- code execution
- database queries
- filesystem operations
These are similar to system calls in operating systems.
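The system-call analogy can be made concrete with a tool registry: the runtime dispatches tool calls by name, much as an OS dispatches system calls by number. The registry below is a minimal sketch; the `register_tool` decorator and `execute` helper are hypothetical names, not part of any framework.

```python
from typing import Callable

# Hypothetical registry mapping tool names to callables.
TOOLS: dict[str, Callable[..., str]] = {}


def register_tool(name: str):
    """Decorator that adds a function to the registry under `name`."""
    def decorator(fn: Callable[..., str]) -> Callable[..., str]:
        TOOLS[name] = fn
        return fn
    return decorator


@register_tool("web_search")
def web_search(query: str) -> str:
    return f"[mock] results for: {query}"


def execute(name: str, **args) -> str:
    """Dispatch a tool call, returning errors as strings the LLM can read."""
    if name not in TOOLS:
        return f"Tool '{name}' not available."
    try:
        return TOOLS[name](**args)
    except Exception as exc:
        return f"Tool error: {exc}"


print(execute("web_search", query="post quantum cryptography"))
```

Returning failures as plain strings, rather than raising, lets the LLM observe the error and try a different action on the next iteration.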
3 — Context Construction
The runtime decides what information the LLM receives.
This may include:
- conversation history
- retrieved documents
- intermediate reasoning
- tool results
Because context windows are limited, this step is critical.
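One common strategy is to keep the system prompt fixed and fill the remaining budget with the most recent messages. The sketch below uses a character budget for simplicity; real runtimes count tokens, and the `build_context` function name is an assumption for illustration.

```python
def build_context(system_prompt: str,
                  history: list[dict[str, str]],
                  budget_chars: int = 4000) -> list[dict[str, str]]:
    """Keep the system prompt plus the newest messages that fit the budget."""
    kept: list[dict[str, str]] = []
    used = len(system_prompt)
    for message in reversed(history):  # newest messages first: they matter most
        used += len(message["content"])
        if used > budget_chars:
            break
        kept.append(message)
    kept.reverse()  # restore chronological order for the model
    return [{"role": "system", "content": system_prompt}, *kept]
```

More sophisticated runtimes summarize or embed the dropped messages instead of discarding them outright, but the budgeting logic looks much the same.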
4 — Safety and Guardrails
The runtime enforces constraints such as:
- tool permission systems
- rate limits
- maximum iteration counts
- safe code execution
Without guardrails, autonomous agents can become dangerous or unstable.
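Three of the constraints above fit naturally into one small object the runtime consults before acting: a tool allowlist, an iteration cap, and a rate limit. This is an illustrative sketch with hypothetical names, not a production safety system.

```python
import time


class Guardrails:
    """Minimal runtime guardrails: tool permissions, step cap, rate limit."""

    def __init__(self, allowed_tools: set[str], max_steps: int, min_interval_s: float):
        self.allowed_tools = allowed_tools
        self.max_steps = max_steps
        self.min_interval_s = min_interval_s
        self.steps = 0
        self.last_call = 0.0

    def check_tool(self, name: str) -> None:
        """Tool permission system: refuse anything outside the allowlist."""
        if name not in self.allowed_tools:
            raise PermissionError(f"Tool '{name}' is not permitted.")

    def check_step(self) -> None:
        """Maximum iteration count: abort runaway loops."""
        self.steps += 1
        if self.steps > self.max_steps:
            raise RuntimeError("Maximum iteration count exceeded.")

    def throttle(self) -> None:
        """Rate limit: sleep until the minimum interval has passed."""
        elapsed = time.monotonic() - self.last_call
        if elapsed < self.min_interval_s:
            time.sleep(self.min_interval_s - elapsed)
        self.last_call = time.monotonic()
```

The runtime would call `check_step()` at the top of each loop iteration and `check_tool()` plus `throttle()` before every tool execution.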
The Agent Execution Cycle
Putting everything together, the lifecycle of an agent looks like this:
```
User Goal
    ↓
Perception
    ↓
Reasoning (LLM)
    ↓
Planning
    ↓
Tool Execution
    ↓
Observation
    ↓
Reflection
    ↓
Termination or Next Step
```

This loop is often called the Agent Control Loop.

We introduced the conceptual version in Module 1:

observe → reason → plan → act → reflect

Now we see how it is implemented inside a runtime.
Why This Architecture Matters
Understanding the internal architecture of agents has several practical benefits.
1 — Debugging Agents
Most agent failures occur in the runtime layer, not the LLM.
Examples include:
- incorrect tool execution
- missing context
- bad memory retrieval
- infinite loops
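Several of these failures leave a detectable signature in the runtime's action log; for instance, an agent stuck in a loop tends to repeat the exact same tool call. A minimal detector, with a hypothetical `is_looping` helper, might look like this:

```python
def is_looping(action_log: list[tuple[str, str]], window: int = 3) -> bool:
    """Return True if the last `window` (tool, args) pairs are all identical."""
    if len(action_log) < window:
        return False
    return len(set(action_log[-window:])) == 1


# The same search issued three times in a row signals a stuck agent.
log = [("web_search", '{"query": "MCP"}')] * 3
print(is_looping(log))
```

A runtime can use such a check to break out of the loop, inject a hint into the context, or escalate to a human.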
2 — Performance Optimization
Agent performance depends heavily on:
- prompt construction
- context compression
- tool latency
- reasoning depth
Optimizing the runtime can dramatically improve speed and cost.
3 — Building Custom Agents
Many frameworks exist:
- LangChain
- LangGraph
- CrewAI
- AutoGen
But all of them implement the same underlying architecture.
Once you understand the internals, you can:
- build your own agent frameworks
- extend existing ones
- debug complex systems
Looking Ahead
In this article we introduced the two central components of an agent:
- the LLM (reasoning processor)
- the runtime (control system)
In the upcoming articles we will explore the remaining components of agent architecture.
Next we will examine the Perception Layer, which transforms raw inputs such as:
- documents
- images
- screenshots
- audio
into machine-readable representations that agents can reason about.
This is where embeddings, parsing pipelines, and multimodal models enter the architecture.
→ Continue to 2.2 — The Perception Layer