The Anatomy of an Agent
Modern AI agents often appear mysterious. You give them a goal such as: “Research the impact of quantum computing on cryptography and produce a report.”
And somehow the system:
- searches the web
- reads documents
- calls tools
- writes code
- synthesizes information
But internally, the architecture of an agent is surprisingly structured.
A useful mental model is this:
| Component | Analogy |
|---|---|
| LLM | CPU |
| Agent Runtime | Operating System |
| Tools | System Calls |
| Memory | RAM + Storage |
| Planner | Scheduler |
| Environment | External World |
Understanding this architecture is essential if you want to:
- build your own agents
- debug agent failures
- optimize performance
- design safe autonomous systems
The Core Insight
The most important conceptual leap is this:

Agent = LLM + Runtime + Tools + Memory + Environment

In other words: the LLM provides the reasoning ability, and the runtime provides control and execution.
The LLM as a CPU
Large Language Models function as the reasoning processor of an agent.
Just as a CPU executes machine instructions, an LLM executes reasoning instructions encoded in text.
What the LLM Actually Does
At every step of an agent loop, the LLM performs several tasks:
- Interpret the current state
- Reason about the goal
- Choose the next action
- Produce structured output
The output may include:
- a plan
- a tool call
- a final answer
- a request for more information
For example, the LLM might produce structured output like:
```json
{
  "thought": "I should search for recent research papers",
  "action": "web_search",
  "arguments": { "query": "post quantum cryptography NIST progress" }
}
```

This output becomes an instruction for the runtime.
The Limits of the LLM
Despite their intelligence, LLMs have important limitations.
They cannot:
- access the internet
- run code
- query databases
- store persistent memory
- control external systems
They only generate tokens.
Therefore something else must execute the real work.
That component is the agent runtime.
The Agent Runtime as an Operating System
If the LLM is the CPU, then the agent runtime is the operating system.
It orchestrates everything.
Responsibilities of the runtime include:
- managing the agent loop
- executing tools
- maintaining state
- enforcing guardrails
- storing memory
- preventing infinite loops
A typical runtime loop looks like this:
```python
while not done:
    observation = environment.get_state()
    reasoning = LLM(prompt + observation)
    action = parse(reasoning)
    result = execute(action)
    environment.update(result)
```

This simple loop is the heartbeat of every autonomous agent.
A Minimal Agent Loop
Let us examine a minimal example.
Code Examples
Python:

```python
from typing import Any, Callable

from ollama import chat
from pydantic import BaseModel

MODEL = "qwen3.5:9b"
MAX_STEPS = 6


class Action(BaseModel):
    type: str  # "tool" | "final"
    name: str | None = None
    args: dict[str, Any] = {}
    answer: str | None = None


class Agent:
    def __init__(self, tools: dict[str, Callable[..., str]]):
        self.tools = tools
        self.history: list[dict[str, str]] = []

    def build_messages(self, observation: str) -> list[dict[str, str]]:
        system_prompt = """You are an agent runtime planner.
Return ONLY JSON with schema:
{
  "type": "tool" | "final",
  "name": string | null,
  "args": object,
  "answer": string | null
}
Use "tool" when you need external data. Use "final" only when done."""
        return [
            {"role": "system", "content": system_prompt},
            *self.history,
            {"role": "user", "content": observation},
        ]

    def step(self, observation: str) -> tuple[str, bool]:
        response = chat(
            model=MODEL,
            messages=self.build_messages(observation),
            options={"temperature": 0.2},
        )
        raw = response.message.content
        action = Action.model_validate_json(raw)
        self.history.append({"role": "assistant", "content": raw})

        if action.type == "final":
            return action.answer or "No answer provided.", True

        if action.type == "tool":
            if not action.name or action.name not in self.tools:
                tool_result = f"Tool '{action.name}' not available."
            else:
                try:
                    tool_result = self.tools[action.name](**action.args)
                except Exception as exc:
                    tool_result = f"Tool error: {exc}"
            self.history.append({"role": "tool", "content": tool_result})
            return tool_result, False

        return f"Unknown action type: {action.type}", False

    def run(self, goal: str) -> str:
        observation = goal
        for step_no in range(1, MAX_STEPS + 1):
            observation, done = self.step(observation)
            print(f"[step {step_no}/{MAX_STEPS}] completed")
            if done:
                return observation
        return "Stopped: reached max iterations without final answer."


def web_search(query: str) -> str:
    return f"[mock] top search results for: {query}"


agent = Agent(tools={"web_search": web_search})
final_answer = agent.run("Research the current state of MCP and summarize key updates.")
print(final_answer)
```

Rust:

```rust
use std::collections::HashMap;

use ollama_rs::{
    generation::chat::{request::ChatMessageRequest, ChatMessage},
    Ollama,
};
use serde::{Deserialize, Serialize};
use serde_json::Value;

const MODEL: &str = "qwen3.5:9b";
const MAX_STEPS: usize = 6;

#[derive(Debug, Deserialize, Serialize)]
struct Action {
    #[serde(rename = "type")]
    action_type: String, // "tool" | "final"
    name: Option<String>,
    #[serde(default)]
    args: Value,
    answer: Option<String>,
}

type ToolFn = fn(Value) -> Result<String, String>;

struct Agent {
    ollama: Ollama,
    tools: HashMap<String, ToolFn>,
    history: Vec<ChatMessage>,
}

impl Agent {
    fn new(tools: HashMap<String, ToolFn>) -> Self {
        Self { ollama: Ollama::default(), tools, history: vec![] }
    }

    async fn step(&mut self, observation: String) -> Result<(String, bool), Box<dyn std::error::Error>> {
        let system = ChatMessage::system(
            "Return ONLY JSON: {\"type\":\"tool|final\",\"name\":string|null,\"args\":object,\"answer\":string|null}".to_string(),
        );
        let mut messages = vec![system];
        messages.extend(self.history.clone());
        messages.push(ChatMessage::user(observation));

        let request = ChatMessageRequest::new(MODEL.to_string(), messages);
        let response = self.ollama.send_chat_messages(request).await?;
        let raw = response.message.content;
        let action: Action = serde_json::from_str(&raw)?;
        self.history.push(ChatMessage::assistant(raw.clone()));

        if action.action_type == "final" {
            return Ok((action.answer.unwrap_or_else(|| "No answer provided.".to_string()), true));
        }

        if action.action_type == "tool" {
            let tool_name = action.name.unwrap_or_default();
            let tool_result = if let Some(tool) = self.tools.get(&tool_name) {
                tool(action.args).unwrap_or_else(|e| format!("Tool error: {e}"))
            } else {
                format!("Tool '{tool_name}' not available.")
            };
            self.history.push(ChatMessage::user(format!("Tool result: {tool_result}")));
            return Ok((tool_result, false));
        }

        Ok((format!("Unknown action type: {}", action.action_type), false))
    }

    async fn run(&mut self, goal: &str) -> Result<String, Box<dyn std::error::Error>> {
        let mut observation = goal.to_string();
        for step_no in 1..=MAX_STEPS {
            let (next_observation, done) = self.step(observation).await?;
            observation = next_observation;
            println!("[step {step_no}/{MAX_STEPS}] completed");
            if done {
                return Ok(observation);
            }
        }
        Ok("Stopped: reached max iterations without final answer.".to_string())
    }
}

fn web_search(args: Value) -> Result<String, String> {
    let query = args.get("query").and_then(Value::as_str).unwrap_or("unknown query");
    Ok(format!("[mock] top search results for: {query}"))
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut tools: HashMap<String, ToolFn> = HashMap::new();
    tools.insert("web_search".to_string(), web_search);

    let mut agent = Agent::new(tools);
    let final_answer = agent
        .run("Research the current state of MCP and summarize key updates.")
        .await?;
    println!("{final_answer}");
    Ok(())
}
```

Even this tiny program already contains the essential ingredients of an agent.
Agent Architecture Overview
Modern agent systems typically include several layers.
```
          +--------------------+
          |     User Goal      |
          +---------+----------+
                    |
                    v
         +----------------------+
         |    Agent Runtime     |
         |    (Control Loop)    |
         +----------+-----------+
                    |
   +----------------+----------------+
   |                                 |
   v                                 v
+---------------+          +----------------+
|     LLM       |          |     Tools      |
|   Reasoning   |          |  APIs / Code   |
+-------+-------+          +-------+--------+
        |                          |
        v                          v
+---------------+          +----------------+
|  Working Mem  |          |  Environment   |
|  Scratchpad   |          |  External Data |
+---------------+          +----------------+
```

In the next few articles we will examine each component in detail.
Responsibilities of the Agent Runtime
A production agent runtime performs many complex tasks.
1 — State Management
Agents maintain internal state such as:
- conversation history
- memory retrieval results
- task progress
- intermediate reasoning
Without state, agents cannot perform multi-step tasks.
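The state items listed above can be sketched as a single container that the runtime threads through the loop. This is an illustrative sketch, not the article's runtime: the `AgentState` class and its field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AgentState:
    """Everything the runtime must carry between steps (hypothetical shape)."""
    goal: str
    history: list[dict[str, str]] = field(default_factory=list)  # conversation history
    retrieved: list[str] = field(default_factory=list)           # memory retrieval results
    step_count: int = 0                                          # task progress
    scratchpad: list[str] = field(default_factory=list)          # intermediate reasoning


state = AgentState(goal="Summarize recent MCP updates")
state.history.append({"role": "user", "content": state.goal})
state.step_count += 1
```

Because the loop mutates this one object, every multi-step decision can see what came before it.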
2 — Tool Execution
The runtime executes external capabilities such as:
- web search
- code execution
- database queries
- filesystem operations
These are similar to system calls in operating systems.
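The system-call analogy can be made concrete with a tool registry: the runtime dispatches tool calls by name, much as an OS dispatches system calls by number. The registry below is a minimal sketch; the `register_tool` decorator and `execute` helper are hypothetical names, not part of any framework.

```python
from typing import Callable

# Hypothetical registry mapping tool names to callables.
TOOLS: dict[str, Callable[..., str]] = {}


def register_tool(name: str):
    """Decorator that adds a function to the registry under `name`."""
    def decorator(fn: Callable[..., str]) -> Callable[..., str]:
        TOOLS[name] = fn
        return fn
    return decorator


@register_tool("web_search")
def web_search(query: str) -> str:
    return f"[mock] results for: {query}"


def execute(name: str, **args) -> str:
    """Dispatch a tool call, returning errors as strings the LLM can read."""
    if name not in TOOLS:
        return f"Tool '{name}' not available."
    try:
        return TOOLS[name](**args)
    except Exception as exc:
        return f"Tool error: {exc}"


print(execute("web_search", query="post quantum cryptography"))
```

Returning failures as plain strings, rather than raising, lets the LLM observe the error and try a different action on the next iteration.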
3 — Context Construction
The runtime decides what information the LLM receives.
This may include:
- conversation history
- retrieved documents
- intermediate reasoning
- tool results
Because context windows are limited, this step is critical.
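One common strategy is to keep the system prompt fixed and fill the remaining budget with the most recent messages. The sketch below uses a character budget for simplicity; real runtimes count tokens, and the `build_context` function name is an assumption for illustration.

```python
def build_context(system_prompt: str,
                  history: list[dict[str, str]],
                  budget_chars: int = 4000) -> list[dict[str, str]]:
    """Keep the system prompt plus the newest messages that fit the budget."""
    kept: list[dict[str, str]] = []
    used = len(system_prompt)
    for message in reversed(history):  # newest messages first: they matter most
        used += len(message["content"])
        if used > budget_chars:
            break
        kept.append(message)
    kept.reverse()  # restore chronological order for the model
    return [{"role": "system", "content": system_prompt}, *kept]
```

More sophisticated runtimes summarize or embed the dropped messages instead of discarding them outright, but the budgeting logic looks much the same.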
4 — Safety and Guardrails
The runtime enforces constraints such as:
- tool permission systems
- rate limits
- maximum iteration counts
- safe code execution
Without guardrails, autonomous agents can become dangerous or unstable.
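Three of the constraints above fit naturally into one small object the runtime consults before acting: a tool allowlist, an iteration cap, and a rate limit. This is an illustrative sketch with hypothetical names, not a production safety system.

```python
import time


class Guardrails:
    """Minimal runtime guardrails: tool permissions, step cap, rate limit."""

    def __init__(self, allowed_tools: set[str], max_steps: int, min_interval_s: float):
        self.allowed_tools = allowed_tools
        self.max_steps = max_steps
        self.min_interval_s = min_interval_s
        self.steps = 0
        self.last_call = 0.0

    def check_tool(self, name: str) -> None:
        """Tool permission system: refuse anything outside the allowlist."""
        if name not in self.allowed_tools:
            raise PermissionError(f"Tool '{name}' is not permitted.")

    def check_step(self) -> None:
        """Maximum iteration count: abort runaway loops."""
        self.steps += 1
        if self.steps > self.max_steps:
            raise RuntimeError("Maximum iteration count exceeded.")

    def throttle(self) -> None:
        """Rate limit: sleep until the minimum interval has passed."""
        elapsed = time.monotonic() - self.last_call
        if elapsed < self.min_interval_s:
            time.sleep(self.min_interval_s - elapsed)
        self.last_call = time.monotonic()
```

The runtime would call `check_step()` at the top of each loop iteration and `check_tool()` plus `throttle()` before every tool execution.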
The Agent Execution Cycle
Putting everything together, the lifecycle of an agent looks like this:
```
User Goal
    ↓
Perception
    ↓
Reasoning (LLM)
    ↓
Planning
    ↓
Tool Execution
    ↓
Observation
    ↓
Reflection
    ↓
Termination or Next Step
```

This loop is often called the Agent Control Loop.

We introduced the conceptual version in Module 1:

observe → reason → plan → act → reflect

Now we see how it is implemented inside a runtime.
Why This Architecture Matters
Understanding the internal architecture of agents has several practical benefits.
1 — Debugging Agents
Most agent failures occur in the runtime layer, not the LLM.
Examples include:
- incorrect tool execution
- missing context
- bad memory retrieval
- infinite loops
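Several of these failures leave a detectable signature in the runtime's action log; for instance, an agent stuck in a loop tends to repeat the exact same tool call. A minimal detector, with a hypothetical `is_looping` helper, might look like this:

```python
def is_looping(action_log: list[tuple[str, str]], window: int = 3) -> bool:
    """Return True if the last `window` (tool, args) pairs are all identical."""
    if len(action_log) < window:
        return False
    return len(set(action_log[-window:])) == 1


# The same search issued three times in a row signals a stuck agent.
log = [("web_search", '{"query": "MCP"}')] * 3
print(is_looping(log))
```

A runtime can use such a check to break out of the loop, inject a hint into the context, or escalate to a human.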
2 — Performance Optimization
Agent performance depends heavily on:
- prompt construction
- context compression
- tool latency
- reasoning depth
Optimizing the runtime can dramatically improve speed and cost.
3 — Building Custom Agents
Many frameworks exist:
- LangChain
- LangGraph
- CrewAI
- AutoGen
But all of them implement the same underlying architecture.
Once you understand the internals, you can:
- build your own agent frameworks
- extend existing ones
- debug complex systems
Looking Ahead
In this article we introduced the two central components of an agent:
- the LLM (reasoning processor)
- the runtime (control system)
In the upcoming articles we will explore the remaining components of agent architecture.
Next we will examine the Perception Layer, which transforms raw inputs such as:
- documents
- images
- screenshots
- audio
into machine-readable representations that agents can reason about.
This is where embeddings, parsing pipelines, and multimodal models enter the architecture.
→ Continue to 2.2 — The Perception Layer