
The Observation Processor

When an agent executes a tool, it receives raw output — JSON from APIs, HTML pages, database rows, log files, or search results. These outputs are often verbose, noisy, and contain far more information than the agent needs.

Feeding raw tool results directly into the LLM causes serious problems:

- The context window fills with irrelevant tokens, raising cost and latency.
- The few facts the agent actually needs get buried in noise, degrading reasoning.
- Inconsistent output formats make multi-step plans brittle.

The Observation Processor solves this by transforming raw tool outputs into clean, concise, and structured observations that are optimized for the agent’s reasoning loop.


The Role of Observations in the Agent Loop

A typical agent follows this cycle:

Observe → Reason → Plan → Act → Observe

The Observation Processor powers the Observe step. It takes the Execution Engine’s raw output and produces a high-quality observation that is added to the agent’s working memory.

Raw Tool Output
    ↓
Observation Processor (Parse → Filter → Summarize → Structure)
    ↓
Structured Observation
    ↓
Working Memory → Next Reasoning Step

High-quality observations are essential for reliable multi-step reasoning.
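The pipeline above can be sketched as a simple composition of stages. The function names (`parse`, `filter_noise`, `summarize`, `process`) and the `keep` parameter are illustrative stand-ins, not a fixed API:

```python
import json

def parse(raw: str) -> dict:
    # Stage 1: turn the raw tool output into structured data.
    return json.loads(raw)

def filter_noise(data: dict, keep: set) -> dict:
    # Stage 2: drop everything the agent does not need.
    return {k: v for k, v in data.items() if k in keep}

def summarize(data: dict) -> str:
    # Stage 3: compress the remaining fields into one observation line.
    fields = ", ".join(f"{k}={v}" for k, v in data.items())
    return f"Observation: {fields}"

def process(raw: str, keep: set) -> str:
    # Parse -> Filter -> Summarize, as in the diagram above.
    return summarize(filter_noise(parse(raw), keep))

raw = '{"location": "Tokyo", "temperature": 26.4, "debug_id": "abc123"}'
print(process(raw, keep={"location", "temperature"}))
# -> Observation: location=Tokyo, temperature=26.4
```

Each stage is independently testable, which makes it easy to tune filtering rules per tool without touching the rest of the pipeline.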


Raw Output vs Structured Observation

Raw Tool Output (Example)

{
  "weather": {
    "location": "Tokyo",
    "temperature": 26.4,
    "humidity": 78,
    "pressure": 1008,
    "wind_speed": 12.3,
    "wind_direction": "NE",
    "sunrise": "04:52",
    "sunset": "18:37",
    "condition": "partly cloudy"
  }
}

Processed Observation

Observation: Current weather in Tokyo is 26°C with 78% humidity and partly cloudy conditions.

The processed version is dramatically shorter, focused, and reasoning-friendly.


Core Responsibilities

The Observation Processor typically performs four key operations, often in combination:


Parsing Structured Outputs

Many tools already return structured data. The processor extracts only what matters.

def process_weather(data: dict) -> str:
    """Extract only the fields the agent needs from a weather API payload."""
    weather = data.get("weather", {})
    return (
        f"Observation: Current temperature in {weather.get('location')} "
        f"is {weather.get('temperature')}°C with {weather.get('humidity')}% humidity."
    )

Summarization and Compression

For large outputs (search results, long documents, web pages), simple extraction is not enough — the processor must summarize.

Modern systems often use a smaller/faster LLM (or the same model with a tight prompt) for summarization.

Example prompt:

Summarize the following tool output in 1-2 sentences,
focusing only on information relevant to the current goal: "{goal}"

In code:

def summarize_observation(raw_output: str, goal: str, llm) -> str:
    prompt = f"""
Summarize this tool output in one concise paragraph.
Focus only on details relevant to: {goal}

Tool output:
{raw_output}
"""
    return llm.generate(prompt)

Pro tip: Use token-aware compression and consider hierarchical summarization for very large outputs.
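One way to realize that tip, sketched with a pluggable `llm` callable (hypothetical; swap in your model client) and character counts as a crude stand-in for token counts:

```python
def chunk_text(text: str, max_chars: int = 2000) -> list[str]:
    # Split the output into pieces small enough to summarize individually.
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def hierarchical_summarize(text: str, goal: str, llm, max_chars: int = 2000) -> str:
    # Base case: small enough to summarize in a single call.
    if len(text) <= max_chars:
        return llm(f"Summarize for goal '{goal}':\n{text}")
    # Recursive case: summarize each chunk, then summarize the summaries.
    partials = [
        llm(f"Summarize for goal '{goal}':\n{chunk}")
        for chunk in chunk_text(text, max_chars)
    ]
    return hierarchical_summarize("\n".join(partials), goal, llm, max_chars)
```

With `max_chars` tuned to the model's context window (a tokenizer-based count is more accurate than `len`), even very large outputs collapse into a single observation after a logarithmic number of rounds.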


Filtering Noise

Tools often return debugging info, HTTP headers, metadata, or error traces alongside useful data. The processor must strip this away.

Example filtered observation:

Observation: Found 3 matching records. User 'alice' has admin privileges.
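A minimal sketch of such filtering, assuming dict-shaped tool results; the `NOISE_KEYS` set is illustrative and would be tuned per tool:

```python
# Keys that carry transport or debugging detail rather than task-relevant
# facts. (Illustrative; real tools need their own lists.)
NOISE_KEYS = {"headers", "request_id", "trace", "debug", "latency_ms", "raw_html"}

def filter_noise(result: dict, noise_keys: set = NOISE_KEYS) -> dict:
    # Recursively drop noisy keys so nested payloads are cleaned too.
    cleaned = {}
    for key, value in result.items():
        if key in noise_keys:
            continue
        cleaned[key] = (
            filter_noise(value, noise_keys) if isinstance(value, dict) else value
        )
    return cleaned

result = {
    "headers": {"content-type": "application/json"},
    "request_id": "r-42",
    "data": {"user": "alice", "role": "admin", "trace": "stack..."},
}
print(filter_noise(result))
# -> {'data': {'user': 'alice', 'role': 'admin'}}
```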

Structuring Observations

Many production systems go beyond plain text and use structured observations:

{
  "type": "weather_report",
  "location": "Tokyo",
  "temperature_c": 26,
  "humidity": 78,
  "condition": "partly_cloudy",
  "source": "weather_api"
}

Benefits:

- Downstream code can read fields programmatically instead of parsing prose.
- Schemas can be validated, catching malformed tool output early.
- Observations become easy to log, store, and query.

You can use Pydantic (Python) or Serde + typed structs (Rust) to enforce observation schemas.
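As a dependency-free sketch of the same idea using stdlib dataclasses (Pydantic adds parsing, coercion, and richer error messages on top of this), with a basic range check in `__post_init__`:

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class WeatherObservation:
    type: str
    location: str
    temperature_c: int
    humidity: int
    condition: str
    source: str

    def __post_init__(self):
        # Reject obviously malformed observations before they reach memory.
        if not 0 <= self.humidity <= 100:
            raise ValueError(f"humidity out of range: {self.humidity}")

obs = WeatherObservation(
    type="weather_report", location="Tokyo", temperature_c=26,
    humidity=78, condition="partly_cloudy", source="weather_api",
)
print(asdict(obs)["location"])
# -> Tokyo
```

Freezing the dataclass keeps observations immutable once recorded, so later reasoning steps cannot silently mutate the agent's history.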


Integrating Observations into the Agent Loop

Processed observations are stored in working memory and become context for the next reasoning step.

Example trace:

Thought: I need current market trends for AI GPUs.
Action: web_search(query="AI GPU market 2025")
Observation: Market expected to grow 40% annually through 2028, driven by data center expansion.
Thought: Compare leading vendors...

This clean observe–reason–act cycle enables stable, long-horizon reasoning.
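A minimal working-memory sketch under simple assumptions (a flat list of observation strings, trimmed to the most recent entries; real agents often add token budgets or relevance scoring instead):

```python
class WorkingMemory:
    def __init__(self, max_entries: int = 20):
        # Keep only the most recent observations to bound context size.
        self.max_entries = max_entries
        self.entries: list[str] = []

    def add(self, observation: str) -> None:
        self.entries.append(observation)
        self.entries = self.entries[-self.max_entries:]

    def as_context(self) -> str:
        # Render memory as the context block for the next reasoning step.
        return "\n".join(self.entries)

memory = WorkingMemory(max_entries=2)
memory.add("Observation: Market expected to grow 40% annually through 2028.")
memory.add("Observation: Vendor A leads in data center share.")
memory.add("Observation: Vendor B is growing fastest.")
print(len(memory.entries))
# -> 2
```

Here the oldest observation is evicted first; a production system might instead summarize evicted entries into a long-term store rather than discarding them.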


Why the Observation Processor Matters

Without proper observation processing, even powerful Tool Managers and Execution Engines fail because the LLM gets overwhelmed by noise. A good Observation Processor is what turns raw tool results into actionable intelligence.


Looking Ahead

The Observation Processor acts as the agent’s “interpreter”, converting noisy machine outputs into clear reasoning inputs.

→ Continue to 2.8 — Reflection and Termination: How agents evaluate progress, detect errors, and decide when to stop.