Adding Time-Travel Debugging

Traditional software debugging often uses breakpoints and step-by-step execution.

Example debugging process:

Step 1 → inspect variables
Step 2 → execute next line
Step 3 → inspect state again

Agent systems require similar debugging capabilities.

However, agent workflows involve:

reasoning steps
tool calls
external observations
evolving internal state

Instead of stepping through source code, developers must inspect the sequence of agent states.

Time-travel debugging enables this capability.

What Is Time-Travel Debugging?

Time-travel debugging allows developers to replay a previous execution of the agent.

Conceptually:

Execution Timeline

State 0 → State 1 → State 2 → State 3 → Final Answer

Developers can move backward and forward through the timeline to inspect agent behavior.

Example:

Step 2:
Thought → tool call

Step 3:
Observation → reasoning update

This provides insight into how the agent reached its conclusion.
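Moving backward and forward through a timeline can be sketched as a small cursor over the recorded states. The `Timeline` class and its method names below are hypothetical helpers introduced for illustration; they are not part of the runtime built earlier in this series.

```python
class Timeline:
    """Cursor over a recorded list of agent states (hypothetical helper)."""

    def __init__(self, states):
        if not states:
            raise ValueError("timeline needs at least one state")
        self.states = list(states)
        self.position = 0  # start at State 0

    def current(self):
        return self.states[self.position]

    def forward(self):
        # Clamp at the final state instead of raising.
        self.position = min(self.position + 1, len(self.states) - 1)
        return self.current()

    def backward(self):
        # Clamp at State 0.
        self.position = max(self.position - 1, 0)
        return self.current()

    def jump(self, index):
        if not 0 <= index < len(self.states):
            raise IndexError(f"no state at index {index}")
        self.position = index
        return self.current()


timeline = Timeline(["State 0", "State 1", "State 2", "State 3", "Final Answer"])
at_three = timeline.jump(3)      # go straight to State 3
previous = timeline.backward()   # step back to State 2
```

Clamping at both ends is a design choice: a debugger UI can call `forward()` or `backward()` repeatedly without having to check bounds first.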

Why Debugging Agents Is Difficult

Agent systems introduce several debugging challenges.

Non-Deterministic Outputs

The same prompt may produce different results across runs.

Multi-Step Reasoning

Errors may occur several steps before the final output.

Tool Interactions

Incorrect tool usage can lead to incorrect conclusions.

Without execution history, it can be difficult to identify where the reasoning failed.

Time-travel debugging addresses this problem by recording every step, so a failure can be traced back to the step where it first appeared.

Recording Agent States

To enable replay, the runtime must record the agent state after each step.

Example state log:

State 0:
Goal received

State 1:
Thought: search for GPU benchmarks

State 2:
Tool call: web_search

State 3:
Observation: benchmark results

Each state snapshot captures the agent’s internal context.

State Snapshot Structure

A state snapshot typically contains:

Field	Description
step	iteration number
thought	reasoning output
action	selected tool
observation	tool result
state	updated context

Example snapshot (the state field is omitted for brevity):

{
  "step": 2,
  "thought": "Search for GPU benchmarks",
  "action": "web_search",
  "observation": "Benchmark results retrieved"
}

These snapshots form a complete execution history.
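One way to give the snapshot a fixed shape in Python is a dataclass that mirrors the fields in the table, including `state`. This is a minimal sketch: the `Snapshot` class and the contents of the `state` dictionary are assumptions made for illustration.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass(frozen=True)
class Snapshot:
    step: int          # iteration number
    thought: str       # reasoning output
    action: str        # selected tool
    observation: str   # tool result
    state: dict = field(default_factory=dict)  # updated context after this step


snapshot = Snapshot(
    step=2,
    thought="Search for GPU benchmarks",
    action="web_search",
    observation="Benchmark results retrieved",
    state={"goal": "compare GPUs"},  # hypothetical context contents
)

# Round-trip through JSON, as a trace store would.
encoded = json.dumps(asdict(snapshot))
restored = Snapshot(**json.loads(encoded))
```

Marking the dataclass as frozen prevents fields from being reassigned after recording, which fits the idea of a history that should not change once written.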

Building the State History

The runtime can maintain a history list of state snapshots.

Example structure:

AgentHistory
 ├─ Step 1
 ├─ Step 2
 ├─ Step 3
 └─ Step N

This history enables time-travel debugging.

Python Implementation

Python

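# Created once, before the agent loop starts.
history = []

# Inside the loop, after each step: `state`, `thought`, `action`, and
# `observation` are the values produced by the current iteration.
snapshot = {
    "step": state["step"],
    "thought": thought,
    "action": action,
    "observation": observation
}

history.append(snapshot)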

Each iteration records a snapshot.
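To show the recording step in context, here is a self-contained sketch of a loop that appends one snapshot per iteration. The `fake_agent_step` function is a stand-in for the real reasoning and tool call, and the `notes` key in the state is hypothetical. The sketch also stores a deep copy of the state, on the assumption that the runtime mutates its state dictionary in place.

```python
import copy


def fake_agent_step(state):
    """Stand-in for one reasoning/tool iteration (hypothetical)."""
    step = state["step"] + 1
    thought = f"thinking at step {step}"
    action = "web_search"
    observation = f"result {step}"
    # The state is mutated in place, as many simple loops do.
    state["step"] = step
    state["notes"].append(observation)
    return thought, action, observation


def run(max_steps):
    state = {"step": 0, "notes": []}
    history = []
    for _ in range(max_steps):
        thought, action, observation = fake_agent_step(state)
        history.append({
            "step": state["step"],
            "thought": thought,
            "action": action,
            "observation": observation,
            # Deep copy so later mutations do not rewrite earlier snapshots.
            "state": copy.deepcopy(state),
        })
    return history


history = run(3)
```

Without the deep copy, every snapshot would reference the same dictionary, and replaying step 1 would show the state as it looked after the final step.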

Rust Implementation

Rust

struct Snapshot {
    step: u32,
    thought: String,
    action: String,
    observation: String,
}

// Inside the agent loop, where `history` is a `Vec<Snapshot>`
// created before the loop starts:
history.push(Snapshot {
    step: state.step,
    thought,
    action,
    observation,
});

Because Snapshot is a struct, the Rust compiler rejects any snapshot that is missing a field or has a field of the wrong type.

Replaying Agent States

Once snapshots are recorded, developers can replay execution by stepping through the recorded snapshots. The model and tools are not called again.

Example replay sequence:

Replay Step 1 → inspect reasoning
Replay Step 2 → inspect tool call
Replay Step 3 → inspect observation

This allows developers to analyze the decision path.

Example Replay Function

Python

def replay(history):
    # Print every recorded snapshot in execution order.
    for step in history:
        print("Step:", step["step"])
        print("Thought:", step["thought"])
        print("Action:", step["action"])
        print("Observation:", step["observation"])

Rust

fn replay(history: &[Snapshot]) {
    // Print every recorded snapshot in execution order.
    for step in history {
        println!("Step: {}", step.step);
        println!("Thought: {}", step.thought);
        println!("Action: {}", step.action);
        println!("Observation: {}", step.observation);
    }
}

Both versions print the execution history in order.

Debugging Decision Paths

Time-travel debugging allows developers to analyze decision paths.

Example investigation:

Final Answer: incorrect

Replay Step 2:
Agent selected wrong search query

Replay Step 3:
Observation misleading

Root cause:
Incorrect tool query

This insight allows developers to improve prompts or reasoning strategies.
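An investigation like this can be partly automated by scanning the history for the earliest step that matches a suspicious condition. The sketch below is illustrative only: `first_matching_step`, the `off_topic_search` check, and the sample history are all hypothetical, and a real check would depend on the task.

```python
def first_matching_step(history, predicate):
    """Return the earliest snapshot for which predicate(snapshot) is true, else None."""
    for snapshot in history:
        if predicate(snapshot):
            return snapshot
    return None


# Hypothetical recorded run: the goal was about GPUs.
history = [
    {"step": 1, "thought": "Need GPU benchmarks", "action": "none",
     "observation": ""},
    {"step": 2, "thought": "Search for CPU benchmarks", "action": "web_search",
     "observation": "CPU benchmark results"},
    {"step": 3, "thought": "Summarize results", "action": "none",
     "observation": "Summary of CPU results"},
]


def off_topic_search(snapshot):
    # Assumed heuristic: a search whose thought never mentions "GPU" is suspect.
    return snapshot["action"] == "web_search" and "GPU" not in snapshot["thought"]


suspect = first_matching_step(history, off_topic_search)
```

Returning the earliest match matters because later steps often inherit the error: here the misleading summary at step 3 is a consequence of the query chosen at step 2.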

Identifying Common Failure Patterns

Replay logs help detect recurring problems.

Examples include:

Failure Pattern	Example
repeated tool calls	search loops
incorrect assumptions	faulty reasoning
missing observations	incomplete context
premature termination	agent stops too early

Time-travel debugging makes these issues easier to detect.
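As one example, repeated tool calls can be flagged by looking for runs of identical consecutive actions in the history. This is a rough sketch under simplifying assumptions: `find_repeated_calls` is a hypothetical helper, the threshold of three is arbitrary, and it compares only the action name, whereas a real detector would likely compare tool arguments as well.

```python
def find_repeated_calls(history, threshold=3):
    """Return (start_step, action, run_length) for each run of identical
    consecutive actions that is at least `threshold` long."""
    runs = []
    run_start = 0
    for i in range(1, len(history) + 1):
        # A run ends at the end of the history or when the action changes.
        ended = (
            i == len(history)
            or history[i]["action"] != history[run_start]["action"]
        )
        if ended:
            length = i - run_start
            if length >= threshold:
                first = history[run_start]
                runs.append((first["step"], first["action"], length))
            run_start = i
    return runs


actions = ["plan", "web_search", "web_search", "web_search", "answer"]
history = [{"step": n, "action": a} for n, a in enumerate(actions, start=1)]

loops = find_repeated_calls(history)
```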

Integrating with Observability Systems

In production systems, replay logs are often integrated with observability tools.

Example architecture:

Agent Execution
      ↓
State Snapshots
      ↓
Trace Storage
      ↓
Debugging Dashboard

Developers can explore execution history visually.
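The trace storage stage can be as simple as a file with one JSON object per line, which most log and dashboard tools can ingest. The sketch below uses only the standard library; `save_trace`, `load_trace`, and the file name are hypothetical, and a production system would more likely send snapshots to a dedicated tracing backend.

```python
import json
import tempfile
from pathlib import Path


def save_trace(history, path):
    """Write one JSON object per line so traces can be streamed or tailed."""
    with open(path, "w", encoding="utf-8") as f:
        for snapshot in history:
            f.write(json.dumps(snapshot) + "\n")


def load_trace(path):
    """Read the snapshots back in their original order."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


history = [
    {"step": 1, "thought": "Search for GPU benchmarks",
     "action": "web_search", "observation": "Benchmark results retrieved"},
    {"step": 2, "thought": "Summarize results",
     "action": "none", "observation": ""},
]

with tempfile.TemporaryDirectory() as tmp:
    trace_path = Path(tmp) / "run-001.jsonl"  # hypothetical file name
    save_trace(history, trace_path)
    restored = load_trace(trace_path)
```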

Time-Travel Debugging as a Development Tool

Time-travel debugging is particularly useful during:

prompt development
tool integration
reasoning strategy tuning
evaluation analysis

Instead of guessing why an agent failed, developers can inspect the exact reasoning path.

The Runtime So Far

At this stage, the minimal runtime contains:

Agent Runtime
 ├─ State Machine
 ├─ Agent Loop
 ├─ Tool Calling
 ├─ State History
 └─ Replay Debugger

This structure already resembles, in simplified form, the internal architecture of many modern agent frameworks.

Looking Ahead

In the final article of this module, we will implement a minimal LangGraph-like runtime in roughly 300 lines of code.

This example will bring together everything we have built so far.

→ Continue to 11.5 — A 300-Line LangGraph Alternative