Working Memory and the Scratchpad
Reasoning rarely happens in a single step.
When humans solve problems, we often write intermediate thoughts:
- notes on paper
- calculations
- bullet-point reasoning
- diagrams
This temporary workspace is our working memory.
AI agents require the same capability.
Without it, they cannot perform multi-step reasoning.
What Is Working Memory in Agents?
Working memory is the short-term state used during a single task.
It stores information such as:
- intermediate reasoning
- tool outputs
- partial plans
- observations
- sub-goals
Unlike long-term memory systems (which we will study later), working memory is temporary.
It typically exists only for the duration of a task.
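As a minimal illustration (the class and field names here are ours, not a standard API), working memory can be modeled as a small per-task object that is created when a task starts and discarded when it ends:

```python
from dataclasses import dataclass, field


@dataclass
class WorkingMemory:
    # Per-task state: created when a task starts, discarded when it ends.
    goal: str
    sub_goals: list[str] = field(default_factory=list)
    tool_outputs: list[str] = field(default_factory=list)
    observations: list[str] = field(default_factory=list)


wm = WorkingMemory(goal="summarize research paper")
wm.sub_goals.append("extract the main sections")
wm.observations.append("Document contains 5 sections.")
```

Nothing here persists beyond the task: when `wm` goes out of scope, the state is gone, which is exactly the intended lifetime.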
The Memory Hierarchy of Agents
Agent systems usually maintain several layers of memory.
Long-Term Memory
    ↓
Retrieved Context
    ↓
Working Memory
    ↓
Scratchpad

Each layer serves a different purpose.
| Memory Type | Purpose |
|---|---|
| Long-Term Memory | Persistent knowledge |
| Retrieved Context | Relevant external information |
| Working Memory | Current task state |
| Scratchpad | Reasoning workspace |
The scratchpad is the most dynamic component.
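One way to picture how the layers combine (a sketch under our own naming, not a fixed API): long-term memory is queried to produce retrieved context, which is merged with working memory and the scratchpad into a single prompt for the model.

```python
def assemble_context(retrieved: list[str], working_memory: list[str], scratchpad: str) -> str:
    # Each memory layer contributes a labeled section of the final prompt.
    parts = ["RETRIEVED CONTEXT:"] + retrieved
    parts += ["", "WORKING MEMORY:"] + working_memory
    parts += ["", "SCRATCHPAD:", scratchpad]
    return "\n".join(parts)


ctx = assemble_context(
    retrieved=["Doc snippet about GPU benchmarks"],
    working_memory=["goal: compare GPUs"],
    scratchpad="Thought: start with benchmark data",
)
```

The section labels are illustrative; what matters is that each layer feeds the next, with the scratchpad updated most frequently.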
Why Working Memory Matters
Without working memory, agents behave like stateless chatbots.
Stateless systems can only respond to the latest prompt.
But agents must track evolving state:
Example task:
“Find the cheapest flight to Tokyo next month and summarize the best options.”
The agent must remember:
- search results
- price comparisons
- airline options
- user constraints
All of this information lives in working memory.
The Reasoning Scratchpad Pattern
One of the most important design patterns in agent systems is the scratchpad.
The scratchpad is a structured space where the agent writes its reasoning.
Example:
Goal: summarize research paper
Thought: I should extract the main sections first.
Action: parse_document
Observation: Document contains 5 sections.
Thought: Next I should summarize each section.

This running log of thoughts and actions helps the agent:
- track progress
- avoid repeating steps
- maintain logical coherence
Scratchpad Structure
Most modern agents represent scratchpads as structured logs.
Thought
Action
Observation

This structure was popularized by the ReAct framework.
Example:
Thought: Need population data for France
Action: web_search("France population 2024")
Observation: 67 million

The scratchpad becomes part of the prompt for the next reasoning step.
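This feedback loop can be sketched in a few lines; the prompt wording below is illustrative, not a prescribed format:

```python
steps = [
    {
        "thought": "Need population data for France",
        "action": 'web_search("France population 2024")',
        "observation": "67 million",
    }
]

# Replay every prior Thought/Action/Observation triple into the next prompt.
scratchpad = "\n".join(
    f"Thought: {s['thought']}\nAction: {s['action']}\nObservation: {s['observation']}"
    for s in steps
)
next_prompt = f"GOAL: report on France\n\nSCRATCHPAD:\n{scratchpad}\n\nWhat is the next step?"
```

Because the whole log is replayed, the model sees its own earlier reasoning and can build on it rather than starting from scratch.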
How Scratchpads Improve Reasoning
Scratchpads provide several important advantages.
1 — Multi-Step Problem Solving
Complex tasks require step-by-step reasoning.
Scratchpads allow the agent to build solutions incrementally.
2 — Tool Integration
When tools return results, those results must be remembered.
Example:
Tool: web_search
Result: GDP growth rate 3.2%

The scratchpad records this information.
3 — Debuggability
Scratchpads make agent decisions transparent.
Instead of a mysterious output, we see the full reasoning chain.
Example:
Thought → Action → Observation → Thought

This is extremely useful when debugging agents.
Managing Short-Term Context
Working memory must be carefully managed because context windows are limited.
Modern models may support:
- 128k tokens
- 200k tokens
- 1M tokens
But large contexts still introduce problems:
- higher cost
- slower inference
- increased hallucination risk
Therefore agents must compress or prune working memory.
Context Management Strategies
Several strategies exist for managing short-term memory.
Strategy 1 — Sliding Window
The agent keeps only the most recent reasoning steps.
Example:
Step 12
Step 13
Step 14
Step 15

Older steps are discarded.
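A sliding window is a one-line slice in practice. This sketch keeps the four most recent steps, matching the example above:

```python
def sliding_window(steps: list[str], keep: int = 4) -> list[str]:
    # Retain only the `keep` most recent reasoning steps.
    return steps[-keep:]


history = [f"Step {i}" for i in range(1, 16)]
recent = sliding_window(history)
print(recent)  # ['Step 12', 'Step 13', 'Step 14', 'Step 15']
```

The trade-off is obvious but important: anything outside the window is simply gone, so the window size must be large enough to cover the dependencies between steps.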
Strategy 2 — Summarization
Older reasoning steps are compressed into summaries.
Example:
Summary: The agent already collected three market reports and extracted key economic indicators.

This preserves knowledge while reducing token usage.
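A sketch of the pattern: fold older steps into a single summary line and keep recent steps verbatim. In a real agent the summary would be written by an LLM call; the string join below is only a stand-in for that call.

```python
def compress_steps(steps: list[str], keep_recent: int = 3) -> list[str]:
    # Fold older steps into one summary line; keep recent steps verbatim.
    # A real agent would ask an LLM to write the summary; this join is a stand-in.
    if len(steps) <= keep_recent:
        return steps
    older, recent = steps[:-keep_recent], steps[-keep_recent:]
    summary = f"Summary of {len(older)} earlier steps: " + "; ".join(older)
    return [summary] + recent


log = [
    "collected report A",
    "collected report B",
    "collected report C",
    "extracted indicators",
    "compared GDP figures",
]
print(compress_steps(log)[0])
# Summary of 2 earlier steps: collected report A; collected report B
```

Unlike the sliding window, summarization keeps a trace of old steps, at the cost of losing their exact wording.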
Strategy 3 — State Representation
Instead of storing raw text, the agent stores structured state.
Example:
{
  "goal": "compare GPUs",
  "products": ["A100", "H100"],
  "benchmark_data": [...]
}

Structured state is often more efficient than raw conversation logs.
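A quick sketch of why this helps: serializing the state object above (with `benchmark_data` left empty as a placeholder) produces far fewer characters than the raw transcript it replaces.

```python
import json

state = {
    "goal": "compare GPUs",
    "products": ["A100", "H100"],
    "benchmark_data": [],  # placeholder; would hold collected benchmark rows
}

# A compact JSON state block replaces pages of raw conversation transcript.
state_block = "STATE:\n" + json.dumps(state)
transcript = "User: compare GPUs please, and explain the trade-offs...\n" * 50
print(len(state_block) < len(transcript))  # True
```

The structured form is also easier to update in place: the agent overwrites a field instead of appending yet another paragraph of prose.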
Implementing a Scratchpad
Let us implement a minimal scratchpad system.
```python
from __future__ import annotations

from dataclasses import dataclass

from ollama import chat
from pydantic import BaseModel, ValidationError

MODEL = "qwen3.5:9b"
MAX_STEPS = 6


@dataclass
class Step:
    thought: str
    action: str
    observation: str


class AgentAction(BaseModel):
    thought: str
    action: str  # "web_search" | "finish"
    input: str


class Scratchpad:
    def __init__(self, max_steps: int = 8):
        self.max_steps = max_steps
        self.steps: list[Step] = []

    def add(self, thought: str, action: str, observation: str) -> None:
        self.steps.append(Step(thought=thought, action=action, observation=observation))
        # Sliding window to keep working memory bounded.
        self.steps = self.steps[-self.max_steps :]

    def build_prompt(self, goal: str) -> str:
        memory_lines = []
        for s in self.steps:
            memory_lines.append(f"Thought: {s.thought}")
            memory_lines.append(f"Action: {s.action}")
            memory_lines.append(f"Observation: {s.observation}")
            memory_lines.append("")
        memory_text = "\n".join(memory_lines) if memory_lines else "(empty)"
        return f"""You are an agent planner.
Given GOAL and WORKING MEMORY, return ONLY JSON:
{{
  "thought": string,
  "action": "web_search" | "finish",
  "input": string
}}

GOAL:
{goal}

WORKING MEMORY:
{memory_text}""".strip()

    def render(self) -> str:
        if not self.steps:
            return "(empty)"
        lines = []
        for i, s in enumerate(self.steps, start=1):
            lines.append(f"{i}. Thought: {s.thought}")
            lines.append(f"   Action: {s.action}")
            lines.append(f"   Observation: {s.observation}")
        return "\n".join(lines)


def web_search(query: str) -> str:
    # Replace with real tool integration if needed.
    return f"[mock-search-result] {query}: RTX 4090 has strong local value; H100 leads datacenter throughput."


def parse_action(raw: str) -> AgentAction:
    """Parse model output robustly even if it includes extra text."""
    try:
        return AgentAction.model_validate_json(raw)
    except ValidationError:
        # Fallback: extract first JSON object from mixed output.
        start = raw.find("{")
        end = raw.rfind("}")
        if start == -1 or end == -1 or end <= start:
            raise ValueError(f"Model returned non-JSON output: {raw[:200]}")
        return AgentAction.model_validate_json(raw[start : end + 1])


def run_agent(goal: str) -> str:
    scratchpad = Scratchpad(max_steps=10)

    for step_no in range(1, MAX_STEPS + 1):
        prompt = scratchpad.build_prompt(goal)
        response = chat(
            model=MODEL,
            messages=[{"role": "user", "content": prompt}],
            format=AgentAction.model_json_schema(),
            options={"temperature": 0.2},
        )

        try:
            action = parse_action(response.message.content)
        except Exception:
            scratchpad.add(
                "Parser failure",
                "finish",
                "Model output was not valid JSON. Retrying with stricter prompt is recommended.",
            )
            print(f"[step {step_no}/{MAX_STEPS}] parse_error")
            print("[scratchpad]")
            print(scratchpad.render())
            return "Stopped: model did not return valid structured output."

        thought = action.thought
        action_name = action.action
        action_input = action.input

        if action_name == "finish":
            scratchpad.add(thought, "finish", action_input)
            print(f"[step {step_no}/{MAX_STEPS}] finished")
            print("[scratchpad]")
            print(scratchpad.render())
            return action_input

        if action_name == "web_search":
            tool_result = web_search(action_input)
            scratchpad.add(thought, f"web_search({action_input})", tool_result)
            print(f"[step {step_no}/{MAX_STEPS}] tool=web_search")
            print("[scratchpad]")
            print(scratchpad.render())
            continue

        scratchpad.add(thought, action_name, "Unknown action. Ask model to finish.")
        print(f"[step {step_no}/{MAX_STEPS}] tool=unknown")
        print("[scratchpad]")
        print(scratchpad.render())

    return "Stopped: reached max iterations without final answer."


if __name__ == "__main__":
    final_answer = run_agent("Find the best laptop GPU choice for local ML and explain why.")
    print("\nFinal answer:")
    print(final_answer)
```

The same agent can be written in Rust with the `ollama-rs` crate:

```rust
use ollama_rs::{
    generation::chat::{request::ChatMessageRequest, ChatMessage},
    generation::parameters::FormatType,
    Ollama,
};
use serde::{Deserialize, Serialize};

const MODEL: &str = "qwen3.5:9b";
const MAX_STEPS: usize = 6;

#[derive(Clone, Debug)]
struct Step {
    thought: String,
    action: String,
    observation: String,
}

#[derive(Default)]
struct Scratchpad {
    steps: Vec<Step>,
    max_steps: usize,
}

impl Scratchpad {
    fn new(max_steps: usize) -> Self {
        Self {
            steps: Vec::new(),
            max_steps,
        }
    }

    fn add(&mut self, thought: String, action: String, observation: String) {
        self.steps.push(Step {
            thought,
            action,
            observation,
        });
        // Sliding window memory.
        if self.steps.len() > self.max_steps {
            let keep_from = self.steps.len() - self.max_steps;
            self.steps = self.steps.split_off(keep_from);
        }
    }

    fn build_prompt(&self, goal: &str) -> String {
        let mut memory = String::new();
        if self.steps.is_empty() {
            memory.push_str("(empty)\n");
        } else {
            for step in &self.steps {
                memory.push_str(&format!(
                    "Thought: {}\nAction: {}\nObservation: {}\n\n",
                    step.thought, step.action, step.observation
                ));
            }
        }

        format!(
            r#"You are an agent planner.
Return ONLY JSON:
{{
  "thought": string,
  "action": "web_search" | "finish",
  "input": string
}}

GOAL:
{}

WORKING MEMORY:
{}"#,
            goal, memory
        )
    }

    fn render(&self) -> String {
        if self.steps.is_empty() {
            return "(empty)".to_string();
        }

        let mut text = String::new();
        for (i, step) in self.steps.iter().enumerate() {
            text.push_str(&format!(
                "{}. Thought: {}\n   Action: {}\n   Observation: {}\n",
                i + 1,
                step.thought,
                step.action,
                step.observation
            ));
        }
        text
    }
}

#[derive(Debug, Serialize, Deserialize)]
struct AgentAction {
    thought: String,
    action: String,
    input: String,
}

fn parse_agent_action(raw: &str) -> Result<AgentAction, Box<dyn std::error::Error>> {
    // First try direct JSON parsing.
    if let Ok(action) = serde_json::from_str::<AgentAction>(raw) {
        return Ok(action);
    }

    // Fallback: extract first JSON object from mixed output.
    if let (Some(start), Some(end)) = (raw.find('{'), raw.rfind('}')) {
        if end > start {
            let candidate = &raw[start..=end];
            let action = serde_json::from_str::<AgentAction>(candidate)?;
            return Ok(action);
        }
    }

    Err(format!(
        "Model returned non-JSON output: {}",
        raw.chars().take(200).collect::<String>()
    )
    .into())
}

fn web_search(query: &str) -> String {
    format!(
        "[mock-search-result] {query}: RTX 4090 has strong local value; H100 leads datacenter throughput."
    )
}

async fn run_agent(goal: &str) -> Result<String, Box<dyn std::error::Error>> {
    let ollama = Ollama::default();
    let mut scratchpad = Scratchpad::new(10);

    for step_no in 1..=MAX_STEPS {
        let prompt = scratchpad.build_prompt(goal);
        let request = ChatMessageRequest::new(MODEL.to_string(), vec![ChatMessage::user(prompt)])
            .format(FormatType::Json);
        let response = ollama.send_chat_messages(request).await?;
        let action = match parse_agent_action(&response.message.content) {
            Ok(action) => action,
            Err(_) => {
                scratchpad.add(
                    "Parser failure".to_string(),
                    "finish".to_string(),
                    "Model output was not valid JSON. Retrying with stricter prompt is recommended."
                        .to_string(),
                );
                println!("[step {step_no}/{MAX_STEPS}] parse_error");
                println!("[scratchpad]\n{}", scratchpad.render());
                return Ok("Stopped: model did not return valid structured output.".to_string());
            }
        };

        if action.action == "finish" {
            scratchpad.add(action.thought, "finish".to_string(), action.input.clone());
            println!("[step {step_no}/{MAX_STEPS}] finished");
            println!("[scratchpad]\n{}", scratchpad.render());
            return Ok(action.input);
        }

        if action.action == "web_search" {
            let tool_result = web_search(&action.input);
            scratchpad.add(
                action.thought,
                format!("web_search({})", action.input),
                tool_result,
            );
            println!("[step {step_no}/{MAX_STEPS}] tool=web_search");
            println!("[scratchpad]\n{}", scratchpad.render());
            continue;
        }

        scratchpad.add(
            action.thought,
            action.action,
            "Unknown action. Ask model to finish.".to_string(),
        );
        println!("[step {step_no}/{MAX_STEPS}] tool=unknown");
        println!("[scratchpad]\n{}", scratchpad.render());
    }

    Ok("Stopped: reached max iterations without final answer.".to_string())
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let final_answer =
        run_agent("Find the best laptop GPU choice for local ML and explain why.").await?;
    println!("\nFinal answer:\n{final_answer}");
    Ok(())
}
```

This scratchpad becomes part of the agent prompt at every step.
Example Agent Reasoning
Below is what a full reasoning cycle might look like.
Goal: Find best laptop for machine learning
Thought: I should compare GPUs first
Action: web_search("best GPU for ML training")
Observation: RTX 4090 and H100 commonly used
Thought: Now compare benchmarksAction: web_search("RTX 4090 vs H100 ML benchmark")
Observation: H100 significantly faster

Each step updates the working memory.
Common Mistakes in Working Memory Design
Designing working memory incorrectly can break agent reasoning.
Common mistakes include:
Storing Too Much Context
Too many reasoning steps cause context overflow.
Storing Too Little Context
If earlier steps are lost, the agent may repeat work.
Mixing State With Reasoning
Separating:
- reasoning
- state
- tool results

improves clarity.
Working Memory vs Long-Term Memory
Working memory should not be confused with long-term memory systems.
| Feature | Working Memory | Long-Term Memory |
|---|---|---|
| Lifetime | task duration | persistent |
| Storage | prompt context | database/vector store |
| Purpose | reasoning | knowledge |
Long-term memory will be explored in Module 5 — Memory Systems & RAG 2.0.
The Central Role of Working Memory
In practice, working memory is the glue connecting all agent components.
It integrates:
- perception outputs
- reasoning steps
- tool results
- planning state
Without working memory, agents cannot perform complex multi-step tasks.
Looking Ahead
In this article we explored working memory and the reasoning scratchpad, which allow agents to maintain state across reasoning steps.
We examined:
- the role of working memory in agents
- the scratchpad reasoning pattern
- strategies for managing short-term context
In the next article we will explore the Planner / Reasoner, the component responsible for transforming goals into executable plans.
→ Continue to 2.4 — The Planner / Reasoner