The Tool Manager
Large language models are powerful reasoning systems. However, they are fundamentally isolated: they have no direct connection to the world beyond the data they were trained on.
On their own, they cannot:
- access real-time data
- execute code
- query databases
- interact with files
- call APIs
To act on the world, they need tools. The system responsible for enabling this is the Tool Manager.
But before we dive deeper, let’s first understand what tools are.
Tools
A tool is any external function or system that an agent can call to perform an action or retrieve information. Examples include:
- APIs (e.g., weather API, search API)
- Code execution (e.g., Python REPL)
- Database queries (e.g., SQL)
- File operations (e.g., read/write files)
- System commands (e.g., shell commands)
How does it all work?
When you create an agent, you define a set of tools it can use. The agent then informs the LLM about the tools available to it—for example, the availability of a weather API or a database connection.
The LLM, acting as the reasoning engine, attempts to respond to the user’s query. If it can solve the problem on its own, it does not need any tools. However, if it determines that it needs external information or capabilities, it requests the Tool Manager (or the agent runtime) to call a tool and return the result.
This additional information allows the LLM to produce a more accurate and useful response.
One way to think about this is:

```
Reasoning (LLM)
       ↓
 Tool Manager
       ↓
Execution (APIs / Code / Systems)
```

The Tool Manager is the bridge between thought and action.
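This flow can be sketched as a single loop: the model either answers directly or requests a tool, and the Tool Manager executes the request and feeds the result back. Everything below is illustrative; `llm_step` and `weather_api` are hypothetical stand-ins for a real model call and a real tool.

```python
# Minimal sketch of one reason -> act -> respond cycle.
# `llm_step` is a hypothetical stand-in for a real model call:
# it returns either a final answer or a tool request.

def llm_step(query, tool_result=None):
    if tool_result is None:
        return {"type": "tool_call", "tool": "weather_api", "args": {"city": "Tokyo"}}
    return {"type": "answer", "text": f"It is {tool_result} in Tokyo."}

def weather_api(city):
    return "22°C"  # stubbed tool for illustration

TOOLS = {"weather_api": weather_api}

def run_agent(query):
    step = llm_step(query)
    while step["type"] == "tool_call":
        result = TOOLS[step["tool"]](**step["args"])   # Tool Manager executes
        step = llm_step(query, tool_result=result)     # result fed back to the LLM
    return step["text"]

print(run_agent("What is the weather in Tokyo right now?"))
```

The key structural point is the loop: the model may request several tools in sequence before it produces a final answer.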
Why Tools Change Everything
Without tools:

```
Agent = reasoning only
```

With tools:

```
Agent = reasoning + action
```

Example:
“What is the weather in Tokyo right now?”
A model alone can only guess.
With a tool:
```
Thought: I need real-time data
Action: weather_api(city="Tokyo")
```

Now the agent becomes grounded in reality.
Responsibilities of the Tool Manager
The Tool Manager controls the full lifecycle:
- Discovery — What tools exist?
- Selection — Which tool should be used?
- Validation — Are the arguments correct?
- Execution Control — Timeouts, retries, policies
- Observability — Logging and metrics
Tool Discovery & Registration
Agents can only use tools they are aware of. These tools are registered when creating the agent using structured metadata. This process is called tool registration.
Here is an example of a tool definition for a web search tool:
```json
{
  "name": "web_search",
  "description": "Search the web for current information",
  "parameters": { "query": "string" }
}
```

This metadata is stored with the agent and injected into the model’s context.
Runtime Registration
In Python:

```python
tools = {
    "web_search": web_search,
    "calculator": calculator,
    "read_file": read_file,
}
```

In Rust:

```rust
let mut tools = HashMap::new();
tools.insert("web_search".to_string(), web_search);
tools.insert("calculator".to_string(), calculator);
```

Tool Selection
The agent must decide which tool to use. There are multiple ways to make this decision:
- Rule-based selection: A rule-based system triggers specific tools based on patterns in the query. For example, if the query contains the word “weather,” it triggers the weather API. This approach is deterministic but can fail when queries are ambiguous or contain errors (e.g., “wether” instead of “weather”).
- LLM-driven selection: The model decides which tool to call based on the query and context. This is more flexible but can result in incorrect tool selection if the model misunderstands the query.
- Hybrid approach: Combines rule-based filtering with LLM-based selection. Rules narrow down the options, and the LLM selects from the remaining tools.
Most modern systems rely on LLM-driven selection:
```
Thought: Need current weather
Action: weather_api
Arguments: {"city": "Tokyo"}
```

The Tool Manager parses this and routes execution.
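The parse-and-route step can be sketched as a small dispatcher. This assumes the model's output has already been parsed into a dict; `weather_api` and the call format are hypothetical.

```python
def route_tool_call(tools, call):
    """Look up the requested tool in the registry and execute it."""
    name = call["tool"]
    if name not in tools:
        raise KeyError(f"Unknown tool: {name}")
    return tools[name](**call["arguments"])

# Hypothetical registered tool for illustration.
def weather_api(city):
    return {"city": city, "temp_c": 22}

tools = {"weather_api": weather_api}
call = {"tool": "weather_api", "arguments": {"city": "Tokyo"}}
print(route_tool_call(tools, call))  # {'city': 'Tokyo', 'temp_c': 22}
```

Rejecting unknown tool names explicitly, rather than failing deep inside execution, keeps errors attributable to the model's output.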
Selection Strategies
| Strategy | Use Case |
|---|---|
| LLM-driven | Flexible reasoning |
| Rule-based | Deterministic workflows |
| Hybrid | Complex systems |
⚠️ The Tool Explosion Problem
As systems scale, the number of tools grows:
```
3 tools   → easy
10 tools  → manageable
50+ tools → chaos
```

Problems:
- context bloat
- incorrect tool selection
- increased hallucination
Solution: Tool Retrieval
Before selection:
- Rank tools by relevance
- Provide only the top-k tools (3–5)
This significantly improves performance.
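A toy retrieval sketch follows. Production systems typically rank by embedding similarity between the query and each tool description; keyword overlap is used here only to keep the example self-contained. The tool names and descriptions are illustrative.

```python
def rank_tools(query, tool_descriptions, k=3):
    """Score each tool by word overlap between the query and its
    description, and return the top-k tool names. A real system would
    use embedding similarity instead of this keyword heuristic."""
    query_words = set(query.lower().split())
    scored = []
    for name, description in tool_descriptions.items():
        overlap = len(query_words & set(description.lower().split()))
        scored.append((overlap, name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]

tools = {
    "web_search": "search the web for current information",
    "weather_api": "get the current weather for a city",
    "calculator": "evaluate arithmetic expressions",
    "read_file": "read the contents of a file",
}
print(rank_tools("what is the current weather in Tokyo", tools, k=2))
# ['weather_api', 'web_search']
```

Only the top-k names and their schemas are then injected into the model's context, keeping the prompt small regardless of registry size.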
If this is not sufficient, you can re-architect the system:
- use multiple agents with smaller toolsets
- introduce a meta-agent that selects a sub-agent first
Many systems are moving toward modular agent architectures to address this.
However, there is a trade-off: excessive modularity can lead to over-engineering and increased complexity. The goal is to find the right balance between modularity and simplicity.
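The meta-agent idea can be sketched as a first-stage router that picks a sub-agent, each owning a small toolset, so only a handful of tools ever enter the model's context. The sub-agent names and keyword routing rule below are hypothetical; a real router would use an LLM or a trained classifier.

```python
# Hypothetical sub-agents, each owning a small toolset.
SUB_AGENTS = {
    "data_agent": ["sql_query", "read_file"],
    "web_agent": ["web_search", "weather_api"],
    "code_agent": ["python_repl", "shell"],
}

def meta_route(query):
    """Pick a sub-agent first; only its tools enter the model's context.
    Keyword matching stands in for an LLM- or classifier-based router."""
    q = query.lower()
    if any(w in q for w in ("weather", "news", "search")):
        return "web_agent"
    if any(w in q for w in ("table", "database", "file")):
        return "data_agent"
    return "code_agent"

agent = meta_route("What is the weather in Tokyo?")
print(agent, SUB_AGENTS[agent])  # web_agent ['web_search', 'weather_api']
```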
Schema Validation
LLMs are not perfect, and they can generate invalid tool calls.
Example:
The LLM requested the following tool call:
```json
{
  "tool": "weather_api",
  "arguments": { "temperature": "Tokyo" }
}
```

But the tool expects:

```json
{ "city": "Tokyo" }
```

To handle this, the Tool Manager should include a validation layer that checks tool calls before execution. This can be implemented using JSON Schema or custom validation logic.
Validation Layer
In Python (the original returned `True` on success and an error string on failure, but a non-empty string is truthy, so callers could not distinguish the two; returning an explicit pair fixes that):

```python
from jsonschema import validate, ValidationError

def validate_args(schema, args):
    """Return (True, None) if args match the schema, else (False, error)."""
    try:
        validate(instance=args, schema=schema)
        return True, None
    except ValidationError as e:
        return False, str(e)
```

In Rust:

```rust
use jsonschema::JSONSchema;

let compiled = JSONSchema::compile(&schema).unwrap();
let result = compiled.validate(&args);
```

Reliability: Real Systems Fail
In agentic systems, LLMs are not the only unreliable component—tools can fail as well.
Failures include:
- network errors
- timeouts
- rate limits
- partial responses
Software systems have always been imperfect, and agent systems are no exception. Therefore, we must design systems that handle failures gracefully.
Retry with Backoff
```python
import time

def retry(tool, args, retries=3):
    for i in range(retries):
        try:
            return tool(**args)
        except Exception:
            time.sleep(2 ** i)  # exponential backoff: 1s, 2s, 4s
    raise RuntimeError("Tool failed")
```

Timeouts
Every tool must have limits:
| Tool Type | Timeout |
|---|---|
| Web search | 5s |
| DB query | 10s |
| Code execution | 30s+ |
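One way to enforce budgets like these in Python is to run each tool call in a worker thread and give up on the result after its deadline. The budget values below mirror the table; the function names are illustrative. Note the limitation: Python cannot kill a thread, so the hung worker is abandoned, not stopped.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Illustrative per-tool budgets in seconds, matching the table above.
TIMEOUTS = {"web_search": 5, "db_query": 10, "code_execution": 30}

def call_with_timeout(name, tool, args):
    """Run a tool call in a worker thread and give up after its budget.
    The worker thread is abandoned, not killed, on timeout."""
    budget = TIMEOUTS.get(name, 10)  # default budget for unlisted tools
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(tool, **args)
    try:
        return future.result(timeout=budget)
    except FutureTimeout:
        raise TimeoutError(f"{name} exceeded {budget}s timeout")
    finally:
        pool.shutdown(wait=False)  # don't block on a hung worker
```

Because threads cannot be forcibly stopped, truly untrusted or long-running tools (code execution in particular) are better isolated in subprocesses or sandboxes, which can be killed.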
Advanced Patterns
- Fallback tools
- Circuit breakers
- Sandboxing
- Least privilege access
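Of these, a circuit breaker is the simplest to sketch: after N consecutive failures, the tool is taken out of rotation for a cooldown period so the agent stops hammering a broken dependency. The thresholds below are arbitrary defaults.

```python
import time

class CircuitBreaker:
    """Stop calling a failing tool for a cooldown period after
    `max_failures` consecutive errors."""

    def __init__(self, max_failures=3, cooldown=30.0):
        self.max_failures = max_failures
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, tool, **args):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown:
                raise RuntimeError("circuit open: tool temporarily disabled")
            self.opened_at = None  # cooldown elapsed, try again
            self.failures = 0
        try:
            result = tool(**args)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # any success resets the counter
        return result
```

A fallback tool pairs naturally with this: when the breaker is open, the Tool Manager can route the call to a secondary provider instead of failing outright.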
Observability
In traditional systems, you can debug using logs, metrics, and traces. In most cases, if you know the input that caused an issue, you can reproduce it.
However, LLMs are not deterministic. They may produce different outputs for the same input, making debugging more difficult.
Because of this, observability becomes critical.
You need visibility into:
- what the model was reasoning
- which tools were called
- how long they took
- whether they succeeded or failed
You should track:
- tool usage frequency
- latency
- error rates
- retries
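These metrics can be captured with a thin wrapper around every tool call. The field names below are illustrative, and an in-memory list stands in for a real metrics backend.

```python
import time

metrics = []  # in production: a metrics backend, not a list

def observed(name, tool, **args):
    """Execute a tool and record its latency, success, and error type."""
    start = time.monotonic()
    record = {"tool": name, "success": False, "error": None}
    try:
        result = tool(**args)
        record["success"] = True
        return result
    except Exception as e:
        record["error"] = type(e).__name__
        raise
    finally:
        record["duration_ms"] = round((time.monotonic() - start) * 1000)
        metrics.append(record)
```

Because the recording happens in `finally`, every call is logged whether it succeeds, fails, or raises, which is exactly what you need when hunting nondeterministic failures.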
Additionally, depending on your needs, you may log:
- raw tool inputs and outputs
- model reasoning traces (or summaries)
Example:
```json
{
  "tool": "web_search",
  "duration_ms": 420,
  "success": true
}
```

The Big Picture
The Tool Manager is not a helper. It is a critical system boundary.
It determines whether your agent is a demo or a production system.
→ Next: Execution Engine