Designing Reliable Tools
In modern agent systems, tools serve as the critical interface between LLM reasoning and real-world actions.
Common examples include:
- Web search and browsing APIs
- Database queries and mutations
- Filesystem or code execution environments
- External services (payments, email, calendars)
- MCP servers for standardized tool discovery
Unlike traditional software, where humans write precise code, agents generate tool calls probabilistically via large language models. This introduces unique challenges:
- Invalid or ambiguous parameters
- Repeated or out-of-order calls
- Sensitivity to flaky external services
Poorly designed tools amplify these issues, leading to unreliable agents. Reliable agent systems therefore demand carefully engineered tools that are predictable, safe, and forgiving.
The Three Core Principles of Reliable Tools
Production-grade agent systems consistently follow three foundational principles:
- Clear, rich schemas – for inputs and outputs
- Idempotent (and safe) actions
- Robust, structured error handling
These principles make tool interactions predictable even when the underlying model is stochastic.
Tool Schemas
A tool schema tells the language model exactly what a tool does and how to call it. Modern tool-calling systems (OpenAI, Anthropic, and others) expect schemas based on JSON Schema, including detailed descriptions, types, constraints, and enums.
Good schemas dramatically reduce hallucinated or malformed calls.
Example Tool Schema (Updated for 2026)
Here’s an improved schema for a weather tool that models actually perform well with:
```json
{
  "name": "get_weather",
  "description": "Get the current weather and conditions for a city. Use this tool when the user asks about temperature, weather, forecast, or climate in a specific location.",
  "parameters": {
    "type": "object",
    "properties": {
      "city": {
        "type": "string",
        "description": "The name of the city or location (e.g., 'Tokyo', 'New York, NY', or 'Mumbai, Maharashtra'). Be as specific as possible."
      },
      "unit": {
        "type": "string",
        "enum": ["celsius", "fahrenheit"],
        "description": "Preferred temperature unit. Defaults to celsius if not specified."
      },
      "include_forecast": {
        "type": "boolean",
        "description": "Whether to include a 3-day forecast (optional)."
      }
    },
    "required": ["city"],
    "additionalProperties": false
  }
}
```

Best practices for descriptions:
- Start with what the tool does.
- Explain when to use it.
- Include examples and constraints.
- Keep language clear and concise.
Why Schemas Matter
Without strong schemas, agents often produce calls like this (bad):
```json
{
  "location": "Tokyo",
  "temp": "celsius"
}
```

A proper schema guides the model toward correct usage and enables automatic validation.
Schema Validation
Always validate incoming tool calls before execution. This catches errors early and returns helpful feedback to the agent.
Typical flow:
```
Tool request
    ↓
JSON Schema validation
    ↓
(If valid) Execute tool
    ↓
(If invalid) Return clear, structured error
```

Example Validation
```python
from jsonschema import validate, ValidationError
from pydantic import BaseModel  # Recommended for stronger typing

class WeatherInput(BaseModel):
    city: str
    unit: str = "celsius"
    include_forecast: bool = False

# Or use raw jsonschema for framework-agnostic validation
schema = { ... }  # as shown above

try:
    validate(instance=tool_call_args, schema=schema)
except ValidationError as e:
    return {"status": "error", "message": f"Invalid parameters: {e.message}"}
```

The same check in Rust, using the `jsonschema` crate:

```rust
use jsonschema::JSONSchema;

let compiled = JSONSchema::compile(&schema_json).unwrap();
if let Err(errors) = compiled.validate(&instance_json) {
    // Return structured error to agent
}
```

Libraries like Pydantic (Python) or equivalent typed systems make this even more ergonomic while providing excellent error messages.
Structured Output Schemas
In addition to input schemas, define output schemas. This helps agents parse results reliably and enables better tool chaining.
Example successful response envelope:
```json
{
  "status": "success",
  "data": {
    "city": "Tokyo",
    "temperature": 18,
    "unit": "celsius",
    "condition": "cloudy",
    "humidity": 65,
    "forecast": [ ... ]
  },
  "metadata": {
    "timestamp": "2026-04-01T22:00:00Z",
    "source": "reliable-weather-api"
  }
}
```

Consistent envelopes (status, data, metadata) make reasoning loops far more stable.
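As a minimal sketch of this pattern, the envelope can be built with a small stdlib helper (the `Envelope` dataclass and `success` function here are illustrative names, not part of any framework; in practice a Pydantic model would add validation on top):

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Envelope:
    """Consistent response envelope: status, data, metadata."""
    status: str
    data: dict
    metadata: dict

def success(data: dict, source: str, timestamp: str) -> dict:
    # Wrap a tool result in the standard success envelope.
    return asdict(Envelope(
        status="success",
        data=data,
        metadata={"timestamp": timestamp, "source": source},
    ))

env = success(
    {"city": "Tokyo", "temperature": 18, "unit": "celsius"},
    source="reliable-weather-api",
    timestamp="2026-04-01T22:00:00Z",
)
print(json.dumps(env, indent=2))
```

Because every tool returns the same top-level shape, downstream parsing code (and the agent itself) only needs to branch on `status`.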
Idempotent Actions
An operation is idempotent if repeating it with the same inputs produces the same result without unintended side effects.
Read-only tools (like get_weather) are naturally idempotent. For tools that modify state, design them to be safe to retry.
Why Idempotency Matters for Agents
Agents frequently retry calls due to:
- Lost observations in long contexts
- Framework-level retries
- Reasoning loops that re-evaluate the same step
Non-idempotent actions (e.g., plain send_email(), charge_credit_card(), create_user()) can cause duplicate emails, double charges, or data corruption.
Designing Idempotent Tools
Preferred patterns in 2026:
- Read-only or deterministic outputs by default
- For mutations: use idempotency keys (often passed as `idempotency_key` or generated from the tool call ID)
- Check-then-act with atomic operations: `create_user_if_not_exists()`, `upsert_record()`
- Return the existing resource if it already exists (with a note like `"already_exists": true`)
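The check-then-act pattern can be sketched with an in-memory store (the `upsert_record` helper and its return shape are assumptions for illustration; a real implementation would use an atomic database upsert):

```python
def upsert_record(store: dict, record_id: str, fields: dict) -> dict:
    """Create the record if missing, otherwise update it in place; safe to retry."""
    existed = record_id in store
    store.setdefault(record_id, {}).update(fields)
    return {"status": "success", "id": record_id, "already_exists": existed}

db = {}
first = upsert_record(db, "u-1", {"name": "Ada"})
second = upsert_record(db, "u-1", {"name": "Ada"})  # a retry is harmless
```

The retried call changes nothing and reports `"already_exists": true`, which the agent can use to confirm the action succeeded without repeating it.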
Example schema addition for a mutation tool:
```json
"idempotency_key": {
  "type": "string",
  "description": "Unique client-generated key to ensure this operation is performed only once. Use the same key on retries."
}
```

This pattern, combined with server-side deduplication, prevents most retry-related issues.
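Server-side deduplication can be sketched as a small in-memory cache keyed by the idempotency key (the `IdempotentExecutor` class and the user-creation helper are illustrative; production systems would persist keys with a TTL):

```python
import threading

class IdempotentExecutor:
    """Caches results by idempotency key so retries return the original outcome."""

    def __init__(self):
        self._results = {}
        self._lock = threading.Lock()

    def execute(self, idempotency_key: str, operation) -> dict:
        with self._lock:
            if idempotency_key in self._results:
                # Retry with the same key: return the stored result, run nothing.
                return {**self._results[idempotency_key], "already_exists": True}
            result = operation()
            self._results[idempotency_key] = result
            return result

calls = []

def create_user():
    calls.append(1)  # side effect we want to happen exactly once
    return {"status": "success", "user_id": "u-123"}

executor = IdempotentExecutor()
first = executor.execute("key-1", create_user)
second = executor.execute("key-1", create_user)  # agent retries the same call
```

Even though the agent issued the call twice, the side effect ran once and both responses agree.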
Tool Error Handling
External tools and APIs are inherently unreliable (network issues, rate limits, downtime, permission changes). Agents must handle failures gracefully without breaking the reasoning loop.
Common Error Types
| Error Type | Example | Recommended Action |
|---|---|---|
| Transient | Network timeout, rate limit | Retry with backoff |
| Validation | Invalid parameter | No retry, clear message |
| Permanent/API | Service unavailable, auth | No retry, suggest alternative |
| Permission/Security | Unauthorized | Escalate or stop |
Structured Error Responses
Always return machine-readable errors:
```json
{
  "status": "error",
  "error_type": "rate_limit_exceeded",
  "message": "API rate limit exceeded. Try again in 60 seconds.",
  "retry_after": 60,
  "suggestion": "Consider using cached data or a different data source."
}
```

Structured errors allow the agent to reason intelligently (e.g., "I should wait and retry" or "switch to backup tool").
Retry Strategies & Resilience
Implement retries only for transient errors, using exponential backoff with jitter. Consider adding circuit breakers for persistently failing tools.
Example retry implementation (Python):
```python
import time
import random

def retry_with_backoff(tool, args, max_retries=3, base_delay=1):
    for attempt in range(max_retries):
        try:
            return tool(**args)
        except TransientError:  # e.g., network, rate limit
            if attempt == max_retries - 1:
                raise
            delay = (base_delay * (2 ** attempt)) + random.uniform(0, 1)
            time.sleep(delay)
        except PermanentError:
            raise  # Do not retry
```

Similar patterns apply in Rust and other languages.
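The circuit breaker mentioned above can be sketched as follows (a minimal in-memory version; the `CircuitBreaker` class name, thresholds, and half-open behavior are illustrative assumptions, not a specific library's API):

```python
import time

class CircuitBreaker:
    """Fails fast after repeated tool failures; recovers after a cooldown."""

    def __init__(self, failure_threshold=3, cooldown=30.0):
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, tool, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown:
                raise RuntimeError("circuit open: tool temporarily disabled")
            # Cooldown elapsed: half-open, allow one trial call.
            self.opened_at = None
            self.failures = 0
        try:
            result = tool(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()  # trip the breaker
            raise
        self.failures = 0  # a success resets the count
        return result

breaker = CircuitBreaker(failure_threshold=2, cooldown=60)

def flaky_tool(**kwargs):
    raise ConnectionError("simulated network timeout")

for _ in range(2):
    try:
        breaker.call(flaky_tool)
    except ConnectionError:
        pass

# The breaker is now open: further calls fail fast without hitting the tool.
try:
    breaker.call(flaky_tool)
    circuit_open = False
except RuntimeError:
    circuit_open = True
```

Failing fast keeps the agent's reasoning loop from burning tokens on a tool that is known to be down; the structured `RuntimeError` can be translated into an error envelope suggesting an alternative tool.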
Designing Tools for Agents vs. Humans
Traditional human-facing APIs often assume perfect inputs and clear intent. Agent-friendly tools need:
- Flexible yet strictly validated schemas
- Forgiving input parsing where safe
- Rich, natural-language descriptions
- Consistent structured outputs
- Built-in safety (idempotency, least privilege, rate limiting)
Best Practices for Agent Tools (2026)
| Principle | Description |
|---|---|
| Clear, rich schemas | Full JSON Schema with detailed descriptions, enums, and constraints |
| Structured outputs | Consistent success/error envelopes with metadata |
| Idempotency | Safe retries via keys or check-then-act patterns |
| Structured errors | Machine-readable with error_type and suggestions |
| Deterministic outputs | Same inputs → same outputs (when possible) |
| Observability | Logging, tracing, metrics for every call |
| Security & least privilege | Input sanitization, scoped auth, rate limits |
Following these practices significantly improves agent reliability, reduces token waste from failed loops, and makes debugging far easier.
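The observability row above can be sketched as a thin wrapper that logs and times every tool call (the `observed` helper and log format are assumptions for illustration; real systems would emit traces and metrics as well):

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("tools")

def observed(tool_name, tool):
    """Wrap a tool with logging and latency measurement for every call."""
    def wrapper(**kwargs):
        start = time.perf_counter()
        status = "error"
        try:
            result = tool(**kwargs)
            status = "success"
            return result
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            # Log outcome and latency; argument *names* only, to avoid leaking values.
            logger.info("tool=%s status=%s latency_ms=%.1f args=%s",
                        tool_name, status, elapsed_ms, sorted(kwargs))
    return wrapper

get_weather = observed("get_weather",
                       lambda **kw: {"status": "success", "city": kw["city"]})
out = get_weather(city="Tokyo")
```

Wrapping every tool at registration time gives uniform telemetry without touching individual tool implementations.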
Tools as the Backbone of Agent Systems
As agents grow more capable, the quality and standardization of the tool ecosystem often becomes the primary limiter of performance.
Modern architectures look like:
```
LLM Reasoning + Memory + Robust Tool Layer (often via MCP)
```

High-quality tools turn capable models into dependable systems.
Looking Ahead
In this article we covered how to design reliable tools using clear schemas, idempotent actions, structured responses, and resilient error handling.
Next, we’ll explore the Model Context Protocol (MCP) — the open standard (now governed by the Linux Foundation’s Agentic AI Foundation) that enables standardized tool discovery, secure connections, and seamless integration across different agents and platforms.
→ Continue to 4.3 — The Model Context Protocol (MCP)