
Designing Reliable Tools

In modern agent systems, tools serve as the critical interface between LLM reasoning and real-world actions.

Common examples include web search, database queries, sending email, and calls to external APIs such as a get_weather service.

Unlike traditional software, where humans write precise code, agents generate tool calls probabilistically via large language models. This introduces unique challenges: parameters may be hallucinated or malformed, calls may be retried unpredictably, and failures must be surfaced in a form the model can reason about.

Poorly designed tools amplify these issues, leading to unreliable agents. Reliable agent systems therefore demand carefully engineered tools that are predictable, safe, and forgiving.


The Three Core Principles of Reliable Tools

Production-grade agent systems consistently follow three foundational principles:

  1. Clear, rich schemas – for inputs and outputs
  2. Idempotent (and safe) actions
  3. Robust, structured error handling

These principles make tool interactions predictable even when the underlying model is stochastic.


Tool Schemas

A tool schema tells the language model exactly what a tool does and how to call it. Modern tool-calling systems (OpenAI, Anthropic, and others) expect schemas based on JSON Schema, including detailed descriptions, types, constraints, and enums.

Good schemas dramatically reduce hallucinated or malformed calls.


Example Tool Schema (Updated for 2026)

Here’s an improved schema for a weather tool that models actually perform well with:

{
  "name": "get_weather",
  "description": "Get the current weather and conditions for a city. Use this tool when the user asks about temperature, weather, forecast, or climate in a specific location.",
  "parameters": {
    "type": "object",
    "properties": {
      "city": {
        "type": "string",
        "description": "The name of the city or location (e.g., 'Tokyo', 'New York, NY', or 'Mumbai, Maharashtra'). Be as specific as possible."
      },
      "unit": {
        "type": "string",
        "enum": ["celsius", "fahrenheit"],
        "description": "Preferred temperature unit. Defaults to celsius if not specified."
      },
      "include_forecast": {
        "type": "boolean",
        "description": "Whether to include a 3-day forecast (optional)."
      }
    },
    "required": ["city"],
    "additionalProperties": false
  }
}

Best practices for descriptions:

  1. Say what the tool does and when the model should use it.
  2. Give concrete example values (as in the city description above).
  3. Document defaults and optional behavior explicitly.
  4. Constrain values with enums and types instead of free-form text.


Why Schemas Matter

Without strong schemas, agents often produce calls like this (bad):

{
  "location": "Tokyo",
  "temp": "celsius"
}

A proper schema guides the model toward correct usage and enables automatic validation.


Schema Validation

Always validate incoming tool calls before execution. This catches errors early and returns helpful feedback to the agent.

Typical flow:

Tool request
JSON Schema validation
(If valid) Execute tool
(If invalid) Return clear, structured error

Example Validation

from jsonschema import validate, ValidationError
from pydantic import BaseModel  # recommended for stronger typing

class WeatherInput(BaseModel):
    city: str
    unit: str = "celsius"
    include_forecast: bool = False

# Or use raw jsonschema for framework-agnostic validation
schema = { ... }  # as shown above

try:
    validate(instance=tool_call_args, schema=schema)
except ValidationError as e:
    return {"status": "error", "message": f"Invalid parameters: {e.message}"}

Libraries like Pydantic (Python) or equivalent typed systems make this even more ergonomic while providing excellent error messages.
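For illustration, here is a framework-agnostic sketch that checks a small subset of JSON Schema (required fields, basic types, enums, additionalProperties) using only the standard library. The function name validate_args is hypothetical; a real system should prefer jsonschema or Pydantic as above:

```python
def validate_args(args: dict, schema: dict) -> list[str]:
    """Minimal structural check against a JSON-Schema-like dict (subset only)."""
    errors = []
    props = schema.get("properties", {})
    for field in schema.get("required", []):
        if field not in args:
            errors.append(f"missing required field '{field}'")
    for key, value in args.items():
        if key not in props:
            if not schema.get("additionalProperties", True):
                errors.append(f"unexpected field '{key}'")
            continue
        spec = props[key]
        expected = {"string": str, "boolean": bool}.get(spec.get("type"))
        if expected and not isinstance(value, expected):
            errors.append(f"'{key}' should be {spec['type']}")
        if "enum" in spec and value not in spec["enum"]:
            errors.append(f"'{key}' must be one of {spec['enum']}")
    return errors

# Trimmed version of the weather schema shown earlier
weather_schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        "include_forecast": {"type": "boolean"},
    },
    "required": ["city"],
    "additionalProperties": False,
}

print(validate_args({"city": "Tokyo"}, weather_schema))  # []
print(validate_args({"location": "Tokyo", "temp": "c"}, weather_schema))
```

The second call reproduces the "bad" example from earlier: it reports the missing city plus the two unexpected fields, which is exactly the feedback an agent needs to self-correct.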


Structured Output Schemas

In addition to input schemas, define output schemas. This helps agents parse results reliably and enables better tool chaining.

Example successful response envelope:

{
  "status": "success",
  "data": {
    "city": "Tokyo",
    "temperature": 18,
    "unit": "celsius",
    "condition": "cloudy",
    "humidity": 65,
    "forecast": [ ... ] // optional
  },
  "metadata": {
    "timestamp": "2026-04-01T22:00:00Z",
    "source": "reliable-weather-api"
  }
}

Consistent envelopes (status, data, metadata) make reasoning loops far more stable.
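A small helper pair can keep every tool response in this envelope. This is a sketch; the function names are illustrative, not part of any framework:

```python
from datetime import datetime, timezone

def success_envelope(data: dict, source: str) -> dict:
    """Wrap tool output in the standard success envelope."""
    return {
        "status": "success",
        "data": data,
        "metadata": {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "source": source,
        },
    }

def error_envelope(error_type: str, message: str, **extra) -> dict:
    """Wrap a failure in the standard error envelope (extra fields like retry_after)."""
    return {"status": "error", "error_type": error_type, "message": message, **extra}

resp = success_envelope({"city": "Tokyo", "temperature": 18, "unit": "celsius"},
                        "reliable-weather-api")
print(resp["status"], resp["metadata"]["source"])
```

Routing every tool through helpers like these means the agent only ever has to branch on one field: status.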


Idempotent Actions

An operation is idempotent if repeating it with the same inputs produces the same result without unintended side effects.

Read-only tools (like get_weather) are naturally idempotent. For tools that modify state, design them to be safe to retry.


Why Idempotency Matters for Agents

Agents frequently retry calls due to network timeouts, ambiguous or truncated results, and reasoning loops that repeat a step after an apparent failure.

Non-idempotent actions (e.g., a plain send_email(), charge_credit_card(), or create_user()) can then cause duplicate emails, double charges, or data corruption.


Designing Idempotent Tools

Preferred patterns in 2026:

  1. Client-generated idempotency keys with server-side deduplication
  2. Check-then-act: verify current state before mutating it
  3. Upserts and other naturally idempotent operations instead of blind creates
  4. Read-only variants wherever a mutation is not strictly required

Example schema addition for a mutation tool:

"idempotency_key": {
  "type": "string",
  "description": "Unique client-generated key to ensure this operation is performed only once. Use the same key on retries."
}

This pattern, combined with server-side deduplication, prevents most retry-related issues.
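A minimal in-memory sketch of that server-side deduplication (the create_user signature and the dict-based store are hypothetical; production systems would use a persistent store with expiry):

```python
# idempotency_key -> first result; stand-in for a durable deduplication store
_results: dict[str, dict] = {}

def create_user(idempotency_key: str, email: str) -> dict:
    """Perform the mutation at most once per key; retries return the cached result."""
    if idempotency_key in _results:
        return _results[idempotency_key]
    # ...perform the real side effect here (illustrative)...
    result = {"status": "success", "user_id": f"user-{len(_results) + 1}", "email": email}
    _results[idempotency_key] = result
    return result

first = create_user("key-123", "a@example.com")
retry = create_user("key-123", "a@example.com")
assert retry == first  # the retried call is a no-op
```

The agent can now retry freely: the second call with the same key never re-executes the side effect.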


Tool Error Handling

External tools and APIs are inherently unreliable (network issues, rate limits, downtime, permission changes). Agents must handle failures gracefully without breaking the reasoning loop.


Common Error Types

Error Type          | Example                      | Recommended Action
Transient           | Network timeout, rate limit  | Retry with backoff
Validation          | Invalid parameter            | No retry; return a clear message
Permanent/API       | Service unavailable, auth    | No retry; suggest an alternative
Permission/Security | Unauthorized                 | Escalate or stop

Structured Error Responses

Always return machine-readable errors:

{
  "status": "error",
  "error_type": "rate_limit_exceeded",
  "message": "API rate limit exceeded. Try again in 60 seconds.",
  "retry_after": 60,
  "suggestion": "Consider using cached data or a different data source."
}

Structured errors allow the agent to reason intelligently (e.g., “I should wait and retry” or “switch to backup tool”).
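That agent-side reasoning can be sketched as a dispatch on error_type. The error_type values other than rate_limit_exceeded are hypothetical examples, not a fixed taxonomy:

```python
def handle_tool_error(resp: dict) -> str:
    """Decide the agent's next step from a structured error envelope."""
    etype = resp.get("error_type", "")
    if etype in ("rate_limit_exceeded", "timeout"):
        wait = resp.get("retry_after", 1)
        return f"retry_after_{wait}s"     # transient: wait, then retry
    if etype.startswith("invalid_"):
        return "fix_parameters"           # validation: correct the call, don't retry blindly
    return "use_fallback"                 # permanent/permission: switch tool or report

print(handle_tool_error({
    "status": "error",
    "error_type": "rate_limit_exceeded",
    "retry_after": 60,
}))  # retry_after_60s
```

Because the decision keys off machine-readable fields rather than free-text messages, it stays stable even when the human-facing message wording changes.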


Retry Strategies & Resilience

Implement retries only for transient errors, using exponential backoff with jitter. Consider adding circuit breakers for persistently failing tools.

Example retry implementation (Python):

import time
import random

def retry_with_backoff(tool, args, max_retries=3, base_delay=1):
    for attempt in range(max_retries):
        try:
            return tool(**args)
        except TransientError:  # e.g., network, rate limit
            if attempt == max_retries - 1:
                raise
            delay = (base_delay * (2 ** attempt)) + random.uniform(0, 1)
            time.sleep(delay)
        except PermanentError:
            raise  # do not retry

Similar patterns apply in Rust and other languages.


Designing Tools for Agents vs. Humans

Traditional human-facing APIs often assume perfect inputs and clear intent. Agent-friendly tools instead need tolerant, richly described inputs; consistent, machine-readable outputs and errors; and defaults that make retries and partial information safe.


Best Practices for Agent Tools (2026)

Principle                  | Description
Clear, rich schemas        | Full JSON Schema with detailed descriptions, enums, and constraints
Structured outputs         | Consistent success/error envelopes with metadata
Idempotency                | Safe retries via keys or check-then-act patterns
Structured errors          | Machine-readable with error_type and suggestions
Deterministic outputs      | Same inputs → same outputs (when possible)
Observability              | Logging, tracing, metrics for every call
Security & least privilege | Input sanitization, scoped auth, rate limits

Following these practices significantly improves agent reliability, reduces token waste from failed loops, and makes debugging far easier.
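One lightweight way to get the observability called for above is a logging decorator around every tool. This sketch uses only Python's stdlib logging; the observed name and the sample get_weather body are illustrative:

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("tools")

def observed(tool_fn):
    """Wrap a tool so every call is logged with its duration and outcome."""
    @functools.wraps(tool_fn)
    def wrapper(**kwargs):
        start = time.perf_counter()
        try:
            result = tool_fn(**kwargs)
            elapsed_ms = (time.perf_counter() - start) * 1000
            log.info("%s ok in %.1f ms", tool_fn.__name__, elapsed_ms)
            return result
        except Exception:
            log.exception("%s failed", tool_fn.__name__)
            raise
    return wrapper

@observed
def get_weather(city: str, unit: str = "celsius") -> dict:
    # Stub implementation standing in for a real API call
    return {"status": "success", "data": {"city": city, "unit": unit}}

print(get_weather(city="Tokyo")["data"])
```

A real deployment would emit traces and metrics as well, but even this much makes failed reasoning loops far easier to diagnose from logs alone.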


Tools as the Backbone of Agent Systems

As agents grow more capable, the quality and standardization of the tool ecosystem often becomes the primary limiter of performance.

Modern architectures look like:

LLM Reasoning + Memory
+
Robust Tool Layer (often via MCP)

High-quality tools turn capable models into dependable systems.


Looking Ahead

In this article we covered how to design reliable tools using clear schemas, idempotent actions, structured responses, and resilient error handling.

Next, we’ll explore the Model Context Protocol (MCP) — the open standard (now governed by the Linux Foundation’s Agentic AI Foundation) that enables standardized tool discovery, secure connections, and seamless integration across different agents and platforms.

→ Continue to 4.3 — The Model Context Protocol (MCP)