Designing Reliable Tools
In modern agent systems, tools serve as the critical interface between LLM reasoning and real-world actions.
Common examples include:
- Web search and browsing APIs
- Database queries and mutations
- Filesystem or code execution environments
- External services (payments, email, calendars)
- MCP servers for standardized tool discovery
Unlike traditional software, where humans write precise code, agents generate tool calls probabilistically via large language models. This introduces unique challenges:
- Invalid or ambiguous parameters
- Repeated or out-of-order calls
- Sensitivity to flaky external services
Poorly designed tools amplify these issues, leading to unreliable agents. Reliable agent systems therefore demand carefully engineered tools that are predictable, safe, and forgiving.
The Three Core Principles of Reliable Tools
Production-grade agent systems consistently follow three foundational principles:
- Clear, rich schemas – for inputs and outputs
- Idempotent (and safe) actions
- Robust, structured error handling
These principles make tool interactions predictable even when the underlying model is stochastic.
Tool Schemas
A tool schema tells the language model exactly what a tool does and how to call it. Modern tool-calling systems (OpenAI, Anthropic, and others) expect schemas based on JSON Schema, including detailed descriptions, types, constraints, and enums.
Good schemas dramatically reduce hallucinated or malformed calls.
Example Tool Schema (Updated for 2026)
Here’s an improved schema for a weather tool that models actually perform well with:
```json
{
  "name": "get_weather",
  "description": "Get the current weather and conditions for a city. Use this tool when the user asks about temperature, weather, forecast, or climate in a specific location.",
  "parameters": {
    "type": "object",
    "properties": {
      "city": {
        "type": "string",
        "description": "The name of the city or location (e.g., 'Tokyo', 'New York, NY', or 'Mumbai, Maharashtra'). Be as specific as possible."
      },
      "unit": {
        "type": "string",
        "enum": ["celsius", "fahrenheit"],
        "description": "Preferred temperature unit. Defaults to celsius if not specified."
      },
      "include_forecast": {
        "type": "boolean",
        "description": "Whether to include a 3-day forecast (optional)."
      }
    },
    "required": ["city"],
    "additionalProperties": false
  }
}
```

Best practices for descriptions:
- Start with what the tool does.
- Explain when to use it.
- Include examples and constraints.
- Keep language clear and concise.
Why Schemas Matter
Without strong schemas, agents often produce calls like this (bad):
```json
{
  "location": "Tokyo",
  "temp": "celsius"
}
```

A proper schema guides the model toward correct usage and enables automatic validation.
Schema Validation
Always validate incoming tool calls before execution. This catches errors early and returns helpful feedback to the agent.
Typical flow:
```
Tool request
    ↓
JSON Schema validation
    ↓
(If valid) Execute tool
    ↓
(If invalid) Return clear, structured error
```

Example Validation
```python
from jsonschema import validate, ValidationError
from pydantic import BaseModel  # Recommended for stronger typing

class WeatherInput(BaseModel):
    city: str
    unit: str = "celsius"
    include_forecast: bool = False

# Or use raw jsonschema for framework-agnostic validation
schema = { ... }  # as shown above

try:
    validate(instance=tool_call_args, schema=schema)
except ValidationError as e:
    return {"status": "error", "message": f"Invalid parameters: {e.message}"}
```

The same check in Rust, using the `jsonschema` crate:

```rust
use jsonschema::JSONSchema;

let compiled = JSONSchema::compile(&schema_json).unwrap();
if let Err(errors) = compiled.validate(&instance_json) {
    // Return structured error to agent
}
```

Libraries like Pydantic (Python) or equivalent typed systems make this even more ergonomic while providing excellent error messages.
Structured Output Schemas
In addition to input schemas, define output schemas. This helps agents parse results reliably and enables better tool chaining.
Example successful response envelope:
```json
{
  "status": "success",
  "data": {
    "city": "Tokyo",
    "temperature": 18,
    "unit": "celsius",
    "condition": "cloudy",
    "humidity": 65,
    "forecast": [ ... ]
  },
  "metadata": {
    "timestamp": "2026-04-01T22:00:00Z",
    "source": "reliable-weather-api"
  }
}
```

Consistent envelopes (status, data, metadata) make reasoning loops far more stable.
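As a minimal sketch of this pattern, the envelope can be built with a small stdlib helper (the `Envelope` dataclass and `success` function here are illustrative names, not part of any framework; in practice a Pydantic model would add validation on top):

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Envelope:
    """Consistent response envelope: status, data, metadata."""
    status: str
    data: dict
    metadata: dict

def success(data: dict, source: str, timestamp: str) -> dict:
    # Wrap a tool result in the standard success envelope.
    return asdict(Envelope(
        status="success",
        data=data,
        metadata={"timestamp": timestamp, "source": source},
    ))

env = success(
    {"city": "Tokyo", "temperature": 18, "unit": "celsius"},
    source="reliable-weather-api",
    timestamp="2026-04-01T22:00:00Z",
)
print(json.dumps(env, indent=2))
```

Because every tool returns the same top-level shape, downstream parsing code (and the agent itself) only needs to branch on `status`.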
Idempotent Actions
An operation is idempotent if repeating it with the same inputs produces the same result without unintended side effects.
Read-only tools (like get_weather) are naturally idempotent. For tools that modify state, design them to be safe to retry.
Why Idempotency Matters for Agents
Agents frequently retry calls due to:
- Lost observations in long contexts
- Framework-level retries
- Reasoning loops that re-evaluate the same step
Non-idempotent actions (e.g., plain send_email(), charge_credit_card(), create_user()) can cause duplicate emails, double charges, or data corruption.
Designing Idempotent Tools
Preferred patterns in 2026:
- Read-only or deterministic outputs by default
- For mutations: use idempotency keys (often passed as `idempotency_key` or generated from the tool call ID)
- Check-then-act with atomic operations: `create_user_if_not_exists()`, `upsert_record()`
- Return the existing resource if it already exists (with a note like `"already_exists": true`)
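The check-then-act pattern can be sketched with an in-memory store (the `upsert_record` helper and its return shape are assumptions for illustration; a real implementation would use an atomic database upsert):

```python
def upsert_record(store: dict, record_id: str, fields: dict) -> dict:
    """Create the record if missing, otherwise update it in place; safe to retry."""
    existed = record_id in store
    store.setdefault(record_id, {}).update(fields)
    return {"status": "success", "id": record_id, "already_exists": existed}

db = {}
first = upsert_record(db, "u-1", {"name": "Ada"})
second = upsert_record(db, "u-1", {"name": "Ada"})  # a retry is harmless
```

The retried call changes nothing and reports `"already_exists": true`, which the agent can use to confirm the action succeeded without repeating it.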
Example schema addition for a mutation tool:
```json
"idempotency_key": {
  "type": "string",
  "description": "Unique client-generated key to ensure this operation is performed only once. Use the same key on retries."
}
```

This pattern, combined with server-side deduplication, prevents most retry-related issues.
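Server-side deduplication can be sketched as a small in-memory cache keyed by the idempotency key (the `IdempotentExecutor` class and the user-creation helper are illustrative; production systems would persist keys with a TTL):

```python
import threading

class IdempotentExecutor:
    """Caches results by idempotency key so retries return the original outcome."""

    def __init__(self):
        self._results = {}
        self._lock = threading.Lock()

    def execute(self, idempotency_key: str, operation) -> dict:
        with self._lock:
            if idempotency_key in self._results:
                # Retry with the same key: return the stored result, run nothing.
                return {**self._results[idempotency_key], "already_exists": True}
            result = operation()
            self._results[idempotency_key] = result
            return result

calls = []

def create_user():
    calls.append(1)  # side effect we want to happen exactly once
    return {"status": "success", "user_id": "u-123"}

executor = IdempotentExecutor()
first = executor.execute("key-1", create_user)
second = executor.execute("key-1", create_user)  # agent retries the same call
```

Even though the agent issued the call twice, the side effect ran once and both responses agree.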
Tool Error Handling
External tools and APIs are inherently unreliable (network issues, rate limits, downtime, permission changes). Agents must handle failures gracefully without breaking the reasoning loop.
Common Error Types
| Error Type | Example | Recommended Action |
|---|---|---|
| Transient | Network timeout, rate limit | Retry with backoff |
| Validation | Invalid parameter | No retry, clear message |
| Permanent/API | Service unavailable, auth | No retry, suggest alternative |
| Permission/Security | Unauthorized | Escalate or stop |
Structured Error Responses
Always return machine-readable errors:
```json
{
  "status": "error",
  "error_type": "rate_limit_exceeded",
  "message": "API rate limit exceeded. Try again in 60 seconds.",
  "retry_after": 60,
  "suggestion": "Consider using cached data or a different data source."
}
```

Structured errors allow the agent to reason intelligently (e.g., "I should wait and retry" or "switch to backup tool").
Retry Strategies & Resilience
Implement retries only for transient errors, using exponential backoff with jitter. Consider adding circuit breakers for persistently failing tools.
Example retry implementation (Python):
```python
import time
import random

def retry_with_backoff(tool, args, max_retries=3, base_delay=1):
    for attempt in range(max_retries):
        try:
            return tool(**args)
        except TransientError:  # e.g., network, rate limit
            if attempt == max_retries - 1:
                raise
            delay = (base_delay * (2 ** attempt)) + random.uniform(0, 1)
            time.sleep(delay)
        except PermanentError:
            raise  # Do not retry
```

Similar patterns apply in Rust and other languages.
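The circuit breaker mentioned above can be sketched as follows (a minimal in-memory version; the `CircuitBreaker` class name, thresholds, and half-open behavior are illustrative assumptions, not a specific library's API):

```python
import time

class CircuitBreaker:
    """Fails fast after repeated tool failures; recovers after a cooldown."""

    def __init__(self, failure_threshold=3, cooldown=30.0):
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, tool, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown:
                raise RuntimeError("circuit open: tool temporarily disabled")
            # Cooldown elapsed: half-open, allow one trial call.
            self.opened_at = None
            self.failures = 0
        try:
            result = tool(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()  # trip the breaker
            raise
        self.failures = 0  # a success resets the count
        return result

breaker = CircuitBreaker(failure_threshold=2, cooldown=60)

def flaky_tool(**kwargs):
    raise ConnectionError("simulated network timeout")

for _ in range(2):
    try:
        breaker.call(flaky_tool)
    except ConnectionError:
        pass

# The breaker is now open: further calls fail fast without hitting the tool.
try:
    breaker.call(flaky_tool)
    circuit_open = False
except RuntimeError:
    circuit_open = True
```

Failing fast keeps the agent's reasoning loop from burning tokens on a tool that is known to be down; the structured `RuntimeError` can be translated into an error envelope suggesting an alternative tool.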
Designing Tools for Agents vs. Humans
Traditional human-facing APIs often assume perfect inputs and clear intent. Agent-friendly tools need:
- Flexible yet strictly validated schemas
- Forgiving input parsing where safe
- Rich, natural-language descriptions
- Consistent structured outputs
- Built-in safety (idempotency, least privilege, rate limiting)
Best Practices for Agent Tools (2026)
| Principle | Description |
|---|---|
| Clear, rich schemas | Full JSON Schema with detailed descriptions, enums, and constraints |
| Structured outputs | Consistent success/error envelopes with metadata |
| Idempotency | Safe retries via keys or check-then-act patterns |
| Structured errors | Machine-readable with error_type and suggestions |
| Deterministic outputs | Same inputs → same outputs (when possible) |
| Observability | Logging, tracing, metrics for every call |
| Security & least privilege | Input sanitization, scoped auth, rate limits |
Following these practices significantly improves agent reliability, reduces token waste from failed loops, and makes debugging far easier.
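The observability row above can be sketched as a thin wrapper that logs and times every tool call (the `observed` helper and log format are assumptions for illustration; real systems would emit traces and metrics as well):

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("tools")

def observed(tool_name, tool):
    """Wrap a tool with logging and latency measurement for every call."""
    def wrapper(**kwargs):
        start = time.perf_counter()
        status = "error"
        try:
            result = tool(**kwargs)
            status = "success"
            return result
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            # Log outcome and latency; argument *names* only, to avoid leaking values.
            logger.info("tool=%s status=%s latency_ms=%.1f args=%s",
                        tool_name, status, elapsed_ms, sorted(kwargs))
    return wrapper

get_weather = observed("get_weather",
                       lambda **kw: {"status": "success", "city": kw["city"]})
out = get_weather(city="Tokyo")
```

Wrapping every tool at registration time gives uniform telemetry without touching individual tool implementations.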
Tools as the Backbone of Agent Systems
As agents grow more capable, the quality and standardization of the tool ecosystem often becomes the primary limiter of performance.
Modern architectures look like:
```
LLM Reasoning + Memory + Robust Tool Layer (often via MCP)
```

High-quality tools turn capable models into dependable systems.
Looking Ahead
In this article we covered how to design reliable tools using clear schemas, idempotent actions, structured responses, and resilient error handling.
Next, we’ll explore the Model Context Protocol (MCP) — the open standard (now governed by the Linux Foundation’s Agentic AI Foundation) that enables standardized tool discovery, secure connections, and seamless integration across different agents and platforms.
→ Continue to 4.3 — The Model Context Protocol (MCP)