Setting Up Your Local Agentic AI Environment
In this tutorial series, every example is provided in both Python and Rust.
We strongly recommend running everything locally with Ollama.
Cloud APIs (even free tiers like Gemini) quickly hit rate limits when you experiment, debug, or run many iterations. Ollama gives you unlimited local inference, zero cost, and full privacy — ideal for learning agentic AI by trial and error.
Note: Local Ollama will be significantly slower than cloud APIs, especially on older hardware. If you need more speed, you can switch to a cloud API (like Gemini) at any time — just be mindful of rate limits and costs.
1. Programming Language Setup
Choose one path (you can switch later):
- Python → Fast prototyping and huge ecosystem.
- Rust → Production-grade performance and memory safety.
If you’re unsure, start with Python.
Official install links:
- Python → https://www.python.org/downloads/
- Rust → https://rustup.rs
2. Install Ollama (Local LLM Runtime)
Ollama lets you run powerful open-source models (like Qwen 3.5) directly on your laptop.
macOS (Recommended: Homebrew)
```bash
# Install Ollama via Homebrew (easiest way on macOS)
brew install ollama

# Start the Ollama service
brew services start ollama
```

Windows / Linux / Other macOS methods
Visit the official download page → https://ollama.com/download
Best multi-platform video (Mac + Windows + Linux) – “Install Ollama on Mac, Windows & Linux (Step-by-Step)” (Dec 2025)
Additional excellent videos:
- Installing Ollama is EASY Everywhere (macOS, Windows, Linux)
- How to Install Ollama on macOS (Apple Silicon) – great for M1–M5 Macs
- Ollama on Windows in 5 minutes
3. Pull the Model We Will Use
```bash
# Pull Qwen 3.5 (9B) — excellent balance of speed and reasoning
ollama pull qwen3.5:9b
```

This model runs well on most recent laptops (8GB+ RAM recommended).
4. Quick Test – Is Ollama Working?
Run the interactive chat:
```bash
ollama run qwen3.5:9b
```

Then type:

```
Explain how a ReAct agent works in one paragraph.
```

You should get a coherent answer within a few seconds (longer on older hardware).
5. (Optional) Cloud Alternative – Gemini
Only use this if you prefer a hosted model and accept occasional rate limits.
Get Gemini API Key →
SDK Installation
Python:

```bash
pip install google-genai python-dotenv
```

Rust (Cargo.toml):

```toml
[dependencies]
gemini-client-api = "7.4.5"
tokio = { version = "1", features = ["full"] }
dotenvy = "0.15"
```

We will not use cloud APIs in the core tutorials — everything runs locally with Ollama.
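Both python-dotenv and dotenvy do the same job: they read a `.env` file into the process environment so your API key never lives in source code. As an illustration only, here is a stdlib-only Python sketch of that loading step; `load_env_file` is a hypothetical helper, not part of either library:

```python
import os


def load_env_file(path: str = ".env") -> None:
    """Minimal stand-in for python-dotenv's load_dotenv():
    reads KEY=VALUE lines from a file into os.environ."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blank lines, comments, and lines without a '='
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # setdefault: a key already set in the real environment wins
            os.environ.setdefault(key.strip(), value.strip())


if __name__ == "__main__":
    if os.path.exists(".env"):
        load_env_file()
    print("GEMINI_API_KEY set:", "GEMINI_API_KEY" in os.environ)
```

In practice, just call `load_dotenv()` from python-dotenv (or `dotenvy::dotenv()` in Rust); the sketch only shows what happens under the hood.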
6. Verifying Your Setup (Ollama + Python/Rust)
Gemini-based setup code is provided in the sample-code GitHub repo, as well as in the first article of this series (What is an Agent?).
Python:

```python
from ollama import chat

response = chat(
    model="qwen3.5:9b",
    messages=[{"role": "user", "content": "Say hello from Ollama!"}],
)

print(response.message.content)
```

Rust:

```rust
use ollama_rs::{
    generation::chat::{request::ChatMessageRequest, ChatMessage},
    Ollama,
};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    println!("Connecting to Ollama...");

    let ollama = Ollama::default();

    let request = ChatMessageRequest::new(
        "qwen3.5:9b".to_string(),
        vec![ChatMessage::user(
            "Say hello from Ollama and tell me you're ready for agent development!".to_string(),
        )],
    );

    let response = ollama
        .send_chat_messages(request)
        .await
        .map_err(|e| format!("Failed to connect to Ollama. Is `ollama serve` running?\nError: {}", e))?;

    println!("\n✅ Ollama is working!\n");
    println!("Response:\n{}", response.message.content);

    Ok(())
}
```

The majority of the examples in this series use Ollama (both Rust and Python). You can switch to Gemini or another API at any time — just be mindful of rate limits and costs.
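Before running the verification code, it can save a debugging round-trip to confirm the Ollama server is actually listening. Here is a minimal sketch, assuming the default Ollama address `http://localhost:11434`; `ollama_is_up` is a hypothetical helper, not part of the ollama SDK:

```python
from urllib.request import urlopen
from urllib.error import URLError


def ollama_is_up(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an Ollama server answers at base_url.

    The Ollama root endpoint replies with a plain 200 response
    ("Ollama is running") when the server is healthy.
    """
    try:
        with urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, OSError):
        # Connection refused, DNS failure, or timeout: server not reachable
        return False


if __name__ == "__main__":
    if ollama_is_up():
        print("Ollama server is reachable")
    else:
        print("Cannot reach Ollama - try `ollama serve` or `brew services start ollama`")
```

A check like this at the top of a script turns a cryptic connection traceback into an actionable message, which matters once agents make many model calls in a loop.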
When to Use What
| Use Case | Recommended Setup | Reason |
|---|---|---|
| Learning / experimentation | Ollama (local) | Unlimited trials, no cost |
| Production / high-scale | OpenAI / Gemini | Higher speed & scale |
| Offline / privacy-first | Ollama | Runs 100% locally |
You’re now fully set up!
→ Next: What is an Agent?