Agentic RAG
Traditional Retrieval-Augmented Generation (RAG) dramatically improved LLM reliability by injecting external knowledge into the prompt. However, classic RAG performs retrieval only once, before reasoning begins.
```
User Query
  ↓
Embedding
  ↓
Vector / Hybrid Search
  ↓
Retrieve Documents
  ↓
LLM Generates Answer
```
This single-pass approach works well for simple queries but falls short on complex, multi-step, or ambiguous tasks.
Agentic RAG (also called adaptive or iterative retrieval) turns retrieval into a dynamic part of the agent’s reasoning loop. The agent actively controls when and how to search, critiques results, and decides whether more information is needed.
From Static RAG to Agentic RAG
| Aspect | Traditional RAG | Agentic RAG |
|---|---|---|
| Retrieval timing | One-time, before reasoning | Iterative, inside the reasoning loop |
| Query control | Fixed user query | Agent can rewrite, decompose, or refine |
| Evaluation | Blind use of retrieved docs | Self-critique and relevance scoring |
| Stopping condition | Always one pass | Agent decides when enough context exists |
This shift makes retrieval a procedural skill the agent can learn and improve over time.
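The contrast can be sketched in a few lines of Python. Everything here is a toy stand-in: `retrieve` is a keyword matcher, the sufficiency check and the query refinements are hard-coded where a real agent would call an LLM.

```python
def retrieve(query, corpus):
    """Toy keyword retriever: return docs sharing any term with the query."""
    terms = query.lower().split()
    return [d for d in corpus if any(t in d.lower() for t in terms)]

def static_rag(query, corpus):
    """Traditional RAG: a single retrieval pass before generation."""
    return retrieve(query, corpus)

def agentic_rag(query, corpus, refinements, max_iters=3):
    """Agentic RAG: retrieve, check coverage, refine the query, repeat."""
    context = []
    for i in range(max_iters):
        context.extend(d for d in retrieve(query, corpus) if d not in context)
        if len(context) >= 2:          # stand-in for an LLM sufficiency check
            break
        if i < len(refinements):
            query = refinements[i]     # stand-in for LLM query rewriting
    return context

corpus = [
    "H100 overview: Hopper architecture GPU",
    "Transformer Engine accelerates FP8 training on H100",
]
print(len(static_rag("Transformer Engine details", corpus)))   # 1 doc
print(len(agentic_rag("Transformer Engine details", corpus,
                      ["H100 Hopper architecture GPU"])))      # 2 docs
```

The static pass misses the overview document because the original query never mentions it; the agentic loop recovers it on the second, refined pass.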
Iterative Retrieval
In Agentic RAG, the agent interleaves reasoning and retrieval:
```
Thought: I need information about Nvidia H100 architecture.
Action: Search vector database
Observation: H100 includes a Transformer Engine.

Thought: I still need details on how the Transformer Engine accelerates training.
Action: Refine query → "Nvidia H100 Transformer Engine architecture 2026"
```
Each iteration builds richer, more targeted context.
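The query-refinement step in the trace above can be approximated with a tiny helper. This is a hypothetical stub: a real agent would have an LLM rewrite the query, but appending the terms the critique flagged as missing keeps the example runnable.

```python
def refine_query(query, missing_terms):
    """Toy refinement: append terms the critique flagged as missing,
    skipping any that already appear in the query."""
    extra = [t for t in missing_terms if t.lower() not in query.lower()]
    return " ".join([query] + extra)

print(refine_query("Nvidia H100 architecture", ["Transformer Engine"]))
# Nvidia H100 architecture Transformer Engine
```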
Self-Reflection and Retrieval Critique
Modern Agentic RAG systems include explicit self-evaluation steps (inspired by Self-RAG and later frameworks). The agent asks itself:
- Is the retrieved context relevant and sufficient?
- Are there gaps or contradictions?
- Should I retrieve more specific information?
Example critique:
```
Retrieved: General GPU overview article
Critique: Too broad. Missing specifics on Transformer Engine.
Decision: Generate refined query and retrieve again.
```
This critique step is guided by procedural memory (the agent's learned retrieval strategy) and dramatically improves answer quality.
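For testing the surrounding loop, the critique itself can be mocked with a deterministic rule. This is a stand-in, not the real mechanism: a production agent asks the LLM for this judgment, but a rule-based critic makes the control flow reproducible.

```python
def critique_context(docs, required_terms, min_docs=1):
    """Toy critic: context is insufficient if it is too small or
    misses key terms. A production agent would ask the LLM instead."""
    text = " ".join(docs).lower()
    missing = [t for t in required_terms if t.lower() not in text]
    sufficient = len(docs) >= min_docs and not missing
    return {"sufficient": sufficient, "missing": missing}

result = critique_context(["General GPU overview article"], ["Transformer Engine"])
print(result)  # {'sufficient': False, 'missing': ['Transformer Engine']}
```

The `missing` list feeds directly into the refinement step: it tells the agent what the next query should ask for.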
Example Agentic RAG Loop
```
User Question
  ↓
Initial Hybrid Search (vector + keyword + metadata)
  ↓
Retrieve & Rerank Documents
  ↓
Agent Critiques Relevance
  ↓
If insufficient → Refine query / Decompose / Retrieve again
  ↓
Synthesize Final Context
  ↓
Generate Answer
```
The loop repeats until the agent is confident or a maximum iteration limit is reached.
Example Implementation
```python
from langgraph.graph import StateGraph, END  # used when wiring this node into a graph

# `retriever` and `llm` are assumed to be defined elsewhere.
def agentic_rag(state):
    query = state["query"]
    context = state.get("context", [])

    # Step 1: Retrieve
    docs = retriever.hybrid_search(query)  # vector + BM25 + metadata
    context.extend(docs)
    state["context"] = context  # persist accumulated context across iterations

    # Step 2: Critique
    critique_prompt = f"Question: {query}\nContext: {context}\nIs this enough? Yes/No + reason."
    critique = llm.invoke(critique_prompt)

    if "yes" in critique.lower():
        return state  # enough information

    # Step 3: Refine and loop
    state["query"] = llm.invoke(f"Refine this query: {query}\nCritique: {critique}")
    return state
```
The graph runs the loop until the stopping condition is met. The same pattern, written as a self-contained loop in Rust:
```rust
async fn agentic_rag(
    mut query: String,
    retriever: &dyn Retriever,
    llm: &LLMClient,
    max_iterations: usize,
) -> String {
    let mut context = vec![];

    for _ in 0..max_iterations {
        let docs = retriever.hybrid_search(&query).await;
        context.extend(docs);

        let critique: String = llm
            .invoke(format!("Question: {}\nContext: {:?}\nEnough? Yes/No + reason", query, context))
            .await;

        if critique.to_lowercase().contains("yes") {
            break;
        }

        // Refine query
        query = llm
            .invoke(format!("Refine query. Critique: {}", critique))
            .await;
    }

    llm.invoke(format!("Answer using this context: {:?}", context)).await
}
```
These patterns integrate seamlessly with the procedural memory layer (the retrieval strategy) and semantic memory (the knowledge being retrieved).
Benefits of Agentic RAG
- Higher retrieval quality — dynamic query refinement and critique dramatically improve relevance.
- Better coverage of complex topics — multi-step research becomes possible.
- Reduced hallucinations — the agent verifies and expands context as needed.
- Adaptive behavior — the same agent can handle both simple and deeply technical queries effectively.
Challenges and Best Practices
Agentic RAG introduces new trade-offs:
- Increased latency and cost — multiple retrieval + LLM calls add overhead.
- Risk of retrieval loops — the agent may keep searching indefinitely without good stopping criteria.
- Evaluation difficulty — measuring whether the final context is “good enough” requires careful prompt design.
Best practices in 2026:
- Use hybrid search (vector + keyword + metadata) as the default retriever.
- Set hard limits on iterations and token budget.
- Combine with procedural memory to encode reliable stopping rules and critique templates.
- Add reflection steps that update episodic memory so the agent improves its retrieval strategy over time.
- Monitor retrieval effectiveness with metrics (relevance score, coverage, user feedback).
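The two hard limits above (iteration cap and token budget) can be enforced by a small driver around the loop. This is a minimal sketch under the assumption that each iteration reports whether it is done and how many tokens it spent; the names `run_loop` and `step` are illustrative, not from any library.

```python
def run_loop(step, max_iterations=4, token_budget=8000):
    """Drive an agentic-RAG step until it reports done or a hard
    budget trips. `step(i)` returns (done, tokens_spent)."""
    iterations = 0
    tokens_used = 0
    for i in range(max_iterations):
        done, spent = step(i)
        iterations += 1
        tokens_used += spent
        if done or tokens_used >= token_budget:
            break
    return iterations, tokens_used

# A step that never declares itself done still halts at the iteration cap:
print(run_loop(lambda i: (False, 1000)))  # (4, 4000)
```

Keeping the stopping rules in the driver, rather than trusting the LLM's self-assessment alone, is what prevents the retrieval-loop failure mode described above.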
Agentic RAG as Intelligent Knowledge Exploration
Agentic RAG transforms retrieval from a static preprocessing step into an active, agent-driven exploration process — exactly how a skilled human researcher would gather information.
It perfectly illustrates how semantic memory (the knowledge) and procedural memory (the retrieval strategy) work together inside the agent’s reasoning loop.
Looking Ahead
In this article we explored Agentic RAG, where agents actively control, critique, and iterate on retrieval to achieve higher-quality, more reliable answers.
In the next article we will examine Multi-Hop Retrieval, a powerful extension that allows agents to connect and synthesize information across multiple related documents and knowledge sources.
→ Continue to 5.6 — Multi-Hop Retrieval