
Agentic RAG

Traditional Retrieval-Augmented Generation (RAG) dramatically improved LLM reliability by injecting external knowledge into the prompt. However, classic RAG performs retrieval only once, before reasoning begins.

User Query → Embedding → Vector / Hybrid Search → Retrieve Documents → LLM Generates Answer

This single-pass approach works well for simple queries but falls short on complex, multi-step, or ambiguous tasks.
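The single-pass pipeline above can be sketched in a few lines. Everything here is a toy stand-in: `embed` is a bag-of-words in place of a real embedding model, `DOCS` is a two-document in-memory corpus, and `generate` fakes the LLM call.

```python
def embed(text: str) -> set[str]:
    # Stand-in embedding: a bag of lowercase words.
    return set(text.lower().split())

DOCS = [
    "The H100 GPU includes a Transformer Engine for mixed-precision training.",
    "Vector databases store embeddings for similarity search.",
]

def vector_search(query: str, k: int = 1) -> list[str]:
    # Rank documents by word overlap with the query (toy similarity score).
    q = embed(query)
    return sorted(DOCS, key=lambda d: -len(q & embed(d)))[:k]

def generate(query: str, context: list[str]) -> str:
    # Stand-in for the LLM call: echo the top retrieved document.
    return f"Answer to '{query}' based on: {context[0]}"

def single_pass_rag(query: str) -> str:
    docs = vector_search(query)   # retrieval happens exactly once...
    return generate(query, docs)  # ...then the answer is generated
```

Note that nothing in `single_pass_rag` can notice that the retrieved context is inadequate; that limitation is what Agentic RAG removes.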

Agentic RAG (also called adaptive or iterative retrieval) turns retrieval into a dynamic part of the agent’s reasoning loop. The agent actively controls when and how to search, critiques results, and decides whether more information is needed.


From Static RAG to Agentic RAG

Aspect             | Traditional RAG             | Agentic RAG
-------------------|-----------------------------|------------------------------------------
Retrieval timing   | One-time, before reasoning  | Iterative, inside the reasoning loop
Query control      | Fixed user query            | Agent can rewrite, decompose, or refine
Evaluation         | Blind use of retrieved docs | Self-critique and relevance scoring
Stopping condition | Always one pass             | Agent decides when enough context exists

This shift makes retrieval a procedural skill the agent can learn and improve over time.


Iterative Retrieval

In Agentic RAG, the agent interleaves reasoning and retrieval:

Thought: I need information about Nvidia H100 architecture.
Action: Search vector database
Observation: H100 includes a Transformer Engine.
Thought: I still need details on how the Transformer Engine accelerates training.
Action: Refine query → "Nvidia H100 Transformer Engine architecture 2026"

Each iteration builds richer, more targeted context.
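The interleaved reason/retrieve trace above can be sketched as a loop. `CORPUS`, `search`, and the hard-coded refinement rule are toy stand-ins; a real agent would use a vector store and an LLM-driven query rewrite.

```python
CORPUS = {
    "nvidia h100 architecture": "H100 includes a Transformer Engine.",
    "nvidia h100 transformer engine": "The Transformer Engine uses FP8 to accelerate training.",
}

def search(query: str) -> str:
    # Stand-in retriever: exact-match lookup against a tiny corpus.
    return CORPUS.get(query.lower(), "no results")

def iterative_retrieve(query: str, needed: str, max_steps: int = 3) -> list[str]:
    observations = []
    for _ in range(max_steps):
        obs = search(query)                  # Action: search the knowledge base
        observations.append(obs)             # Observation: record what came back
        if needed.lower() in obs.lower():    # Thought: do I have what I need?
            break
        # Thought: still missing details -> refine the query and try again
        query = query.replace("architecture", "transformer engine")
    return observations
```

Starting from "Nvidia H100 architecture" and looking for how training is accelerated, the loop retrieves the general document first, judges it insufficient, refines the query, and stops after the second, more specific retrieval.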


Self-Reflection and Retrieval Critique

Modern Agentic RAG systems include explicit self-evaluation steps (inspired by Self-RAG and later frameworks). The agent asks itself: Is each retrieved document relevant? Does the combined context fully answer the question? If not, should the query be refined or decomposed before retrieving again?

Example critique:

Retrieved: General GPU overview article
Critique: Too broad. Missing specifics on Transformer Engine.
Decision: Generate refined query and retrieve again.

This critique step is guided by procedural memory (the agent’s learned retrieval strategy) and dramatically improves answer quality.
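A toy version of the critique step scores retrieved text against the question and turns the score into a decision. A real system would ask an LLM for this judgment; keyword overlap stands in for it here, and the `threshold` value is an illustrative choice.

```python
def critique(question: str, retrieved: str, threshold: float = 0.5) -> dict:
    # Fraction of question terms that appear in the retrieved text.
    q_terms = set(question.lower().split())
    overlap = len(q_terms & set(retrieved.lower().split())) / len(q_terms)
    # Below the threshold, the agent refines the query instead of answering.
    return {"score": round(overlap, 2),
            "decision": "answer" if overlap >= threshold else "refine"}
```

Run against the example above, a general GPU overview scores near zero against a Transformer Engine question, so the decision is to refine and retrieve again.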


Example Agentic RAG Loop

User Question
Initial Hybrid Search (vector + keyword + metadata)
Retrieve & Rerank Documents
Agent Critiques Relevance
If insufficient → Refine query / Decompose / Retrieve again
Synthesize Final Context
Generate Answer

The loop repeats until the agent is confident or a maximum iteration limit is reached.
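When the critique finds the question is compound, one refinement option in the loop above is decomposition into sub-queries. A minimal sketch, with a naive split on "and" standing in for an LLM-driven decomposition:

```python
def decompose(query: str) -> list[str]:
    # Split a compound question into sub-queries that can each be
    # retrieved and critiqued independently.
    parts = [p.strip() for p in query.split(" and ")]
    return parts if len(parts) > 1 else [query]
```

Each sub-query then runs through its own retrieve-critique cycle before the agent synthesizes the final context.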


Example Implementation

from langgraph.graph import StateGraph, END

def agentic_rag(state):
    query = state["query"]
    context = state.get("context", [])

    # Step 1: Retrieve (vector + BM25 + metadata)
    docs = retriever.hybrid_search(query)
    context.extend(docs)

    # Step 2: Critique
    critique_prompt = (
        f"Question: {query}\nContext: {context}\n"
        "Is this enough? Yes/No + reason."
    )
    critique = llm.invoke(critique_prompt).content

    if "yes" in critique.lower():
        state["context"] = context
        return state  # enough information

    # Step 3: Refine and loop
    state["query"] = llm.invoke(
        f"Refine this query: {query}\nCritique: {critique}"
    ).content
    return state

# The graph runs this node in a loop (via a conditional edge) until the
# stopping condition or a maximum iteration limit is met.
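The control flow the graph implements can also be sketched as a plain loop, which is useful for testing the stopping logic without LangGraph. Here `search_fn`, `critique_fn`, and `refine_fn` are stand-ins for the retriever and the two LLM calls:

```python
def run_agentic_rag(query, search_fn, critique_fn, refine_fn, max_iters=3):
    state = {"query": query, "context": []}
    for _ in range(max_iters):
        # Retrieve with the current (possibly refined) query.
        state["context"].extend(search_fn(state["query"]))
        # Critique the accumulated context.
        verdict = critique_fn(state["query"], state["context"])
        if verdict == "enough":
            break
        # Refine the query using the critique, then loop.
        state["query"] = refine_fn(state["query"], verdict)
    return state
```

The hard `max_iters` cap mirrors the maximum iteration limit mentioned above: even if the critique never says "enough", the loop terminates and the agent answers with whatever context it has gathered.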

These patterns integrate seamlessly with the procedural memory layer (the retrieval strategy) and semantic memory (the knowledge being retrieved).


Benefits of Agentic RAG

Higher answer quality: context is refined until it actually covers the question
Handles complex, multi-step, and ambiguous queries that defeat single-pass RAG
Fewer blind answers: the agent detects missing or irrelevant context instead of using it uncritically
Retrieval becomes a procedural skill the agent can learn and improve


Challenges and Best Practices

Agentic RAG introduces new trade-offs: each extra retrieval round adds latency and token cost, a poorly designed critique prompt can loop without converging, and multi-step traces are harder to evaluate than a single retrieval call.

Best practices in 2026: cap the number of iterations, log every critique and refined query for debugging, rerank candidates before critiquing so the agent judges the best documents, and fall back to a single-pass answer when the loop hits its limit.


Agentic RAG as Intelligent Knowledge Exploration

Agentic RAG transforms retrieval from a static preprocessing step into an active, agent-driven exploration process — exactly how a skilled human researcher would gather information.

It perfectly illustrates how semantic memory (the knowledge) and procedural memory (the retrieval strategy) work together inside the agent’s reasoning loop.


Looking Ahead

In this article we explored Agentic RAG, where agents actively control, critique, and iterate on retrieval to achieve higher-quality, more reliable answers.

In the next article we will examine Multi-Hop Retrieval, a powerful extension that allows agents to connect and synthesize information across multiple related documents and knowledge sources.

→ Continue to 5.6 — Multi-Hop Retrieval