
Semantic Memory

Humans use different types of long-term memory:

Memory Type | Example
Episodic | “Yesterday I compared the H100 and RTX 4090 GPUs.”
Semantic | “The H100 GPU has 80GB HBM3 memory and excels at AI training workloads.”

Modern AI agents implement a similar distinction. While episodic memory captures the agent’s own history, semantic memory serves as the agent’s external knowledge base — storing documents, research papers, technical specifications, company data, and domain expertise.

Instead of relying solely on the LLM’s fixed parametric knowledge, agents store this information externally and retrieve it at runtime. This approach scales far beyond context windows and stays up-to-date.


Why Semantic Memory Is Essential

Large language models have several fundamental limitations:

  * Their parametric knowledge is frozen at training time and goes stale.
  * Their context windows cannot hold an entire knowledge base.
  * Without grounding in sources, they are prone to hallucinating facts.

Semantic memory addresses these gaps by providing a dynamic, updatable knowledge layer. The agent retrieves only the most relevant information needed for the current task and injects it into working memory.

This retrieval process is the foundation of Retrieval-Augmented Generation (RAG), which has evolved into Agentic RAG in 2026 — where the agent itself can iteratively refine queries, critique results, and decide when enough context has been gathered.


Knowledge Bases and Document Processing

A semantic memory system starts with a knowledge base — a collection of documents that can include:

  * Research papers and technical documentation
  * Product and hardware specifications
  * Internal company data
  * Curated domain expertise

Raw documents are typically chunked into smaller pieces, enriched with metadata (source, timestamp, user_id, category), and converted into vector embeddings before storage.


Embeddings: Capturing Semantic Meaning

An embedding is a high-dimensional numerical vector that represents the semantic meaning of text. Texts with similar meanings end up close together in vector space, even if their wording differs.

Example pipeline:

Document / Chunk
    ↓
Embedding Model (e.g. text-embedding-3-large, voyage-3, snowflake-arctic-embed)
    ↓
Dense Vector (e.g. 1024 or 3072 dimensions)

Modern embedding models are tuned to balance retrieval quality against storage and inference cost.
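As a toy illustration of how similarity works in embedding space, the sketch below hashes word counts into a fixed-size vector and compares texts with cosine similarity. A real pipeline would call a trained embedding model such as the ones named above; every function here is illustrative.

```python
import math
from collections import Counter

def toy_embed(text: str, dim: int = 64) -> list[float]:
    """Toy embedding: hash word counts into a fixed-size, unit-normalized
    vector. A real system would call a trained embedding model instead."""
    vec = [0.0] * dim
    for word, count in Counter(text.lower().split()).items():
        vec[hash(word) % dim] += count
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; for unit vectors this is just the dot product."""
    return sum(x * y for x, y in zip(a, b))

doc = toy_embed("The H100 GPU has 80GB HBM3 memory")
query = toy_embed("H100 GPU memory capacity")
unrelated = toy_embed("Paris is the capital of France")

# The query about H100 memory scores closer to the H100 document
# than an unrelated sentence does.
```

Even this crude hashing scheme shows the key property retrieval relies on: related texts land closer together in vector space than unrelated ones.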


Embeddings are stored in vector databases, which support fast similarity search. In 2026, pure vector search is rarely used alone. Production systems use hybrid search that combines:

  * Dense vector similarity search
  * Sparse keyword search (e.g. BM25)
  * Metadata filtering
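One common way to fuse the ranked lists produced by vector and keyword search is Reciprocal Rank Fusion (RRF). The sketch below is a minimal, self-contained version; the document IDs are made up for illustration.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked result lists (e.g. from vector search and
    keyword search) into one, scoring each doc as sum of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc_h100", "doc_4090", "doc_tpu"]       # dense similarity order
keyword_hits = ["doc_h100", "doc_pricing", "doc_4090"]  # BM25 order
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

RRF needs no score calibration between the two retrievers, which is why it is a popular default for hybrid search.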

Popular vector databases include:

Database | Strengths
Qdrant | Excellent hybrid search and filtering
Weaviate | Strong graph + vector capabilities
Pinecone | Fully managed, serverless scale
pgvector | Postgres-based, easy integration
Chroma | Lightweight and developer-friendly

Storing Knowledge in Semantic Memory

Here’s how documents are typically added to semantic memory:

from mem0 import Memory

memory = Memory()
memory.add(
    "The Nvidia H100 GPU features 80GB HBM3 memory and delivers up to 2.8x better training performance than the RTX 4090 for large language models.",
    user_id="ravi_001",
    metadata={
        "type": "semantic",
        "source": "nvidia_h100_architecture_2026",
        "category": "hardware",
        "timestamp": "2026-03-15",
    },
)

Good semantic memory systems also support automatic chunking, metadata extraction, and periodic re-embedding when documents are updated.
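As a minimal sketch of the chunking step, the helper below splits text into fixed-size character windows with overlap so that no fact is cut cleanly in half at a boundary. Production systems usually split on sentence or section boundaries instead; the function name and parameters are illustrative.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows.
    Each chunk shares its first `overlap` characters with the tail of
    the previous chunk, so context is preserved across boundaries."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

doc = "word " * 100  # 500-character stand-in for a real document
chunks = chunk_text(doc, chunk_size=200, overlap=50)
```

Each chunk would then be embedded and stored alongside its metadata, exactly as in the mem0 example above.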


Retrieval Process

When the agent needs information:

  1. The query is converted into an embedding.
  2. Hybrid search is performed (vector + keyword + metadata filters).
  3. Results are reranked for relevance.
  4. Top-k documents (or chunks) are returned and inserted into the agent’s working memory.

This retrieved context grounds the LLM’s reasoning and significantly reduces hallucinations.
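The four steps above can be sketched end to end as a toy pipeline. Here simple keyword overlap stands in for embedding similarity, and the documents, categories, and function names are all made up for illustration; a real system would call an embedding model and a vector database.

```python
def retrieve(query, store, top_k=2, category=None):
    """Toy retrieval pipeline: apply a metadata filter, score each chunk,
    and return the text of the top-k results."""
    q_words = set(query.lower().split())
    # Step 2b: metadata filtering.
    candidates = [d for d in store
                  if category is None or d["metadata"]["category"] == category]
    # Steps 1-3 collapsed: keyword overlap stands in for vector scoring + rerank.
    ranked = sorted(candidates,
                    key=lambda d: len(q_words & set(d["text"].lower().split())),
                    reverse=True)
    # Step 4: top-k chunks for working memory.
    return [d["text"] for d in ranked[:top_k]]

store = [
    {"text": "The H100 GPU has 80GB HBM3 memory",
     "metadata": {"category": "hardware"}},
    {"text": "Quarterly revenue grew 12 percent",
     "metadata": {"category": "finance"}},
    {"text": "The RTX 4090 GPU has 24GB GDDR6X memory",
     "metadata": {"category": "hardware"}},
]
context = retrieve("how much memory does the H100 GPU have", store,
                   top_k=1, category="hardware")
```

The returned chunks are what gets injected into the prompt, giving the LLM verifiable sources to reason over.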


Retrieval-Augmented Generation (RAG) and Agentic RAG

Basic RAG follows a simple retrieve-then-generate pattern. In 2026, most advanced agents use Agentic RAG, where the agent can:

  * Iteratively rewrite and refine its queries
  * Critique retrieved results for relevance and coverage
  * Decide when enough context has been gathered to answer

This makes semantic memory far more powerful and adaptive.
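The Agentic RAG control loop can be sketched as below. This is illustrative, not a library API: `search` and `critique` are toy stand-ins for the agent's real retriever and an LLM self-critique call.

```python
def agentic_rag(question, search, critique, max_rounds=3):
    """Sketch of an Agentic RAG loop: retrieve, critique the evidence,
    refine the query, and stop once the critic judges the gathered
    context sufficient (or the round budget runs out)."""
    query, context = question, []
    for _ in range(max_rounds):
        context += search(query)
        verdict = critique(question, context)  # stand-in for an LLM call
        if verdict["sufficient"]:
            break
        query = verdict["refined_query"]
    return context

# Toy stand-ins: a real agent would call its retriever and an LLM here.
def search(query):
    return [f"result for: {query}"]

def critique(question, context):
    return {"sufficient": len(context) >= 2,
            "refined_query": question + " specs"}

context = agentic_rag("H100 memory", search, critique)
```

The round budget (`max_rounds`) is what keeps the loop from retrieving forever when the critic is never satisfied.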


Advantages of Semantic Memory

  * Scales far beyond the LLM’s fixed context window
  * Stays up-to-date as documents are added or revised
  * Grounds answers in retrieved sources, reducing hallucinations
  * Injects only the information relevant to the current task

Challenges and Best Practices

Semantic memory systems face several practical challenges:

  * Choosing a chunking strategy that preserves enough context
  * Keeping embeddings fresh as source documents change
  * Filtering out stale or irrelevant results at retrieval time

Best practices in 2026:

  * Combine vector, keyword, and metadata search rather than relying on a single signal
  * Rerank candidates before injecting them into working memory
  * Attach rich metadata (source, timestamp, category) to every chunk
  * Re-embed documents periodically as they are updated


Semantic Memory in the Broader Architecture

Semantic memory works best when integrated with the full memory hierarchy:

  * Working memory: the active context for the current task
  * Episodic memory: the agent’s own interaction history
  * Semantic memory: external factual knowledge
  * Procedural memory: reusable skills and workflows

Together with reliable tools (via MCP) and procedural memory (covered next), these layers enable agents to reason with both what they know and what they have experienced.


Looking Ahead

In this article we explored semantic memory — how agents store and retrieve factual knowledge using embeddings, vector/hybrid databases, and modern RAG techniques.

In the next article we will examine Procedural Memory, which allows agents to store, reuse, and improve workflows, strategies, and operational “how-to” knowledge.

→ Continue to 5.4 — Procedural Memory