
Landscape of RAG Solutions for LLM Applications

Last Updated: 2025-08-24
Version: 2.0.0 (AGI-memory essentials)

Why this doc exists

A concise, opinionated guide to building production AGI memory. No vendor catalog, no encyclopedia entries; only what matters to ship: stable primitives for long-term memory, retrieval quality, feedback learning, and observability.


AGI-memory essentials (August 2025)

  • Memory persistence primitives: per-agent, per-identity stores with TTL/decay and reinforcement.
  • Retrieval stack: hybrid vector + sparse + graph hops; cheap first, heavy second.
  • Knowledge topology: local subgraph-on-demand > global monolith graphs.
  • Temporal awareness: recency, episodic windows, and time-weighted scoring.
  • Feedback learning: write-backs from conversations and tasks; guard for privacy and secure fields.
  • Observability and eval: end-to-end traces, attribution, quality metrics, cost/latency budgets.
  • Large context bridge: combine 200K–1M-token windows (Claude Sonnet 4) with targeted retrieval for attribution and cost control.
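
The "cheap first, heavy second" principle above can be sketched as a two-stage retriever: a cheap lexical pass narrows the candidate set, then a heavier vector pass reranks only the survivors. Everything here (the scoring functions, any corpus you feed it) is an illustrative stand-in, not any engine's API:

```python
import math

def lexical_score(query: str, doc: str) -> float:
    """Cheap stage: fraction of query terms that appear in the document."""
    terms = set(query.lower().split())
    words = set(doc.lower().split())
    return len(terms & words) / max(len(terms), 1)

def cosine(a: list[float], b: list[float]) -> float:
    """Heavy stage: cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def two_stage_retrieve(query, query_vec, corpus, embeddings,
                       prefilter_k=10, final_k=3):
    """Cheap lexical prefilter, then vector rerank on the survivors only."""
    ranked = sorted(corpus, key=lambda d: lexical_score(query, d), reverse=True)
    candidates = ranked[:prefilter_k]
    reranked = sorted(candidates,
                      key=lambda d: cosine(query_vec, embeddings[d]),
                      reverse=True)
    return reranked[:final_k]
```

In production the cheap stage is typically BM25 or a payload filter and the heavy stage a dense rerank or cross-encoder; the shape of the pipeline, not these toy scorers, is the point.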

What to use now (short picks)

Orchestration / Memory framework

  • LangGraph (>=0.6.1): Context API replaces config patterns; type-safe Runtime[Context]; persistent memory, HITL, resumable graphs.
    Use when you need multi-step tools + memory. Mind the complexity and external state store.
  • LlamaIndex Agents (>=0.13.0): the new AgentWorkflow system replaces legacy agents (a breaking migration); PropertyGraphIndex, solid doc parsing.
    Use when you need to ship fast with strong ingestion/evaluation. The graph part is basic.
  • Microsoft GraphRAG (Production 2025): graph-first RAG with community detection; Azure Discovery platform.
    Use for multi-hop reasoning and domain graphs; cost heavy—prefer subgraph-on-demand.
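
The subgraph-on-demand approach recommended above can be sketched without any graph engine: seed on entities mentioned in the query, expand a bounded number of hops over a global adjacency map, and keep only that disposable local slice. The entity matching below is deliberately naive and the graph shape is a hypothetical example:

```python
def local_subgraph(query: str, graph: dict[str, set[str]],
                   hops: int = 2) -> dict[str, set[str]]:
    """Expand a bounded neighborhood around entities mentioned in the query.

    `graph` is the global adjacency map; the return value is the small,
    query-scoped slice meant to be discarded (or expired) after use.
    """
    # Naive entity linking: a node is a seed if its name occurs in the query.
    seeds = {node for node in graph if node.lower() in query.lower()}
    frontier, seen = set(seeds), set(seeds)
    for _ in range(hops):
        frontier = {nbr for node in frontier
                    for nbr in graph.get(node, set())} - seen
        seen |= frontier
    # Keep only edges whose both endpoints landed inside the local slice.
    return {node: graph.get(node, set()) & seen for node in seen}
```

Bounding the hop count is what keeps this cheap relative to whole-corpus graph construction; real pipelines would swap the substring match for proper entity extraction.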

Vector/Graph storage

  • Qdrant: self-hosted, strong payload filters, HNSW + quantization; 4x RPS gains in latest benchmarks; great default for agent memory.
  • Weaviate: GraphQL, hybrid search, multi-tenancy; MUVERA multi-vector embeddings; good for cloud/self-host projects needing flexibility.
  • Pinecone: managed, sparse+dense hybrid, serverless 2.0 with auto-config; best when you want zero-ops and predictable SLOs.
  • Milvus: extreme scale and GPU acceleration; CAGRA index for 10x batch performance; use when your recall/latency budget requires it.
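
Payload filtering combined with vector scoring (the combination called out for Qdrant above) reduces to filter-then-score over an in-memory store. The record schema here is an assumption for illustration, not any engine's API:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def search(records: list[dict], query_vec: list[float],
           payload_filter: dict, k: int = 3) -> list[dict]:
    """Filter on payload first (cheap), then score only the survivors."""
    hits = [r for r in records
            if all(r["payload"].get(key) == val
                   for key, val in payload_filter.items())]
    hits.sort(key=lambda r: cosine(query_vec, r["vector"]), reverse=True)
    return hits[:k]
```

For agent memory the payload filter is typically `agent_id` / `identity_id`, which is what makes per-agent, per-identity stores cheap to carve out of one collection.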

Models & embeddings

  • Text embeddings: Voyage AI voyage-3-large (top retrieval quality) or OpenAI text-embedding-3-large/3-small (5x cost reduction via Matryoshka).
  • Re-ranking: Cohere Rerank or Voyage for improved MRR on small k.
  • LLM context windows: Claude Sonnet 4 (1M tokens) removes the need for retrieval in many use cases; still keep retrieval for attribution, freshness, and cost control.
  • Multi-modal: voyage-multimodal-3, ColPali for documents; only when it moves product KPIs; otherwise keep text-first for memory.
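
The Matryoshka cost reduction mentioned for text-embedding-3 boils down to keeping a prefix of the embedding and renormalizing to unit length. A minimal sketch (the vector below is made up; real Matryoshka models are trained so that prefixes stay meaningful):

```python
import math

def truncate_embedding(vec: list[float], dim: int) -> list[float]:
    """Keep the first `dim` components and renormalize to unit length."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm else head
```

Storing the truncated vectors shrinks index size and search cost roughly in proportion to the dimension cut, which is where the quoted cost reduction comes from.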

Production patterns that work

  1. Memory–retrieval fusion
  • Write explicit memory events (facts, skills, preferences, tasks) with typed schemas.
  • Retrieve from memory and documents jointly; merge by recency×relevance×confidence.
  • Keep a small "working set" cache per session; refill from stores on demand.
  2. Subgraph-on-demand
  • Build local knowledge subgraphs per query/task via entity/link extraction; expire quickly.
  • Use GraphRAG patterns for multi-hop reasoning; avoid whole-corpus graph construction except for narrow domains.
  3. Large context + targeted retrieval
  • Use 1M token context (Claude Sonnet 4) for document analysis; targeted retrieval for attribution and real-time updates.
  • Balance cost: large context for reasoning, retrieval for facts and freshness.
  4. Temporal scoring and decay
  • Score = alpha·semantic + beta·recency + gamma·reinforcement; apply soft decay per identity.
  • Promote items on explicit confirmations or repeated usage; demote noisy snippets.
  5. Safety and privacy guards
  • Never log raw sensitive text; mask secure fields.
  • Maintain allow/deny lists for write-backs; require consent for cross-identity joins.
  6. Observability and evaluation
  • LangSmith v2 / LlamaTrace with request_id, agent_id, user_id, dataset_id; log top-k, scores, write-backs.
  • Track latency budgets: p95 retrieval < 50 ms, total thought loop < 1–2 s.
  • Evaluate with retrieval precision/recall, answer faithfulness, and memory hit quality.
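
The scoring rule from the temporal pattern is direct to implement; the weights and half-life below are placeholders to tune per identity, not recommendations:

```python
import math

def memory_score(semantic: float, age_hours: float, reinforcement: float,
                 alpha: float = 0.6, beta: float = 0.3, gamma: float = 0.1,
                 half_life_hours: float = 72.0) -> float:
    """Score = alpha*semantic + beta*recency + gamma*reinforcement.

    Recency decays exponentially: an item loses half its recency weight
    every `half_life_hours`, giving the soft decay per identity.
    """
    recency = 0.5 ** (age_hours / half_life_hours)
    return alpha * semantic + beta * recency + gamma * reinforcement
```

Promotion on confirmation then becomes a bump to `reinforcement`, and demotion of noisy snippets a decrement, with no changes to the stored vectors.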

Minimal checklist (map to picks)

  • Persistent memory per agent/identity → LangGraph Context API store | LlamaIndex AgentWorkflow memory | Qdrant/Weaviate.
  • Hybrid retrieval (dense+sparse) → Pinecone sparse-dense | Weaviate hybrid | BM25+vector via your search.
  • Graph hops when needed → LlamaIndex PropertyGraphIndex (basic) | GraphRAG subgraph-on-demand | Microsoft Discovery platform.
  • Large context strategy → Claude Sonnet 4 (1M tokens) for analysis + targeted retrieval for attribution/freshness.
  • Best embeddings → Voyage AI voyage-3-large for quality or OpenAI text-embedding-3 for cost optimization.
  • Temporal + reinforcement → implement in app layer with simple weights and TTLs.
  • Observability → LangSmith v2 / LlamaTrace; structured logs with attribution.
  • Cost/latency control → small k + rerank, cache working set, favor subgraphs over global graphs.
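
For the hybrid retrieval line in the checklist, reciprocal rank fusion (RRF) is a common way to merge dense and sparse result lists using only their ranks; `k = 60` is the conventional default constant:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked ID lists: each doc scores sum of 1/(k + rank) per list."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Because RRF ignores raw scores, it sidesteps the problem of calibrating dense similarities against sparse BM25 scores, which is why it pairs well with the small-k-plus-rerank budget above.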

What we intentionally ignore (for core AGI memory)

  • Prototyping DBs (e.g., Chroma) for production memory stores.
  • All-in-one stacks with low adoption (txtai, Marqo, Jina) unless a concrete requirement demands them.
  • Heavy enterprise search engines (Vespa) unless you operate at that scale already.
  • Giant comparison matrices: the signal gets lost; we keep a living checklist instead.

Our bets for Mnemoverse

  • Graph–Vector Hybrid Memory: fuse vector recall with on-demand local graphs for reasoning.
  • Memory-Aware Retrieval Engine: retrieval priorities adapt from agent’s accumulated experience.
  • Hyperbolic Embeddings (R&D): hierarchical spaces for topics/skills; pilot behind a flag.
  • GPU-Native Indexing: accelerate ingestion, re-index, and online updates.
  • Spatial UI (3D): practical, collaborative navigation for memory curation and debugging.


See also:

  • Memory Solutions Landscape (complement)
  • Vector–Graph Experience RAG pattern
  • Spatial Memory Design Language
  • Core Mathematical Theory (spatial retrieval)