Graph-RAG Memory: Agent Experience Storage
TL;DR: Store what your agent learns — successful prompts, context that worked, mistakes to avoid. Retrieve relevant experience to make future tasks faster and more reliable.
Why Agent Memory Matters
Every time your agent tackles a task, it generates valuable experience:
- Prompts that worked for specific contexts
- Tool combinations that solved similar problems
- Mistakes and dead ends to avoid repeating
- Context patterns that led to success
Without memory, agents start from zero every time. With Graph-RAG memory, they get smarter with experience.
What We Store (Agent Experience Types)
1. Episodes: "What Happened"
Store compact records of agent sessions:
Task: "Analyze quarterly sales data"
Context: "CSV with 10K rows, need trends and outliers"
Approach: ["Load pandas", "Plot trends", "Statistical analysis"]
Outcome: "Success - found 3 key insights"
Tools Used: ["python", "matplotlib", "pandas"]
Duration: "2 minutes"
2. Prompts: "What We Said to the LLM"
Save effective prompting strategies:
Situation: "Code review request"
Effective Prompt: "Review this code for: 1) bugs, 2) performance, 3) readability.
Prioritize critical issues. Use bullet points."
Context Needed: ["file_type", "language", "complexity"]
Success Rate: 85%
3. Mistakes: "What Didn't Work"
Learn from failures:
Failed Approach: "Asked LLM to write entire 500-line script in one shot"
Problem: "Generated buggy, untestable code"
Better Approach: "Break into functions, implement one at a time"
Context: "Large coding tasks"
4. Skills: "Reusable Patterns"
Extract repeatable strategies:
Skill: "Data Analysis Workflow"
Steps: ["Validate data", "Explore distributions", "Test hypotheses", "Visualize results"]
When To Use: "Numerical datasets > 1K rows"
Success Rate: 92%
Prerequisites: ["pandas available", "clean data format"]
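If you want these as concrete types, a minimal Python sketch could look like the following (field names mirror the examples above; none of this is a required schema):

from dataclasses import dataclass, field
from typing import List

@dataclass
class Episode:
    task: str                      # "Analyze quarterly sales data"
    context: str                   # "CSV with 10K rows, need trends and outliers"
    approach: List[str]            # ordered steps the agent took
    outcome: str                   # "Success - found 3 key insights"
    tools_used: List[str] = field(default_factory=list)
    duration_s: float = 0.0

@dataclass
class PromptRecord:
    situation: str                 # "Code review request"
    prompt: str                    # the prompt text that worked
    context_needed: List[str] = field(default_factory=list)
    success_rate: float = 0.0      # fraction of uses judged successful

@dataclass
class Mistake:
    failed_approach: str
    problem: str
    better_approach: str
    context: str

@dataclass
class Skill:
    name: str
    steps: List[str]
    when_to_use: str
    prerequisites: List[str] = field(default_factory=list)
    success_rate: float = 0.0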
Simple Implementation
Storage Structure
Agent Memory Store:
├── Episodes (what happened)
├── Prompts (what we said)
├── Mistakes (what failed)
└── Skills (what works reliably)
Connections:
- Episode → derives → Prompt (this episode produced this useful prompt)
- Mistake → corrects → Episode (this mistake helps avoid this situation)
- Episodes → generalize → Skill (multiple episodes become a pattern)
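A toy in-memory version of that store and its typed edges might look like this (AgentMemoryStore is illustrative, not a library class; in practice the collections would live in a vector database and the edges in a graph store):

from collections import defaultdict

class AgentMemoryStore:
    """Toy in-memory store: four node collections plus a typed edge list."""

    def __init__(self):
        self.nodes = {"episodes": {}, "prompts": {}, "mistakes": {}, "skills": {}}
        self.edges = defaultdict(list)  # node_id -> [(relation, other_id), ...]

    def add(self, kind, node_id, record):
        # kind is one of "episodes", "prompts", "mistakes", "skills"
        self.nodes[kind][node_id] = record
        return node_id

    def connect(self, src_id, relation, dst_id):
        # e.g. connect("episode-42", "derives", "prompt-7")
        self.edges[src_id].append((relation, dst_id))

    def neighbors(self, node_id, relation=None):
        # One-hop graph expansion, optionally filtered by edge type
        return [dst for rel, dst in self.edges[node_id]
                if relation is None or rel == relation]

# Edge types from the list above:
#   episode --derives--> prompt, mistake --corrects--> episode,
#   episode --generalize--> skill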
Basic Retrieval
- Query comes in: "Help me analyze customer data"
- Search memory: Find similar episodes, relevant prompts, applicable skills
- Rank results: Recent + successful + similar context = higher score
- Compose response: Use retrieved experience to inform approach
- Store new experience: After task completion, save what worked/failed
Practical Example
# Simplified retrieval logic (vector_search, graph_expand, and
# rerank_by_utility stand in for your vector store and graph backend)
def get_relevant_experience(task_description, context):
    # Find similar episodes
    episodes = vector_search(task_description, collection="episodes", k=5)
    # Get connected prompts and skills
    related = graph_expand(episodes, hops=1, types=["prompts", "skills"])
    # Rank by relevance + success rate + recency
    ranked = rerank_by_utility(related, context)
    return ranked[:3]  # Top 3 most relevant experiences

# Usage
experience = get_relevant_experience(
    "Create a sales dashboard",
    context={"data_type": "CSV", "size": "large"},
)
# Returns: [similar_episode, effective_prompt, proven_skill]
Key Patterns That Work
1. Progressive Skill Building
- Start with Episodes (raw experience)
- Extract Prompts (what worked in LLM interaction)
- Identify Mistakes (what to avoid)
- Generalize to Skills (reusable patterns)
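A rough sketch of that last step, generalizing repeated successes into a skill. Here "similar" just means a shared task type and the threshold of three successes is an arbitrary starting point; a real system would cluster on embeddings:

from collections import defaultdict

def extract_skills(episodes, min_successes=3):
    """Promote approaches that repeatedly succeeded into reusable skills (sketch)."""
    by_task = defaultdict(list)
    for ep in episodes:
        by_task[ep["task_type"]].append(ep)

    skills = []
    for task_type, eps in by_task.items():
        successes = [ep for ep in eps if ep["outcome"] == "success"]
        if len(successes) < min_successes:
            continue
        # Keep only the steps that every successful episode of this type shares
        common = set(successes[0]["approach"])
        for ep in successes[1:]:
            common &= set(ep["approach"])
        if not common:
            continue
        skills.append({
            "name": f"{task_type} workflow",
            # Preserve the step order from the first successful episode
            "steps": [s for s in successes[0]["approach"] if s in common],
            "when_to_use": task_type,
            "success_rate": len(successes) / len(eps),
        })
    return skills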
2. Context-Aware Retrieval
- Match task type ("data analysis", "code review", "research")
- Consider tools available (Python, databases, APIs)
- Factor in complexity level (simple, medium, complex)
- Weight recent experience higher
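The first three of these can be a cheap hard filter applied before any expensive ranking (a sketch; the field names are illustrative). Recency weighting is handled in the scoring function under Production Tips below:

def matches_context(record, context):
    """Cheap pre-filter applied before scoring (sketch)."""
    # Task type must match exactly ("data analysis", "code review", ...)
    if record.get("task_type") != context.get("task_type"):
        return False
    # Skip experience that depends on tools the agent doesn't have right now
    missing = set(record.get("required_tools", [])) - set(context.get("tools", []))
    if missing:
        return False
    # Only reuse experience from a comparable complexity level
    levels = {"simple": 0, "medium": 1, "complex": 2}
    gap = abs(levels.get(record.get("complexity"), 1)
              - levels.get(context.get("complexity"), 1))
    return gap <= 1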
3. Continuous Learning Loop
Task Request → Retrieve Experience → Execute → Store Results → Improve
     ↑                                                            │
     └─────────── better performance on the next request ─────────┘
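In code, one pass through the loop is a thin wrapper around task execution. This sketch reuses get_relevant_experience from the Practical Example and the toy store from above; run_agent and the result fields are placeholders for your own executor:

def handle_task(task, context, memory, run_agent):
    """One pass through the learning loop (sketch; run_agent is your executor)."""
    # 1. Retrieve: prior episodes, prompts, and skills that match this task
    experience = get_relevant_experience(task, context)

    # 2. Execute: let the agent work with the retrieved experience in context
    result = run_agent(task, context, experience)

    # 3. Store: record what happened so the next task starts from here
    episode_id = f"episode-{result['id']}"
    memory.add("episodes", episode_id, {
        "task": task,
        "context": context,
        "approach": result["steps"],
        "outcome": result["outcome"],   # "success" or a short failure description
        "tools_used": result["tools"],
    })
    if result["outcome"] != "success":
        memory.add("mistakes", f"mistake-{result['id']}", {
            "failed_approach": " → ".join(result["steps"]),
            "problem": result["outcome"],
            "context": context,
        })
    return result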
Production Tips
What to Store
- Do: Task summaries, effective prompts, tool sequences, outcome quality
- Don't: Raw conversation logs, sensitive data, overly specific details
How to Score Experience
- Recency: Recent experience weighs more (exponential decay)
- Success: Track what actually worked vs. what failed
- Similarity: Context matching (task type, domain, complexity)
- Utility: How often this experience gets reused
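One way to combine the four signals into a single score (the weights and 30-day half-life are arbitrary defaults to tune; created_at in epoch seconds and times_reused are assumed fields on each stored record):

import math
import time

def experience_score(record, similarity, now=None,
                     half_life_days=30.0,
                     weights=(0.35, 0.25, 0.25, 0.15)):
    """Combine similarity, success, recency, and reuse into one score (sketch)."""
    now = now or time.time()
    w_sim, w_success, w_recency, w_utility = weights

    # Recency: exponential decay with a configurable half-life
    age_days = (now - record["created_at"]) / 86400
    recency = 0.5 ** (age_days / half_life_days)

    # Success: observed success rate for this experience (0..1)
    success = record.get("success_rate", 0.5)

    # Utility: how often the experience has been reused, squashed into 0..1
    utility = 1 - math.exp(-record.get("times_reused", 0) / 5)

    return (w_sim * similarity + w_success * success
            + w_recency * recency + w_utility * utility)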
Privacy & Security
- Hash or anonymize sensitive context
- Store patterns, not raw data
- Separate personal vs. shared experience stores
- Implement access controls for team memories
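For the first point, a salted hash keeps sensitive values matchable across records without storing plaintext (a sketch; SENSITIVE_FIELDS is illustrative, and the salt should live outside the memory store):

import hashlib

SENSITIVE_FIELDS = {"customer_name", "email", "account_id"}  # adjust to your data

def anonymize(record, salt):
    """Replace sensitive values with salted hashes before storing (sketch)."""
    clean = {}
    for key, value in record.items():
        if key in SENSITIVE_FIELDS:
            digest = hashlib.sha256((salt + str(value)).encode()).hexdigest()
            clean[key] = f"hash:{digest[:16]}"   # matchable, but no plaintext stored
        else:
            clean[key] = value
    return clean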
Why This Works
- Agents learn from experience instead of starting fresh each time
- Successful strategies get reused automatically
- Mistakes get avoided through negative examples
- Complex skills emerge from repeated successful patterns
- Performance improves over time as memory grows
Getting Started
- Start simple: Store task summaries and outcomes
- Add prompts: Save LLM interactions that worked well
- Track mistakes: Record failed approaches with context
- Extract skills: Identify patterns across multiple successes
- Improve retrieval: Better ranking and context matching
The goal: Make your agent smarter with every task it handles.
For advanced topics like novelty detection, zero‑shot fallback, consolidation, and corrective feedback, see the companion article: Novelty, Zero‑Shot, and Reflexion: From Episodes to Skills.
Discussion and Practical Notes
- Zero-shot is not a failure mode but an acquisition mode—instrument it.
- Consolidations must be small and precise; noisy dumps sabotage future retrieval.
- Labeling failures pays compound interest via corrective edges and penalties.
- Use domain/task filters early to reduce confusion between homographs (e.g., the "bow" of a ship vs. a "bow" and arrow).
Conclusion
Graph-RAG memory operationalizes novelty, learning-by-mistake, and consolidation into a unified system. By structuring episodes, reflections, and skills as a graph augmented with vector search—and by explicitly modeling corrective feedback—agents steadily convert zero-shot improvisations into reusable knowledge. The approach yields measurable benefits: lower false-hit rates, higher retrieval precision, and decreasing reliance on zero-shot over time.
References
[1] Survey on LLM-based Autonomous Agents (Aug 2023). “LLMs as planners; challenges for domain-specific planning.” arXiv:2308.11432. https://arxiv.org/abs/2308.11432
[2] LangChain Blog (2024). “Memory for Agents.” https://blog.langchain.dev/memory-for-agents/ — and LangChain Docs: Memory Overview. https://python.langchain.com/docs/modules/memory/
[3] Oudeyer, P.-Y. et al. (2016). Intrinsic motivation, curiosity, and learning: Theory and applications. Progress in Brain Research, 229, 257–284. https://doi.org/10.1016/bs.pbr.2016.05.005
[4] Park et al. (2023). Generative Agents: Interactive Simulacra of Human Behavior. arXiv:2304.03442. https://arxiv.org/abs/2304.03442
[5] LangGraph (2024). Long-term memory and semantic search. https://blog.langchain.dev/langgraph-memory/ and https://langchain-ai.github.io/langgraph/concepts/memory/
[6] Shinn et al. (2023). Reflexion: Language Agents with Verbal Reinforcement Learning. arXiv:2303.11366. https://arxiv.org/abs/2303.11366
[7] Barnett, T. (2025). The Importance of Being Erroneous: Are AI Mistakes a Feature, Not a Bug? Jackson Lewis P.C. https://www.jacksonlewis.com/insights/importance-being-erroneous-are-ai-mistakes-feature-not-bug