
Graph-RAG Memory: Agent Experience Storage

TL;DR: Store what your agent learns — successful prompts, context that worked, mistakes to avoid. Retrieve relevant experience to make future tasks faster and more reliable.


Why Agent Memory Matters

Every time your agent tackles a task, it generates valuable experience:

  • Prompts that worked for specific contexts
  • Tool combinations that solved similar problems
  • Mistakes and dead ends to avoid repeating
  • Context patterns that led to success

Without memory, agents start from zero every time. With Graph-RAG memory, they get smarter with experience.


What We Store (Agent Experience Types)

1. Episodes: "What Happened"

Store compact records of agent sessions:

Task: "Analyze quarterly sales data"
Context: "CSV with 10K rows, need trends and outliers"
Approach: ["Load pandas", "Plot trends", "Statistical analysis"]  
Outcome: "Success - found 3 key insights"
Tools Used: ["python", "matplotlib", "pandas"]
Duration: "2 minutes"

2. Prompts: "What We Said to the LLM"

Save effective prompting strategies:

Situation: "Code review request"
Effective Prompt: "Review this code for: 1) bugs, 2) performance, 3) readability. 
                   Prioritize critical issues. Use bullet points."
Context Needed: ["file_type", "language", "complexity"]
Success Rate: 85%

3. Mistakes: "What Didn't Work"

Learn from failures:

Failed Approach: "Asked LLM to write entire 500-line script in one shot"
Problem: "Generated buggy, untestable code"
Better Approach: "Break into functions, implement one at a time"
Context: "Large coding tasks"

4. Skills: "Reusable Patterns"

Extract repeatable strategies:

Skill: "Data Analysis Workflow"  
Steps: ["Validate data", "Explore distributions", "Test hypotheses", "Visualize results"]
When To Use: "Numerical datasets > 1K rows"
Success Rate: 92%
Prerequisites: ["pandas available", "clean data format"]
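
In code, these four record types can be represented as small structured objects. Here is a minimal sketch using Python dataclasses; the field names simply mirror the examples above and are illustrative rather than a fixed schema.

python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Episode:
    task: str                      # "Analyze quarterly sales data"
    context: str                   # "CSV with 10K rows, need trends and outliers"
    approach: List[str]            # ordered steps the agent took
    outcome: str                   # "Success - found 3 key insights"
    tools_used: List[str] = field(default_factory=list)
    duration: str = ""             # e.g. "2 minutes"

@dataclass
class Prompt:
    situation: str                 # "Code review request"
    text: str                      # the prompt wording that worked
    context_needed: List[str] = field(default_factory=list)
    success_rate: float = 0.0      # 0.85 means it worked 85% of the time

@dataclass
class Mistake:
    failed_approach: str
    problem: str
    better_approach: str
    context: str

@dataclass
class Skill:
    name: str
    steps: List[str]
    when_to_use: str
    success_rate: float = 0.0
    prerequisites: List[str] = field(default_factory=list)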

Simple Implementation

Storage Structure

Agent Memory Store:
├── Episodes (what happened)
├── Prompts (what we said)  
├── Mistakes (what failed)
└── Skills (what works reliably)

Connections:
- Episode → derives → Prompt (this episode produced this useful prompt)
- Mistake → corrects → Episode (this mistake record warns against repeating that approach)
- Episodes → generalize → Skill (multiple episodes become a pattern; see the graph sketch below)
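
A minimal sketch of this layout, assuming the networkx library as the in-memory graph store (any property graph or database would work): nodes carry the record kind plus its payload, and edges carry the relation names listed above.

python
import networkx as nx

# Directed multigraph: nodes are experience records, edges are typed relations
memory = nx.MultiDiGraph()

# Nodes, tagged with the store they belong to
memory.add_node("ep-001", kind="episode", task="Analyze quarterly sales data")
memory.add_node("pr-001", kind="prompt", situation="Data analysis request")
memory.add_node("mi-001", kind="mistake", failed_approach="One-shot 500-line script")
memory.add_node("sk-001", kind="skill", name="Data Analysis Workflow")

# Edges mirror the connection types above
memory.add_edge("ep-001", "pr-001", relation="derives")      # episode produced this prompt
memory.add_edge("mi-001", "ep-001", relation="corrects")     # mistake warns about this situation
memory.add_edge("ep-001", "sk-001", relation="generalizes")  # episode feeds into this skill

# One-hop expansion from an episode to its connected prompts and skills
related = [n for n in memory.successors("ep-001")
           if memory.nodes[n]["kind"] in {"prompt", "skill"}]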

Basic Retrieval

  1. Query comes in: "Help me analyze customer data"
  2. Search memory: Find similar episodes, relevant prompts, applicable skills
  3. Rank results: Recent + successful + similar context = higher score
  4. Compose response: Use retrieved experience to inform approach
  5. Store new experience: After task completion, save what worked/failed

Practical Example

python
# Simplified retrieval logic (vector_search, graph_expand and rerank_by_utility
# are placeholders for your vector index and graph store)
def get_relevant_experience(task_description, context):
    # Find similar episodes
    episodes = vector_search(task_description, collection="episodes", k=5)
    
    # Get connected prompts and skills
    related = graph_expand(episodes, hops=1, types=["prompts", "skills"])
    
    # Rank by relevance + success rate + recency  
    ranked = rerank_by_utility(related, context)
    
    return ranked[:3]  # Top 3 most relevant experiences

# Usage
experience = get_relevant_experience(
    "Create a sales dashboard", 
    context={"data_type": "CSV", "size": "large"}
)
# Returns: [similar_episode, effective_prompt, proven_skill]
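
The write path (step 5 of the retrieval loop above) is the mirror image. The sketch below stays in the same simplified spirit; embed, vector_upsert, and add_edge are placeholders for whatever embedding model, vector index, and graph store you use.

python
# Simplified storage logic (embed, vector_upsert and add_edge are placeholders)
def store_experience(task_description, context, approach, outcome, success):
    episode = {
        "task": task_description,
        "context": context,
        "approach": approach,
        "outcome": outcome,
        "success": success,
    }

    # Index the episode so future similar tasks can find it
    episode_id = vector_upsert(embed(task_description), episode, collection="episodes")

    # Failures also become corrective mistake records linked back to the episode
    if not success:
        mistake_id = vector_upsert(embed(outcome), episode, collection="mistakes")
        add_edge(mistake_id, episode_id, relation="corrects")

    return episode_id

# Usage, after the dashboard task finishes
store_experience(
    "Create a sales dashboard",
    context={"data_type": "CSV", "size": "large"},
    approach=["Load CSV", "Aggregate by region", "Plot monthly revenue"],
    outcome="Success - dashboard delivered",
    success=True,
)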

Key Patterns That Work

1. Progressive Skill Building

  • Start with Episodes (raw experience)
  • Extract Prompts (what worked in LLM interaction)
  • Identify Mistakes (what to avoid)
  • Generalize to Skills (reusable patterns), as sketched below
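
A naive version of that last step, assuming episodes are stored as dicts with a task_type field and a success flag; a group of episodes is promoted to a Skill once it has enough successful repetitions (the threshold here is arbitrary).

python
from collections import defaultdict

def extract_skills(episodes, min_successes=3):
    """Promote repeatedly successful approaches into reusable skills (illustrative)."""
    successes = defaultdict(list)
    totals = defaultdict(int)
    for ep in episodes:
        totals[ep["task_type"]] += 1
        if ep["success"]:
            successes[ep["task_type"]].append(ep)

    skills = []
    for task_type, eps in successes.items():
        if len(eps) < min_successes:
            continue  # not enough evidence to generalize yet
        skills.append({
            "name": f"{task_type} workflow",
            "steps": eps[-1]["approach"],          # naive: reuse the latest successful approach
            "when_to_use": task_type,
            "success_rate": len(eps) / totals[task_type],
        })
    return skills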

2. Context-Aware Retrieval

  • Match task type ("data analysis", "code review", "research")
  • Consider tools available (Python, databases, APIs)
  • Factor in complexity level (simple, medium, complex)
  • Weight recent experience higher

3. Continuous Learning Loop

Task Request → Retrieve Experience → Execute → Store Results → Improve
     ↑                                                            ↓
     ←←←←←←←←←←←← Better Performance Next Time ←←←←←←←←←←←←
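
In code, the loop is just retrieval, execution, and storage chained together; execute_task is a placeholder for whatever your agent runtime actually does.

python
# Continuous learning loop (execute_task stands in for your agent runtime)
def handle_task(task_description, context):
    experience = get_relevant_experience(task_description, context)      # retrieve
    result = execute_task(task_description, context, hints=experience)   # execute
    store_experience(task_description, context,                          # store
                     approach=result.steps,
                     outcome=result.summary,
                     success=result.success)
    return result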

Production Tips

What to Store

  • Do: Task summaries, effective prompts, tool sequences, outcome quality
  • Don't: Raw conversation logs, sensitive data, overly specific details

How to Score Experience

  • Recency: Recent experience weighs more (exponential decay)
  • Success: Track what actually worked vs. what failed
  • Similarity: Context matching (task type, domain, complexity)
  • Utility: How often this experience gets reused (all four signals are combined in the sketch below)
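
A sketch of one way to blend these signals, assuming similarity is a cosine score in [0, 1] and age is measured in days; the half-life and the weights are arbitrary starting points to tune.

python
def experience_score(similarity, success_rate, age_days, reuse_count,
                     half_life_days=30.0):
    recency = 0.5 ** (age_days / half_life_days)   # exponential decay: halves every 30 days
    utility = min(1.0, reuse_count / 10.0)         # saturating reuse bonus
    # Weighted blend of the four signals; tune the weights for your workload
    return 0.4 * similarity + 0.3 * success_rate + 0.2 * recency + 0.1 * utility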

Privacy & Security

  • Hash or anonymize sensitive context (see the sketch after this list)
  • Store patterns, not raw data
  • Separate personal vs. shared experience stores
  • Implement access controls for team memories
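
For the hashing point, one common approach is to replace sensitive values with salted hashes before anything reaches the memory store, so records stay comparable without exposing raw data. The sketch below uses Python's standard hashlib with a caller-supplied salt.

python
import hashlib

def anonymize_context(context, sensitive_keys, salt):
    """Replace sensitive values with salted hashes so patterns remain comparable."""
    cleaned = dict(context)
    for key in sensitive_keys:
        if key in cleaned:
            digest = hashlib.sha256((salt + str(cleaned[key])).encode()).hexdigest()
            cleaned[key] = f"hash:{digest[:12]}"   # short, stable pseudonymous token
    return cleaned

# Usage
safe = anonymize_context(
    {"customer_name": "Acme Corp", "data_type": "CSV"},
    sensitive_keys=["customer_name"],
    salt="per-workspace-secret",
)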

Why This Works

  1. Agents learn from experience instead of starting fresh each time
  2. Successful strategies get reused automatically
  3. Mistakes get avoided through negative examples
  4. Complex skills emerge from repeated successful patterns
  5. Performance improves over time as memory grows

Getting Started

  1. Start simple: Store task summaries and outcomes
  2. Add prompts: Save LLM interactions that worked well
  3. Track mistakes: Record failed approaches with context
  4. Extract skills: Identify patterns across multiple successes
  5. Improve retrieval: Better ranking and context matching

The goal: Make your agent smarter with every task it handles.


Further Reading

For advanced topics like novelty detection, zero‑shot fallback, consolidation, and corrective feedback, see the companion article: Novelty, Zero‑Shot, and Reflexion: From Episodes to Skills.


Discussion and Practical Notes

  • Zero-shot is not a failure mode but an acquisition mode—instrument it.
  • Consolidations must be small and precise; noisy dumps sabotage future retrieval.
  • Labeling failures pays compound interest via corrective edges and penalties.
  • Use domain/task filters early to reduce lexical confusions ("bow" as in archery vs. "bow" of a ship).

Conclusion

Graph-RAG memory operationalizes novelty, learning-by-mistake, and consolidation into a unified system. By structuring episodes, reflections, and skills as a graph augmented with vector search—and by explicitly modeling corrective feedback—agents steadily convert zero-shot improvisations into reusable knowledge. The approach yields measurable benefits: lower false-hit rates, higher retrieval precision, and decreasing reliance on zero-shot over time.


References

[1] Survey on LLM-based Autonomous Agents (Aug 2023). “LLMs as planners; challenges for domain-specific planning.” arXiv:2308.11432. https://arxiv.org/abs/2308.11432

[2] LangChain Blog (2024). “Memory for Agents.” https://blog.langchain.dev/memory-for-agents/ — and LangChain Docs: Memory Overview. https://python.langchain.com/docs/modules/memory/

[3] Oudeyer, P.-Y. et al. (2016). Intrinsic motivation, curiosity, and learning: Theory and applications. Progress in Brain Research, 229, 257–284. https://doi.org/10.1016/bs.pbr.2016.05.005

[4] Park et al. (2023). Generative Agents: Interactive Simulacra of Human Behavior. arXiv:2304.03442. https://arxiv.org/abs/2304.03442

[5] LangGraph (2024). Long-term memory and semantic search. https://blog.langchain.dev/langgraph-memory/ and https://langchain-ai.github.io/langgraph/concepts/memory/

[6] Shinn et al. (2023). Reflexion: Language Agents with Verbal Reinforcement Learning. arXiv:2303.11366. https://arxiv.org/abs/2303.11366

[7] Barnett, T. (2025). The Importance of Being Erroneous: Are AI Mistakes a Feature, Not a Bug? Jackson Lewis P.C. https://www.jacksonlewis.com/insights/importance-being-erroneous-are-ai-mistakes-feature-not-bug