
Graph-RAG Memory: Agent Experience Storage

TL;DR: Store what your agent learns — successful prompts, context that worked, mistakes to avoid. Retrieve relevant experience to make future tasks faster and more reliable.


Why Agent Memory Matters

Every time your agent tackles a task, it generates valuable experience:

  • Prompts that worked for specific contexts
  • Tool combinations that solved similar problems
  • Mistakes and dead ends to avoid repeating
  • Context patterns that led to success

Without memory, agents start from zero every time. With Graph-RAG memory, they get smarter with experience.


What We Store (Agent Experience Types)

1. Episodes: "What Happened"

Store compact records of agent sessions:

Task: "Analyze quarterly sales data"
Context: "CSV with 10K rows, need trends and outliers"
Approach: ["Load pandas", "Plot trends", "Statistical analysis"]  
Outcome: "Success - found 3 key insights"
Tools Used: ["python", "matplotlib", "pandas"]
Duration: "2 minutes"

2. Prompts: "What We Said to the LLM"

Save effective prompting strategies:

Situation: "Code review request"
Effective Prompt: "Review this code for: 1) bugs, 2) performance, 3) readability. 
                   Prioritize critical issues. Use bullet points."
Context Needed: ["file_type", "language", "complexity"]
Success Rate: 85%

3. Mistakes: "What Didn't Work"

Learn from failures:

Failed Approach: "Asked LLM to write entire 500-line script in one shot"
Problem: "Generated buggy, untestable code"
Better Approach: "Break into functions, implement one at a time"
Context: "Large coding tasks"

4. Skills: "Reusable Patterns"

Extract repeatable strategies:

Skill: "Data Analysis Workflow"  
Steps: ["Validate data", "Explore distributions", "Test hypotheses", "Visualize results"]
When To Use: "Numerical datasets > 1K rows"
Success Rate: 92%
Prerequisites: ["pandas available", "clean data format"]
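
In code, these four record types can be represented as small structured objects. Here is a minimal sketch using Python dataclasses; the field names simply mirror the examples above and are illustrative rather than a fixed schema.

python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Episode:
    task: str                      # "Analyze quarterly sales data"
    context: str                   # "CSV with 10K rows, need trends and outliers"
    approach: List[str]            # ordered steps the agent took
    outcome: str                   # "Success - found 3 key insights"
    tools_used: List[str] = field(default_factory=list)
    duration: str = ""             # e.g. "2 minutes"

@dataclass
class Prompt:
    situation: str                 # "Code review request"
    text: str                      # the prompt wording that worked
    context_needed: List[str] = field(default_factory=list)
    success_rate: float = 0.0      # 0.85 means it worked 85% of the time

@dataclass
class Mistake:
    failed_approach: str
    problem: str
    better_approach: str
    context: str

@dataclass
class Skill:
    name: str
    steps: List[str]
    when_to_use: str
    success_rate: float = 0.0
    prerequisites: List[str] = field(default_factory=list)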

Simple Implementation

Storage Structure

Agent Memory Store:
├── Episodes (what happened)
├── Prompts (what we said)  
├── Mistakes (what failed)
└── Skills (what works reliably)

Connections:
- Episode → derives → Prompt (this episode produced this useful prompt)
- Mistake → corrects → Episode (this mistake record warns against repeating that approach)
- Episodes → generalize → Skill (multiple episodes become a pattern; see the graph sketch below)
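
A minimal sketch of this layout, assuming the networkx library as the in-memory graph store (any property graph or database would work): nodes carry the record kind plus its payload, and edges carry the relation names listed above.

python
import networkx as nx

# Directed multigraph: nodes are experience records, edges are typed relations
memory = nx.MultiDiGraph()

# Nodes, tagged with the store they belong to
memory.add_node("ep-001", kind="episode", task="Analyze quarterly sales data")
memory.add_node("pr-001", kind="prompt", situation="Data analysis request")
memory.add_node("mi-001", kind="mistake", failed_approach="One-shot 500-line script")
memory.add_node("sk-001", kind="skill", name="Data Analysis Workflow")

# Edges mirror the connection types above
memory.add_edge("ep-001", "pr-001", relation="derives")      # episode produced this prompt
memory.add_edge("mi-001", "ep-001", relation="corrects")     # mistake warns about this situation
memory.add_edge("ep-001", "sk-001", relation="generalizes")  # episode feeds into this skill

# One-hop expansion from an episode to its connected prompts and skills
related = [n for n in memory.successors("ep-001")
           if memory.nodes[n]["kind"] in {"prompt", "skill"}]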

Basic Retrieval

  1. Query comes in: "Help me analyze customer data"
  2. Search memory: Find similar episodes, relevant prompts, applicable skills
  3. Rank results: Recent + successful + similar context = higher score
  4. Compose response: Use retrieved experience to inform approach
  5. Store new experience: After task completion, save what worked/failed

Practical Example

python
# Simplified retrieval logic (vector_search, graph_expand and rerank_by_utility
# are placeholders for your vector index and graph store)
def get_relevant_experience(task_description, context):
    # Find similar episodes
    episodes = vector_search(task_description, collection="episodes", k=5)
    
    # Get connected prompts and skills
    related = graph_expand(episodes, hops=1, types=["prompts", "skills"])
    
    # Rank by relevance + success rate + recency  
    ranked = rerank_by_utility(related, context)
    
    return ranked[:3]  # Top 3 most relevant experiences

# Usage
experience = get_relevant_experience(
    "Create a sales dashboard", 
    context={"data_type": "CSV", "size": "large"}
)
# Returns: [similar_episode, effective_prompt, proven_skill]
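
The write path (step 5 of the retrieval loop above) is the mirror image. The sketch below stays in the same simplified spirit; embed, vector_upsert, and add_edge are placeholders for whatever embedding model, vector index, and graph store you use.

python
# Simplified storage logic (embed, vector_upsert and add_edge are placeholders)
def store_experience(task_description, context, approach, outcome, success):
    episode = {
        "task": task_description,
        "context": context,
        "approach": approach,
        "outcome": outcome,
        "success": success,
    }

    # Index the episode so future similar tasks can find it
    episode_id = vector_upsert(embed(task_description), episode, collection="episodes")

    # Failures also become corrective mistake records linked back to the episode
    if not success:
        mistake_id = vector_upsert(embed(outcome), episode, collection="mistakes")
        add_edge(mistake_id, episode_id, relation="corrects")

    return episode_id

# Usage, after the dashboard task finishes
store_experience(
    "Create a sales dashboard",
    context={"data_type": "CSV", "size": "large"},
    approach=["Load CSV", "Aggregate by region", "Plot monthly revenue"],
    outcome="Success - dashboard delivered",
    success=True,
)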

Key Patterns That Work

1. Progressive Skill Building

  • Start with Episodes (raw experience)
  • Extract Prompts (what worked in LLM interaction)
  • Identify Mistakes (what to avoid)
  • Generalize to Skills (reusable patterns), as sketched below
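
A naive version of that last step, assuming episodes are stored as dicts with a task_type field and a success flag; a group of episodes is promoted to a Skill once it has enough successful repetitions (the threshold here is arbitrary).

python
from collections import defaultdict

def extract_skills(episodes, min_successes=3):
    """Promote repeatedly successful approaches into reusable skills (illustrative)."""
    successes = defaultdict(list)
    totals = defaultdict(int)
    for ep in episodes:
        totals[ep["task_type"]] += 1
        if ep["success"]:
            successes[ep["task_type"]].append(ep)

    skills = []
    for task_type, eps in successes.items():
        if len(eps) < min_successes:
            continue  # not enough evidence to generalize yet
        skills.append({
            "name": f"{task_type} workflow",
            "steps": eps[-1]["approach"],          # naive: reuse the latest successful approach
            "when_to_use": task_type,
            "success_rate": len(eps) / totals[task_type],
        })
    return skills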

2. Context-Aware Retrieval

  • Match task type ("data analysis", "code review", "research")
  • Consider tools available (Python, databases, APIs)
  • Factor in complexity level (simple, medium, complex)
  • Weight recent experience higher

3. Continuous Learning Loop

Task Request → Retrieve Experience → Execute → Store Results → Improve
     ↑                                                            ↓
     ←←←←←←←←←←←← Better Performance Next Time ←←←←←←←←←←←←
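
In code, the loop is just retrieval, execution, and storage chained together; execute_task is a placeholder for whatever your agent runtime actually does.

python
# Continuous learning loop (execute_task stands in for your agent runtime)
def handle_task(task_description, context):
    experience = get_relevant_experience(task_description, context)      # retrieve
    result = execute_task(task_description, context, hints=experience)   # execute
    store_experience(task_description, context,                          # store
                     approach=result.steps,
                     outcome=result.summary,
                     success=result.success)
    return result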

Production Tips

What to Store

  • Do: Task summaries, effective prompts, tool sequences, outcome quality
  • Don't: Raw conversation logs, sensitive data, overly specific details

How to Score Experience

  • Recency: Recent experience weighs more (exponential decay)
  • Success: Track what actually worked vs. what failed
  • Similarity: Context matching (task type, domain, complexity)
  • Utility: How often this experience gets reused (all four signals are combined in the sketch below)
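
A sketch of one way to blend these signals, assuming similarity is a cosine score in [0, 1] and age is measured in days; the half-life and the weights are arbitrary starting points to tune.

python
def experience_score(similarity, success_rate, age_days, reuse_count,
                     half_life_days=30.0):
    recency = 0.5 ** (age_days / half_life_days)   # exponential decay: halves every 30 days
    utility = min(1.0, reuse_count / 10.0)         # saturating reuse bonus
    # Weighted blend of the four signals; tune the weights for your workload
    return 0.4 * similarity + 0.3 * success_rate + 0.2 * recency + 0.1 * utility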

Privacy & Security

  • Hash or anonymize sensitive context (see the sketch after this list)
  • Store patterns, not raw data
  • Separate personal vs. shared experience stores
  • Implement access controls for team memories
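
For the hashing point, one common approach is to replace sensitive values with salted hashes before anything reaches the memory store, so records stay comparable without exposing raw data. The sketch below uses Python's standard hashlib with a caller-supplied salt.

python
import hashlib

def anonymize_context(context, sensitive_keys, salt):
    """Replace sensitive values with salted hashes so patterns remain comparable."""
    cleaned = dict(context)
    for key in sensitive_keys:
        if key in cleaned:
            digest = hashlib.sha256((salt + str(cleaned[key])).encode()).hexdigest()
            cleaned[key] = f"hash:{digest[:12]}"   # short, stable pseudonymous token
    return cleaned

# Usage
safe = anonymize_context(
    {"customer_name": "Acme Corp", "data_type": "CSV"},
    sensitive_keys=["customer_name"],
    salt="per-workspace-secret",
)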

Why This Works

  1. Agents learn from experience instead of starting fresh each time
  2. Successful strategies get reused automatically
  3. Mistakes get avoided through negative examples
  4. Complex skills emerge from repeated successful patterns
  5. Performance improves over time as memory grows

Getting Started

  1. Start simple: Store task summaries and outcomes
  2. Add prompts: Save LLM interactions that worked well
  3. Track mistakes: Record failed approaches with context
  4. Extract skills: Identify patterns across multiple successes
  5. Improve retrieval: Better ranking and context matching

The goal: Make your agent smarter with every task it handles.


Further Reading

For advanced topics like novelty detection, zero‑shot fallback, consolidation, and corrective feedback, see the companion article: Novelty, Zero‑Shot, and Reflexion: From Episodes to Skills.


Discussion and Practical Notes

  • Zero-shot is not a failure mode but an acquisition mode—instrument it.
  • Consolidations must be small and precise; noisy dumps sabotage future retrieval.
  • Labeling failures pays compound interest via corrective edges and penalties.
  • Use domain/task filters early to reduce lexical confusions ("bow" as in archery vs. "bow" of a ship).

Conclusion

Graph-RAG memory operationalizes novelty, learning-by-mistake, and consolidation into a unified system. By structuring episodes, reflections, and skills as a graph augmented with vector search—and by explicitly modeling corrective feedback—agents steadily convert zero-shot improvisations into reusable knowledge. The approach yields measurable benefits: lower false-hit rates, higher retrieval precision, and decreasing reliance on zero-shot over time.


References

[1] Survey on LLM-based Autonomous Agents (Aug 2023). “LLMs as planners; challenges for domain-specific planning.” arXiv:2308.11432. https://arxiv.org/abs/2308.11432

[2] LangChain Blog (2024). “Memory for Agents.” https://blog.langchain.dev/memory-for-agents/ — and LangChain Docs: Memory Overview. https://python.langchain.com/docs/modules/memory/

[3] Oudeyer, P.-Y. et al. (2016). Intrinsic motivation, curiosity, and learning: Theory and applications. Progress in Brain Research, 229, 257–284. https://doi.org/10.1016/bs.pbr.2016.05.005

[4] Park et al. (2023). Generative Agents: Interactive Simulacra of Human Behavior. arXiv:2304.03442. https://arxiv.org/abs/2304.03442

[5] LangGraph (2024). Long-term memory and semantic search. https://blog.langchain.dev/langgraph-memory/ and https://langchain-ai.github.io/langgraph/concepts/memory/

[6] Shinn et al. (2023). Reflexion: Language Agents with Verbal Reinforcement Learning. arXiv:2303.11366. https://arxiv.org/abs/2303.11366

[7] Barnett, T. (2025). The Importance of Being Erroneous: Are AI Mistakes a Feature, Not a Bug? Jackson Lewis P.C. https://www.jacksonlewis.com/insights/importance-being-erroneous-are-ai-mistakes-feature-not-bug