Rescorla-Wagner for agent memory: why retrieval should learn from outcomes

TL;DR

  • Rescorla-Wagner is a classic 1972 prediction-error learning rule: update strength from the gap between expected and actual outcome.
  • In Mnemoverse that idea drives the feedback step — outcome feedback nudges a memory's valence after recall (/api/overview).
  • Valence changes ranking on later queries, so memories that helped move up and memories that misled get suppressed.
  • This is the feedback half of learning; concept associations are the Hebbian half — see the companion article.

Static retrieval never learns from what happened next. An embedding match that once worked can fail repeatedly with no mechanism to lower its future rank. Most retrieval systems stop at recall: they find what looks relevant, return it, and end the loop — leaving out the one signal that matters most, what happened after the memory was used. A memory that led to a good answer should not be treated the same on the next query as one that caused a failure — yet plain retrieval has no way to tell them apart. The Rescorla-Wagner rule supplies exactly that missing mechanism.

Rescorla-Wagner is an error-driven learning rule: it changes strength in proportion to prediction error — the gap between what was expected and what actually happened. Bigger surprise, bigger correction.

What Rescorla-Wagner actually is

Rescorla and Wagner published their model in 1972 to explain conditioning results that earlier theories could not. The core insight: learning happens when events violate expectation, and the size of the update scales with the size of the surprise. The same principle appears in machine learning as the Widrow-Hoff delta rule. In both, the update is proportional to the error: large errors produce large adjustments, near-zero errors produce almost none.
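For reference, the two rules can be written side by side in conventional textbook notation. The symbols are the standard ones from the conditioning and adaptive-filter literature, not names taken from the Mnemoverse docs: V_i is the associative strength of cue i, λ is the strength the outcome supports on that trial, α_i and β are salience (learning-rate) parameters, w_i is a weight, x_i an input, t the target, and η the learning rate.

```latex
\[
\Delta V_i = \alpha_i \, \beta \, \Bigl( \lambda - \sum_{j \in \text{present}} V_j \Bigr)
\qquad \text{(Rescorla-Wagner)}
\]

\[
\Delta w_i = \eta \, \Bigl( t - \sum_j w_j x_j \Bigr) \, x_i
\qquad \text{(Widrow-Hoff)}
\]
```

With binary inputs marking which cues are present and a single learning rate η = αβ, the two updates coincide. When only one cue is present, the sum collapses to V itself and the rule reduces to ΔV = αβ(λ − V).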

That idea fits agent memory because outcome feedback is often the only clean signal available after retrieval. You may not know the full internal reason an answer failed, but you usually do know whether using a recalled memory helped or hurt.

How Mnemoverse uses it: the feedback step

In Mnemoverse the rule is used narrowly and on purpose. It is not the model for the whole memory system — it is the lineage of the feedback step that updates a memory's valence after a recalled memory proves helpful or unhelpful. The API overview describes this as "a prediction-error feedback signal (Rescorla-Wagner-style) updates each memory's valence so memories that led to good outcomes rank higher next time," a learning layer added on top of retrieval (/api/overview).

The documented lifecycle is write → read → feedback → consolidate, and feedback is the learning step (/api/overview). When you report an outcome, Mnemoverse performs two documented updates:

  • a Rescorla-Wagner-style prediction-error update to memory valence;
  • strengthening of Hebbian concept associations (and co-activation links between the query and result concepts).

That split separates two different questions — did this memory help? (outcome feedback) and which concepts should become more strongly linked? (association structure). Keep one clean boundary in mind: Rescorla-Wagner answers how a memory is updated after an outcome; Hebbian learning answers which concepts associate. Mnemoverse uses both, for different jobs — the association half is covered in Hebbian memory for AI agents. So "Mnemoverse learns from feedback" is true; "Rescorla-Wagner is the core model for the whole memory graph" would be wrong.
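The two-question split can be sketched as a toy store whose feedback step does both jobs in one call. This is a minimal illustration, not the Mnemoverse implementation: `ToyMemory`, `ToyStore`, `lr`, and `hebb_rate` are hypothetical names, and strengthening links on every feedback call regardless of outcome sign is an assumption made here for simplicity.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ToyMemory:
    """Hypothetical in-memory record; not the Mnemoverse data model."""
    atom_id: str
    concepts: list
    valence: float = 0.0


@dataclass
class ToyStore:
    memories: dict = field(default_factory=dict)
    # Concept-to-concept link weights, keyed by an alphabetically sorted pair.
    links: dict = field(default_factory=lambda: defaultdict(float))

    def feedback(self, atom_ids, outcome, query_concepts, lr=0.3, hebb_rate=0.1):
        for atom_id in atom_ids:
            mem = self.memories[atom_id]
            # Question 1: did this memory help? Delta-rule step on valence.
            mem.valence += lr * (outcome - mem.valence)
            # Question 2: which concepts belong together? Hebbian-style step:
            # query and result concepts that were active together get a
            # stronger link (assumption: independent of the outcome sign).
            for q in query_concepts:
                for c in mem.concepts:
                    if q != c:
                        self.links[tuple(sorted((q, c)))] += hebb_rate


store = ToyStore()
store.memories["a"] = ToyMemory("a", ["timeout", "retry"])
store.feedback(["a"], outcome=1.0, query_concepts=["timeout", "http"])
```

After the call, the memory's valence has moved 30% of the way from 0.0 toward the reported outcome, and three concept pairs have picked up link weight. The two updates touch different state, which is the boundary the paragraph above draws.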

What valence does in recall

Valence is the outcome-polarity signal attached to a memory; updating it changes future ranking during recall. The system does not rewrite embeddings or edit text — it changes a scalar that influences future retrieval probability. The docs state the behavior plainly (/api/overview): memories that lead to good outcomes rank higher in future queries, memories that consistently fail get suppressed, and the system improves with use. Valence is observable as avg_valence in stats (/api/getting-started, /api/python-sdk).

Because valence rides on top of semantic similarity, a moderately-similar but historically-reliable memory can outrank a closer match that has consistently disappointed. Similarity gets a memory into the candidate set; outcome feedback changes how it is treated next time.
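One way to picture valence riding on top of similarity is an additive blend. The docs do not publish the actual scoring formula, so the `rank` function, the `valence_weight` parameter, and the sample numbers below are all assumptions chosen only to make the effect visible.

```python
def rank(candidates, valence_weight=0.2):
    """Sort (atom_id, similarity, valence) tuples by a blended score, best first.

    The additive blend is a hypothetical scoring rule for illustration.
    """
    def score(candidate):
        _, similarity, valence = candidate
        return similarity + valence_weight * valence

    return sorted(candidates, key=score, reverse=True)


candidates = [
    ("close_but_unreliable", 0.86, -0.7),  # nearer match, poor track record
    ("looser_but_reliable", 0.78, 0.8),    # weaker match, good track record
]

ranked = rank(candidates)                        # blended scores: 0.72 vs 0.94
similarity_only = rank(candidates, valence_weight=0.0)
```

With the blend, the reliable memory comes first; with `valence_weight=0.0` the order falls back to pure similarity and the closer match wins.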

The feedback call

The outcome-feedback API is direct: feedback() takes an outcome float from -1.0 (failure) to +1.0 (success) for a set of recalled atom_ids (/api/python-sdk).

```python
from mnemoverse import MnemoClient

client = MnemoClient(api_key="mk_live_YOUR_KEY")

# Recall, then act on a memory…
memories = client.read("how to handle timeouts?")

# …then report what happened. A Rescorla-Wagner-style prediction-error update
# nudges each memory's valence by the outcome, so good memories rank higher next time.
client.feedback(
    atom_ids=[item.atom_id for item in memories.items],
    outcome=1.0,            # -1.0 (failure) … +1.0 (success)
    query_concepts=memories.query_concepts,
)
```

atom_ids says which recalled memories were involved; outcome reports whether they helped; query_concepts preserves the concept context from the read. One call performs both halves — the prediction-error valence update and the Hebbian association strengthening (which also creates co-activation links between the query and result concepts). The full write-read-feedback walkthrough is in Getting Started under "Report Outcomes."
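In an agent loop it can help to wrap the call so every task result gets reported the same way. The helper below is a sketch: `report_outcome` and its success-to-outcome mapping are hypothetical, and it relies only on the `feedback()` parameters and the `items`, `atom_id`, and `query_concepts` fields shown in the example above.

```python
def report_outcome(client, memories, succeeded):
    """Map a task result to an outcome and send it as feedback.

    `client` is anything exposing the feedback() call shown above.
    The all-or-nothing mapping (True -> +1.0, False -> -1.0) is an
    illustrative assumption; a real agent might report graded values.
    """
    atom_ids = [item.atom_id for item in memories.items]
    if not atom_ids:
        # Nothing was recalled, so there is nothing to credit or blame.
        return None
    outcome = 1.0 if succeeded else -1.0
    client.feedback(
        atom_ids=atom_ids,
        outcome=outcome,
        query_concepts=memories.query_concepts,
    )
    return outcome
```

Calling it right after the agent's action finishes, for example `report_outcome(client, memories, succeeded=task_passed)`, keeps the feedback step from being skipped on failure paths, which are the cases where the signal is most useful.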

Rescorla-Wagner vs vector database retrieval

| | Mnemoverse | Vector database |
|---|---|---|
| Retrieval | Semantic similarity (the floor) | Semantic similarity (the whole system) |
| Learns from outcomes | Yes: prediction-error update to valence | No: ranking is static unless vectors are recomputed |
| Effect over time | Recall re-ranks by what proved useful | Same query, same result |

A vector database returns nearest neighbors by a fixed distance. If two memories are similarly relevant but one repeatedly helps and the other repeatedly misleads, static retrieval has no native reason to prefer the better one next time. Outcome-aware memory does. Retrieval remains the performance floor; feedback supplies the learning signal that raises it. That is also why feedback belongs in production, not just evaluation — skip it and you skip the part where the memory learns from experience.
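The difference over time can be seen in a small simulation. Everything here is a toy under stated assumptions: two made-up memories with nearly equal similarity, a hypothetical additive score (`similarity + valence_weight * valence`), and a plain delta-rule valence update with learning rate `lr`. None of these values come from the Mnemoverse docs.

```python
def simulate(rounds=5, lr=0.3, valence_weight=0.2):
    """Compare a static top-1 pick with an outcome-aware one over repeated queries."""
    # Two memories with nearly equal similarity to the same query.
    similarity = {"helps": 0.80, "misleads": 0.82}
    true_outcome = {"helps": 1.0, "misleads": -1.0}
    valence = {"helps": 0.0, "misleads": 0.0}

    top_static, top_learned = [], []
    for _ in range(rounds):
        # Static retrieval: similarity alone, so the pick never changes.
        top_static.append(max(similarity, key=similarity.get))
        # Outcome-aware retrieval: similarity plus weighted valence.
        best = max(similarity, key=lambda k: similarity[k] + valence_weight * valence[k])
        top_learned.append(best)
        # Only the memory that was actually used receives feedback.
        valence[best] += lr * (true_outcome[best] - valence[best])
    return top_static, top_learned


static_picks, learned_picks = simulate()
# static_picks  -> ['misleads', 'misleads', 'misleads', 'misleads', 'misleads']
# learned_picks -> ['misleads', 'helps', 'helps', 'helps', 'helps']
```

A single negative outcome is enough here to flip the ranking because the two similarities are so close; with a wider similarity gap it would take more rounds or a larger weight.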

What this does not mean

A precise claim is a stronger claim. This article does not say Mnemoverse implements the full competitive multi-cue Rescorla-Wagner model, with cue competition, blocking, or overshadowing. The public description supports a narrower statement: a per-memory, single-cue, delta-rule-style update on valence from the reported outcome. That narrower claim is enough to explain why outcome feedback changes ranking over time, and why memories improve or degrade with actual use. For benchmark numbers, see the benchmarks page rather than scores quoted out of context; for why memory needs more than plain retrieval at scale, see building memory that scales.
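To make the distinction concrete, the sketch below contrasts the textbook multi-cue rule, where all present cues share one prediction error, with independent per-cue updates. It illustrates the general theory only; the `train` function and its parameters are hypothetical, and nothing here describes how Mnemoverse treats memories that are recalled together.

```python
def train(trials, shared_error, lr=0.3, target=1.0):
    """Run delta-rule updates over trials; each trial is a tuple of cues present."""
    strength = {}
    for cues in trials:
        for cue in cues:
            strength.setdefault(cue, 0.0)
        # Summed prediction from all cues present, taken before any update.
        total = sum(strength[c] for c in cues)
        for cue in cues:
            # Multi-cue model: error against the summed prediction.
            # Single-cue variant: error against this cue's own strength.
            predicted = total if shared_error else strength[cue]
            strength[cue] += lr * (target - predicted)
    return strength


# Classic blocking design: train A alone, then A and B together.
trials = [("A",)] * 20 + [("A", "B")] * 20
competitive = train(trials, shared_error=True)
independent = train(trials, shared_error=False)
# competitive["B"] stays near 0: A already predicts the outcome, so B is blocked.
# independent["B"] climbs toward 1: each cue learns from its own error.
```

The single-cue variant is the one that matches the narrower claim above: each memory's valence moves toward the outcomes reported for it, with no competition for credit.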

Common questions

What is Rescorla-Wagner in agent memory?

Rescorla-Wagner is an error-driven learning rule from 1972: update strength in proportion to prediction error, the gap between expected and actual outcome. In Mnemoverse that idea drives the outcome-feedback step — after recall, a memory's valence is nudged by the reported outcome, so useful memories rank higher later.

How does prediction-error feedback work in Mnemoverse?

After a memory is recalled and acted on, you report an outcome from -1.0 (failure) to +1.0 (success) via feedback(). That outcome drives a prediction-error-style update to each recalled memory's valence, which changes future ranking.

Does Rescorla-Wagner control the associations in Mnemoverse?

No. Rescorla-Wagner is the lineage for the valence/feedback update, not the concept-association model. The association half is Hebbian, covered in the companion article.

What does valence do in recall?

Valence is the outcome-polarity signal attached to a memory. Updating it re-ranks future recall: memories tied to good outcomes rank higher, memories that consistently fail get suppressed. It is observable as avg_valence in stats.

How is this different from a vector database?

A vector database returns what is similar to the query, the same way every time; retrieval itself does not learn from what happened next. Mnemoverse adds a feedback step after recall, so outcome data updates valence and improves future ranking with use.

The point is modest and practical: if your agent can report whether a recalled memory helped, the memory system should use that signal — that is what turns recall into a system that learns.

— Edward Izgorodin · Last updated 2026-06-18