Working memory is not settled science — and that is exactly why AI engineers should care
If you build AI agents, you already face a working-memory problem.
A model has a bounded active set. Some information is live in the context window. Some sits outside it. The system must decide what to keep active, what to compress, and what to retrieve back in. That is close enough to the psychology of working memory to be useful, but only if the analogy stays honest.
Working memory is the capacity-limited system that holds and manipulates the small amount of information currently in use for ongoing cognition. It is the active workspace, not the long-term store.
The basic definition is not the interesting part. The interesting part is that the field still disagrees about three core questions:
- what working memory is made of
- how much it can hold
- whether the real bottleneck is storage or control
Those same questions recur, almost claim for claim, when you design an agent's active working set. An engineer who treats them as settled risks repeating decades of psychological controversy in code.
TL;DR
- Working memory is a bounded active workspace for cognition, but psychology still debates its structure, capacity, and the role of attention.
- The main structural accounts differ sharply: Baddeley and Hitch's multicomponent model, Cowan's embedded-processes view, and Oberauer's concentric states model.
- Capacity is not one settled number. Miller's 7±2 depends on chunking, while Cowan argues for about 4 when chunking and rehearsal are blocked.
- For AI agents, the lesson is less "make the window bigger" than "manage the active set better": retrieval gating, compression, and context budgeting matter as much as raw context size.
Working memory vs short-term memory
Working memory and short-term memory are close, but not identical.
Short-term memory usually means temporary passive storage. Working memory adds active use. You hold information in mind and do something with it. That difference matters because some theories treat the bottleneck as storage, while others treat it as control. Engle (2002) is explicit that working memory capacity is separable from short-term memory.
The distinction maps well to AI systems. A large transcript buffer is not the same thing as an active working set. What matters is not only what the system can store somewhere, but what it can keep live for the next operation.
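Here is a minimal Python sketch of that distinction. It assumes a plain-string message log and a fixed item limit, both hypothetical simplifications. `AgentMemory`, `observe`, and `recall` are illustrative names, not any library's API. The transcript keeps everything. The active set holds only the last few items, and anything older must be explicitly recalled before it is live again.

```python
from __future__ import annotations

from collections import deque


class AgentMemory:
    """Hypothetical sketch: an unbounded transcript plus a bounded active set."""

    def __init__(self, active_limit: int = 4) -> None:
        self.transcript: list[str] = []                       # everything ever stored
        self.active: deque[str] = deque(maxlen=active_limit)  # what stays live for the next step

    def observe(self, message: str) -> None:
        self.transcript.append(message)  # storage: nothing is ever dropped here
        self.active.append(message)      # live set: when full, the oldest entry is evicted

    def recall(self, keyword: str) -> str | None:
        """Bring the newest stored message containing `keyword` back into the live set."""
        for message in reversed(self.transcript):
            if keyword in message:
                if message not in self.active:
                    self.active.append(message)
                return message
        return None


memory = AgentMemory(active_limit=3)
for msg in ["user: budget is 500", "tool: flight A costs 420",
            "tool: flight B costs 610", "user: prefers mornings"]:
    memory.observe(msg)

evicted_before_recall = "user: budget is 500" not in memory.active
memory.recall("budget")
```

In this example the budget message is still stored, but it is not live until `recall` brings it back, and that recall in turn evicts the oldest live item. A real system would score relevance rather than match keywords.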
The working memory model debate: Baddeley, Cowan, and Oberauer
The clean textbook story says there is a single model of working memory. The literature says otherwise. There are at least three live ones, and they disagree about something basic: whether working memory is a structure or a state.
Baddeley working memory model: components and control
The most influential account is the multicomponent model from Baddeley and Hitch (1974).
Central executive is the attentional control system in Baddeley and Hitch's model that coordinates the subsystems of working memory.
In this view, working memory is not one box. It includes a central executive plus two specialized subsystems: the phonological loop for verbal and acoustic material, and the visuospatial sketchpad for visual and spatial material. Some of these component names were standardized after 1974. The description here is the model's modern form.
Baddeley later added a fourth component. In Baddeley (2000), the episodic buffer is defined as "a limited capacity system that provides temporary storage of information held in a multimodal code, which is capable of binding information from the subsidiary systems, and from long-term memory, into a unitary episodic representation."
The model lasts because it separates storage from control and treats working memory as structured. For AI engineers the analogy is direct: different live representations can play different roles, while a controller decides what enters the active set.
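The sketch below shows one way to read that analogy in code: separate live buffers, each with its own capacity, plus a controller that decides where each item goes. The `Buffer` and `Executive` classes, the two buffer kinds, and their capacities are hypothetical choices made for illustration. Nothing here is meant to model the central executive itself.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Buffer:
    """One specialized live store with its own capacity (loosely, a 'subsystem')."""
    name: str
    capacity: int
    items: list[str] = field(default_factory=list)

    def add(self, item: str) -> str | None:
        """Append an item; if over capacity, evict and return the oldest one."""
        self.items.append(item)
        if len(self.items) > self.capacity:
            return self.items.pop(0)
        return None


class Executive:
    """Hypothetical controller that routes content to specialized buffers.

    Loosely inspired by the central executive; it is not a model of it.
    """

    def __init__(self) -> None:
        self.buffers = {
            "verbal": Buffer("verbal", capacity=3),    # e.g. instructions, dialogue
            "spatial": Buffer("spatial", capacity=2),  # e.g. screen layout, coordinates
        }

    def route(self, kind: str, item: str) -> str | None:
        """Send an item to the buffer for its kind; overflow only affects that buffer."""
        if kind not in self.buffers:
            raise ValueError(f"no buffer for {kind!r}")
        return self.buffers[kind].add(item)

    def active_set(self) -> dict[str, list[str]]:
        return {name: list(buf.items) for name, buf in self.buffers.items()}


executive = Executive()
executive.route("spatial", "save button at (120, 40)")
displaced = [executive.route("verbal", step)
             for step in ["open settings", "toggle dark mode", "click save", "close tab"]]
```

The point of this design is isolation. Overflow in the verbal buffer displaces only verbal content, so the spatial item stays live.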
But this is not the only live view.
Cowan embedded-processes: working memory as activated long-term memory
Cowan argues that working memory is not a set of separate boxes at all.
Focus of attention is the subset of currently activated information that is most available for ongoing cognitive processing.
In the embedded-processes view, working memory is activated long-term memory plus a focus of attention. The model treats working memory as a state, not a box. Information does not move into a dedicated store. It becomes more highly activated within long-term memory and enters the focus of attention when needed. The capacity argument cited here comes from Cowan (2001). The embedded-processes framework itself predates that paper and underlies his broader model.
For AI, this shifts the engineering picture. Instead of asking how many separate buffers an agent needs, you ask which parts of a larger memory base should be activated and kept available right now. That is much closer to retrieval-augmented systems than to a simple scratchpad metaphor.
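Below is a rough sketch of the embedded-processes framing. It rests on several assumptions: items are tagged with keywords, a cue raises activation by tag overlap, old activation decays by a fixed factor at each step, and the focus is the four most active items. `ActivatedStore`, `cue`, and `focus` are hypothetical names. Nothing is copied into a separate buffer; only activation changes.

```python
class ActivatedStore:
    """Hypothetical sketch of 'working memory as activated long-term memory'.

    Every item stays in one store; a cue changes activation, and the focus is
    simply the few most active items.
    """

    def __init__(self, items: dict[str, set[str]], decay: float = 0.5, focus_size: int = 4) -> None:
        self.items = items                             # item id -> descriptive tags
        self.activation = {key: 0.0 for key in items}  # all items start dormant
        self.decay = decay
        self.focus_size = focus_size

    def cue(self, tags: set[str]) -> None:
        """Decay old activation, then boost items whose tags overlap the cue."""
        for key, item_tags in self.items.items():
            self.activation[key] = self.activation[key] * self.decay + len(tags & item_tags)

    def focus(self) -> list[str]:
        """Up to `focus_size` items with positive activation, most active first (ties keep store order)."""
        live = [key for key, value in self.activation.items() if value > 0]
        return sorted(live, key=lambda key: self.activation[key], reverse=True)[: self.focus_size]


store = ActivatedStore({
    "refund_policy": {"refund", "policy"},
    "shipping_rates": {"shipping", "cost"},
    "user_address": {"user", "shipping"},
    "api_keys": {"secret", "config"},
})
store.cue({"shipping", "cost"})
first = store.focus()
store.cue({"refund"})
second = store.focus()
```

After the second cue, the shipping item stays in focus on residual activation alone, while the refund item rises to share the top. Replacing tag overlap with embedding similarity would make this resemble a retrieval-augmented ranker.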
Oberauer concentric states: one selected item, a few directly accessible
Oberauer (2002) sharpens the debate by splitting active memory into three concentric states: the activated part of long-term memory, a capacity-limited region of direct access, and the focus of attention.
The sharp point is the innermost region. In Oberauer's account, the focus of attention holds "the one chunk that is actually selected as the object of the next cognitive operation."
That single-item claim matters, because the literature is split here. Cowan describes a broader focus of about four items. Other work supports a narrow single-object focus. Oberauer adopts the narrow version. The disagreement is not cosmetic: it asks whether the live core is a few jointly active elements or a broader live region plus one currently selected object. That difference changes how you would design planners, scratchpads, and retrieval policies.
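Here is a small sketch of Oberauer's three nested states. It assumes activation scores are already available (from a retriever, say) and uses a direct-access capacity of four as an illustrative parameter. `States` and `concentric_states` are hypothetical names. The key constraint is that the focus holds exactly one item, and that item must come from the direct-access region.

```python
from typing import NamedTuple, Optional


class States(NamedTuple):
    activated: list[str]      # activated long-term memory: anything with positive activation
    direct_access: list[str]  # capacity-limited region of direct access
    focus: Optional[str]      # the one item selected for the next operation


def concentric_states(activation: dict[str, float],
                      access_capacity: int = 4,
                      select: Optional[str] = None) -> States:
    """Hypothetical split of memory into three nested states, loosely after Oberauer (2002)."""
    activated = sorted((key for key, value in activation.items() if value > 0),
                       key=lambda key: activation[key], reverse=True)
    direct_access = activated[:access_capacity]
    if select is None:
        focus = direct_access[0] if direct_access else None  # default: the most active item
    elif select in direct_access:
        focus = select
    else:
        raise ValueError(f"{select!r} is not in the region of direct access")
    return States(activated, direct_access, focus)


scores = {"goal": 0.9, "step_3": 0.8, "constraint": 0.6,
          "tool_output": 0.5, "old_note": 0.2, "unrelated": 0.0}
states = concentric_states(scores, select="step_3")
```

A planner built this way would keep the goal and constraints accessible while operating on one step at a time. A request to focus on `old_note` fails because that item is activated but not directly accessible.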
How many items can working memory hold? Miller vs Cowan
Capacity is where most people think they know the answer. It is also where the literature is least forgiving of simplification.
Miller's 7±2 and the role of chunking
Miller (1956) argued that the span of immediate memory is about seven plus or minus two items. But he did not present this as a hard law. He opened the paper by writing, "I have been persecuted by an integer," and later suggested the recurrence of seven might be "only a pernicious, Pythagorean coincidence."
The real contribution is subtler. Miller distinguished limits on information from limits on items. Immediate memory is limited by the number of items, while chunking changes how much information each item carries.
Chunking is the recoding of multiple elements into a larger meaningful unit, so more information fits within the same item limit.
That idea should sound familiar. Chunking is the cognitive version of compression and summarization. You do not remove the limit. You pack more meaning into each active unit.
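The toy example below illustrates chunking as recoding. It uses a digit string, with a set of familiar four-digit years standing in for long-term knowledge. `chunk` and `ITEM_LIMIT` are hypothetical names, and the limit of four items is an assumption chosen for the example. The greedy longest-match rule is one simple recoding strategy, not a model of how people chunk.

```python
ITEM_LIMIT = 4  # assumed item limit, for illustration only


def chunk(digits: str, known: set[str]) -> list[str]:
    """Greedy left-to-right recoding of a digit string into chunks.

    `known` stands in for long-term knowledge (here, familiar years). Any run
    that matches a known pattern becomes one item; other digits stay single items.
    """
    chunks: list[str] = []
    i = 0
    while i < len(digits):
        for size in range(len(digits) - i, 1, -1):  # try the longest match first
            if digits[i:i + size] in known:
                chunks.append(digits[i:i + size])
                i += size
                break
        else:  # no known pattern starts at position i
            chunks.append(digits[i])
            i += 1
    return chunks


raw = "149217761945"
known_years = {"1492", "1776", "1945"}
recoded = chunk(raw, known_years)
fits_raw = len(raw) <= ITEM_LIMIT          # twelve separate digits
fits_recoded = len(recoded) <= ITEM_LIMIT  # three familiar years
```

The twelve raw digits exceed the assumed limit, while the same content recoded as three familiar years fits. The limit itself did not change; each item simply carries more.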
Cowan's about 4 when chunking is blocked
Cowan (2001) does not simply reject Miller. He reframes the estimate.
When chunking and rehearsal are blocked, Cowan argues that pure working-memory capacity is about four chunks, in the range of three to five. On this reading, Miller's seven includes gains from chunking and rehearsal; strip those supports away, and the underlying capacity is closer to four.
This is why the honest answer is neither "working memory holds seven items" nor "working memory holds four items." Miller's 7±2 and Cowan's about 4 refer to different conditions and different interpretations of what counts as an item.
For AI, the analogy is direct. A bigger context window resembles Miller's more generous estimate only if the system can effectively recode, compress, and group information. If it cannot, the useful active set behaves much more like Cowan's smaller estimate. The engineering question becomes: do you trust a large in-window state, or do you keep the live set small and retrieve from a longer-term store on demand?
Slots vs resources in working memory
The second capacity debate is not about seven versus four. It is about how failure happens.
Slots: a hard limit with all-or-none storage
Zhang and Luck (2008) argue for discrete fixed-resolution representations in visual working memory. The system holds a small fixed number of items, about three to four, each at full resolution. Beyond that, an item is either stored or not. Their phrasing is clear: observers "store a high-resolution representation of a subset of the objects and retain no information about the others."
This is a cliff model. Stay within capacity and the item is intact. Go beyond it and some items drop out entirely.
Resources: graceful degradation, not a cliff
Bays and Husain (2008) argue for the opposite picture. Visual working memory is a limited resource shared dynamically across items. Add more items and precision declines continuously. They report "no evidence for any discontinuity in the region of four items."
This is a graceful-degradation model. Nothing magical happens at a single boundary. Performance worsens as the resource is spread thinner.
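The two accounts predict different shapes of failure, and two toy functions make the contrast concrete. Both are illustrative simplifications, not fits to either paper's data. `slot_model` keeps a fixed resolution for up to three items and nothing for the rest. `resource_model` divides a fixed total equally across items. Precision is in arbitrary units, and both function names are hypothetical.

```python
def slot_model(n_items: int, slots: int = 3, resolution: float = 1.0) -> list[float]:
    """Toy slot model: the first `slots` items keep a fixed resolution, the rest are lost."""
    return [resolution if i < slots else 0.0 for i in range(n_items)]


def resource_model(n_items: int, total: float = 3.0) -> list[float]:
    """Toy resource model: a fixed total is shared equally, so precision falls as n grows."""
    if n_items == 0:
        return []
    return [total / n_items] * n_items
```

At six items both versions spend the same total. The slot version keeps three items intact and loses three entirely. The resource version keeps all six, each at half the precision it had at a set size of three.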
The field has not settled the dispute, and both sides have strong evidence. That unresolved status is the useful part.
For AI, slots versus resources maps onto two different failure modes. A fixed token budget behaves like slots: over the limit, content is excluded. But attention over a long context often behaves more like a shared resource, where recall and precision degrade as more material competes for relevance. That is close to the practical intuition behind long-context failures and "lost in the middle" behavior.
The design implication is straightforward. If your system fails like slots, you need hard context budgeting. If it fails like resources, you also need ordering, weighting, and selective refresh, because performance decays before the hard limit.
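The sketch below combines both responses, under stated assumptions. Each snippet already has a token count and a relevance score. `budget_context` applies a hard budget greedily by relevance, and `edge_order` places the strongest snippets at the start and end of the context. All names are hypothetical. Edge placement is one heuristic some practitioners use against mid-context degradation, and it is not guaranteed to help for any given model.

```python
from typing import NamedTuple


class Snippet(NamedTuple):
    text: str
    tokens: int
    relevance: float


def budget_context(snippets: list[Snippet], max_tokens: int) -> list[Snippet]:
    """Hard, slot-like budget: keep the most relevant snippets that still fit."""
    chosen: list[Snippet] = []
    used = 0
    for s in sorted(snippets, key=lambda s: s.relevance, reverse=True):
        if used + s.tokens <= max_tokens:  # skip anything that would overflow the budget
            chosen.append(s)
            used += s.tokens
    return chosen


def edge_order(snippets: list[Snippet]) -> list[Snippet]:
    """Resource-style mitigation: strongest snippets at both ends, weakest in the middle."""
    ranked = sorted(snippets, key=lambda s: s.relevance, reverse=True)
    front: list[Snippet] = []
    back: list[Snippet] = []
    for i, s in enumerate(ranked):
        (front if i % 2 == 0 else back).append(s)  # alternate between the two ends
    return front + back[::-1]


snippets = [
    Snippet("task spec", 300, 0.9),
    Snippet("tool log", 900, 0.2),
    Snippet("user constraint", 100, 0.8),
    Snippet("old chat", 400, 0.3),
    Snippet("retrieved doc", 500, 0.6),
]
context = edge_order(budget_context(snippets, max_tokens=1000))
```

Greedy selection is not optimal packing; it simply skips whatever no longer fits. Here the old chat and the tool log are excluded outright, which is the slot-like cliff. The remaining order puts the weakest included snippet in the middle.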
Is working memory storage or attention? Engle's executive attention view
The deepest debate is whether working memory capacity is really about storage at all.
Executive attention is the ability to maintain relevant information in an active, retrievable state while suppressing distraction and interference.
Engle (2002) makes the claim directly: "WM capacity is not directly about memory — it is about using attention to maintain or suppress information." He also writes that it is "not about individual differences in how many items can be stored per se but is about differences in the ability to control attention to maintain information in an active, quickly retrievable state."
That is a major shift. The bottleneck is not how much fits. The bottleneck is whether the system can keep the right thing active and the wrong thing suppressed. Engle ties this to complex span measures and to attention tasks such as antisaccade and Stroop, and argues that working memory capacity is a major component of general fluid intelligence.
One caution matters for AI readers. Not every task labeled "working memory" measures the same construct. The n-back task, often used as a working memory measure, correlates only weakly with complex span (rs ≤ .22; Kane et al., 2007). The two appear to tap different abilities, and Engle's claim centers on the complex span construct.
This is the cleanest bridge to agent design. In practice, many failures that look like context-window failures are really gating failures. The system had access to the right information somewhere. It did not bring it into the active set at the right time, or it failed to suppress distractors already in context. That is closer to executive attention than to storage size.
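Here is a minimal gating step, sketched under several assumptions. Relevance is crude word overlap with a set of goal terms, standing in for embeddings or a learned scorer. The gate runs before each reasoning step, and the live set is capped at four items. `gate` and its parameters are hypothetical. The function does both halves of the job: it suppresses live distractors and retrieves stored items the context was missing.

```python
def gate(context: list[str], store: list[str], goal_terms: set[str], limit: int = 4) -> list[str]:
    """Hypothetical gating step run before each reasoning step.

    1. Suppress: drop live items that share no word with the goal terms.
    2. Retrieve: add stored items that match the goal and are not already live.
    3. Cap: keep at most `limit` items, highest overlap first (ties keep input order).
    """

    def overlap(item: str) -> int:
        return len(goal_terms & set(item.lower().split()))

    kept = [item for item in context if overlap(item) > 0]
    retrieved = [item for item in store if item not in kept and overlap(item) > 0]
    return sorted(kept + retrieved, key=overlap, reverse=True)[:limit]


context = ["weather in paris is mild", "user wants a refund", "joke about cats"]
store = ["refund window is 30 days", "refund requires order id", "shipping takes 5 days"]
live = gate(context, store, goal_terms={"refund", "order"})
```

The weather note and the joke were already in context, and the gate suppresses them. The order-ID requirement existed only in the store, and the gate pulls it in. Neither step depends on how large the window is.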
Working memory and the AI context window
The brain is not a transformer, and an agent memory stack is not a cognitive model. The most productive stance is to treat these analogies as design heuristics, not as claims that large language models implement any specific psychological architecture. Held that way, the analogy is strong enough to clarify engineering choices.
A bounded context window is the obvious analogue of a capacity-limited active store. The richer lessons come from the debates:
- Baddeley's model suggests active memory may not be one undifferentiated pool. Different live representations can serve different functions, while a controller coordinates them.
- Cowan's model suggests the active set is best understood as selected activation over a larger memory base, not as a separate container.
- Oberauer's model suggests a distinction between what is broadly accessible and what is the one object of the next operation.
- Miller's chunking says compression is not optional. It is how you keep useful structure inside a fixed budget.
- Cowan's lower estimate says that once chunking and rehearsal are stripped away, the genuinely usable active set may be smaller than it looks.
- Zhang and Luck versus Bays and Husain ask whether overload causes abrupt dropping or gradual blurring.
- Engle says the hardest part may be control: what gets pulled into context, what stays there, and what gets suppressed.
There is also a longer-term half of the picture. Working memory sits on top of long-term memory, and what gets consolidated into durable structure is a separate problem, covered in schema formation and consolidation. The associative side, how a partial cue retrieves a stored pattern, is the subject of Hopfield networks, a mechanism later linked to attention.
That is why evaluating agent memory cannot be reduced to context length or retrieval recall alone. It has to ask how well a system manages a bounded active set over time. The same point connects to building memory that scales, where capacity and structure become explicit engineering constraints rather than abstract theory, and to the broader 2026 AI agent memory landscape.
The practical conclusion is modest but important. The active working set of an agent is not just "whatever fits in the window." It is the subset of information the system can keep relevant, retrievable, and interference-resistant for the next reasoning step.
Common questions
What is working memory?
Working memory is the capacity-limited system that holds and manipulates the small amount of information currently in use for reasoning, comprehension, and problem-solving. It is an active workspace, not the long-term store. In psychology, it is related to short-term memory but not identical to it, because working memory includes control and manipulation, not just passive holding.
What is the Baddeley working memory model?
The Baddeley and Hitch model treats working memory as a multicomponent system rather than a single short-term store. It includes a central executive that directs attention plus specialized subsystems for verbal and visuospatial material. Baddeley later added the episodic buffer, a limited-capacity multimodal store that binds information from those subsystems and long-term memory into a unitary episodic representation.
How many items can working memory hold?
The answer depends on the theory and the task. Miller's 1956 paper argued that immediate memory spans about seven plus or minus two items, and that recoding items into larger chunks lets that span carry more information. Cowan's 2001 review argued that when chunking and rehearsal are blocked, the pure capacity is closer to about four chunks, in the range of three to five. The field treats these as competing but partly reconcilable estimates, not a single settled constant.
What is the difference between working memory and short-term memory?
Short-term memory usually refers to temporary passive storage. Working memory refers to temporary storage plus the active manipulation and control of information for ongoing cognition. Engle's account goes further and argues that working memory capacity is separable from short-term memory because it reflects executive attention, not just how much can be held.
How is working memory like an AI context window?
The link is an analogy, not an identity. Working memory is a bounded active set for the mind, while an AI agent's context window and live working set are the bounded active set for the model. In both cases, the system must decide what stays active, what gets compressed, and what should be retrieved from longer-term storage when needed.
Is working memory mainly storage or attention?
That is one of the central debates. Some theories emphasize storage structure and capacity, while Engle argues that the bottleneck is better understood as executive or controlled attention: the ability to maintain relevant information and suppress distraction. For AI engineering, that maps closely to retrieval gating and context selection rather than window size alone.
Sources
Structure
- Baddeley, A.D., & Hitch, G.J. (1974). "Working Memory." In G.H. Bower (Ed.), The Psychology of Learning and Motivation, Vol. 8, pp. 47-89. Academic Press.
- Baddeley, A.D. (2000). "The episodic buffer: a new component of working memory?" Trends in Cognitive Sciences 4(11):417-423. doi:10.1016/S1364-6613(00)01538-2
- Cowan, N. (2001). "The magical number 4 in short-term memory: a reconsideration of mental storage capacity." Behavioral and Brain Sciences 24(1):87-114. doi:10.1017/S0140525X01003922
- Oberauer, K. (2002). "Access to information in working memory: exploring the focus of attention." Journal of Experimental Psychology: Learning, Memory, and Cognition 28(3):411-421. doi:10.1037/0278-7393.28.3.411
Capacity
- Miller, G.A. (1956). "The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information." Psychological Review 63(2):81-97. doi:10.1037/h0043158
- Zhang, W., & Luck, S.J. (2008). "Discrete fixed-resolution representations in visual working memory." Nature 453:233-235. doi:10.1038/nature06860
- Bays, P.M., & Husain, M. (2008). "Dynamic shifts of limited working memory resources in human vision." Science 321:851-854. doi:10.1126/science.1158023
Control
- Engle, R.W. (2002). "Working Memory Capacity as Executive Attention." Current Directions in Psychological Science 11(1):19-23. doi:10.1111/1467-8721.00160
- Kane, M.J., Conway, A.R.A., Miura, T.K., & Colflesh, G.J.H. (2007). "Working memory, attention control, and the n-back task: A question of construct validity." Journal of Experimental Psychology: Learning, Memory, and Cognition 33(3):615-622. doi:10.1037/0278-7393.33.3.615
Related
- Schema Formation and Memory Consolidation — how the active set becomes durable long-term structure
- Hopfield Networks and Associative Memory — pattern completion, the model that became attention
- Jeff Hawkins: Hierarchical Temporal Memory — memory exists to predict
- Geoffrey Hinton: The Boltzmann Machine — memory that learns and generates
- Bernard Widrow: From Adaptive Filters to Cognitive Memory — the other founder of associative memory
- How to Evaluate AI Agent Memory — measuring what a memory system keeps active and recalls
- AI Agent Memory: The 2026 Landscape — where active context and long-term memory sit among today's approaches
- Building Memory That Scales — capacity, interference, and structure as engineering problems
Mnemoverse is a persistent-memory API for AI agents. The honest connection to this article is narrow: a memory layer manages a bounded active set against a consolidated long-term store, keeping the working set small and relevant while retrieving the rest on demand — the engineering echo of working memory sitting on top of long-term memory, not a claim to reproduce brain mechanisms. By Edward Izgorodin. Free key: console.mnemoverse.com · Docs: Getting Started
