Jeff Hawkins: Memory Exists to Predict

Q: Who is Jeff Hawkins?

Jeff Hawkins (b. 1957) built the PalmPilot and Treo at Palm Computing and Handspring, then turned to brain theory — founding the Redwood Neuroscience Institute (2002) and Numenta (2005). He is the author of On Intelligence (2004) and A Thousand Brains (2021) and the originator of Hierarchical Temporal Memory.

Q: What is the memory-prediction framework?

It is Hawkins's theory, set out in On Intelligence (2004), that the core operation of the cortex is prediction: the brain is a memory system that constantly predicts its next input from stored sequences, and successful prediction is the test of understanding.

Q: What is Hierarchical Temporal Memory (HTM)?

HTM is a biologically constrained model of the neocortex developed at Numenta. It learns from streaming data without supervision and rests on three core properties: sparse distributed representations, sequence memory, and continual (online) learning. Prediction and anomaly detection follow naturally from that design.

Q: What are sparse distributed representations (SDRs)?

SDRs encode information as high-dimensional binary vectors in which only a small fraction of bits (around 2%) are active at once. They are robust to noise, have very high capacity, and let overlap between two SDRs represent semantic similarity (Ahmad & Hawkins, 2015).

Q: What is the Thousand Brains Theory?

From Hawkins's 2021 book A Thousand Brains, it proposes that the neocortex does not build one model of the world but many: roughly 150,000 cortical columns each build their own models using reference frames, and their votes combine into a single perception.

Here is the idea, in one line: your brain does not store the world so it can play it back — it stores the world so it can predict it. Jeff Hawkins has spent two decades arguing that prediction is the core operation of the cortex, that the cortex is essentially a memory system, and that getting this right is one proposed route to more robust machine intelligence. It is a different starting point from almost everyone else who built a memory into a network — and it came from the man who built the PalmPilot.

TL;DR
Hawkins's memory-prediction framework (On Intelligence, 2004): the cortex is a memory system whose core job is prediction — it constantly anticipates its next input from stored sequences.
Hierarchical Temporal Memory (HTM) builds this on three pillars: sparse distributed representations, sequence memory, and continual learning.
SDRs — high-dimensional binary vectors with only ~2% of bits active — are robust, high-capacity, and encode similarity through overlap.
A Thousand Brains (2021): not one model of the world but ~150,000 cortical columns, each using reference frames, combined by voting.
A distinct, neuroscience-first path — influential in ideas, not the benchmark-dominant paradigm.

The bet: prediction is the whole game

Most founders of network memory came from physics or engineering and asked a mechanical question — how do you store a pattern in weights and get it back? Hawkins, an engineer turned neuroscientist, asked a functional one: what is all that memory for?

In 2004, with Sandra Blakeslee, he answered in On Intelligence. The neocortex is a memory system, and its core operation is prediction: it continuously forecasts its next input from stored sequences, and notices the instant a forecast fails.

The memory-prediction framework: the cortex constantly predicts its next input from stored sequences; to understand something is to be able to predict it.

You run this loop constantly — the next note of a familiar song, the next word of a sentence, the feel of the next stair. When prediction holds, the world feels continuous; when it breaks, attention snaps to the surprise. Hawkins's claim is that this is not something intelligence does with memory. It is what memory is.

HTM: the theory, made buildable

To make the idea concrete, Hawkins and Numenta built Hierarchical Temporal Memory (HTM) — a model constrained by the biology of the neocortex. Three properties carry it.

Sparse distributed representations. Everything in HTM is an SDR: a long binary vector with only ~2% of bits active, mirroring how few cortical neurons fire at once.

A sparse distributed representation is a high-dimensional binary vector with very few active bits; which bits are on is the meaning.

Sparsity is not a detail. As Ahmad and Hawkins set out in Properties of Sparse Distributed Representations (2015), it buys robustness (drop a few bits, keep the meaning), enormous capacity, and similarity-as-overlap — two SDRs that share active bits are, by construction, related. In an SDR, similarity is part of the representation rather than something computed afterward by a separate distance metric.
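The three properties can be checked with a few lines of Python. This is a minimal sketch, not Numenta's implementation: the sizes (2048 bits, 40 active, roughly 2%) are assumed for illustration, and `random_sdr`, `overlap`, and `add_noise` are hypothetical helper names. An SDR is stored as the set of its active bit indices.

```python
import math
import random

N_BITS = 2048    # vector length (assumed for illustration)
N_ACTIVE = 40    # active bits, about 2% of N_BITS


def random_sdr(rng, n_bits=N_BITS, n_active=N_ACTIVE):
    """Return an SDR as a frozenset of active bit indices."""
    return frozenset(rng.sample(range(n_bits), n_active))


def overlap(a, b):
    """Similarity is the number of active bits two SDRs share."""
    return len(a & b)


def add_noise(sdr, rng, n_flip, n_bits=N_BITS):
    """Turn off n_flip active bits and turn on n_flip inactive ones."""
    dropped = set(rng.sample(sorted(sdr), n_flip))
    inactive = [i for i in range(n_bits) if i not in sdr]
    added = set(rng.sample(inactive, n_flip))
    return frozenset((sdr - dropped) | added)


rng = random.Random(0)
a = random_sdr(rng)
b = random_sdr(rng)
noisy_a = add_noise(a, rng, n_flip=10)

# Capacity: the number of distinct ways to choose the active bits.
capacity = math.comb(N_BITS, N_ACTIVE)

print(f"sparsity: {N_ACTIVE / N_BITS:.3f}")
print(f"distinct patterns: about 10^{len(str(capacity)) - 1}")
print("overlap(a, noisy_a):", overlap(a, noisy_a))
print("overlap(a, b):", overlap(a, b))
```

Even with a quarter of its active bits moved, the noisy copy still shares far more bits with the original than an unrelated random SDR does, which is what lets a simple overlap threshold act as a match test.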

Sequence memory. HTM learns sequences over time, so prediction is a by-product of learning: see part of a sequence, predict the rest, and treat a violated prediction as an anomaly. The mechanism is set out in Hawkins and Ahmad's Why Neurons Have Thousands of Synapses, a Theory of Sequence Memory in Neocortex (Frontiers in Neural Circuits, 2016).
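The predict-then-compare loop can be sketched in Python. This is a deliberately simplified stand-in, not HTM's temporal memory: it only remembers which bits followed the previous input (first-order context), whereas the published mechanism represents longer context. The class name `FirstOrderSequenceMemory` and the four-bit codes for A to D are hypothetical. The anomaly score is the fraction of active bits that were not predicted.

```python
class FirstOrderSequenceMemory:
    """Toy sequence memory: remembers which bits followed each input."""

    def __init__(self):
        self.transitions = {}   # previous SDR -> set of bits seen next
        self.previous = None

    def step(self, sdr):
        """Score the input against the prediction, then learn from it."""
        sdr = frozenset(sdr)
        if not sdr:
            raise ValueError("an SDR needs at least one active bit")
        predicted = self.transitions.get(self.previous, set())
        # Anomaly: share of active bits the memory did not predict.
        anomaly = 1.0 - len(predicted & sdr) / len(sdr)
        if self.previous is not None:
            self.transitions.setdefault(self.previous, set()).update(sdr)
        self.previous = sdr
        return anomaly


# Hypothetical encoder: each symbol gets four bits of its own.
codes = {s: frozenset(range(i * 4, i * 4 + 4)) for i, s in enumerate("ABCD")}

memory = FirstOrderSequenceMemory()
scores = [memory.step(codes[s]) for s in "ABCABCABD"]
print(scores)
```

The first pass through A, B, C is all surprise, the repeats are fully predicted, and the final D, arriving where C was expected, is flagged as an anomaly.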

Continual learning. HTM learns from the stream as it arrives, with no separate training phase and without wiping what it knew when the data shifts — demonstrated for streaming data in Cui, Ahmad & Hawkins, 2016. That is the behavior an always-on system actually needs.
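A small Python sketch shows what learning from the stream looks like in practice. It is an assumption-laden illustration rather than HTM's learning rule: `OnlinePredictor` is a hypothetical class that keeps running counts of which symbol followed which, predicting and updating on every input with no separate training phase.

```python
from collections import Counter, defaultdict


class OnlinePredictor:
    """Predicts the next symbol from running transition counts."""

    def __init__(self):
        self.counts = defaultdict(Counter)   # symbol -> Counter of followers
        self.previous = None

    def predict(self):
        """Return the most frequent follower of the previous symbol, if any."""
        followers = self.counts.get(self.previous)
        if not followers:
            return None
        return followers.most_common(1)[0][0]

    def observe(self, symbol):
        """Predict first, then learn from the symbol; return True on a hit."""
        hit = self.predict() == symbol
        if self.previous is not None:
            self.counts[self.previous][symbol] += 1
        self.previous = symbol
        return hit


predictor = OnlinePredictor()
stream = "ABC" * 3 + "ABD" * 5   # the pattern shifts partway through
hits = [predictor.observe(symbol) for symbol in stream]

print("hit on first D:", hits[11])
print("hit on last D:", hits[-1])
print("old B->C count still stored:", predictor.counts["B"]["C"])
```

When the stream shifts from ABC to ABD, the predictor is wrong at first, adapts once the new evidence outweighs the old, and still holds the earlier B to C transition instead of discarding it.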

A Thousand Brains

In 2021, A Thousand Brains sharpened the picture with a surprise: the cortex does not hold one model of the world but thousands. The Thousand Brains Theory proposes that the cortex is built from many near-identical cortical columns, on the order of 150,000, each learning complete models of objects using reference frames. A reference frame is a coordinate system anchored to the thing being modeled, so a column can predict which feature it will sense at a given location as the sensor moves. No single column is in charge; the columns vote, and the consensus is your single perception. Memory, in this view, is redundant and parallel on purpose.
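The voting idea can be illustrated with a toy Python sketch. Everything here is hypothetical: the `OBJECTS` table, the feature names, and the functions `column_vote` and `consensus` are invented for illustration, and for brevity all columns share one table of object models, whereas in the theory each column learns its own. Each column senses one feature at one location in an object's reference frame and votes for every object consistent with it.

```python
from collections import Counter

# Hypothetical object models: the feature found at each location
# in the object's own reference frame.
OBJECTS = {
    "mug": {(0, 0): "curved", (1, 0): "handle", (0, 1): "rim"},
    "can": {(0, 0): "curved", (1, 0): "curved", (0, 1): "rim"},
    "box": {(0, 0): "flat", (1, 0): "flat", (0, 1): "edge"},
}


def column_vote(location, feature, models=OBJECTS):
    """One column: every object consistent with a feature at a location."""
    return {
        name for name, layout in models.items()
        if layout.get(location) == feature
    }


def consensus(observations, models=OBJECTS):
    """Tally the votes of all columns and return the best-supported objects."""
    tally = Counter()
    for location, feature in observations:
        tally.update(column_vote(location, feature, models))
    if not tally:
        return set()
    best = max(tally.values())
    return {name for name, votes in tally.items() if votes == best}


one_column = column_vote((0, 0), "curved")
three_columns = consensus([
    ((0, 0), "curved"),
    ((1, 0), "handle"),
    ((0, 1), "rim"),
])
print("one column:", sorted(one_column))
print("three columns:", sorted(three_columns))
```

A single column touching a curved surface cannot tell a mug from a can; three columns sensing different locations settle on the mug without any one of them being in charge.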

Where it actually stands

Be straight about it: HTM did not win the benchmark race. The field scaled dense networks and attention, and Numenta's neuroscience-first program has stayed off to the side. But three of its bets have aged well — sparse representations, continual learning, and prediction as a first-class operation are all live concerns again. Hawkins's contribution is less a deployable product than a theory of what memory is for, plus a set of properties worth borrowing; the open-source Thousand Brains Project (2024) carries it forward.

What a memory builder should take from it

Three things transfer directly.

Judge a memory by what it predicts. Recall is table stakes; the harder, more useful test is whether a memory lets the system anticipate what comes next. A memory that predicts well has captured the structure of its domain; one that only retrieves has merely filed things away.
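One way to make this test concrete is to score the same memory twice, once on recall and once on next-item prediction. The Python sketch below assumes a tiny hypothetical event log; `recall_rate` and `next_item_accuracy` are illustrative names, not an established benchmark.

```python
def recall_rate(stored, probes):
    """Fraction of probe items the memory can retrieve."""
    return sum(item in stored for item in probes) / len(probes)


def next_item_accuracy(predict, sequence):
    """Fraction of steps where predict(previous) equals the actual next item."""
    pairs = list(zip(sequence, sequence[1:]))
    return sum(predict(prev) == nxt for prev, nxt in pairs) / len(pairs)


history = ["wake", "coffee", "email", "wake", "coffee", "email"]
held_out = ["wake", "coffee", "email", "wake"]

# A snapshot memory keeps the items but not their order.
snapshot_memory = set(history)
# A sequence memory keeps what followed what.
sequence_memory = dict(zip(history, history[1:]))

recall = recall_rate(snapshot_memory, held_out)
# The snapshot memory has no order information, so it cannot predict.
blind = next_item_accuracy(lambda prev: None, held_out)
informed = next_item_accuracy(sequence_memory.get, held_out)

print("snapshot recall:", recall)
print("snapshot prediction:", blind)
print("sequence prediction:", informed)
```

Both memories retrieve everything they were shown, but only the one that stored order can anticipate the next event, which is the difference the recall score alone hides.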

Store sequences, not just snapshots, and consider sparse codes. Temporal structure is where anticipation comes from. Sparse, high-dimensional representations give up some dense-vector convenience in exchange for robustness, capacity, and a clean overlap-equals-similarity property, which makes them worth considering in a memory layer.

Learn from the stream. Continual, online learning — absorb new experience as it arrives, without a retraining cycle and without erasing the past — is exactly what a long-lived agent memory needs.

Hawkins belongs with the other pioneers because he asked the question their mechanisms answer in pieces: not how to store a memory, but what a memory is for. His answer — to predict — is a good compass for building one.

Common questions

Who is Jeff Hawkins?

The creator of the PalmPilot who turned to brain theory — founder of the Redwood Neuroscience Institute (2002) and Numenta (2005), author of On Intelligence and A Thousand Brains, originator of HTM.

What is the memory-prediction framework?

Hawkins's theory (2004) that the cortex's core operation is predicting its next input from stored sequences; understanding is successful prediction.

What is Hierarchical Temporal Memory (HTM)?

A biologically constrained model of the neocortex, built on sparse distributed representations, sequence memory, and continual learning, that learns from streaming data without supervision and predicts what comes next.

What are sparse distributed representations (SDRs)?

High-dimensional binary vectors with ~2% of bits active — robust, high-capacity, with overlap encoding similarity (Ahmad & Hawkins, 2015).

What is the Thousand Brains Theory?

The idea (2021) that the cortex is ~150,000 cortical columns, each modeling the world with reference frames, combined by voting.

Related

Hopfield Networks: The Memory Model That Became Attention — energy-based associative memory
Geoffrey Hinton: The Boltzmann Machine and Generative Memory — learned, generative memory
Bernard Widrow: From the LMS Rule to Cognitive Memory — the learning-rule lineage
Schema Formation: How Memory Builds Reusable Structure — how episodes consolidate into reusable structure, and what that means for agent memory
Building Memory That Scales — sparse, sequential, continually-learning memory as an engineering problem
AI Agent Memory: The 2026 Landscape — where these ideas sit today

