

Apr 6, 2026 · 16 min read

Claude Code has three memories. Yours probably should too.

Everyone read the Claude Code leak looking for frustration regexes and April Fools easter eggs. The interesting part was quieter: the three-part memory system that decides what your agent remembers between turns, sessions, and weeks.

Flowpatrol Team · Builders

The part of the leak nobody talked about

If you spent last weekend scrolling the Claude Code source leak, you probably saw the same three takes: the regex that logs when you swear at it, the undercover mode prompt, and the 3,167-line print.ts function everyone is still laughing at.

Here is the part that barely got mentioned and actually matters if you use Claude Code for real work: it doesn't have one memory. It has three. They run on different clocks, live in different places, and do completely different jobs. And once you see the split, a lot of Claude Code's behavior — the things that feel magical, and the things that feel infuriating — stops being mysterious.

This post walks through all three systems, the exact trigger mechanics for each, why the design is interesting, and what to copy if you're building your own agent. If you just want the one-line summary: fast scratchpad in context, durable topic files on disk, consolidation pass in the background. That's it. Everything else is rate limits and locks.

On the source

Everything in this post comes from the feature-flag names, function signatures, and control flow exposed in the accidentally-published Claude Code source map. We're not quoting leaked prompts or model weights — just the mechanics of the memory subsystem. If Anthropic ships a follow-up explainer, the shapes in here should still match what you see.


Memory #1: the scratchpad that survives compaction

The first system is called Session Memory. It's the short-lived one, and it's clever in a way that's easy to miss.

Here's the setup from the leaked source, for anyone who wants the exact numbers:

  • Gated behind a feature flag called tengu_session_memory.
  • 12K token hard cap.
  • Fixed sections — not free-form notes, but structured slots the model writes into.
  • Fires on a cadence: every 5,000 tokens consumed, and every 3 tool calls since the last write.

What it does is simple and clever. The model is constantly dumping the important bits of the current session — decisions made, constraints the user mentioned, the shape of the plan, the thing you told it not to touch — into this fixed scratchpad. When Claude Code eventually has to compact the conversation (drop older messages to make room), the scratchpad survives.

Session Memory: the scratchpad captures information from older turns before they get evicted by compaction, writing on a cadence of every 5,000 tokens or every 3 tool calls

This is the "why did it still remember I said don't modify this file?" mechanism. The raw turn where you said it is long gone from the context window. The scratchpad entry isn't.

Two details are worth dwelling on because most agent memory systems get them wrong.

Fixed sections, not free-form. The scratchpad isn't "a place to write notes." It's a small form with designated fields — plan, constraints, open questions, recent decisions, whatever the exact schema is. When the model goes to update it, it isn't inventing the structure. It's filling in slots. This matters because free-form memory drifts. Every write is a chance to change the format, change the voice, change what "important" means. Fixed sections anchor the model: whatever the schema is today will be the schema two hours from now.

Write on a cadence, not every turn. Writing memory every turn is the obvious design and it's the wrong one — it's expensive, and most turns don't contain anything worth remembering. Claude Code waits for either 5,000 tokens of context to accumulate OR 3 tool calls, whichever hits first. The token counter catches "the user said a lot of stuff." The tool-call counter catches "the agent did a lot of stuff." Both are good proxies for "something worth capturing probably happened." Neither fires on a single short exchange.
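The two-counter gate is tiny in code. Here's a minimal sketch in Python — the class and field names are my own invention, only the thresholds (12K cap, 5,000 tokens, 3 tool calls) come from the leak:

```python
from dataclasses import dataclass, field

TOKEN_INTERVAL = 5_000    # write every 5,000 tokens consumed...
TOOL_CALL_INTERVAL = 3    # ...or every 3 tool calls, whichever hits first
HARD_CAP_TOKENS = 12_000  # the scratchpad's own hard cap

@dataclass
class Scratchpad:
    # Fixed sections: the model fills slots, it never invents structure.
    plan: str = ""
    constraints: list = field(default_factory=list)
    open_questions: list = field(default_factory=list)
    recent_decisions: list = field(default_factory=list)

class CadenceGate:
    """Decides when a scratchpad write fires."""

    def __init__(self):
        self.tokens_since_write = 0
        self.tool_calls_since_write = 0

    def observe(self, tokens: int = 0, tool_calls: int = 0) -> bool:
        self.tokens_since_write += tokens
        self.tool_calls_since_write += tool_calls
        if (self.tokens_since_write >= TOKEN_INTERVAL
                or self.tool_calls_since_write >= TOOL_CALL_INTERVAL):
            # Reset both counters and tell the caller to refresh the pad.
            self.tokens_since_write = 0
            self.tool_calls_since_write = 0
            return True
        return False
```

Note that a single short exchange trips neither counter — which is exactly the mid-tool-storm forgetting described above.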

It also explains the opposite frustration — the moment where you mentioned something in passing and the agent just... forgot. If you dropped a constraint mid-tool-storm, and the scratchpad cycle didn't land between your mention and the next compaction, your constraint was evicted along with the turn that contained it. The model sounds confident, but the original message is gone and nothing stood in for it.

The lesson builders can use tomorrow: if you want Claude Code to reliably remember a constraint across a long session, say it once, then ask it to add it to the plan or write it down. That forces it into the scratchpad on the next write cycle. Drop-in constraints made during heavy tool use are the ones that vanish.

Make your constraints stick

When you drop an important rule ("don't touch auth.ts", "use pnpm not yarn"), say it once and ask Claude Code to add it to the plan. That single sentence routes your constraint through Session Memory on the next cadence tick. Constraints mentioned in passing during a tool storm are the first thing compaction drops.


Memory #2: the durable one, with individual files

The second system is Auto Memory, and it's completely different.

  • Gated behind tengu_passport_quail. (Yes, really.)
  • Lives on disk in a memory directory, one file per topic, each with YAML frontmatter.
  • Hard cap of 5 turns per extraction run.
  • Skips entirely if the main agent already wrote to memory that turn.

Session Memory is a scratchpad that dies with the session. Auto Memory is a small filing cabinet that lives across them. The key design choice worth noticing is one file per topic. Not a single monolithic memory blob. Not a vector store. Individual markdown-style files with frontmatter, each scoped to a thing — "the user prefers Tailwind," "this repo uses yarn not npm," "don't touch the legacy auth flow."

Auto Memory: one markdown file per topic with YAML frontmatter, written by a rate-limited extractor that skips if the main agent already wrote that turn, and overwrites facts in place when they change

Three things make this design stand out.

One file per topic beats one blob or one vector store. Topic files are retrievable, editable, and diff-able by the model itself. The agent can read "what do I know about this user's repo" as a set of discrete facts instead of trying to recall a soup. A vector DB forces you to trust similarity search to find the right memory — which works until it doesn't, and when it doesn't you have no way to tell why. A topic file has a filename. The model can just open it. You can too, if you want to see what it thinks it knows about you.

Overwrites, not appends. When a fact becomes wrong, the whole point is that you replace it. The user used to prefer Tailwind v3 and now prefers v4 — great, open tailwind.md, write the new version, done. Append-only memory sounds safer but it's the thing that rots fastest: old facts pile up next to new ones, the model doesn't know which is current, and retrieval starts returning both. Claude Code treats topic files the way you'd treat a notebook: open, update, close.

Rate-limited writes, with a skip-if-busy check. The extractor is capped at 5 turns per run, and — this is the subtle bit — if the main agent already wrote to memory during the same turn, extraction is skipped entirely. That "skip-if-the-agent-already-wrote" check is the kind of thing you only add after running into the failure: two writers racing to describe the same fact in slightly different words, one overwriting the other, and the memory file ending up in a worse state than before anyone wrote. The skip is a mutex by another name. Most home-rolled agent memory systems fire every turn, never coordinate, and end up either bankrupting you on tokens or silently overwriting themselves.

If you're building your own agent, the takeaway is almost insultingly simple: stop trying to build one giant memory. Build topic files. Let the agent pick which file to read. Let it overwrite. Rate-limit the writer. Add a skip-if-the-agent-already-wrote check. You are now 80% of the way to Claude Code's design.

Picture what a single topic file actually looks like on disk:

~/.claude/memory/
├── tailwind.md
├── package-manager.md
└── legacy-auth.md

$ cat ~/.claude/memory/tailwind.md
---
topic: styling preference
scope: user · global
updated: 2026-04-04
---
User prefers Tailwind v4. Uses the new @theme directive
and avoids @apply in component files. Do not suggest
styled-components, CSS modules, or sx props.

One fact. One file. Human-readable, agent-editable, trivially overwritable. That's the whole trick.
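That on-disk shape maps almost directly onto code. A minimal sketch of the durable store with the overwrite and skip-if-busy behavior described above — all class and method names are hypothetical, not from the leak:

```python
from datetime import date
from pathlib import Path

class TopicStore:
    """One markdown file per topic, YAML frontmatter, overwrite-in-place."""

    def __init__(self, root: Path):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        # Set by the main agent's own writes; the extractor checks it.
        self.agent_wrote_this_turn = False

    def write(self, topic: str, scope: str, body: str) -> bool:
        # Skip-if-busy: if the main agent already wrote memory this turn,
        # the extractor stays out of the way — a mutex by another name.
        if self.agent_wrote_this_turn:
            return False
        frontmatter = (
            "---\n"
            f"topic: {topic}\n"
            f"scope: {scope}\n"
            f"updated: {date.today().isoformat()}\n"
            "---\n"
        )
        # Overwrite, don't append: when a fact changes, the old one goes away.
        (self.root / f"{topic}.md").write_text(frontmatter + body + "\n")
        return True

    def read(self, topic: str):
        path = self.root / f"{topic}.md"
        return path.read_text() if path.exists() else None
```

The point of the sketch: reading is just opening a file by name, and updating a stale fact is one `write_text` call — no re-embedding, no index maintenance.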


Memory #3: the one that runs while you sleep

Here's where it gets weirder, and genuinely interesting.

There's a third system in the leaked code called the Dream System, gated behind tengu_onyx_plover. It's a background memory consolidation pass — and the trigger gate is worth reading carefully.

Three conditions, all of which must hold before Dream runs:

  • ≥ 24 hours since the last run (cooldown)
  • ≥ 5 sessions since the last run (activity proof — "did anything even happen?")
  • An advisory lock acquired (single-runner guarantee — "nobody else is already doing this")

Each one is there for a reason. The 24-hour cooldown stops Dream from running all the time and burning tokens consolidating nothing. The 5-session threshold stops it from running when you've been dormant for a week and there's nothing new to consolidate. The advisory lock stops two Claude Code processes from stepping on each other if you happen to have two terminals open on the same machine. Miss any of those and the system either wastes money or corrupts its own memory.
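The gate itself is a few lines on a Unix system, using `fcntl.flock` for the advisory lock. A sketch under stated assumptions — the state dict and function name are mine, only the thresholds and the three-gate structure come from the leak:

```python
import fcntl
from pathlib import Path

COOLDOWN_SECONDS = 24 * 3600   # gate 1: at least 24 hours since the last run
MIN_SESSIONS = 5               # gate 2: at least 5 sessions since the last run

def try_start_dream(state: dict, lock_path: Path, now: float) -> bool:
    """Returns True only if all three gates pass and this process may run."""
    if now - state["last_run"] < COOLDOWN_SECONDS:
        return False           # cooldown: not worth the tokens yet
    if state["sessions_since_run"] < MIN_SESSIONS:
        return False           # activity proof: did anything even happen?
    lock_file = open(lock_path, "w")
    try:
        # Gate 3: advisory lock — the single-runner guarantee across processes.
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        lock_file.close()
        return False           # another process is already dreaming
    state["lock_file"] = lock_file  # hold the lock until the run finishes
    return True
```

The non-blocking `LOCK_NB` matters: a second runner should give up immediately, not queue behind the first and run a redundant pass when it finishes.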

The Dream System: the triple trigger gate must be fully satisfied before the four-phase loop runs. Orient to scan what's new, Gather to load topic files, Consolidate to merge related facts, Prune to drop stale ones

When the gate passes, the four-phase loop runs: Orient → Gather → Consolidate → Prune.

  • Orient — scan the inbox of recent activity. What's new since last time? Which topic files got touched? Which sessions ended in a weird state?
  • Gather — load the relevant topic files from disk. Not all of them. The ones that might need an update.
  • Consolidate — the hard phase. Merge related facts, reconcile contradictions, summarize the bits that got too long.
  • Prune — drop the stale ones. Facts from a repo the user hasn't touched in months; half-finished plans that never went anywhere; memory that's simply no longer true.
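Stripped of the LLM calls, the loop's skeleton is small. A sketch over an in-memory store (topic → list of raw facts), where simple deduplication stands in for the real Consolidate phase, which would be a model call:

```python
def dream_pass(topics: dict, touched: set, stale: set) -> dict:
    """One consolidation run: Orient → Gather → Consolidate → Prune."""
    result = {}
    for topic, facts in topics.items():
        # Prune: drop what's no longer true or went nowhere.
        if topic in stale:
            continue
        if topic in touched:
            # Gather + Consolidate: dedupe while preserving order. A real
            # system merges and reconciles prose facts with an LLM here.
            result[topic] = list(dict.fromkeys(facts))
        else:
            # Orient said nothing changed: pass through unchanged.
            result[topic] = facts
    return result
```

How `touched` and `stale` get computed is the Orient phase's job — scanning recent session activity, not the topic files themselves.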

This is the thing that explains the spooky Claude Code moments. You come back after a week on a different project, open it on the old repo, and it already knows things in a suspiciously compressed way — not verbatim from a past conversation, but summarized, like someone reread their own notes and tidied up. Now we know why. Something ran overnight and actually reread them.

The architectural idea is stolen from how brains are believed to consolidate memory during sleep — write fast during the day, consolidate slowly in the background, prune what's stale. Calling it "Dream" is the kind of thing you roll your eyes at until you notice the triple trigger gate (time + activity + lock) is the same pattern you'd use for a well-behaved cron job, and it actually makes the fast-path writes cheaper because they don't have to be perfect. Auto Memory can capture sloppy, duplicated, occasionally contradictory facts on the hot path. Dream cleans up later.

If you're building anything stateful with an LLM, this is the pattern to copy. The expensive, thoughtful memory work doesn't have to happen on the user's turn. It can happen later, batched, with a real lock so you don't run it twice.

Don't consolidate on every turn

The most common failure mode in home-rolled agent memory is running the "merge and clean up" pass synchronously on every user turn. It's slow, expensive, and it fights with the agent's actual writes. Gate consolidation behind time and activity and a lock. Run it when nobody is looking.


Why most agent memory systems get this wrong

The default agent memory approaches fall into two buckets, and both of them are worse than what Claude Code does.

Agent memory: the usual wrong answers vs. three specialized memories. Stuffing everything in the prompt bleeds tokens; dumping everything in a vector DB makes retrieval opaque. Three specialized memories — fast scratchpad, durable topic files, background consolidator — beat both.

Option one: stuff everything in the prompt. This is the "just expand the context window" approach. It works until the context window fills up, at which point you either truncate (losing stuff you care about) or pay increasingly absurd prices for tokens that mostly don't matter. It also makes every turn slower, because the model is rereading 40 pages of history to answer "what time is it."

Option two: dump everything in a vector database. This is the "RAG it" approach. It sounds more sophisticated, and it's often worse in practice. Vector search finds memories by similarity to the current turn, which means (a) you can't tell why a particular memory was retrieved, (b) you can't tell why a particular memory wasn't retrieved, and (c) you can't easily overwrite a memory when the underlying fact changes. You're at the mercy of the embedding model's idea of what's similar to what. And the moment you try to evict stale memories, you're building a second system on top of the first just to manage it.

Claude Code's answer is neither. It's three specialized memories, each with a clear job, each rate-limited, and a background pass that tidies up while nobody is looking.

  • Session Memory is fast, fixed-size, and in context. It catches the things you need right now, in this session.
  • Auto Memory is durable, per-topic, and on disk. It catches the things you need across sessions, in a format you can actually read and overwrite.
  • Dream is slow, batched, and gated by three independent conditions. It catches the things that require thought — merging, pruning, summarizing — and it does them when the user isn't looking.

None of these three would be sufficient on its own. Session Memory without Auto Memory forgets everything between sessions. Auto Memory without Dream accumulates contradictions and stale facts. Dream without the other two has nothing to consolidate. The design insight is that memory is not one problem — it's three problems on three different timescales, and the right answer is three different systems.


Why none of this is in the docs

Claude Code is a black box you run in your terminal. Anthropic never published the memory architecture because they didn't have to — it's an implementation detail. But once you can see the shape of it, you can feel it in how the product behaves. The session that got 40 turns deep and somehow still remembered the thing from turn 3. The week-later session that knew weirdly compressed facts about your repo. The constraint you mentioned once in the middle of a tool storm that got dropped like it never existed.

None of that is magic. It's three systems — a fast fixed-size scratchpad, a filing cabinet of topic files, and a background consolidator — doing their assigned jobs on their own clocks.

The reason this is worth paying attention to, even if you never look at the source yourself, is that every agent you build in 2026 is going to run into the same problem. Your context window is finite. Your users expect the thing to remember stuff across days and weeks. And the easy wrong answers — stuff it in the prompt, dump it in a vector DB — have been the easy wrong answers for two years now. Claude Code's design is a working, production, battle-tested alternative, and it's been hiding in plain sight the whole time.

Steal it. They can't sue you for stealing an architecture.

The 80/20 rewrite of your agent's memory

If you take one thing from this post: add a fixed-schema scratchpad that writes on a cadence, put durable facts in one-file-per-topic markdown, and run consolidation as a background job gated by time + activity + a lock. You don't need a vector DB. You don't need a bigger context window. You need three small systems that each know their job.


What to actually do with this

If you use Claude Code:

  • Long sessions are more reliable when you ask it to write down the plan and the constraints explicitly once. That routes them through Session Memory. Whispered asides don't.
  • If you drop a constraint mid-tool-storm and it gets forgotten, that's the scratchpad cycle missing it. Re-state it and ask the model to note it.
  • If you're coming back after a gap, expect Auto Memory to know compressed, summarized things about the repo — not exact quotes. Don't rely on it for precise rules.
  • If you want a rule to stick across sessions, tell Claude Code it should remember it. That's the trigger for the durable path.

If you're building your own agent:

  • Split memory into layers. A fast scratchpad (in-context, fixed size, overwritten on a cadence), a durable store (on-disk, one file per topic, rate-limited writer), and a background consolidator (gated by time, activity, and a lock).
  • Fixed schemas beat free-form notes. Give the scratchpad designated slots. The model will fill them. Drift is your enemy.
  • Write on a cadence, not every turn. Every N tokens or every N tool calls, whichever hits first. Both matter.
  • One file per topic, and let the agent overwrite. Retrievable, diff-able, replaceable. The idea of "immutable memory" sounds safe and is actually the thing that rots fastest.
  • Skip-if-busy. If the main agent already wrote to memory this turn, don't have the extractor write again. Two writers, one fact, no coordination — that's how you corrupt your own store.
  • Consolidate in the background, behind a triple-gate. Time since last run, activity since last run, advisory lock. Miss any of those and you either waste money or race yourself.
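Put together, the checklist fits in one small class. A sketch with illustrative names and thresholds (nothing here is Claude Code's actual API), showing how the three layers hang off one turn loop:

```python
class AgentMemory:
    """Three layers on three clocks (all names and numbers illustrative)."""

    def __init__(self):
        self.scratchpad = {"plan": "", "constraints": []}  # layer 1: fixed slots
        self.topics = {}                                   # layer 2: durable facts
        self.tokens = 0
        self.tool_calls = 0
        self.sessions_since_dream = 0

    def on_turn(self, tokens=0, tool_calls=0, constraint=None, fact=None):
        self.tokens += tokens
        self.tool_calls += tool_calls
        if constraint:
            self.scratchpad["constraints"].append(constraint)
        # Layer 1: refresh on a cadence, not every turn.
        if self.tokens >= 5_000 or self.tool_calls >= 3:
            self.tokens = 0
            self.tool_calls = 0
        # Layer 2: durable writer — overwrite per topic, never append.
        if fact:
            topic, body = fact
            self.topics[topic] = body

    def maybe_dream(self) -> bool:
        # Layer 3: background consolidation, gated by activity.
        # (The time cooldown and advisory lock are elided for brevity.)
        if self.sessions_since_dream < 5:
            return False
        self.sessions_since_dream = 0
        # Consolidate/prune would run here: batched, off the hot path.
        return True
```

The division of labor is the point: the hot-path methods are cheap and dumb, and everything expensive lives behind `maybe_dream`'s gate.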

The frustration regex will get all the tweets. This is the part of the leak worth copying.
