May 22, 2026 | Engineering

Surviving Claude Compaction: How to Make Your Sessions Outlive Their Context Window



Agent amnesia is the productivity problem I keep coming back to. Every fresh session starts cold. Every compaction inside a long session severs the chain of reasoning the model had built up. Every hand-off across days, machines, or developers loses what the previous session learned. The work each new session needs is the same work the previous session already did. Multiply by the number of sessions you run in a week and the cost is real.

The bigger story is that amnesia is the root source of the hallucination problem. As developers, we accumulate institutional knowledge over years on a codebase. The product’s quirks, the team’s conventions, the historical reason behind a weird pattern, the gotchas nobody bothered to write down. Agents accumulate none of that. Every session starts at zero institutional grounding. We expect them to know, especially in a repo we work on every day, where the principles feel obvious to us. So we re-explain. Every session. And on the session where we forget to explain one of them, because we have taken it for granted ourselves, the agent operates without it. It writes something that looks plausible and is wrong. That is the hallucination. The agent did not know your specific product or your internal process, and it could not, because nobody told it this session.

I have been chipping away at this for a while. The first version of my fix was a per-project SQLite store that captured each session’s reasoning to a local database keyed by working directory. It worked. It also bound the memory to one project, one machine, one developer, and the schema needed maintaining by hand. The system I run now is the evolution of that pattern, factored to be token-efficient, retrieval-driven, and decoupled from any particular project or developer.

What follows is the architecture as I deploy it, in five tiers. The first tier is free and uses files Claude Code already writes to your disk. The fifth is the team version I am rolling out for a contract engineering team. The pattern is the same at every level. The right tier for you depends on how much continuity you need across sessions, machines, and people, and how much infrastructure you want to absorb to get it.

The pattern, briefly

Compaction destroys context because the context window is the only memory the model has. The fix is to give it a second memory. That memory lives outside the context window, persists across sessions, and gets retrieved at the start of each new session.

flowchart LR
    A["Tier 1<br/>Existing JSONLs<br/>on disk"] --> B["Tier 2<br/>Gitignored mirror<br/>in your repo"]
    B --> C["Tier 3<br/>Scrubbed mirror<br/>secrets stripped"]
    C --> D["Tier 4<br/>Personal brain<br/>retrievable"]
    D --> E["Tier 5<br/>Team brain<br/>shared continuity"]

Each tier solves a slightly bigger version of the same problem. Tier 1 makes sure the conversation is not lost to compaction, because the full transcript is already on disk. Tier 2 protects that history from Claude Code’s local cleanup and makes it portable with the repo. Tier 3 makes it safer to keep. Tier 4 makes it searchable. Tier 5 makes it shared with collaborators.

Pick whichever tier matches the cost you are willing to absorb. Do not feel obliged to climb the ladder.

Tier 1: Use the chat logs you already have

Claude Code already writes every session to disk. On my Windows laptop they live at:

C:\Users\<you>\.claude\projects\<encoded-cwd>\<session-uuid>.jsonl

On macOS and Linux it is ~/.claude/projects/.... Each project gets a folder whose name is the absolute path of the working directory with the path separators encoded. Inside that folder is one JSON-lines file per session, named after the session UUID. Each line is a single record: a user message, an assistant response, a tool call, or a tool result.

If you compact mid-session and want to recover what was forgotten, the source is sitting there. You can open the file in a text editor. You can grep it. You can read the recent assistant turns to remind yourself what you decided to rule out.
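Reading the recent assistant turns can be scripted. This is a minimal sketch, not a definitive reader: it assumes each record is a JSON object with a `type` field and a `message.content` that is either a string or a list of blocks carrying `type` and `text`. That shape is an assumption about the transcript format and may differ between Claude Code versions. The function name `recent_assistant_text` is mine.

```python
import json
from pathlib import Path


def recent_assistant_text(jsonl_path, limit=5):
    """Return the text of the last `limit` assistant records in a transcript."""
    texts = []
    with Path(jsonl_path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # tolerate a partially written or corrupt line
            # Assumed schema: {"type": "assistant", "message": {"content": ...}}
            if not isinstance(record, dict) or record.get("type") != "assistant":
                continue
            message = record.get("message")
            content = message.get("content") if isinstance(message, dict) else None
            if isinstance(content, str):
                texts.append(content)
            elif isinstance(content, list):
                # Keep only text blocks; tool calls and other block types are skipped.
                parts = [b.get("text", "") for b in content
                         if isinstance(b, dict) and b.get("type") == "text"]
                joined = "\n".join(p for p in parts if p)
                if joined:
                    texts.append(joined)
    return texts[-limit:]
```

Point it at one session file under the projects folder and print the result to see what the model had concluded before compaction.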

So tier 1 is, honestly, just “know where it lives”. Most people I have shown this to had no idea Claude Code was writing it down at all.

There is one catch worth mentioning, and it is the reason most people end up at tier 2.

The 30-day gotcha

Claude Code defaults to deleting session transcripts older than 30 days. The setting is cleanupPeriodDays in ~/.claude/settings.json. If you do not look at it, the default fires silently, your older sessions disappear, and you only notice when you go looking for a decision you made six weeks ago and the file is gone.
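If you would rather change the value from a script than by hand, something like the following works. It is a sketch under one assumption: that the settings file is plain JSON with cleanupPeriodDays as a top-level key, as described above. The helper name `set_cleanup_period` is hypothetical.

```python
import json
from pathlib import Path


def set_cleanup_period(settings_path, days):
    """Set cleanupPeriodDays in a settings file, preserving every other key.

    Returns the previous value, or None if the key was absent
    (meaning the default was in effect).
    """
    path = Path(settings_path)
    settings = {}
    if path.exists():
        text = path.read_text(encoding="utf-8").strip()
        if text:
            settings = json.loads(text)
    previous = settings.get("cleanupPeriodDays")
    settings["cleanupPeriodDays"] = int(days)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return previous
```

For example, `set_cleanup_period(Path.home() / ".claude" / "settings.json", 3650)` keeps roughly ten years of transcripts. Back up the file first if you have hand-tuned it.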

You can raise the limit. You can also set it to a very large number. But raising the limit does not solve the second problem, which is that the JSONLs live in your global Claude config folder, not next to the repo they belong to. If you move machines, switch laptops, or pull the project on a fresh checkout, the history does not travel.

Tier 2 addresses both.

Tier 2: Mirror the session into the repo

A small Stop hook copies the just-finished session JSONL into a gitignored folder inside the repo. The destination lives next to the code it documents, and it is no longer at the mercy of Claude Code’s cleanup window.

Add .claude-history/ to your .gitignore. Then write the hook:

#!/usr/bin/env python3
"""Stop hook: mirror this session's JSONL into <repo>/.claude-history/."""
import json
import shutil
import sys
from pathlib import Path

def main():
    payload = json.load(sys.stdin)
    session_id = payload.get("session_id")
    cwd = Path(payload.get("cwd", "."))
    if not session_id:
        return

    # Prefer the transcript path Claude Code passes in the hook payload.
    transcript = payload.get("transcript_path")
    if transcript:
        src = Path(transcript).expanduser()
    else:
        # Fallback: rebuild the per-cwd folder name. This encoding is a best
        # guess and may not match every platform or Claude Code version.
        encoded = str(cwd).replace(":", "").replace("\\", "-").replace("/", "-").lstrip("-")
        src = Path.home() / ".claude" / "projects" / encoded / f"{session_id}.jsonl"
    if not src.exists():
        return

    # Copy the whole transcript; each run overwrites the previous mirror.
    dest_dir = cwd / ".claude-history"
    dest_dir.mkdir(exist_ok=True)
    shutil.copy2(src, dest_dir / src.name)

if __name__ == "__main__":
    main()

Wire it up in your project’s .claude/settings.json:

{
  "hooks": {
    "Stop": [
      {
        "matcher": ".*",
        "hooks": [{ "type": "command", "command": "python .claude/hooks/mirror-session.py" }]
      }
    ]
  }
}

One small but load-bearing detail. The Stop hook does not fire at the end of the session. It fires at the end of every Claude turn, the moment the model finishes responding. The hook runs after every reply, refreshing the mirror with the latest turn appended. By the time you close the laptop, the gitignored copy is always current. There is no “I forgot to save” failure mode.

When you come back to the project in three months, the entire conversation history is right there next to the code, browsable, greppable, and as portable as the repo itself.

The point of tier 2 is not retrieval automation. It is making sure you still have the bytes when you decide you want them.

Tier 3: Scrub before you store

Tier 2 has a failure mode worth being honest about. Session transcripts contain everything Claude saw. If you ever pasted an .env file into a turn, ran a tool that printed a credential, or grepped a secret into the model’s view, that value is now sitting in plain text inside .claude-history/. Gitignored is fine until you zip the repo to share with a colleague, accidentally override the ignore rule (for example with git add -f), or reuse the folder as a template for the next project.

A short scrubber takes care of the obvious cases. It pattern-matches the canonical secret shapes (OpenAI keys, GitHub tokens, AWS access keys, JWT tokens, .env-style KEY=value lines, generic bearer tokens) and rewrites them in place.

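"""Stream a JSONL session file through regex secret-scrubbing."""
import json
import re
import sys

PATTERNS = [
    (re.compile(r"sk-[A-Za-z0-9]{32,}"), "[REDACTED:openai_key]"),
    (re.compile(r"ghp_[A-Za-z0-9]{36,}"), "[REDACTED:github_token]"),
    (re.compile(r"ghs_[A-Za-z0-9]{36,}"), "[REDACTED:github_server_token]"),
    (re.compile(r"AKIA[0-9A-Z]{16}"), "[REDACTED:aws_access_key]"),
    (re.compile(r"eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"), "[REDACTED:jwt]"),
    # The inner group makes the optional name prefix apply to all four suffixes,
    # so API_KEY=, GITHUB_TOKEN= and a bare PASSWORD= are all caught.
    (re.compile(r"(?im)^([A-Z0-9_]*(?:KEY|TOKEN|SECRET|PASSWORD))=.+$"), r"\1=[REDACTED:env_var]"),
    (re.compile(r"(?i)bearer\s+[A-Za-z0-9._\-]{20,}"), "Bearer [REDACTED:bearer]"),
]

def scrub(text: str) -> str:
    for pat, replacement in PATTERNS:
        text = pat.sub(replacement, text)
    return text

def scrub_value(value):
    """Scrub every string inside a parsed JSON value."""
    if isinstance(value, str):
        return scrub(value)
    if isinstance(value, list):
        return [scrub_value(v) for v in value]
    if isinstance(value, dict):
        return {k: scrub_value(v) for k, v in value.items()}
    return value

# Scrub the decoded strings, not the serialised line: json.dumps escapes
# newlines, so the line-anchored env-var pattern would never match there.
for line in sys.stdin:
    if not line.strip():
        continue
    sys.stdout.write(json.dumps(scrub_value(json.loads(line))) + "\n")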

Run it inside the mirror hook, before the copy is written to .claude-history/, or as a pre-commit guard on that folder. The full kit with a CLI wrapper comes to about forty lines.

Here is the honest framing. A regex scrubber catches the shapes, not the secrets. It will stop the embarrassing accidents (someone pasted an AWS key, the model echoed your OPENAI_API_KEY=... line in a tool result, a Bearer header appeared in a curl trace). It will not catch custom internal tokens, raw values that bear no recognisable prefix, or anything the model paraphrased rather than quoted verbatim. Treat it as meaningful reduction in blast radius, not as “now safe to publish”. For anything that touches a repo you or your team might one day open-source, layer a real secrets scanner (gitleaks, trufflehog) on top of it before any move that could leak.
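The guard variant mentioned above can be as small as a scan that reports which mirrored files still contain a recognisable secret shape. This is a sketch only. The pattern set is a deliberately short subset, the function names are hypothetical, and it is no replacement for gitleaks or trufflehog.

```python
import re
from pathlib import Path

# A small subset of secret shapes; extend it to match the scrubber you run.
SHAPES = {
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "github_token": re.compile(r"gh[ps]_[A-Za-z0-9]{36,}"),
    "openai_key": re.compile(r"sk-[A-Za-z0-9]{32,}"),
}


def find_secret_shapes(text):
    """Return the sorted labels of every secret shape still present in text."""
    return sorted(label for label, pat in SHAPES.items() if pat.search(text))


def scan_history(history_dir):
    """Map each mirrored transcript that still holds a secret shape to its labels."""
    findings = {}
    for path in sorted(Path(history_dir).glob("*.jsonl")):
        text = path.read_text(encoding="utf-8", errors="replace")
        labels = find_secret_shapes(text)
        if labels:
            findings[path.name] = labels
    return findings
```

A wrapper that calls `scan_history(".claude-history")` and exits non-zero when the result is non-empty is enough to block an accidental commit or archive step.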

Tier 4: Put it in a brain

This is the tier I run. Tier 3 gives you a folder of files you can grep when you remember to. Tier 4 makes the content retrievable, semantically and at speed, without you having to remember anything.

This is also where the institutional grounding problem from the opening gets solved. The principles you have explained, the boundaries you have drawn, the gotchas you have hit, the conventions you have set, all get stored as retrievable artefacts. The new session reads them on resume, whether the previous session ended via compaction, a closed laptop on Friday afternoon, or a hand-off to a colleague. The agent stops starting from zero.

The trick is the same Stop hook from tier 2, but now it posts each turn to the brain as it lands. The brain accepts the raw turn, runs it through a re-summarisation pipeline, writes a hash-addressable topical chunk, and indexes it. On the next session start, a SessionStart hook does a small retrieval pass and injects the relevant chunks into the new context.

sequenceDiagram
    participant CC as Claude Code
    participant Hook as Stop hook<br/>(per turn)
    participant Brain as Brain<br/>(re-summarise + index)
    participant Next as Next session

    CC->>Hook: turn ends
    Hook->>Brain: POST /sessions/finalise<br/>(latest turn)
    Brain->>Brain: chunk + summarise<br/>+ index
    Next->>Brain: SessionStart: search<br/>"what was last touched here?"
    Brain-->>Next: breadcrumb chunks
    Next->>CC: inject into new context
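The two hook-side pieces of that flow can be sketched as follows. Only the /sessions/finalise path comes from my setup as described here. The request body shape, the chunk keys (`id`, `summary`), and both function names are assumptions made for illustration, so adapt them to whatever your brain service actually accepts and returns.

```python
import json
import urllib.request


def build_finalise_request(base_url, session_id, cwd, latest_record):
    """Build (but do not send) the POST that hands the latest turn to the brain."""
    body = json.dumps({
        "session_id": session_id,
        "cwd": cwd,
        "record": latest_record,  # hypothetical body shape
    }).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/sessions/finalise",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def render_breadcrumbs(chunks, max_chunks=5):
    """Format retrieved chunks as plain text for a SessionStart hook to print."""
    lines = ["Breadcrumbs from earlier sessions (verify against the files before relying on them):"]
    for chunk in chunks[:max_chunks]:
        # Assumed chunk shape: {"id": ..., "summary": ...}
        lines.append(f"- [{chunk['id']}] {chunk['summary']}")
    return "\n".join(lines)
```

In the Stop hook, send the request with `urllib.request.urlopen(req, timeout=5)` and swallow network errors so a brain outage never blocks the session. In the SessionStart hook, print the rendered text. To my understanding Claude Code adds that hook's standard output to the new context, but check the hooks documentation for your version.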

Breadcrumbs, not a digest

This is the part of the system that took the most thought, and it is where I think most other session-memory approaches go wrong.

The default instinct is to summarise. Take the previous conversation, run it through a small model, get back a tidy paragraph of “here is what we worked on, here are the decisions we made, here are the conclusions”. Drop that into the next session’s context. The agent has a recap. Job done.

I did exactly that in my first attempt and it failed in a specific way. A condensed summary tells the next agent what happened. It does not tell the agent where the truth lives. The summary is a paraphrase, and an agent that trusts a paraphrase will quote it back, build on top of it, extend it into territory the paraphrase glossed over, and gradually drift away from the real codebase. Hallucination is what happens when an agent is left to happy-path its way off a cliff. A digest does not move the cliff. It just gives the agent more confidence walking towards it.

So I do not write digests. I write breadcrumbs. Each summary chunk captures the structure a continuation agent actually needs:

  • Primary request and intent. The original ask, and the trajectory of how it has evolved across the session. The next agent sees not just “user wants feature X” but “user asked for X, mid-session pivoted to also cover Y, with standing instruction Z applying throughout”.
  • User messages, verbatim. The user’s directives in their own words, chronologically. The next agent reads exactly what was asked, not a paraphrase of what was asked. This single field eliminates a surprising amount of drift, because paraphrase loss is where intent quietly mutates.
  • Key technical concepts. The architecture in play. Stack, ADRs, libraries, design patterns, conventions the previous session established. So the agent does not rediscover the room from scratch.
  • Files and code sections. Not just paths, but what each touched file does and how it fits. “runner.ts: single-agent runner replacing five specialists, imports from critic.ts and tools.ts.” Real paths, real roles.
  • Errors and fixes. Bug plus resolution pair, not bug alone. “Vercel build broke on a synchronous export from a use server file (commit 285ae02). Hot-fixed by extracting the sync helpers to a non-server module (commit 8c742cb).” The lessons that would burn an hour if rediscovered cold.
  • Problem solving. Root-cause analysis for the hard ones. The why behind the fix, so the agent recognises when the same shape will recur.
  • Current work. What was in flight at the moment continuity broke. “Just created signing.ts, next step is wiring the export route to call signPassport when ?signed=1.”
  • Pending tasks. The task list as it stood at the end of the turn.
  • Optional next step. An explicit concrete next action, with file paths and verification commands, written by the previous session for the next session to pick up cold.

The new session reads those breadcrumbs and does not need to trust them. It can re-open the named files, re-read the user’s verbatim words, re-grep the error strings, re-run the verification commands. The summary is not a substitute for the truth. It is an index pointing at the truth, with enough specificity that the agent never has to guess where to look.
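As a data structure, a breadcrumb chunk with those nine fields might look like this. It is an illustrative sketch, not my production schema: the field names, the 16-character truncated SHA-256 used as the hash address, and the `files_to_reopen` helper are all assumptions.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Breadcrumb:
    primary_request: str                                   # the ask and how it evolved
    user_messages: list = field(default_factory=list)      # verbatim, chronological
    key_concepts: list = field(default_factory=list)
    files: dict = field(default_factory=dict)              # path -> role in the system
    errors_and_fixes: list = field(default_factory=list)   # bug plus resolution pairs
    problem_solving: list = field(default_factory=list)
    current_work: str = ""
    pending_tasks: list = field(default_factory=list)
    next_step: str = ""

    def chunk_id(self):
        """Content hash: identical breadcrumbs always get the same address."""
        canonical = json.dumps(asdict(self), sort_keys=True, ensure_ascii=False)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

    def files_to_reopen(self):
        """Paths the next session should re-read instead of trusting the summary."""
        return sorted(self.files)
```

Serialising with sorted keys before hashing means the address depends only on content, not on the order in which the summariser emitted the fields.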

That is the difference. Most session-memory systems aim to replace the previous conversation with a short version. Mine aims to ground the next conversation in the actual artefacts, with the previous reasoning sitting alongside as navigation. The agent stays in the room. The token economy is the wrapper. The grounding is the substance.

Why this tier earns the infrastructure

Retrieval is the secondary benefit. Cross-session search (“we ruled out EAGLE3” matches a question about speculative decoding), semantic plus full-text (“did we ever set max-num-seqs?”), and queries that span repos entirely all come for free once the breadcrumb chunks exist.

I run the brain as a Docker container on an always-on machine on Tailscale, so my laptop can reach it from anywhere. The model doing the re-summarisation is a local 120B mixture-of-experts. No data leaves my network. Inference cost rounds to electricity.

The honest assessment is that this tier carries real infrastructure overhead. A machine that is always on, a service to maintain, a small set of hooks to keep aligned with whatever Claude Code’s payload shape happens to be this month. If you have ambient compute already (a homelab, a spare workstation, even an old Mac mini under the desk) it pays for itself in roughly the time it takes to lose a long debugging session once. If you do not, tier 3 is the right place to stop.

Tier 5: Make the brain shared

The same pattern scales to teams with a small but interesting twist. Instead of each developer’s brain holding their sessions only, the brain destination is shared. When a colleague hits compaction mid-investigation, the next person who picks up the thread resumes with the prior session already retrievable.

That turns “I lost my context” into “the team did not lose its context”. The bigger claim is the more useful one.

In the team version, the scrubber is no longer optional. A solo gitignored mirror only burns its author if it leaks. A shared brain leaks across colleagues by design. So between the Stop hook and the brain write, an explicit redaction step is part of the contract, not a sidecar option. Allowlist patterns become more defensible than blocklists. Anything that crosses the wire is logged, attributable, and reviewable.
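One way to read "allowlist" here is at the field level: only named breadcrumb fields cross the wire, everything else is dropped, and each send produces an audit record. The sketch below takes that reading. The field names, the `contributor` key, and the audit record shape are hypothetical, and it complements value-level scrubbing rather than replacing it.

```python
# Only these fields may leave the developer's machine (hypothetical names).
ALLOWED_FIELDS = frozenset({
    "primary_request", "user_messages", "key_concepts", "files",
    "errors_and_fixes", "problem_solving", "current_work",
    "pending_tasks", "next_step",
})


def redact_for_team(chunk, contributor, allowed=ALLOWED_FIELDS):
    """Return (payload, audit): the allowlisted payload and a log record of the send."""
    kept = {k: v for k, v in chunk.items() if k in allowed}
    dropped = sorted(k for k in chunk if k not in allowed)
    kept["contributor"] = contributor  # attribution for every shared chunk
    audit = {
        "contributor": contributor,
        "sent": sorted(kept),
        "dropped": dropped,
    }
    return kept, audit
```

The design point is the default: with a blocklist, a field nobody anticipated is shared; with an allowlist, it is dropped and shows up in the audit record for review.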

I am building exactly this for a team I work with as part of a wider engineering knowledge graph. The pattern is identical to the personal version. The destination changes, the scrubber tightens, the retrieval surface gets a contributor field on every chunk so you can attribute what came from whom. Everything else, including the Stop hook and the SessionStart retrieval, is the same code with a different environment variable.

Pick your tier

If you are reading this and have never thought about session continuity before, tier 1 is a thirty-second win. Open your .claude/projects/ folder. Look at the last JSONL. Realise the conversation is on your disk.

If you have ever lost a session and wished you had not, tier 2 takes an evening to set up and pays back the first time you reach for it.

If you work with shared codebases or anything sensitive, tier 3 is the responsible step up from tier 2 and costs little extra.

Tiers 4 and 5 are real engineering and you should only attempt them if you want what they give you. But the pattern is the same at every level. The general lesson is small and worth saying plainly: do not let your AI tools forget what you taught them. The infrastructure to remember is cheaper than the work it preserves.


Drafted 2026-05-22. The session continuity stack runs in production against my own daily coding session work, and is being extended for a contract engineering team. Components: Stop hook (fires at the end of every agent turn, mirrors the turn to disk and POSTs to /sessions/finalise on the brain), brain-side chunk + re-summarise + index pipeline that writes breadcrumb-style chunks (primary request and intent, user messages verbatim, key technical concepts, files and code sections, errors and fixes, problem solving, current work, pending tasks, next step) keyed on working directory and session id, SessionStart hook that runs a small retrieval pass and injects the relevant breadcrumbs into the new context window. Re-summarisation runs on openai/gpt-oss-120b (mixture-of-experts, 120B total / 5.1B active) at concurrency up to c=256 on the local stack. Hardware: single NVIDIA DGX Spark.