April 13, 2026 | Agent Grounding

Save-On-End Is Broken: Why AI Sessions Should Persist Turn by Turn

Key Takeaways

  • The clean-shutdown fallacy: save-on-end persistence breaks the moment a real-world event (crash, force-quit, network drop) bypasses the shutdown hook.
  • The trickle pattern writes the assistant’s most recent turn to disk before the next turn fires. Worst-case loss: one in-flight turn.
  • A periodic re-summariser turns raw turn-by-turn transcripts into topical chunks (decisions, blockers, progress, open questions, follow-ups) for retrieval.
  • The result: the next agent session starts where the last one finished, even after a crash.
  • Two small architectural primitives (append-after-turn plus periodic re-summarisation) close a gap that even an elaborate save-on-end design cannot.

You spend 90 minutes walking an AI agent through the auth flow of an app it has never seen. The agent learns the shape, drafts the next set of changes, and is mid-flight on the third edit when the IDE crashes. The save-on-session-end hook that was supposed to capture today's context never fires. Your morning's work, all that careful context-loading, gone. The next session starts cold and you do the briefing again.

This is not a freak accident. It is the predictable failure mode of any persistence pattern that depends on a clean shutdown. The fix is small, surgical, and almost universally absent from "AI memory" projects: persist after every assistant turn, not at session end.

The clean-shutdown fallacy

Most AI session persistence is wired to a shutdown hook. The agent runs through a session, accumulates context, and writes it all to disk when the session ends. The pattern is intuitive, easy to implement, and right roughly 80% of the time.

The other 20% is where it fails. Specifically:

  • IDE crash. Most agent sessions live inside an IDE. When the IDE goes down, the agent's process goes with it. Shutdown hook never fires.
  • Computer restart. A Windows update reboots the machine overnight. Yesterday's session never gets written.
  • Force-quit. A developer kills the IDE because something is hanging. The shutdown hook is exactly what is hanging.
  • Network drop. The agent talks to a remote service. The connection drops mid-session. The local process keeps running but cannot persist anything.
  • Runaway token consumption. The user kills the session because it is burning budget. Same outcome as force-quit.

In every one of those cases, the shutdown hook never executes. Whatever was in memory is lost. The team blames "AI memory" for being unreliable, when the actual fault is a persistence pattern that assumes happy-path shutdowns in a world that rarely delivers them.

The trickle pattern

The fix is mechanical. After every assistant turn, before the next user turn fires, write the turn to disk. The persistence is asynchronous (it does not block the agent), append-only (the file grows over time), and durable (it survives any crash that does not wipe the disk).

In code shape, the pattern is roughly:

  1. User sends a message.
  2. Agent generates a response.
  3. Before the agent yields the response back to the UI, an async write fires that appends the user message and the agent response to the session's persistent log.
  4. The agent yields. The user reads. The next turn begins.

If the IDE crashes between steps 3 and 4, the latest turn is already on disk. If the crash happens during step 3, you lose at most the in-flight turn (the one the user hasn't seen yet anyway). Worst-case loss: one turn. Compared to "an entire morning of context", this is approximately negligible.

The implementation is a few lines of code. It is not exotic. The reason most projects do not have it is that the clean-shutdown pattern feels sufficient until the day it is not.
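A minimal Python sketch of the pattern, assuming an asyncio agent loop. The file path, record shape, and the `generate_response` model call are all hypothetical; the load-bearing parts are the append-before-yield ordering and the flush-plus-fsync that makes the turn durable.

```python
import asyncio
import json
import os
import time
from pathlib import Path

LOG_PATH = Path("session.jsonl")  # hypothetical location for the session log


def append_turn(log_path: Path, user_msg: str, assistant_msg: str) -> None:
    """Append one completed turn as a single JSONL record, then fsync.

    Append-only: the file only grows, so a crash mid-write can at worst
    truncate the final record; every earlier turn stays intact on disk.
    """
    record = {"ts": time.time(), "user": user_msg, "assistant": assistant_msg}
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
        f.flush()
        os.fsync(f.fileno())  # durable even if the process dies right after


async def handle_turn(user_msg: str) -> str:
    assistant_msg = await generate_response(user_msg)  # hypothetical model call
    # Persist before yielding the response back to the UI.
    # run_in_executor keeps the blocking file I/O off the event loop,
    # so the write does not stall the agent.
    await asyncio.get_running_loop().run_in_executor(
        None, append_turn, LOG_PATH, user_msg, assistant_msg
    )
    return assistant_msg
```

One record per line (JSONL) keeps the append cheap and the crash-recovery read trivial: any reader can replay the file line by line and discard a truncated final line.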

Topical chunks: the second pass

Raw turn-by-turn logs are great for crash safety and terrible for retrieval. They are too granular. A query like "what did we decide about the auth refactor" wants a coherent piece of synthesis, not a 30-turn log.

The trickle pattern handles persistence. A second pass handles retrieval. Periodically (we run ours daily), a background job reads the raw turn-by-turn log and produces topical chunks from it: the decisions made in the session, the blockers hit, the progress shipped, the open questions, the follow-ups. Each chunk is independently retrievable, with a stable hash reference back to the source log. The granularity of retrieval matches the granularity of the question.

This is what makes session capture practically useful for the next agent. A new session opens, queries memory_search for the topic, and gets the chunks (not the raw logs). The agent grounds in coherent prior context, not in 4,000 lines of conversation.
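The shape of that grounding call can be sketched with a toy stand-in for memory_search. This keyword-overlap version is purely illustrative (a real implementation would rank by embeddings), but the interface, query in, top-k chunks out, is the point:

```python
def memory_search(chunks: list[dict], query: str, k: int = 3) -> list[dict]:
    """Toy retrieval over topical chunks: rank by keyword overlap with the
    query and return the top k. A real memory_search would use embeddings,
    but the call shape the agent sees is the same.
    """
    query_words = set(query.lower().split())
    scored = sorted(
        chunks,
        key=lambda c: len(query_words & set(c["text"].lower().split())),
        reverse=True,
    )
    return scored[:k]
```

The agent asks for a topic, gets back a handful of coherent chunks, and never touches the raw transcript unless it follows a chunk's source hash back to it.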

Why both halves matter

The trickle pattern alone gives you crash safety but not useful retrieval. The chunk re-summariser alone gives you retrieval but loses everything in a crash before the next summariser run. Together they cover both failure modes:

  • Crash before re-summarise: the raw turn-by-turn log is on disk. The next re-summariser run picks it up. No data loss, slight delay.
  • Crash after re-summarise: the chunks already exist in retrieval. Next session reads them. Continuity preserved.
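The "crash before re-summarise" case is covered by making the summariser run idempotent: on every pass it picks up any log it has not chunked yet, or that has grown since it was last chunked. A sketch, assuming logs and their chunk files live in sibling directories with matching stems:

```python
from pathlib import Path


def pending_logs(log_dir: Path, chunk_dir: Path) -> list[Path]:
    """Return logs with no chunk file yet, or that changed since chunking.

    Running this at the start of every summariser pass makes crash
    recovery automatic: a log orphaned by a crash is simply "pending"
    and gets chunked on the next run.
    """
    pending = []
    for log in sorted(log_dir.glob("*.jsonl")):
        marker = chunk_dir / (log.stem + ".chunks.json")
        if not marker.exists() or marker.stat().st_mtime < log.stat().st_mtime:
            pending.append(log)
    return pending
```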

Two simple primitives, layered. Together they make session capture genuinely robust.

What this changes for the team

The visible effect, once this is wired up, is that "the AI forgot what we were doing" stops being a thing. A new session opens against the same project, queries the brain, and gets the actual state of yesterday's work. Not a generic "this is a Next.js project". The specific decisions, the specific blockers, the specific next steps.

The invisible effect is bigger. Continuity removes the daily re-onboarding tax. No more spending the first 20 minutes of every session re-loading context the agent already saw yesterday. Whatever the team decided last week is in retrieval now. The cumulative time savings, across a team and a quarter, are substantial.

Take the Next Step

If your team's AI sessions feel amnesiac, and you suspect "AI memory" is the problem, the more likely problem is that your persistence layer trusts the shutdown hook. We help teams instrument the trickle pattern and the chunk re-summariser as two small primitives that close the gap entirely. Get in touch if you want to scope it for your stack.