April 20, 2026 | Engineering Reality

The Same Bug, Every Six Months: Why Your Team Needs a Gotcha File

Key Takeaways

  • The same library quirks, schema mismatches, and silent-fail behaviours get rediscovered every six months because nobody can find the writeup from last time.
  • Confluence pages and Slack threads are the worst place to keep gotchas: invisible to retrieval, not surfaced when the symptom returns.
  • Treat gotchas as a first-class entity type in the knowledge layer, with a fixed shape: symptom, root cause, fix, repro.
  • Gotchas tagged for retrieval surface automatically when an agent or a developer encounters a similar symptom in a future session.
  • The compounding effect: institutional bug history becomes day-one knowledge for new joiners, and the same bug stops getting rediscovered.

A library has a quirk. The first time it bites the team, an engineer spends two days debugging it, fixes it, and writes a Confluence page or a Slack message about what happened. Six months later, a different engineer hits the same quirk, and spends two days rediscovering it. The Confluence page is buried. The Slack message has scrolled away. The first engineer has moved teams or moved on.

This is not a discipline failure. It is a filing failure. The team produced documentation. The documentation became unreachable. The fix is not "write better docs". The fix is to treat gotchas as a first-class entity in the team's knowledge layer, with a structured shape, deliberate retrieval, and a binding rule that they get captured the moment they get found.

Three real gotchas, all from production

To make this concrete, here are three examples. Each cost an engineer days the first time, and would have cost almost nothing the second time if the gotcha file had been in place:

  • A library validates against the bundled schema and silently accepts invalid input. The library logs nothing, returns a 200, and the broken data lands in the next stage of the pipeline, where it surfaces as a downstream parse failure. Time to first detection on the original bug: about a week. Time to detection if the gotcha had been retrieved on first symptom: minutes.
  • A particular DSL does not support nested array filters and returns empty results without erroring. The query validates syntactically. The result set is empty. The team assumes there is no matching data. The data exists; the filter just cannot reach it. Original time to detection: three days, and only thanks to a chance tip from a colleague who had hit it before.
  • An LLM's reasoning_effort setting spends the max_tokens budget on chain-of-thought, so the JSON output is silently truncated and downstream parsing breaks. The pipeline's success rate is mysteriously low. Half the records fail to parse with no obvious cause. Original time to detection: a full evening of debugging. Once captured as a gotcha, every future agent session that mentions reasoning models surfaces this in retrieval automatically.

These are not exotic. Every team has its equivalents. The team's bug pattern is specific to its stack, its tools, and its history. The pattern is the team's institutional memory, and most teams are losing it the moment the engineer who hit it moves on.

The fixed shape of a gotcha

A gotcha is not a blog post. It is a structured entry with a small number of fields:

  • Symptom: what the engineer observed. The thing they would search for if they hit it again. Plain language, the actual error or the actual mysterious behaviour.
  • Root cause: what was actually happening underneath. Not the surface error, the deeper reason.
  • Fix: what the engineer did to make it go away. The exact change, with the file path or config flag if applicable.
  • Repro: a small example of how to trigger it. Optional but high-value when present.

That is the entire schema. Four fields, each one short. The structure exists to make the gotcha retrievable by symptom (the thing the future engineer will search for) and useful on retrieval (so the future engineer knows what to do). Anything else (long writeups, design discussions, context about why the team chose this stack) belongs elsewhere. The gotcha file is for getting the engineer back to work fast.
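
As a minimal sketch, the four-field shape could be modelled as a small Python dataclass. The class name, field names, and validation rule are assumptions made for illustration, not a description of any particular tool. The example entry reuses the reasoning-model gotcha from earlier; its fix text is illustrative only, since the actual resolution is not recorded here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Gotcha:
    """One gotcha entry with the four fields: symptom, root cause, fix, repro."""

    symptom: str                 # what the engineer observed, in plain language
    root_cause: str              # what was actually happening underneath
    fix: str                     # the exact change that made it go away
    repro: Optional[str] = None  # optional: a small way to trigger it

    def __post_init__(self) -> None:
        # The three required fields must be non-empty; repro may be omitted.
        for name in ("symptom", "root_cause", "fix"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must not be empty")


# Example entry based on the reasoning-model gotcha.
# The fix text is an illustrative assumption, not a recorded resolution.
example = Gotcha(
    symptom="Half the records fail JSON parsing with no obvious cause",
    root_cause=(
        "reasoning_effort spends the max_tokens budget on chain-of-thought, "
        "so the JSON output is truncated"
    ),
    fix="(illustrative) raise max_tokens or lower reasoning_effort",
)
```

Rejecting empty required fields at construction time keeps half-written entries out of the file, which matters because an entry without a symptom cannot be found by the next engineer.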

Where they live, and how they get retrieved

Gotchas live in a dedicated folder in the brain's vault, one markdown file per gotcha, tagged for retrieval. When an engineer (or an LLM agent) is working on a problem and the symptom matches a gotcha's symptom field, the brain surfaces it during normal retrieval. No special workflow, no separate gotcha database, just structured content tagged correctly.

For agents specifically, this means an agent session that hits a similar symptom gets the gotcha in retrieval before it tries to debug from first principles. The agent's first move can be "have we seen this before", and the brain's answer is concrete and bounded.

For human engineers, the same retrieval works through the brain's normal query interface. "Why is my pipeline silently dropping records" returns the gotcha if it has been written. The engineer reads the four fields and is unblocked in minutes.
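
To illustrate what "retrievable by symptom" can mean at its simplest, here is a rough sketch that ranks entries by word overlap between a query and each symptom field. It is an assumption-laden stand-in: the function names, the stop-word list, and the Jaccard scoring are all hypothetical, and a real knowledge layer would more likely use tags or embedding search.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical stop-word list; a real deployment would tune this.
STOP_WORDS = {"a", "an", "the", "is", "my", "why", "and", "of", "with", "no", "to", "in"}


def tokens(text: str) -> Set[str]:
    """Return the lowercase word set of a text, with stop words removed."""
    words = re.findall(r"[a-z0-9_]+", text.lower())
    return {w for w in words if w not in STOP_WORDS}


def search(
    query: str, gotchas: List[Dict[str, str]], min_score: float = 0.2
) -> List[Tuple[float, Dict[str, str]]]:
    """Rank gotchas by Jaccard overlap between the query and the symptom field."""
    q = tokens(query)
    hits = []
    for gotcha in gotchas:
        s = tokens(gotcha["symptom"])
        union = q | s
        score = len(q & s) / len(union) if union else 0.0
        if score >= min_score:
            hits.append((score, gotcha))
    # Sort on the score only, so dictionaries are never compared.
    return sorted(hits, key=lambda hit: hit[0], reverse=True)


# Illustrative entries; root cause and fix are abbreviated placeholders.
GOTCHAS = [
    {
        "symptom": "Pipeline silently dropping records, downstream parse failure",
        "root_cause": "library accepts invalid input without logging",
        "fix": "(placeholder)",
    },
    {
        "symptom": "Query returns empty results without erroring",
        "root_cause": "DSL does not support nested array filters",
        "fix": "(placeholder)",
    },
]

results = search("Why is my pipeline silently dropping records", GOTCHAS)
for score, gotcha in results:
    print(f"{score:.2f}  {gotcha['symptom']}")
```

Only the symptom field is scored, because that is the text a future engineer is most likely to type when the problem comes back.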

The capture rule

The gotcha shape only works if gotchas actually get captured. The rule we run on Dendro Logic, and recommend for any team:

When you find a gotcha, you write the gotcha. Now. Before the next task. Four fields, ten minutes maximum. The cost of writing it is paid back the first time anyone (you, a colleague, an agent) hits the same thing again.

Capture happens at the point of pain because pain is the cheapest motivator. A week later, the urgency has faded and the writeup never gets done. An hour after the fix, the symptom and root cause are still vivid. The four fields take ten minutes. The payoff is permanent.

LLM agents help here. The brain's MCP toolchain exposes a tool that records a gotcha inline from any session. The agent can call it as soon as the human says "huh, that was annoying". One tool call, four fields filled in from the conversation, gotcha indexed and retrievable from the next session onward.
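
A capture tool of this kind can be very small. The sketch below is a plain Python function, not the MCP toolchain itself: the function name, the file naming scheme, and the markdown layout are all assumptions chosen for illustration. It takes the four fields as a dictionary, much as a tool call might supply them, and writes one markdown file per gotcha.

```python
import re
from pathlib import Path
from typing import Dict, List, Optional

REQUIRED = ("symptom", "root_cause", "fix")
SECTIONS = (
    ("symptom", "Symptom"),
    ("root_cause", "Root cause"),
    ("fix", "Fix"),
    ("repro", "Repro"),
)


def record_gotcha(
    vault_dir: Path, fields: Dict[str, str], tags: Optional[List[str]] = None
) -> Path:
    """Write one gotcha as a markdown file in vault_dir and return its path."""
    missing = [key for key in REQUIRED if not fields.get(key, "").strip()]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")

    # File name derived from the symptom (a hypothetical naming scheme).
    slug = re.sub(r"[^a-z0-9]+", "-", fields["symptom"].lower()).strip("-")
    slug = slug[:60].rstrip("-") or "gotcha"

    lines = [f"tags: {', '.join(tags or ['gotcha'])}", ""]
    for key, title in SECTIONS:
        value = fields.get(key, "").strip()
        if value:  # repro is optional and is skipped when absent
            lines += [f"## {title}", value, ""]

    vault_dir.mkdir(parents=True, exist_ok=True)
    path = vault_dir / f"{slug}.md"
    path.write_text("\n".join(lines), encoding="utf-8")
    return path
```

Refusing to write an entry with a missing required field keeps the ten-minute capture honest: the agent has to fill in symptom, root cause, and fix from the conversation before the gotcha is stored.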

The second-order win

The first-order benefit is "the same bug does not get rediscovered". The second-order benefit is bigger. New team members get the institutional bug history on day one. Every gotcha the team has captured is searchable. Onboarding is no longer "shadow a senior engineer for three months and absorb the bug DNA". It is "query the brain, read the gotchas relevant to your area, ask follow-up questions". Days, not months.

This is the compounding effect. A team that captures gotchas religiously builds an asset that gets more valuable every quarter. A team that does not is paying the rediscovery tax forever, and the tax is highest right when a senior engineer leaves.

Take the Next Step

If your team has been losing time to bug-rediscovery cycles you cannot quite name, the fix is structural. We help teams stand up the gotcha entity, wire capture into the agent workflow, and route retrieval so future sessions catch the patterns the team has already learned. Get in touch if you want to scope it for your own team.