April 10, 2026 | Agent Grounding

The Repo Matrix: A Living Document That Stops AI Agents Hallucinating Your Codebase

Key Takeaways

  • The repo matrix is a structured snapshot of the codebase: endpoints, schema, dependencies, env vars, framework. Mechanically scanned, not LLM-generated.
  • Cheap, fast, deterministic, reproducible. The same scan over the same tree gives the same answer every time.
  • AI agents query the matrix before suggesting code. New endpoint suggestions get checked against what already exists.
  • Hallucinated endpoints fail the matrix lookup before they reach a PR. The reviewer never sees them.
  • Session capture feeds matrix evolution: decisions made today update the matrix tomorrow, with the session that produced them as the source of truth.

When an AI assistant suggests an endpoint, three things can happen. The endpoint exists (good). It does not exist (hallucination, easy to spot). Or it nearly exists in a slightly different shape, with one parameter renamed, one method swapped, one auth header forgotten (the worst kind of bug, very hard to spot). The third case is the one that costs teams weeks of debugging downstream.

The fix is a repo matrix: a mechanically-scanned, session-fed, queryable picture of the codebase that the agent reads before it suggests code. Endpoints, dependencies, schema, environment variables, framework conventions. None of it generated by the LLM. All of it grounded in what is actually in the file tree.
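To make the idea concrete, here is a minimal sketch of what one endpoint row in such a matrix could look like. The field names, paths, and values are illustrative, not a fixed schema; the properties that matter are that every value is mechanically extracted and that each row carries file-and-line provenance.

```python
# A hypothetical matrix entry for one API endpoint. Every field is
# something a scanner can read straight out of the route file; the
# "source" field records exactly where it came from.
endpoint_entry = {
    "method": "POST",
    "path": "/api/talent/apply",            # illustrative route
    "auth": "bearer",                        # auth decorator found on the route
    "request_shape": {"job_id": "string", "cover_note": "string?"},
    "response_shape": {"application_id": "string"},
    "description": "Submit an application for a job posting",
    "source": {"file": "app/api/talent/apply/route.ts", "line": 12},
}
```

An agent that reads rows like this before generating code is checking against the tree, not against its training data.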

What "grounding" actually means in practice

Most discussions of grounding talk about retrieval over documentation. That is one shape, and it is useful, but it is not the shape that solves the AI-suggests-fake-endpoints problem.

The problem is specific. The model does not know what is in your codebase. It has seen a thousand other codebases and learned what plausible code looks like. When asked to extend yours, it generates plausible-shaped code that may or may not match what you actually have.

The matrix is the answer because it is a structured, current, mechanical extract of the actual code. Not docs (which lag the code), not READMEs (which lie), not the model's training data (which is generic). What grounds the agent is the exact list of routes the API exposes today: methods, paths, auth notes, payload and response shapes, and a one-line description of each, sourced directly from the route files themselves.

When the agent is asked to add a new endpoint, the matrix lookup answers four questions cheaply:

  1. Does an endpoint with this purpose already exist?
  2. If yes, what is its current signature, so the agent can use it instead of writing a new one?
  3. If no, does the team have a pattern for this kind of endpoint that the new one should follow?
  4. What auth shape does the rest of the API use, so the new endpoint matches?
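The four questions above can be sketched as a single lookup over matrix rows. This is a sketch under assumed field names (`description`, `path`, `auth` from a hypothetical entry shape); real matching would be fuzzier, for example embedding similarity over descriptions rather than keyword containment.

```python
def lookup(matrix, purpose_keywords, path_prefix):
    """Answer the four pre-suggestion questions against a matrix,
    represented here as a list of endpoint dicts."""
    # 1 & 2: does an endpoint with this purpose already exist,
    # and if so, what is its current signature?
    existing = [
        e for e in matrix
        if any(k in e["description"].lower() for k in purpose_keywords)
    ]
    # 3: is there an established pattern in this part of the API
    # that a new endpoint should follow?
    siblings = [e for e in matrix if e["path"].startswith(path_prefix)]
    # 4: what auth shape does the rest of the API use?
    auth_shapes = {e["auth"] for e in matrix}
    return {"existing": existing, "siblings": siblings, "auth_shapes": auth_shapes}
```

If `existing` is non-empty, the agent reuses rather than invents; if `auth_shapes` has one element, the new endpoint's auth is not a guess.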

Without the matrix, the agent guesses. Plausibly. The PR ships. The bug surfaces in production three weeks later.

What the scanner is, and what it is not

The matrix is built by a mechanical scanner, not an LLM. The scanner walks the file tree and produces a structured extract:

  • API endpoints: parsed from the route files of whatever framework the codebase uses. Methods, paths, params, auth decorators, response models.
  • Database schema: parsed from migrations or schema declarations. Tables, columns, indexes, foreign keys, RLS policies if applicable.
  • Dependencies: parsed from package.json, pyproject.toml, Cargo.toml, etc. Versions and resolution status.
  • Environment variables: parsed from .env.example files and process.env.X references.
  • Framework conventions: detected from file structure (Next.js app router, Django apps, Rails engines, etc.).

Every field is extracted from a specific file at a specific line number. The matrix is reproducible: scan the same tree twice, get the same answer twice. There is no hallucination layer because there is no LLM.
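A minimal scanner for the endpoint layer might look like the following. It assumes FastAPI-style `@app.get("/path")` decorators purely for illustration; each framework needs its own parser (ideally AST-based rather than regex), but the principle is the same in every case: no LLM in the loop, and every row cites a file and a line.

```python
import re
from pathlib import Path

# Matches decorators like @app.get("/users") or @app.post('/users')
ROUTE_RE = re.compile(r'@app\.(get|post|put|delete)\(\s*["\']([^"\']+)["\']')

def scan_endpoints(root):
    """Walk a file tree and extract route declarations mechanically.
    Returns one matrix row per declaration, with file/line provenance."""
    matrix = []
    for path in sorted(Path(root).rglob("*.py")):
        for lineno, line in enumerate(path.read_text().splitlines(), start=1):
            m = ROUTE_RE.search(line)
            if m:
                matrix.append({
                    "method": m.group(1).upper(),
                    "path": m.group(2),
                    "source": {"file": str(path), "line": lineno},
                })
    return matrix
```

Running the same scan over the same tree twice produces the same list twice, which is the reproducibility property the matrix depends on.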

What the scanner is not: it does not capture the cross-cutting concerns that have never been written down. Which auth flow touches three services. Which naming convention the team enforces by review rather than by code. Which module looks complex only because of a six-month-ago decision. The scanner extracts what is in the file tree, no more.

The unwritten things get captured elsewhere in the brain (session capture, meeting note ingestion, the wiki compiler), and the matrix is one layer of a larger picture. But it is the layer that catches the hallucinated endpoints.

Living, not static

What sets the matrix apart from a static schema dump is that it is living. As developers (or LLM agents) work on the repo, the structured session-capture pipeline produces topical chunks: decisions made, endpoints added, schemas changed, dependencies bumped. A periodic background job reads those chunks and proposes updates to the matrix, citing the session that produced each update. A human reviews and promotes. Over time, more of the load can move to automation.

The result is a matrix that reflects the team's current understanding of the codebase, not the snapshot from when somebody last wrote the README. New decisions land in the matrix within hours, not the weeks documentation drift takes.

What this looks like in production

We run this against our own product portfolio. The matrix for Leap (the most heavily instrumented case) catalogues 64 API endpoints across five apps, 70+ Postgres tables with full RLS coverage, 30+ external scrape allowlist domains, and the framework choice (Next.js on Vercel, Supabase backend with pgvector for skill matching).

Each row drills down. Asking the brain "what are the Leap talent app endpoints" returns the actual list of 46 routes with their methods, paths, auth notes, request payload structure, response shape, and a one-line description of each, sourced directly from the route files. The matrix itself is the index; the detail is too long to belong in any single document, so the agent that needs it asks for it.

When we open a fresh agent session against Leap, we do not have to explain any of this. The agent queries the matrix, gets the current state, and starts grounded. When we ask "where would I add a new endpoint for X", the agent checks whether something similar already exists before suggesting code. Hallucinated endpoints do not survive that check.
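The "does something similar already exist" check can be sketched as a fuzzy match of the proposed route against the matrix. Plain string similarity and the 0.8 threshold here are stand-ins for whatever matching a real system would use; the point is that a near-duplicate surfaces before any code is written.

```python
from difflib import SequenceMatcher

def similar_existing(matrix, proposed_path, threshold=0.8):
    """Return existing endpoints whose path is suspiciously close to a
    proposed one, highest similarity first. A hit means the agent should
    reuse or extend rather than invent."""
    hits = []
    for e in matrix:
        score = SequenceMatcher(None, e["path"], proposed_path).ratio()
        if score >= threshold:
            hits.append((score, e))
    return sorted(hits, key=lambda t: -t[0])
```

An exact or near-exact match comes back with a score near 1.0; a genuinely new route comes back empty, and only then does the agent draft one, following the siblings' pattern.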

Onboarding new joiners and new agents

The same lookup that grounds AI agents also grounds humans. A new team member arrives, asks the brain "how does auth work in this app", and gets the matrix entry for the auth flow plus the most recent decision that shaped it. Days of "ask the senior engineer" become minutes.

This generalises. Any time the team rediscovers something it already knew (which is most of what slow onboarding feels like), the matrix is the cheap fix.

Take the Next Step

If your team has been losing time to PR cycles where AI-suggested code conflicts with the codebase the AI does not know, the gap is grounding. We help teams stand up the matrix layer, wire it to session capture, and route their agent runtimes to query it before suggesting anything. Get in touch if you want to scope it for your own codebase.