The “Fast” Model Wasn’t Fast: A Real Pipeline, 30x Faster on a Bigger Model
Key Takeaways

- A 9B “fast” model running at concurrency 4 hit 130 records/hour on the production pipeline.
- A 120B MoE reasoning model at concurrency 256 hit ~3,900 records/hour on the same hardware.
- “Fast” model selection logic falls apart on inference hardware that scales with concurrency.
- The big model is faster wall-clock when you let it […]
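The arithmetic behind these numbers is worth making explicit: steady-state throughput is concurrency divided by per-record latency, so a slower-per-request model can still win on wall clock. A back-of-envelope sketch using the article's reported figures (the implied latencies are derived here, not stated in the source, and assume the pipeline keeps every slot saturated):

```python
def records_per_hour(concurrency: int, seconds_per_record: float) -> float:
    """Steady-state throughput when `concurrency` requests are always in flight."""
    return concurrency * 3600 / seconds_per_record


def implied_latency(concurrency: int, rph: float) -> float:
    """Invert the throughput formula to estimate per-record latency."""
    return concurrency * 3600 / rph


# Article's reported figures; latencies below are back-of-envelope estimates.
small = implied_latency(4, 130)     # 9B "fast" model at concurrency 4
big = implied_latency(256, 3900)    # 120B MoE model at concurrency 256

print(f"9B implied latency:   ~{small:.0f} s/record")
print(f"120B implied latency: ~{big:.0f} s/record")
print(f"Wall-clock speedup:    {3900 / 130:.0f}x")
```

Note that the big model's implied per-record latency is roughly double the small model's; the 30x wall-clock win comes entirely from the 64x jump in concurrency, which is the article's point.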