Benchmarks

Every figure on this page is wrapped in claim markup pointing to its published source file under /proofs/ — and this site's build fails if any figure stops matching its source. Raw logs are published as-is (redaction of local identifiers is declared in their first line; figures untouched).

Runtime OOM-recovery — three GPU vendors

Tested: boot deliberately impossible configs and let the recovery ladder work (KV-quant → context → offload). Result: NVIDIA RTX 3070 (CUDA): a 262144-token context request recovered to 131072 and served a real reply. AMD Radeon 780M (Vulkan): recovered 1000000 → 500000 and served. Apple M1 (Metal): runs and serves; forced-OOM recovery not yet demonstrated there. 2026-06 · raw: RESULTS → · AMD → · cross-platform →
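As an illustration of the ladder order named above (KV-quant, then context, then offload), here is a minimal sketch. It is not the project's implementation: the `Config` fields, the halving step, the `min_context` bound and the `try_boot` callback are all assumptions made for the example.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class Config:
    context: int        # requested context length, in tokens
    kv_quantized: bool  # whether the KV cache is quantized
    gpu_layers: int     # layers kept on the GPU (fewer means more offload)


def next_rung(cfg: Config, min_context: int = 2048) -> Optional[Config]:
    """Return the next cheaper config, or None when the ladder is exhausted.

    Hypothetical rung order: quantize the KV cache first, then halve the
    context, then move layers off the GPU.
    """
    if not cfg.kv_quantized:
        return replace(cfg, kv_quantized=True)
    if cfg.context // 2 >= min_context:
        return replace(cfg, context=cfg.context // 2)
    if cfg.gpu_layers > 0:
        return replace(cfg, gpu_layers=cfg.gpu_layers // 2)
    return None


def recover(cfg: Optional[Config],
            try_boot: Callable[[Config], bool]) -> Optional[Config]:
    """Walk down the ladder until try_boot succeeds or no rung is left."""
    while cfg is not None:
        if try_boot(cfg):
            return cfg
        cfg = next_rung(cfg)
    return None


# Stand-in for a real boot attempt: pretend the device only fits a
# quantized KV cache with at most 131072 tokens of context.
def fake_boot(cfg: Config) -> bool:
    return cfg.kv_quantized and cfg.context <= 131072


result = recover(Config(context=262144, kv_quantized=False, gpu_layers=32),
                 fake_boot)
print(result)
```

With the stand-in `fake_boot`, the request for 262144 tokens first gains KV quantization, then drops to 131072 and boots. The real ladder may take different step sizes or conditions.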

Watchdog — mid-generation VRAM collapse, real pressure

Tested: a second GPU process loads mid-generation (the "you opened a game" event), on an RTX 4070 Laptop 8 GB. Result: floor crossed at 102 MiB free → controlled restart → recovered in 223.9 s, with context reduced 8192 → 4096, on the same port and under sustained pressure, then served. Honest cost: the generation in flight is lost. Counter-test: 0 soft alerts, 0 interventions on normal runs. 2026-06-11 · raw log →
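The watchdog decision can be pictured as a threshold check on free VRAM samples. The sketch below is an assumption-laden illustration: the `soft_mib` and `floor_mib` values are hypothetical (the log reports only that free memory was 102 MiB when the floor was crossed), and the samples are passed in rather than read from a GPU.

```python
from enum import Enum
from typing import Iterable, List


class Action(Enum):
    OK = "ok"
    SOFT_ALERT = "soft_alert"
    RESTART = "restart"


def classify(free_mib: int, soft_mib: int, floor_mib: int) -> Action:
    """Map one free-VRAM sample to a watchdog action."""
    if free_mib <= floor_mib:
        return Action.RESTART
    if free_mib <= soft_mib:
        return Action.SOFT_ALERT
    return Action.OK


def watch(samples: Iterable[int],
          soft_mib: int = 512,     # hypothetical soft-alert threshold
          floor_mib: int = 128     # hypothetical hard floor
          ) -> List[Action]:
    """Classify samples in order and stop at the first restart decision."""
    actions: List[Action] = []
    for free in samples:
        action = classify(free, soft_mib, floor_mib)
        actions.append(action)
        if action is Action.RESTART:
            break  # a controlled restart ends this monitoring pass
    return actions


# Pressure scenario: free VRAM collapses to 102 MiB mid-generation.
print(watch([4000, 3000, 400, 102, 5000]))
# Normal run: no soft alerts and no interventions.
print(watch([4000, 3900, 4100]))
```

Keeping `classify` a pure function makes the counter-test cheap to express: a normal run should produce nothing but `Action.OK`.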

Persistence — the config that booted is remembered

Tested: cold launch vs warm launch of a 9.5 GB MoE model on the 8 GB RTX 4070; artificial fingerprint invalidation; poisoned cache entry. Result: cold plan 222.63 s → warm boot directly at the known-good config; a failed attempt costs ~220 s of model loading — the cache exists so it is never paid twice. Invalidation and append-only supersede both proven. 2026-06-11 · raw log →
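One way to picture an append-only cache keyed by a fingerprint is a JSON-lines log in which the newest record for a fingerprint supersedes older ones. This is a sketch under stated assumptions, not the project's format: the file layout, the field names (`fingerprint`, `config`, `ok`) and the helpers `append_entry` and `lookup` are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional


def append_entry(log: Path, fingerprint: str, config: dict, ok: bool) -> None:
    """Append one record; earlier records are never rewritten."""
    record = {"fingerprint": fingerprint, "config": config, "ok": ok}
    with log.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def lookup(log: Path, fingerprint: str) -> Optional[dict]:
    """Return the config from the latest record for this fingerprint.

    Returns None when there is no record, when the fingerprint differs
    (for example after a hardware or model change), or when the latest
    record marks the config as bad.
    """
    if not log.exists():
        return None
    latest = None
    for line in log.read_text(encoding="utf-8").splitlines():
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore a corrupted line instead of failing the boot
        if isinstance(record, dict) and record.get("fingerprint") == fingerprint:
            latest = record  # later lines supersede earlier ones
    if latest is None or not latest.get("ok"):
        return None
    return latest.get("config")
```

In this sketch a changed fingerprint simply misses the cache, which forces a fresh cold plan, and a bad entry is superseded by appending a newer record rather than editing the old one.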

Install — zero prerequisites, host and clean VM

Tested: scrubbed environment on the host, then a brand-new Windows 11 VM with no toolchain (no GPU → CPU target). Result: host: 7.6 s from cold to a served completion (1 GB test model on disk, pinned b9592 fetch + mandatory SHA256 included). Clean VM: booted in 10.04 s, 12.2 s total, after the pass found and fixed two real first-run blockers (lazy TLS root store; missing MSVC runtime DLLs). 2026-06-11 · raw log →
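A mandatory SHA256 check on a fetched file can be sketched with the Python standard library alone. The function names and the idea of passing the expected digest as a hex string are assumptions for illustration; the installer's real check is not shown on this page.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large downloads do not need to fit in RAM."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: Path, expected_hex: str) -> None:
    """Raise ValueError unless the file matches the pinned digest."""
    actual = sha256_of(path)
    if not hmac.compare_digest(actual, expected_hex.strip().lower()):
        raise ValueError(
            f"SHA256 mismatch for {path}: got {actual}, expected {expected_hex}"
        )
```

Raising on mismatch, rather than returning a boolean, makes it harder for a caller to skip the check by accident.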

Energy — measured, not estimated

Tested: RAPL (CPU package) + nvidia-smi during boot and a 256-token decode; Ryzen 7 8845HS + RTX 4070 Laptop. Result: decode 215.13 J over 4.41 s, i.e. 48.8 W CPU avg and 0.84 J/token (CPU-side); GPU 37.4 W avg. Without the measuring tool installed, the field is absent, never estimated. 2026-06-11 · raw log →
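The derived CPU-side figures follow directly from the measured energy, the duration and the token count. A short check of that arithmetic (the helper names are illustrative only):

```python
def average_power_w(energy_j: float, duration_s: float) -> float:
    """Average power is energy divided by elapsed time."""
    return energy_j / duration_s


def energy_per_token_j(energy_j: float, tokens: int) -> float:
    """Energy cost per generated token."""
    return energy_j / tokens


energy_j = 215.13   # CPU package energy over the decode, from the result above
duration_s = 4.41   # decode duration
tokens = 256        # decoded tokens

print(f"{average_power_w(energy_j, duration_s):.1f} W, "
      f"{energy_per_token_j(energy_j, tokens):.2f} J/token")
# prints "48.8 W, 0.84 J/token"
```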