Changelog

Release history. Every figure below is read from a committed validation file — the rule of this site.

v0.2.0 — 2026-06-11

Renamed to VRAMPilot. The product was previously called Inference Autopilot; the Python package is now vrampilot. The generic engine keeps the governor name (~/.governor/, GOVERNOR_* environment variables) on purpose.
Zero-prerequisite first run. The first launch fetches a pinned llama.cpp build (b9592) for the detected OS and GPU, with mandatory SHA256 verification — a mismatch deletes the file and aborts. Gate measured at 7.6 s from a cold start to a real served completion.
Clean Windows VM validated. A brand-new Windows 11 VM (no GPU, the worst case, which exercises the CPU fallback) passed the same gate in 12.2 s. Getting there meant finding and fixing two real first-run blockers that the scrubbed-host test could not see. First, a fresh Windows TLS root-CA store broke the first download; the fix is a fallback to the OS curl.exe, which is safe because the pinned SHA256, not the transport, is the security. Second, the official llama.cpp Windows binaries import the MSVC runtime, which was missing; the fix bundles 3 DLLs and deploys them app-locally only when they are missing.
GitHub Releases confirmed as the primary binary host. The host needs no trust: the pinned manifest plus mandatory SHA256 is the security. GOVERNOR_SERVER_BASE_URL remains the self-hosting mechanism for any mirror. Windows code signing postponed.
Energy measurement gate. Decode measured at 0.84 J/token (CPU-side) on the bench rig. The energy probe is observation-only and optional: the product never depends on the instrument, and when it is absent, energy is reported as absent — never estimated.
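
The verify-or-abort rule in the zero-prerequisite first-run entry above can be illustrated with a minimal sketch. This is not the VRAMPilot implementation: the names verify_or_delete, sha256_of and ChecksumMismatch are hypothetical, and only the behaviour (a mismatch deletes the file and aborts) comes from the entry itself.

```python
import hashlib
import os


class ChecksumMismatch(Exception):
    """Hypothetical error: the file does not match its pinned SHA256."""


def sha256_of(path, chunk_size=1 << 20):
    """Hash the file in chunks so large binaries are never fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_or_delete(path, pinned_sha256):
    """Return path if the hash matches; otherwise delete the file and abort."""
    actual = sha256_of(path)
    if actual != pinned_sha256.lower():
        os.remove(path)  # never leave an unverified binary on disk
        raise ChecksumMismatch(f"expected {pinned_sha256}, got {actual}")
    return path
```

Because the check runs on the bytes that landed on disk, it gives the same guarantee whichever transport or mirror delivered them, which is the reasoning behind the curl.exe fallback and the GOVERNOR_SERVER_BASE_URL mirror mechanism.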

v0.1.x — June 2026 (sprint)

Honesty pass. Every VRAM reading is labeled measured or estimated — enforced by the type. The KV-cache estimate reads head_dim from the GGUF when present; the fallback is documented.
Persistence: the machine that learns. Every configuration that actually booted is persisted in a local append-only SQLite database; the next launch starts at the known-good configuration. Gate passed on real runs. The saving matters because each failed attempt costs a full model load, measured at ~220 s per attempt for a 9.5 GB MoE. The gate also found and fixed a real bug: a fixed boot timeout was killing healthy boots of large models mid-load.
In-inference watchdog. Controlled restart at a degraded configuration when free VRAM crosses the auto-calibrated floor mid-generation. Gate passed under real external VRAM pressure: the floor was crossed at 102 MiB free, and the run recovered in 223.9 s while the pressure was still applied. The counter-test recorded 0 soft alerts and 0 interventions on normal generations.
Governor extraction. The recovery loop was extracted into a domain-agnostic engine (core.py) with frozen contracts; the persistence and watchdog gates were re-passed identically on the refactored code — the non-regression proof.
Cross-platform validation. Runs and serves a real completion on 3 GPU vendors on real hardware: NVIDIA (CUDA), AMD (Vulkan), Apple (Metal). OOM-recovery demonstrated clean on NVIDIA and AMD; on Apple the mechanism is in place but a clean forced-OOM demo was not feasible on the small CI runner — stated, not hidden.
Market probe. Tried to kill the differentiator by checking it against named competitors (LM Studio, Ollama and Jan); runtime OOM-recovery survived as genuinely unserved (validation/MARKET.md).
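
The persistence entry above (an append-only local SQLite store, with the next launch starting at the last configuration that booted) can be sketched as follows. This is an illustration under stated assumptions, not the actual schema: the table name boots, the column names, the helper functions and the configuration keys are all hypothetical.

```python
import json
import sqlite3
import time


def open_store(path):
    """Open (or create) the store; the schema here is a hypothetical example."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS boots ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " model TEXT NOT NULL,"
        " config_json TEXT NOT NULL,"
        " booted_at REAL NOT NULL)"
    )
    return conn


def record_boot(conn, model, config):
    """Append one row per successful boot; rows are never updated or deleted."""
    conn.execute(
        "INSERT INTO boots (model, config_json, booted_at) VALUES (?, ?, ?)",
        (model, json.dumps(config, sort_keys=True), time.time()),
    )
    conn.commit()


def last_known_good(conn, model):
    """Return the most recently booted config for this model, or None."""
    row = conn.execute(
        "SELECT config_json FROM boots WHERE model = ? ORDER BY id DESC LIMIT 1",
        (model,),
    ).fetchone()
    return json.loads(row[0]) if row else None
```

A launcher built this way would call last_known_good before its first attempt and record_boot only after a boot succeeds, so a failed attempt never enters the history.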