# published copy — local paths, usernames and private infrastructure names redacted; every figure, flag and timestamp unmodified; originals preserved in the repository.
# Market probe — does the gap survive? (honest, by-name, 2026 sources)
*We actively tried to KILL the differentiator by searching LM Studio / Ollama / Jan / niche tools. Result below.*

## Verdict: GAP-PARTIAL — the OOM-recovery leg is genuinely unserved (and it's the moat)

| Feature | Competitor coverage (2026) | Keep as differentiator? |
|---|---|---|
| Live VRAM profiling | PARTIAL — LM Studio pre-load estimator + GPU overlay; Ollama smart scheduler / `ollama ps`; Jan context-to-VRAM; nvidia-smi for live. None profile *during* inference, but the value is mostly covered. | **No** (weak) |
| Auto **context** recommendation | COVERED — Jan "Fit to Hardware" (default-on, discrete GPU); Ollama v0.17 VRAM-tiered context defaults. | **No** (covered) |
| Auto **quant** recommendation | PARTIAL — absent from LM Studio/Ollama (manual pick); exists in niche CLI **LLM Checker** (Pavelevich/llm-checker). | Yes, secondary |
| **Runtime OOM-RECOVERY** (detect → back-off → shrink ctx / offload more → retry until boot+serve) | **NONE.** All competitors do **load-time PREVENTION**, not runtime recovery: Ollama auto-offload + tiered context (bypassed by explicit `num_ctx`, silently spills/crashes on overflow); LM Studio "Limit to Dedicated GPU Memory" (load-time only; still crashes at runtime — bugs #1754, #1753; v0.4.16 adds nothing); Jan "Fit to Hardware" (upfront cap; "recovery" = manual human advice). | **YES — the moat** |

## The pain is real (2026)
OOM is the #1 first-time-user complaint on Ollama's GitHub issues + r/LocalLLaMA; multiple dedicated OOM-fix guides published in 2026; open bugs persist (LM Studio #1754/#1753, Ollama #10114/#11354/#9957). "Which quant fits my GPU?" is still manual trial-and-error in the big two.

## Honest recommendation
PROCEED, but **lead the pitch with runtime OOM-recovery + auto-quant**, not profiling/auto-context (those are covered). Caveats: (1) it's an **engineering** moat — incumbents could add a recovery loop in a point release → ship fast + go deep (multi-axis back-off: context, ngl, KV-quant, then quant); (2) validate the recovery loop on **AMD/ROCm + Apple Silicon**, not just one RTX 3070; (3) profiling stays a supporting feature.

*Sources: lmstudio.ai/changelog (v0.4.16), lmstudio-ai/lmstudio-bug-tracker #1754 & #1753, docs.ollama.com/context-length, ollama/ollama #12353 #10114 #11354, jan.ai "Fit to Hardware", github.com/Pavelevich/llm-checker, + 2026 OOM-fix guides. Full probe in the workflow transcript.*