Models nobody else hosts
OpenAI, Anthropic, and the major clouds won't carry these. We route them because providers in the Umbra fleet volunteered to host them, attested them, and approved them for hosting. Your prompt still goes to a machine you don't own; the difference is that you can verify what ran.
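What "verify what ran" could look like on the client side, as a minimal sketch: it assumes the provider returns an attestation report naming the model and carrying a SHA-256 digest of the loaded weights, and that the report's signature has already been checked. `AttestationReport`, `weights_sha256`, and `model_matches` are hypothetical names; Umbra's actual report format isn't described in this section.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass
class AttestationReport:
    # Hypothetical shape; the real Umbra report format is not documented here.
    model_id: str
    weights_sha256: str  # hex digest the provider's hardware measured


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 of a byte string, e.g. a published weights file."""
    return hashlib.sha256(data).hexdigest()


def model_matches(report: AttestationReport, expected_model_id: str,
                  expected_sha256: str) -> bool:
    """True if the report names the expected model and its measured digest
    equals the published one. Assumes the report's signature chain was
    verified beforehand; that step is out of scope for this sketch."""
    if report.model_id != expected_model_id:
        return False
    # compare_digest avoids leaking timing information about a mismatch.
    return hmac.compare_digest(report.weights_sha256.lower(),
                               expected_sha256.lower())


if __name__ == "__main__":
    published = sha256_hex(b"example weights bytes")
    report = AttestationReport("phi-3.5-mini-uncensored", published)
    print(model_matches(report, "phi-3.5-mini-uncensored", published))
```

The digest comparison is only the last step: a real check would first validate that the report was signed by attested hardware, otherwise the digest could simply be copied.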
gemma-4-12b-coder-fable5
A coding fine-tune of Gemma 4 with a Fable5 alignment layer. It isn't available anywhere else on the public API market: the provider community ported it because Fable5 dropped upstream support.
llama-3.3-70b-uncensored
Abliterated Llama 3.3 70B with the refusal direction removed. It's the popular uncensored build that the major clouds won't serve.
qwen3-32b-roleplay-v2
Long-context creative-writing fine-tune of Qwen3 32B. This is the "warm" tier: at Q8 it earns 2–3x the revenue per slot, thanks to the context-length premium.
mixtral-8x7b-dolphin
Community Dolphin fine-tune of the Mixtral 8x7B MoE, unfiltered. Fits into 16 GB of free memory on most M-series hosts.
mistral-7b-claude3
Community distill of Mistral 7B, trained on Claude 3 outputs. A cheap slot filler at 5.4 GB resident.
deepseek-coder-v2
DeepSeek Coder V2, one of the strongest open coding models at its release. A 16B-parameter MoE with 2.4B parameters active per token.
yi-34b-abliterated-l3
Yi 34B L3 with the refusal direction ablated. Its 200k-token context window suits the most demanding long-form creative tasks.
llava-1.6-13b-uncensored
LLaVA 1.6 13B vision-language model, refusal ablated. Image + text on attested Apple hardware.
phi-3.5-mini-uncensored
Phi-3.5 mini, refusal ablated. At 2.3 GB resident, it's the cheap slot filler that runs alongside anything else.
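The resident sizes quoted above are what decide whether a model can share a host with others. A rough sketch, assuming weights dominate memory (KV cache and runtime overhead are ignored) and a simple greedy first-fit placement; this is not necessarily how Umbra schedules slots, and `weights_gb`, `RESIDENT_GB`, and `pack` are hypothetical names.

```python
# Rough weight-memory estimate plus a greedy "does it fit" check for one host.
# Hypothetical helpers; resident sizes are the figures quoted in the catalog.


def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB (1e9 bytes): params x bits / 8.
    Ignores KV cache, activations, and runtime overhead."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9


RESIDENT_GB = {
    "mistral-7b-claude3": 5.4,
    "phi-3.5-mini-uncensored": 2.3,
}


def pack(free_gb: float, candidates: list[str]) -> list[str]:
    """Load candidates in the given order, skipping any that no longer fit."""
    loaded = []
    for model_id in candidates:
        size = RESIDENT_GB[model_id]
        if size <= free_gb:
            loaded.append(model_id)
            free_gb -= size
    return loaded


if __name__ == "__main__":
    print(weights_gb(70, 4))  # 4-bit weights of a 70B model
    print(pack(8.0, ["mistral-7b-claude3", "phi-3.5-mini-uncensored"]))
```

Greedy first-fit keeps the sketch short; a real scheduler would also weigh per-slot revenue and context-length demand.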