How Chutes works
Chutes AI is a decentralized serverless compute platform for open-source models. It exposes an OpenAI-compatible inference endpoint and offers subscription plans that bundle API usage up to a cap expressed as a multiple of the equivalent pay-per-token value — similar in structure to the Z.AI GLM Coding Plan, but covering the full open-source catalog (DeepSeek, Llama, Qwen, MiniMax, Kimi, and many more). Kodus talks to Chutes through the same OpenAI-compatible adapter it uses for everything else, so there are no code changes — just BYOK credentials.

Plans at a glance
Pricing and quota rules change. Always confirm at chutes.ai/pricing before choosing a tier.
| Tier | Monthly fee | Notes |
|---|---|---|
| Base | ~$3/mo | Entry tier; limited model selection. |
| Standard | ~$10/mo | Required for frontier models (DeepSeek V3, MiniMax M2.1, etc.). |
| Pro | ~$20+/mo | Higher 5× cap for heavier review volume. |
| Enterprise | custom | Contact Chutes. |
- The 5× cap resets monthly and is computed against the same per-token prices you’d pay pay-as-you-go.
- Some models require Standard or higher — the base tier does not carry frontier coding models.
- Chutes marks some models with a `-TEE` suffix indicating trusted-execution-environment (confidential compute) variants.
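The cap arithmetic above is simple enough to sketch: the tier fee times the multiplier gives a PAYG-equivalent dollar budget, which converts to tokens at the pay-as-you-go rate. The per-token price below is an illustrative assumption, not a published Chutes rate — check chutes.ai/pricing for real numbers.

```python
# Sketch: estimate how many tokens a tier's 5x cap buys.
# The $1.10 / 1M-token price is a hypothetical PAYG rate.

def monthly_token_budget(tier_fee_usd: float,
                         price_per_mtok_usd: float,
                         multiplier: int = 5) -> int:
    """PAYG-equivalent dollars covered by the cap, converted to tokens."""
    budget_usd = tier_fee_usd * multiplier
    return int(budget_usd / price_per_mtok_usd * 1_000_000)

# Standard tier (~$10/mo) at an assumed $1.10 per 1M tokens:
print(monthly_token_budget(10, 1.10))  # 45454545, i.e. ~45M tokens
```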
Recommended models
Chutes uses HuggingFace-style `org/model` identifiers, sometimes with a `-TEE` suffix for the confidential-compute variant:
| Model id | Notes |
|---|---|
| `deepseek-ai/DeepSeek-V3-0324-TEE` | Frontier coding model; strong agentic behavior. Requires ≥ Standard. |
| `moonshotai/Kimi-K2-Instruct` | Long-context Kimi K2 — great on large PRs. |
| `Qwen/Qwen3-Coder-480B-A35B-Instruct` | Specialized coder. |
| `chutes/MiniMaxAI/MiniMax-M2.1-TEE` | Alternative frontier option. |
Creating an API Key
- Go to chutes.ai and create an account.
- Subscribe to a tier at chutes.ai/pricing, or enable pay-as-you-go if you prefer.
- Open the developer console and create an API key. Copy it immediately.
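Once you have a key, it can be exercised directly against the OpenAI-compatible endpoint before wiring it into Kodus. The sketch below builds the request by hand so the shape is visible; the base URL and model id come from this page, while the payload layout follows the standard OpenAI chat-completions convention.

```python
# Sketch: hand-built request to Chutes' OpenAI-compatible endpoint.
# Replace YOUR_CHUTES_KEY with a real key before sending.
import json
import urllib.request

BASE_URL = "https://llm.chutes.ai/v1"

def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("YOUR_CHUTES_KEY", "deepseek-ai/DeepSeek-V3-0324-TEE", "ping")
print(req.full_url)  # https://llm.chutes.ai/v1/chat/completions
# urllib.request.urlopen(req) would send it; a 401 here means a bad key,
# a 404 usually means a mistyped model id.
```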
Configure Chutes in Kodus
Option 1 — BYOK on Kodus Cloud (recommended)
- In the Kodus web UI, open Settings → BYOK and click Edit on the Main model (or Fallback).
- Toggle the form into Custom mode so you can enter a base URL and a free-text model name.
- Fill in the fields:

| Field | Value |
|---|---|
| Provider | OpenAI Compatible |
| API Key | your Chutes API key |
| Base URL | `https://llm.chutes.ai/v1` |
| Model | e.g. `deepseek-ai/DeepSeek-V3-0324-TEE` |
| Max Concurrent Requests | 3–5 is a safe start; raise if you don't hit the cap |
| Max Output Tokens | leave default unless you hit truncation |

- Save. Kodus validates the key against the endpoint and surfaces any 401 / 404 immediately.
- Open a PR to trigger a review; the BYOK status badge turns green on the first successful call.
Because Chutes runs on decentralized compute, cold-start and tail latency vary more than on dedicated providers. Configure an OpenAI or Anthropic key as Fallback so Kodus can fail over when a node is slow or the monthly cap is hit.
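The failover behavior described above can be sketched as plain control flow: try the main provider, and on a timeout, rate limit, or cap error, retry on the fallback. The exception name and call signature here are illustrative stand-ins, not Kodus's actual internals.

```python
# Sketch of main -> fallback failover, assuming a provider call raises on
# timeout, 429, or a monthly-cap error. Names are hypothetical.
from typing import Callable

class ProviderError(Exception):
    """Raised by a provider call on timeout, rate limit, or quota cap."""

def review_with_failover(main: Callable[[str], str],
                         fallback: Callable[[str], str],
                         diff: str) -> str:
    try:
        return main(diff)
    except ProviderError:
        # Main (Chutes) is slow or capped; run the review on the fallback.
        return fallback(diff)

# Usage with stand-in providers:
def slow_chutes(diff: str) -> str:   # simulate a capped node
    raise ProviderError("monthly cap reached")

def openai_fallback(diff: str) -> str:
    return f"review of {len(diff)} chars via fallback"

print(review_with_failover(slow_chutes, openai_fallback, "diff --git a b"))
# review of 14 chars via fallback
```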
Option 2 — Self-hosted (environment variables)
If you run Kodus in Fixed Mode (single global provider, no per-org BYOK), configure Chutes in the `.env` of your API and worker containers.
This path is only needed for self-hosted Kodus installs that deliberately disable BYOK. If BYOK is enabled on your self-hosted instance, prefer Option 1 — the UI-based flow is the same as on Cloud.
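A hedged example of what that `.env` fragment might look like — the variable names below are assumptions based on a generic single-provider setup; check the Kodus self-hosting docs for the exact keys your version expects:

```env
# Hypothetical variable names -- verify against the Kodus self-hosting docs.
LLM_PROVIDER=openai_compatible
LLM_API_KEY=your-chutes-api-key
LLM_BASE_URL=https://llm.chutes.ai/v1
LLM_MODEL=deepseek-ai/DeepSeek-V3-0324-TEE
```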
Restart the API and worker containers after editing `.env`, then verify by opening a test PR and checking the logs for a successful model call.
When to pick Chutes
- You want the broadest open-source catalog at a subscription price — frontier DeepSeek / MiniMax / Qwen at a flat fee with predictable caps.
- You care about confidential compute — Chutes offers `-TEE` variants that run inside trusted execution environments, useful if your compliance posture requires it.
- You're running at low-to-mid volume and fit within the 5× PAYG cap of a cheap tier.
Troubleshooting
Model requires higher tier
- Frontier models (DeepSeek V3, MiniMax M2.1, some Qwen variants) are gated to the Standard tier and above since Feb 2026.
- Either upgrade, or pick a model available on your current tier (smaller Llama or Qwen variants).
Monthly 5× cap reached
- Check current usage in the Chutes dashboard.
- Switch temporarily to a cheaper model to extend the cap, or upgrade tier.
- Configure a Fallback BYOK provider so reviews keep running while you're capped.
401 / authentication errors
- Confirm the key is active in the Chutes dashboard and the subscription is current.
- Make sure there are no trailing spaces or quotes in the `.env` value.
Model not found
- Chutes uses the `org/model` format, with some variants ending in `-TEE` (confidential compute). Double-check exact capitalization at `llm.chutes.ai/v1/models`.
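Since model ids are case-sensitive, a small exact-match check against the catalog (the id list returned by `GET /v1/models`) catches the most common typo, a lowercased org or model name. This helper is a sketch, not part of Kodus or Chutes:

```python
# Sketch: case-sensitive model lookup with a casing-only suggestion.
from typing import Optional

def find_model(requested: str, catalog: list[str]) -> Optional[str]:
    if requested in catalog:
        return requested
    # Suggest the likely intended id when only the casing differs.
    for model_id in catalog:
        if model_id.lower() == requested.lower():
            return model_id
    return None

catalog = ["deepseek-ai/DeepSeek-V3-0324-TEE", "moonshotai/Kimi-K2-Instruct"]
print(find_model("deepseek-ai/deepseek-v3-0324-tee", catalog))
# deepseek-ai/DeepSeek-V3-0324-TEE  (the correct casing)
```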
Slow or inconsistent latency
- Chutes runs on decentralized compute, so tail latency is higher than on dedicated clouds.
- For latency-sensitive reviews, prefer dedicated providers; reserve Chutes for overnight or batch review jobs, or configure a fast provider as Main and Chutes as Fallback.
Connection errors
- Confirm your server can reach `llm.chutes.ai`.
- Review API and worker logs for the exact upstream error.