
BYOK (Bring Your Own Key) is the default way Kodus uses LLMs across every plan — Community, Teams, and Enterprise. You connect your own provider account, pick a model, pay only for what you use, and monitor costs directly on your provider’s dashboard. Kodus never marks up tokens and never sees your key in plain text.

How this maps to plans

BYOK is free on Community, included on Teams ($10/active dev/month on top of your token spend), and one of two options on Enterprise (the other being a Kodus-managed API key).

Getting Started

The BYOK screen has two paths: pick a recommended model from the curated catalog (fastest path, 90% of cases) or configure any provider manually (escape hatch for custom endpoints or uncurated models).
1. Open BYOK Settings

Navigate to /organization/byok.

2. Pick a recommended model

The Main model section shows a grid of curated models we’ve benchmarked for code review. Click any card to start connecting it.
3. Paste your API key and test

Each card expands inline with a single input — just the API key. Click Test to probe the provider, or Test & save to run the test and persist the config on success.
4. Add a Fallback (recommended)

Once the Main model is configured, a Fallback model section appears. If your main provider hits rate limits or goes down, Kodus falls back automatically.
Test before saving. The Test button probes your provider with a cheap metadata call (no LLM inference is performed). It catches invalid keys, wrong base URLs, and network issues before they break your first code review.
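If you want to reproduce that probe yourself, here is a minimal sketch, assuming an OpenAI-compatible provider and Node 18+ (the base URL and env var are placeholders):

// Minimal sketch of the kind of metadata probe the Test button performs:
// list the models visible to the key -- no inference tokens are spent.
const baseURL = "https://api.openai.com/v1"; // placeholder: your provider's base URL
const apiKey = process.env.PROVIDER_API_KEY ?? ""; // placeholder env var

const res = await fetch(`${baseURL}/models`, {
  headers: { Authorization: `Bearer ${apiKey}` },
});
if (!res.ok) {
  // 401 usually means an invalid key; 404 a wrong base URL
  throw new Error(`Probe failed: ${res.status} ${res.statusText}`);
}
const { data } = (await res.json()) as { data: { id: string }[] };
console.log(data.map((m) => m.id)); // model IDs the key can access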
Recommended Models

These six models are curated for code review. They all appear in the catalog on /organization/byok and come pre-tuned with sensible defaults (temperature, max output tokens, and reasoning effort set to medium).

Claude Sonnet 4.6

Best balance of quality and cost. Anthropic’s latest Sonnet: adaptive extended thinking, strong cross-file analysis, 200K context window.

Claude Opus 4.7

Flagship quality. Top-tier Anthropic model for the hardest reviews: 1M context, premium price.

Gemini 3.1 Pro (custom tools)

Largest context. Google’s flagship with custom-tools support. 1M context window; strongest on large PRs and monorepos.

GPT-5.4

Fast and consistent. OpenAI’s latest flagship. Reliable low latency, broad knowledge, 400K context.

Kimi K2.6 Coding

Coding-specialized, cheap. Moonshot AI’s coding-tuned model. Two plans: Developer API (pay-per-token) or Kimi Code Plan (subscription with dedicated endpoint).

GLM 5.1

Best subscription value. Z.ai’s latest. Two plans: Developer API (pay-per-token) or GLM Coding Plan (flat-rate subscription).
Our default recommendation: Start with Claude Sonnet 4.6 for the best overall code-review experience. If cost is the priority, GLM 5.1 on the Coding Plan or Kimi K2.6 on the Kimi Code Plan give flat-rate subscriptions that cap your monthly spend.

Plan selector (GLM 5.1 and Kimi K2.6)

Z.ai and Moonshot both offer a subscription plan with a different endpoint than their pay-per-token Developer API. The curated card for each of these models shows a Plan selector so you can pick the right endpoint before pasting your key.
For GLM 5.1 on Z.ai, for example, the two plans map to:

| Plan | Endpoint | Keys from | Best for |
| --- | --- | --- | --- |
| Developer API | https://api.z.ai/api/paas/v4/ | z.ai/manage-apikey | Bursty workloads, pay-per-token |
| Coding Plan | https://api.z.ai/api/coding/paas/v4 | z.ai/subscribe | Predictable team volume, flat monthly fee |
GLM Coding Plan keys only work on /api/coding/paas/v4. The Lite and Pro tiers are often capped at 1 concurrent request — Kodus pre-fills maxConcurrentRequests=1 when you pick this plan. Bump it in Advanced settings if you’re on the Max tier (up to 30).

Configure manually

When the model you want isn’t in the curated list (custom endpoint, self-hosted LLM, or a provider we haven’t benchmarked), click Configure manually at the bottom of the catalog. This opens /organization/byok/manual?slot=main — a step-by-step wizard:
1. Pick a provider

Choose from OpenAI, Anthropic, Google Gemini, OpenRouter, Novita, or OpenAI Compatible (for any OpenAI-format endpoint).
2. Enter the base URL (if required)

OpenAI-compatible providers need an explicit base URL. The field only appears when you pick that provider.
3. Pick or type the model ID

If Kodus can list models from the provider, you get a dropdown. Otherwise (e.g. self-hosted or when platform keys aren’t configured), type the exact model ID manually.
4. Paste the API key

The key field appears once provider and model are set.
5. Tune advanced settings (optional)

Temperature, max tokens, reasoning effort, and max concurrent requests — all optional. Defaults are sensible for most providers.
6. Test and save

Click Test & save to run the connection probe and persist on success.
The same manual route works for Fallback — navigate with ?slot=fallback, or use the Add fallback link after Main is saved.
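As a concrete illustration, the values entered for a self-hosted OpenAI-compatible endpoint might look like this (every value is a placeholder, and the object shape is ours for illustration, not Kodus’s internal schema):

// Hypothetical summary of a manual wizard configuration for a
// self-hosted OpenAI-compatible runtime. All values are placeholders.
const manualByokConfig = {
  provider: "openai_compatible",
  baseURL: "http://llm.internal:8000/v1", // step 2: explicit base URL
  model: "my-coder-model",                // step 3: exact ID your server exposes
  apiKey: "placeholder",                  // step 4: many self-hosted runtimes ignore it
  // step 5: optional advanced settings
  temperature: 0.2,
  maxTokens: 8192,
  reasoningEffort: "medium",
  maxConcurrentRequests: 4,
};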

Supported Providers

OpenAI

Best for: Latest GPT models and reliable performance.

Get an API key:
  1. Visit OpenAI API Keys
  2. Create a new key for Kodus
  3. Add billing information

Reasoning / Extended Thinking

All six recommended models support reasoning. The BYOK form exposes a Thinking toggle (Off / Low / Medium / High / Custom) under Advanced settings, pre-filled to Medium for every recommended model.

Preset levels

When you pick Low / Medium / High, Kodus translates the level to each provider’s native format automatically:
| Provider | How “medium” maps |
| --- | --- |
| Anthropic (Claude Sonnet 4.6 / Opus 4.7) | thinking: { type: "adaptive" } + outputConfig: { effort: "medium" } |
| Google (Gemini 3.1 Pro) | thinkingConfig: { thinkingLevel: "medium" } |
| OpenAI (GPT-5.4) | reasoningEffort: "medium" |
| OpenRouter | reasoning: { effort: "medium" } |
| OpenAI-compatible (Kimi K2.6 / GLM 5.1) | thinking: { type: "enabled" } (binary on/off, level ignored) |
Kimi and GLM currently expose reasoning as a single on/off flag. Picking Low, Medium, or High all emit the same payload (thinking enabled). When their APIs add level granularity, Kodus will start forwarding it.
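Sketched in code, the translation behaves roughly like this (the helper is our illustration, not Kodus internals; only the per-provider field names come from the table above):

// Illustrative translation of a preset level into provider-native
// reasoning options. The function shape is an assumption, not Kodus code.
type Level = "low" | "medium" | "high";

function reasoningProviderOptions(provider: string, level: Level) {
  switch (provider) {
    case "anthropic":
      return { anthropic: { thinking: { type: "adaptive" }, outputConfig: { effort: level } } };
    case "google_gemini":
    case "google_vertex":
      return { google: { thinkingConfig: { thinkingLevel: level } } };
    case "openai":
      return { openai: { reasoningEffort: level } };
    case "open_router":
      return { openrouter: { reasoning: { effort: level } } };
    default:
      // openai_compatible (Kimi / GLM): binary on/off today, level ignored
      return { openaiCompatible: { thinking: { type: "enabled" } } };
  }
}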

Custom JSON override

Picking Custom in the Thinking toggle reveals a JSON textarea. Paste the provider options directly — Kodus auto-wraps them under the active provider’s namespace. You don’t need to know the Vercel AI SDK routing rules. Use this when:
  • You need a specific budgetTokens value for Claude (instead of the preset effort mapping)
  • You want to enable/disable thinking on a per-model basis for OpenAI-compatible providers
  • You want fields beyond reasoning — caching, service tier, safety settings, user tagging, etc. The override is merged into providerOptions, so any adapter field passes through
  • The provider ships a new field Kodus hasn’t wrapped yet

Examples (paste directly — no namespace needed)

Override Claude’s thinking budget to exactly 20,000 tokens:
{
  "thinking": { "type": "enabled", "budgetTokens": 20000 }
}
Enable prompt caching (non-reasoning example):
{
  "cacheControl": { "type": "ephemeral" }
}

Going manual with namespaces (power users)

If your JSON already contains a known namespace key at the top level (anthropic, google, openai, openrouter, openaiCompatible), Kodus leaves it untouched. Useful if you want to mix multiple provider namespaces or be explicit:
{
  "openrouter": {
    "reasoning": { "effort": "high" },
    "provider": { "order": ["moonshot"], "allow_fallbacks": false }
  }
}
Under the hood, these are the namespace mappings Kodus uses:
| BYOK provider | Namespace key |
| --- | --- |
| anthropic | anthropic |
| google_gemini / google_vertex | google |
| openai | openai |
| open_router | openrouter |
| openai_compatible / novita | openaiCompatible |
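A rough sketch of that wrapping rule, using the table above (the function and its name are illustrative, not Kodus’s code):

// Illustrative sketch of the auto-wrap rule: JSON that already starts with
// a known namespace key passes through untouched; anything else is nested
// under the active provider's namespace.
const NAMESPACES: Record<string, string> = {
  anthropic: "anthropic",
  google_gemini: "google",
  google_vertex: "google",
  openai: "openai",
  open_router: "openrouter",
  openai_compatible: "openaiCompatible",
  novita: "openaiCompatible",
};

function wrapOverride(byokProvider: string, override: Record<string, unknown>) {
  const ns = NAMESPACES[byokProvider];
  const knownKeys = new Set(Object.values(NAMESPACES));
  const alreadyNamespaced = Object.keys(override).some((k) => knownKeys.has(k));
  if (!ns || alreadyNamespaced) return override; // unknown provider or explicit namespace
  return { [ns]: override };
}

// wrapOverride("anthropic", { thinking: { type: "enabled", budgetTokens: 20000 } })
// -> { anthropic: { thinking: { type: "enabled", budgetTokens: 20000 } } }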

Gotchas

  • Valid JSON only. Missing commas or trailing commas break the parse and Kodus ignores the override.
  • Precedence: the JSON override fully replaces the effort-preset’s namespace block — if you override anthropic.thinking but forget anthropic.outputConfig, that field won’t be sent. OpenRouter routing (Pin providers / Allow fallbacks) is the one exception: it deep-merges with your override under openrouter.
  • Unknown provider = no wrap. If your BYOK provider isn’t in the namespace table above, Kodus passes the JSON through as-is. Rare — only applies if you configure a provider Kodus doesn’t recognize.

Pinning OpenRouter providers

OpenRouter is a router — when you request a model (e.g. moonshotai/kimi-k2.5), it forwards the call to one of several upstream providers (Moonshot direct, Together, Groq, Fireworks, Novita…). Each call can land on a different backend. That’s convenient, but it introduces silent variance:
  • Quality drift — upstreams run different precisions (FP8, INT4, full) and give subtly different outputs for identical prompts
  • Tool-calling inconsistency — some backends don’t support function calling the same way, leading to malformed tool use
  • Reasoning format variance — one upstream honors reasoning_effort, another only thinking.enabled, another ignores both
  • Latency swings — p50 can jump from 800ms to 4s between calls as routing changes
  • Rate-limit surprises — you hit quota on a backend you didn’t explicitly choose

How to pin

When your BYOK provider is OpenRouter, the Advanced settings panel shows an OpenRouter routing section with two fields:
  • Pin providers (in order) — comma-separated list of upstream names (e.g. moonshot, together). OpenRouter tries them in order and uses the first available.
  • Allow fallbacks — when off, requests hard-fail if none of the pinned providers are available. When on (default), OpenRouter can fall back to any other upstream that serves the model.
For a stable setup, pin a single provider and turn off fallbacks (Pin: moonshot, Allow fallbacks: off). Requests will always hit the same upstream or fail loudly — no silent quality changes. The tradeoff is zero resilience if that one upstream goes down; pair it with a different BYOK Fallback (e.g. Anthropic) to absorb outages.
Upstream names must match OpenRouter’s catalog. Check the provider tags on openrouter.ai/docs/features/provider-routing — common values include moonshot, together, groq, fireworks, novita.
Under the hood, Kodus emits this into the Vercel AI SDK call:
{
  "openrouter": {
    "provider": {
      "order": ["moonshot", "together"],
      "allow_fallbacks": false
    }
  }
}
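For reference, an equivalent direct call might look like this sketch (assuming the community @openrouter/ai-sdk-provider package; Kodus’s actual wiring may differ, and the model ID and prompt are placeholders):

// Hedged sketch: pinning OpenRouter upstreams via providerOptions in a
// Vercel AI SDK call. Model ID and prompt are placeholders.
import { generateText } from "ai";
import { createOpenRouter } from "@openrouter/ai-sdk-provider";

const openrouter = createOpenRouter({
  apiKey: process.env.OPENROUTER_API_KEY ?? "",
});

const { text } = await generateText({
  model: openrouter("moonshotai/kimi-k2.5"), // placeholder model ID
  prompt: "Review this diff for correctness issues.",
  providerOptions: {
    openrouter: {
      provider: { order: ["moonshot", "together"], allow_fallbacks: false },
    },
  },
});
console.log(text);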

Advanced: raw JSON override

If you need fields beyond order and allow_fallbacks (e.g. ignore, data_collection, require_parameters), switch Thinking to Custom in Advanced settings and paste the full routing payload — it’s merged into providerOptions alongside any reasoning config:
{
  "openrouter": {
    "provider": {
      "order": ["moonshot"],
      "allow_fallbacks": false,
      "ignore": ["deepinfra"],
      "data_collection": "deny"
    },
    "reasoning": { "effort": "medium" }
  }
}

Concurrency and rate limits

The maxConcurrentRequests field (under Advanced settings) caps how many inflight requests Kodus sends to your provider in parallel. Most of the time, the default is fine — but subscription plans with strict concurrency caps need it set explicitly.
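Conceptually, the cap behaves like a semaphore around review calls; a minimal sketch (illustrative only, not Kodus’s internal implementation):

// Conceptual sketch of what maxConcurrentRequests does: a semaphore that
// holds extra calls until an in-flight one finishes.
class Semaphore {
  private waiters: Array<() => void> = [];
  private active = 0;
  constructor(private readonly max: number) {}

  async run<T>(task: () => Promise<T>): Promise<T> {
    // Wait until a slot frees up (re-check in case another caller grabbed it)
    while (this.active >= this.max) {
      await new Promise<void>((resolve) => this.waiters.push(resolve));
    }
    this.active += 1;
    try {
      return await task();
    } finally {
      this.active -= 1;
      this.waiters.shift()?.(); // wake one queued caller, if any
    }
  }
}

// With maxConcurrentRequests = 1 (e.g. GLM Coding Plan Lite/Pro),
// review calls run strictly one at a time:
const limiter = new Semaphore(1);
const results = await Promise.all(
  ["a.ts", "b.ts", "c.ts"].map((file) =>
    limiter.run(async () => `reviewed ${file}`), // stand-in for an LLM call
  ),
);
console.log(results);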

Defaults Kodus pre-fills

| Provider / plan | Pre-filled value | Why |
| --- | --- | --- |
| GLM Coding Plan (Lite/Pro) | 1 | Subscription allows only one in-flight request. Going higher triggers 429s. |
| GLM Coding Plan (Max) | 1 (bump manually) | Max allows up to 30, but we default to the safe value. Raise in Advanced settings. |
| Kimi Code Plan | 30 | Moonshot’s documented cap on the coding endpoint. |
| GLM Developer API | (empty) | Limits scale per key; no sensible global default. |
| Kimi Developer API | (empty) | Scales with your recharge tier (Tier 1 ≈ 50, Tier 5 ≈ 1000). |
| Anthropic / OpenAI / Google / OpenRouter | (empty) | Providers enforce their own TPM/RPM; Kodus doesn’t cap. |

When to tune it

Raise it

  • You have a high-tier recharge on Moonshot/OpenRouter and want higher throughput on big PRs
  • You bumped your GLM Coding Plan to Max and want to use the full 30-concurrent budget
  • Reviews feel serialized on multi-file PRs and you’re not seeing 429s

Lower it

  • You see 429 or Too much concurrency errors in review logs
  • Your provider warns about rate limits on the dashboard
  • You want to conserve Coding Plan window (5h/weekly) across more PRs
Concurrency vs. RPM vs. TPM. maxConcurrentRequests only caps parallel inflight requests. Many providers also enforce separate RPM (requests per minute) and TPM (tokens per minute) limits. If you’re hitting RPM/TPM while concurrency looks fine, the fix is usually to upgrade your tier or spread load across time — not to change maxConcurrentRequests.
Fallback interaction. When Main hits a 429 and Kodus fails over to the Fallback model, the Fallback’s own maxConcurrentRequests applies. Setting a generous Fallback on a different provider is a good way to absorb bursts when your Main is on a tight subscription.

Best Practices

Security

Dedicated Keys

Create separate API keys for Kodus; this makes usage auditing and key rotation easier.

Regular Rotation

Rotate keys periodically and update them in BYOK settings.

Monitor Usage

Check your provider dashboards for unusual patterns.

Secure Storage

Never commit keys to repositories. Kodus stores them encrypted at rest and in transit.

Fallback Strategy

  • Use a different provider for Main and Fallback (e.g. Anthropic main, Google fallback). Protects against provider-specific outages.
  • Subscriptions with tight concurrency limits (GLM Coding Plan Lite/Pro, Kimi Code Plan) make poor solo configurations — pair them with a pay-per-token Fallback so bursty PRs don’t starve.

Troubleshooting

Invalid or rejected API key

  • Copy the key without extra spaces, quotes, or trailing newlines.
  • Confirm billing is enabled and the account has credits.
  • For GLM Coding Plan / Kimi Code Plan keys, make sure you picked the matching Plan in the card — subscription keys don’t work on the Developer API endpoint and vice versa.

Wrong base URL or endpoint

  • Verify the base URL matches the provider exactly (trailing slash matters for some).
  • For OpenAI-compatible providers, the models endpoint is usually {baseURL}/models.

Model ID errors

  • The Test button validates the key/endpoint but doesn’t verify the specific model ID. If you typed a model that doesn’t exist (typo), the first real review fails.
  • Cross-check the model ID against the provider’s catalog before saving.

429s and concurrency errors

  • Lower Max concurrent requests in Advanced settings.
  • On GLM Coding Plan Lite/Pro, stay at 1 concurrent. Upgrade to Max (30 concurrent) if you need more throughput.
  • On Kimi Code Plan, the documented cap is 30 concurrent.

Self-hosted (.env) configuration

  • If Kodus is configured via .env (self-hosted Fixed Mode), the BYOK screen shows a blue info banner with the active provider/model — the key is never displayed for security.
  • Saving a BYOK config on top of .env prompts a confirm dialog before overriding.

Unexpected costs

  • Reasoning adds tokens. If cost is spiking, lower Thinking from Medium to Low, or switch to a cheaper model for Main.
  • Check your provider dashboard for the per-model breakdown.
  • Set a monthly cap at the provider level.

Frequently Asked Questions

Can I switch models after setup?

Yes. The change takes effect for the next review — no redeploy required.

What happens if my provider goes down or rate-limits me?

Reviews automatically switch to the Fallback model if one is configured. Without a Fallback, the review fails and returns an error. Always configure a Fallback.

How do Main and Fallback split the work?

Main handles every review by default. If it fails (rate limit, 5xx, timeout), Kodus retries once on Fallback. You pay only for the provider that actually processed the review.

Should Main and Fallback use the same provider?

No. Different providers protect against provider-specific outages. A common pairing: Anthropic Main + Google Fallback, or GLM Coding Plan Main + Anthropic Fallback for spike coverage.

Are my API keys stored securely?

Yes. Keys are encrypted at rest and in transit and never logged in plain text. The BYOK status endpoint never returns the raw key.

Can I use a self-hosted LLM?

Yes — via the OpenAI Compatible provider in the manual wizard. Enter your endpoint’s base URL, the model ID it exposes, and a placeholder API key (most self-hosted runtimes ignore the key header but still require one).