
BYOK (Bring Your Own Key) is the default way Kodus uses LLMs across every plan — Community, Teams, and Enterprise. You connect your own provider account, pick a model, pay only for what you use, and monitor costs directly on your provider’s dashboard. Kodus never marks up tokens and never sees your key in plain text.

How this maps to plans

BYOK is free on Community, included on Teams ($10/active dev/month on top of your token spend), and one of two options on Enterprise (the other being a Kodus-managed API key).

Getting Started

The BYOK screen has two paths: pick a recommended model from the curated catalog (fastest path, 90% of cases) or configure any provider manually (escape hatch for custom endpoints or uncurated models).
1. Open BYOK Settings

Navigate to /organization/byok.

2. Pick a recommended model

The Main model section shows a grid of curated models we’ve benchmarked for code review. Click any card to start connecting it.
3. Paste your API key and test

Each card expands inline with a single input — just the API key. Click Test to probe the provider, or Test & save to run the test and persist the config on success.
4. Add a Fallback (recommended)

Once the Main model is configured, a Fallback model section appears. If your main provider hits rate limits or goes down, Kodus falls back automatically.
Test before saving. The Test button probes your provider with a cheap metadata call (no LLM inference is performed). It catches invalid keys, wrong base URLs, and network issues before they break your first code review.
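If you want to reproduce that probe yourself, here is a minimal sketch, assuming an OpenAI-compatible provider and Node 18+ (the base URL and env var are placeholders):

// Minimal sketch of the kind of metadata probe the Test button performs:
// list the models visible to the key -- no inference tokens are spent.
const baseURL = "https://api.openai.com/v1"; // placeholder: your provider's base URL
const apiKey = process.env.PROVIDER_API_KEY ?? ""; // placeholder env var

const res = await fetch(`${baseURL}/models`, {
  headers: { Authorization: `Bearer ${apiKey}` },
});
if (!res.ok) {
  // 401 usually means an invalid key; 404 a wrong base URL
  throw new Error(`Probe failed: ${res.status} ${res.statusText}`);
}
const { data } = (await res.json()) as { data: { id: string }[] };
console.log(data.map((m) => m.id)); // model IDs the key can access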
Recommended Models

These six models are curated for code review. They all appear in the catalog on /organization/byok and come pre-tuned with sensible defaults (temperature, max output tokens, and reasoning effort set to medium).

Claude Sonnet 4.6

Best balance of quality and cost. Anthropic’s latest Sonnet: adaptive extended thinking, strong cross-file analysis, 200K context window.

Claude Opus 4.7

Flagship quality. Top-tier Anthropic model for the hardest reviews: 1M context, premium price.

Gemini 3.1 Pro (custom tools)

Largest context. Google’s flagship with custom-tools support. 1M context window; strongest on large PRs and monorepos.

GPT-5.4

Fast and consistent. OpenAI’s latest flagship. Reliable low latency, broad knowledge, 400K context.

Kimi K2.6 Coding

Coding-specialized, cheap. Moonshot AI’s coding-tuned model. Two plans: Developer API (pay-per-token) or Kimi Code Plan (subscription with dedicated endpoint).

GLM 5.1

Best subscription value. Z.ai’s latest. Two plans: Developer API (pay-per-token) or GLM Coding Plan (flat-rate subscription).
Our default recommendation: Start with Claude Sonnet 4.6 for the best overall code-review experience. If cost is the priority, GLM 5.1 on the Coding Plan or Kimi K2.6 on the Kimi Code Plan give flat-rate subscriptions that cap your monthly spend.

Plan selector (GLM 5.1 and Kimi K2.6)

Z.ai and Moonshot both offer a subscription plan with a different endpoint than their pay-per-token Developer API. The curated card for each of these models shows a Plan selector so you can pick the right endpoint before pasting your key.
For GLM 5.1 on Z.ai, for example, the two plans map to:

| Plan | Endpoint | Keys from | Best for |
| --- | --- | --- | --- |
| Developer API | https://api.z.ai/api/paas/v4/ | z.ai/manage-apikey | Bursty workloads, pay-per-token |
| Coding Plan | https://api.z.ai/api/coding/paas/v4 | z.ai/subscribe | Predictable team volume, flat monthly fee |
GLM Coding Plan keys only work on /api/coding/paas/v4. The Lite and Pro tiers are often capped at 1 concurrent request — Kodus pre-fills maxConcurrentRequests=1 when you pick this plan. Bump it in Advanced settings if you’re on the Max tier (up to 30).

Configure manually

When the model you want isn’t in the curated list (custom endpoint, self-hosted LLM, or a provider we haven’t benchmarked), click Configure manually at the bottom of the catalog. This opens /organization/byok/manual?slot=main — a step-by-step wizard:
1. Pick a provider

Choose from OpenAI, Anthropic, Google Gemini, OpenRouter, Novita, or OpenAI Compatible (for any OpenAI-format endpoint).
2. Enter the base URL (if required)

OpenAI-compatible providers need an explicit base URL. The field only appears when you pick that provider.
3. Pick or type the model ID

If Kodus can list models from the provider, you get a dropdown. Otherwise (e.g. self-hosted or when platform keys aren’t configured), type the exact model ID manually.
4. Paste the API key

The key field appears once provider and model are set.
5. Tune advanced settings (optional)

Temperature, max tokens, reasoning effort, and max concurrent requests — all optional. Defaults are sensible for most providers.
6. Test and save

Click Test & save to run the connection probe and persist on success.
The same manual route works for Fallback — navigate with ?slot=fallback, or use the Add fallback link after Main is saved.
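As a concrete illustration, the values entered for a self-hosted OpenAI-compatible endpoint might look like this (every value is a placeholder, and the object shape is ours for illustration, not Kodus’s internal schema):

// Hypothetical summary of a manual wizard configuration for a
// self-hosted OpenAI-compatible runtime. All values are placeholders.
const manualByokConfig = {
  provider: "openai_compatible",
  baseURL: "http://llm.internal:8000/v1", // step 2: explicit base URL
  model: "my-coder-model",                // step 3: exact ID your server exposes
  apiKey: "placeholder",                  // step 4: many self-hosted runtimes ignore it
  // step 5: optional advanced settings
  temperature: 0.2,
  maxTokens: 8192,
  reasoningEffort: "medium",
  maxConcurrentRequests: 4,
};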

Supported Providers

OpenAI

Best for: Latest GPT models and reliable performance.

Get an API key:
  1. Visit OpenAI API Keys
  2. Create a new key for Kodus
  3. Add billing information

Reasoning / Extended Thinking

All six recommended models support reasoning. The BYOK form exposes a Thinking toggle (Off / Low / Medium / High / Custom) under Advanced settings, pre-filled to Medium for every recommended model.

Preset levels

When you pick Low / Medium / High, Kodus translates the level to each provider’s native format automatically:
| Provider | How “medium” maps |
| --- | --- |
| Anthropic (Claude Sonnet 4.6 / Opus 4.7) | thinking: { type: "adaptive" } + outputConfig: { effort: "medium" } |
| Google (Gemini 3.1 Pro) | thinkingConfig: { thinkingLevel: "medium" } |
| OpenAI (GPT-5.4) | reasoningEffort: "medium" |
| OpenRouter | reasoning: { effort: "medium" } |
| OpenAI-compatible (Kimi K2.6 / GLM 5.1) | thinking: { type: "enabled" } (binary on/off, level ignored) |
Kimi and GLM currently expose reasoning as a single on/off flag. Picking Low, Medium, or High all emit the same payload (thinking enabled). When their APIs add level granularity, Kodus will start forwarding it.
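Sketched in code, the translation behaves roughly like this (the helper is our illustration, not Kodus internals; only the per-provider field names come from the table above):

// Illustrative translation of a preset level into provider-native
// reasoning options. The function shape is an assumption, not Kodus code.
type Level = "low" | "medium" | "high";

function reasoningProviderOptions(provider: string, level: Level) {
  switch (provider) {
    case "anthropic":
      return { anthropic: { thinking: { type: "adaptive" }, outputConfig: { effort: level } } };
    case "google_gemini":
    case "google_vertex":
      return { google: { thinkingConfig: { thinkingLevel: level } } };
    case "openai":
      return { openai: { reasoningEffort: level } };
    case "open_router":
      return { openrouter: { reasoning: { effort: level } } };
    default:
      // openai_compatible (Kimi / GLM): binary on/off today, level ignored
      return { openaiCompatible: { thinking: { type: "enabled" } } };
  }
}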

Custom JSON override

Picking Custom in the Thinking toggle reveals a JSON textarea. Paste the provider options directly — Kodus auto-wraps them under the active provider’s namespace. You don’t need to know the Vercel AI SDK routing rules. Use this when:
  • You need a specific budgetTokens value for Claude (instead of the preset effort mapping)
  • You want to enable/disable thinking on a per-model basis for OpenAI-compatible providers
  • You want fields beyond reasoning — caching, service tier, safety settings, user tagging, etc. The override is merged into providerOptions, so any adapter field passes through
  • The provider ships a new field Kodus hasn’t wrapped yet

Examples (paste directly — no namespace needed)

Override Claude’s thinking budget to exactly 20,000 tokens:
{
  "thinking": { "type": "enabled", "budgetTokens": 20000 }
}
Enable prompt caching (non-reasoning example):
{
  "cacheControl": { "type": "ephemeral" }
}

Going manual with namespaces (power users)

If your JSON already contains a known namespace key at the top level (anthropic, google, openai, openrouter, openaiCompatible), Kodus leaves it untouched. Useful if you want to mix multiple provider namespaces or be explicit:
{
  "openrouter": {
    "reasoning": { "effort": "high" },
    "provider": { "order": ["moonshot"], "allow_fallbacks": false }
  }
}
Under the hood, these are the namespace mappings Kodus uses:
| BYOK provider | Namespace key |
| --- | --- |
| anthropic | anthropic |
| google_gemini / google_vertex | google |
| openai | openai |
| open_router | openrouter |
| openai_compatible / novita | openaiCompatible |
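A rough sketch of that wrapping rule, using the table above (the function and its name are illustrative, not Kodus’s code):

// Illustrative sketch of the auto-wrap rule: JSON that already starts with
// a known namespace key passes through untouched; anything else is nested
// under the active provider's namespace.
const NAMESPACES: Record<string, string> = {
  anthropic: "anthropic",
  google_gemini: "google",
  google_vertex: "google",
  openai: "openai",
  open_router: "openrouter",
  openai_compatible: "openaiCompatible",
  novita: "openaiCompatible",
};

function wrapOverride(byokProvider: string, override: Record<string, unknown>) {
  const ns = NAMESPACES[byokProvider];
  const knownKeys = new Set(Object.values(NAMESPACES));
  const alreadyNamespaced = Object.keys(override).some((k) => knownKeys.has(k));
  if (!ns || alreadyNamespaced) return override; // unknown provider or explicit namespace
  return { [ns]: override };
}

// wrapOverride("anthropic", { thinking: { type: "enabled", budgetTokens: 20000 } })
// -> { anthropic: { thinking: { type: "enabled", budgetTokens: 20000 } } }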

Gotchas

  • Valid JSON only. Missing commas or trailing commas break the parse and Kodus ignores the override.
  • Precedence: the JSON override fully replaces the effort-preset’s namespace block — if you override anthropic.thinking but forget anthropic.outputConfig, that field won’t be sent. OpenRouter routing (Pin providers / Allow fallbacks) is the one exception: it deep-merges with your override under openrouter.
  • Unknown provider = no wrap. If your BYOK provider isn’t in the namespace table above, Kodus passes the JSON through as-is. Rare — only applies if you configure a provider Kodus doesn’t recognize.

Pinning OpenRouter providers

OpenRouter is a router — when you request a model (e.g. moonshotai/kimi-k2.5), it forwards the call to one of several upstream providers (Moonshot direct, Together, Groq, Fireworks, Novita…). Each call can land on a different backend. That’s convenient, but it introduces silent variance:
  • Quality drift — upstreams run different precisions (FP8, INT4, full) and give subtly different outputs for identical prompts
  • Tool-calling inconsistency — some backends don’t support function calling the same way, leading to malformed tool use
  • Reasoning format variance — one upstream honors reasoning_effort, another only thinking.enabled, another ignores both
  • Latency swings — p50 can jump from 800ms to 4s between calls as routing changes
  • Rate-limit surprises — you hit quota on a backend you didn’t explicitly choose

How to pin

When your BYOK provider is OpenRouter, the Advanced settings panel shows an OpenRouter routing section with two fields:
  • Pin providers (in order) — comma-separated list of upstream names (e.g. moonshot, together). OpenRouter tries them in order and uses the first available.
  • Allow fallbacks — when off, requests hard-fail if none of the pinned providers are available. When on (default), OpenRouter can fall back to any other upstream that serves the model.
For a stable setup, pin a single provider and turn off fallbacks (Pin: moonshot, Allow fallbacks: off). Requests will always hit the same upstream or fail loudly — no silent quality changes. The tradeoff is zero resilience if that one upstream goes down; pair it with a different BYOK Fallback (e.g. Anthropic) to absorb outages.
Upstream names must match OpenRouter’s catalog. Check the provider tags on openrouter.ai/docs/features/provider-routing — common values include moonshot, together, groq, fireworks, novita.
Under the hood, Kodus emits this into the Vercel AI SDK call:
{
  "openrouter": {
    "provider": {
      "order": ["moonshot", "together"],
      "allow_fallbacks": false
    }
  }
}
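For reference, an equivalent direct call might look like this sketch (assuming the community @openrouter/ai-sdk-provider package; Kodus’s actual wiring may differ, and the model ID and prompt are placeholders):

// Hedged sketch: pinning OpenRouter upstreams via providerOptions in a
// Vercel AI SDK call. Model ID and prompt are placeholders.
import { generateText } from "ai";
import { createOpenRouter } from "@openrouter/ai-sdk-provider";

const openrouter = createOpenRouter({
  apiKey: process.env.OPENROUTER_API_KEY ?? "",
});

const { text } = await generateText({
  model: openrouter("moonshotai/kimi-k2.5"), // placeholder model ID
  prompt: "Review this diff for correctness issues.",
  providerOptions: {
    openrouter: {
      provider: { order: ["moonshot", "together"], allow_fallbacks: false },
    },
  },
});
console.log(text);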

Advanced: raw JSON override

If you need fields beyond order and allow_fallbacks (e.g. ignore, data_collection, require_parameters), switch Thinking to Custom in Advanced settings and paste the full routing payload — it’s merged into providerOptions alongside any reasoning config:
{
  "openrouter": {
    "provider": {
      "order": ["moonshot"],
      "allow_fallbacks": false,
      "ignore": ["deepinfra"],
      "data_collection": "deny"
    },
    "reasoning": { "effort": "medium" }
  }
}

Concurrency and rate limits

The maxConcurrentRequests field (under Advanced settings) caps how many inflight requests Kodus sends to your provider in parallel. Most of the time, the default is fine — but subscription plans with strict concurrency caps need it set explicitly.
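Conceptually, the cap behaves like a semaphore around review calls; a minimal sketch (illustrative only, not Kodus’s internal implementation):

// Conceptual sketch of what maxConcurrentRequests does: a semaphore that
// holds extra calls until an in-flight one finishes.
class Semaphore {
  private waiters: Array<() => void> = [];
  private active = 0;
  constructor(private readonly max: number) {}

  async run<T>(task: () => Promise<T>): Promise<T> {
    // Wait until a slot frees up (re-check in case another caller grabbed it)
    while (this.active >= this.max) {
      await new Promise<void>((resolve) => this.waiters.push(resolve));
    }
    this.active += 1;
    try {
      return await task();
    } finally {
      this.active -= 1;
      this.waiters.shift()?.(); // wake one queued caller, if any
    }
  }
}

// With maxConcurrentRequests = 1 (e.g. GLM Coding Plan Lite/Pro),
// review calls run strictly one at a time:
const limiter = new Semaphore(1);
const results = await Promise.all(
  ["a.ts", "b.ts", "c.ts"].map((file) =>
    limiter.run(async () => `reviewed ${file}`), // stand-in for an LLM call
  ),
);
console.log(results);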

Defaults Kodus pre-fills

| Provider / plan | Pre-filled value | Why |
| --- | --- | --- |
| GLM Coding Plan (Lite/Pro) | 1 | Subscription allows only one in-flight request. Going higher triggers 429s. |
| GLM Coding Plan (Max) | 1 (bump manually) | Max allows up to 30, but we default to the safe value. Raise in Advanced settings. |
| Kimi Code Plan | 30 | Moonshot’s documented cap on the coding endpoint. |
| GLM Developer API | (empty) | Limits scale per key; no sensible global default. |
| Kimi Developer API | (empty) | Scales with your recharge tier (Tier 1 ≈ 50, Tier 5 ≈ 1000). |
| Anthropic / OpenAI / Google / OpenRouter | (empty) | Providers enforce their own TPM/RPM; Kodus doesn’t cap. |

When to tune it

Raise it

  • You have a high-tier recharge on Moonshot/OpenRouter and want higher throughput on big PRs
  • You bumped your GLM Coding Plan to Max and want to use the full 30-concurrent budget
  • Reviews feel serialized on multi-file PRs and you’re not seeing 429s

Lower it

  • You see 429 or Too much concurrency errors in review logs
  • Your provider warns about rate limits on the dashboard
  • You want to conserve Coding Plan window (5h/weekly) across more PRs
Concurrency vs. RPM vs. TPM. maxConcurrentRequests only caps parallel inflight requests. Many providers also enforce separate RPM (requests per minute) and TPM (tokens per minute) limits. If you’re hitting RPM/TPM while concurrency looks fine, the fix is usually to upgrade your tier or spread load across time — not to change maxConcurrentRequests.
Fallback interaction. When Main hits a 429 and Kodus fails over to the Fallback model, the Fallback’s own maxConcurrentRequests applies. Setting a generous Fallback on a different provider is a good way to absorb bursts when your Main is on a tight subscription.

Best Practices

Security

Dedicated Keys

Create separate API keys for Kodus; this makes usage auditing and key rotation easier.

Regular Rotation

Rotate keys periodically and update them in BYOK settings.

Monitor Usage

Check your provider dashboards for unusual patterns.

Secure Storage

Never commit keys to repositories. Kodus stores them encrypted at rest and in transit.

Fallback Strategy

  • Use a different provider for Main and Fallback (e.g. Anthropic main, Google fallback). Protects against provider-specific outages.
  • Subscriptions with tight concurrency limits (GLM Coding Plan Lite/Pro, Kimi Code Plan) make poor solo configurations — pair them with a pay-per-token Fallback so bursty PRs don’t starve.

Troubleshooting

Invalid or rejected API key

  • Copy the key without extra spaces, quotes, or trailing newlines.
  • Confirm billing is enabled and the account has credits.
  • For GLM Coding Plan / Kimi Code Plan keys, make sure you picked the matching Plan in the card — subscription keys don’t work on the Developer API endpoint and vice versa.

Wrong base URL or endpoint

  • Verify the base URL matches the provider exactly (trailing slash matters for some).
  • For OpenAI-compatible providers, the models endpoint is usually {baseURL}/models.

Model ID errors

  • The Test button validates the key/endpoint but doesn’t verify the specific model ID. If you typed a model that doesn’t exist (typo), the first real review fails.
  • Cross-check the model ID against the provider’s catalog before saving.

429s and concurrency errors

  • Lower Max concurrent requests in Advanced settings.
  • On GLM Coding Plan Lite/Pro, stay at 1 concurrent. Upgrade to Max (30 concurrent) if you need more throughput.
  • On Kimi Code Plan, the documented cap is 30 concurrent.

Self-hosted (.env) configuration

  • If Kodus is configured via .env (self-hosted Fixed Mode), the BYOK screen shows a blue info banner with the active provider/model — the key is never displayed for security.
  • Saving a BYOK config on top of .env prompts a confirm dialog before overriding.

Unexpected costs

  • Reasoning adds tokens. If cost is spiking, lower Thinking from Medium to Low, or switch to a cheaper model for Main.
  • Check your provider dashboard for the per-model breakdown.
  • Set a monthly cap at the provider level.

Frequently Asked Questions

Can I switch models after setup?

Yes. The change takes effect for the next review — no redeploy required.

What happens if my provider goes down or rate-limits me?

Reviews automatically switch to the Fallback model if one is configured. Without a Fallback, the review fails and returns an error. Always configure a Fallback.

How do Main and Fallback split the work?

Main handles every review by default. If it fails (rate limit, 5xx, timeout), Kodus retries once on Fallback. You pay only for the provider that actually processed the review.

Should Main and Fallback use the same provider?

No. Different providers protect against provider-specific outages. A common pairing: Anthropic Main + Google Fallback, or GLM Coding Plan Main + Anthropic Fallback for spike coverage.

Are my API keys stored securely?

Yes. Keys are encrypted at rest and in transit and never logged in plain text. The BYOK status endpoint never returns the raw key.

Can I use a self-hosted LLM?

Yes — via the OpenAI Compatible provider in the manual wizard. Enter your endpoint’s base URL, the model ID it exposes, and a placeholder API key (most self-hosted runtimes ignore the key header but still require one).