How this maps to plans
Getting Started
The BYOK screen has two paths: pick a recommended model from the curated catalog (fastest path, 90% of cases) or configure any provider manually (escape hatch for custom endpoints or uncurated models).
Open BYOK Settings
Pick a recommended model
Paste your API key and test
Recommended Models
These six models are curated for code review. They all appear in the catalog on `/organization/byok` and come pre-tuned with sensible defaults (temperature, max output tokens, and reasoning effort set to medium).
Claude Sonnet 4.6
- Provider: Anthropic
- Model ID: `claude-sonnet-4-6`
- Key: console.anthropic.com
Claude Opus 4.7
- Provider: Anthropic
- Model ID: `claude-opus-4-7`
- Key: console.anthropic.com
Gemini 3.1 Pro (custom tools)
- Provider: Google Gemini
- Model ID: `gemini-3.1-pro-preview-customtools`
- Key: aistudio.google.com/apikey
GPT-5.4
- Provider: OpenAI
- Model ID: `gpt-5.4`
- Key: platform.openai.com/api-keys
Kimi K2.6 Coding
- Provider: OpenAI-compatible (Moonshot AI)
- Model ID: `kimi-k2.6`
- Keys: platform.moonshot.ai or kimi.com/code
GLM 5.1
- Provider: OpenAI-compatible (Z.ai)
- Model ID: `glm-5.1`
- Keys: z.ai console or z.ai/subscribe
Plan selector (GLM 5.1 and Kimi K2.6)
Z.ai and Moonshot both offer a subscription plan with a different endpoint than their pay-per-token Developer API. The curated card for each of these models shows a Plan selector so you can pick the right endpoint before pasting your key.
- GLM 5.1 (Z.ai)
- Kimi K2.6 (Moonshot AI)
| Plan | Endpoint | Keys from | Best for |
|---|---|---|---|
| Developer API | https://api.z.ai/api/paas/v4/ | z.ai/manage-apikey | Bursty workloads, pay-per-token |
| Coding Plan | https://api.z.ai/api/coding/paas/v4 | z.ai/subscribe | Predictable team volume, flat monthly fee |
Configure manually
When the model you want isn’t in the curated list (custom endpoint, self-hosted LLM, or a provider we haven’t benchmarked), click Configure manually at the bottom of the catalog. This opens `/organization/byok/manual?slot=main`, a step-by-step wizard:
Pick a provider
Enter the base URL (if required)
Pick or type the model ID
Tune advanced settings (optional)
Supported Providers
- OpenAI
- Google Gemini
- Anthropic Claude
- Novita AI
- OpenRouter
- OpenAI Compatible
For OpenAI:
- Visit OpenAI API Keys
- Create a new key for Kodus
- Add billing information
Reasoning / Extended Thinking
All six recommended models support reasoning. The BYOK form exposes a Thinking toggle (Off / Low / Medium / High / Custom) under Advanced settings, pre-filled to Medium for every recommended model.
Preset levels
When you pick Low / Medium / High, Kodus translates the level to each provider’s native format automatically:
| Provider | How “medium” maps |
|---|---|
| Anthropic (Claude Sonnet 4.6 / Opus 4.7) | thinking: { type: "adaptive" } + outputConfig: { effort: "medium" } |
| Google (Gemini 3.1 Pro) | thinkingConfig: { thinkingLevel: "medium" } |
| OpenAI (GPT-5.4) | reasoningEffort: "medium" |
| OpenRouter | reasoning: { effort: "medium" } |
| OpenAI-compatible (Kimi K2.6 / GLM 5.1) | thinking: { type: "enabled" } — binary on/off, level ignored |
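The table above can be read as a small lookup from provider and level to a native options block. The sketch below is illustrative only and is not Kodus's actual code: the `presetToProviderOptions` function and the `Provider` and `Level` types are hypothetical, and it assumes that Low and High slot into the same position as "medium" does in the table.

```typescript
type Level = "low" | "medium" | "high";
type Provider = "anthropic" | "google" | "openai" | "openrouter" | "openaiCompatible";

// Hypothetical helper: returns the provider-native reasoning block for a preset level.
// Shapes are copied from the "medium" column of the table; other levels are assumed
// to substitute in the same field.
function presetToProviderOptions(provider: Provider, level: Level): Record<string, unknown> {
  switch (provider) {
    case "anthropic":
      return { thinking: { type: "adaptive" }, outputConfig: { effort: level } };
    case "google":
      return { thinkingConfig: { thinkingLevel: level } };
    case "openai":
      return { reasoningEffort: level };
    case "openrouter":
      return { reasoning: { effort: level } };
    case "openaiCompatible":
      // Binary on/off for Kimi K2.6 / GLM 5.1: the level is ignored.
      return { thinking: { type: "enabled" } };
  }
}

console.log(JSON.stringify(presetToProviderOptions("anthropic", "medium")));
```

Note that the OpenAI-compatible branch returns the same block for every level, which matches the "binary on/off" row.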
Custom JSON override
Picking Custom in the Thinking toggle reveals a JSON textarea. Paste the provider options directly; Kodus auto-wraps them under the active provider’s namespace. You don’t need to know the Vercel AI SDK routing rules. Use this when:
- You need a specific `budgetTokens` value for Claude (instead of the preset effort mapping)
- You want to enable/disable thinking on a per-model basis for OpenAI-compatible providers
- You want fields beyond reasoning: caching, service tier, safety settings, `user` tagging, etc. The override is merged into `providerOptions`, so any adapter field passes through
- The provider ships a new field Kodus hasn’t wrapped yet
Examples (paste directly — no namespace needed)
- Anthropic
- Google Gemini
- OpenAI
- OpenRouter
- OpenAI-compatible (Kimi, GLM, etc.)
Going manual with namespaces (power users)
If your JSON already contains a known namespace key at the top level (anthropic, google, openai, openrouter, openaiCompatible), Kodus leaves it untouched. Useful if you want to mix multiple provider namespaces or be explicit:
| BYOK provider | Namespace key |
|---|---|
| `anthropic` | `anthropic` |
| `google_gemini` / `google_vertex` | `google` |
| `openai` | `openai` |
| `open_router` | `openrouter` |
| `openai_compatible` / `novita` | `openaiCompatible` |
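The wrapping rule described in this section can be sketched roughly as follows. This is a hypothetical illustration, not Kodus's implementation: `wrapOverride`, `NAMESPACE_BY_PROVIDER`, and `KNOWN_NAMESPACES` are made-up names, and the only facts taken from the docs are the namespace table and the three behaviours (auto-wrap, leave already-namespaced JSON untouched, pass through for unknown providers).

```typescript
type JsonObject = Record<string, unknown>;

// Namespace table from the docs: BYOK provider -> providerOptions namespace key.
const NAMESPACE_BY_PROVIDER: Record<string, string> = {
  anthropic: "anthropic",
  google_gemini: "google",
  google_vertex: "google",
  openai: "openai",
  open_router: "openrouter",
  openai_compatible: "openaiCompatible",
  novita: "openaiCompatible",
};

const KNOWN_NAMESPACES = ["anthropic", "google", "openai", "openrouter", "openaiCompatible"];

// Hypothetical sketch of the wrapping rule.
function wrapOverride(byokProvider: string, override: JsonObject): JsonObject {
  // A known namespace key at the top level means the user went manual: leave it untouched.
  const alreadyNamespaced = Object.keys(override).some((key) => KNOWN_NAMESPACES.indexOf(key) !== -1);
  if (alreadyNamespaced) return override;

  // Unknown provider: no wrap, the JSON is passed through as-is.
  if (!Object.prototype.hasOwnProperty.call(NAMESPACE_BY_PROVIDER, byokProvider)) return override;

  // Otherwise wrap the pasted options under the active provider's namespace.
  return { [NAMESPACE_BY_PROVIDER[byokProvider]]: override };
}

console.log(JSON.stringify(wrapOverride("anthropic", { thinking: { type: "enabled", budgetTokens: 2048 } })));
```

Under these assumptions, pasting `{ "thinking": { ... } }` with Anthropic active and pasting `{ "anthropic": { "thinking": { ... } } }` end up in the same place.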
Gotchas
- Valid JSON only. Missing commas or trailing commas break the parse and Kodus ignores the override.
- Precedence: the JSON override fully replaces the effort-preset’s namespace block. If you override `anthropic.thinking` but forget `anthropic.outputConfig`, that field won’t be sent. OpenRouter routing (Pin providers / Allow fallbacks) is the one exception: it deep-merges with your override under `openrouter`.
- Unknown provider = no wrap. If your BYOK provider isn’t in the namespace table above, Kodus passes the JSON through as-is. Rare: it only applies if you configure a provider Kodus doesn’t recognize.
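The first two gotchas can be illustrated with a short sketch. The helper names `parseOverride` and `resolveNamespaceBlock` are hypothetical, and the code assumes that an ignored override means the preset block is sent unchanged; the OpenRouter deep-merge exception is not modelled here.

```typescript
type Block = Record<string, unknown>;

// Returns the parsed object, or null when the text is not a valid JSON object
// (for example because of a trailing comma).
function parseOverride(text: string): Block | null {
  try {
    const value: unknown = JSON.parse(text);
    if (typeof value === "object" && value !== null && !Array.isArray(value)) {
      return value as Block;
    }
    return null;
  } catch {
    return null;
  }
}

// Replace semantics: a valid override wins wholesale, so nothing from the
// preset block is carried over. Assumption: an invalid override falls back to the preset.
function resolveNamespaceBlock(preset: Block, override: Block | null): Block {
  return override === null ? preset : override;
}

const presetBlock: Block = { thinking: { type: "adaptive" }, outputConfig: { effort: "medium" } };
const custom = parseOverride('{ "thinking": { "type": "enabled", "budgetTokens": 4096 } }');
// outputConfig is absent from the result because the override replaced the whole block.
console.log(JSON.stringify(resolveNamespaceBlock(presetBlock, custom)));
```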
Pinning OpenRouter providers
OpenRouter is a router: when you request a model (e.g. `moonshotai/kimi-k2.5`), it forwards the call to one of several upstream providers (Moonshot direct, Together, Groq, Fireworks, Novita…). Each call can land on a different backend. That’s convenient, but it introduces silent variance:
- Quality drift: upstreams run different precisions (FP8, INT4, full) and give subtly different outputs for identical prompts
- Tool-calling inconsistency: some backends don’t support function calling the same way, leading to malformed tool use
- Reasoning format variance: one upstream honors `reasoning_effort`, another only `thinking.enabled`, another ignores both
- Latency swings: p50 can jump from 800ms to 4s between calls as routing changes
- Rate-limit surprises: you hit quota on a backend you didn’t explicitly choose
How to pin
When your BYOK provider is OpenRouter, the Advanced settings panel shows an OpenRouter routing section with two fields:
- Pin providers (in order): comma-separated list of upstream names (e.g. `moonshot, together`). OpenRouter tries them in order and uses the first available.
- Allow fallbacks: when off, requests hard-fail if none of the pinned providers are available. When on (default), OpenRouter can fall back to any other upstream that serves the model.
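As a rough illustration of how the two form fields could translate into routing fields, the sketch below parses the comma-separated list into `order` and carries the toggle as `allow_fallbacks`. The field names `order` and `allow_fallbacks` come from this page; the `buildRouting` helper and the `OpenRouterRouting` interface are hypothetical and may not match how Kodus builds the request.

```typescript
interface OpenRouterRouting {
  order: string[];
  allow_fallbacks: boolean;
}

// Hypothetical helper: turns the two Advanced-settings fields into a routing object.
function buildRouting(pinProviders: string, allowFallbacks: boolean = true): OpenRouterRouting {
  const order = pinProviders
    .split(",")
    .map((entry) => entry.trim())          // tolerate spaces around commas
    .filter((entry) => entry.length > 0);  // drop empty entries such as a trailing comma
  return { order, allow_fallbacks: allowFallbacks };
}

console.log(JSON.stringify(buildRouting("moonshot, together", false)));
```

With fallbacks off and both pinned upstreams unavailable, the request would fail rather than silently land on a different backend, which is the point of pinning.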
Advanced: raw JSON override
If you need fields beyond `order` and `allow_fallbacks` (e.g. `ignore`, `data_collection`, `require_parameters`), switch Thinking to Custom in Advanced settings and paste the full routing payload. It is merged into `providerOptions` alongside any reasoning config.
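A payload of this kind might look like the object below, written in TypeScript so the JSON to paste can be generated and checked. Treat it as an assumption-laden example: the top-level `provider` key follows OpenRouter's request schema as commonly documented, the upstream names are placeholders, and the exact shape Kodus expects is not confirmed by this page.

```typescript
// Illustrative override for the Custom textarea (values are placeholders).
const routingOverride = {
  provider: {
    order: ["moonshot", "together"], // try these upstreams in order
    allow_fallbacks: false,          // hard-fail if none is available
    ignore: ["groq"],                // never route to this upstream
    data_collection: "deny",         // skip upstreams that may retain prompts
    require_parameters: true,        // only upstreams that support every request parameter
  },
  reasoning: { effort: "medium" },   // reasoning config can sit alongside routing
};

// The text to paste is the JSON serialization of the object above.
const pasteText = JSON.stringify(routingOverride, null, 2);
console.log(pasteText);
```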
Concurrency and rate limits
The `maxConcurrentRequests` field (under Advanced settings) caps how many in-flight requests Kodus sends to your provider in parallel. Most of the time the default is fine, but subscription plans with strict concurrency caps need it set explicitly.
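To make the idea of a concurrency cap concrete, here is a minimal limiter sketch. It is not Kodus's implementation; the `RequestLimiter` class and its method names are hypothetical. At most `maxConcurrent` tasks run at once, and extra tasks wait until a finishing task hands its slot over.

```typescript
class RequestLimiter {
  private active = 0;
  private waiters: Array<() => void> = [];
  private maxConcurrent: number;

  constructor(maxConcurrent: number) {
    this.maxConcurrent = maxConcurrent;
  }

  inFlightCount(): number {
    return this.active;
  }

  waitingCount(): number {
    return this.waiters.length;
  }

  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.active < this.maxConcurrent) {
      this.active++; // free slot: take it immediately
    } else {
      // No free slot: wait until a finishing task hands its slot over.
      await new Promise<void>((resolve) => this.waiters.push(resolve));
    }
    try {
      return await task();
    } finally {
      const next = this.waiters.shift();
      if (next) {
        next(); // hand the slot to the next waiter; the active count stays the same
      } else {
        this.active--; // nobody waiting: release the slot
      }
    }
  }
}

// With a cap of 1 (as on GLM Coding Plan Lite/Pro), calls are serialized.
const limiter = new RequestLimiter(1);
const fakeRequest = (label: string) => async () => label; // stand-in for a provider call
limiter.run(fakeRequest("file-a")).then((label) => console.log("done", label));
limiter.run(fakeRequest("file-b")).then((label) => console.log("done", label));
```

Handing the slot directly to the next waiter, rather than decrementing and re-checking, keeps a newly arriving call from slipping in ahead and briefly exceeding the cap.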
Defaults Kodus pre-fills
| Provider / plan | Pre-filled value | Why |
|---|---|---|
| GLM Coding Plan (Lite/Pro) | 1 | Subscription allows only one in-flight request. Going higher triggers 429s. |
| GLM Coding Plan (Max) | 1 (bump manually) | Max allows up to 30, but we default to the safe value. Raise in Advanced settings. |
| Kimi Code Plan | 30 | Moonshot’s documented cap on the coding endpoint. |
| GLM Developer API | (empty) | Limits scale per key; no sensible global default. |
| Kimi Developer API | (empty) | Scales with your recharge tier (Tier 1 ≈ 50, Tier 5 ≈ 1000). |
| Anthropic / OpenAI / Google / OpenRouter | (empty) | Providers enforce their own TPM/RPM; Kodus doesn’t cap. |
When to tune it
Raise it
- You have a high-tier recharge on Moonshot/OpenRouter and want higher throughput on big PRs
- You bumped your GLM Coding Plan to Max and want to use the full 30-concurrent budget
- Reviews feel serialized on multi-file PRs and you’re not seeing 429s
Lower it
- You see `429` or `Too much concurrency` errors in review logs
- Your provider warns about rate limits on the dashboard
- You want to conserve Coding Plan window (5h/weekly) across more PRs
maxConcurrentRequests applies. Setting a generous Fallback on a different provider is a good way to absorb bursts when your Main is on a tight subscription.
Best Practices
Security
Dedicated Keys
Regular Rotation
Monitor Usage
Secure Storage
Fallback Strategy
- Use a different provider for Main and Fallback (e.g. Anthropic main, Google fallback). Protects against provider-specific outages.
- Subscriptions with tight concurrency limits (GLM Coding Plan Lite/Pro, Kimi Code Plan) make poor solo configurations — pair them with a pay-per-token Fallback so bursty PRs don’t starve.
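The two recommendations above could be expressed as a simple configuration check. This is only a sketch: the `SlotConfig` shape, the `billing` field, and the `fallbackWarnings` function are hypothetical and do not correspond to a documented Kodus API.

```typescript
interface SlotConfig {
  provider: string; // e.g. "anthropic", "google_gemini"
  billing: "pay_per_token" | "subscription";
}

// Hypothetical check: returns human-readable warnings for a Main/Fallback pair.
function fallbackWarnings(main: SlotConfig, fallback: SlotConfig | null): string[] {
  const warnings: string[] = [];
  if (fallback === null) {
    if (main.billing === "subscription") {
      warnings.push("Subscription Main has no Fallback to absorb bursts.");
    }
    return warnings;
  }
  if (fallback.provider === main.provider) {
    warnings.push("Main and Fallback share a provider; a provider outage affects both.");
  }
  if (main.billing === "subscription" && fallback.billing !== "pay_per_token") {
    warnings.push("Pair a subscription Main with a pay-per-token Fallback.");
  }
  return warnings;
}

console.log(
  fallbackWarnings(
    { provider: "anthropic", billing: "pay_per_token" },
    { provider: "google_gemini", billing: "pay_per_token" },
  ),
);
```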
Troubleshooting
'Invalid API key' when clicking Test
- Copy the key without extra spaces, quotes, or trailing newlines.
- Confirm billing is enabled and the account has credits.
- For GLM Coding Plan / Kimi Code Plan keys, make sure you picked the matching Plan in the card — subscription keys don’t work on the Developer API endpoint and vice versa.
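For the first item, a key copied from a terminal or a `.env` file often carries stray whitespace or quotes. A small cleanup like the one below removes them before pasting; `cleanApiKey` is a hypothetical helper for your own tooling, and the key shown is a placeholder.

```typescript
// Strips surrounding whitespace, newlines, and wrapping quotes from a copied key.
function cleanApiKey(raw: string): string {
  return raw
    .trim()                          // leading/trailing spaces and newlines
    .replace(/^["']+|["']+$/g, "")   // wrapping single or double quotes
    .trim();                         // spaces that were inside the quotes
}

console.log(cleanApiKey('  "sk-example-123"\n'));
```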
'Endpoint not found' when clicking Test
- Verify the base URL matches the provider exactly (trailing slash matters for some).
- For OpenAI-compatible providers, the models endpoint is usually `{baseURL}/models`.
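When checking a base URL by hand, it helps to build the models URL the same way regardless of a trailing slash. The helper below is a hypothetical convenience for that check and says nothing about how Kodus itself joins the URL.

```typescript
// Builds `{baseURL}/models`, tolerating a trailing slash on the base URL.
function modelsEndpoint(baseURL: string): string {
  return baseURL.replace(/\/+$/, "") + "/models";
}

console.log(modelsEndpoint("https://api.z.ai/api/paas/v4/"));
```

Requesting that URL with your key (for example with `curl` and a bearer token) is a quick way to confirm the endpoint exists before saving the configuration.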
Model not found at review time (key test passed)
- The Test button validates the key/endpoint but doesn’t verify the specific model ID. If you typed a model that doesn’t exist (typo), the first real review fails.
- Cross-check the model ID against the provider’s catalog before saving.
'Rate limited' or 'Too much concurrency'
- Lower Max concurrent requests in Advanced settings.
- On GLM Coding Plan Lite/Pro, stay at 1 concurrent. Upgrade to Max (30 concurrent) if you need more throughput.
- On Kimi Code Plan, the documented cap is 30 concurrent.
Self-hosted env vars not showing
- If Kodus is configured via `.env` (self-hosted Fixed Mode), the BYOK screen shows a blue info banner with the active provider/model. The key is never displayed, for security.
- Saving a BYOK config on top of `.env` prompts a confirm dialog before overriding.
High or unexpected costs
- Reasoning adds tokens. If cost is spiking, lower Thinking from Medium to Low, or switch to a cheaper model for Main.
- Check your provider dashboard for the per-model breakdown.
- Set a monthly cap at the provider level.
Frequently Asked Questions
Can I switch providers anytime?
What happens if my API key runs out of credits?
How does the primary/fallback system work?
Should I use the same provider for Main and Fallback?
Do you store our API keys securely?
Can I use a self-hosted LLM (e.g. Ollama, vLLM)?