BYOK (Bring Your Own Key) is the default way Kodus uses LLMs across every plan — Community, Teams, and Enterprise. You connect your own provider account, pick a model, pay only for what you use, and monitor costs directly on your provider’s dashboard. Kodus never marks up tokens and never sees your key in plain text.
Getting Started
The BYOK screen has two paths: pick a recommended model from the curated catalog (the fastest path, covering ~90% of cases) or configure any provider manually (the escape hatch for custom endpoints or uncurated models).

1. Open BYOK Settings
2. Pick a recommended model
3. Paste your API key and test
Recommended Models
These six models are curated for code review. They all appear in the catalog at /organization/byok and come pre-tuned with sensible defaults (temperature, max output tokens, and reasoning effort set to medium).
Claude Sonnet 4.6
- Provider: Anthropic
- Model ID: `claude-sonnet-4-6`
- Key: console.anthropic.com
Claude Opus 4.7
- Provider: Anthropic
- Model ID: `claude-opus-4-7`
- Key: console.anthropic.com
Gemini 3.1 Pro (custom tools)
- Provider: Google Gemini
- Model ID: `gemini-3.1-pro-preview-customtools`
- Key: aistudio.google.com/apikey
GPT-5.4
- Provider: OpenAI
- Model ID: `gpt-5.4`
- Key: platform.openai.com/api-keys
Kimi K2.6 Coding
- Provider: OpenAI-compatible (Moonshot AI)
- Model ID: `kimi-k2.6-coding`
- Keys: platform.moonshot.ai or kimi.com/code
GLM 5.1
- Provider: OpenAI-compatible (Z.ai)
- Model ID: `glm-5.1`
- Keys: z.ai console or z.ai/subscribe
Plan selector (GLM 5.1 and Kimi K2.6)
Z.ai and Moonshot both offer a subscription plan with a different endpoint than their pay-per-token Developer API. The curated card for each of these models shows a Plan selector so you can pick the right endpoint before pasting your key. For GLM 5.1 (Z.ai) the two plans compare as follows; Kimi K2.6 (Moonshot AI) has an equivalent selector on its card.
| Plan | Endpoint | Keys from | Best for |
|---|---|---|---|
| Developer API | https://api.z.ai/api/paas/v4/ | z.ai/manage-apikey | Bursty workloads, pay-per-token |
| Coding Plan | https://api.z.ai/api/coding/paas/v4 | z.ai/subscribe | Predictable team volume, flat monthly fee |
Configure manually
When the model you want isn’t in the curated list (custom endpoint, self-hosted LLM, or a provider we haven’t benchmarked), click Configure manually at the bottom of the catalog. This opens /organization/byok/manual?slot=main — a step-by-step wizard:
1. Pick a provider
2. Enter the base URL (if required)
3. Pick or type the model ID
4. Tune advanced settings (optional)
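As a sketch, the four wizard steps collect roughly this shape of configuration — say, for a self-hosted vLLM server exposed as OpenAI-compatible. The field names and values here are illustrative, not the wizard’s actual form fields:

```json
{
  "provider": "openai_compatible",
  "baseURL": "http://llm.internal:8000/v1",
  "model": "qwen2.5-coder-32b-instruct",
  "advanced": {
    "thinking": "off",
    "maxConcurrentRequests": 4
  }
}
```

The base URL is only required for OpenAI-compatible and similar providers; for Anthropic, OpenAI, and Google the default endpoint is used.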
Supported Providers
- OpenAI
- Google Gemini
- Anthropic Claude
- Novita AI
- OpenRouter
- OpenAI Compatible
For example, to get an OpenAI key:

1. Visit OpenAI API Keys
2. Create a new key for Kodus
3. Add billing information
Reasoning / Extended Thinking
All six recommended models support reasoning. The BYOK form exposes a Thinking toggle (Off / Low / Medium / High / Custom) under Advanced settings, pre-filled to Medium for every recommended model.

Preset levels
When you pick Low / Medium / High, Kodus translates the level to each provider’s native format automatically:

| Provider | How “medium” maps |
|---|---|
| Anthropic (Claude Sonnet 4.6 / Opus 4.7) | `thinking: { type: "adaptive" }` + `outputConfig: { effort: "medium" }` |
| Google (Gemini 3.1 Pro) | `thinkingConfig: { thinkingLevel: "medium" }` |
| OpenAI (GPT-5.4) | `reasoningEffort: "medium"` |
| OpenRouter | `reasoning: { effort: "medium" }` |
| OpenAI-compatible (Kimi K2.6 / GLM 5.1) | `thinking: { type: "enabled" }` — binary on/off, level ignored |
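For instance, with an Anthropic key and the Medium preset, the provider options Kodus sends would look roughly like this (a sketch derived from the mapping table; the exact wire format is handled by the SDK):

```json
{
  "anthropic": {
    "thinking": { "type": "adaptive" },
    "outputConfig": { "effort": "medium" }
  }
}
```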
Custom JSON override
Picking Custom in the Thinking toggle reveals a JSON textarea. Paste the provider options directly — Kodus auto-wraps them under the active provider’s namespace. You don’t need to know the Vercel AI SDK routing rules. Use this when:

- You need a specific `budgetTokens` value for Claude (instead of the preset effort mapping)
- You want to enable/disable thinking on a per-model basis for OpenAI-compatible providers
- You want fields beyond reasoning — caching, service tier, safety settings, `user` tagging, etc. The override is merged into `providerOptions`, so any adapter field passes through
- The provider ships a new field Kodus hasn’t wrapped yet
Examples (paste directly — no namespace needed)
- Anthropic
- Google Gemini
- OpenAI
- OpenRouter
- OpenAI-compatible (Kimi, GLM, etc.)
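The per-provider example tabs aren’t reproduced inline here, but as a hedged sketch, an Anthropic override that sets an explicit thinking budget (the `budgetTokens` case from the list above) would be just:

```json
{
  "thinking": { "type": "enabled", "budgetTokens": 8000 }
}
```

Kodus wraps this under `anthropic` for you. For an OpenAI-compatible provider (Kimi, GLM), the equivalent is the binary `{ "thinking": { "type": "enabled" } }` from the preset mapping table — the budget value is ignored there.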
Going manual with namespaces (power users)
If your JSON already contains a known namespace key at the top level (`anthropic`, `google`, `openai`, `openrouter`, `openaiCompatible`), Kodus leaves it untouched. Useful if you want to mix multiple provider namespaces or be explicit:
| BYOK provider | Namespace key |
|---|---|
| `anthropic` | `anthropic` |
| `google_gemini` / `google_vertex` | `google` |
| `openai` | `openai` |
| `open_router` | `openrouter` |
| `openai_compatible` / `novita` | `openaiCompatible` |
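Mixing namespaces explicitly might look like the following sketch (values illustrative — reasoning options for Anthropic alongside an OpenRouter effort setting):

```json
{
  "anthropic": {
    "thinking": { "type": "enabled", "budgetTokens": 10000 }
  },
  "openrouter": {
    "reasoning": { "effort": "high" }
  }
}
```

Because these keys match the namespace table, Kodus passes them through untouched instead of wrapping them again.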
Gotchas
- Valid JSON only. Missing commas or trailing commas break the parse and Kodus ignores the override.
- Precedence: the JSON override fully replaces the effort preset’s namespace block — if you override `anthropic.thinking` but forget `anthropic.outputConfig`, that field won’t be sent. OpenRouter routing (Pin providers / Allow fallbacks) is the one exception: it deep-merges with your override under `openrouter`.
- Unknown provider = no wrap. If your BYOK provider isn’t in the namespace table above, Kodus passes the JSON through as-is. Rare — only applies if you configure a provider Kodus doesn’t recognize.
Pinning OpenRouter providers
OpenRouter is a router — when you request a model (e.g. `moonshotai/kimi-k2.5`), it forwards the call to one of several upstream providers (Moonshot direct, Together, Groq, Fireworks, Novita…). Each call can land on a different backend. That’s convenient, but it introduces silent variance:
- Quality drift — upstreams run different precisions (FP8, INT4, full) and give subtly different outputs for identical prompts
- Tool-calling inconsistency — some backends don’t support function calling the same way, leading to malformed tool use
- Reasoning format variance — one upstream honors `reasoning_effort`, another only `thinking.enabled`, another ignores both
- Latency swings — p50 can jump from 800 ms to 4 s between calls as routing changes
- Rate-limit surprises — you hit quota on a backend you didn’t explicitly choose
How to pin
When your BYOK provider is OpenRouter, the Advanced settings panel shows an OpenRouter routing section with two fields:

- Pin providers (in order) — comma-separated list of upstream names (e.g. `moonshot, together`). OpenRouter tries them in order and uses the first available.
- Allow fallbacks — when off, requests hard-fail if none of the pinned providers are available. When on (default), OpenRouter can fall back to any other upstream that serves the model.
Advanced: raw JSON override
If you need fields beyond `order` and `allow_fallbacks` (e.g. `ignore`, `data_collection`, `require_parameters`), switch Thinking to Custom in Advanced settings and paste the full routing payload — it’s merged into `providerOptions` alongside any reasoning config:
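A sketch of such a payload, with routing fields nested the way OpenRouter’s provider-routing API expects (values illustrative — check OpenRouter’s docs for the current field set):

```json
{
  "reasoning": { "effort": "medium" },
  "provider": {
    "order": ["moonshot", "together"],
    "allow_fallbacks": false,
    "require_parameters": true
  }
}
```

Here `require_parameters` asks OpenRouter to skip upstreams that don’t support every parameter in the request — useful against the tool-calling inconsistencies described above.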
Concurrency and rate limits
The `maxConcurrentRequests` field (under Advanced settings) caps how many in-flight requests Kodus sends to your provider in parallel. Most of the time the default is fine — but subscription plans with strict concurrency caps need it set explicitly.
Defaults Kodus pre-fills
| Provider / plan | Pre-filled value | Why |
|---|---|---|
| GLM Coding Plan (Lite/Pro) | 1 | Subscription allows only one in-flight request. Going higher triggers 429s. |
| GLM Coding Plan (Max) | 1 (bump manually) | Max allows up to 30, but we default to the safe value. Raise in Advanced settings. |
| Kimi Code Plan | 30 | Moonshot’s documented cap on the coding endpoint. |
| GLM Developer API | (empty) | Limits scale per key; no sensible global default. |
| Kimi Developer API | (empty) | Scales with your recharge tier (Tier 1 ≈ 50, Tier 5 ≈ 1000). |
| Anthropic / OpenAI / Google / OpenRouter | (empty) | Providers enforce their own TPM/RPM; Kodus doesn’t cap. |
When to tune it
Raise it
- You have a high-tier recharge on Moonshot/OpenRouter and want higher throughput on big PRs
- You bumped your GLM Coding Plan to Max and want to use the full 30-concurrent budget
- Reviews feel serialized on multi-file PRs and you’re not seeing 429s
Lower it
- You see `429` or `Too much concurrency` errors in review logs
- Your provider warns about rate limits on the dashboard
- You want to conserve Coding Plan window (5h/weekly) across more PRs
Each slot has its own `maxConcurrentRequests`. Setting a generous Fallback on a different provider is a good way to absorb bursts when your Main is on a tight subscription.

Best Practices
Security
- Dedicated Keys
- Regular Rotation
- Monitor Usage
- Secure Storage
Fallback Strategy
- Use a different provider for Main and Fallback (e.g. Anthropic main, Google fallback). Protects against provider-specific outages.
- Subscriptions with tight concurrency limits (GLM Coding Plan Lite/Pro, Kimi Code Plan) make poor solo configurations — pair them with a pay-per-token Fallback so bursty PRs don’t starve.
Troubleshooting
'Invalid API key' when clicking Test
- Copy the key without extra spaces, quotes, or trailing newlines.
- Confirm billing is enabled and the account has credits.
- For GLM Coding Plan / Kimi Code Plan keys, make sure you picked the matching Plan in the card — subscription keys don’t work on the Developer API endpoint and vice versa.
'Endpoint not found' when clicking Test
- Verify the base URL matches the provider exactly (trailing slash matters for some).
- For OpenAI-compatible providers, the models endpoint is usually `{baseURL}/models`.
Model not found at review time (key test passed)
- The Test button validates the key/endpoint but doesn’t verify the specific model ID. If you typed a model that doesn’t exist (typo), the first real review fails.
- Cross-check the model ID against the provider’s catalog before saving.
'Rate limited' or 'Too much concurrency'
- Lower Max concurrent requests in Advanced settings.
- On GLM Coding Plan Lite/Pro, stay at 1 concurrent. Upgrade to Max (30 concurrent) if you need more throughput.
- On Kimi Code Plan, the documented cap is 30 concurrent.
Self-hosted env vars not showing
- If Kodus is configured via `.env` (self-hosted Fixed Mode), the BYOK screen shows a blue info banner with the active provider/model — the key is never displayed for security.
- Saving a BYOK config on top of `.env` prompts a confirm dialog before overriding.
High or unexpected costs
- Reasoning adds tokens. If cost is spiking, lower Thinking from Medium to Low, or switch to a cheaper model for Main.
- Check your provider dashboard for the per-model breakdown.
- Set a monthly cap at the provider level.
Frequently Asked Questions
Can I switch providers anytime?
What happens if my API key runs out of credits?
How does the primary/fallback system work?
Should I use the same provider for Main and Fallback?
Do you store our API keys securely?
Can I use a self-hosted LLM (e.g. Ollama, vLLM)?