Moonshot (Kimi) - OpenAI-Compatible Inference

How Moonshot works

Moonshot AI publishes the Kimi family of models (K2, K2.5, K2.6, K2.6 Coding). Kimi is particularly strong on long-context code understanding and agentic workflows, and the API is fully OpenAI-compatible — Kodus talks to it via the OpenAI Compatible provider (or directly through the curated Kimi K2.6 Coding card in BYOK). Moonshot offers two paths to the same model family, each with its own endpoint:

Developer API (platform.moonshot.ai) — pay-per-token, billed per usage. Concurrency scales with your recharge tier.
Kimi Code Plan (kimi.com/code) — subscription with a dedicated coding endpoint. Flat pricing, capped concurrency (30 concurrent).

Moonshot’s consumer Kimi.com chat subscriptions (Andante, Moderato, etc.) are separate from both API paths. Chat subscriptions do not grant API access. Kimi Code Plan is the API-specific subscription.

Moonshot also operates a China-only platform (platform.moonshot.cn, base URL https://api.moonshot.cn/v1) billed in CNY. Use that only if you operate inside mainland China.

Plans at a glance

Kimi Code Plan (subscription)

Attribute	Value
Endpoint	`https://api.kimi.com/coding/v1`
Concurrency	Capped at 30 concurrent requests
Billing	Flat-rate subscription
Keys from	kimi.com/code

Developer API (pay-per-token)

Model	Pricing (1M input / output tokens)	Context Window	Notes
Kimi K2.6 Coding `recommended`	~ $0.60 /$ 2.50	~256k tokens	Latest, tuned for code review.
Kimi K2.5	$0.60 /$ 2.50	~256k tokens	Previous generation, still capable.
Kimi K2 (0905)	lower tier	~128k tokens	Stable general-purpose model.

Developer API endpoint: https://api.moonshot.ai/v1 (international). Concurrency scales with recharge tier — Tier 1 ($10 recharge) starts at ~50 concurrent, up to ~1000 concurrent on Tier 5.

Creating an API Key

A Moonshot account is required to create an API key.

Kimi Code Plan subscriber
Developer API (pay-per-token)

Go to kimi.com/code and subscribe to the plan.
Open the key management area for your subscription.
Create a Kimi Code key and copy it.

Kimi Code keys only work against https://api.kimi.com/coding/v1. They will return 401 if sent to api.moonshot.ai.

Sign in at platform.moonshot.ai (or platform.moonshot.cn if you operate inside mainland China).
Add a payment method — Moonshot may grant a small starter balance when you first add billing.
Open the API Keys section at platform.moonshot.ai/console/api-keys.
Click Create API Key, give it a descriptive name (e.g. kodus-prod), and copy the key immediately.

Developer API keys only work against api.moonshot.ai/v1 (international) or api.moonshot.cn/v1 (China). Keys are not portable between regions.

Configure Moonshot in Kodus

The primary flow is BYOK on Kodus Cloud — the curated Kimi K2.6 Coding card handles the endpoint swap for you. Self-hosted users who prefer fixing the provider at the process level can use environment variables instead.

Option 1 — BYOK on Kodus Cloud (recommended)

Open BYOK and pick Kimi K2.6 Coding

Go to app.kodus.io/organization/byok and click the Kimi K2.6 Coding card in the Main model section.

Select your plan

The card expands with a Plan selector. Pick:

Developer API — if your key is from platform.moonshot.ai
Kimi Code Plan — if your key is from a kimi.com/code subscription

The base URL and “Get a key” link update automatically.

Paste your API key

Just the key. For Kimi Code Plan users, Kodus pre-fills maxConcurrentRequests=30 in Advanced settings (matches the documented cap).

Test & save

Click Test & save. Kodus probes the endpoint with a cheap metadata call and persists the config on success. 401 means the key doesn’t match the selected plan’s endpoint.

Tuning reasoning (optional)

Reasoning is ON by default for Kimi K2.6 Coding — the curated card pre-fills Thinking: Medium, which for OpenAI-compatible providers emits thinking: { type: "enabled" }. Two common overrides:

Disable thinking for faster/cheaper reviews on small PRs:
```
{
  "thinking": { "type": "disabled" }
}
```
Force a specific token budget (if Moonshot adds support for budget_tokens on your tier):
```
{
  "thinking": { "type": "enabled", "budget_tokens": 25000 }
}
```

No namespace wrapping needed — Kodus auto-wraps under openaiCompatible (the active provider) before sending. See the main BYOK doc → Custom JSON override for details.

Tuning concurrency

Kimi Code Plan: keep the pre-filled maxConcurrentRequests=30 (the documented cap). Going higher returns 429.
Developer API: start empty (no cap). Your actual limit scales with your recharge tier — Tier 1 (~ $10 recharge) allows ~50 concurrent; Tier 5 (~$ 3000) allows ~1000. Lower it explicitly if you see 429s at review time.

Configure Kimi as Main and keep an OpenAI or Anthropic key as Fallback — if Moonshot returns 429 or 402, Kodus fails over automatically.

Option 2 — Manual configuration

If you need a Kimi variant not in the curated catalog (e.g. kimi-k2.5 or kimi-k2-0905), click Configure manually at the bottom of the catalog and fill:

Field	Value
Provider	`OpenAI Compatible`
Base URL	`https://api.moonshot.ai/v1` (Developer API) `https://api.kimi.com/coding/v1` (Kimi Code Plan) `https://api.moonshot.cn/v1` (mainland China only)
Model	`kimi-k2.6`, `kimi-k2.6`, `kimi-k2.5`, `kimi-k2-0905`, `kimi-k2`
API Key	your Moonshot or Kimi Code key (matching the base URL above)
Max Concurrent Requests	`30` on Kimi Code Plan; leave empty on Developer API (scales with recharge tier)

Option 3 — Self-hosted (environment variables)

If you run Kodus in Fixed Mode (single global provider, no per-org BYOK), configure Moonshot in the .env of your API + worker containers:

# Moonshot (Kimi) configuration (Fixed Mode)
API_LLM_PROVIDER_MODEL="kimi-k2.6"
API_OPENAI_FORCE_BASE_URL="https://api.moonshot.ai/v1"    # or https://api.kimi.com/coding/v1 for Kimi Code Plan
API_OPEN_AI_API_KEY="your-moonshot-or-kimi-code-api-key"

This path is only needed for self-hosted Kodus installs that deliberately disable BYOK. If BYOK is enabled on your self-hosted instance, prefer Option 1 — the curated card handles the endpoint logic for you.

Restart the API and worker containers after editing .env, then verify the integration:

docker-compose logs api worker | grep -iE "moonshot|kimi"

For the full self-hosted setup (domains, security keys, database, webhooks, reverse proxy), follow the generic VM deployment guide and only swap the LLM block for the one above.

Choosing between Kimi Code Plan, Developer API, and aggregators

Kimi Code Plan — predictable flat-rate cost, 30-concurrent cap, dedicated api.kimi.com/coding/v1 endpoint optimized for coding workflows. Best for steady-state teams with predictable PR volume.
Moonshot Developer API — pay-per-token, concurrency scales with recharge tier, largest flexibility. Best for bursty workloads.
OpenRouter proxy — if you want one billing relationship across many providers, OpenRouter exposes Kimi models with a small routing markup. Pick this when Kimi is part of a mixed-provider fleet, not a primary workload.

Troubleshooting

401 after Test — key doesn't match endpoint

Kimi Code Plan keys only work against api.kimi.com/coding/v1.
Developer API keys from platform.moonshot.ai only work against api.moonshot.ai/v1.
Developer API keys from platform.moonshot.cn only work against api.moonshot.cn/v1.
In the curated card, confirm the Plan selector matches your key origin.

Insufficient balance

Developer API bills pay-per-token. If balance runs out, requests return HTTP 402.
Add funds in the billing section of the console or set a monthly cap to avoid surprises.
Kimi Code Plan has flat pricing but is bound by its 30-concurrent cap and quota windows — 429 means you’ve hit one.

Model not found

Confirm the model name matches the catalog (kimi-k2.6, kimi-k2.6, kimi-k2.5, kimi-k2-0905, kimi-k2).
Check platform.kimi.ai/docs for the current list — new versions ship regularly.

Slow first response

First call after idle periods may cold-start on Moonshot’s side.
If latency matters, kimi-k2-0905 is generally faster than the K2.6 variants for routine reviews.

Region / connectivity

Users outside China should always use api.moonshot.ai or api.kimi.com. api.moonshot.cn may be unreachable or rate-limited from outside mainland China.
Confirm outbound HTTPS to the chosen endpoint is allowed from your Kodus deployment.

Documentation Index

​How Moonshot works

​Plans at a glance

​Kimi Code Plan (subscription)

​Developer API (pay-per-token)

​Creating an API Key

​Configure Moonshot in Kodus

​Option 1 — BYOK on Kodus Cloud (recommended)

​Tuning reasoning (optional)

​Tuning concurrency

​Option 2 — Manual configuration

​Option 3 — Self-hosted (environment variables)

​Choosing between Kimi Code Plan, Developer API, and aggregators

​Troubleshooting

​Related

How Moonshot works

Plans at a glance

Kimi Code Plan (subscription)

Developer API (pay-per-token)

Creating an API Key

Configure Moonshot in Kodus

Option 1 — BYOK on Kodus Cloud (recommended)

Tuning reasoning (optional)

Tuning concurrency

Option 2 — Manual configuration

Option 3 — Self-hosted (environment variables)

Choosing between Kimi Code Plan, Developer API, and aggregators

Troubleshooting

Related