Documentation Index
Fetch the complete documentation index at: https://docs.kodus.io/llms.txt
Use this file to discover all available pages before exploring further.
How Z.AI works
Z.AI (developed by Zhipu AI) serves the GLM family of models. It’s one of the few major providers offering a flat-rate subscription for API access: the GLM Coding Plan bundles model usage at a fixed monthly price, with rate limits applied over 5-hour and weekly windows instead of per-token billing. For higher-volume or variable workloads, Z.AI also offers pay-per-token access to the same models on its standard Developer API. Both paths expose an OpenAI-compatible endpoint, so Kodus talks to them via theOpenAI Compatible provider (or directly through the curated GLM 5.1 card in BYOK).
Plans at a glance
Pricing and quotas change regularly. Always confirm current numbers at z.ai/subscribe and docs.z.ai before choosing a tier.
GLM Coding Plan (subscription)
| Tier | Price (monthly equivalent) | Approximate API-value equivalent | Concurrency |
|---|---|---|---|
| Lite | ~$18/mo (billed quarterly) | ~15× the monthly fee | ~1 concurrent |
| Pro | ~$30/mo (billed quarterly) | ~20× the monthly fee | ~1 concurrent |
| Max | ~$80/mo (billed quarterly) | ~30× the monthly fee | up to 30 concurrent |
- Quotas reset on a rolling 5-hour window and a weekly window — plan around the ceiling, not a monthly cap.
- Coverage includes GLM-5.1, GLM-5-Turbo, GLM-5, GLM-4.5, and GLM-4.5-Air.
- Dedicated endpoint:
https://api.z.ai/api/coding/paas/v4— Coding Plan keys only work here.
Pay-per-token Developer API
| Model | Pricing (1M input / output tokens) | Context Window |
|---|---|---|
GLM-5.1 recommended | 3.15 | ~200k tokens |
| GLM-5 | 2.30 | ~131k tokens |
| GLM-4.5 | 2.20 | ~128k tokens |
| GLM-4.5-Air | lower tier, optimized for routing | ~128k tokens |
https://api.z.ai/api/paas/v4/ (OpenAI-compatible).
Creating an API Key
- Coding Plan subscriber
- Developer API (pay-per-token)
- Sign in at z.ai.
- Purchase a GLM Coding Plan tier at z.ai/subscribe.
- Open the key management page for your subscription and create a Coding Plan key.
- Copy the key — you will not be able to see it again.
Coding Plan keys are tied to the
/api/coding/paas/v4 endpoint. They will return 401 if sent against the standard /api/paas/v4/ endpoint.Configure Z.AI in Kodus
The primary flow is BYOK on Kodus Cloud — the curated GLM 5.1 card handles the endpoint swap for you. Self-hosted users who prefer fixing the provider at the process level can use environment variables instead.Option 1 — BYOK on Kodus Cloud (recommended)
Open BYOK and pick GLM 5.1
Go to app.kodus.io/organization/byok and click the GLM 5.1 card in the Main model section.
Select your plan
The card expands with a Plan selector. Pick:
- Developer API — if your key is from z.ai/manage-apikey
- Coding Plan — if your key is from a GLM Coding Plan subscription
Paste your API key
Just the key — nothing else to configure. For Coding Plan users, Kodus pre-fills
maxConcurrentRequests=1 in Advanced settings, which matches Lite/Pro tier limits. Bump this up to 30 if you’re on Max.Tuning reasoning (optional)
The curated GLM 5.1 card pre-fills Thinking: Medium, which for OpenAI-compatible providers emitsthinking: { type: "enabled" }. That’s fine for most workloads. Two cases to override:
-
Force a specific token budget — switch Thinking to Custom under Advanced settings and paste:
-
Disable thinking — for fastest/cheapest reviews on small PRs:
No namespace wrapping needed — Kodus auto-wraps under
openaiCompatible (the active provider) before sending. See the main BYOK doc → Custom JSON override for details.Tuning concurrency
- Coding Plan Lite / Pro: keep the pre-filled
maxConcurrentRequests=1. Going higher returns429 Too much concurrency. - Coding Plan Max: raise to
5first, up to30if you don’t see 429s. Max tier allows up to 30 concurrent. - Developer API: start empty (no cap). Drop to
5if you see rate-limit errors, then tune up.
Configure GLM 5.1 as your Main model and keep an OpenAI or Anthropic key as Fallback so that reviews keep running when your Coding Plan 5-hour window is exhausted. Kodus fails over automatically.
Option 2 — Manual configuration
If you need a GLM variant not in the curated catalog (e.g. GLM-5 or GLM-4.5), click Configure manually at the bottom of the catalog and fill:| Field | Value |
|---|---|
| Provider | OpenAI Compatible |
| Base URL | https://api.z.ai/api/coding/paas/v4 (Coding Plan)https://api.z.ai/api/paas/v4/ (Developer API) |
| Model | glm-5.1, glm-5, glm-5-turbo, glm-4.5, glm-4.5-air |
| API Key | your Z.AI key (matching the base URL above) |
| Max Concurrent Requests | 1 on Lite/Pro Coding Plan; up to 30 on Max; leave empty on Developer API |
Option 3 — Self-hosted (environment variables)
If you run Kodus in Fixed Mode (single global provider, no per-org BYOK), configure Z.AI in the.env of your API + worker containers:
This path is only needed for self-hosted Kodus installs that deliberately disable BYOK. If BYOK is enabled on your self-hosted instance, prefer Option 1 — the curated card handles the endpoint logic for you.
.env, then verify the integration:
Choosing between the Coding Plan and pay-per-token
- Pick the Coding Plan when you have a predictable team of reviewers and want a flat monthly cost. The 5-hour and weekly quotas translate to roughly 15–30× the subscription fee in equivalent API spend.
- Pick pay-per-token when your traffic is bursty, when you need occasional access to the largest context windows, or when you want cost to scale linearly with PR volume.
- Pair them: use the Coding Plan as Main and a Developer API key (or an entirely different provider) as Fallback to cover bursts that exhaust your subscription window.
Troubleshooting
401 after Test — key doesn't match endpoint
401 after Test — key doesn't match endpoint
- Coding Plan keys only work on
/api/coding/paas/v4. Developer API keys only work on/api/paas/v4/. - In the curated card, confirm the Plan selector matches the key type.
- In manual mode, confirm the Base URL matches the key origin.
'Too much concurrency' at review time
'Too much concurrency' at review time
- Lite and Pro Coding Plan tiers typically allow only 1 concurrent request. Kodus pre-fills this for you; raise it only on Max.
- Lower Max concurrent requests in Advanced settings if you’re still hitting 429s.
Quota exhausted on the Coding Plan
Quota exhausted on the Coding Plan
- Quotas are enforced on a 5-hour rolling window and a weekly window. Hitting one of them returns HTTP 429.
- Check remaining quota in the Z.AI console.
- Options: wait for the next window, upgrade to a higher tier, or have a Developer API key configured as Fallback to cover the gap.
Model not found
Model not found
- Verify the model ID matches Z.AI’s catalog (
glm-5.1,glm-5-turbo,glm-5,glm-4.5,glm-4.5-air). - The Coding Plan currently covers only the GLM family — non-GLM model names will be rejected.
Connection errors (timeout, DNS)
Connection errors (timeout, DNS)
- Confirm your server can reach
api.z.ai. - Check the API and worker logs for the exact upstream error.
- If you are in a region with restricted outbound traffic, route requests through a reverse proxy your infra allows.