How Z.AI works
Z.AI (developed by Zhipu AI) serves the GLM family of models. It’s one of the few major providers offering a flat-rate subscription for API access: the GLM Coding Plan bundles model usage at a fixed monthly price, with rate limits applied over 5-hour and weekly windows instead of per-token billing. For higher-volume or variable workloads, Z.AI also offers pay-per-token access to the same models on its standard Developer API. Both paths expose an OpenAI-compatible endpoint, so Kodus talks to them via theOpenAI Compatible provider (or directly through the curated GLM 5.1 card in BYOK).
Plans at a glance
Pricing and quotas change regularly. Always confirm current numbers at z.ai/subscribe and docs.z.ai before choosing a tier.
GLM Coding Plan (subscription)
| Tier | Price (monthly equivalent) | Approximate API-value equivalent | Concurrency |
|---|---|---|---|
| Lite | ~$18/mo (billed quarterly) | ~15× the monthly fee | ~1 concurrent |
| Pro | ~$30/mo (billed quarterly) | ~20× the monthly fee | ~1 concurrent |
| Max | ~$80/mo (billed quarterly) | ~30× the monthly fee | up to 30 concurrent |
- Quotas reset on a rolling 5-hour window and a weekly window — plan around the ceiling, not a monthly cap.
- Coverage includes GLM-5.1, GLM-5-Turbo, GLM-5, GLM-4.5, and GLM-4.5-Air.
- Dedicated endpoint:
https://api.z.ai/api/coding/paas/v4— Coding Plan keys only work here.
Pay-per-token Developer API
| Model | Pricing (1M input / output tokens) | Context Window |
|---|---|---|
GLM-5.1 recommended | 3.15 | ~200k tokens |
| GLM-5 | 2.30 | ~131k tokens |
| GLM-4.5 | 2.20 | ~128k tokens |
| GLM-4.5-Air | lower tier, optimized for routing | ~128k tokens |
https://api.z.ai/api/paas/v4/ (OpenAI-compatible).
Creating an API Key
- Coding Plan subscriber
- Developer API (pay-per-token)
- Sign in at z.ai.
- Purchase a GLM Coding Plan tier at z.ai/subscribe.
- Open the key management page for your subscription and create a Coding Plan key.
- Copy the key — you will not be able to see it again.
Coding Plan keys are tied to the
/api/coding/paas/v4 endpoint. They will return 401 if sent against the standard /api/paas/v4/ endpoint.Configure Z.AI in Kodus
The primary flow is BYOK on Kodus Cloud — the curated GLM 5.1 card handles the endpoint swap for you. Self-hosted users who prefer fixing the provider at the process level can use environment variables instead.Option 1 — BYOK on Kodus Cloud (recommended)
Open BYOK and pick GLM 5.1
Go to app.kodus.io/organization/byok and click the GLM 5.1 card in the Main model section.
Select your plan
The card expands with a Plan selector. Pick:
- Developer API — if your key is from z.ai/manage-apikey
- Coding Plan — if your key is from a GLM Coding Plan subscription
Paste your API key
Just the key — nothing else to configure. For Coding Plan users, Kodus pre-fills
maxConcurrentRequests=1 in Advanced settings, which matches Lite/Pro tier limits. Bump this up to 30 if you’re on Max.Tuning reasoning (optional)
The curated GLM 5.1 card pre-fills Thinking: Medium, which for OpenAI-compatible providers emitsthinking: { type: "enabled" }. That’s fine for most workloads. Two cases to override:
-
Force a specific token budget — switch Thinking to Custom under Advanced settings and paste:
-
Disable thinking — for fastest/cheapest reviews on small PRs:
No namespace wrapping needed — Kodus auto-wraps under
openaiCompatible (the active provider) before sending. See the main BYOK doc → Custom JSON override for details.Tuning concurrency
- Coding Plan Lite / Pro: keep the pre-filled
maxConcurrentRequests=1. Going higher returns429 Too much concurrency. - Coding Plan Max: raise to
5first, up to30if you don’t see 429s. Max tier allows up to 30 concurrent. - Developer API: start empty (no cap). Drop to
5if you see rate-limit errors, then tune up.
Configure GLM 5.1 as your Main model and keep an OpenAI or Anthropic key as Fallback so that reviews keep running when your Coding Plan 5-hour window is exhausted. Kodus fails over automatically.
Option 2 — Manual configuration
If you need a GLM variant not in the curated catalog (e.g. GLM-5 or GLM-4.5), click Configure manually at the bottom of the catalog and fill:| Field | Value |
|---|---|
| Provider | OpenAI Compatible |
| Base URL | https://api.z.ai/api/coding/paas/v4 (Coding Plan)https://api.z.ai/api/paas/v4/ (Developer API) |
| Model | glm-5.1, glm-5, glm-5-turbo, glm-4.5, glm-4.5-air |
| API Key | your Z.AI key (matching the base URL above) |
| Max Concurrent Requests | 1 on Lite/Pro Coding Plan; up to 30 on Max; leave empty on Developer API |
Option 3 — Self-hosted (environment variables)
If you run Kodus in Fixed Mode (single global provider, no per-org BYOK), configure Z.AI in the.env of your API + worker containers:
This path is only needed for self-hosted Kodus installs that deliberately disable BYOK. If BYOK is enabled on your self-hosted instance, prefer Option 1 — the curated card handles the endpoint logic for you.
.env, then verify the integration:
Choosing between the Coding Plan and pay-per-token
- Pick the Coding Plan when you have a predictable team of reviewers and want a flat monthly cost. The 5-hour and weekly quotas translate to roughly 15–30× the subscription fee in equivalent API spend.
- Pick pay-per-token when your traffic is bursty, when you need occasional access to the largest context windows, or when you want cost to scale linearly with PR volume.
- Pair them: use the Coding Plan as Main and a Developer API key (or an entirely different provider) as Fallback to cover bursts that exhaust your subscription window.
Troubleshooting
401 after Test — key doesn't match endpoint
401 after Test — key doesn't match endpoint
- Coding Plan keys only work on
/api/coding/paas/v4. Developer API keys only work on/api/paas/v4/. - In the curated card, confirm the Plan selector matches the key type.
- In manual mode, confirm the Base URL matches the key origin.
'Too much concurrency' at review time
'Too much concurrency' at review time
- Lite and Pro Coding Plan tiers typically allow only 1 concurrent request. Kodus pre-fills this for you; raise it only on Max.
- Lower Max concurrent requests in Advanced settings if you’re still hitting 429s.
Quota exhausted on the Coding Plan
Quota exhausted on the Coding Plan
- Quotas are enforced on a 5-hour rolling window and a weekly window. Hitting one of them returns HTTP 429.
- Check remaining quota in the Z.AI console.
- Options: wait for the next window, upgrade to a higher tier, or have a Developer API key configured as Fallback to cover the gap.
Model not found
Model not found
- Verify the model ID matches Z.AI’s catalog (
glm-5.1,glm-5-turbo,glm-5,glm-4.5,glm-4.5-air). - The Coding Plan currently covers only the GLM family — non-GLM model names will be rejected.
Connection errors (timeout, DNS)
Connection errors (timeout, DNS)
- Confirm your server can reach
api.z.ai. - Check the API and worker logs for the exact upstream error.
- If you are in a region with restricted outbound traffic, route requests through a reverse proxy your infra allows.