How Z.AI works
Z.AI (developed by Zhipu AI) serves the GLM family of models. It's one of the few major providers offering a flat-rate subscription for API access: the GLM Coding Plan bundles model usage at a fixed monthly price, with rate limits applied over 5-hour and weekly windows instead of per-token billing. For higher-volume or variable workloads, Z.AI also offers pay-per-token access to the same models on its standard API. Both paths expose OpenAI-compatible and Anthropic-compatible endpoints, so Kodus can talk to them without any adapter changes.
Plans at a glance
Pricing and quotas change regularly. Always confirm current numbers at z.ai/subscribe and docs.z.ai before choosing a tier.
GLM Coding Plan (subscription)
| Tier | Price (monthly equivalent) | Approximate API-value equivalent |
|---|---|---|
| Lite | ~$18/mo (billed quarterly) | ~15× the monthly fee |
| Pro | ~$30/mo (billed quarterly) | ~20× the monthly fee |
| Max | ~$80/mo (billed quarterly) | ~30× the monthly fee |
- Quotas reset on a rolling 5-hour window and a weekly window — this is the ceiling to plan around, not a monthly cap.
- Coverage includes GLM-5.1, GLM-5-Turbo, GLM-5, GLM-4.5, and GLM-4.5-Air.
- Dedicated endpoints: https://api.z.ai/api/coding/paas/v4 (OpenAI-compatible) or https://api.z.ai/api/anthropic (Anthropic-compatible).
Pay-per-token API
| Model | Price (per 1M tokens) | Context Window |
|---|---|---|
| GLM-5.1 (recommended) | 3.15 | ~200k tokens |
| GLM-5 | 2.30 | ~131k tokens |
| GLM-4.5 | 2.20 | ~128k tokens |
| GLM-4.5-Air | lower tier, optimized for routing | ~128k tokens |
- Endpoint: https://api.z.ai/api/paas/v4 (OpenAI-compatible).
Creating an API Key
- Go to z.ai and create an account (or sign in).
- If you want the subscription, purchase a GLM Coding Plan tier at z.ai/subscribe. Without this, your key bills pay-per-token.
- Open the API Keys section in the console.
- Click Create API Key, give it a descriptive name (e.g. kodus-prod), and copy the key — you will not be able to see it again.
The same API key works against both the Coding Plan endpoint and the pay-per-token endpoint. Z.AI bills against whichever endpoint URL you configure in Kodus.
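Before wiring the key into Kodus, you can smoke-test it directly. This sketch assumes the standard OpenAI-compatible `POST /chat/completions` route under the base URL; export your real key as `ZAI_API_KEY` first (the variable name is just a local placeholder).

```shell
# Smoke-test a new Z.AI key against the Coding Plan endpoint.
# Swap in https://api.z.ai/api/paas/v4 for pay-per-token accounts.
BASE_URL="https://api.z.ai/api/coding/paas/v4"

if [ -z "${ZAI_API_KEY:-}" ]; then
  # No key exported: skip the live call instead of sending an empty header.
  echo "ZAI_API_KEY not set; skipping live request"
else
  curl -sS "$BASE_URL/chat/completions" \
    -H "Authorization: Bearer $ZAI_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"model": "glm-4.5", "messages": [{"role": "user", "content": "Reply with OK."}]}'
fi
```

A 200 response with a completion confirms the key and endpoint; a 401 means the key is wrong or inactive.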
Configure Z.AI in Kodus
The primary flow is BYOK on Kodus Cloud: you paste the Z.AI key into the web UI and you're done. Self-hosted users who prefer fixing the provider at the process level can use environment variables instead.
Option 1 — BYOK on Kodus Cloud (recommended)
- In the Kodus web UI, open Settings → BYOK and click Edit on the Main model (or Fallback, if you want Z.AI as a backup only).
- Toggle the form into Custom mode so you can enter a base URL and a free-text model name.
- Fill in the fields:

| Field | Value |
|---|---|
| Provider | OpenAI Compatible |
| API Key | your Z.AI API key |
| Base URL | https://api.z.ai/api/coding/paas/v4 for GLM Coding Plan subscribers, or https://api.z.ai/api/paas/v4 for pay-per-token accounts |
| Model | glm-5.1 (recommended), or glm-5, glm-5-turbo, glm-4.5, glm-4.5-air |
| Max Concurrent Requests | start at 3–5 on the Coding Plan, higher on pay-per-token |
| Max Output Tokens | leave the default unless you hit truncation |

- Save. Kodus validates the key against the endpoint and surfaces any 401 / 404 immediately.
- Open any PR to trigger a review and confirm Z.AI is now serving responses — the BYOK status badge in Settings turns green on the first successful call.
You can configure Z.AI as your Main model and keep an OpenAI or Anthropic key as Fallback so that reviews keep running when your Coding Plan window is exhausted. Kodus fails over automatically.
Option 2 — Self-hosted (environment variables)
If you run Kodus in Fixed Mode (single global provider, no per-org BYOK), configure Z.AI in the .env of your API and worker containers:
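A minimal .env sketch. API_OPENAI_FORCE_BASE_URL is the variable this guide uses for the endpoint; the key and model variable names below are assumptions — confirm the exact names against your install's .env.example.

```shell
# Endpoint: Coding Plan URL shown; use https://api.z.ai/api/paas/v4
# for pay-per-token accounts.
API_OPENAI_FORCE_BASE_URL=https://api.z.ai/api/coding/paas/v4
# The two variable names below are placeholders — check your .env.example.
API_OPENAI_API_KEY=your-zai-api-key
API_OPENAI_MODEL=glm-5.1
```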
This path is only needed for self-hosted Kodus installs that deliberately disable BYOK. If BYOK is enabled on your self-hosted instance, prefer Option 1 — the UI-based flow is the same as on Cloud.
Restart the API and worker containers so they pick up the new .env, then verify the integration by opening a test PR and confirming the review is served by Z.AI.
Choosing between the Coding Plan and pay-per-token
- Pick the Coding Plan when you have a predictable team of reviewers and want a flat monthly cost. The 5-hour and weekly quotas translate to roughly 15–30× the subscription fee in equivalent API spend.
- Pick pay-per-token when your traffic is bursty, when you need occasional access to the largest context windows, or when you want cost to scale linearly with PR volume.
- You can switch endpoints at any time by changing API_OPENAI_FORCE_BASE_URL (self-hosted) or the BYOK base URL (cloud) — the API key is the same.
Troubleshooting
Quota exhausted on the Coding Plan
- Quotas are enforced on a 5-hour rolling window and a weekly window. Hitting one of them returns HTTP 429.
- Check remaining quota in the Z.AI console.
- Either wait for the next window to reset, upgrade to a higher tier, or temporarily switch the base URL to https://api.z.ai/api/paas/v4 to use pay-per-token credits for the spike.
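For self-hosted installs this swap is a one-line edit to .env. A sketch using sed; demo.env stands in for your real .env path:

```shell
# Point a self-hosted install at the pay-per-token endpoint during a
# quota spike; reverse the substitution once the window resets.
ENV_FILE="demo.env"   # stand-in for your real .env
echo 'API_OPENAI_FORCE_BASE_URL=https://api.z.ai/api/coding/paas/v4' > "$ENV_FILE"

# Swap the Coding Plan path for the standard pay-per-token path.
sed -i 's|/api/coding/paas/v4|/api/paas/v4|' "$ENV_FILE"
cat "$ENV_FILE"
```

Restart the API and worker containers afterwards so they pick up the change.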
401 / authentication errors
- Confirm the key is active in the Z.AI console.
- Make sure there are no trailing spaces or quotes in the .env value.
- Keys are global across Z.AI endpoints — the same key works for both Coding Plan and pay-per-token.
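Trailing whitespace and stray quotes are invisible in most editors, but grep surfaces them quickly. check.env below stands in for your real .env:

```shell
# Surface .env lines with trailing whitespace or quoted values, both of
# which can leak into the Authorization header and cause 401s.
# check.env is a stand-in for your real .env.
printf 'GOOD_KEY=abc123\nBAD_KEY=abc123 \nQUOTED_KEY="abc123"\n' > check.env

grep -nE '[[:space:]]+$' check.env   # lines ending in whitespace
grep -nE '="' check.env              # values wrapped in quotes
```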
Model not found
- Verify the model name matches one listed in the Z.AI model catalog (e.g. glm-5.1, glm-5-turbo, glm-4.5).
- The Coding Plan currently covers only the GLM family — non-GLM model names will be rejected.
Connection errors
- Confirm your server can reach api.z.ai.
- Check the API and worker logs for the exact upstream error.
- If you are in a region with restricted outbound traffic, route requests through a reverse proxy your infra allows.
Rate limiting on pay-per-token
- The standard API enforces per-account rate limits separate from the Coding Plan quotas.
- Lower concurrency by capping maxConcurrentRequests in the BYOK config, or spread large code reviews across more time.
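One way to spread requests over time is exponential backoff after an HTTP 429. A minimal schedule sketch; the 1-second starting delay and 5-attempt cap are illustrative defaults, not values Z.AI publishes:

```shell
# Print an exponential backoff schedule for retrying after HTTP 429.
# Each attempt doubles the wait: 1s, 2s, 4s, 8s, 16s.
delay=1
for attempt in 1 2 3 4 5; do
  echo "attempt $attempt: wait ${delay}s before retrying"
  delay=$((delay * 2))
done
```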