

How Z.AI works

Z.AI (developed by Zhipu AI) serves the GLM family of models. It’s one of the few major providers offering a flat-rate subscription for API access: the GLM Coding Plan bundles model usage at a fixed monthly price, with rate limits applied over 5-hour and weekly windows instead of per-token billing. For higher-volume or variable workloads, Z.AI also offers pay-per-token access to the same models on its standard Developer API. Both paths expose an OpenAI-compatible endpoint, so Kodus talks to them via the OpenAI Compatible provider (or directly through the curated GLM 5.1 card in BYOK).
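In practice, "OpenAI-compatible" means any client that can build a standard /chat/completions request can talk to Z.AI by swapping the base URL. A minimal sketch of the request shape (endpoint paths and model IDs are taken from this page; the prompt and key are placeholders):

```python
# Sketch: building a standard OpenAI-style chat request against a Z.AI
# endpoint. Nothing Z.AI-specific beyond the base URL and model ID.

CODING_PLAN_BASE = "https://api.z.ai/api/coding/paas/v4"  # subscription keys
DEVELOPER_BASE = "https://api.z.ai/api/paas/v4"           # pay-per-token keys

def chat_request(base_url: str, api_key: str, prompt: str, model: str = "glm-5.1"):
    """Build the URL, headers, and JSON body for an OpenAI-style chat call."""
    return {
        "url": f"{base_url}/chat/completions",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "json": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
    }

req = chat_request(CODING_PLAN_BASE, "your-z-ai-api-key", "Review this diff")
print(req["url"])  # https://api.z.ai/api/coding/paas/v4/chat/completions
```

Kodus builds equivalent requests for you; the sketch only shows why one provider integration covers both billing paths.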

Plans at a glance

Pricing and quotas change regularly. Always confirm current numbers at z.ai/subscribe and docs.z.ai before choosing a tier.

GLM Coding Plan (subscription)

| Tier | Price (monthly equivalent) | Approximate API-value equivalent | Concurrency |
| --- | --- | --- | --- |
| Lite | ~$18/mo (billed quarterly) | ~15× the monthly fee | ~1 concurrent request |
| Pro | ~$30/mo (billed quarterly) | ~20× the monthly fee | ~1 concurrent request |
| Max | ~$80/mo (billed quarterly) | ~30× the monthly fee | up to 30 concurrent requests |
  • Quotas reset on a rolling 5-hour window and a weekly window — plan around the ceiling, not a monthly cap.
  • Coverage includes GLM-5.1, GLM-5-Turbo, GLM-5, GLM-4.5, and GLM-4.5-Air.
  • Dedicated endpoint: https://api.z.ai/api/coding/paas/v4 — Coding Plan keys only work here.

Pay-per-token Developer API

| Model | Price per 1M tokens (input / output) | Context window |
| --- | --- | --- |
| GLM-5.1 (recommended) | $0.95 / $3.15 | ~200k tokens |
| GLM-5 | $0.72 / $2.30 | ~131k tokens |
| GLM-4.5 | $0.60 / $2.20 | ~128k tokens |
| GLM-4.5-Air | lower tier, optimized for routing | ~128k tokens |
Standard endpoint: https://api.z.ai/api/paas/v4/ (OpenAI-compatible).
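Pay-per-token cost scales linearly with usage, so a back-of-the-envelope estimate is simple arithmetic. A sketch using the per-1M-token prices from the table above (confirm current numbers at docs.z.ai before budgeting; the "typical PR review" token counts are illustrative):

```python
# Rough per-review cost estimate on the Developer API. Prices are the
# (input, output) USD rates per 1M tokens from the table above.

PRICES_PER_M = {
    "glm-5.1": (0.95, 3.15),
    "glm-5": (0.72, 2.30),
    "glm-4.5": (0.60, 2.20),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Linear pay-per-token cost in USD."""
    in_price, out_price = PRICES_PER_M[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Hypothetical review: ~30k tokens of diff + context in, ~2k tokens out.
cost = estimate_cost("glm-5.1", 30_000, 2_000)
print(f"${cost:.4f} per review")  # $0.0348 per review
```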

Creating an API Key

A Z.AI account is required to create an API key.
  1. Sign in at z.ai.
  2. Purchase a GLM Coding Plan tier at z.ai/subscribe.
  3. Open the key management page for your subscription and create a Coding Plan key.
  4. Copy the key — you will not be able to see it again.
Coding Plan keys are tied to the /api/coding/paas/v4 endpoint. They will return 401 if sent against the standard /api/paas/v4/ endpoint.
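The pairing rule can be captured as a pre-flight check. The key "kind" below is something you track yourself when you create the key (it is not encoded in the key string), so this is an illustrative helper, not a Z.AI or Kodus API:

```python
# Sketch: map the kind of key you purchased to the only base URL it accepts.
# Sending a key to the wrong endpoint returns 401.

CODING_PLAN_BASE = "https://api.z.ai/api/coding/paas/v4"
DEVELOPER_BASE = "https://api.z.ai/api/paas/v4/"

def expected_base_url(key_kind: str) -> str:
    """Return the base URL that will accept a key of this kind."""
    if key_kind == "coding-plan":
        return CODING_PLAN_BASE   # subscription keys: 401 anywhere else
    if key_kind == "developer":
        return DEVELOPER_BASE     # pay-per-token keys: 401 on the coding endpoint
    raise ValueError(f"unknown key kind: {key_kind}")

print(expected_base_url("coding-plan"))  # https://api.z.ai/api/coding/paas/v4
```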

Configure Z.AI in Kodus

The primary flow is BYOK on Kodus Cloud — the curated GLM 5.1 card handles the endpoint swap for you. Self-hosted users who prefer fixing the provider at the process level can use environment variables instead.
Option 1 — Curated GLM 5.1 card (BYOK)

Step 1 — Open BYOK and pick GLM 5.1

Go to app.kodus.io/organization/byok and click the GLM 5.1 card in the Main model section.
Step 2 — Select your plan

The card expands with a Plan selector; pick the plan that matches your key. The base URL and "Get a key" link update automatically to match your selection.
Step 3 — Paste your API key

Just the key — nothing else to configure. For Coding Plan users, Kodus pre-fills maxConcurrentRequests=1 in Advanced settings, which matches Lite/Pro tier limits. Bump this up to 30 if you’re on Max.
Step 4 — Test & save

Click Test & save. Kodus probes the endpoint with a cheap metadata call and persists the config on success. 401 means the key doesn’t match the selected plan’s endpoint; 404 means the base URL is wrong.
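The failure modes above map cleanly to status codes. A hypothetical helper for reading Test & save results (the messages mirror this page; the function is not part of Kodus):

```python
# Sketch: interpret the probe's HTTP status per the causes listed above.

def diagnose_probe(status: int) -> str:
    if status in (200, 201):
        return "ok: key and base URL match"
    if status == 401:
        return "key does not match the selected plan's endpoint"
    if status == 404:
        return "base URL is wrong"
    if status == 429:
        return "rate limit or quota window exhausted"
    return f"unexpected upstream status {status}"

print(diagnose_probe(401))  # key does not match the selected plan's endpoint
```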

Tuning reasoning (optional)

The curated GLM 5.1 card pre-fills Thinking: Medium, which for OpenAI-compatible providers emits thinking: { type: "enabled" }. That’s fine for most workloads. Two cases to override:
  • Force a specific token budget — switch Thinking to Custom under Advanced settings and paste:
    {
      "thinking": { "type": "enabled", "budget_tokens": 20000 }
    }
    
  • Disable thinking — for fastest/cheapest reviews on small PRs:
    {
      "thinking": { "type": "disabled" }
    }
    
No namespace wrapping needed — Kodus auto-wraps under openaiCompatible (the active provider) before sending. See the main BYOK doc → Custom JSON override for details.
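The wrapping described above amounts to nesting your bare override under the active provider's key. A minimal sketch of that behavior, assuming a plain JSON merge (the real Kodus implementation may differ in details):

```python
# Sketch: you paste a bare override; Kodus nests it under the active
# provider's namespace ("openaiCompatible" here) before sending upstream.

def wrap_override(override: dict, provider: str = "openaiCompatible") -> dict:
    """Nest a bare JSON override under the active provider's namespace."""
    return {provider: override}

user_override = {"thinking": {"type": "enabled", "budget_tokens": 20000}}
print(wrap_override(user_override))
# {'openaiCompatible': {'thinking': {'type': 'enabled', 'budget_tokens': 20000}}}
```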

Tuning concurrency

  • Coding Plan Lite / Pro: keep the pre-filled maxConcurrentRequests=1. Going higher returns 429 Too much concurrency.
  • Coding Plan Max: raise to 5 first, up to 30 if you don’t see 429s. Max tier allows up to 30 concurrent.
  • Developer API: start empty (no cap). Drop to 5 if you see rate-limit errors, then tune up.
Configure GLM 5.1 as your Main model and keep an OpenAI or Anthropic key as Fallback so that reviews keep running when your Coding Plan 5-hour window is exhausted. Kodus fails over automatically.
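What maxConcurrentRequests enforces can be modeled with a semaphore: a cap of 1 fully serializes calls, so the provider never sees concurrent requests. An illustrative sketch (not Kodus internals), with a sleep standing in for the LLM call:

```python
# Sketch: a concurrency cap as an asyncio semaphore. With cap=1, the peak
# number of in-flight "calls" is 1, matching Coding Plan Lite/Pro limits.
import asyncio

async def call_model(sem: asyncio.Semaphore, counter: dict):
    async with sem:                   # at most `cap` calls in flight
        counter["now"] += 1
        counter["peak"] = max(counter["peak"], counter["now"])
        await asyncio.sleep(0.01)     # stand-in for the LLM call
        counter["now"] -= 1

async def review(prs: int, cap: int) -> int:
    sem = asyncio.Semaphore(cap)
    counter = {"now": 0, "peak": 0}
    await asyncio.gather(*(call_model(sem, counter) for _ in range(prs)))
    return counter["peak"]            # highest observed concurrency

print(asyncio.run(review(prs=8, cap=1)))  # 1 — calls are fully serialized
```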

Option 2 — Manual configuration

If you need a GLM variant not in the curated catalog (e.g. GLM-5 or GLM-4.5), click Configure manually at the bottom of the catalog and fill:
| Field | Value |
| --- | --- |
| Provider | OpenAI Compatible |
| Base URL | https://api.z.ai/api/coding/paas/v4 (Coding Plan) or https://api.z.ai/api/paas/v4/ (Developer API) |
| Model | glm-5.1, glm-5, glm-5-turbo, glm-4.5, or glm-4.5-air |
| API Key | your Z.AI key (matching the base URL above) |
| Max Concurrent Requests | 1 on Lite/Pro Coding Plan; up to 30 on Max; leave empty on Developer API |

Option 3 — Self-hosted (environment variables)

If you run Kodus in Fixed Mode (single global provider, no per-org BYOK), configure Z.AI in the .env of your API + worker containers:
    # Z.AI configuration (Fixed Mode)
    API_LLM_PROVIDER_MODEL="glm-5.1"                                  # any GLM model you have access to
    API_OPENAI_FORCE_BASE_URL="https://api.z.ai/api/coding/paas/v4"   # use /api/paas/v4/ for pay-per-token
    API_OPEN_AI_API_KEY="your-z-ai-api-key"
This path is only needed for self-hosted Kodus installs that deliberately disable BYOK. If BYOK is enabled on your self-hosted instance, prefer Option 1 — the curated card handles the endpoint logic for you.
Restart the API and worker containers after editing .env, then verify the integration:
    docker-compose logs api worker | grep -iE "z\.ai|glm"
For the full self-hosted setup (domains, security keys, database, webhooks, reverse proxy), follow the generic VM deployment guide and only swap the LLM block for the one above.

Choosing between the Coding Plan and pay-per-token

  • Pick the Coding Plan when you have a predictable team of reviewers and want a flat monthly cost. The 5-hour and weekly quotas translate to roughly 15–30× the subscription fee in equivalent API spend.
  • Pick pay-per-token when your traffic is bursty, when you need occasional access to the largest context windows, or when you want cost to scale linearly with PR volume.
  • Pair them: use the Coding Plan as Main and a Developer API key (or an entirely different provider) as Fallback to cover bursts that exhaust your subscription window.
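The decision above is a comparison between a flat fee (with an included usage ceiling) and metered billing. A back-of-the-envelope sketch using the ~15-30x "API-value equivalent" multiples quoted earlier; all inputs are illustrative, and real quotas are windowed rather than monthly:

```python
# Sketch: pick the cheaper option for an expected month of usage, treating
# the Coding Plan as plan_fee dollars covering roughly
# plan_fee * value_multiple dollars of equivalent API spend.

def cheaper_option(monthly_api_spend: float, plan_fee: float, value_multiple: float) -> str:
    included_value = plan_fee * value_multiple
    if monthly_api_spend <= plan_fee:
        return "pay-per-token"              # usage too light to amortize the fee
    if monthly_api_spend <= included_value:
        return "coding-plan"                # flat fee beats metered billing
    return "coding-plan + fallback"         # bursts past the window need a second key

# E.g. ~$120 of expected API spend against a $30 Pro plan (~20x value):
print(cheaper_option(monthly_api_spend=120.0, plan_fee=30.0, value_multiple=20.0))
# coding-plan
```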

Troubleshooting

401 Unauthorized
  • Coding Plan keys only work on /api/coding/paas/v4. Developer API keys only work on /api/paas/v4/.
  • In the curated card, confirm the Plan selector matches the key type.
  • In manual mode, confirm the Base URL matches the key origin.

429 rate limits and quotas
  • Lite and Pro Coding Plan tiers typically allow only 1 concurrent request. Kodus pre-fills this for you; raise it only on Max.
  • Lower Max concurrent requests in Advanced settings if you're still hitting 429s.
  • Quotas are enforced on a 5-hour rolling window and a weekly window; exhausting either returns HTTP 429.
  • Check remaining quota in the Z.AI console.
  • Options: wait for the next window, upgrade to a higher tier, or keep a Developer API key configured as Fallback to cover the gap.

Model errors
  • Verify the model ID matches Z.AI's catalog (glm-5.1, glm-5-turbo, glm-5, glm-4.5, glm-4.5-air).
  • The Coding Plan currently covers only the GLM family — non-GLM model names will be rejected.

Connectivity
  • Confirm your server can reach api.z.ai.
  • Check the API and worker logs for the exact upstream error.
  • If you are in a region with restricted outbound traffic, route requests through a reverse proxy your infra allows.