
Z AI (Zhipu AI)

Learn how to configure and use Z AI's GLM-4.5 models with Sypha, including hybrid reasoning, agentic capabilities, open-source licensing, and region-specific endpoints.

Z AI (formerly Zhipu AI) develops the GLM-4.5 series, which pairs hybrid reasoning with an agent-oriented architecture. Released in July 2025, these models combine reasoning, coding, and agent capabilities in a single model and are available as open source under the MIT license.

Website: https://z.ai/model-api (International) | https://open.bigmodel.cn/ (China)

Getting an API Key

International Users

  1. Sign Up/Sign In: Visit https://z.ai/model-api and create an account or log in.
  2. Navigate to API Keys: Open your account dashboard and find the API keys section.
  3. Create a Key: Generate a new API key for your application.
  4. Copy the Key: Copy the key immediately and store it securely.

China Mainland Users

  1. Sign Up/Sign In: Visit https://open.bigmodel.cn/ and create an account or log in.
  2. Navigate to API Keys: Open your account dashboard and find the API keys section.
  3. Create a Key: Generate a new API key for your application.
  4. Copy the Key: Copy the key immediately and store it securely.

Supported Models

Z AI offers distinct model catalogs depending on your chosen region:

GLM-4.5 Series

  • GLM-4.5 - Flagship model with 355B total parameters (32B active)
  • GLM-4.5-Air - Lighter model with 106B total parameters (12B active)

GLM-4.5 Hybrid Reasoning Models

  • GLM-4.5 (Thinking Mode) - Sophisticated reasoning with sequential analysis
  • GLM-4.5-Air (Thinking Mode) - Optimized reasoning for standard hardware

All models include:

  • 128,000-token context window for processing long documents and codebases
  • Mixture of Experts (MoE) architecture for strong performance at lower compute cost
  • Agent-native design combining reasoning, coding, and tool use
  • Open-source availability under the MIT license

Configuration in Sypha

  1. Open Sypha Settings: Click the settings icon (⚙️) in the Sypha panel.
  2. Select Provider: Choose "Z AI" from the "API Provider" dropdown.
  3. Select Region: Choose your region:
    • "International" for worldwide access
    • "China" for mainland China access
  4. Enter API Key: Paste your Z AI API key into the "Z AI API Key" field.
  5. Select Model: Choose your preferred model from the "Model" dropdown.
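Once configured, Sypha issues requests against Z AI's OpenAI-compatible endpoint. As a rough sketch of what such a request looks like, the helper below assembles the URL, headers, and JSON body; the model id "glm-4.5" and the exact payload fields are assumptions based on the OpenAI-compatible API shape, not Sypha's actual internals.

```python
import json
import os

def build_chat_request(base_url: str, api_key: str, model: str, prompt: str) -> dict:
    """Assemble the URL, headers, and JSON body for one chat completion call."""
    return {
        "url": f"{base_url}/chat/completions",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }),
    }

req = build_chat_request(
    "https://api.z.ai/api/paas/v4",                # international endpoint
    os.environ.get("ZAI_API_KEY", "sk-placeholder"),  # never hard-code real keys
    "glm-4.5",                                     # assumed model id
    "Write a haiku about code review.",
)
print(req["url"])  # https://api.z.ai/api/paas/v4/chat/completions
```

Reading the key from an environment variable mirrors the settings field: the key stays out of source control while the rest of the request is plain data.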

GLM Coding Plans

Z AI offers subscription plans tailored to coding use cases. These plans provide cost-effective access to GLM-4.5 models through a prompt quota rather than per-token API billing.

Plan Options

GLM Coding Lite - $3/month

  • 120 prompts per 5-hour cycle
  • Access to the GLM-4.5 model
  • Works only through coding tools such as Sypha

GLM Coding Pro - $15/month

  • 600 prompts per 5-hour cycle
  • Access to the GLM-4.5 model
  • Works only through coding tools such as Sypha

Both plans offer introductory pricing for the first month: Lite drops from $6 to $3 and Pro from $30 to $15.

zAI subscription page showing GLM Coding Lite and Pro plans with pricing

Setting up GLM Coding Plans

To utilize the GLM Coding Plans with Sypha:

  1. Subscribe: Go to https://z.ai/subscribe and choose your plan.

  2. Create API Key: After subscribing, open your Z AI dashboard and generate an API key for your coding plan.

  3. Configure in Sypha: Open Sypha settings, select "Z AI" as your provider, and paste your API key into the "Z AI API Key" field.

Sypha settings with zAI provider selected and API key field highlighted

This links your subscription directly to Sypha, giving you access to GLM-4.5's tool-calling features tuned for coding workflows.
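As an illustration of those tool-calling features, the sketch below defines one tool in the OpenAI-compatible "tools" format that coding clients pass to the model. The `read_file` tool, its schema, and the model id are hypothetical examples, not Sypha's actual tool set.

```python
import json

# Hypothetical tool definition in the OpenAI-compatible "tools" format.
read_file_tool = {
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a file from the workspace and return its contents.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {
                    "type": "string",
                    "description": "Workspace-relative file path.",
                },
            },
            "required": ["path"],
        },
    },
}

request_body = {
    "model": "glm-4.5",    # assumed model id
    "messages": [{"role": "user", "content": "Summarize src/main.py"}],
    "tools": [read_file_tool],
    "tool_choice": "auto",  # let the model decide when to call the tool
}
print(json.dumps(request_body)[:60])
```

When the model decides to call the tool, the response carries a `tool_calls` entry with the function name and JSON-encoded arguments, which the client executes and feeds back as a `tool` message.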

Z AI's Hybrid Intelligence

Z AI's GLM-4.5 series presents groundbreaking capabilities that distinguish it from traditional language models:

Hybrid Reasoning Architecture

GLM-4.5 functions in two separate modes:

  • Thinking Mode: Designed for complex reasoning and tool use, engaging in extended step-by-step analysis
  • Non-Thinking Mode: Returns immediate responses for simple queries, maximizing efficiency

This dual-mode framework embodies an "agent-native" design approach that adjusts processing intensity according to query complexity.
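A hedged sketch of choosing between the two modes per request is shown below. The `thinking` request field is an assumption about the Z AI API's parameter name; verify it against the provider docs before relying on it.

```python
# Sketch only: the "thinking" field name is an assumption about the Z AI API.
def make_body(prompt: str, complex_task: bool) -> dict:
    """Route intricate prompts to Thinking Mode, simple ones to Non-Thinking Mode."""
    return {
        "model": "glm-4.5",  # assumed model id
        "messages": [{"role": "user", "content": prompt}],
        "thinking": {"type": "enabled" if complex_task else "disabled"},
    }

print(make_body("Prove this invariant holds.", complex_task=True)["thinking"])
# {'type': 'enabled'}
```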

Exceptional Performance

GLM-4.5 achieves an average score of 63.2 across 12 benchmarks spanning agentic tasks, reasoning, and coding, ranking 3rd among all proprietary and open-source models evaluated. GLM-4.5-Air remains competitive at 59.8 while offering greater efficiency.

Mixture of Experts Excellence

The advanced MoE architecture maximizes performance while preserving computational efficiency:

  • GLM-4.5: 355B total parameters (32B active)
  • GLM-4.5-Air: 106B total parameters (12B active)

Extended Context Capabilities

The 128,000-token context window supports thorough understanding of long documents and codebases; in practical testing, the models handled codebases of roughly 2,000 lines while maintaining strong performance.
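For a back-of-the-envelope check that an input fits the window, one can estimate tokens from character count. The 4-characters-per-token ratio below is a common rough heuristic, not GLM-4.5's actual tokenizer behavior.

```python
# Rough budgeting sketch; ~4 chars/token is a heuristic, not the real tokenizer.
CONTEXT_WINDOW = 128_000

def fits_in_context(text: str, reserved_for_output: int = 4_000) -> bool:
    """Estimate whether `text` plus an output budget fits in the context window."""
    estimated_tokens = len(text) // 4
    return estimated_tokens + reserved_for_output <= CONTEXT_WINDOW

doc = "x" * 200_000  # ~50K estimated tokens
print(fits_in_context(doc))  # True: well inside the 128K window
```

Reserving headroom for the model's output matters: a prompt that exactly fills the window leaves no room for the completion.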

Open-Source Leadership

Published under MIT license, GLM-4.5 grants researchers and developers access to cutting-edge capabilities without proprietary limitations, including base models, hybrid reasoning editions, and optimized FP8 variants.

Regional Optimization

API Endpoints

  • International: https://api.z.ai/api/paas/v4
  • China: https://open.bigmodel.cn/api/paas/v4
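The two endpoints above can be captured in a small lookup, as in this sketch. The region keys ("international"/"china") are illustrative; the URLs come from this page.

```python
# Region-to-endpoint mapping; the key names are illustrative.
ENDPOINTS = {
    "international": "https://api.z.ai/api/paas/v4",
    "china": "https://open.bigmodel.cn/api/paas/v4",
}

def endpoint_for(region: str) -> str:
    """Resolve a region name to its base URL, failing loudly on typos."""
    try:
        return ENDPOINTS[region]
    except KeyError:
        raise ValueError(
            f"Unknown region {region!r}; expected one of {sorted(ENDPOINTS)}"
        )

print(endpoint_for("china"))  # https://open.bigmodel.cn/api/paas/v4
```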

Model Availability

The region setting determines both the API endpoint and the models available; Sypha automatically filters the model list to match your chosen region.

Special Features

Agentic Capabilities

GLM-4.5's integrated architecture makes it especially well suited to agent applications that require unified reasoning, coding, and tool use.

Comprehensive Benchmarking

Performance assessment includes:

  • 3 agentic task benchmarks
  • 7 reasoning benchmarks
  • 2 coding benchmarks

This evaluation demonstrates versatility across a range of AI applications.

Developer Integration

Models enable integration via multiple frameworks:

  • transformers
  • vLLM
  • SGLang

The release includes the model code plus tool-parser and reasoning-parser implementations.

Performance Comparisons

vs Claude Sonnet 4

GLM-4.5 demonstrates competitive performance in agentic coding and reasoning tasks, although Claude Sonnet 4 retains advantages in coding success rates and autonomous multi-feature application development.

vs GPT-4.5

GLM-4.5 is competitive on reasoning and agent benchmarks, while GPT-4.5 generally leads in direct task accuracy on standard benchmarks such as MMLU and AIME.

Tips and Notes

  • Region Selection: Choose the region closest to you for best performance and compliance with local regulations.
  • Model Selection: Use GLM-4.5 for maximum capability, GLM-4.5-Air for efficiency on standard hardware.
  • Context Advantage: The 128K context window allows processing of large codebases and documents.
  • Open Source Benefits: The MIT license permits commercial use and derivative development.
  • Agentic Applications: Especially strong for applications combining reasoning, coding, and tool use.
  • Hybrid Reasoning: Use Thinking Mode for complex problems, Non-Thinking Mode for simple queries.
  • API Compatibility: The OpenAI-compatible API supports streaming responses and usage reporting.
  • Framework Support: Multiple integration options are available for different deployment scenarios.
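To illustrate the streaming mentioned above: OpenAI-compatible APIs typically stream Server-Sent Events, where each line is `data: <json chunk>` and the stream ends with `data: [DONE]`. The parser below assumes that shape; the sample chunks are fabricated for demonstration.

```python
import json

def collect_stream(lines):
    """Concatenate content deltas from an SSE chat-completion stream."""
    text = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip comments, blank keep-alives, etc.
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        delta = json.loads(payload)["choices"][0]["delta"]
        text.append(delta.get("content", ""))
    return "".join(text)

# Fabricated sample stream in the assumed OpenAI-compatible format.
sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print(collect_stream(sample))  # Hello
```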
