# GLM-5-FP8
👋 Join our WeChat or Discord community.
📖 Check out the GLM-5 technical blog.
📍 Use GLM-5 API services on Z.ai API Platform.
👉 One click to GLM-5.
## Introduction
We are launching GLM-5, a model built for complex systems engineering and long-horizon agentic tasks. Scaling remains one of the most important levers for improving intelligence on the path to Artificial General Intelligence (AGI). Compared to GLM-4.5, GLM-5 scales from 355B parameters (32B active) to 744B parameters (40B active) and increases pre-training data from 23T to 28.5T tokens. GLM-5 also integrates DeepSeek Sparse Attention (DSA), substantially reducing deployment cost while preserving long-context capability.
Reinforcement learning aims to bridge the gap between competence and excellence in pre-trained models, but deploying it at scale for LLMs is challenging because RL training is inefficient. To address this, we developed slime, an asynchronous RL infrastructure that substantially improves training throughput and efficiency, enabling more fine-grained post-training iterations. With advances in both pre-training and post-training, GLM-5 delivers significant improvements over GLM-4.7 across a wide range of academic benchmarks, achieves best-in-class performance among open-source models on reasoning, coding, and agentic tasks, and closes the gap with frontier models.
## Benchmark
*(Benchmark comparison of GLM-5, GLM-4.7, DeepSeek-V3.2, Kimi K2.5, Claude Opus 4.5, Gemini 3 Pro, and GPT-5.2 (xhigh); the score rows of the table were not preserved in this copy.)*
> \*: refers to their scores on the full set.
>
> †: a verified version of Terminal-Bench 2.0 that fixes some ambiguous instructions. See the footnote for more evaluation details.
## Footnote
* Humanity’s Last Exam (HLE) & other reasoning tasks: We evaluate with a maximum generation length of 131,072 tokens (temperature=1.0, top_p=0.95, max_new_tokens=131072). By default, we report the text-only subset; results marked with * are from the full set. We use GPT-5.2 (medium) as the judge model. For HLE-with-tools, we use a maximum context length of 202,752 tokens.
* SWE-bench & SWE-bench Multilingual: We run the SWE-bench suite with OpenHands using a tailored instruction prompt. Settings: temperature=0.7, top_p=0.95, max_new_tokens=16384, with a 200K context window.
* BrowseComp: Without context management, we retain details from the most recent 5 turns. With context management, we use the same discard-all strategy as DeepSeek-V3.2 and Kimi K2.5.
* Terminal-Bench 2.0 (Terminus 2): We evaluate with the Terminus framework using timeout=2h, temperature=0.7, top_p=1.0, max_new_tokens=8192, with a 128K context window. Resource limits are capped at 16 CPUs and 32 GB RAM.
* Terminal-Bench 2.0 (Claude Code): We evaluate in Claude Code 2.1.14 (think mode, default effort) with temperature=1.0, top_p=0.95, max_new_tokens=65536. We remove wall-clock time limits due to generation speed, while preserving per-task CPU and memory constraints. Scores are averaged over 5 runs. We fix environment issues introduced by Claude Code and also report results on a verified Terminal-Bench 2.0 dataset that resolves ambiguous instructions (see: https://huggingface.co/datasets/zai-org/terminal-bench-2-verified).
* CyberGym: We evaluate in Claude Code 2.1.18 (think mode, no web tools) with temperature=1.0, top_p=1.0, max_new_tokens=32000 and a 250-minute timeout per task. Results are single-run Pass@1 over 1,507 tasks.
* MCP-Atlas: All models are evaluated in think mode on the 500-task public subset with a 10-minute timeout per task. We use Gemini 3 Pro as the judge model.
* τ²-bench: We add a small prompt adjustment in Retail and Telecom to avoid failures caused by premature user termination. For Airline, we apply the domain fixes proposed in the Claude Opus 4.5 system card.
* Vending Bench 2: Runs are conducted independently by Andon Labs.

## Serve GLM-5 Locally
### Prepare environment
vLLM, SGLang, KTransformers, and xLLM all support local deployment of GLM-5; a simple deployment guide follows.
+ vLLM

Using Docker:

```shell
docker pull vllm/vllm-openai:nightly
```

or using pip:

```shell
pip install -U vllm --pre --index-url https://pypi.org/simple --extra-index-url https://wheels.vllm.ai/nightly
```

then upgrade transformers:

```shell
pip install git+https://github.com/huggingface/transformers.git
```
+ SGLang

Using Docker:

```bash
docker pull lmsysorg/sglang:glm5-hopper     # for Hopper GPUs
docker pull lmsysorg/sglang:glm5-blackwell  # for Blackwell GPUs
```
### Deploy
+ vLLM

```shell
vllm serve zai-org/GLM-5-FP8 \
  --tensor-parallel-size 8 \
  --gpu-memory-utilization 0.85 \
  --speculative-config.method mtp \
  --speculative-config.num_speculative_tokens 1 \
  --tool-call-parser glm47 \
  --reasoning-parser glm45 \
  --enable-auto-tool-choice \
  --served-model-name glm-5-fp8
```
Check the recipes for more details.
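Once the server is running, it exposes an OpenAI-compatible API. Below is a minimal sketch of a chat request, assuming vLLM's default port 8000 and the `glm-5-fp8` name set by `--served-model-name` above; the prompt and sampling values are illustrative, not recommended settings.

```python
import json
import urllib.request

# OpenAI-compatible chat-completions payload; "glm-5-fp8" matches the
# --served-model-name flag in the serve command above.
payload = {
    "model": "glm-5-fp8",
    "messages": [{"role": "user", "content": "Write a haiku about FP8."}],
    "temperature": 0.7,
    "top_p": 0.95,
    "max_tokens": 1024,
}

def chat(base_url="http://localhost:8000/v1"):
    # POST the payload to the chat endpoint and return the reply text.
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(chat())
```

The same request works against any OpenAI-compatible endpoint; only `base_url` and the model name change.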
+ SGLang

```shell
python3 -m sglang.launch_server \
  --model-path zai-org/GLM-5-FP8 \
  --tp-size 8 \
  --tool-call-parser glm47 \
  --reasoning-parser glm45 \
  --speculative-algorithm EAGLE \
  --speculative-num-steps 3 \
  --speculative-eagle-topk 1 \
  --speculative-num-draft-tokens 4 \
  --mem-fraction-static 0.85 \
  --served-model-name glm-5-fp8
```
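Both launch commands set `--tool-call-parser glm47` and (for vLLM) `--enable-auto-tool-choice`, so the server accepts OpenAI-style `tools` definitions. A hedged sketch of such a request body follows; the `get_weather` function is a made-up example, not part of the model or either server.

```python
import json

# Illustrative tool schema in the OpenAI function-calling format; the
# server's GLM tool-call parser translates the model's output into calls
# against schemas like this one.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool, for illustration only
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

request_body = {
    "model": "glm-5-fp8",
    "messages": [{"role": "user", "content": "What's the weather in Beijing?"}],
    "tools": tools,
    "tool_choice": "auto",  # let the server decide when to invoke the tool
}

print(json.dumps(request_body, indent=2))
```

Send this body to `/v1/chat/completions` as in the plain chat example; tool calls come back in the response's `tool_calls` field.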
+ xLLM and other Ascend NPU platforms
Please check the deployment guide here.
+ KTransformers
Please check the deployment guide here.
## Citation
Our technical report is coming soon.
## Files & Weights

The FP8 weights are sharded into 142 safetensors files, `model-00001-of-00142.safetensors` through `model-00142-of-00142.safetensors`. Each shard is roughly 5 GB except the final 0.14 GB shard, for about 704 GB in total.