GLM-5-FP8



👋 Join our WeChat or Discord community.
📖 Check out the GLM-5 technical blog.
📍 Use GLM-5 API services on Z.ai API Platform.
👉 One click to start chatting with GLM-5.



Introduction



We are launching GLM-5, targeting complex systems engineering and long-horizon agentic tasks. Scaling remains one of the most important ways to improve the intelligence efficiency of Artificial General Intelligence (AGI). Compared to GLM-4.5, GLM-5 scales from 355B parameters (32B active) to 744B parameters (40B active), and increases pre-training data from 23T to 28.5T tokens. GLM-5 also integrates DeepSeek Sparse Attention (DSA), substantially reducing deployment cost while preserving long-context capacity.

Reinforcement learning aims to bridge the gap between competence and excellence in pre-trained models. However, deploying it at scale for LLMs remains challenging due to the inefficiency of RL training. To this end, we developed slime, a novel asynchronous RL infrastructure that substantially improves training throughput and efficiency, enabling more fine-grained post-training iterations. With advances in both pre-training and post-training, GLM-5 delivers significant improvements over GLM-4.7 across a wide range of academic benchmarks and achieves best-in-class performance among all open-source models on reasoning, coding, and agentic tasks, closing the gap with frontier models.
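As a quick sanity check on the scaling figures quoted above, the share of parameters activated per token (a rough proxy for per-token inference cost relative to total capacity in an MoE model) can be computed directly. The sketch below uses only the parameter counts stated in this section; the function name is illustrative.

```python
# Active-parameter fraction for GLM-4.5 vs GLM-5, using the counts
# quoted above (totals and active parameters in billions).
def active_fraction(active_b: float, total_b: float) -> float:
    """Share of parameters activated per token, as a percentage."""
    return 100.0 * active_b / total_b

glm45 = active_fraction(32, 355)   # GLM-4.5: 32B active of 355B
glm5 = active_fraction(40, 744)    # GLM-5:   40B active of 744B

print(f"GLM-4.5 active fraction: {glm45:.1f}%")  # ~9.0%
print(f"GLM-5 active fraction:   {glm5:.1f}%")   # ~5.4%
```

In other words, GLM-5 activates a smaller fraction of a much larger model, which is one way the architecture keeps per-token compute growth well below total parameter growth.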

Benchmark



| | GLM-5 | GLM-4.7 | DeepSeek-V3.2 | Kimi K2.5 | Claude Opus 4.5 | Gemini 3 Pro | GPT-5.2 (xhigh) |
| ------------------------------- | ----- | ------- | ------------- | --------- | --------------- | ------------ | --------------- |
| HLE | 30.5 | 24.8 | 25.1 | 31.5 | 28.4 | 37.2 | 35.4 |
| HLE (w/ Tools) | 50.4 | 42.8 | 40.8 | 51.8 | 43.4* | 45.8* | 45.5* |
| AIME 2026 I | 92.7 | 92.9 | 92.7 | 92.5 | 93.3 | 90.6 | - |
| HMMT Nov. 2025 | 96.9 | 93.5 | 90.2 | 91.1 | 91.7 | 93.0 | 97.1 |
| IMOAnswerBench | 82.5 | 82.0 | 78.3 | 81.8 | 78.5 | 83.3 | 86.3 |
| GPQA-Diamond | 86.0 | 85.7 | 82.4 | 87.6 | 87.0 | 91.9 | 92.4 |
| SWE-bench Verified | 77.8 | 73.8 | 73.1 | 76.8 | 80.9 | 76.2 | 80.0 |
| SWE-bench Multilingual | 73.3 | 66.7 | 70.2 | 73.0 | 77.5 | 65.0 | 72.0 |
| Terminal-Bench 2.0 (Terminus 2) | 56.2 / 60.7 † | 41.0 | 39.3 | 50.8 | 59.3 | 54.2 | 54.0 |
| Terminal-Bench 2.0 (Claude Code) | 56.2 / 61.1 † | 32.8 | 46.4 | 57.9 | - | - | - |
| CyberGym | 43.2 | 23.5 | 17.3 | 41.3 | 50.6 | 39.9 | - |
| BrowseComp | 62.0 | 52.0 | 51.4 | 60.6 | 37.0 | 37.8 | - |
| BrowseComp (w/ Context Manage) | 75.9 | 67.5 | 67.6 | 74.9 | 67.8 | 59.2 | 65.8 |
| BrowseComp-Zh | 72.7 | 66.6 | 65.0 | 62.3 | 62.4 | 66.8 | 76.1 |
| τ²-Bench | 89.7 | 87.4 | 85.3 | 80.2 | 91.6 | 90.7 | 85.5 |
| MCP-Atlas (Public Set) | 67.8 | 52.0 | 62.2 | 63.8 | 65.2 | 66.6 | 68.0 |
| Tool-Decathlon | 38.0 | 23.8 | 35.2 | 27.8 | 43.5 | 36.4 | 46.3 |
| Vending Bench 2 | $4,432.12 | $2,376.82 | $1,034.00 | $1,198.46 | $4,967.06 | $5,478.16 | $3,591.33 |

> *: refers to scores on the full set.
>
> †: A verified version of Terminal-Bench 2.0 that fixes some ambiguous instructions. See the footnote for more evaluation details.

    Footnote



* Humanity's Last Exam (HLE) & other reasoning tasks: We evaluate with a maximum generation length of 131,072 tokens (temperature=1.0, top_p=0.95, max_new_tokens=131072). By default, we report the text-only subset; results marked with * are from the full set. We use GPT-5.2 (medium) as the judge model. For HLE-with-tools, we use a maximum context length of 202,752 tokens.
* SWE-bench & SWE-bench Multilingual: We run the SWE-bench suite with OpenHands using a tailored instruction prompt. Settings: temperature=0.7, top_p=0.95, max_new_tokens=16384, with a 200K context window.
* BrowseComp: Without context management, we retain details from the most recent 5 turns. With context management, we use the same discard-all strategy as DeepSeek-V3.2 and Kimi K2.5.
* Terminal-Bench 2.0 (Terminus 2): We evaluate with the Terminus framework using timeout=2h, temperature=0.7, top_p=1.0, max_new_tokens=8192, with a 128K context window. Resource limits are capped at 16 CPUs and 32 GB RAM.
* Terminal-Bench 2.0 (Claude Code): We evaluate in Claude Code 2.1.14 (think mode, default effort) with temperature=1.0, top_p=0.95, max_new_tokens=65536. We remove wall-clock time limits due to generation speed, while preserving per-task CPU and memory constraints. Scores are averaged over 5 runs. We fix environment issues introduced by Claude Code and also report results on a verified Terminal-Bench 2.0 dataset that resolves ambiguous instructions (see: https://huggingface.co/datasets/zai-org/terminal-bench-2-verified).
* CyberGym: We evaluate in Claude Code 2.1.18 (think mode, no web tools) with temperature=1.0, top_p=1.0, max_new_tokens=32000 and a 250-minute timeout per task. Results are single-run Pass@1 over 1,507 tasks.
* MCP-Atlas: All models are evaluated in think mode on the 500-task public subset with a 10-minute timeout per task. We use Gemini 3 Pro as the judge model.
* τ²-bench: We add a small prompt adjustment in Retail and Telecom to avoid failures caused by premature user termination. For Airline, we apply the domain fixes proposed in the Claude Opus 4.5 system card.
* Vending Bench 2: Runs are conducted independently by Andon Labs.
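For anyone reproducing the reasoning-task numbers, the sampling settings quoted in the footnotes map directly onto standard OpenAI-style sampling parameters. The dict below simply collects the values stated above; the parameter names (`temperature`, `top_p`, `max_tokens`) follow the common OpenAI-compatible convention, which is an assumption about the serving API rather than something the footnote specifies.

```python
# Sampling settings for HLE and other reasoning tasks, as quoted in
# the footnote above. OpenAI-style parameter names are an assumption.
HLE_SETTINGS = {
    "temperature": 1.0,
    "top_p": 0.95,
    "max_tokens": 131072,  # maximum generation length
}

# SWE-bench suite settings (run via OpenHands, per the footnote),
# used with a 200K context window.
SWE_SETTINGS = {
    "temperature": 0.7,
    "top_p": 0.95,
    "max_tokens": 16384,
}
```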

    Serve GLM-5 Locally



    Prepare environment



    vLLM, SGLang, KTransformers, and xLLM all support local deployment of GLM-5. A simple deployment guide is provided here.

    + vLLM

Using Docker:

```shell
docker pull vllm/vllm-openai:nightly
```

or using pip:

```shell
pip install -U vllm --pre --index-url https://pypi.org/simple --extra-index-url https://wheels.vllm.ai/nightly
```

then upgrade transformers:

```shell
pip install git+https://github.com/huggingface/transformers.git
```

    + SGLang

Using Docker:

```shell
# For Hopper GPUs
docker pull lmsysorg/sglang:glm5-hopper

# For Blackwell GPUs
docker pull lmsysorg/sglang:glm5-blackwell
```

    Deploy



    + vLLM

```shell
vllm serve zai-org/GLM-5-FP8 \
     --tensor-parallel-size 8 \
     --gpu-memory-utilization 0.85 \
     --speculative-config.method mtp \
     --speculative-config.num_speculative_tokens 1 \
     --tool-call-parser glm47 \
     --reasoning-parser glm45 \
     --enable-auto-tool-choice \
     --served-model-name glm-5-fp8
```


    Check the recipes for more details.
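Once the server above is running, it exposes an OpenAI-compatible API on its local port. The sketch below builds a chat-completion request against it using only the Python standard library; the base URL, port, prompt, and sampling values are placeholders for illustration, while the model name matches the `--served-model-name` flag above.

```python
import json
import urllib.request


def build_chat_request(prompt: str, model: str = "glm-5-fp8") -> dict:
    """Assemble an OpenAI-style chat-completion payload for the local server."""
    return {
        "model": model,  # matches --served-model-name in the launch command
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 1.0,
        "top_p": 0.95,
        "max_tokens": 1024,
    }


def chat(prompt: str, base_url: str = "http://localhost:8000/v1") -> str:
    """POST to the OpenAI-compatible /chat/completions endpoint."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Example (requires the server launched above to be running locally):
#   print(chat("Explain tensor parallelism in one paragraph."))
```

The same client works unchanged against the SGLang server below, since both expose the OpenAI-compatible chat-completions interface.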

    + SGLang

```shell
python3 -m sglang.launch_server \
  --model-path zai-org/GLM-5-FP8 \
  --tp-size 8 \
  --tool-call-parser glm47 \
  --reasoning-parser glm45 \
  --speculative-algorithm EAGLE \
  --speculative-num-steps 3 \
  --speculative-eagle-topk 1 \
  --speculative-num-draft-tokens 4 \
  --mem-fraction-static 0.85 \
  --served-model-name glm-5-fp8
```

    Check the sglang cookbook for more details.

+ xLLM and other Ascend NPU solutions

    Please check the deployment guide here.

    + KTransformers

    Please check the deployment guide here.

    Citation



    Our technical report is coming soon.

    Files & Weights

Filename    Size
    model-00001-of-00142.safetensors 5.00 GB
    model-00002-of-00142.safetensors 4.99 GB
    model-00003-of-00142.safetensors 4.99 GB
    model-00004-of-00142.safetensors 4.99 GB
    model-00005-of-00142.safetensors 4.99 GB
    model-00006-of-00142.safetensors 4.99 GB
    model-00007-of-00142.safetensors 4.99 GB
    model-00008-of-00142.safetensors 4.99 GB
    model-00009-of-00142.safetensors 4.99 GB
    model-00010-of-00142.safetensors 4.99 GB
    model-00011-of-00142.safetensors 4.99 GB
    model-00012-of-00142.safetensors 4.99 GB
    model-00013-of-00142.safetensors 4.99 GB
    model-00014-of-00142.safetensors 4.99 GB
    model-00015-of-00142.safetensors 4.99 GB
    model-00016-of-00142.safetensors 4.99 GB
    model-00017-of-00142.safetensors 4.99 GB
    model-00018-of-00142.safetensors 4.99 GB
    model-00019-of-00142.safetensors 4.99 GB
    model-00020-of-00142.safetensors 4.99 GB
    model-00021-of-00142.safetensors 4.99 GB
    model-00022-of-00142.safetensors 4.99 GB
    model-00023-of-00142.safetensors 4.99 GB
    model-00024-of-00142.safetensors 4.99 GB
    model-00025-of-00142.safetensors 4.99 GB
    model-00026-of-00142.safetensors 4.99 GB
    model-00027-of-00142.safetensors 4.99 GB
    model-00028-of-00142.safetensors 4.99 GB
    model-00029-of-00142.safetensors 4.99 GB
    model-00030-of-00142.safetensors 4.99 GB
    model-00031-of-00142.safetensors 4.99 GB
    model-00032-of-00142.safetensors 4.99 GB
    model-00033-of-00142.safetensors 4.99 GB
    model-00034-of-00142.safetensors 4.99 GB
    model-00035-of-00142.safetensors 4.99 GB
    model-00036-of-00142.safetensors 4.99 GB
    model-00037-of-00142.safetensors 4.99 GB
    model-00038-of-00142.safetensors 4.99 GB
    model-00039-of-00142.safetensors 4.99 GB
    model-00040-of-00142.safetensors 4.99 GB
    model-00041-of-00142.safetensors 4.99 GB
    model-00042-of-00142.safetensors 4.99 GB
    model-00043-of-00142.safetensors 4.99 GB
    model-00044-of-00142.safetensors 4.99 GB
    model-00045-of-00142.safetensors 4.99 GB
    model-00046-of-00142.safetensors 4.99 GB
    model-00047-of-00142.safetensors 4.98 GB
    model-00048-of-00142.safetensors 4.99 GB
    model-00049-of-00142.safetensors 4.99 GB
    model-00050-of-00142.safetensors 4.99 GB
    model-00051-of-00142.safetensors 4.99 GB
    model-00052-of-00142.safetensors 4.99 GB
    model-00053-of-00142.safetensors 4.99 GB
    model-00054-of-00142.safetensors 4.99 GB
    model-00055-of-00142.safetensors 4.99 GB
    model-00056-of-00142.safetensors 4.99 GB
    model-00057-of-00142.safetensors 4.99 GB
    model-00058-of-00142.safetensors 4.99 GB
    model-00059-of-00142.safetensors 4.99 GB
    model-00060-of-00142.safetensors 4.99 GB
    model-00061-of-00142.safetensors 4.99 GB
    model-00062-of-00142.safetensors 4.99 GB
    model-00063-of-00142.safetensors 4.99 GB
    model-00064-of-00142.safetensors 4.99 GB
    model-00065-of-00142.safetensors 4.99 GB
    model-00066-of-00142.safetensors 4.99 GB
    model-00067-of-00142.safetensors 4.99 GB
    model-00068-of-00142.safetensors 4.99 GB
    model-00069-of-00142.safetensors 4.99 GB
    model-00070-of-00142.safetensors 4.99 GB
    model-00071-of-00142.safetensors 4.99 GB
    model-00072-of-00142.safetensors 4.99 GB
    model-00073-of-00142.safetensors 4.99 GB
    model-00074-of-00142.safetensors 4.99 GB
    model-00075-of-00142.safetensors 4.99 GB
    model-00076-of-00142.safetensors 4.99 GB
    model-00077-of-00142.safetensors 4.99 GB
    model-00078-of-00142.safetensors 4.99 GB
    model-00079-of-00142.safetensors 4.99 GB
    model-00080-of-00142.safetensors 4.99 GB
    model-00081-of-00142.safetensors 4.99 GB
    model-00082-of-00142.safetensors 4.95 GB
    model-00083-of-00142.safetensors 4.99 GB
    model-00084-of-00142.safetensors 4.99 GB
    model-00085-of-00142.safetensors 4.99 GB
    model-00086-of-00142.safetensors 4.99 GB
    model-00087-of-00142.safetensors 4.99 GB
    model-00088-of-00142.safetensors 4.99 GB
    model-00089-of-00142.safetensors 4.99 GB
    model-00090-of-00142.safetensors 4.99 GB
    model-00091-of-00142.safetensors 4.99 GB
    model-00092-of-00142.safetensors 4.99 GB
    model-00093-of-00142.safetensors 4.99 GB
    model-00094-of-00142.safetensors 4.99 GB
    model-00095-of-00142.safetensors 4.99 GB
    model-00096-of-00142.safetensors 4.99 GB
    model-00097-of-00142.safetensors 4.99 GB
    model-00098-of-00142.safetensors 4.99 GB
    model-00099-of-00142.safetensors 4.99 GB
    model-00100-of-00142.safetensors 4.99 GB
    model-00101-of-00142.safetensors 4.99 GB
    model-00102-of-00142.safetensors 4.99 GB
    model-00103-of-00142.safetensors 4.99 GB
    model-00104-of-00142.safetensors 4.99 GB
    model-00105-of-00142.safetensors 4.99 GB
    model-00106-of-00142.safetensors 4.99 GB
    model-00107-of-00142.safetensors 4.99 GB
    model-00108-of-00142.safetensors 4.99 GB
    model-00109-of-00142.safetensors 4.99 GB
    model-00110-of-00142.safetensors 4.99 GB
    model-00111-of-00142.safetensors 4.99 GB
    model-00112-of-00142.safetensors 4.99 GB
    model-00113-of-00142.safetensors 4.99 GB
    model-00114-of-00142.safetensors 4.99 GB
    model-00115-of-00142.safetensors 4.99 GB
    model-00116-of-00142.safetensors 4.99 GB
    model-00117-of-00142.safetensors 5.00 GB
    model-00118-of-00142.safetensors 4.99 GB
    model-00119-of-00142.safetensors 4.99 GB
    model-00120-of-00142.safetensors 4.99 GB
    model-00121-of-00142.safetensors 4.99 GB
    model-00122-of-00142.safetensors 4.99 GB
    model-00123-of-00142.safetensors 4.99 GB
    model-00124-of-00142.safetensors 4.99 GB
    model-00125-of-00142.safetensors 4.99 GB
    model-00126-of-00142.safetensors 4.99 GB
    model-00127-of-00142.safetensors 4.99 GB
    model-00128-of-00142.safetensors 4.99 GB
    model-00129-of-00142.safetensors 4.99 GB
    model-00130-of-00142.safetensors 4.99 GB
    model-00131-of-00142.safetensors 4.99 GB
    model-00132-of-00142.safetensors 4.99 GB
    model-00133-of-00142.safetensors 4.99 GB
    model-00134-of-00142.safetensors 4.99 GB
    model-00135-of-00142.safetensors 4.99 GB
    model-00136-of-00142.safetensors 4.99 GB
    model-00137-of-00142.safetensors 4.99 GB
    model-00138-of-00142.safetensors 5.00 GB
    model-00139-of-00142.safetensors 4.99 GB
    model-00140-of-00142.safetensors 4.99 GB
    model-00141-of-00142.safetensors 4.98 GB
    model-00142-of-00142.safetensors 0.14 GB