Sehyo/Qwen3.5-122B-A10B-NVFP4

Qwen3.5-122B-A10B-NVFP4



This is a quantized version of Qwen/Qwen3.5-122B-A10B using the NVFP4 quantization scheme.

Running this model requires a nightly build of vLLM.
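As a rough illustration of loading the checkpoint through vLLM's offline Python API, a minimal sketch might look like the following. The `tensor_parallel_size` and `max_model_len` values, the prompt, and the helper names `build_engine_kwargs` and `main` are assumptions chosen for the example, not settings recommended by this model card; adjust them to your hardware.

```python
MODEL_ID = "Sehyo/Qwen3.5-122B-A10B-NVFP4"


def build_engine_kwargs(tensor_parallel_size: int = 2, max_model_len: int = 8192) -> dict:
    """Collect keyword arguments for vllm.LLM (values here are illustrative assumptions)."""
    return {
        "model": MODEL_ID,
        "tensor_parallel_size": tensor_parallel_size,  # assumption: number of GPUs to shard across
        "max_model_len": max_model_len,                # assumption: context length cap for the engine
    }


def main() -> None:
    # Imported lazily so the helper above can be inspected without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(**build_engine_kwargs())
    params = SamplingParams(temperature=0.7, max_tokens=128)
    outputs = llm.generate(["Explain NVFP4 quantization in one sentence."], params)
    print(outputs[0].outputs[0].text)
```

Calling `main()` on a machine with a nightly vLLM build and enough GPU memory for the weights should start the engine and print one completion.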

Changelog

  • 02/03/2026: Added MTP (multi-token prediction) weights from the source checkpoint, enabling speculative decoding with vLLM.
  • 25/02/2026: Initial upload.


Calibration



  • Samples: 512 (256 from each dataset)
  • Datasets:
      • HuggingFaceH4/ultrachat_200k (train_sft split)
      • nvidia/Nemotron-Post-Training-Dataset-v2 (chat split)
  • Max sequence length: 4096
  • All experts calibrated: moe_calibrate_all_experts=True
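The calibration recipe above (256 samples from each of two datasets, capped at 4096 tokens per sequence) can be sketched in plain Python as follows. This is only an illustration: the source does not say how samples were selected or shuffled, so the seeded random sampling, the helper name `mix_calibration_samples`, and the toy stand-in rows are all assumptions.

```python
import random

MAX_SEQ_LEN = 4096
SAMPLES_PER_DATASET = 256


def mix_calibration_samples(sources: dict, per_source: int, seed: int = 0) -> list:
    """Draw `per_source` rows from every source and shuffle them together (hypothetical helper)."""
    rng = random.Random(seed)
    mixed = []
    for name, rows in sources.items():
        if len(rows) < per_source:
            raise ValueError(f"{name} has only {len(rows)} rows, need {per_source}")
        mixed.extend((name, row) for row in rng.sample(rows, per_source))
    rng.shuffle(mixed)
    return mixed


# Toy stand-ins for the real dataset splits; real rows would be chat transcripts.
sources = {
    "HuggingFaceH4/ultrachat_200k": [f"ultrachat-{i}" for i in range(1000)],
    "nvidia/Nemotron-Post-Training-Dataset-v2": [f"nemotron-{i}" for i in range(1000)],
}

calibration_set = mix_calibration_samples(sources, SAMPLES_PER_DATASET)

# Upper bound on calibration tokens if every sample filled the full sequence length.
max_calibration_tokens = len(calibration_set) * MAX_SEQ_LEN
print(len(calibration_set), max_calibration_tokens)  # 512 2097152
```

The token count is an upper bound only; actual samples are usually shorter than the maximum sequence length.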


Creation



This model was created using vLLM's LLM Compressor, with Qwen3.5 MoE support added via PR #2383. The PR adds a custom CalibrationQwen3MoeSparseMoeBlock that routes calibration data to all experts during quantization, so that every expert receives proper calibration for accurate NVFP4 quantization.
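To make the idea of routing calibration data to all experts concrete, here is a toy sketch in plain Python. It is not the code from the PR, whose internals are not shown here; `ToyExpert`, `moe_forward`, and `run` are hypothetical names, and the scalar "experts" only stand in for real feed-forward layers. The point it illustrates is that every expert is executed so its activation statistics get recorded, while the block's output still uses only the router's top-k selection.

```python
import math


class ToyExpert:
    """Stand-in for one MoE expert; records what a calibration observer would see."""

    def __init__(self, scale: float) -> None:
        self.scale = scale
        self.calls = 0
        self.max_abs_input = 0.0

    def __call__(self, x: float) -> float:
        self.calls += 1
        self.max_abs_input = max(self.max_abs_input, abs(x))
        return self.scale * x


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, router_logits, experts, top_k, calibrate_all_experts):
    probs = softmax(router_logits)
    ranked = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)
    selected = set(ranked[:top_k])
    norm = sum(probs[i] for i in selected)
    out = 0.0
    for i, expert in enumerate(experts):
        if i in selected:
            # Routed expert: contributes to the output as usual.
            out += probs[i] / norm * expert(x)
        elif calibrate_all_experts:
            # Unrouted expert: run only so its statistics are recorded; result is discarded.
            expert(x)
    return out


def run(calibrate_all_experts):
    experts = [ToyExpert(s) for s in (1.0, 2.0, 3.0, 4.0)]
    tokens = [0.5, -1.5, 2.0]
    router_logits = [5.0, 1.0, 0.0, -1.0]  # the router always prefers expert 0
    outputs = [moe_forward(t, router_logits, experts, 1, calibrate_all_experts) for t in tokens]
    return outputs, [e.calls for e in experts]


routed_out, routed_calls = run(calibrate_all_experts=False)
all_out, all_calls = run(calibrate_all_experts=True)
print(routed_calls)  # [3, 0, 0, 0]
print(all_calls)     # [3, 3, 3, 3]
```

With ordinary routing, experts that the router rarely picks would see few or no calibration samples, leaving their quantization scales poorly estimated. Running all experts fixes that without changing the block's output.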

Files & Weights

Filename                            Size
extra_weights.safetensors           4.70 GB
model-00001-of-00002.safetensors    46.58 GB
model-00002-of-00002.safetensors    24.61 GB