# jayn7/Z-Image-Turbo-GGUF

Quantized GGUF versions of Z-Image Turbo by Tongyi-MAI.

## 📂 Available Models

| Model | Download |
|-------|----------|
| Z-Image Turbo GGUF | Download |
| Qwen3-4B (Text Encoder) | unsloth/Qwen3-4B-GGUF |

## 📷 Example Comparison

*(Comparison images: z_image_comparison_1, z_image_comparison_2, z_image_comparison_3)*



## Model Information



Check out the original model card Z-Image Turbo for detailed information about the model.

## Usage



The model can be used with:

- ComfyUI-GGUF by city96 (a download sketch follows this list)
- Diffusers (see the example below)
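
For the ComfyUI-GGUF route, the transformer GGUF and the Qwen3-4B text-encoder GGUF need to land in your ComfyUI model folders. Here is a minimal download sketch using `huggingface_hub`; the target folders and the text-encoder filename are assumptions about a typical setup, so adjust them to your install:

```py
# Hedged sketch: fetch the quantized weights into a typical ComfyUI layout.
from huggingface_hub import hf_hub_download

# Diffusion transformer -> ComfyUI/models/unet (folder name assumed per ComfyUI-GGUF)
hf_hub_download(
    repo_id="jayn7/Z-Image-Turbo-GGUF",
    filename="z_image_turbo-Q4_K_M.gguf",
    local_dir="ComfyUI/models/unet",
)

# Text encoder -> ComfyUI/models/text_encoders (models/clip on older installs)
hf_hub_download(
    repo_id="unsloth/Qwen3-4B-GGUF",
    filename="Qwen3-4B-Q4_K_M.gguf",  # assumed filename; pick any quant in that repo
    local_dir="ComfyUI/models/text_encoders",
)
```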


### Example Usage

#### Diffusers

```sh
pip install git+https://github.com/huggingface/diffusers
pip install -U gguf  # needed by Diffusers to load GGUF checkpoints
```

```py
from diffusers import ZImagePipeline, ZImageTransformer2DModel, GGUFQuantizationConfig
import torch

prompt = "Young Chinese woman in red Hanfu, intricate embroidery. Impeccable makeup, red floral forehead pattern. Elaborate high bun, golden phoenix headdress, red flowers, beads. Holds round folding fan with lady, trees, bird. Neon lightning-bolt lamp (⚡️), bright yellow glow, above extended left palm. Soft-lit outdoor night background, silhouetted tiered pagoda (西安大雁塔), blurred colorful distant lights."
height = 1024
width = 1024
seed = 42

# hf_path = "https://huggingface.co/jayn7/Z-Image-Turbo-GGUF/blob/main/z_image_turbo-Q3_K_M.gguf"
local_path = "path/to/local/model/z_image_turbo-Q3_K_M.gguf"

transformer = ZImageTransformer2DModel.from_single_file(
    local_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    dtype=torch.bfloat16,
)

pipeline = ZImagePipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    transformer=transformer,
    dtype=torch.bfloat16,
).to("cuda")

# [Optional] Attention Backend
# Diffusers uses SDPA by default. Switch to a custom attention backend for better efficiency if supported:
# pipeline.transformer.set_attention_backend("_sage_qk_int8_pv_fp16_triton")  # Enable Sage Attention
# pipeline.transformer.set_attention_backend("flash")     # Enable Flash-Attention-2
# pipeline.transformer.set_attention_backend("_flash_3")  # Enable Flash-Attention-3

# [Optional] Model Compilation
# Compiling the DiT model accelerates inference, but the first run takes longer to compile.
# pipeline.transformer.compile()

# [Optional] CPU Offloading
# Enable CPU offloading for memory-constrained devices.
# pipeline.enable_model_cpu_offload()

image = pipeline(
    prompt=prompt,
    num_inference_steps=9,  # this actually results in 8 DiT forwards
    guidance_scale=0.0,     # guidance should be 0 for the Turbo models
    height=height,
    width=width,
    generator=torch.Generator("cuda").manual_seed(seed),
).images[0]

image.save("zimage.png")
```
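
Instead of managing `local_path` by hand, the file can be resolved into the local Hugging Face cache. A minimal sketch (the quant file here is just one example from the Files & Weights table below):

```py
# Hedged sketch: download/cache the GGUF and get its local path.
from huggingface_hub import hf_hub_download

local_path = hf_hub_download(
    repo_id="jayn7/Z-Image-Turbo-GGUF",
    filename="z_image_turbo-Q3_K_M.gguf",  # any file from Files & Weights works
)
```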




## Credits



- Original Model: Z-Image Turbo by Tongyi-MAI
- Quantization Tools & Guide: llama.cpp & city96


## License

This repository follows the same license as Z-Image Turbo.

## Files & Weights

| Filename | Size |
|----------|------|
| z_image_turbo-Q3_K_M.gguf | 3.84 GB |
| z_image_turbo-Q3_K_S.gguf | 3.53 GB |
| z_image_turbo-Q4_K_M.gguf | 4.64 GB |
| z_image_turbo-Q4_K_S.gguf | 4.34 GB |
| z_image_turbo-Q5_K_M.gguf | 5.14 GB |
| z_image_turbo-Q5_K_S.gguf | 4.83 GB |
| z_image_turbo-Q6_K.gguf | 5.50 GB |
| z_image_turbo-Q8_0.gguf | 6.73 GB |
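
As a rough rule of thumb, the file size approximates the memory the transformer weights will occupy, so pick the largest quant that fits your budget. To enumerate the options programmatically, a hedged sketch with `huggingface_hub`:

```py
# Hedged sketch: list the GGUF files in this repo with their sizes.
from huggingface_hub import HfApi

info = HfApi().model_info("jayn7/Z-Image-Turbo-GGUF", files_metadata=True)
for f in info.siblings:
    if f.rfilename.endswith(".gguf"):
        print(f"{f.rfilename}: {f.size / 1e9:.2f} GB")
```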