
# Gemma 3 Quantized Models
This repository contains W4A16 quantized versions of Google's Gemma 3 instruction-tuned models, making them more accessible for deployment on consumer hardware while maintaining good performance.

## Models

- abhishekchohan/gemma-3-27b-it-quantized-W4A16
- abhishekchohan/gemma-3-12b-it-quantized-W4A16
- abhishekchohan/gemma-3-4b-it-quantized-W4A16


## Repository Structure
    
```
gemma-3-{size}-it-quantized-W4A16/
├── README.md
├── templates/
│   └── chat_template.jinja
├── tools/
│   └── tool_parser.py
└── [model files]
```


## Quantization Details

These models use W4A16 quantization via LLM Compressor:

- Weights quantized to 4-bit precision
- Activations kept at 16-bit precision
- Significantly reduced memory requirements


## Usage with vLLM

```bash
vllm serve abhishekchohan/gemma-3-{size}-it-quantized-W4A16 \
  --chat-template templates/chat_template.jinja \
  --enable-auto-tool-choice \
  --tool-call-parser gemma \
  --tool-parser-plugin tools/tool_parser.py
```


## License
    These models are subject to the Gemma license. Users must acknowledge and accept the license terms before using the models.

## Citation
    
```bibtex
@article{gemma_2025,
    title={Gemma 3},
    url={https://goo.gle/Gemma3Report},
    publisher={Kaggle},
    author={Gemma Team},
    year={2025}
}
```

## Files & Weights

| Filename | Size |
|---|---|
| model-00001-of-00002.safetensors | 4.65 GB |
| model-00002-of-00002.safetensors | 3.18 GB |