Intel/distilbert-base-uncased-distilled-squad-int8-static-inc
Model Card for INT8 DistilBERT Base Uncased Fine-Tuned on SQuAD
This model is an INT8 quantized version of DistilBERT base uncased, fine-tuned on the Stanford Question Answering Dataset (SQuAD). The quantization was performed with Hugging Face's Optimum Intel, leveraging Intel® Neural Compressor.
Intended Use
Evaluation
PyTorch Version
This is an INT8 PyTorch model quantized with huggingface/optimum-intel through the usage of Intel® Neural Compressor.
| | INT8 | FP32 |
|---|:---:|:---:|
| Accuracy (eval-f1) | 86.1069 | 86.8374 |
| Model size (MB) | 74.7 | 265 |
ONNX Version
This is an INT8 ONNX model quantized with Intel® Neural Compressor.
| | INT8 | FP32 |
|---|:---:|:---:|
| Accuracy (eval-f1) | 86.33 | 86.87 |
| Model size (MB) | 154 | 254 |
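The tables above describe a size/accuracy trade-off. As a quick sanity check, the small helper below (function names are illustrative, not part of any library) recomputes the headline numbers from the PyTorch figures:

```python
def compression_ratio(fp32_mb, int8_mb):
    # How many times smaller the INT8 checkpoint is than the FP32 one.
    return fp32_mb / int8_mb

def f1_drop_pct(fp32_f1, int8_f1):
    # Relative F1 degradation introduced by quantization, in percent.
    return (fp32_f1 - int8_f1) / fp32_f1 * 100

# PyTorch figures from the table above: ~3.55x smaller, under 1% relative F1 loss.
print(round(compression_ratio(265, 74.7), 2))    # → 3.55
print(round(f1_drop_pct(86.8374, 86.1069), 2))   # → 0.84
```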
Usage
Optimum Intel w/ Neural Compressor
```python
from optimum.intel import INCModelForQuestionAnswering

model_id = "Intel/distilbert-base-uncased-distilled-squad-int8-static"
int8_model = INCModelForQuestionAnswering.from_pretrained(model_id)
```
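The loader above returns only the model. Extractive QA models like this one emit per-token start and end logits, and picking the answer is a small search over candidate spans. A minimal, framework-free sketch of that post-processing step (the function name and the length cap are illustrative, not part of Optimum):

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return the (start, end) token indices with the highest combined
    start+end score, subject to start <= end and a bounded span length."""
    best, best_score = (0, 0), float("-inf")
    for s, s_logit in enumerate(start_logits):
        # Only consider end positions at or after the start, within the cap.
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_logit + end_logits[e]
            if score > best_score:
                best_score, best = score, (s, e)
    return best

# Toy logits: token 1 is the likeliest start, token 2 the likeliest end.
print(best_span([0.1, 5.0, 0.2], [0.0, 0.1, 4.0]))  # → (1, 2)
```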
Optimum w/ ONNX Runtime
```python
from optimum.onnxruntime import ORTModelForQuestionAnswering

model = ORTModelForQuestionAnswering.from_pretrained('Intel/distilbert-base-uncased-distilled-squad-int8-static')
```
Ethical Considerations
While not explicitly documented, users should be aware of potential biases in the training data (SQuAD and Wikipedia) and consider their implications for the model's outputs. Quantization may also introduce or exacerbate biases in certain scenarios.
Caveats and Recommendations
Files & Weights
| Filename | Size |
|---|---|
| model.onnx | 0.15 GB |
| pytorch_model.bin | 0.07 GB |