Results for "image-to-text"
100 matches found.
Salesforce/blip-image-captioning-base
No description available.
zai-org/GLM-OCR
No description available.
Salesforce/blip-image-captioning-large
No description available.
microsoft/trocr-large-printed
The TrOCR model is an encoder-decoder model, consisting of an image Transformer as encoder, and a text Transformer as decoder. The image enc...
PaddlePaddle/UVDoc
No description available.
microsoft/trocr-base-printed
The TrOCR model is an encoder-decoder model, consisting of an image Transformer as encoder, and a text Transformer as decoder. The image enc...
PaddlePaddle/PP-LCNet_x1_0_doc_ori
No description available.
xtuner/llava-llama-3-8b-v1_1-gguf
No description available.
naver-clova-ix/donut-base
Donut consists of a vision encoder (Swin Transformer) and a text decoder (BART). Given an image, the encoder first encodes the image into a ...
PaddlePaddle/PP-OCRv5_server_det
No description available.
Salesforce/blip2-opt-2.7b-coco
BLIP-2 consists of 3 models: a CLIP-like image encoder, a Querying Transformer (Q-Former) and a large language model. The authors initialize...
nlpconnect/vit-gpt2-image-captioning
No description available.
PaddlePaddle/en_PP-OCRv5_mobile_rec
No description available.
kha-white/manga-ocr-base
No description available.
microsoft/trocr-base-handwritten
The TrOCR model is an encoder-decoder model, consisting of an image Transformer as encoder, and a text Transformer as decoder. The image enc...
PaddlePaddle/PP-LCNet_x1_0_textline_ori
No description available.
facebook/nougat-base
Nougat is a Donut model trained to transcribe scientific PDFs into an easy-to-use markdown format. The model consists of a Swin Transformer ...
microsoft/kosmos-2-patch14-224
No description available.
microsoft/trocr-large-handwritten
The TrOCR model is an encoder-decoder model, consisting of an image Transformer as encoder, and a text Transformer as decoder. The image enc...
lightonai/LightOnOCR-1B-1025
LightOnOCR combines a Vision Transformer encoder (Pixtral-based) with a lightweight text decoder (Qwen3-based) distilled from high-quality ope...
breezedeus/pix2text-mfr
This MFR model utilizes the TrOCR architecture developed by Microsoft, starting with its initial values and retrained using a dataset of mat...
alibaba-damo/mgp-str-base
MGP-STR is a pure vision STR model, consisting of ViT and specially designed A^3 modules. The ViT module was initialized from the weights of D...
rtr46/meiki.txt.recognition.v0
No description available.
rtr46/meiki.text.detect.v0
No description available.
numind/NuMarkdown-8B-Thinking
No description available.
PaddlePaddle/PP-OCRv5_server_rec
No description available.
PaddlePaddle/latin_PP-OCRv5_mobile_rec
No description available.
naver-clova-ix/donut-base-finetuned-cord-v2
Donut consists of a vision encoder (Swin Transformer) and a text decoder (BART). Given an image, the encoder first encodes the image into a ...
ibm-granite/granite-vision-3.3-2b
No description available.
PaddlePaddle/PP-OCRv5_mobile_det
No description available.
OleehyO/TexTeller
No description available.
hezarai/crnn-base-fa-v2
A CRNN model for Persian OCR. This model is based on a simple CNN + LSTM architecture inspired by this paper....
optimum-intel-internal-testing/pix2struct-tiny-random
No description available.
microsoft/trocr-small-printed
The TrOCR model is an encoder-decoder model, consisting of an image Transformer as encoder, and a text Transformer as decoder. The image enc...
laion/mscoco_finetuned_CoCa-ViT-L-14-laion2B-s13B-b90k
No description available.
Riksarkivet/trocr-base-handwritten-hist-swe-2
No description available.
microsoft/trocr-base-stage1
The TrOCR model is an encoder-decoder model, consisting of an image Transformer as encoder, and a text Transformer as decoder. The image enc...
PaddlePaddle/RT-DETR-L_wired_table_cell_det
No description available.
microsoft/trocr-small-handwritten
The TrOCR model is an encoder-decoder model, consisting of an image Transformer as encoder, and a text Transformer as decoder. The image enc...
PaddlePaddle/PP-DocLayout_plus-L
No description available.
PaddlePaddle/RT-DETR-L_wireless_table_cell_det
No description available.
optimum-intel-internal-testing/trocr-small-handwritten
The TrOCR model is an encoder-decoder model, consisting of an image Transformer as encoder, and a text Transformer as decoder. The image enc...
zhiyuanyou/DeQA-Score-Mix3
No description available.
PaddlePaddle/PP-OCRv5_mobile_rec
No description available.
microsoft/git-base
GIT is a Transformer decoder conditioned on both CLIP image tokens and text tokens. The model is trained using "teacher forcing" on a lot of...
PaddlePaddle/PP-LCNet_x1_0_table_cls
No description available.
microsoft/trocr-small-stage1
The TrOCR model is an encoder-decoder model, consisting of an image Transformer as encoder, and a text Transformer as decoder. The image enc...
PaddlePaddle/PP-DocBlockLayout
No description available.
google/pix2struct-textcaps-base
No description available.
IAMJB/chexpert-mimic-cxr-findings-baseline
Evaluation on chexpert-plus...
PaddlePaddle/SLANeXt_wired
No description available.
IAMJB/chexpert-mimic-cxr-impression-baseline
Evaluation on chexpert-plus...
PaddlePaddle/SLANet_plus
No description available.
PaddlePaddle/PP-FormulaNet_plus-L
No description available.
fxmarty/pix2struct-tiny-random
No description available.
PaddlePaddle/PP-Chart2Table
No description available.
Xenova/vit-gpt2-image-captioning
No description available.
microsoft/git-large-coco
GIT is a Transformer decoder conditioned on both CLIP image tokens and text tokens. The model is trained using "teacher forcing" on a lot of...
mradermacher/Qwen2.5-VL-7B-Abliterated-Caption-it-GGUF
No description available.
PaddlePaddle/korean_PP-OCRv5_mobile_rec
No description available.
nyu-visionx/Cambrian-S-7B
Architecture: Qwen2.5-7B-Instruct + SigLIP2-SO400M vision encoder + 2-layer MLP adapter - Parameters: 7B - Vision Encoder: SigLIP-384 (SiGLI...
noctrex/LightOnOCR-2-1B-GGUF
These are quantizations of the model LightOnOCR-2-1B...
PaddlePaddle/PP-OCRv4_mobile_det
No description available.
mradermacher/Hulu-Med-30A3-i1-GGUF
No description available.
mrrtmob/kiri-ocr
No description available.
PaddlePaddle/eslav_PP-OCRv5_mobile_rec
No description available.
facebook/nougat-small
Nougat is a Donut model trained to transcribe scientific PDFs into an easy-to-use markdown format. The model consists of a Swin Transformer ...
ydshieh/vit-gpt2-coco-en
No description available.
FireRedTeam/FireRed-OCR
No description available.
naver-clova-ix/donut-base-finetuned-rvlcdip
Donut consists of a vision encoder (Swin Transformer) and a text decoder (BART). Given an image, the encoder first encodes the image into a ...
PaddlePaddle/PP-LCNet_x0_25_textline_ori
No description available.
mlx-community/GLM-OCR-bf16
No description available.
microsoft/git-base-coco
GIT is a Transformer decoder conditioned on both CLIP image tokens and text tokens. The model is trained using "teacher forcing" on a lot of...
PaddlePaddle/PP-OCRv3_mobile_det
No description available.
PaddlePaddle/en_PP-OCRv4_mobile_rec
No description available.
breezedeus/pix2text-mfd
No description available.
noctrex/Chandra-OCR-GGUF
These are quantizations of the model Chandra-OCR...
xtuner/llava-phi-3-mini-gguf
No description available.
mradermacher/Qwen3-VL-8B-Abliterated-Caption-it-i1-GGUF
No description available.
google/pix2struct-base
No description available.
Norm/nougat-latex-base
No description available.
noctrex/PaddleOCR-VL-1.5-GGUF
These are quantizations of the model PaddleOCR-VL-1.5...
sbintuitions/sarashina2.2-vision-3b
No description available.
unsloth/GLM-OCR
No description available.
kazars24/trocr-base-handwritten-ru
No description available.
PaddlePaddle/PP-OCRv4_mobile_rec
No description available.
PaddlePaddle/arabic_PP-OCRv5_mobile_rec
No description available.
PaddlePaddle/en_PP-OCRv3_mobile_rec
No description available.
openthaigpt/thai-trocr
No description available.
thwri/CogFlorence-2.2-Large
No description available.
to-be/donut-base-finetuned-invoices
Donut consists of a vision encoder (Swin Transformer) and a text decoder (BART). Given an image, the encoder first encodes the image into a ...
PaddlePaddle/PP-DocLayout-L
No description available.
unography/blip-large-long-cap
No description available.
PaddlePaddle/ta_PP-OCRv5_mobile_rec
No description available.
noctrex/ZwZ-8B-GGUF
These are quantizations of the model ZwZ-8B, using an imatrix created from text\en\medium...
logasanjeev/indian-id-validator
Below is a detailed breakdown of each model, including the classes they detect and their evaluation metrics on a custom Indian ID dataset. |...
zhangzicheng/q-sit-mini
No description available.
mradermacher/Qwen3-VL-8B-Abliterated-Caption-it-GGUF
No description available.
mradermacher/QwenStoryteller-i1-GGUF
No description available.
breezedeus/pix2text-mfr-1.5
This MFR model utilizes the TrOCR architecture developed by Microsoft, starting with its initial values and retrained using a dataset of mat...