Nirman.online | Premium AI Directory

Salesforce

Salesforce/blip-image-captioning-base

No description available.

🖼️ image-to-text 3,111,099

zai-org

zai-org/GLM-OCR

No description available.

🖼️ image-to-text 2,515,619

Salesforce

Salesforce/blip-image-captioning-large

No description available.

🖼️ image-to-text 906,541

microsoft

microsoft/trocr-large-printed

The TrOCR model is an encoder-decoder model, consisting of an image Transformer as encoder, and a text Transformer as decoder. The image enc...

🖼️ image-to-text 762,173

PaddlePaddle

PaddlePaddle/UVDoc

No description available.

🖼️ image-to-text 639,443

microsoft

microsoft/trocr-base-printed

The TrOCR model is an encoder-decoder model, consisting of an image Transformer as encoder, and a text Transformer as decoder. The image enc...

🖼️ image-to-text 609,043

PaddlePaddle

PaddlePaddle/PP-LCNet_x1_0_doc_ori

No description available.

🖼️ image-to-text 514,816

xtuner

xtuner/llava-llama-3-8b-v1_1-gguf

No description available.

🖼️ image-to-text 469,006

naver-clova-ix

naver-clova-ix/donut-base

Donut consists of a vision encoder (Swin Transformer) and a text decoder (BART). Given an image, the encoder first encodes the image into a ...

🖼️ image-to-text 464,616

PaddlePaddle

PaddlePaddle/PP-OCRv5_server_det

No description available.

🖼️ image-to-text 452,998

Salesforce

Salesforce/blip2-opt-2.7b-coco

BLIP-2 consists of 3 models: a CLIP-like image encoder, a Querying Transformer (Q-Former) and a large language model. The authors initialize...

🖼️ image-to-text 409,659

nlpconnect

nlpconnect/vit-gpt2-image-captioning

No description available.

🖼️ image-to-text 326,819

PaddlePaddle

PaddlePaddle/en_PP-OCRv5_mobile_rec

No description available.

🖼️ image-to-text 297,397

kha-white

kha-white/manga-ocr-base

No description available.

🖼️ image-to-text 252,329

microsoft

microsoft/trocr-base-handwritten

The TrOCR model is an encoder-decoder model, consisting of an image Transformer as encoder, and a text Transformer as decoder. The image enc...

🖼️ image-to-text 227,354

PaddlePaddle

PaddlePaddle/PP-LCNet_x1_0_textline_ori

No description available.

🖼️ image-to-text 222,836

facebook

facebook/nougat-base

Nougat is a Donut model trained to transcribe scientific PDFs into an easy-to-use markdown format. The model consists of a Swin Transformer ...

🖼️ image-to-text 220,358

microsoft

microsoft/kosmos-2-patch14-224

No description available.

🖼️ image-to-text 177,502

microsoft

microsoft/trocr-large-handwritten

The TrOCR model is an encoder-decoder model, consisting of an image Transformer as encoder, and a text Transformer as decoder. The image enc...

🖼️ image-to-text 165,839

lightonai

lightonai/LightOnOCR-1B-1025

LightOnOCR combines a Vision Transformer encoder (Pixtral-based) with a lightweight text decoder (Qwen3-based) distilled from high-quality ope...

🖼️ image-to-text 154,269

breezedeus

breezedeus/pix2text-mfr

This MFR model utilizes the TrOCR architecture developed by Microsoft, starting with its initial values and retrained using a dataset of mat...

🖼️ image-to-text 124,565

alibaba-damo

alibaba-damo/mgp-str-base

MGP-STR is a pure vision STR model, consisting of ViT and specially designed A^3 modules. The ViT module was initialized from the weights of D...

🖼️ image-to-text 108,960

rtr46

rtr46/meiki.txt.recognition.v0

No description available.

🖼️ image-to-text 90,746

rtr46

rtr46/meiki.text.detect.v0

No description available.

🖼️ image-to-text 89,364

numind

numind/NuMarkdown-8B-Thinking

No description available.

🖼️ image-to-text 79,290

PaddlePaddle

PaddlePaddle/PP-OCRv5_server_rec

No description available.

🖼️ image-to-text 69,351

PaddlePaddle

PaddlePaddle/latin_PP-OCRv5_mobile_rec

No description available.

🖼️ image-to-text 61,402

naver-clova-ix

naver-clova-ix/donut-base-finetuned-cord-v2

Donut consists of a vision encoder (Swin Transformer) and a text decoder (BART). Given an image, the encoder first encodes the image into a ...

🖼️ image-to-text 59,081

ibm-granite

ibm-granite/granite-vision-3.3-2b

No description available.

🖼️ image-to-text 56,468

PaddlePaddle

PaddlePaddle/PP-OCRv5_mobile_det

No description available.

🖼️ image-to-text 53,940

OleehyO

OleehyO/TexTeller

No description available.

🖼️ image-to-text 49,158

hezarai

hezarai/crnn-base-fa-v2

A CRNN model for Persian OCR. This model is based on a simple CNN + LSTM architecture inspired by this paper....

🖼️ image-to-text 32,961

optimum-intel-internal-testing

optimum-intel-internal-testing/pix2struct-tiny-random

No description available.

🖼️ image-to-text 31,611

microsoft

microsoft/trocr-small-printed

The TrOCR model is an encoder-decoder model, consisting of an image Transformer as encoder, and a text Transformer as decoder. The image enc...

🖼️ image-to-text 29,667

laion

laion/mscoco_finetuned_CoCa-ViT-L-14-laion2B-s13B-b90k

No description available.

🖼️ image-to-text 22,089

Riksarkivet

Riksarkivet/trocr-base-handwritten-hist-swe-2

No description available.

🖼️ image-to-text 17,063

microsoft

microsoft/trocr-base-stage1

The TrOCR model is an encoder-decoder model, consisting of an image Transformer as encoder, and a text Transformer as decoder. The image enc...

🖼️ image-to-text 15,995

PaddlePaddle

PaddlePaddle/RT-DETR-L_wired_table_cell_det

No description available.

🖼️ image-to-text 14,820

microsoft

microsoft/trocr-small-handwritten

The TrOCR model is an encoder-decoder model, consisting of an image Transformer as encoder, and a text Transformer as decoder. The image enc...

🖼️ image-to-text 14,054

PaddlePaddle

PaddlePaddle/PP-DocLayout_plus-L

No description available.

🖼️ image-to-text 13,910

PaddlePaddle

PaddlePaddle/RT-DETR-L_wireless_table_cell_det

No description available.

🖼️ image-to-text 13,543

optimum-intel-internal-testing

optimum-intel-internal-testing/trocr-small-handwritten

The TrOCR model is an encoder-decoder model, consisting of an image Transformer as encoder, and a text Transformer as decoder. The image enc...

🖼️ image-to-text 12,264

zhiyuanyou

zhiyuanyou/DeQA-Score-Mix3

No description available.

🖼️ image-to-text 12,105

PaddlePaddle

PaddlePaddle/PP-OCRv5_mobile_rec

No description available.

🖼️ image-to-text 11,546

microsoft

microsoft/git-base

GIT is a Transformer decoder conditioned on both CLIP image tokens and text tokens. The model is trained using "teacher forcing" on a lot of...

🖼️ image-to-text 10,910

PaddlePaddle

PaddlePaddle/PP-LCNet_x1_0_table_cls

No description available.

🖼️ image-to-text 10,193

microsoft

microsoft/trocr-small-stage1

The TrOCR model is an encoder-decoder model, consisting of an image Transformer as encoder, and a text Transformer as decoder. The image enc...

🖼️ image-to-text 10,045

PaddlePaddle

PaddlePaddle/PP-DocBlockLayout

No description available.

🖼️ image-to-text 9,442

google

google/pix2struct-textcaps-base

No description available.

🖼️ image-to-text 8,410

IAMJB

IAMJB/chexpert-mimic-cxr-findings-baseline

Evaluation on chexpert-plus...

🖼️ image-to-text 8,227

PaddlePaddle

PaddlePaddle/SLANeXt_wired

No description available.

🖼️ image-to-text 8,012

IAMJB

IAMJB/chexpert-mimic-cxr-impression-baseline

Evaluation on chexpert-plus...

🖼️ image-to-text 7,765

PaddlePaddle

PaddlePaddle/SLANet_plus

No description available.

🖼️ image-to-text 7,550

PaddlePaddle

PaddlePaddle/PP-FormulaNet_plus-L

No description available.

🖼️ image-to-text 7,362

fxmarty

fxmarty/pix2struct-tiny-random

No description available.

🖼️ image-to-text 7,030

PaddlePaddle

PaddlePaddle/PP-Chart2Table

No description available.

🖼️ image-to-text 6,143

Xenova

Xenova/vit-gpt2-image-captioning

No description available.

🖼️ image-to-text 5,849

microsoft

microsoft/git-large-coco

GIT is a Transformer decoder conditioned on both CLIP image tokens and text tokens. The model is trained using "teacher forcing" on a lot of...

🖼️ image-to-text 5,835

mradermacher

mradermacher/Qwen2.5-VL-7B-Abliterated-Caption-it-GGUF

No description available.

🖼️ image-to-text 5,249

PaddlePaddle

PaddlePaddle/korean_PP-OCRv5_mobile_rec

No description available.

🖼️ image-to-text 5,199

nyu-visionx

nyu-visionx/Cambrian-S-7B

Architecture: Qwen2.5-7B-Instruct + SigLIP2-SO400M vision encoder + 2-layer MLP adapter - Parameters: 7B - Vision Encoder: SigLIP-384 (SiGLI...

🖼️ image-to-text 5,157

noctrex

noctrex/LightOnOCR-2-1B-GGUF

These are quantizations of the model LightOnOCR-2-1B...

🖼️ image-to-text 5,043

PaddlePaddle

PaddlePaddle/PP-OCRv4_mobile_det

No description available.

🖼️ image-to-text 4,956

mradermacher

mradermacher/Hulu-Med-30A3-i1-GGUF

No description available.

🖼️ image-to-text 4,588

mrrtmob

mrrtmob/kiri-ocr

No description available.

🖼️ image-to-text 4,527

PaddlePaddle

PaddlePaddle/eslav_PP-OCRv5_mobile_rec

No description available.

🖼️ image-to-text 4,488

facebook

facebook/nougat-small

Nougat is a Donut model trained to transcribe scientific PDFs into an easy-to-use markdown format. The model consists of a Swin Transformer ...

🖼️ image-to-text 4,394

ydshieh

ydshieh/vit-gpt2-coco-en

No description available.

🖼️ image-to-text 4,304

FireRedTeam

FireRedTeam/FireRed-OCR

No description available.

🖼️ image-to-text 4,178

naver-clova-ix

naver-clova-ix/donut-base-finetuned-rvlcdip

Donut consists of a vision encoder (Swin Transformer) and a text decoder (BART). Given an image, the encoder first encodes the image into a ...

🖼️ image-to-text 4,101

PaddlePaddle

PaddlePaddle/PP-LCNet_x0_25_textline_ori

No description available.

🖼️ image-to-text 3,993

mlx-community

mlx-community/GLM-OCR-bf16

No description available.

🖼️ image-to-text 3,947

microsoft

microsoft/git-base-coco

GIT is a Transformer decoder conditioned on both CLIP image tokens and text tokens. The model is trained using "teacher forcing" on a lot of...

🖼️ image-to-text 3,752

PaddlePaddle

PaddlePaddle/PP-OCRv3_mobile_det

No description available.

🖼️ image-to-text 3,678

PaddlePaddle

PaddlePaddle/en_PP-OCRv4_mobile_rec

No description available.

🖼️ image-to-text 3,490

breezedeus

breezedeus/pix2text-mfd

No description available.

🖼️ image-to-text 3,453

noctrex

noctrex/Chandra-OCR-GGUF

These are quantizations of the model Chandra-OCR...

🖼️ image-to-text 3,392

xtuner

xtuner/llava-phi-3-mini-gguf

No description available.

🖼️ image-to-text 3,325

mradermacher

mradermacher/Qwen3-VL-8B-Abliterated-Caption-it-i1-GGUF

No description available.

🖼️ image-to-text 3,039

google

google/pix2struct-base

No description available.

🖼️ image-to-text 2,992

Norm

Norm/nougat-latex-base

No description available.

🖼️ image-to-text 2,983

noctrex

noctrex/PaddleOCR-VL-1.5-GGUF

These are quantizations of the model PaddleOCR-VL-1.5...

🖼️ image-to-text 2,552

sbintuitions

sbintuitions/sarashina2.2-vision-3b

No description available.

🖼️ image-to-text 2,451

unsloth

unsloth/GLM-OCR

No description available.

🖼️ image-to-text 2,412

kazars24

kazars24/trocr-base-handwritten-ru

No description available.

🖼️ image-to-text 2,362

PaddlePaddle

PaddlePaddle/PP-OCRv4_mobile_rec

No description available.

🖼️ image-to-text 2,326

PaddlePaddle

PaddlePaddle/arabic_PP-OCRv5_mobile_rec

No description available.

🖼️ image-to-text 2,316

PaddlePaddle

PaddlePaddle/en_PP-OCRv3_mobile_rec

No description available.

🖼️ image-to-text 2,164

openthaigpt

openthaigpt/thai-trocr

No description available.

🖼️ image-to-text 2,122

thwri

thwri/CogFlorence-2.2-Large

No description available.

🖼️ image-to-text 2,085

to-be

to-be/donut-base-finetuned-invoices

Donut consists of a vision encoder (Swin Transformer) and a text decoder (BART). Given an image, the encoder first encodes the image into a ...

🖼️ image-to-text 1,932

PaddlePaddle

PaddlePaddle/PP-DocLayout-L

No description available.

🖼️ image-to-text 1,862

unography

unography/blip-large-long-cap

No description available.

🖼️ image-to-text 1,859

PaddlePaddle

PaddlePaddle/ta_PP-OCRv5_mobile_rec

No description available.

🖼️ image-to-text 1,809

noctrex

noctrex/ZwZ-8B-GGUF

These are quantizations of the model ZwZ-8B, using an imatrix created from text\en\medium...

🖼️ image-to-text 1,769

logasanjeev

logasanjeev/indian-id-validator

Below is a detailed breakdown of each model, including the classes they detect and their evaluation metrics on a custom Indian ID dataset. |...

🖼️ image-to-text 1,746

zhangzicheng

zhangzicheng/q-sit-mini

No description available.

🖼️ image-to-text 1,556

mradermacher

mradermacher/Qwen3-VL-8B-Abliterated-Caption-it-GGUF

No description available.

🖼️ image-to-text 1,556

mradermacher

mradermacher/QwenStoryteller-i1-GGUF

No description available.

🖼️ image-to-text 1,552

breezedeus

breezedeus/pix2text-mfr-1.5

This MFR model utilizes the TrOCR architecture developed by Microsoft, starting with its initial values and retrained using a dataset of mat...

🖼️ image-to-text 1,523
