Nirman.online | Premium AI Directory

Salesforce

Salesforce/blip-vqa-base

No description available.

❓ visual-question-answering 696,508

dandelin

dandelin/vilt-b32-finetuned-vqa

No description available.

❓ visual-question-answering 69,353

DAMO-NLP-SG

DAMO-NLP-SG/VideoLLaMA2.1-7B-AV

No description available.

❓ visual-question-answering 44,456

Salesforce

Salesforce/blip-vqa-capfilt-large

No description available.

❓ visual-question-answering 23,834

google

google/deplot

No description available.

❓ visual-question-answering 12,715

TIGER-Lab

TIGER-Lab/VideoScore2

No description available.

❓ visual-question-answering 10,393

DAMO-NLP-SG

DAMO-NLP-SG/VideoLLaMA2-7B

No description available.

❓ visual-question-answering 8,985

chaoyinshe

chaoyinshe/llava-med-v1.5-mistral-7b-hf

No description available.

❓ visual-question-answering 5,958

openbmb

openbmb/MiniCPM-V-2

No description available.

❓ visual-question-answering 3,745

internlm

internlm/internlm-xcomposer2d5-7b

No description available.

❓ visual-question-answering 2,200

google

google/pix2struct-docvqa-base

No description available.

❓ visual-question-answering 1,844

TIGER-Lab

TIGER-Lab/VideoScore

No description available.

❓ visual-question-answering 1,549

internlm

internlm/internlm-xcomposer2-vl-7b

No description available.

❓ visual-question-answering 1,538

internlm

internlm/internlm-xcomposer2-4khd-7b

No description available.

❓ visual-question-answering 1,537

google

google/pix2struct-ai2d-base

No description available.

❓ visual-question-answering 1,343

openbmb

openbmb/MiniCPM-V

No description available.

❓ visual-question-answering 1,175

google

google/pix2struct-chartqa-base

No description available.

❓ visual-question-answering 861

second-state

second-state/MiniCPM-V-4_5-GGUF

openbmb/MiniCPM-V-4_5 - LlamaEdge version: coming soon - Prompt template - Prompt type: `minicpmv` - Prompt string: {system_message} {u...

❓ visual-question-answering 696

openbmb

openbmb/MiniCPM-Llama3-V-2_5-int4

No description available.

❓ visual-question-answering 676

microsoft

microsoft/git-base-textvqa

GIT is a Transformer decoder conditioned on both CLIP image tokens and text tokens. The model is trained using "teacher forcing" on a lot of...

❓ visual-question-answering 611
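The GIT description above mentions training with "teacher forcing". The sketch below is a rough illustration of that idea, not GIT's actual training code. It assumes a decoder input laid out as image tokens followed by text tokens and scores every text position against the gold next token given the gold prefix, never against the model's own earlier output. The layout, the `IGNORE` label value (borrowed from the default `ignore_index` of PyTorch's `CrossEntropyLoss`), and the helper names `teacher_forcing_labels` and `teacher_forced_nll` are all hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical label for "no loss at this position"; -100 mirrors the default
# ignore_index of PyTorch's CrossEntropyLoss.
IGNORE = -100


def teacher_forcing_labels(num_image_tokens: int, text_ids: Sequence[int]) -> List[int]:
    """One label per decoder position for a sequence laid out as
    [image tokens | text tokens] (an assumed layout).

    Position p is trained to predict the gold token at p + 1. Positions whose
    next slot is another image token, or lies past the end, get IGNORE.
    """
    seq_len = num_image_tokens + len(text_ids)
    labels: List[int] = []
    for pos in range(seq_len):
        nxt = pos + 1
        if num_image_tokens <= nxt < seq_len:
            labels.append(text_ids[nxt - num_image_tokens])
        else:
            labels.append(IGNORE)
    return labels


def teacher_forced_nll(
    text_ids: Sequence[int],
    next_token_probs: Callable[[List[int]], Dict[int, float]],
) -> float:
    """Sum of -log p(gold token t | image, gold tokens before t).

    next_token_probs is a stand-in for the model: given the text prefix (the
    image conditioning is implicit), it returns a probability per token id.
    The prefix is always the ground truth, never the model's own prediction.
    """
    total = 0.0
    for t in range(len(text_ids)):
        probs = next_token_probs(list(text_ids[:t]))
        total -= math.log(probs[text_ids[t]])
    return total
```

For example, `teacher_forcing_labels(2, [5, 6, 7])` returns `[-100, 5, 6, 7, -100]`: the first image position has only another image token to predict, the last image position predicts the first text token, and the final position has nothing left to predict. Because every prefix is known in advance, a real Transformer with a causal mask can score all of these positions in a single forward pass.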

zenlm

zenlm/zen-designer-235b-a22b-thinking

No description available.

❓ visual-question-answering 583

zenlm

zenlm/zen-designer-235b-a22b-instruct

No description available.

❓ visual-question-answering 581

mPLUG

mPLUG/mPLUG-Owl3-2B-241014

No description available.

❓ visual-question-answering 577

TIGER-Lab

TIGER-Lab/VL-Rethinker-72B

No description available.

❓ visual-question-answering 575

DAMO-NLP-SG

DAMO-NLP-SG/VideoLLaMA3-7B-Image

No description available.

❓ visual-question-answering 497

mradermacher

mradermacher/MemOCR-7B-GGUF

No description available.

❓ visual-question-answering 465

Lin-Chen

Lin-Chen/sharegpt4video-8b

Model type: sharegpt4video-8b is an open-source video chatbot trained by fine-tuning the entire model on open-source video instruction data....

❓ visual-question-answering 396

RussRobin

RussRobin/SpatialBot-3B

No description available.

❓ visual-question-answering 391

sdasd112132

sdasd112132/Vision-8B-MiniCPM-2_5-Uncensored-and-Detailed-4bit

No description available.

❓ visual-question-answering 388

GeorgyGUF

GeorgyGUF/INFRL-Qwen2.5-VL-72B-Preview-ggufs-fully-quantized

INFRL-Qwen2.5-VL-72B-Preview improves visual reasoning upon Qwen2.5-VL-72B-Instruct model. - As of March 25th, 2025, INFRL-Qwen2.5-VL-72B-Pr...

❓ visual-question-answering 365

google

google/matcha-chartqa

No description available.

❓ visual-question-answering 360

Cylingo

Cylingo/Xinyuan-VL-2B

No description available.

❓ visual-question-answering 345

second-state

second-state/MiniCPM-Llama3-V-2_5-GGUF

openbmb/MiniCPM-Llama3-V-2_5 - LlamaEdge version: coming soon...

❓ visual-question-answering 328

gaianet

gaianet/MiniCPM-V-4_5-GGUF

No description available.

❓ visual-question-answering 319

AI-Safeguard

AI-Safeguard/Ivy-VL-llava

Ivy-VL is a lightweight multimodal model with only 3B parameters. It accepts both image and text inputs to generate text outputs....

❓ visual-question-answering 318

erax-ai

erax-ai/EraX-VL-7B-V1.5

Languages: vi, en, zh. Base model: Qwen/Qwen2-VL-7B-Instruct. Library: transformers. Tags: erax, multimodal, erax-vl-7B, insurance, ocr, vietnam...

❓ visual-question-answering 316

second-state

second-state/MiniCPM-V-2_6-GGUF

No description available.

❓ visual-question-answering 313

second-state

second-state/MiniCPM-V-4-GGUF

No description available.

❓ visual-question-answering 295

ybelkada

ybelkada/blip2-opt-2.7b-fp16-sharded

No description available.

❓ visual-question-answering 282

BAAI

BAAI/Aquila-VL-2B-llava-qwen

No description available.

❓ visual-question-answering 274

mPLUG

mPLUG/mPLUG-Owl3-7B-241101

No description available.

❓ visual-question-answering 252

ivelin

ivelin/donut-refexp-combined-v1

No description available.

❓ visual-question-answering 248

google

google/pix2struct-docvqa-large

No description available.

❓ visual-question-answering 240

google

google/matcha-base

No description available.

❓ visual-question-answering 214

google

google/matcha-plotqa-v2

No description available.

❓ visual-question-answering 196

microsoft

microsoft/git-base-vqav2

GIT is a Transformer decoder conditioned on both CLIP image tokens and text tokens. The model is trained using "teacher forcing" on a lot of...

❓ visual-question-answering 192

erax-ai

erax-ai/EraX-VL-2B-V1.5

Languages: vi, en, zh. Base model: Qwen/Qwen2-VL-2B-Instruct. Library: transformers. Tags: erax, multimodal, erax-vl-2B, insurance, ocr, vietnam...

❓ visual-question-answering 176

mradermacher

mradermacher/TreeVGR-7B-CI-i1-GGUF

No description available.

❓ visual-question-answering 167

openbmb

openbmb/OmniLMM-12B

No description available.

❓ visual-question-answering 167

gaianet

gaianet/MiniCPM-Llama3-V-2_5-GGUF

No description available.

❓ visual-question-answering 167

google

google/matcha-chart2text-statista

No description available.

❓ visual-question-answering 141
