Results for "visual-question-answering"

51 matches found.

Salesforce

Salesforce/blip-vqa-base

No description available.

❓ visual-question-answering 696,508
dandelin

dandelin/vilt-b32-finetuned-vqa

No description available.

❓ visual-question-answering 69,353
DAMO-NLP-SG

DAMO-NLP-SG/VideoLLaMA2.1-7B-AV

No description available.

❓ visual-question-answering 44,456
Salesforce

Salesforce/blip-vqa-capfilt-large

No description available.

❓ visual-question-answering 23,834
google

google/deplot

No description available.

❓ visual-question-answering 12,715
TIGER-Lab

TIGER-Lab/VideoScore2

No description available.

❓ visual-question-answering 10,393
DAMO-NLP-SG

DAMO-NLP-SG/VideoLLaMA2-7B

No description available.

❓ visual-question-answering 8,985
chaoyinshe

chaoyinshe/llava-med-v1.5-mistral-7b-hf

No description available.

❓ visual-question-answering 5,958
openbmb

openbmb/MiniCPM-V-2

No description available.

❓ visual-question-answering 3,745
internlm

internlm/internlm-xcomposer2d5-7b

No description available.

❓ visual-question-answering 2,200
google

google/pix2struct-docvqa-base

No description available.

❓ visual-question-answering 1,844
TIGER-Lab

TIGER-Lab/VideoScore

No description available.

❓ visual-question-answering 1,549
internlm

internlm/internlm-xcomposer2-vl-7b

No description available.

❓ visual-question-answering 1,538
internlm

internlm/internlm-xcomposer2-4khd-7b

No description available.

❓ visual-question-answering 1,537
google

google/pix2struct-ai2d-base

No description available.

❓ visual-question-answering 1,343
openbmb

openbmb/MiniCPM-V

No description available.

❓ visual-question-answering 1,175
google

google/pix2struct-chartqa-base

No description available.

❓ visual-question-answering 861
second-state

second-state/MiniCPM-V-4_5-GGUF

openbmb/MiniCPM-V-45 - LlamaEdge version: coming soon - Prompt template - Prompt type: `minicpmv` - Prompt string: {systemmessage} {u...

❓ visual-question-answering 696
openbmb

openbmb/MiniCPM-Llama3-V-2_5-int4

No description available.

❓ visual-question-answering 676
microsoft

microsoft/git-base-textvqa

GIT is a Transformer decoder conditioned on both CLIP image tokens and text tokens. The model is trained using "teacher forcing" on a lot of...

❓ visual-question-answering 611
zenlm

zenlm/zen-designer-235b-a22b-thinking

No description available.

❓ visual-question-answering 583
zenlm

zenlm/zen-designer-235b-a22b-instruct

No description available.

❓ visual-question-answering 581
mPLUG

mPLUG/mPLUG-Owl3-2B-241014

No description available.

❓ visual-question-answering 577
TIGER-Lab

TIGER-Lab/VL-Rethinker-72B

No description available.

❓ visual-question-answering 575
DAMO-NLP-SG

DAMO-NLP-SG/VideoLLaMA3-7B-Image

No description available.

❓ visual-question-answering 497
mradermacher

mradermacher/MemOCR-7B-GGUF

No description available.

❓ visual-question-answering 465
Lin-Chen

Lin-Chen/sharegpt4video-8b

Model type: sharegpt4video-8b is an open-source video chatbot trained by fine-tuning the entire model on open-source video instruction data....

❓ visual-question-answering 396
RussRobin

RussRobin/SpatialBot-3B

No description available.

❓ visual-question-answering 391
sdasd112132

sdasd112132/Vision-8B-MiniCPM-2_5-Uncensored-and-Detailed-4bit

No description available.

❓ visual-question-answering 388
GeorgyGUF

GeorgyGUF/INFRL-Qwen2.5-VL-72B-Preview-ggufs-fully-quantized

INFRL-Qwen2.5-VL-72B-Preview improves visual reasoning over the Qwen2.5-VL-72B-Instruct model. - As of March 25th, 2025, INFRL-Qwen2.5-VL-72B-Pr...

❓ visual-question-answering 365
google

google/matcha-chartqa

No description available.

❓ visual-question-answering 360
Cylingo

Cylingo/Xinyuan-VL-2B

No description available.

❓ visual-question-answering 345
second-state

second-state/MiniCPM-Llama3-V-2_5-GGUF

openbmb/MiniCPM-Llama3-V-25 - LlamaEdge version: coming soon...

❓ visual-question-answering 328
gaianet

gaianet/MiniCPM-V-4_5-GGUF

No description available.

❓ visual-question-answering 319
AI-Safeguard

AI-Safeguard/Ivy-VL-llava

Ivy-VL is a lightweight multimodal model with only 3B parameters. It accepts both image and text inputs to generate text outputs....

❓ visual-question-answering 318
erax-ai

erax-ai/EraX-VL-7B-V1.5

Languages: vi, en, zh. Base model: Qwen/Qwen2-VL-7B-Instruct. Library: transformers. Tags: erax, multimodal, erax-vl-7B, insurance, ocr, vietnam...

❓ visual-question-answering 316
second-state

second-state/MiniCPM-V-2_6-GGUF

No description available.

❓ visual-question-answering 313
second-state

second-state/MiniCPM-V-4-GGUF

No description available.

❓ visual-question-answering 295
ybelkada

ybelkada/blip2-opt-2.7b-fp16-sharded

No description available.

❓ visual-question-answering 282
BAAI

BAAI/Aquila-VL-2B-llava-qwen

No description available.

❓ visual-question-answering 274
mPLUG

mPLUG/mPLUG-Owl3-7B-241101

No description available.

❓ visual-question-answering 252
ivelin

ivelin/donut-refexp-combined-v1

No description available.

❓ visual-question-answering 248
google

google/pix2struct-docvqa-large

No description available.

❓ visual-question-answering 240
google

google/matcha-base

No description available.

❓ visual-question-answering 214
google

google/matcha-plotqa-v2

No description available.

❓ visual-question-answering 196
microsoft

microsoft/git-base-vqav2

GIT is a Transformer decoder conditioned on both CLIP image tokens and text tokens. The model is trained using "teacher forcing" on a lot of...

❓ visual-question-answering 192
erax-ai

erax-ai/EraX-VL-2B-V1.5

Languages: vi, en, zh. Base model: Qwen/Qwen2-VL-2B-Instruct. Library: transformers. Tags: erax, multimodal, erax-vl-2B, insurance, ocr, vietnam...

❓ visual-question-answering 176
mradermacher

mradermacher/TreeVGR-7B-CI-i1-GGUF

No description available.

❓ visual-question-answering 167
openbmb

openbmb/OmniLMM-12B

No description available.

❓ visual-question-answering 167
gaianet

gaianet/MiniCPM-Llama3-V-2_5-GGUF

No description available.

❓ visual-question-answering 167
google

google/matcha-chart2text-statista

No description available.

❓ visual-question-answering 141