Results for "visual-question-answering"
51 matches found.
Salesforce/blip-vqa-base
No description available.
dandelin/vilt-b32-finetuned-vqa
No description available.
DAMO-NLP-SG/VideoLLaMA2.1-7B-AV
No description available.
Salesforce/blip-vqa-capfilt-large
No description available.
google/deplot
No description available.
TIGER-Lab/VideoScore2
No description available.
DAMO-NLP-SG/VideoLLaMA2-7B
No description available.
chaoyinshe/llava-med-v1.5-mistral-7b-hf
No description available.
openbmb/MiniCPM-V-2
No description available.
internlm/internlm-xcomposer2d5-7b
No description available.
google/pix2struct-docvqa-base
No description available.
TIGER-Lab/VideoScore
No description available.
internlm/internlm-xcomposer2-vl-7b
No description available.
internlm/internlm-xcomposer2-4khd-7b
No description available.
google/pix2struct-ai2d-base
No description available.
openbmb/MiniCPM-V
No description available.
google/pix2struct-chartqa-base
No description available.
second-state/MiniCPM-V-4_5-GGUF
openbmb/MiniCPM-V-4_5 - LlamaEdge version: coming soon - Prompt template - Prompt type: `minicpmv` - Prompt string: `{system_message} {u...`
openbmb/MiniCPM-Llama3-V-2_5-int4
No description available.
microsoft/git-base-textvqa
GIT is a Transformer decoder conditioned on both CLIP image tokens and text tokens. The model is trained using "teacher forcing" on a lot of...
zenlm/zen-designer-235b-a22b-thinking
No description available.
zenlm/zen-designer-235b-a22b-instruct
No description available.
mPLUG/mPLUG-Owl3-2B-241014
No description available.
TIGER-Lab/VL-Rethinker-72B
No description available.
DAMO-NLP-SG/VideoLLaMA3-7B-Image
No description available.
mradermacher/MemOCR-7B-GGUF
No description available.
Lin-Chen/sharegpt4video-8b
Model type: sharegpt4video-8b is an open-source video chatbot trained by fine-tuning the entire model on open-source video instruction data....
RussRobin/SpatialBot-3B
No description available.
sdasd112132/Vision-8B-MiniCPM-2_5-Uncensored-and-Detailed-4bit
No description available.
GeorgyGUF/INFRL-Qwen2.5-VL-72B-Preview-ggufs-fully-quantized
INFRL-Qwen2.5-VL-72B-Preview improves on the visual reasoning of the Qwen2.5-VL-72B-Instruct model. - As of March 25th, 2025, INFRL-Qwen2.5-VL-72B-Pr...
google/matcha-chartqa
No description available.
Cylingo/Xinyuan-VL-2B
No description available.
second-state/MiniCPM-Llama3-V-2_5-GGUF
openbmb/MiniCPM-Llama3-V-2_5 - LlamaEdge version: coming soon...
gaianet/MiniCPM-V-4_5-GGUF
No description available.
AI-Safeguard/Ivy-VL-llava
Ivy-VL is a lightweight multimodal model with only 3B parameters. It accepts both image and text inputs to generate text outputs....
erax-ai/EraX-VL-7B-V1.5
language: vi, en, zh; base_model: Qwen/Qwen2-VL-7B-Instruct; library_name: transformers; tags: erax, multimodal, erax-vl-7B, insurance, ocr, vietnam...
second-state/MiniCPM-V-2_6-GGUF
No description available.
second-state/MiniCPM-V-4-GGUF
No description available.
ybelkada/blip2-opt-2.7b-fp16-sharded
No description available.
BAAI/Aquila-VL-2B-llava-qwen
No description available.
mPLUG/mPLUG-Owl3-7B-241101
No description available.
ivelin/donut-refexp-combined-v1
No description available.
google/pix2struct-docvqa-large
No description available.
google/matcha-base
No description available.
google/matcha-plotqa-v2
No description available.
microsoft/git-base-vqav2
GIT is a Transformer decoder conditioned on both CLIP image tokens and text tokens. The model is trained using "teacher forcing" on a lot of...
erax-ai/EraX-VL-2B-V1.5
language: vi, en, zh; base_model: Qwen/Qwen2-VL-2B-Instruct; library_name: transformers; tags: erax, multimodal, erax-vl-2B, insurance, ocr, vietnam...
mradermacher/TreeVGR-7B-CI-i1-GGUF
No description available.
openbmb/OmniLMM-12B
No description available.
gaianet/MiniCPM-Llama3-V-2_5-GGUF
No description available.
google/matcha-chart2text-statista
No description available.