Results for "image-text-to-text"

100 matches found.

Qwen

Qwen/Qwen3-VL-2B-Instruct

No description available.

📖 image-text-to-text 13,328,669
Qwen

Qwen/Qwen3-VL-8B-Instruct

No description available.

📖 image-text-to-text 8,019,083
Qwen

Qwen/Qwen2.5-VL-7B-Instruct

No description available.

📖 image-text-to-text 4,086,666
vikhyatk

vikhyatk/moondream2

No description available.

📖 image-text-to-text 3,973,800
deepseek-ai

deepseek-ai/DeepSeek-OCR

No description available.

📖 image-text-to-text 3,589,752
Qwen

Qwen/Qwen2-VL-2B-Instruct

No description available.

📖 image-text-to-text 3,446,907
llava-hf

llava-hf/llava-1.5-7b-hf

Model type: LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. It i...

📖 image-text-to-text 3,202,279
Qwen

Qwen/Qwen2.5-VL-3B-Instruct

No description available.

📖 image-text-to-text 2,722,310
google

google/gemma-3-4b-it

No description available.

📖 image-text-to-text 2,147,597
Qwen

Qwen/Qwen3-VL-30B-A3B-Instruct

No description available.

📖 image-text-to-text 2,064,489
moonshotai

moonshotai/Kimi-K2.5

No description available.

📖 image-text-to-text 1,936,885
Qwen

Qwen/Qwen3-VL-235B-A22B-Thinking

No description available.

📖 image-text-to-text 1,818,591
deepseek-ai

deepseek-ai/DeepSeek-OCR-2

No description available.

📖 image-text-to-text 1,640,793
Qwen

Qwen/Qwen2-VL-7B-Instruct

No description available.

📖 image-text-to-text 1,634,324
google

google/gemma-3-27b-it

No description available.

📖 image-text-to-text 1,493,679
Qwen

Qwen/Qwen3-VL-4B-Instruct

No description available.

📖 image-text-to-text 1,482,594
Qwen

Qwen/Qwen3.5-397B-A17B

Type: Causal Language Model with Vision Encoder - Training Stage: Pre-training & Post-training - Language Model - Number of Parameters: 397B...

📖 image-text-to-text 1,338,447
google

google/gemma-3-12b-it

No description available.

📖 image-text-to-text 1,334,645
microsoft

microsoft/Florence-2-large

No description available.

📖 image-text-to-text 1,172,282
Qwen

Qwen/Qwen2-VL-7B-Instruct-AWQ

No description available.

📖 image-text-to-text 1,115,503
OpenGVLab

OpenGVLab/InternVL2-2B

InternVL 2.0 is a multimodal large language model series, featuring models of various sizes. For each size, we release instruction-tuned mod...

📖 image-text-to-text 1,081,198
nvidia

nvidia/Llama-3.1-Nemotron-Nano-VL-8B-V1

No description available.

📖 image-text-to-text 1,035,149
Qwen

Qwen/Qwen2.5-VL-7B-Instruct-AWQ

No description available.

📖 image-text-to-text 1,001,731
Qwen

Qwen/Qwen3.5-35B-A3B

Type: Causal Language Model with Vision Encoder - Training Stage: Pre-training & Post-training - Language Model - Number of Parameters: 35B ...

📖 image-text-to-text 885,293
liuhaotian

liuhaotian/llava-v1.5-7b

Model type: LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. It i...

📖 image-text-to-text 874,844
unsloth

unsloth/Qwen3.5-35B-A3B-GGUF

Type: Causal Language Model with Vision Encoder - Training Stage: Pre-training & Post-training - Language Model - Number of Parameters: 35B ...

📖 image-text-to-text 792,060
OpenGVLab

OpenGVLab/InternVL3_5-GPT-OSS-20B-A4B-Preview-HF

No description available.

📖 image-text-to-text 720,909
mlx-community

mlx-community/gemma-3-4b-it-qat-4bit

No description available.

📖 image-text-to-text 689,742
llava-hf

llava-hf/llava-v1.6-mistral-7b-hf

LLaVA combines a pre-trained large language model with a pre-trained vision encoder for multimodal chatbot use cases. LLaVA 1.6 improves on ...

📖 image-text-to-text 634,728
microsoft

microsoft/Phi-3.5-vision-instruct

No description available.

📖 image-text-to-text 609,596
OpenGVLab

OpenGVLab/InternVL2-1B

InternVL 2.0 is a multimodal large language model series, featuring models of various sizes. For each size, we release instruction-tuned mod...

📖 image-text-to-text 572,531
google

google/translategemma-12b-it

No description available.

📖 image-text-to-text 560,424
llava-hf

llava-hf/llava-onevision-qwen2-0.5b-ov-hf

Model type: LLaVA-Onevision is an open-source multimodal LLM trained by fine-tuning Qwen2 on GPT-generated multimodal instruction-following ...

📖 image-text-to-text 552,843
Qwen

Qwen/Qwen3.5-35B-A3B-FP8

Type: Causal Language Model with Vision Encoder - Training Stage: Pre-training & Post-training - Language Model - Number of Parameters: 35B ...

📖 image-text-to-text 551,036
Qwen

Qwen/Qwen3-VL-32B-Instruct

No description available.

📖 image-text-to-text 538,222
tencent

tencent/HunyuanOCR

No description available.

📖 image-text-to-text 509,560
nvidia

nvidia/NVIDIA-Nemotron-Parse-v1.1

No description available.

📖 image-text-to-text 499,666
Salesforce

Salesforce/blip2-opt-2.7b

BLIP-2 consists of 3 models: a CLIP-like image encoder, a Querying Transformer (Q-Former) and a large language model. The authors initialize...

📖 image-text-to-text 472,964
Qwen

Qwen/Qwen3.5-27B

Type: Causal Language Model with Vision Encoder - Training Stage: Pre-training & Post-training - Language Model - Number of Parameters: 27B ...

📖 image-text-to-text 467,468
MBZUAI

MBZUAI/AIN

No description available.

📖 image-text-to-text 457,771
OpenGVLab

OpenGVLab/InternVL2-8B

InternVL 2.0 is a multimodal large language model series, featuring models of various sizes. For each size, we release instruction-tuned mod...

📖 image-text-to-text 454,131
nanonets

nanonets/Nanonets-OCR2-3B

No description available.

📖 image-text-to-text 425,601
trl-internal-testing

trl-internal-testing/tiny-Qwen2_5_VLForConditionalGeneration

No description available.

📖 image-text-to-text 409,887
deepseek-ai

deepseek-ai/deepseek-vl2-tiny

No description available.

📖 image-text-to-text 397,671
Qwen

Qwen/Qwen3-VL-32B-Instruct-FP8

No description available.

📖 image-text-to-text 397,060
Qwen

Qwen/Qwen3-VL-8B-Instruct-FP8

No description available.

📖 image-text-to-text 366,829
unsloth

unsloth/Qwen3.5-27B-GGUF

Type: Causal Language Model with Vision Encoder - Training Stage: Pre-training & Post-training - Language Model - Number of Parameters: 27B ...

📖 image-text-to-text 352,050
zai-org

zai-org/GLM-4.1V-9B-Thinking

View the GLM-4.1V-9B-Thinking paper. Using GLM-4.1V-9B-Thinking API at Zhipu Foundation Model Open Platform. Vision-Language Models (VLMs...

📖 image-text-to-text 351,072
allenai

allenai/olmOCR-2-7B-1025

No description available.

📖 image-text-to-text 347,459
Qwen

Qwen/Qwen2.5-VL-72B-Instruct

No description available.

📖 image-text-to-text 343,079
Qwen

Qwen/Qwen3.5-9B

Type: Causal Language Model with Vision Encoder - Training Stage: Pre-training & Post-training - Language Model - Number of Parameters: 9B -...

📖 image-text-to-text 340,783
lmstudio-community

lmstudio-community/gemma-3-4b-it-GGUF

No description available.

📖 image-text-to-text 322,332
microsoft

microsoft/Florence-2-base

No description available.

📖 image-text-to-text 318,446
Qwen

Qwen/Qwen3-VL-235B-A22B-Instruct

No description available.

📖 image-text-to-text 306,862
Qwen

Qwen/Qwen3-VL-30B-A3B-Instruct-FP8

No description available.

📖 image-text-to-text 299,219
HuggingFaceTB

HuggingFaceTB/SmolVLM-256M-Instruct

No description available.

📖 image-text-to-text 297,182
google

google/gemma-3n-E2B-it

No description available.

📖 image-text-to-text 290,239
unsloth

unsloth/Qwen3.5-9B-GGUF

Type: Causal Language Model with Vision Encoder - Training Stage: Pre-training & Post-training - Language Model - Number of Parameters: 9B -...

📖 image-text-to-text 283,069
lightonai

lightonai/LightOnOCR-2-1B

No description available.

📖 image-text-to-text 282,423
lmstudio-community

lmstudio-community/GLM-4.6V-Flash-MLX-4bit

No description available.

📖 image-text-to-text 278,114
allenai

allenai/olmOCR-2-7B-1025-FP8

No description available.

📖 image-text-to-text 276,827
lmstudio-community

lmstudio-community/GLM-4.6V-Flash-MLX-8bit

No description available.

📖 image-text-to-text 273,483
lmstudio-community

lmstudio-community/GLM-4.6V-Flash-MLX-6bit

No description available.

📖 image-text-to-text 270,525
allenai

allenai/Molmo2-8B

No description available.

📖 image-text-to-text 267,024
meta-llama

meta-llama/Llama-3.2-11B-Vision-Instruct

Languages: en, de, fr, it, pt, hi, es, th. Library: transformers. Pipeline tag: image-text-to-text. Tags: facebook, meta, pytorch, llama, llam...

📖 image-text-to-text 257,551
unsloth

unsloth/Qwen3.5-122B-A10B-GGUF

Type: Causal Language Model with Vision Encoder - Training Stage: Pre-training & Post-training - Language Model - Number of Parameters: 122B...

📖 image-text-to-text 255,318
HuggingFaceTB

HuggingFaceTB/SmolVLM2-500M-Video-Instruct

No description available.

📖 image-text-to-text 253,742
OpenGVLab

OpenGVLab/InternVL3_5-14B

No description available.

📖 image-text-to-text 245,995
pytorch

pytorch/gemma-3-27b-it-AWQ-INT4

No description available.

📖 image-text-to-text 239,887
moonshotai

moonshotai/Kimi-VL-A3B-Instruct

No description available.

📖 image-text-to-text 238,613
Qwen

Qwen/Qwen3.5-397B-A17B-FP8

Type: Causal Language Model with Vision Encoder - Training Stage: Pre-training & Post-training - Language Model - Number of Parameters: 397B...

📖 image-text-to-text 236,306
lmstudio-community

lmstudio-community/Qwen3-VL-4B-Instruct-MLX-4bit

No description available.

📖 image-text-to-text 229,152
stelterlab

stelterlab/Mistral-Small-3.2-24B-Instruct-2506-FP8

No description available.

📖 image-text-to-text 228,868
lmstudio-community

lmstudio-community/Qwen3-VL-4B-Instruct-MLX-8bit

No description available.

📖 image-text-to-text 223,349
lmstudio-community

lmstudio-community/Qwen3-VL-4B-Instruct-MLX-6bit

No description available.

📖 image-text-to-text 222,664
lmstudio-community

lmstudio-community/Qwen3-VL-4B-Instruct-MLX-5bit

No description available.

📖 image-text-to-text 222,569
Qwen

Qwen/Qwen2.5-VL-32B-Instruct

No description available.

📖 image-text-to-text 222,493
abhishekchohan

abhishekchohan/gemma-3-12b-it-quantized-W4A16

No description available.

📖 image-text-to-text 220,205
Qwen

Qwen/Qwen3.5-27B-FP8

Type: Causal Language Model with Vision Encoder - Training Stage: Pre-training & Post-training - Language Model - Number of Parameters: 27B ...

📖 image-text-to-text 216,491
rednote-hilab

rednote-hilab/dots.ocr

No description available.

📖 image-text-to-text 214,509
nvidia

nvidia/Cosmos-Reason2-8B

No description available.

📖 image-text-to-text 213,697
Qwen

Qwen/Qwen2.5-VL-3B-Instruct-AWQ

No description available.

📖 image-text-to-text 212,714
cyankiwi

cyankiwi/Qwen3-VL-4B-Instruct-AWQ-4bit

No description available.

📖 image-text-to-text 209,238
meta-llama

meta-llama/Llama-4-Scout-17B-16E-Instruct

Library: transformers. Languages: ar, de, en, es, fr, hi, id, it, pt, th, tl, vi. Base model: meta-llama/Llama-4-Scout-17B-16E. Tags: facebo...

📖 image-text-to-text 205,474
unsloth

unsloth/Qwen3-VL-4B-Instruct-GGUF

Note: Includes Unsloth chat template fixes! See our Qwen3-VL collection for all versions including GGUF, 4-bit & 16-bit formats. Lea...

📖 image-text-to-text 204,366
ggml-org

ggml-org/gemma-3-12b-it-GGUF

No description available.

📖 image-text-to-text 197,066
lmstudio-community

lmstudio-community/Qwen3-VL-8B-Instruct-MLX-4bit

No description available.

📖 image-text-to-text 196,257
google

google/medgemma-4b-it

No description available.

📖 image-text-to-text 189,746
unsloth

unsloth/Pixtral-12B-2409-bnb-4bit

No description available.

📖 image-text-to-text 188,428
lmstudio-community

lmstudio-community/Qwen3-VL-8B-Instruct-MLX-8bit

No description available.

📖 image-text-to-text 187,718
Qwen

Qwen/Qwen3.5-0.8B

Type: Causal Language Model with Vision Encoder - Training Stage: Pre-training & Post-training - Language Model - Number of Parameters: 0.8B...

📖 image-text-to-text 187,548
Qwen

Qwen/Qwen3-VL-8B-Thinking

No description available.

📖 image-text-to-text 186,830
lmstudio-community

lmstudio-community/Qwen3-VL-8B-Instruct-MLX-6bit

No description available.

📖 image-text-to-text 185,370
lmstudio-community

lmstudio-community/Qwen3-VL-8B-Instruct-MLX-5bit

No description available.

📖 image-text-to-text 185,205
cyankiwi

cyankiwi/Qwen3.5-35B-A3B-AWQ-4bit

Type: Causal Language Model with Vision Encoder - Training Stage: Pre-training & Post-training - Language Model - Number of Parameters: 35B ...

📖 image-text-to-text 182,972
Qwen

Qwen/Qwen3-VL-2B-Instruct-FP8

No description available.

📖 image-text-to-text 181,761
Qwen

Qwen/Qwen3.5-122B-A10B

Type: Causal Language Model with Vision Encoder - Training Stage: Pre-training & Post-training - Language Model - Number of Parameters: 122B...

📖 image-text-to-text 175,976
Qwen

Qwen/Qwen3-VL-30B-A3B-Thinking

No description available.

📖 image-text-to-text 172,208
kakaocorp

kakaocorp/kanana-1.5-v-3b-instruct

Developed by: Unified Foundation Model (UFO) TF at Kakao - Language(s): en, ko - Model Architecture: kanana-1.5-v-3b-instruct has 3.6...

📖 image-text-to-text 169,382
unsloth

unsloth/gemma-3-27b-it-GGUF

See our collection for all versions of Gemma 3 including GGUF, 4-bit & 16-bit formats. Read our Guide to see how to Run Gemma 3 correctly....

📖 image-text-to-text 166,737