Nirman.online | Premium AI Directory

Qwen

Qwen/Qwen3-VL-2B-Instruct

No description available.

📖 image-text-to-text 13,328,669

Qwen

Qwen/Qwen3-VL-8B-Instruct

No description available.

📖 image-text-to-text 8,019,083

Qwen

Qwen/Qwen2.5-VL-7B-Instruct

No description available.

📖 image-text-to-text 4,086,666

vikhyatk

vikhyatk/moondream2

No description available.

📖 image-text-to-text 3,973,800

deepseek-ai

deepseek-ai/DeepSeek-OCR

No description available.

📖 image-text-to-text 3,589,752

Qwen

Qwen/Qwen2-VL-2B-Instruct

No description available.

📖 image-text-to-text 3,446,907

llava-hf

llava-hf/llava-1.5-7b-hf

Model type: LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. It i...

📖 image-text-to-text 3,202,279

Qwen

Qwen/Qwen2.5-VL-3B-Instruct

No description available.

📖 image-text-to-text 2,722,310

google

google/gemma-3-4b-it

No description available.

📖 image-text-to-text 2,147,597

Qwen

Qwen/Qwen3-VL-30B-A3B-Instruct

No description available.

📖 image-text-to-text 2,064,489

moonshotai

moonshotai/Kimi-K2.5

No description available.

📖 image-text-to-text 1,936,885

Qwen

Qwen/Qwen3-VL-235B-A22B-Thinking

No description available.

📖 image-text-to-text 1,818,591

deepseek-ai

deepseek-ai/DeepSeek-OCR-2

No description available.

📖 image-text-to-text 1,640,793

Qwen

Qwen/Qwen2-VL-7B-Instruct

No description available.

📖 image-text-to-text 1,634,324

google

google/gemma-3-27b-it

No description available.

📖 image-text-to-text 1,493,679

Qwen

Qwen/Qwen3-VL-4B-Instruct

No description available.

📖 image-text-to-text 1,482,594

Qwen

Qwen/Qwen3.5-397B-A17B

Type: Causal Language Model with Vision Encoder - Training Stage: Pre-training & Post-training - Language Model - Number of Parameters: 397B...

📖 image-text-to-text 1,338,447

google

google/gemma-3-12b-it

No description available.

📖 image-text-to-text 1,334,645

microsoft

microsoft/Florence-2-large

No description available.

📖 image-text-to-text 1,172,282

Qwen

Qwen/Qwen2-VL-7B-Instruct-AWQ

No description available.

📖 image-text-to-text 1,115,503

OpenGVLab

OpenGVLab/InternVL2-2B

InternVL 2.0 is a multimodal large language model series, featuring models of various sizes. For each size, we release instruction-tuned mod...

📖 image-text-to-text 1,081,198

nvidia

nvidia/Llama-3.1-Nemotron-Nano-VL-8B-V1

No description available.

📖 image-text-to-text 1,035,149

Qwen

Qwen/Qwen2.5-VL-7B-Instruct-AWQ

No description available.

📖 image-text-to-text 1,001,731

Qwen

Qwen/Qwen3.5-35B-A3B

Type: Causal Language Model with Vision Encoder - Training Stage: Pre-training & Post-training - Language Model - Number of Parameters: 35B ...

📖 image-text-to-text 885,293

liuhaotian

liuhaotian/llava-v1.5-7b

Model type: LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. It i...

📖 image-text-to-text 874,844

unsloth

unsloth/Qwen3.5-35B-A3B-GGUF

Type: Causal Language Model with Vision Encoder - Training Stage: Pre-training & Post-training - Language Model - Number of Parameters: 35B ...

📖 image-text-to-text 792,060

OpenGVLab

OpenGVLab/InternVL3_5-GPT-OSS-20B-A4B-Preview-HF

No description available.

📖 image-text-to-text 720,909

mlx-community

mlx-community/gemma-3-4b-it-qat-4bit

No description available.

📖 image-text-to-text 689,742

llava-hf

llava-hf/llava-v1.6-mistral-7b-hf

LLaVa combines a pre-trained large language model with a pre-trained vision encoder for multimodal chatbot use cases. LLaVA 1.6 improves on ...

📖 image-text-to-text 634,728

microsoft

microsoft/Phi-3.5-vision-instruct

No description available.

📖 image-text-to-text 609,596

OpenGVLab

OpenGVLab/InternVL2-1B

InternVL 2.0 is a multimodal large language model series, featuring models of various sizes. For each size, we release instruction-tuned mod...

📖 image-text-to-text 572,531

google

google/translategemma-12b-it

No description available.

📖 image-text-to-text 560,424

llava-hf

llava-hf/llava-onevision-qwen2-0.5b-ov-hf

Model type: LLaVA-Onevision is an open-source multimodal LLM trained by fine-tuning Qwen2 on GPT-generated multimodal instruction-following ...

📖 image-text-to-text 552,843

Qwen

Qwen/Qwen3.5-35B-A3B-FP8

Type: Causal Language Model with Vision Encoder - Training Stage: Pre-training & Post-training - Language Model - Number of Parameters: 35B ...

📖 image-text-to-text 551,036

Qwen

Qwen/Qwen3-VL-32B-Instruct

No description available.

📖 image-text-to-text 538,222

tencent

tencent/HunyuanOCR

No description available.

📖 image-text-to-text 509,560

nvidia

nvidia/NVIDIA-Nemotron-Parse-v1.1

No description available.

📖 image-text-to-text 499,666

Salesforce

Salesforce/blip2-opt-2.7b

BLIP-2 consists of 3 models: a CLIP-like image encoder, a Querying Transformer (Q-Former) and a large language model. The authors initialize...

📖 image-text-to-text 472,964

Qwen

Qwen/Qwen3.5-27B

Type: Causal Language Model with Vision Encoder - Training Stage: Pre-training & Post-training - Language Model - Number of Parameters: 27B ...

📖 image-text-to-text 467,468

MBZUAI

MBZUAI/AIN

No description available.

📖 image-text-to-text 457,771

OpenGVLab

OpenGVLab/InternVL2-8B

InternVL 2.0 is a multimodal large language model series, featuring models of various sizes. For each size, we release instruction-tuned mod...

📖 image-text-to-text 454,131

nanonets

nanonets/Nanonets-OCR2-3B

No description available.

📖 image-text-to-text 425,601

trl-internal-testing

trl-internal-testing/tiny-Qwen2_5_VLForConditionalGeneration

No description available.

📖 image-text-to-text 409,887

deepseek-ai

deepseek-ai/deepseek-vl2-tiny

No description available.

📖 image-text-to-text 397,671

Qwen

Qwen/Qwen3-VL-32B-Instruct-FP8

No description available.

📖 image-text-to-text 397,060

Qwen

Qwen/Qwen3-VL-8B-Instruct-FP8

No description available.

📖 image-text-to-text 366,829

unsloth

unsloth/Qwen3.5-27B-GGUF

Type: Causal Language Model with Vision Encoder - Training Stage: Pre-training & Post-training - Language Model - Number of Parameters: 27B ...

📖 image-text-to-text 352,050

zai-org

zai-org/GLM-4.1V-9B-Thinking

View the GLM-4.1V-9B-Thinking paper. Use the GLM-4.1V-9B-Thinking API at the Zhipu Foundation Model Open Platform. Vision-Language Models (VLMs...

📖 image-text-to-text 351,072

allenai

allenai/olmOCR-2-7B-1025

No description available.

📖 image-text-to-text 347,459

Qwen

Qwen/Qwen2.5-VL-72B-Instruct

No description available.

📖 image-text-to-text 343,079

Qwen

Qwen/Qwen3.5-9B

Type: Causal Language Model with Vision Encoder - Training Stage: Pre-training & Post-training - Language Model - Number of Parameters: 9B -...

📖 image-text-to-text 340,783

lmstudio-community

lmstudio-community/gemma-3-4b-it-GGUF

No description available.

📖 image-text-to-text 322,332

microsoft

microsoft/Florence-2-base

No description available.

📖 image-text-to-text 318,446

Qwen

Qwen/Qwen3-VL-235B-A22B-Instruct

No description available.

📖 image-text-to-text 306,862

Qwen

Qwen/Qwen3-VL-30B-A3B-Instruct-FP8

No description available.

📖 image-text-to-text 299,219

HuggingFaceTB

HuggingFaceTB/SmolVLM-256M-Instruct

No description available.

📖 image-text-to-text 297,182

google

google/gemma-3n-E2B-it

No description available.

📖 image-text-to-text 290,239

unsloth

unsloth/Qwen3.5-9B-GGUF

Type: Causal Language Model with Vision Encoder - Training Stage: Pre-training & Post-training - Language Model - Number of Parameters: 9B -...

📖 image-text-to-text 283,069

lightonai

lightonai/LightOnOCR-2-1B

No description available.

📖 image-text-to-text 282,423

lmstudio-community

lmstudio-community/GLM-4.6V-Flash-MLX-4bit

No description available.

📖 image-text-to-text 278,114

allenai

allenai/olmOCR-2-7B-1025-FP8

No description available.

📖 image-text-to-text 276,827

lmstudio-community

lmstudio-community/GLM-4.6V-Flash-MLX-8bit

No description available.

📖 image-text-to-text 273,483

lmstudio-community

lmstudio-community/GLM-4.6V-Flash-MLX-6bit

No description available.

📖 image-text-to-text 270,525

allenai

allenai/Molmo2-8B

No description available.

📖 image-text-to-text 267,024

meta-llama

meta-llama/Llama-3.2-11B-Vision-Instruct

Languages: en, de, fr, it, pt, hi, es, th. Library: transformers. Pipeline tag: image-text-to-text. Tags: facebook, meta, pytorch, llama, llam...

📖 image-text-to-text 257,551

unsloth

unsloth/Qwen3.5-122B-A10B-GGUF

Type: Causal Language Model with Vision Encoder - Training Stage: Pre-training & Post-training - Language Model - Number of Parameters: 122B...

📖 image-text-to-text 255,318

HuggingFaceTB

HuggingFaceTB/SmolVLM2-500M-Video-Instruct

No description available.

📖 image-text-to-text 253,742

OpenGVLab

OpenGVLab/InternVL3_5-14B

No description available.

📖 image-text-to-text 245,995

pytorch

pytorch/gemma-3-27b-it-AWQ-INT4

No description available.

📖 image-text-to-text 239,887

moonshotai

moonshotai/Kimi-VL-A3B-Instruct

No description available.

📖 image-text-to-text 238,613

Qwen

Qwen/Qwen3.5-397B-A17B-FP8

Type: Causal Language Model with Vision Encoder - Training Stage: Pre-training & Post-training - Language Model - Number of Parameters: 397B...

📖 image-text-to-text 236,306

lmstudio-community

lmstudio-community/Qwen3-VL-4B-Instruct-MLX-4bit

No description available.

📖 image-text-to-text 229,152

stelterlab

stelterlab/Mistral-Small-3.2-24B-Instruct-2506-FP8

No description available.

📖 image-text-to-text 228,868

lmstudio-community

lmstudio-community/Qwen3-VL-4B-Instruct-MLX-8bit

No description available.

📖 image-text-to-text 223,349

lmstudio-community

lmstudio-community/Qwen3-VL-4B-Instruct-MLX-6bit

No description available.

📖 image-text-to-text 222,664

lmstudio-community

lmstudio-community/Qwen3-VL-4B-Instruct-MLX-5bit

No description available.

📖 image-text-to-text 222,569

Qwen

Qwen/Qwen2.5-VL-32B-Instruct

No description available.

📖 image-text-to-text 222,493

abhishekchohan

abhishekchohan/gemma-3-12b-it-quantized-W4A16

No description available.

📖 image-text-to-text 220,205

Qwen

Qwen/Qwen3.5-27B-FP8

Type: Causal Language Model with Vision Encoder - Training Stage: Pre-training & Post-training - Language Model - Number of Parameters: 27B ...

📖 image-text-to-text 216,491

rednote-hilab

rednote-hilab/dots.ocr

No description available.

📖 image-text-to-text 214,509

nvidia

nvidia/Cosmos-Reason2-8B

No description available.

📖 image-text-to-text 213,697

Qwen

Qwen/Qwen2.5-VL-3B-Instruct-AWQ

No description available.

📖 image-text-to-text 212,714

cyankiwi

cyankiwi/Qwen3-VL-4B-Instruct-AWQ-4bit

No description available.

📖 image-text-to-text 209,238

meta-llama

meta-llama/Llama-4-Scout-17B-16E-Instruct

Library: transformers. Languages: ar, de, en, es, fr, hi, id, it, pt, th, tl, vi. Base model: meta-llama/Llama-4-Scout-17B-16E. Tags: facebo...

📖 image-text-to-text 205,474

unsloth

unsloth/Qwen3-VL-4B-Instruct-GGUF

Note: Includes Unsloth chat template fixes! See our Qwen3-VL collection for all versions including GGUF, 4-bit & 16-bit formats. Lea...

📖 image-text-to-text 204,366

ggml-org

ggml-org/gemma-3-12b-it-GGUF

No description available.

📖 image-text-to-text 197,066

lmstudio-community

lmstudio-community/Qwen3-VL-8B-Instruct-MLX-4bit

No description available.

📖 image-text-to-text 196,257

google

google/medgemma-4b-it

No description available.

📖 image-text-to-text 189,746

unsloth

unsloth/Pixtral-12B-2409-bnb-4bit

No description available.

📖 image-text-to-text 188,428

lmstudio-community

lmstudio-community/Qwen3-VL-8B-Instruct-MLX-8bit

No description available.

📖 image-text-to-text 187,718

Qwen

Qwen/Qwen3.5-0.8B

Type: Causal Language Model with Vision Encoder - Training Stage: Pre-training & Post-training - Language Model - Number of Parameters: 0.8B...

📖 image-text-to-text 187,548

Qwen

Qwen/Qwen3-VL-8B-Thinking

No description available.

📖 image-text-to-text 186,830

lmstudio-community

lmstudio-community/Qwen3-VL-8B-Instruct-MLX-6bit

No description available.

📖 image-text-to-text 185,370

lmstudio-community

lmstudio-community/Qwen3-VL-8B-Instruct-MLX-5bit

No description available.

📖 image-text-to-text 185,205

cyankiwi

cyankiwi/Qwen3.5-35B-A3B-AWQ-4bit

Type: Causal Language Model with Vision Encoder - Training Stage: Pre-training & Post-training - Language Model - Number of Parameters: 35B ...

📖 image-text-to-text 182,972

Qwen

Qwen/Qwen3-VL-2B-Instruct-FP8

No description available.

📖 image-text-to-text 181,761

Qwen

Qwen/Qwen3.5-122B-A10B

Type: Causal Language Model with Vision Encoder - Training Stage: Pre-training & Post-training - Language Model - Number of Parameters: 122B...

📖 image-text-to-text 175,976

Qwen

Qwen/Qwen3-VL-30B-A3B-Thinking

No description available.

📖 image-text-to-text 172,208

kakaocorp

kakaocorp/kanana-1.5-v-3b-instruct

Developed by: Unified Foundation Model (UFO) TF at Kakao - Language(s) : ['en', 'ko'] - Model Architecture: kanana-1.5-v-3b-instruct has 3.6...

📖 image-text-to-text 169,382

unsloth

unsloth/gemma-3-27b-it-GGUF

See our collection for all versions of Gemma 3 including GGUF, 4-bit & 16-bit formats. Read our Guide to see how to Run Gemma 3 correctly....

📖 image-text-to-text 166,737
