Results for "image-text-to-text"
100 matches found.
Qwen/Qwen3-VL-2B-Instruct
No description available.
Qwen/Qwen3-VL-8B-Instruct
No description available.
Qwen/Qwen2.5-VL-7B-Instruct
No description available.
vikhyatk/moondream2
No description available.
deepseek-ai/DeepSeek-OCR
No description available.
Qwen/Qwen2-VL-2B-Instruct
No description available.
llava-hf/llava-1.5-7b-hf
Model type: LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. It i...
Qwen/Qwen2.5-VL-3B-Instruct
No description available.
google/gemma-3-4b-it
No description available.
Qwen/Qwen3-VL-30B-A3B-Instruct
No description available.
moonshotai/Kimi-K2.5
No description available.
Qwen/Qwen3-VL-235B-A22B-Thinking
No description available.
deepseek-ai/DeepSeek-OCR-2
No description available.
Qwen/Qwen2-VL-7B-Instruct
No description available.
google/gemma-3-27b-it
No description available.
Qwen/Qwen3-VL-4B-Instruct
No description available.
Qwen/Qwen3.5-397B-A17B
Type: Causal Language Model with Vision Encoder - Training Stage: Pre-training & Post-training - Language Model - Number of Parameters: 397B...
google/gemma-3-12b-it
No description available.
microsoft/Florence-2-large
No description available.
Qwen/Qwen2-VL-7B-Instruct-AWQ
No description available.
OpenGVLab/InternVL2-2B
InternVL 2.0 is a multimodal large language model series, featuring models of various sizes. For each size, we release instruction-tuned mod...
nvidia/Llama-3.1-Nemotron-Nano-VL-8B-V1
No description available.
Qwen/Qwen2.5-VL-7B-Instruct-AWQ
No description available.
Qwen/Qwen3.5-35B-A3B
Type: Causal Language Model with Vision Encoder - Training Stage: Pre-training & Post-training - Language Model - Number of Parameters: 35B ...
liuhaotian/llava-v1.5-7b
Model type: LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. It i...
unsloth/Qwen3.5-35B-A3B-GGUF
Type: Causal Language Model with Vision Encoder - Training Stage: Pre-training & Post-training - Language Model - Number of Parameters: 35B ...
OpenGVLab/InternVL3_5-GPT-OSS-20B-A4B-Preview-HF
No description available.
mlx-community/gemma-3-4b-it-qat-4bit
No description available.
llava-hf/llava-v1.6-mistral-7b-hf
LLaVA combines a pre-trained large language model with a pre-trained vision encoder for multimodal chatbot use cases. LLaVA 1.6 improves on ...
microsoft/Phi-3.5-vision-instruct
No description available.
OpenGVLab/InternVL2-1B
InternVL 2.0 is a multimodal large language model series, featuring models of various sizes. For each size, we release instruction-tuned mod...
google/translategemma-12b-it
No description available.
llava-hf/llava-onevision-qwen2-0.5b-ov-hf
Model type: LLaVA-Onevision is an open-source multimodal LLM trained by fine-tuning Qwen2 on GPT-generated multimodal instruction-following ...
Qwen/Qwen3.5-35B-A3B-FP8
Type: Causal Language Model with Vision Encoder - Training Stage: Pre-training & Post-training - Language Model - Number of Parameters: 35B ...
Qwen/Qwen3-VL-32B-Instruct
No description available.
tencent/HunyuanOCR
No description available.
nvidia/NVIDIA-Nemotron-Parse-v1.1
No description available.
Salesforce/blip2-opt-2.7b
BLIP-2 consists of 3 models: a CLIP-like image encoder, a Querying Transformer (Q-Former) and a large language model. The authors initialize...
Qwen/Qwen3.5-27B
Type: Causal Language Model with Vision Encoder - Training Stage: Pre-training & Post-training - Language Model - Number of Parameters: 27B ...
MBZUAI/AIN
No description available.
OpenGVLab/InternVL2-8B
InternVL 2.0 is a multimodal large language model series, featuring models of various sizes. For each size, we release instruction-tuned mod...
nanonets/Nanonets-OCR2-3B
No description available.
trl-internal-testing/tiny-Qwen2_5_VLForConditionalGeneration
No description available.
deepseek-ai/deepseek-vl2-tiny
No description available.
Qwen/Qwen3-VL-32B-Instruct-FP8
No description available.
Qwen/Qwen3-VL-8B-Instruct-FP8
No description available.
unsloth/Qwen3.5-27B-GGUF
Type: Causal Language Model with Vision Encoder - Training Stage: Pre-training & Post-training - Language Model - Number of Parameters: 27B ...
zai-org/GLM-4.1V-9B-Thinking
View the GLM-4.1V-9B-Thinking paper. Use the GLM-4.1V-9B-Thinking API on the Zhipu Foundation Model Open Platform. Vision-Language Models (VLMs...
allenai/olmOCR-2-7B-1025
No description available.
Qwen/Qwen2.5-VL-72B-Instruct
No description available.
Qwen/Qwen3.5-9B
Type: Causal Language Model with Vision Encoder - Training Stage: Pre-training & Post-training - Language Model - Number of Parameters: 9B -...
lmstudio-community/gemma-3-4b-it-GGUF
No description available.
microsoft/Florence-2-base
No description available.
Qwen/Qwen3-VL-235B-A22B-Instruct
No description available.
Qwen/Qwen3-VL-30B-A3B-Instruct-FP8
No description available.
HuggingFaceTB/SmolVLM-256M-Instruct
No description available.
google/gemma-3n-E2B-it
No description available.
unsloth/Qwen3.5-9B-GGUF
Type: Causal Language Model with Vision Encoder - Training Stage: Pre-training & Post-training - Language Model - Number of Parameters: 9B -...
lightonai/LightOnOCR-2-1B
No description available.
lmstudio-community/GLM-4.6V-Flash-MLX-4bit
No description available.
allenai/olmOCR-2-7B-1025-FP8
No description available.
lmstudio-community/GLM-4.6V-Flash-MLX-8bit
No description available.
lmstudio-community/GLM-4.6V-Flash-MLX-6bit
No description available.
allenai/Molmo2-8B
No description available.
meta-llama/Llama-3.2-11B-Vision-Instruct
Languages: en, de, fr, it, pt, hi, es, th; library_name: transformers; pipeline_tag: image-text-to-text; tags: facebook, meta, pytorch, llama, llam...
unsloth/Qwen3.5-122B-A10B-GGUF
Type: Causal Language Model with Vision Encoder - Training Stage: Pre-training & Post-training - Language Model - Number of Parameters: 122B...
HuggingFaceTB/SmolVLM2-500M-Video-Instruct
No description available.
OpenGVLab/InternVL3_5-14B
No description available.
pytorch/gemma-3-27b-it-AWQ-INT4
No description available.
moonshotai/Kimi-VL-A3B-Instruct
No description available.
Qwen/Qwen3.5-397B-A17B-FP8
Type: Causal Language Model with Vision Encoder - Training Stage: Pre-training & Post-training - Language Model - Number of Parameters: 397B...
lmstudio-community/Qwen3-VL-4B-Instruct-MLX-4bit
No description available.
stelterlab/Mistral-Small-3.2-24B-Instruct-2506-FP8
No description available.
lmstudio-community/Qwen3-VL-4B-Instruct-MLX-8bit
No description available.
lmstudio-community/Qwen3-VL-4B-Instruct-MLX-6bit
No description available.
lmstudio-community/Qwen3-VL-4B-Instruct-MLX-5bit
No description available.
Qwen/Qwen2.5-VL-32B-Instruct
No description available.
abhishekchohan/gemma-3-12b-it-quantized-W4A16
No description available.
Qwen/Qwen3.5-27B-FP8
Type: Causal Language Model with Vision Encoder - Training Stage: Pre-training & Post-training - Language Model - Number of Parameters: 27B ...
rednote-hilab/dots.ocr
No description available.
nvidia/Cosmos-Reason2-8B
No description available.
Qwen/Qwen2.5-VL-3B-Instruct-AWQ
No description available.
cyankiwi/Qwen3-VL-4B-Instruct-AWQ-4bit
No description available.
meta-llama/Llama-4-Scout-17B-16E-Instruct
library_name: transformers; languages: ar, de, en, es, fr, hi, id, it, pt, th, tl, vi; base_model: meta-llama/Llama-4-Scout-17B-16E; tags: facebo...
unsloth/Qwen3-VL-4B-Instruct-GGUF
Note: Includes Unsloth chat template fixes! See our Qwen3-VL collection for all versions including GGUF, 4-bit & 16-bit formats. Lea...
ggml-org/gemma-3-12b-it-GGUF
No description available.
lmstudio-community/Qwen3-VL-8B-Instruct-MLX-4bit
No description available.
google/medgemma-4b-it
No description available.
unsloth/Pixtral-12B-2409-bnb-4bit
No description available.
lmstudio-community/Qwen3-VL-8B-Instruct-MLX-8bit
No description available.
Qwen/Qwen3.5-0.8B
Type: Causal Language Model with Vision Encoder - Training Stage: Pre-training & Post-training - Language Model - Number of Parameters: 0.8B...
Qwen/Qwen3-VL-8B-Thinking
No description available.
lmstudio-community/Qwen3-VL-8B-Instruct-MLX-6bit
No description available.
lmstudio-community/Qwen3-VL-8B-Instruct-MLX-5bit
No description available.
cyankiwi/Qwen3.5-35B-A3B-AWQ-4bit
Type: Causal Language Model with Vision Encoder - Training Stage: Pre-training & Post-training - Language Model - Number of Parameters: 35B ...
Qwen/Qwen3-VL-2B-Instruct-FP8
No description available.
Qwen/Qwen3.5-122B-A10B
Type: Causal Language Model with Vision Encoder - Training Stage: Pre-training & Post-training - Language Model - Number of Parameters: 122B...
Qwen/Qwen3-VL-30B-A3B-Thinking
No description available.
kakaocorp/kanana-1.5-v-3b-instruct
Developed by: Unified Foundation Model (UFO) TF at Kakao - Language(s): en, ko - Model Architecture: kanana-1.5-v-3b-instruct has 3.6...
unsloth/gemma-3-27b-it-GGUF
See our collection for all versions of Gemma 3 including GGUF, 4-bit & 16-bit formats. Read our guide to see how to run Gemma 3 correctly...