Results for "video-text-to-text"

55 matches found.

DAMO-NLP-SG

DAMO-NLP-SG/VideoLLaMA3-7B

No description available.

🎥 video-text-to-text 103,618
llava-hf

llava-hf/LLaVA-NeXT-Video-7B-hf

No description available.

🎥 video-text-to-text 91,811
Kwai-Keye

Kwai-Keye/Keye-VL-8B-Preview

No description available.

🎥 video-text-to-text 64,185
lmms-lab

lmms-lab/LLaVA-Video-7B-Qwen2

- lmms-lab/LLaVA-OneVision-Data - lmms-lab/LLaVA-Video-178K - en library_name: transformers - accuracy - multimodal pipeline_tag: video-text-t...

🎥 video-text-to-text 55,871
Kwai-Keye

Kwai-Keye/Keye-VL-1_5-8B

No description available.

🎥 video-text-to-text 52,740
DAMO-NLP-SG

DAMO-NLP-SG/VideoLLaMA2.1-7B-16F

No description available.

🎥 video-text-to-text 24,981
zai-org

zai-org/cogvlm2-llama3-caption

No description available.

🎥 video-text-to-text 11,120
OpenGVLab

OpenGVLab/InternVideo2_5_Chat_8B

- en library_name: transformers - accuracy - multimodal pipeline_tag: video-text-to-text - name: InternVideo2.5 results: - task: type: multimo...

🎥 video-text-to-text 5,317
TIGER-Lab

TIGER-Lab/VideoScore-v1.1

No description available.

🎥 video-text-to-text 4,005
lmms-lab

lmms-lab/LLaVA-NeXT-Video-7B-DPO

Model type: LLaVA-Next-Video is an open-source chatbot trained by fine-tuning an LLM on multimodal instruction-following data. This model is th...

🎥 video-text-to-text 2,548
OpenGVLab

OpenGVLab/VideoChat-Flash-Qwen2-7B_res448

- en library_name: transformers - accuracy - multimodal pipeline_tag: video-text-to-text - name: VideoChat-Flash-Qwen2-7B_res448 results: - tas...

🎥 video-text-to-text 2,412
DAMO-NLP-SG

DAMO-NLP-SG/VideoLLaMA3-2B

No description available.

🎥 video-text-to-text 2,303
lmms-lab

lmms-lab/LLaVA-NeXT-Video-7B

Model type: LLaVA-Next-Video is an open-source chatbot trained by fine-tuning an LLM on multimodal instruction-following data. This model is th...

🎥 video-text-to-text 2,077
mlx-community

mlx-community/SmolVLM2-500M-Video-Instruct-mlx

No description available.

🎥 video-text-to-text 1,979
llava-hf

llava-hf/LLaVA-NeXT-Video-7B-DPO-hf

No description available.

🎥 video-text-to-text 1,935
PhilipC

PhilipC/HumanOmniV2

No description available.

🎥 video-text-to-text 1,869
OpenGVLab

OpenGVLab/VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B

- en library_name: transformers - accuracy - multimodal pipeline_tag: video-text-to-text - name: VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B resu...

🎥 video-text-to-text 1,414
Video-R1

Video-R1/Video-R1-7B

No description available.

🎥 video-text-to-text 1,312
chenjoya

chenjoya/videollm-online-8b-v1plus

LLM: meta-llama/Meta-Llama-3-8B-Instruct. Vision Strategy: Frame Encoder: google/siglip-large-patch16-384; Frame Tokens: CLS Token + Avg Poole...

🎥 video-text-to-text 1,138
Diankun

Diankun/Spatial-MLLM-subset-sft

No description available.

🎥 video-text-to-text 1,110
allenai

allenai/Molmo2-VideoPoint-4B

No description available.

🎥 video-text-to-text 964
OpenGVLab

OpenGVLab/VideoChat-Flash-Qwen2_5-2B_res448

- en library_name: transformers - accuracy - multimodal pipeline_tag: video-text-to-text - name: VideoChat-Flash-Qwen25-15Bres448 results: - t...

🎥 video-text-to-text 934
TencentARC

TencentARC/TimeLens-8B

No description available.

🎥 video-text-to-text 860
TencentARC

TencentARC/ARC-Hunyuan-Video-7B

No description available.

🎥 video-text-to-text 834
mlx-community

mlx-community/SmolVLM2-256M-Video-Instruct-mlx

No description available.

🎥 video-text-to-text 786
Video-R1

Video-R1/Qwen2.5-VL-7B-COT-SFT

No description available.

🎥 video-text-to-text 691
Diankun

Diankun/Spatial-MLLM-v1.1-Instruct-135K

No description available.

🎥 video-text-to-text 556
Zhang199

Zhang199/TinyLLaVA-Video-Qwen2.5-3B-Group-16-512

No description available.

🎥 video-text-to-text 541
Skywork

Skywork/SkyCaptioner-V1

No description available.

🎥 video-text-to-text 482
prithivMLmods

prithivMLmods/SAGE-MM-Qwen2.5-VL-7B-SFT_RL-GGUF

No description available.

🎥 video-text-to-text 474
mradermacher

mradermacher/SmolVLM2-2.2B-Instruct-GGUF

No description available.

🎥 video-text-to-text 439
OpenGVLab

OpenGVLab/InternVL_2_5_HiCo_R16

- en library_name: transformers - accuracy - multimodal pipeline_tag: video-text-to-text - name: InternVL2.5HiCoR16 results: - task: type: mul...

🎥 video-text-to-text 426
Alibaba-DAMO-Academy

Alibaba-DAMO-Academy/PixelRefer-7B

No description available.

🎥 video-text-to-text 395
OpenGVLab

OpenGVLab/VideoChat-R1_7B

No description available.

🎥 video-text-to-text 371
mradermacher

mradermacher/SmolVLM2-2.2B-Instruct-i1-GGUF

No description available.

🎥 video-text-to-text 335
llava-hf

llava-hf/LLaVA-NeXT-Video-34B-hf

No description available.

🎥 video-text-to-text 303
VITA-MLLM

VITA-MLLM/VITA-1.5

This repository contains the model of the paper VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction....

🎥 video-text-to-text 291
Chat-UniVi

Chat-UniVi/Chat-UniVi-7B-v1.5

No description available.

🎥 video-text-to-text 279
OpenGVLab

OpenGVLab/InternVideo2-Chat-8B

No description available.

🎥 video-text-to-text 277
prithivMLmods

prithivMLmods/KAIROS-MM-Qwen2.5-VL-7B-RL-AIO-GGUF

No description available.

🎥 video-text-to-text 256
TIGER-Lab

TIGER-Lab/Vamba-Qwen2-VL-7B

No description available.

🎥 video-text-to-text 236
yaolily

yaolily/TimeChat-Captioner-GRPO-7B

No description available.

🎥 video-text-to-text 236
MLAdaptiveIntelligence

MLAdaptiveIntelligence/LLaVAction-0.5B

No description available.

🎥 video-text-to-text 235
Mungert

Mungert/SkyCaptioner-V1-GGUF

No description available.

🎥 video-text-to-text 225
OpenGVLab

OpenGVLab/VideoChat2_HD_stage4_Mistral_7B_hf

No description available.

🎥 video-text-to-text 222
tsinghua-ee

tsinghua-ee/video-SALMONN-2

No description available.

🎥 video-text-to-text 212
Rihong

Rihong/VideoChat2_HD_Infinity_Mistral_7B

No description available.

🎥 video-text-to-text 196
OpenGVLab

OpenGVLab/VideoChat-Flash-Qwen2-7B_res224

- en library_name: transformers - accuracy - multimodal pipeline_tag: video-text-to-text - name: VideoChat-Flash-Qwen2-7B_res448 results: - tas...

🎥 video-text-to-text 193
chancharikm

chancharikm/qwen2.5-vl-7b-cam-motion

This model is a fine-tuned version of Qwen/Qwen2.5-VL-7B-Instruct on the current highest-quality camera motion dataset that is publicly...

🎥 video-text-to-text 181
QiWang98

QiWang98/VideoRFT-3B

No description available.

🎥 video-text-to-text 172
QiWang98

QiWang98/VideoRFT

No description available.

🎥 video-text-to-text 170
BAAI

BAAI/Video-XL-2

No description available.

🎥 video-text-to-text 169
TencentARC

TencentARC/GRPO-CARE

No description available.

🎥 video-text-to-text 169
prithivMLmods

prithivMLmods/SAGE-MM-Qwen3-VL-4B-SFT_RL-GGUF

No description available.

🎥 video-text-to-text 162
chancharikm

chancharikm/qwen2.5-vl-72b-cam-motion

This model is a fine-tuned version of Qwen/Qwen2.5-VL-72B-Instruct on the current highest-quality camera motion dataset that is publicall...

🎥 video-text-to-text 156