Results for "video-text-to-text"
55 matches found.
DAMO-NLP-SG/VideoLLaMA3-7B
No description available.
llava-hf/LLaVA-NeXT-Video-7B-hf
No description available.
Kwai-Keye/Keye-VL-8B-Preview
No description available.
lmms-lab/LLaVA-Video-7B-Qwen2
- lmms-lab/LLaVA-OneVision-Data - lmms-lab/LLaVA-Video-178K - en library_name: transformers - accuracy - multimodal pipeline_tag: video-text-t...
Kwai-Keye/Keye-VL-1_5-8B
No description available.
DAMO-NLP-SG/VideoLLaMA2.1-7B-16F
No description available.
zai-org/cogvlm2-llama3-caption
No description available.
OpenGVLab/InternVideo2_5_Chat_8B
- en library_name: transformers - accuracy - multimodal pipeline_tag: video-text-to-text - name: InternVideo2.5 results: - task: type: multimo...
TIGER-Lab/VideoScore-v1.1
No description available.
lmms-lab/LLaVA-NeXT-Video-7B-DPO
Model type: LLaVA-Next-Video is an open-source chatbot trained by fine-tuning an LLM on multimodal instruction-following data. This model is th...
OpenGVLab/VideoChat-Flash-Qwen2-7B_res448
- en library_name: transformers - accuracy - multimodal pipeline_tag: video-text-to-text - name: VideoChat-Flash-Qwen2-7B_res448 results: - tas...
DAMO-NLP-SG/VideoLLaMA3-2B
No description available.
lmms-lab/LLaVA-NeXT-Video-7B
Model type: LLaVA-Next-Video is an open-source chatbot trained by fine-tuning an LLM on multimodal instruction-following data. This model is th...
mlx-community/SmolVLM2-500M-Video-Instruct-mlx
No description available.
llava-hf/LLaVA-NeXT-Video-7B-DPO-hf
No description available.
PhilipC/HumanOmniV2
No description available.
OpenGVLab/VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B
- en library_name: transformers - accuracy - multimodal pipeline_tag: video-text-to-text - name: VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B resu...
Video-R1/Video-R1-7B
No description available.
chenjoya/videollm-online-8b-v1plus
LLM: meta-llama/Meta-Llama-3-8B-Instruct. Vision Strategy: Frame Encoder: google/siglip-large-patch16-384; Frame Tokens: CLS Token + Avg Poole...
Diankun/Spatial-MLLM-subset-sft
No description available.
allenai/Molmo2-VideoPoint-4B
No description available.
OpenGVLab/VideoChat-Flash-Qwen2_5-2B_res448
- en library_name: transformers - accuracy - multimodal pipeline_tag: video-text-to-text - name: VideoChat-Flash-Qwen25-15Bres448 results: - t...
TencentARC/TimeLens-8B
No description available.
TencentARC/ARC-Hunyuan-Video-7B
No description available.
mlx-community/SmolVLM2-256M-Video-Instruct-mlx
No description available.
Video-R1/Qwen2.5-VL-7B-COT-SFT
No description available.
Diankun/Spatial-MLLM-v1.1-Instruct-135K
No description available.
Zhang199/TinyLLaVA-Video-Qwen2.5-3B-Group-16-512
No description available.
Skywork/SkyCaptioner-V1
No description available.
prithivMLmods/SAGE-MM-Qwen2.5-VL-7B-SFT_RL-GGUF
No description available.
mradermacher/SmolVLM2-2.2B-Instruct-GGUF
No description available.
OpenGVLab/InternVL_2_5_HiCo_R16
- en library_name: transformers - accuracy - multimodal pipeline_tag: video-text-to-text - name: InternVL2.5HiCoR16 results: - task: type: mul...
Alibaba-DAMO-Academy/PixelRefer-7B
No description available.
OpenGVLab/VideoChat-R1_7B
No description available.
mradermacher/SmolVLM2-2.2B-Instruct-i1-GGUF
No description available.
llava-hf/LLaVA-NeXT-Video-34B-hf
No description available.
VITA-MLLM/VITA-1.5
This repository contains the model from the paper "VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction"...
Chat-UniVi/Chat-UniVi-7B-v1.5
No description available.
OpenGVLab/InternVideo2-Chat-8B
No description available.
prithivMLmods/KAIROS-MM-Qwen2.5-VL-7B-RL-AIO-GGUF
No description available.
TIGER-Lab/Vamba-Qwen2-VL-7B
No description available.
yaolily/TimeChat-Captioner-GRPO-7B
No description available.
MLAdaptiveIntelligence/LLaVAction-0.5B
No description available.
Mungert/SkyCaptioner-V1-GGUF
No description available.
OpenGVLab/VideoChat2_HD_stage4_Mistral_7B_hf
No description available.
tsinghua-ee/video-SALMONN-2
No description available.
Rihong/VideoChat2_HD_Infinity_Mistral_7B
No description available.
OpenGVLab/VideoChat-Flash-Qwen2-7B_res224
- en library_name: transformers - accuracy - multimodal pipeline_tag: video-text-to-text - name: VideoChat-Flash-Qwen2-7Bres448 results: - tas...
chancharikm/qwen2.5-vl-7b-cam-motion
This model is a fine-tuned version of Qwen/Qwen2.5-VL-7B-Instruct on the current highest-quality camera motion dataset that is publicly...
QiWang98/VideoRFT-3B
No description available.
QiWang98/VideoRFT
No description available.
BAAI/Video-XL-2
No description available.
TencentARC/GRPO-CARE
No description available.
prithivMLmods/SAGE-MM-Qwen3-VL-4B-SFT_RL-GGUF
No description available.
chancharikm/qwen2.5-vl-72b-cam-motion
This model is a fine-tuned version of Qwen/Qwen2.5-VL-72B-Instruct on the current highest-quality camera motion dataset that is publicall...