Marvis-AI/marvis-tts-100m-v0.2
Model Documentation
Introduction
Marvis is a conversational speech model designed for real-time streaming text-to-speech synthesis. Built with efficiency and accessibility in mind, Marvis addresses the growing need for high-quality, real-time voice synthesis that can run on consumer devices such as iPhones, iPads, Apple Silicon Macs, and others.
Key Features
Supported Languages
Currently optimized for English, French, and German.
Quick Start
Using MLX
Real audio streaming:
```bash
pip install -U mlx-audio
mlx_audio.tts.generate --model Marvis-AI/marvis-tts-100m-v0.2 --stream \
  --text "Marvis TTS is a new text-to-speech model that provides fast streaming on edge devices."
```
Voice cloning:
```bash
mlx_audio.tts.generate --model Marvis-AI/marvis-tts-100m-v0.2 --stream \
  --text "Marvis TTS is a new text-to-speech model that provides fast streaming on edge devices." \
  --ref_audio ./conversational_a.wav
```
You can pass any audio file to clone the voice from, or select a sample audio file from here.
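If you want to drive these commands from a script, the sketch below assembles the same argument list in Python. It uses only the flags shown above (`--model`, `--stream`, `--text`, `--ref_audio`). The helper name `build_generate_command` is hypothetical and not part of mlx-audio. The snippet only builds and prints the command; running it requires mlx-audio to be installed.

```python
import shlex
from typing import List, Optional

MODEL_ID = "Marvis-AI/marvis-tts-100m-v0.2"


def build_generate_command(
    text: str,
    ref_audio: Optional[str] = None,
    stream: bool = True,
    model: str = MODEL_ID,
) -> List[str]:
    """Assemble the argument list for the mlx_audio.tts.generate CLI.

    Hypothetical helper: it mirrors the shell commands above and
    does not call into mlx-audio itself.
    """
    cmd = ["mlx_audio.tts.generate", "--model", model]
    if stream:
        cmd.append("--stream")
    cmd += ["--text", text]
    if ref_audio is not None:
        # Reference audio whose voice should be cloned.
        cmd += ["--ref_audio", ref_audio]
    return cmd


cmd = build_generate_command(
    "Marvis TTS is a new text-to-speech model.",
    ref_audio="./conversational_a.wav",
)
# Print a copy-pasteable, shell-quoted version of the command.
print(shlex.join(cmd))
# To execute it: import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell-quoting problems when the text contains quotes or other special characters.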
Model Description
Marvis is built on the Sesame CSM-1B (Conversational Speech Model) architecture, a multimodal transformer that operates directly on Residual Vector Quantization (RVQ) tokens and uses Kyutai's Mimi codec. The architecture enables end-to-end training while maintaining low-latency generation and employs a dual-transformer approach.
Unlike models that require text chunking based on regex patterns, Marvis processes entire text sequences contextually, resulting in more natural speech flow and intonation.
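To make the term "RVQ tokens" concrete, here is a toy illustration of residual vector quantization in general. It is not the Mimi codec: the two codebooks, their sizes, and the 2-dimensional frame are invented for the example. Each stage picks the codeword nearest to what the previous stages failed to capture, so one frame becomes one token per codebook.

```python
from typing import List, Sequence

Codebook = Sequence[Sequence[float]]


def nearest(codebook: Codebook, v: Sequence[float]) -> int:
    """Index of the codeword with the smallest squared distance to v."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], v)),
    )


def rvq_encode(v: Sequence[float], codebooks: Sequence[Codebook]) -> List[int]:
    """Quantize v stage by stage; each stage encodes the remaining residual."""
    residual = list(v)
    tokens = []
    for cb in codebooks:
        idx = nearest(cb, residual)
        tokens.append(idx)
        residual = [r - c for r, c in zip(residual, cb[idx])]
    return tokens


def rvq_decode(tokens: Sequence[int], codebooks: Sequence[Codebook]) -> List[float]:
    """Reconstruct by summing the selected codeword from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, cb in zip(tokens, codebooks):
        out = [o + c for o, c in zip(out, cb[idx])]
    return out


# Invented toy codebooks: a coarse stage followed by a fine stage.
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0], [1.0, -1.0]],
    [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25], [-0.25, -0.25]],
]

frame = [1.2, 0.9]
tokens = rvq_encode(frame, CODEBOOKS)    # [1, 1]
approx = rvq_decode(tokens, CODEBOOKS)   # [1.25, 1.0]
print(tokens, approx)
```

Adding stages refines the reconstruction. In this toy case the second codebook lowers the squared error compared with using the coarse codebook alone.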
Use Cases
Legal and Ethical Considerations:
License & Agreement
* Apache 2.0
Citation
If you use Marvis in your research or applications, please cite:
```bibtex
@misc{marvis-tts-2025,
  title={Marvis-TTS: Efficient Real-time Voice Cloning with Streaming Speech Synthesis},
  author={Prince Canuma and Lucas Newman},
  year={2025}
}
```
Acknowledgments
Special thanks to Sesame and Kyutai for their groundbreaking open-source contributions that inspired our work, and to the broader open-source community for their unwavering support and collaboration.
---
Version: 0.2
Release Date: 20/10/2025
Creators: Prince Canuma & Lucas Newman
Files & Weights
| Filename | Size |
|---|---|
| model.safetensors | 1.37 GB |