Supertonic — Lightning Fast, On-Device TTS
Supertonic is a lightning-fast, on-device text-to-speech system designed for extreme performance with minimal computational overhead. Powered by ONNX Runtime, it runs entirely on your device—no cloud, no API calls, no privacy concerns.
> 🎧 Try it now: Experience Supertonic in your browser with our Interactive Demo or Hugging Face app, or get started with pre-trained models from the Hugging Face Hub.
> 🛠 GitHub Repository
>
> The easiest way to use Supertonic is through the official GitHub repository:
> https://github.com/supertone-inc/supertonic
>
> It contains example code in multiple languages.
Why Supertonic?
Language Support
We provide ready-to-use TTS inference examples across multiple ecosystems:
| Language/Platform | Path | Description |
|-------------------|------|-------------|
| Python | py/ | ONNX Runtime inference |
| Node.js | nodejs/ | Server-side JavaScript |
| Browser | web/ | WebGPU/WASM inference |
| Java | java/ | Cross-platform JVM |
| C++ | cpp/ | High-performance C++ |
| C# | csharp/ | .NET ecosystem |
| Go | go/ | Go implementation |
| Swift | swift/ | macOS applications |
| iOS | ios/ | Native iOS apps |
| Rust | rust/ | Memory-safe systems |
| Flutter | flutter/ | Cross-platform apps |

> For detailed usage instructions, please refer to the README.md in each language directory.
Getting Started
First, clone the repository:
```bash
git clone https://github.com/supertone-inc/supertonic.git
cd supertonic
```
Prerequisites
Before running the examples, download the ONNX models and preset voices, and place them in the `assets` directory:

```bash
git clone https://huggingface.co/Supertone/supertonic assets
```
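The Hugging Face repository stores its large files with Git LFS, so a clone made without LFS can leave small text stubs where the model files should be. The sketch below is one way to check for that. It relies on the fact that a Git LFS pointer file begins with the line `version https://git-lfs.github.com/spec/v1`. The helper name `find_lfs_pointers` is hypothetical and not part of the Supertonic repository, and the script assumes it is run from the directory that contains `assets`.

```python
from pathlib import Path

# First line of every Git LFS pointer file, per the Git LFS pointer format.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(root: str) -> list[Path]:
    """Return files under `root` that are still Git LFS pointer stubs.

    Hypothetical helper for illustration; not part of the Supertonic code.
    """
    base = Path(root)
    stubs = []
    for path in sorted(base.rglob("*")):
        # Skip directories and anything inside the clone's .git metadata.
        if not path.is_file() or ".git" in path.relative_to(base).parts:
            continue
        with path.open("rb") as fh:
            head = fh.read(len(LFS_POINTER_PREFIX))
        if head == LFS_POINTER_PREFIX:
            stubs.append(path)
    return stubs


if __name__ == "__main__":
    assets = Path("assets")
    if not assets.is_dir():
        print("assets/ not found; clone the model repository first")
    else:
        stubs = find_lfs_pointers("assets")
        for stub in stubs:
            print(f"still an LFS pointer: {stub}")
        if not stubs:
            print("no LFS pointer stubs found")
```

If any stubs are reported, running `git lfs install` followed by `git lfs pull` inside `assets` should fetch the real files.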
> Note: The Hugging Face repository uses Git LFS. Please ensure Git LFS is installed and initialized before cloning or pulling large model files.
>
> - With Homebrew: `brew install git-lfs && git lfs install`
> - Otherwise: see https://git-lfs.com for installers

Technical Details
Performance
We evaluated Supertonic's performance (with 2 inference steps) on two metrics, characters per second and real-time factor, across input texts of three lengths: Short (59 chars), Mid (152 chars), and Long (266 chars).
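As a rough illustration of how the two metrics can be computed, the sketch below uses their conventional definitions, which are an assumption here rather than something this section spells out: characters per second is the input length divided by synthesis time (higher is better), and real-time factor is synthesis time divided by the duration of the generated audio (lower is better). The function names and the sample measurement are hypothetical and are not taken from the tables in this document.

```python
def chars_per_second(num_chars: int, synthesis_seconds: float) -> float:
    """Throughput: input characters per second of wall-clock synthesis time."""
    if synthesis_seconds <= 0:
        raise ValueError("synthesis_seconds must be positive")
    return num_chars / synthesis_seconds


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF: synthesis time divided by the duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


# Hypothetical measurement: a 59-character input synthesized in 0.5 s,
# producing 4.0 s of audio.
cps = chars_per_second(59, 0.5)
rtf = real_time_factor(0.5, 4.0)
print(f"{cps:.0f} chars/s, RTF {rtf:.3f}")
```

Under these definitions, an RTF below 1 means audio is generated faster than it plays back.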
Characters per Second
| System | Short (59 chars) | Mid (152 chars) | Long (266 chars) |
|--------|-----------------|----------------|-----------------|
| Supertonic (M4 pro | | | |
| API ElevenLabs Flash v2.5 | 144 | 209 | 287 |
| API OpenAI TTS-1 | 37 | 55 | 82 |
| API Gemini 2.5 Flash TTS | 12 | 18 | 24 |
| API Supertone Sona speech 1 | 38 | 64 | 92 |
| Open Kokoro | 104 | 107 | 117 |
| Open NeuTTS Air | 37 | 42 | 47 |

> Notes:
>
> - API = Cloud-based API services (measured from Seoul)
> - Open = Open-source models
> - Supertonic (M4 pro

Real-time Factor
| System | Short (59 chars) | Mid (152 chars) | Long (266 chars) |
|--------|-----------------|----------------|-----------------|
| Supertonic (M4 pro | | | |
| API ElevenLabs Flash v2.5 | 0.133 | 0.077 | 0.057 |
| API OpenAI TTS-1 | 0.471 | 0.302 | 0.201 |
| API Gemini 2.5 Flash TTS | 1.060 | 0.673 | 0.541 |
| API Supertone Sona speech 1 | 0.372 | 0.206 | 0.163 |
| Open Kokoro | 0.144 | 0.124 | 0.126 |
| Open NeuTTS Air | 0.390 | 0.338 | 0.343 |

Additional Performance Data (5-step inference)
Characters per Second (5-step)
| System | Short (59 chars) | Mid (152 chars) | Long (266 chars) |
|--------|-----------------|----------------|-----------------|
| Supertonic (M4 pro | | | |
Real-time Factor (5-step)
| System | Short (59 chars) | Mid (152 chars) | Long (266 chars) |
|--------|-----------------|----------------|-----------------|
| Supertonic (M4 pro | | | |
License
This project’s sample code is released under the MIT License.
The accompanying model is released under the OpenRAIL-M License.
This model was trained using PyTorch, which is licensed under the BSD 3-Clause License but is not redistributed with this project.
Copyright (c) 2025 Supertone Inc.