charactr

charactr/vocos-mel-24khz

No description available.

Model Documentation

Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis



Audio samples | Paper [abs] [pdf]

Vocos is a fast neural vocoder designed to synthesize audio waveforms from acoustic features. Trained using a Generative Adversarial Network (GAN) objective, Vocos can generate waveforms in a single forward pass. Unlike other typical GAN-based vocoders, Vocos does not model audio samples in the time domain. Instead, it generates spectral coefficients, facilitating rapid audio reconstruction through inverse Fourier transform.

Installation



To use Vocos only in inference mode, install it using:

bash
pip install vocos


If you wish to train the model, install it with additional dependencies:

bash
pip install vocos[train]


Usage



Reconstruct audio from mel-spectrogram



python
import torch

from vocos import Vocos

vocos = Vocos.from_pretrained("charactr/vocos-mel-24khz")

mel = torch.randn(1, 100, 256)

B, C, T

audio = vocos.decode(mel)


Copy-synthesis from a file:

python
import torchaudio

y, sr = torchaudio.load(YOUR_AUDIO_FILE) if y.size(0) > 1:

mix to mono

y = y.mean(dim=0, keepdim=True) y = torchaudio.functional.resample(y, orig_freq=sr, new_freq=24000) y_hat = vocos(y)


Citation



If this code contributes to your research, please cite our work:


@article{siuzdak2023vocos,
  title={Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis},
  author={Siuzdak, Hubert},
  journal={arXiv preprint arXiv:2306.00814},
  year={2023}
}


License



The code in this repository is released under the MIT license.

Files & Weights

FilenameSizeAction
pytorch_model.bin 0.05 GB