Results for "audio-classification"
100 matches found.
laion/clap-htsat-fused
No description available.
audeering/wav2vec2-large-robust-12-ft-emotion-msp-dim
No description available.
speechbrain/emotion-recognition-wav2vec2-IEMOCAP
No description available.
MIT/ast-finetuned-audioset-10-10-0.4593
The Audio Spectrogram Transformer is equivalent to ViT, but applied to audio. Audio is first turned into an image (as a spectrogram), after ...
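The AST snippet above describes the core idea: render the waveform as a 2D spectrogram "image" and cut it into fixed-size patches, the way a ViT tokenizes an image. A minimal numpy-only sketch of that idea follows; the window, hop, and patch sizes here are illustrative assumptions, not the model's actual preprocessing (AST uses log-mel filterbanks and overlapping patches).

```python
import numpy as np

# Sketch of the AST idea: audio -> 2D spectrogram "image" -> ViT-style patches.
# Parameters (n_fft=256, hop=128, patch=16) are hypothetical, chosen for clarity.

def spectrogram(wave, n_fft=256, hop=128):
    """Magnitude STFT: slide a Hann window over the waveform, FFT each frame."""
    window = np.hanning(n_fft)
    frames = [wave[i:i + n_fft] * window
              for i in range(0, len(wave) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=-1)).T  # (freq, time)

def patchify(spec, patch=16):
    """Cut the spectrogram into non-overlapping patch x patch tiles (tokens)."""
    f, t = (spec.shape[0] // patch) * patch, (spec.shape[1] // patch) * patch
    spec = spec[:f, :t]  # drop the ragged border
    tiles = spec.reshape(f // patch, patch, t // patch, patch).swapaxes(1, 2)
    return tiles.reshape(-1, patch * patch)  # (num_patches, patch*patch)

rng = np.random.default_rng(0)
wave = rng.standard_normal(16000)   # 1 s of synthetic audio at 16 kHz
spec = spectrogram(wave)            # (129, 124): 129 freq bins, 124 frames
tokens = patchify(spec)             # (56, 256): 56 patch tokens of 16*16 values
print(spec.shape, tokens.shape)
```

Each row of `tokens` would then be linearly projected and fed to a transformer encoder, exactly as image patches are in ViT.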
xbgoose/hubert-large-speech-emotion-recognition-russian-dusha-finetuned
No description available.
audeering/wav2vec2-large-robust-24-ft-age-gender
No description available.
mo-thecreator/Deepfake-audio-detection
More information needed...
prithivMLmods/Common-Voice-Gender-Detection
No description available.
superb/wav2vec2-base-superb-er
This is a ported version of S3PRL's Wav2Vec2 for the SUPERB Emotion Recognition task. The base model is wav2vec2-base, which is pretrained o...
OpenMuQ/MuQ-large-msd-iter
No description available.
speechbrain/lang-id-voxlingua107-ecapa
This is a spoken language recognition model trained on the VoxLingua107 dataset using SpeechBrain. The model uses the ECAPA-TDNN architectur...
facebook/audiobox-aesthetics
No description available.
OpenMuQ/MuQ-MuLan-large
No description available.
JaesungHuh/voice-gender-classifier
No description available.
m-a-p/MERT-v1-95M
No description available.
m-a-p/MERT-v1-330M
No description available.
facebook/mms-lid-1024
Developed by: Vineel Pratap et al. - Model type: Multi-Lingual Automatic Speech Recognition model - Language(s): 1024 languages, see support...
facebook/mms-lid-256
Developed by: Vineel Pratap et al. - Model type: Multi-Lingual Automatic Speech Recognition model - Language(s): 256 languages, see supporte...
bvallegc/wav2vec2_spoof_dection1-finetuned-spoofing-classifier
No description available.
firdhokk/speech-emotion-recognition-with-openai-whisper-large-v3
No description available.
MIT/ast-finetuned-audioset-14-14-0.443
The Audio Spectrogram Transformer is equivalent to ViT, but applied to audio. Audio is first turned into an image (as a spectrogram), after ...
jakeBland/wav2vec-vm-finetune
This model builds on wav2vec2-xls-r-300m, a self-supervised speech model trained on large-scale multilingual data. We fine-tuned it on the f...
facebook/mms-lid-126
Developed by: Vineel Pratap et al. - Model type: Multi-Lingual Automatic Speech Recognition model - Language(s): 126 languages, see supporte...
ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition
More information needed...
facebook/mms-lid-4017
Developed by: Vineel Pratap et al. - Model type: Multi-Lingual Automatic Speech Recognition model - Language(s): 4017 languages, see support...
alefiury/wav2vec2-large-xlsr-53-gender-recognition-librispeech
No description available.
DBD-research-group/AST-BirdSet-XCM
No description available.
speechbrain/spkrec-xvect-voxceleb
No description available.
padmalcom/wav2vec2-large-nonverbalvocalization-classification
This language-independent wav2vec2 classification model is based on this dataset....
SeyedAli/Musical-genres-Classification-Hubert-V1
More information needed...
speechbrain/lang-id-commonlanguage_ecapa
No description available.
superb/wav2vec2-base-superb-ks
This is a ported version of S3PRL's Wav2Vec2 for the SUPERB Keyword Spotting task. The base model is wav2vec2-base, which is pretrained on 1...
Gustking/wav2vec2-large-xlsr-deepfake-audio-classification
pipelinetag: audio-classification...
audeering/wav2vec2-large-robust-6-ft-age-gender
No description available.
bookbot/distil-ast-audioset
No description available.
awsaf49/sonics-spectttra-alpha-120s
No description available.
MIT/ast-finetuned-audioset-16-16-0.442
The Audio Spectrogram Transformer is equivalent to ViT, but applied to audio. Audio is first turned into an image (as a spectrogram), after ...
MIT/ast-finetuned-audioset-12-12-0.447
The Audio Spectrogram Transformer is equivalent to ViT, but applied to audio. Audio is first turned into an image (as a spectrogram), after ...
superb/hubert-base-superb-er
This is a ported version of S3PRL's Hubert for the SUPERB Emotion Recognition task. The base model is hubert-base-ls960, which is pretrained...
DBD-research-group/AST-BirdSet-XCL
No description available.
mispeech/dasheng-base
No description available.
Jzuluaga/accent-id-commonaccent_xlsr-en-english
No description available.
firdhokk/speech-emotion-recognition-with-facebook-wav2vec2-large-xlsr-53
No description available.
3loi/SER-Odyssey-Baseline-WavLM-Categorical
The model was trained on MSP-Podcast for the Odyssey 2024 Emotion Recognition competition baseline. This particular model is the categorical ...
superb/hubert-large-superb-er
This is a ported version of S3PRL's Hubert for the SUPERB Emotion Recognition task. The base model is hubert-large-ll60k, which is pretraine...
Dpngtm/wav2vec2-emotion-recognition
Model Architecture: Wav2Vec2 with a frozen CNN feature extractor and a trainable sequence classification head. - Language: English - Task: S...
mtg-upf/discogs-maest-30s-pw-129e-519l
No description available.
tiantiaf/whisper-large-v3-narrow-accent
This model includes the implementation of narrow accent classification described in Vox-Profile: A Speech Foundation Model Benchmark for Cha...
tiantiaf/wavlm-large-age-sex
This model includes the implementation of age and sex classification described in Vox-Profile: A Speech Foundation Model Benchmark for Chara...
tiantiaf/whisper-large-v3-speech-flow
This model includes the implementation of speech fluency classification described in Vox-Profile: A Speech Foundation Model Benchmark for Ch...
tiantiaf/whisper-large-v3-voice-quality
This model includes the implementation of voice quality classification described in Vox-Profile: A Speech Foundation Model Benchmark for Cha...
HTill/flexEAT-base_epoch30_pretrain
⚠️ Codebase Update: Input Flexibility & Fine-Tuning Preparation...
dima806/music_genres_classification
Music genre classification is a fundamental and versatile application in many domains. Some possible use cases for music genre class...
tiantiaf/whisper-large-v3-msp-podcast-emotion
This model includes the implementation of categorical emotion classification described in Vox-Profile: A Speech Foundation Model Benchmark f...
tiantiaf/whisper-large-v3-msp-podcast-emotion-dim
This model includes the implementation of dimensional emotion classification described in Vox-Profile: A Speech Foundation Model Benchmark f...
tiantiaf/whisper-large-v3-broad-accent
This model includes the implementation of broader accent classification described in Vox-Profile: A Speech Foundation Model Benchmark for Ch...
7wolf/wav2ast-gender-classification
More information needed...
Aniemore/wav2vec2-xlsr-53-russian-emotion-recognition
No description available.
anton-l/wav2vec2-random-tiny-classifier
No description available.
Jzuluaga/accent-id-commonaccent_ecapa
- en - audio-classification - speechbrain - embeddings - Accent - Identification - pytorch - ECAPA-TDNN - TDNN - CommonAccent - CommonVoice ...
superb/wav2vec2-large-superb-er
This is a ported version of S3PRL's Wav2Vec2 for the SUPERB Emotion Recognition task. The base model is wav2vec2-large-lv60, which is pretra...
MelodyMachine/Deepfake-audio-detection-V2
More information needed...
Khoa/w2v-speech-emotion-recognition
This model is fine-tuned for recognizing emotions in English speech using the Wav2Vec2 architecture. It is capable of detecting the followin...
SeaBenSea/hubert-large-turkish-speech-emotion-recognition
No description available.
griko/gender_cls_svm_ecapa_voxceleb
Input: Audio file (will be converted to 16kHz, mono, single channel) - Output: Gender prediction ("male" or "female") - Speaker embedding: 1...
superb/wav2vec2-base-superb-sid
This is a ported version of S3PRL's Wav2Vec2 for the SUPERB Speaker Identification task. The base model is wav2vec2-base, which is pretraine...
Speech-Arena-2025/DF_Arena_1B_V_1
No description available.
hzhongresearch/yamnetp_ahead_ds
No description available.
Wiam/wav2vec2-lg-xlsr-en-speech-emotion-recognition-finetuned-ravdess-v8
More information needed...
MIT/ast-finetuned-speech-commands-v2
The Audio Spectrogram Transformer is equivalent to ViT, but applied to audio. Audio is first turned into an image (as a spectrogram), after ...
somosnlp-hackathon-2022/wav2vec2-base-finetuned-sentiment-classification-MESD
This model was trained to classify the underlying sentiment of Spanish audio/speech....
aufklarer/WeSpeaker-ResNet34-LM-MLX
No description available.
prithivMLmods/Speech-Emotion-Classification
No description available.
xmj2002/hubert-base-ch-speech-emotion-recognition
This model uses TencentGameMate/chinese-hubert-base as the pre-training model for training on the CASIA dataset....
garystafford/wav2vec2-deepfake-voice-detector
No description available.
ALM/wav2vec2-base-audioset
pipelinetag: audio-classification - music - audio - speech - audio-representation-learning - arch-benchmark - general-audio...
MarekCech/GenreVim-Music-Classification-DistilHuBERT
This model is a fine-tuned version of ntu-spml/distilhubert for music genre classification. - Blues - Classical music - Country music - Drum & ...
awsaf49/sonics-spectttra-gamma-5s
No description available.
alkiskoudounas/voc2vec-hubert-ls-pt
voc2vec-hubert is built upon the HuBERT framework and follows its pre-training setup. The pre-training datasets include: AudioSet (vocalizat...
WasuratS/distilhubert-finetuned-gtzan
DistilHuBERT is a distilled version of HuBERT, pretrained on a dataset sampled at 16 kHz. The architecture of this model is CTC, or Connecti...
chrisjay/afrospeech-wav2vec-all-6
No description available.
mispeech/ced-base
No description available.
lewtun/distilhubert-finetuned-gtzan
More information needed...
pedromatias97/genre-recognizer-finetuned-gtzan_dset
More information needed...
dima806/english_accents_classification
Returns common English accent given a voice audio sample....
Bagus/wav2vec2-xlsr-japanese-speech-emotion-recognition
No description available.
MIT/ast-finetuned-audioset-10-10-0.448
The Audio Spectrogram Transformer is equivalent to ViT, but applied to audio. Audio is first turned into an image (as a spectrogram), after ...
Krithika-p/my_awesome_emotions_model
More information needed...
Hatman/audio-emotion-detection
A model that returns labels for Angry, Disgusted, Fearful, Happy, Neutral, Sad, Surprised. All audio was trained at a sampling rate of 16000 ...
KELONMYOSA/wav2vec2-xls-r-300m-emotion-ru
No description available.
lugan/SynTTS-Commands-Media-Benchmarks
No description available.
MTUCI/AASIST3
No description available.
gaunernst/vit_base_patch16_1024_128.audiomae_as2m_ft_as20k
Model Type: Audio classification / feature backbone - Papers: - Masked Autoencoders that Listen: https://arxiv.org/abs/2207.06405 - Pretrain...
anton-l/wav2vec2-base-superb-sv
No description available.
Aniemore/wavlm-emotion-russian-resd
No description available.
tiantiaf/wavlm-large-categorical-emotion
This model includes the implementation of categorical emotion classification described in Vox-Profile: A Speech Foundation Model Benchmark f...
ALM/hubert-base-audioset
pipelinetag: audio-classification - music - audio - speech - audio-representation-learning - arch-benchmark - general-audio...
Hemgg/Deepfake-audio-detection
The model is fine-tuned from facebook/wav2vec2-base...
DunnBC22/wav2vec2-base-Speech_Emotion_Recognition
This model predicts the emotion of the person speaking in the audio sample. For more information on how it was created, check out the follow...
forwarder1121/voice-based-stress-recognition
Model name: Voice-Based Stress Recognition (StudentNet) - Repository: https://huggingface.co/forwarder1121/voice-based-stress-recognition - ...