Kaldi-compatible feature extraction with PyTorch, supporting CUDA, batch processing, chunk proces...
Zero-shot multimodal punctuation insertion and truecasing using Whisper
Fast inference engine for Transformer models
Code for the paper "ECAPA-TDNN Based Depression Detection from Clinical Speech"
An attempt at tracking the state of the art and recent results (bibliography) in speech recognition.
Unofficial reimplementation of ECAPA-TDNN for speaker recognition (EER=0.86 for Vox1_O when train...
Efficient Training of Audio Transformers with Patchout
Sequence modeling benchmarks and temporal convolutional networks
SpeechIO Leaderboard: a large, robust, comprehensive benchmarking platform for Automatic Speech ...