2024
A Simple HMM with Self-Supervised Representations for Phone Segmentation.
CoRR, 2024
Estimating the Completeness of Discrete Speech Units.
CoRR, 2024
Property Neurons in Self-Supervised Speech Transformers.
CoRR, 2024
DAISY: Data Adaptive Self-Supervised Early Exit for Speech Representation Models.
CoRR, 2024
2023
Improving Seq2Seq TTS Frontends With Transcribed Speech Audio.
IEEE ACM Trans. Audio Speech Lang. Process., 2023
Conditioning and Sampling in Variational Diffusion Models for Speech Super-Resolution.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023
Learning Dependencies of Discrete Speech Representations with Neural Hidden Markov Models.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023
Towards Matching Phones and Speech Representations.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
MelHuBERT: A Simplified HuBERT on Mel Spectrograms.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
2022
Autoregressive Predictive Coding: A Comprehensive Study.
IEEE J. Sel. Top. Signal Process., 2022
Compressing Transformer-based self-supervised models for speech processing.
CoRR, 2022
MelHuBERT: A simplified HuBERT on Mel spectrogram.
CoRR, 2022
Autoregressive Co-Training for Learning Discrete Speech Representations.
CoRR, 2022
On Compressing Sequences for Self-Supervised Speech Models.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022
Autoregressive Co-Training for Learning Discrete Speech Representation.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Phonetic Analysis of Self-supervised Representations of English Speech.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Supervised Attention in Sequence-to-Sequence Models for Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022
2020
Vector-Quantized Autoregressive Predictive Coding.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Audio-Visual Calibration with Polynomial Regression for 2-D Projection Using SVD-PHAT.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020
2019
Time-Contrastive Learning Based Deep Bottleneck Features for Text-Dependent Speaker Verification.
IEEE ACM Trans. Audio Speech Lang. Process., 2019
VoiceID Loss: Speech Enhancement for Speaker Verification.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
A Deep Residual Network for Large-Scale Acoustic Scene Analysis.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
An Unsupervised Autoregressive Model for Speech Representation Learning.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
2018
On The Inductive Bias of Words in Acoustics-to-Word Models.
CoRR, 2018
On Training Recurrent Networks with Truncated Backpropagation Through Time in Speech Recognition.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018
Frame-Level Speaker Embeddings for Text-Independent Speaker Recognition and Analysis of End-to-End Model.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018
A Study of Enhancement, Augmentation and Autoencoder Methods for Domain Adaptation in Distant Speech Recognition.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Unsupervised Adaptation with Interpretable Disentangled Representations for Distant Conversational Speech Recognition.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
2017
ASR for Under-Resourced Languages From Probabilistic Transcription.
IEEE ACM Trans. Audio Speech Lang. Process., 2017
End-to-End Neural Segmental Models for Speech Recognition.
IEEE J. Sel. Top. Signal Process., 2017
Lexicon-free fingerspelling recognition from video: Data, models, and signer adaptation.
Comput. Speech Lang., 2017
Sequence Prediction with Neural Segmental Models.
CoRR, 2017
Multitask Learning with Low-Level Auxiliary Tasks for Encoder-Decoder Based Speech Recognition.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
2016
End-to-end training approaches for discriminative segmental models.
Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016
Triphone State-Tying via Deep Canonical Correlation Analysis.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Efficient Segmental Cascades for Speech Recognition.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Adapting ASR for under-resourced languages using mismatched transcriptions.
Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016
Signer-independent fingerspelling recognition with deep neural network adaptation.
Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016
2015
Discriminative segmental cascades for feature-rich phone recognition.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015
2014
A comparison of training approaches for discriminative segmental models.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
Log-linear dialog manager.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014
2012
Discriminative Pronunciation Modeling: A Large-Margin, Feature-Rich Approach.
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, July 8-14, 2012, Jeju Island, Korea, 2012
2010
An initial attempt for phoneme recognition using Structured Support Vector Machine (SVM).
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2010
2009
Spoken term detection from bilingual spontaneous speech using code-switched lattice-based structures for words and subword units.
Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009