2024

A Simple HMM with Self-Supervised Representations for Phone Segmentation.

,

CoRR, 2024

Estimating the Completeness of Discrete Speech Units.

,

CoRR, 2024

Property Neurons in Self-Supervised Speech Transformers.

,

,

,

CoRR, 2024

DAISY: Data Adaptive Self-Supervised Early Exit for Speech Representation Models.

,

,

CoRR, 2024

2023

Improving Seq2Seq TTS Frontends With Transcribed Speech Audio.

,

,

IEEE ACM Trans. Audio Speech Lang. Process., 2023

Conditioning and Sampling in Variational Diffusion Models for Speech Super-Resolution.

,

,

György Fazekas

,

Proceedings of the IEEE International Conference on Acoustics, 2023

Learning Dependencies of Discrete Speech Representations with Neural Hidden Markov Models.

,

Proceedings of the IEEE International Conference on Acoustics, 2023

Towards Matching Phones and Speech Representations.

,

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

MelHuBERT: A Simplified HuBERT on Mel Spectrograms.

,

,

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022

Autoregressive Predictive Coding: A Comprehensive Study.

,

,

,

,

IEEE J. Sel. Top. Signal Process., 2022

Compressing Transformer-based self-supervised models for speech processing.

,

Tsung-Huan Yang

,

,

Kuang-Ming Chen

,

,

,

CoRR, 2022

MelHuBERT: A simplified HuBERT on Mel spectrogram.

,

,

CoRR, 2022

Autoregressive Co-Training for Learning Discrete Speech Representations.

,

CoRR, 2022

On Compressing Sequences for Self-Supervised Speech Models.

,

,

,

Shinji Watanabe

,

,

,

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Autoregressive Co-Training for Learning Discrete Speech Representation.

,

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Phonetic Analysis of Self-supervised Representations of English Speech.

,

,

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Supervised Attention in Sequence-to-Sequence Models for Speech Recognition.

,

Proceedings of the IEEE International Conference on Acoustics, 2022

2020

Vector-Quantized Autoregressive Predictive Coding.

,

,

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Audio-Visual Calibration with Polynomial Regression for 2-D Projection Using SVD-PHAT.

François Grondin

,

,

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

Time-Contrastive Learning Based Deep Bottleneck Features for Text-Dependent Speaker Verification.

Achintya Kumar Sarkar

,

,

,

,

IEEE ACM Trans. Audio Speech Lang. Process., 2019

VoiceID Loss: Speech Enhancement for Speaker Verification.

,

,

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

A Deep Residual Network for Large-Scale Acoustic Scene Analysis.

,

,

François Grondin

,

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

An Unsupervised Autoregressive Model for Speech Representation Learning.

,

,

,

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

2018

On The Inductive Bias of Words in Acoustics-to-Word Models.

,

CoRR, 2018

On Training Recurrent Networks with Truncated Backpropagation Through Time in Speech Recognition.

,

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Frame-Level Speaker Embeddings for Text-Independent Speaker Recognition and Analysis of End-to-End Model.

,

,

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

A Study of Enhancement, Augmentation and Autoencoder Methods for Domain Adaptation in Distant Speech Recognition.

,

,

François Grondin

,

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Unsupervised Adaptation with Interpretable Disentangled Representations for Distant Conversational Speech Recognition.

,

,

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

2017

ASR for Under-Resourced Languages From Probabilistic Transcription.

Mark A. Hasegawa-Johnson

,

,

,

Majid Mirbagheri

,

Giovanni M. Di Liberto

,

,

,

,

,

,

Edmund C. Lalor

,

,

,

,

,

Adrian K. C. Lee

IEEE ACM Trans. Audio Speech Lang. Process., 2017

End-to-End Neural Segmental Models for Speech Recognition.

,

,

,

,

,

,

,

IEEE J. Sel. Top. Signal Process., 2017

Lexicon-free fingerspelling recognition from video: Data, models, and signer adaptation.

,

,

,

,

,

Gregory Shakhnarovich

,

,

Comput. Speech Lang., 2017

Sequence Prediction with Neural Segmental Models.

CoRR, 2017

Multitask Learning with Low-Level Auxiliary Tasks for Encoder-Decoder Based Speech Recognition.

Shubham Toshniwal

,

,

,

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

2016

End-to-end training approaches for discriminative segmental models.

,

,

,

Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

Triphone State-Tying via Deep Canonical Correlation Analysis.

,

,

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Efficient Segmental Cascades for Speech Recognition.

,

,

,

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Adapting ASR for under-resourced languages using mismatched transcriptions.

,

,

,

,

,

,

Mark Hasegawa-Johnson

,

Sanjeev Khudanpur

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Signer-independent fingerspelling recognition with deep neural network adaptation.

,

,

,

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015

Discriminative segmental cascades for feature-rich phone recognition.

,

,

,

Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

2014

A comparison of training approaches for discriminative segmental models.

,

,

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Log-linear dialog manager.

,

Shinji Watanabe

,

,

John R. Hershey

Proceedings of the IEEE International Conference on Acoustics, 2014

2012

Discriminative Pronunciation Modeling: A Large-Margin, Feature-Rich Approach.

,

,

Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, July 8-14, 2012, Jeju Island, Korea, 2012

2010

An initial attempt for phoneme recognition using Structured Support Vector Machine (SVM).

,

,

Proceedings of the IEEE International Conference on Acoustics, 2010

2009

Spoken term detection from bilingual spontaneous speech using code-switched lattice-based structures for words and subword units.

,

,

,

Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009