Srikanth Ronanki

Srikanth Vishnubhotla

Daniel Garcia-Romero

Kyu J. Han

CoRR, 2024

SpeechVerse: A Large-scale Generalizable Audio Language Model.

Xilai Li

Karel Mundnich

Monica Sunkara

Kyu J. Han

CoRR, 2024

SpeechGuard: Exploring the Adversarial Robustness of Multi-modal Large Language Models.

Raghuveer Peri

Srikanth Vishnubhotla

Daniel Garcia-Romero

Kyu J. Han

Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023

Dynamic Chunk Convolution for Unified Streaming and Non-Streaming Conformer ASR.

CoRR, 2023

DCTX-Conformer: Dynamic context carry-over for low latency unified streaming and non-streaming Conformer.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

A Metric-Driven Approach to Conformer Layer Pruning for Efficient ASR Inference.

Karthik Gopalakrishnan

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Dynamic Chunk Convolution for Unified Streaming and Non-Streaming Conformer ASR.

Proceedings of the IEEE International Conference on Acoustics, 2023

AdaBERT-CTC: Leveraging BERT-CTC for Text-Only Domain Adaptation in ASR.

Tyler Vuong

Karel Mundnich

Veera Raghavendra Elluru

Sravan Bodapati

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: EMNLP 2023, 2023

Retrieve and Copy: Scaling ASR Personalization to Large Catalogs.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: EMNLP 2023, 2023

Generalized Zero-Shot Audio-to-Intent Classification.

Veera Raghavendra Elluru

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022

Device Directedness with Contextual Cues for Spoken Dialog Systems.

Sravan Bodapati

CoRR, 2022

Personalization of CTC Speech Recognition Models.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Contextual Acoustic Barge-In Classification for Spoken Dialog Systems.

Sravan Bodapati

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

2021

"What's The Context?" : Long Context NLM Adaptation for ASR Rescoring in Conversational Agents.

CoRR, 2021

Adapting Long Context NLM for ASR Rescoring in Conversational Agents.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Transformer-Transducers for Code-Switched Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2021

2020

Robust Prediction of Punctuation and Truecasing for Medical ASR.

CoRR, 2020

Multimodal Semi-Supervised Learning Framework for Punctuation Prediction in Conversational Speech.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

2019

Prosody generation for text-to-speech synthesis.

PhD thesis, 2019

The ASVspoof 2019 database.

CoRR, 2019

In Other News: a Bi-style Text-to-speech Model for Synthesizing Newscaster Voice with Limited Data.

Nishant Prateek

Mateusz Lajszczak

Roberto Barra-Chicote

Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

Fine-Grained Robust Prosody Transfer for Single-Speaker Neural Text-To-Speech.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Effect of Data Reduction on Sequence-to-sequence Neural TTS.

Proceedings of the IEEE International Conference on Acoustics, 2019

2018

Effect of data reduction on sequence-to-sequence neural TTS.

CoRR, 2018

Learning Interpretable Control Dimensions for Speech Synthesis by Using External Data.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

2017

A Hierarchical Encoder-Decoder Model for Statistical Parametric Speech Synthesis.

Oliver Watts

Simon King

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

The CSTR entry to the Blizzard Challenge 2017.

Proceedings of the Blizzard Challenge 2017, Stockholm, Sweden, August 25, 2017

2016

A Demonstration of the Merlin Open Source Neural Network Speech Synthesis System.

Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016

DNN-based Speech Synthesis for Indian Languages from ASCII text.
