Zhehuai Chen

ORCID: 0000-0003-4400-5340

According to our database, Zhehuai Chen authored at least 54 papers between 2015 and 2024.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2024

NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts.

CoRR, 2024

Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks.

Fabian Ritter Gutierrez

CoRR, 2024

Anticipating Future with Large Language Model for Simultaneous Machine Translation.

CoRR, 2024

VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning.

CoRR, 2024

Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data.

CoRR, 2024

EMMeTT: Efficient Multimodal Machine Translation Training.

CoRR, 2024

Chain-of-Thought Prompting for Speech Translation.

CoRR, 2024

Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition.

CoRR, 2024

BESTOW: Efficient and Streamable Speech Language Model with the Best of Two Worlds in GPT and T5.

CoRR, 2024

Less is More: Accurate Speech Recognition & Translation without Web-Scale Data.

CoRR, 2024

DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment.

CoRR, 2024

Instruction Data Generation and Unsupervised Adaptation for Speech Language Models.

CoRR, 2024

GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators.

CoRR, 2024

High-precision Voice Search Query Correction via Retrievable Speech-text Embedings.

CoRR, 2024

Transducers with Pronunciation-Aware Embeddings for Automatic Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2024

SALM: Speech-Augmented Language Model with in-Context Learning for Speech Recognition and Translation.

Proceedings of the IEEE International Conference on Acoustics, 2024

GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages.

CoRR, 2023

Using Text Injection to Improve Recognition of Personal Identifiers in Speech.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Understanding Shared Speech-Text Representations.

Proceedings of the IEEE International Conference on Acoustics, 2023

Accelerating RNN-T Training and Inference Using CTC Guidance.

Proceedings of the IEEE International Conference on Acoustics, 2023

Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-to-Speech.

Proceedings of the IEEE International Conference on Acoustics, 2023

2022

Accelerating RNN-T Training and Inference Using CTC guidance.

CoRR, 2022

Accented Speech Recognition: Benchmarking, Pre-training, and Diverse Data.

CoRR, 2022

JOIST: A Joint Speech and Text Streaming Model for ASR.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Maestro-U: Leveraging Joint Speech-Text Representation Learning for Zero Supervised Speech ASR.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Unsupervised Data Selection via Discrete Speech Representation for ASR.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

MAESTRO: Matched Speech Text Representations through Modality Matching.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Tts4pretrain 2.0: Advancing the use of Text and Speech in ASR Pretraining with Consistency and Contrastive Losses.

Proceedings of the IEEE International Conference on Acoustics, 2022

2021

Semi-Supervision in ASR: Sequential MixMatch and Factorized TTS-Based Augmentation.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Conformer Parrotron: A Faster and Stronger End-to-End Speech Conversion and Recognition Model for Atypical Speech.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

An Asynchronous WFST-Based Decoder for Automatic Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2021

Injecting Text in Self-Supervised Speech Pretraining.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020

Modular End-to-End Automatic Speech Recognition Framework for Acoustic-to-Word Model.

IEEE ACM Trans. Audio Speech Lang. Process., 2020

SCADA: Stochastic, Consistent and Adversarial Data Augmentation to Improve ASR.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Improving Speech Recognition Using GAN-Based Speech Synthesis and Contrastive Unspoken Text Selection.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Improving Speech Recognition Using Consistent Predictions on Synthesized Speech.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

Joint Grapheme and Phoneme Embeddings for Contextual End-to-End ASR.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

End-to-end Contextual Speech Recognition Using Class Language Models and a Token Passing Decoder.

Proceedings of the IEEE International Conference on Acoustics, 2019

Incremental Lattice Determinization for WFST Decoders.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018

Progressive Joint Modeling in Unsupervised Single-Channel Overlapped Speech Recognition.

IEEE ACM Trans. Audio Speech Lang. Process., 2018

Sequence discriminative training for deep learning based acoustic keyword spotting.

Zhehuai Chen

Yanmin Qian

Kai Yu

Speech Commun., 2018

Linguistic Search Optimization for Deep Learning Based LVCSR.

Zhehuai Chen

CoRR, 2018

Knowledge Distillation for Sequence Model.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

A GPU-based WFST Decoder with Exact Lattice Generation.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

On Modular Training of Neural Acoustics-to-Word Model for LVCSR.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Sequence Modeling in Unsupervised Single-Channel Overlapped Speech Recognition.

Zhehuai Chen

Jasha Droppo

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017

Phone Synchronous Speech Recognition With CTC Lattices.

IEEE ACM Trans. Audio Speech Lang. Process., 2017

A Unified Confidence Measure Framework Using Auxiliary Normalization Graph.

Zhehuai Chen

Yanmin Qian

Kai Yu

Proceedings of the Intelligence Science and Big Data Engineering, 2017

Confidence measures for CTC-based phone synchronous decoding.

Zhehuai Chen

Yimeng Zhuang

Kai Yu

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Multi-view LSTM Language Model with Word-Synchronized Auxiliary Feature for LVCSR.

Proceedings of the Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data, 2017

2016

Directed automatic speech transcription error correction using bidirectional LSTM.

Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Phone Synchronous Decoding with CTC Lattice.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

2015

An investigation of context clustering for statistical speech synthesis with deep neural network.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
