Zhehuai Chen

Orcid: 0000-0003-4400-5340

According to our database, Zhehuai Chen authored at least 52 papers between 2015 and 2024.

Bibliography

2024
Anticipating Future with Large Language Model for Simultaneous Machine Translation.
CoRR, 2024

VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning.
CoRR, 2024

Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data.
CoRR, 2024

EMMeTT: Efficient Multimodal Machine Translation Training.
CoRR, 2024

Chain-of-Thought Prompting for Speech Translation.
CoRR, 2024

Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition.
CoRR, 2024

BESTOW: Efficient and Streamable Speech Language Model with the Best of Two Worlds in GPT and T5.
CoRR, 2024

Less is More: Accurate Speech Recognition & Translation without Web-Scale Data.
CoRR, 2024

DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment.
CoRR, 2024

Instruction Data Generation and Unsupervised Adaptation for Speech Language Models.
CoRR, 2024

GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators.
CoRR, 2024

High-precision Voice Search Query Correction via Retrievable Speech-text Embedings.
CoRR, 2024

Transducers with Pronunciation-Aware Embeddings for Automatic Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2024

SALM: Speech-Augmented Language Model with in-Context Learning for Speech Recognition and Translation.
Proceedings of the IEEE International Conference on Acoustics, 2024

GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages.
CoRR, 2023

Using Text Injection to Improve Recognition of Personal Identifiers in Speech.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Understanding Shared Speech-Text Representations.
Proceedings of the IEEE International Conference on Acoustics, 2023

Accelerating RNN-T Training and Inference Using CTC Guidance.
Proceedings of the IEEE International Conference on Acoustics, 2023

Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-to-Speech.
Proceedings of the IEEE International Conference on Acoustics, 2023

2022
Accelerating RNN-T Training and Inference Using CTC guidance.
CoRR, 2022

Accented Speech Recognition: Benchmarking, Pre-training, and Diverse Data.
CoRR, 2022

JOIST: A Joint Speech and Text Streaming Model for ASR.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Maestro-U: Leveraging Joint Speech-Text Representation Learning for Zero Supervised Speech ASR.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Unsupervised Data Selection via Discrete Speech Representation for ASR.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

MAESTRO: Matched Speech Text Representations through Modality Matching.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Tts4pretrain 2.0: Advancing the use of Text and Speech in ASR Pretraining with Consistency and Contrastive Losses.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
Semi-Supervision in ASR: Sequential MixMatch and Factorized TTS-Based Augmentation.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Conformer Parrotron: A Faster and Stronger End-to-End Speech Conversion and Recognition Model for Atypical Speech.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

An Asynchronous WFST-Based Decoder for Automatic Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2021

Injecting Text in Self-Supervised Speech Pretraining.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020
Modular End-to-End Automatic Speech Recognition Framework for Acoustic-to-Word Model.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

SCADA: Stochastic, Consistent and Adversarial Data Augmentation to Improve ASR.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Improving Speech Recognition Using GAN-Based Speech Synthesis and Contrastive Unspoken Text Selection.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Improving Speech Recognition Using Consistent Predictions on Synthesized Speech.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019
Joint Grapheme and Phoneme Embeddings for Contextual End-to-End ASR.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

End-to-end Contextual Speech Recognition Using Class Language Models and a Token Passing Decoder.
Proceedings of the IEEE International Conference on Acoustics, 2019

Incremental Lattice Determinization for WFST Decoders.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018
Progressive Joint Modeling in Unsupervised Single-Channel Overlapped Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

Sequence discriminative training for deep learning based acoustic keyword spotting.
Speech Commun., 2018

Linguistic Search Optimization for Deep Learning Based LVCSR.
CoRR, 2018

Knowledge Distillation for Sequence Model.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

A GPU-based WFST Decoder with Exact Lattice Generation.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

On Modular Training of Neural Acoustics-to-Word Model for LVCSR.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Sequence Modeling in Unsupervised Single-Channel Overlapped Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017
Phone Synchronous Speech Recognition With CTC Lattices.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

A Unified Confidence Measure Framework Using Auxiliary Normalization Graph.
Proceedings of the Intelligence Science and Big Data Engineering, 2017

Confidence measures for CTC-based phone synchronous decoding.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Multi-view LSTM Language Model with Word-Synchronized Auxiliary Feature for LVCSR.
Proceedings of the Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data, 2017

2016
Directed automatic speech transcription error correction using bidirectional LSTM.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Phone Synchronous Decoding with CTC Lattice.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

2015
An investigation of context clustering for statistical speech synthesis with deep neural network.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

