Yao Qian

Orcid: 0000-0003-1855-9630

According to our database, Yao Qian authored at least 144 papers between 2001 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
GDNet: a low-light image enhancement network based on Ghost-Block and unique image decomposition.
J. Supercomput., January, 2025

2024
RDGMEF: a multi-exposure image fusion framework based on Retinex decompostion and guided filter.
Neural Comput. Appl., July, 2024

GAN-GA: infrared and visible image fusion generative adversarial network based on global awareness.
Appl. Intell., July, 2024

EgeFusion: Towards Edge Gradient Enhancement in Infrared and Visible Image Fusion With Multi-Scale Transform.
IEEE Trans. Computational Imaging, 2024

CFNet: An infrared and visible image compression fusion network.
Pattern Recognit., 2024

DANT-GAN: A dual attention-based of nested training network for infrared and visible image fusion.
Digit. Signal Process., 2024

Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation.
CoRR, 2024

VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers.
CoRR, 2024

TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation.
CoRR, 2024

CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations.
CoRR, 2024

i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

Adapting Large Language Model with Speech for Fully Formatted End-to-End Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2024

2023
Diffusion Conditional Expectation Model for Efficient and Robust Target Speech Extraction.
CoRR, 2023

ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation.
CoRR, 2023

i-Code Studio: A Configurable and Composable Framework for Integrative AI.
CoRR, 2023

i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data.
CoRR, 2023

ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Adapting Multi-Lingual ASR Models for Handling Multiple Talkers.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Code-Switching Text Generation and Injection in Mandarin-English ASR.
Proceedings of the IEEE International Conference on Acoustics, 2023

DATA2VEC-SG: Improving Self-Supervised Learning Representations for Speech Generation Tasks.
Proceedings of the IEEE International Conference on Acoustics, 2023

Target Sound Extraction with Variable Cross-Modality Clues.
Proceedings of the IEEE International Conference on Acoustics, 2023

i-Code: An Integrative and Composable Multimodal Learning Framework.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing.
IEEE J. Sel. Top. Signal Process., 2022

The Microsoft System for VoxCeleb Speaker Recognition Challenge 2022.
CoRR, 2022

Deploying self-supervised learning in the wild for hybrid automatic speech recognition.
CoRR, 2022

A Comprehensive Study on Self-Supervised Distillation for Speaker Representation Learning.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Improving Self-Supervised Learning for Speech Recognition with Intermediate Layer Supervision.
Proceedings of the IEEE International Conference on Acoustics, 2022

Optimizing Alignment of Speech and Language Latent Spaces for End-To-End Speech Recognition and Understanding.
Proceedings of the IEEE International Conference on Acoustics, 2022

Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction.
Proceedings of the IEEE International Conference on Acoustics, 2022

Wav2vec-Switch: Contrastive Learning from Original-Noisy Speech Pairs for Robust Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022

Unispeech-Sat: Universal Speech Representation Learning With Speaker Aware Pre-Training.
Proceedings of the IEEE International Conference on Acoustics, 2022

Large-Scale Self-Supervised Speech Representation Learning for Automatic Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2022

SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021
Self-Supervised Learning for speech recognition with Intermediate layer supervision.
CoRR, 2021

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing.
CoRR, 2021

Multilingual Speech Recognition using Knowledge Transfer across Learning Processes.
CoRR, 2021

SpeechT5: Unified-Modal Encoder-Decoder Pre-training for Spoken Language Processing.
CoRR, 2021

Automated Scoring of Spontaneous Speech from Young Learners of English Using Transformers.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Automatic Detection of Word-Level Reading Errors in Non-native English Speech Based on ASR Output.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data.
Proceedings of the 38th International Conference on Machine Learning, 2021

Speech-Language Pre-Training for End-to-End Spoken Language Understanding.
Proceedings of the IEEE International Conference on Acoustics, 2021

2020
Spoken Language Understanding of Human-Machine Conversations for Language Learning Applications.
J. Signal Process. Syst., 2020

Discriminative Transfer Learning for Optimizing ASR and Semantic Labeling in Task-Oriented Spoken Dialog.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

2019
A Bipolar-Input Thermoelectric Energy-Harvesting Interface With Boost/Flyback Hybrid Converter and On-Chip Cold Starter.
IEEE J. Solid State Circuits, 2019

To Trust, or Not to Trust? A Study of Human Bias in Automated Video Interview Assessments.
CoRR, 2019

Scoring Interactional Aspects of Human-Machine Dialog for Language Learning and Assessment using Text Features.
Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue, 2019

An 84% Peak Efficiency Bipolar-Input Boost/Flyback Hybrid Converter With MPPT and on-Chip Cold Starter for Thermoelectric Energy Harvesting.
Proceedings of the IEEE International Solid-State Circuits Conference, 2019

Automatic Detection of Off-Topic Spoken Responses Using Very Deep Convolutional Neural Networks.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Automated Estimation of Oral Reading Fluency During Summer Camp e-Book Reading with MyTurnToRead.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Are Humans Biased in Assessment of Video Interviews?
Proceedings of the Adjunct of the 2019 International Conference on Multimodal Interaction, 2019

Neural Approaches to Automated Speech Scoring of Monologue and Dialogue Responses.
Proceedings of the IEEE International Conference on Acoustics, 2019

Application of an Automatic Plagiarism Detection System in a Large-scale Assessment of English Speaking Proficiency.
Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications, 2019

Using Very Deep Convolutional Neural Networks to Automatically Detect Plagiarized Spoken Responses.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Native Language Identification from Raw Waveforms Using Deep Convolutional Neural Networks with Attentive Pooling.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018
An On-Chip Transformer-Based Self-Startup Hybrid SIDITO Converter for Thermoelectric Energy Harvesting.
IEEE Trans. Circuits Syst. II Express Briefs, 2018

Exploring End-To-End Attention-Based Neural Networks For Native Language Identification.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

A Prompt-Aware Neural Network Approach to Content-Based Scoring of Non-Native Spontaneous Speech.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Automatic Turn-Level Language Identification for Code-Switched Spanish-English Dialog.
Proceedings of the 9th International Workshop on Spoken Dialogue System Technology, 2018

From Speech Signals to Semantics - Tagging Performance at Acoustic, Phonetic and Word Levels.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Unusable Spoken Response Detection with BLSTM Neural Networks.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Improvements to an Automated Content Scoring System for Spoken CALL Responses: the ETS Submission to the Second Spoken CALL Shared Task.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

End-to-End Neural Network Based Automated Speech Scoring.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017
A SIDIDO DC-DC Converter With Dual-Mode and Programmable-Capacitor-Array MPPT Control for Thermoelectric Energy Harvesting.
IEEE Trans. Circuits Syst. II Express Briefs, 2017

Developing speech processing technologies for shared book reading with a computer.
Proceedings of the 6th International Workshop on Child Computer Interaction, 2017

Using an Automated Content Scoring Engine for Spoken CALL Responses: The ETS submission for the Spoken CALL Challenge.
Proceedings of the 7th ISCA International Workshop on Speech and Language Technology in Education, 2017

Improving Sub-Phone Modeling for Better Native Language Identification with Non-Native English Speech.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Bidirectional LSTM-RNN for Improving Automated Assessment of Non-Native Children's Speech.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

A Report on the 2017 Native Language Identification Shared Task.
Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications, 2017

Exploring ASR-free end-to-end modeling to improve spoken language understanding in a cloud-based dialog system.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Improving native language (L1) identifation with better VAD and TDNN trained separately on native and non-native English corpora.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

2016
Modeling F0 trajectories in hierarchically structured deep neural networks.
Speech Commun., 2016

Improving DNN-Based Automatic Recognition of Non-native Children Speech with Adult Speech.
Proceedings of the 5th Workshop on Child Computer Interaction, 2016

Learning Distributed Word Representations For Bidirectional LSTM Recurrent Neural Network.
Proceedings of the NAACL HLT 2016, 2016

Self-Adaptive DNN for Improving Spoken Language Proficiency Assessment.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Noise and Metadata Sensitive Bottleneck Features for Improving Speaker Recognition with Non-Native Speech Input.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

A comparison of ASR and human errors for transcription of non-native spontaneous speech.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Speaker and language factorization in DNN-based TTS synthesis.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Unsupervised speaker adaptation for DNN-based TTS synthesis.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015
Improved mispronunciation detection with deep neural network trained acoustic models and transfer learning based logistic regression classifiers.
Speech Commun., 2015

A Unified Tagging Solution: Bidirectional LSTM Recurrent Neural Network with Word Embedding.
CoRR, 2015

Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Recurrent Neural Network.
CoRR, 2015

An improved DNN-based approach to mispronunciation detection and diagnosis of L2 learners' speech.
Proceedings of the ISCA International Workshop on Speech and Language Technology in Education, 2015

Sequence generation error (SGE) minimization based deep neural networks training for text-to-speech synthesis.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Word embedding for recurrent neural network based TTS synthesis.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Multi-speaker modeling and speaker adaptation for DNN-based TTS synthesis.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Using bidirectional lstm recurrent neural networks to learn high-level abstractions of sequential features for automated scoring of non-native spontaneous speech.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

2014
Cross-Dialectal Voice Conversion with Neural Networks.
IEICE Trans. Inf. Syst., 2014

Dynamic facial expression recognition based on K-order emotional intensity model.
Proceedings of the 2014 IEEE International Conference on Robotics and Biomimetics, 2014

4.3 An 87%-peak-efficiency DVS-capable single-inductor 4-output DC-DC buck converter with ripple-based adaptive off-time control.
Proceedings of the 2014 IEEE International Conference on Solid-State Circuits Conference, 2014

Pitch transformation in neural network based voice conversion.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

A new Neural Network based logistic regression classifier for improving mispronunciation detection of L2 language learners.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Modeling DCT parameterized F0 trajectory at intonation phrase level with DNN or decision tree.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Sequence error (SE) minimization training of neural network for voice conversion.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

TTS synthesis with bidirectional LSTM based recurrent neural networks.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

On the training aspects of Deep Neural Network (DNN) for parametric TTS synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2014

A DNN-based acoustic modeling of tonal language and its application to Mandarin pronunciation training.
Proceedings of the IEEE International Conference on Acoustics, 2014

2013
A Unified Trajectory Tiling Approach to High Quality Speech Rendering.
IEEE Trans. Speech Audio Process., 2013

A new preprocessing algorithm and local binary pattern based facial expression recognition.
Proceedings of the 2013 IEEE/SICE International Symposium on System Integration, 2013

A new DNN-based high quality pronunciation evaluation for computer-aided language learning (CALL).
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

A fast table lookup based, statistical model driven non-uniform unit selection TTS.
Proceedings of the IEEE International Conference on Acoustics, 2013

VCCS controlled LDO with small on-chip capacitor.
Proceedings of the IEEE 10th International Conference on ASIC, 2013

2012
Computer-Assisted Audiovisual Language Learning.
Computer, 2012

Tip tap tones: mobile microtraining of mandarin sounds.
Proceedings of the Mobile HCI '12, 2012

Break index labeling of mandarin text via syntactic-to-prosodic tree mapping.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

A unified trajectory tiling approach to high quality TTS and cross-lingual voice transformation.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Pitch accent detection and prediction with DCT features and CRF model.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Turning a Monolingual Speaker into Multilingual for a Mixed-language TTS.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

2011
Improved Prosody Generation by Maximizing Joint Probability of State and Longer Units.
IEEE Trans. Speech Audio Process., 2011

A New Phonetic Candidate Generator for Improving Search Query Efficiency.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

A frame mapping based HMM approach to cross-lingual voice transformation.
Proceedings of the IEEE International Conference on Acoustics, 2011

Improved F0 modeling and generation in voice conversion.
Proceedings of the IEEE International Conference on Acoustics, 2011

2010
Automatic prosody prediction and detection with Conditional Random Field (CRF) models.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Formant-based frequency warping for improving speaker adaptation in HMM TTS.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

An HMM trajectory tiling (HTT) approach to high quality TTS.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Improved modeling for F0 generation and V/U decision in HMM-based TTS.
Proceedings of the IEEE International Conference on Acoustics, 2010

RIch-context Unit Selection (RUS) approach to high quality TTS.
Proceedings of the IEEE International Conference on Acoustics, 2010

An HMM Trajectory Tiling (HTT) Approach to High Quality TTS - Microsoft Entry to Blizzard Challenge 2010.
Proceedings of the Blizzard Challenge 2010, Kansai Science City, Japan, September 25, 2010, 2010

2009
A Cross-Language State Sharing and Mapping Approach to Bilingual (Mandarin-English) TTS.
IEEE Trans. Speech Audio Process., 2009

A Multi-Space Distribution (MSD) and two-stream tone modeling approach to Mandarin speech recognition.
Speech Commun., 2009

Rich context modeling for high quality HMM-based TTS.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

A minimum v/u error approach to F0 generation in HMM-based TTS.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Improved prosody generation by maximizing joint likelihood of state and longer units.
Proceedings of the IEEE International Conference on Acoustics, 2009

State mapping for cross-language speaker adaptation in TTS.
Proceedings of the IEEE International Conference on Acoustics, 2009

2008
Tone-enhanced generalized character posterior probability (GCPP) for Cantonese LVCSR.
Comput. Speech Lang., 2008

Modeling and Generating Tone Contour with Phrase Intonation for Mandarin Chinese Speech.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

HMM-Based Mixed-Language (Mandarin-English) Speech Synthesis.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

Prosody for Mandarin speech recognition: a comparative study of read and spontaneous speech.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

A real-time text to audio-visual speech synthesis system.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Generating natural F0 trajectory with additive trees.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Duration refinement by jointly optimizing state and longer unit likelihood.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

A cross-language state mapping approach to bilingual (Mandarin-English) TTS.
Proceedings of the IEEE International Conference on Acoustics, 2008

2007
An HMM-based bilingual (Mandarin-English) TTS.
Proceedings of the Sixth ISCA Workshop on Speech Synthesis, 2007

Robust F0 modeling for Mandarin speech recognition in noise.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

2006
Improved Mandarin Speech Recognition by Lattice Rescoring with Enhanced Tone Models.
Proceedings of the Chinese Spoken Language Processing, 5th International Symposium, 2006

An HMM-Based Mandarin Chinese Text-To-Speech System.
Proceedings of the Chinese Spoken Language Processing, 5th International Symposium, 2006

A multi-space distribution (MSD) approach to speech recognition of tonal languages.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

2004
Analysis and modeling of F0 contours for cantonese text-to-speech.
ACM Trans. Asian Lang. Inf. Process., 2004

Tone information as a confidence measure for improving Cantonese LVCSR.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004

2003
Overlapped di-tone modeling for tone recognition in continuous Cantonese speech.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

2002
Acoustical F0 analysis of continuous cantonese speech.
Proceedings of the 2002 International Symposium on Chinese Spoken Language Processing, 2002

Assigning phrase accent to Chinese Text-to-Speech system.
Proceedings of the IEEE International Conference on Acoustics, 2002

2001
Locating Boundaries for Prosodic Constituents in Unrestricted Mandarin Texts.
Int. J. Comput. Linguistics Chin. Lang. Process., 2001

Segmenting unrestricted Chinese text into prosodic words instead of lexical words.
Proceedings of the IEEE International Conference on Acoustics, 2001

