Dong Yu
ORCID: 0000-0003-0520-6844
Affiliations:
- Tencent AI Lab, China
- Microsoft Research, Redmond, WA, USA (1998 - 2017)
- University of Idaho, Moscow, ID, USA (PhD)
According to our database, Dong Yu authored at least 471 papers between 2003 and 2024.
Bibliography
2024
Enhanced Acoustic Howling Suppression via Hybrid Kalman Filter and Deep Learning Models.
IEEE ACM Trans. Audio Speech Lang. Process., 2024
Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning.
CoRR, 2024
DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search.
CoRR, 2024
CoRR, 2024
CoRR, 2024
HDFlow: Enhancing LLM Complex Problem-Solving with Hybrid Thinking and Dynamic Workflows.
CoRR, 2024
CoRR, 2024
CoRR, 2024
CoRR, 2024
SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis.
CoRR, 2024
CoRR, 2024
Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning.
CoRR, 2024
Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning.
CoRR, 2024
MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions.
CoRR, 2024
CoRR, 2024
Entropy Guided Extrapolative Decoding to Improve Factuality in Large Language Models.
CoRR, 2024
Collaborative decoding of critical tokens for boosting factuality of large language models.
CoRR, 2024
From Language Modeling to Instruction Following: Understanding the Behavior Shift in LLMs after Instruction Tuning.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024
A Closer Look at the Self-Verification Abilities of Large Language Models in Logical Reasoning.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024
Sub-Sentence Encoder: Contrastive Learning of Propositional Semantic Representations.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024
Proceedings of the Forty-first International Conference on Machine Learning, 2024
Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment.
Proceedings of the Forty-first International Conference on Machine Learning, 2024
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
UniX-Encoder: A Universal X-Channel Speech Encoder for AD-HOC Microphone Array Speech Processing.
Proceedings of the IEEE International Conference on Acoustics, 2024
Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024
Proceedings of the Findings of the Association for Computational Linguistics: EACL 2024, 2024
Proceedings of the Findings of the Association for Computational Linguistics: EACL 2024, 2024
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024
Generative Pre-trained Speech Language Model with Efficient Hierarchical Transformer.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024
Proceedings of the Findings of the Association for Computational Linguistics, 2024
Proceedings of the Findings of the Association for Computational Linguistics, 2024
Proceedings of the Findings of the Association for Computational Linguistics, 2024
Proceedings of the Findings of the Association for Computational Linguistics, 2024
Make-A-Voice: Revisiting Voice Large Language Models as Scalable Multilingual and Multitask Learners.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024
SportsMetrics: Blending Text and Numerical Data to Understand Information Fusion in LLMs.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024
2023
IEEE Signal Process. Mag., July, 2023
Search-engine-augmented dialogue response generation with cheaply supervised query production.
Artif. Intell., June, 2023
IEEE ACM Trans. Audio Speech Lang. Process., 2023
IEEE ACM Trans. Audio Speech Lang. Process., 2023
IEEE ACM Trans. Audio Speech Lang. Process., 2023
Unsupervised TTS Acoustic Modeling for TTS With Conditional Disentangled Sequential VAE.
IEEE ACM Trans. Audio Speech Lang. Process., 2023
Towards Unified All-Neural Beamforming for Time and Frequency Domain Speech Separation.
IEEE ACM Trans. Audio Speech Lang. Process., 2023
Trans. Assoc. Comput. Linguistics, 2023
Discover, Explain, Improve: An Automatic Slice Detection Benchmark for Natural Language Processing.
Trans. Assoc. Comput. Linguistics, 2023
CoRR, 2023
CoRR, 2023
TencentLLMEval: A Hierarchical Evaluation of Real-World Capabilities for Human-Aligned LLMs.
CoRR, 2023
RIR-SF: Room Impulse Response Based Spatial Feature for Multi-channel Multi-talker ASR.
CoRR, 2023
Unifying Robustness and Fidelity: A Comprehensive Study of Pretrained Generative Methods for Speech Enhancement in Adverse Conditions.
CoRR, 2023
CoRR, 2023
A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation.
CoRR, 2023
3D Neural Beamforming for Multi-channel Speech Separation Against Location Uncertainty.
CoRR, 2023
InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt.
CoRR, 2023
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Hybrid AHS: A Hybrid of Kalman Filter and Deep Learning for Acoustic Howling Suppression.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Zoneformer: On-device Neural Beamformer For In-car Multi-zone Speech Separation, Enhancement and Echo Cancellation.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the Findings of the Association for Computational Linguistics: IJCNLP-AACL 2023, 2023
Proceedings of the Eleventh International Conference on Learning Representations, 2023
Proceedings of the Eleventh International Conference on Learning Representations, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023
Bridging Continuous and Discrete Spaces: Interpretable Sentence Representation Learning via Compositional Operations.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2023
How do Words Contribute to Sentence Semantics? Revisiting Sentence Embeddings with a Perturbation Method.
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2023
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
Neuralecho: Hybrid of Full-Band and Sub-Band Recurrent Neural Network For Acoustic Echo Cancellation and Speech Enhancement.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023
Going Beyond Sentence Embeddings: A Token-Level Matching Algorithm for Calculating Semantic Textual Similarity.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2023
Bi-level Finetuning with Task-dependent Similarity Structure for Low-resource Training.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023
Prosody-TTS: Improving Prosody with Masked Autoencoder and Conditional Diffusion Model For Expressive Text-to-Speech.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023
2022
ACM Trans. Graph., 2022
IEEE Signal Process. Lett., 2022
C3-DINO: Joint Contrastive and Non-Contrastive Self-Supervised Learning for Speaker Verification.
IEEE J. Sel. Top. Signal Process., 2022
Deep learning based multi-source localization with source splitting and its effectiveness in multi-talker speech recognition.
Comput. Speech Lang., 2022
An investigation of neural uncertainty estimation for target speaker extraction equipped RNN transducer.
Comput. Speech Lang., 2022
Discover, Explanation, Improvement: Automatic Slice Detection Framework for Natural Language Processing.
CoRR, 2022
UTTS: Unsupervised TTS with Conditional Disentangled Sequential Variational Auto-encoder.
CoRR, 2022
NeuralEcho: A Self-Attentive Recurrent Neural Network For Unified Acoustic Echo Suppression And Speech Enhancement.
CoRR, 2022
DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs.
CoRR, 2022
Proceedings of the Uncertainty in Artificial Intelligence, 2022
Proceedings of the IEEE Spoken Language Technology Workshop, 2022
Progressive Contrastive Learning for Self-Supervised Text-Independent Speaker Verification.
Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022
BDDM: Bilateral Denoising Diffusion Models for Fast and High-Quality Speech Synthesis.
Proceedings of the Tenth International Conference on Learning Representations, 2022
Proceedings of the IEEE International Conference on Data Mining Workshops, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
VCVTS: Multi-Speaker Video-to-Speech Synthesis Via Cross-Modal Knowledge Transfer from Voice Conversion.
Proceedings of the IEEE International Conference on Acoustics, 2022
Consistent Training and Decoding for End-to-End Speech Recognition Using Lattice-Free MMI.
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
DP-DWA: Dual-Path Dynamic Weight Attention Network With Streaming DFSMN-SAN For Automatic Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022
Referee: Towards Reference-Free Cross-Speaker Style Transfer with Low-Quality Data for Expressive Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2022
Robust Disentangled Variational Speech Representation Learning for Zero-Shot Voice Conversion.
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2022
C-MORE: Pretraining to Answer Open-Domain Questions by Consulting Millions of References.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2022
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022
2021
IEEE ACM Trans. Audio Speech Lang. Process., 2021
IEEE ACM Trans. Audio Speech Lang. Process., 2021
IEEE ACM Trans. Audio Speech Lang. Process., 2021
IEEE ACM Trans. Audio Speech Lang. Process., 2021
Complex Neural Spatial Filter: Enhancing Multi-Channel Target Speech Separation in Complex Domain.
IEEE Signal Process. Lett., 2021
Meta-Voice: Fast few-shot style transfer for expressive voice cloning using meta learning.
CoRR, 2021
CoRR, 2021
WPD++: An Improved Neural Beamformer for Simultaneous Speech Separation and Dereverberation.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021
Neural Mask based Multi-channel Convolutional Beamforming for Joint Dereverberation, Echo Cancellation and Denoising.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021
Effective Low-Cost Time-Domain Audio Separation Using Globally Attentive Locally Recurrent Networks.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021
Proceedings of the Natural Language Processing and Chinese Computing, 2021
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
TeCANet: Temporal-Contextual Attention Network for Environment-Aware Speech Dereverberation.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Raw Waveform Encoder with Multi-Scale Globally Attentive Locally Recurrent Networks for End-to-End Speech Recognition.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
Self-Supervised Text-Independent Speaker Verification Using Prototypical Momentum Contrastive Learning.
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
Directional ASR: A New Paradigm for E2E Multi-Speaker Speech Recognition with Source Localization.
Proceedings of the IEEE International Conference on Acoustics, 2021
Improving RNN Transducer with Target Speaker Extraction and Neural Uncertainty Estimation.
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
Sandglasset: A Light Multi-Granularity Self-Attentive Network for Time-Domain Speech Separation.
Proceedings of the IEEE International Conference on Acoustics, 2021
Learned Transferable Architectures Can Surpass Hand-Designed Architectures for Large Scale Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021
Self-Teaching Machines to Read and Comprehend with Large-Scale Multi-Subject Question-Answering Data.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, 2021
Connect-the-Dots: Bridging Semantics between Words and Definitions via Aligning Word Sense Inventories.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021
Speaker Independent and Multilingual/Mixlingual Speech-Driven Talking Head Generation Using Phonetic Posteriorgrams.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021
Proceedings of the 3rd Conference on Automated Knowledge Base Construction, 2021
Proceedings of the Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021
NaturalConv: A Chinese Dialogue Dataset Towards Multi-turn Topic-driven Conversation.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021
Tune-In: Training Under Negative Environments with Interference for Attention Networks Simulating Cocktail Party Effect.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021
2020
IEEE ACM Trans. Audio Speech Lang. Process., 2020
Trans. Assoc. Comput. Linguistics, 2020
Neural Networks, 2020
Audio-Visual Speech Separation and Dereverberation With a Two-Stage Multimodal Network.
IEEE J. Sel. Top. Signal Process., 2020
IEEE J. Sel. Top. Signal Process., 2020
TexSmart: A Text Understanding System for Fine-Grained NER and Enhanced Semantic Analysis.
CoRR, 2020
Phonetic Posteriorgrams based Many-to-Many Singing Voice Conversion via Adversarial Training.
CoRR, 2020
CoRR, 2020
Proceedings of the Twenty-Ninth Text REtrieval Conference, 2020
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020
DurIAN-SC: Duration Informed Attention Network Based Singing Voice Conversion System.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Investigating Robustness of Adversarial Samples Detection for Automatic Speaker Verification.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Far-Field Location Guided Target Speech Extraction Using End-to-End Speech Recognition Objectives.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Multi-Level Deep Neural Network Adaptation for Speaker Verification Using MMD and Consistency Regularization.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Mixup-breakdown: A Consistency Training Method for Improving Generalization of Speech Separation Models.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Speaker-Aware Target Speaker Enhancement by Jointly Learning with Speaker Embedding Extraction.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Code-Switched Speech Synthesis Using Bilingual Phonetic Posteriorgram with Only Monolingual Corpora.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020
Proceedings of the Computer Vision - ECCV 2020, 2020
Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020
MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020
2019
Trans. Assoc. Comput. Linguistics, 2019
Erratum to: Past review, current progress, and challenges ahead on the cocktail party problem.
Frontiers Inf. Technol. Electron. Eng., 2019
CoRR, 2019
Multi-Document Summarization with Determinantal Point Processes and Contextualized Representations.
CoRR, 2019
CoRR, 2019
CoRR, 2019
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Extract, Adapt and Recognize: An End-to-End Neural Network for Corrupted Monaural Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Neural Spatial Filter: Target Speaker Speech Separation Assisted with Directional Information.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-Trained BERT.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019
Unsupervised Speech Recognition via Segmental Empirical Output Distribution Matching.
Proceedings of the 7th International Conference on Learning Representations, 2019
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019
Proceedings of the IEEE International Conference on Acoustics, 2019
Proceedings of the IEEE International Conference on Acoustics, 2019
Proceedings of the IEEE International Conference on Acoustics, 2019
Enhancing Hybrid Self-attention Structure with Relative-position-aware Bias for Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2019
Joint Training of Complex Ratio Mask Based Beamformer and Acoustic Model for Noise Robust ASR.
Proceedings of the IEEE International Conference on Acoustics, 2019
A Comparison of Lattice-free Discriminative Training Criteria for Purely Sequence-trained Neural Network Acoustic Models.
Proceedings of the IEEE International Conference on Acoustics, 2019
Quasi-fully Convolutional Neural Network with Variational Inference for Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2019
Learning Discriminative Features in Sequence Training without Requiring Framewise Labelled Data.
Proceedings of the IEEE International Conference on Acoustics, 2019
Proceedings of the IEEE International Conference on Acoustics, 2019
Proceedings of the IEEE International Conference on Acoustics, 2019
Component Fusion: Learning Replaceable Language Model Component for End-to-end Speech Recognition System.
Proceedings of the IEEE International Conference on Acoustics, 2019
Boundary Discriminative Large Margin Cosine Loss for Text-independent Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2019
Proceedings of the IEEE International Conference on Acoustics, 2019
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019
Proceedings of the 23rd Conference on Computational Natural Language Learning, 2019
Proceedings of the 23rd Conference on Computational Natural Language Learning, 2019
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019
Syllable-Dependent Discriminative Learning for Small Footprint Text-Dependent Speaker Verification.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019
Alleviate Cross-chunk Permutation through Chunk-level Speaker Embedding for Blind Speech Separation.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019
Proceedings of the 2nd Workshop on Machine Reading for Question Answering, 2019
2018
Speech Commun., 2018
Frontiers Inf. Technol. Electron. Eng., 2018
An Exploration of Directly Using Word as Acoustic Modeling Unit for Speech Recognition.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Rapid Style Adaptation Using Residual Error Embedding for Expressive Speech Synthesis.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Improving Attention Based Sequence-to-Sequence Models for End-to-End English Conversational Speech Recognition.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Deep Extractor Network for Target Speaker Recovery from Single Channel Speech Mixtures.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Permutation Invariant Training of Generative Adversarial Network for Monaural Speech Separation.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Monaural Multi-Talker Speech Recognition with Attention Mechanism and Gated Convolutional Networks.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Adaptive Permutation Invariant Training with Auxiliary Information for Monaural Multi-Talker Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Knowledge Transfer in Permutation Invariant Training for Single-Channel Multi-Talker Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018
2017
IEEE ACM Trans. Audio Speech Lang. Process., 2017
Multitalker Speech Separation With Utterance-Level Permutation Invariant Training of Deep Recurrent Neural Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2017
IEEE CAA J. Autom. Sinica, 2017
Multi-talker Speech Separation and Tracing with Permutation Invariant Training of Deep Recurrent Neural Networks.
CoRR, 2017
Joint separation and denoising of noisy multi-talker speech using recurrent neural networks and permutation invariant training.
Proceedings of the 27th IEEE International Workshop on Machine Learning for Signal Processing, 2017
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Permutation invariant training of deep models for speaker-independent multi-talker speech separation.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017
Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017
Discriminative Beamforming with Phase-Aware Neural Networks for Speech Enhancement and Recognition.
Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017
Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017
2016
Neural Network Based Multi-Factor Aware Joint Training for Robust Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2016
Proceedings of the NAACL HLT 2016, 2016
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
Prediction-adaptation-correction recurrent neural networks for low-resource language speech recognition.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
Integrated adaptation with multi-factor joint-learning for far-field speech recognition.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
2015
IEEE ACM Trans. Audio Speech Lang. Process., 2015
IEEE ACM Trans. Audio Speech Lang. Process., 2015
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015
Improving speech recognition in reverberation using a room-aware deep neural network and multi-task learning.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015
2014
IEEE ACM Trans. Audio Speech Lang. Process., 2014
A fast maximum likelihood nonlinear feature transformation method for GMM-HMM speaker adaptation.
Neurocomputing, 2014
Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014
An introduction to computational networks and the computational network toolkit (invited talk).
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
Multi-accent deep neural network acoustic model with accent-specific top layer using the KLD-regularized model adaptation.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
A comparative analytic study on the Gaussian mixture and context dependent deep neural network hidden Markov models.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
Proceedings of the IEEE International Conference on Acoustics, 2014
Singular value decomposition based low-footprint speaker adaptation and personalization for deep neural network.
Proceedings of the IEEE International Conference on Acoustics, 2014
Proceedings of the IEEE International Conference on Acoustics, 2014
Proceedings of the IEEE International Conference on Acoustics, 2014
Proceedings of the IEEE International Conference on Acoustics, 2014
Proceedings of the IEEE International Conference on Acoustics, 2014
2013
The Deep Tensor Neural Network With Applications to Large Vocabulary Speech Recognition.
IEEE Trans. Speech Audio Process., 2013
Modeling Spectral Envelopes Using Restricted Boltzmann Machines and Deep Belief Networks for Statistical Parametric Speech Synthesis.
IEEE Trans. Speech Audio Process., 2013
IEEE Signal Process. Lett., 2013
Neurocomputing, 2013
Proceedings of the 1st International Conference on Learning Representations, 2013
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013
Semi-supervised GMM and DNN acoustic model training with multi-system combination and confidence re-calibration.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013
Exploring convolutional neural network structures and optimization techniques for speech recognition.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013
KL-divergence regularized deep neural network adaptation for improved large vocabulary speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2013
Error back propagation for sequence training of Context-Dependent Deep Networks for conversational speech transcription.
Proceedings of the IEEE International Conference on Acoustics, 2013
Proceedings of the IEEE International Conference on Acoustics, 2013
Modeling spectral envelopes using restricted Boltzmann machines for statistical parametric speech synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2013
Cross-language knowledge transfer using multilingual deep neural network with shared hidden layers.
Proceedings of the IEEE International Conference on Acoustics, 2013
Proceedings of the IEEE International Conference on Acoustics, 2013
A deep convolutional neural network using heterogeneous pooling for trading acoustic invariance with phonetic confusion.
Proceedings of the IEEE International Conference on Acoustics, 2013
Proceedings of the IEEE International Conference on Acoustics, 2013
2012
Introduction to the Special Section on Deep Learning for Speech and Language Processing.
IEEE Trans. Speech Audio Process., 2012
Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition.
IEEE Trans. Speech Audio Process., 2012
Pattern Recognit. Lett., 2012
Adaptation of context-dependent deep neural networks for automatic speech recognition.
Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), 2012
Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), 2012
Improving wideband speech recognition using mixed-bandwidth training data in CD-DNN-HMM.
Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), 2012
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012
Exploiting sparseness in deep neural networks for large vocabulary speech recognition.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012
Boosting attribute and phone estimation accuracies with deep neural networks for detection-based speech recognition.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012
A deep architecture with bilinear modeling of hidden representations: Applications to phonetic recognition.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012
2011
IEEE ACM Trans. Audio Speech Lang. Process., 2011
Deep Learning and Its Applications to Signal and Information Processing [Exploratory DSP].
IEEE Signal Process. Mag., 2011
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011
Proceedings of the IEEE International Conference on Acoustics, 2011
Feature engineering in Context-Dependent Deep Neural Networks for conversational speech transcription.
Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011
2010
IEEE J. Sel. Top. Signal Process., 2010
Active learning and semi-supervised learning for speech recognition: A unified framework using the global entropy reduction maximization criterion.
Comput. Speech Lang., 2010
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010
Investigation of full-sequence training of deep belief networks for speech recognition.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010
Word confidence calibration using a maximum entropy model with constraints on confidence and word distributions.
Proceedings of the IEEE International Conference on Acoustics, 2010
Proceedings of the IEEE International Conference on Acoustics, 2010
Proceedings of the IEEE International Conference on Acoustics, 2010
2009
A Novel Framework and Training Algorithm for Variable-Parameter Hidden Markov Models.
IEEE Trans. Speech Audio Process., 2009
IEEE Signal Process. Mag., 2009
Pattern Recognit. Lett., 2009
A unified framework of HMM adaptation with joint compensation of additive and convolutive distortions.
Comput. Speech Lang., 2009
Hidden conditional random field with distribution constraints for phone classification.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009
Proceedings of the IEEE International Conference on Acoustics, 2009
Discriminative pronunciation learning using phonetic decoder and minimum-classification-error criterion.
Proceedings of the IEEE International Conference on Acoustics, 2009
Proceedings of the IEEE International Conference on Acoustics, 2009
Proceedings of the IEEE International Conference on Acoustics, 2009
Proceedings of the IEEE International Conference on Acoustics, 2009
2008
Robust Speech Recognition Using a Cepstral Minimum-Mean-Square-Error-Motivated Noise Suppressor.
IEEE Trans. Speech Audio Process., 2008
IEEE Trans. Speech Audio Process., 2008
Large-margin minimum classification error training: A theoretical risk minimization perspective.
Comput. Speech Lang., 2008
Improvements on Mel-Frequency Cepstrum Minimum-Mean-Square-Error Noise Suppressor for Robust Speech Recognition.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008
Parameter clustering and sharing in variable-parameter HMMs for noise robust speech recognition.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008
Discriminative training of variable-parameter HMMs for noise robust speech recognition.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008
A minimum-mean-square-error noise reduction algorithm on Mel-frequency cepstra for robust speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2008
Proceedings of the IEEE International Conference on Acoustics, 2008
HMM adaptation using a phase-sensitive acoustic distortion model for environment-robust speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2008
2007
Speaker-adaptive learning of resonance targets in a hidden trajectory model of speech coarticulation.
Comput. Speech Lang., 2007
Improving the quality of alerts and predicting intruder's next goal with Hidden Colored Petri-Net.
Comput. Networks, 2007
Proceedings of the 8th SIGdial Workshop on Discourse and Dialogue, 2007
Proceedings of the First IEEE International Conference on Semantic Computing (ICSC 2007), 2007
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007
Handling phonetic context and speaker variation in a structure-based speech recognizer.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007
Large-Margin Minimum Classification Error Training for Large-Scale Speech Recognition Tasks.
Proceedings of the IEEE International Conference on Acoustics, 2007
A Discriminative Training Framework using N-Best Speech Recognition Transcriptions and Scores for Spoken Utterance Classification.
Proceedings of the IEEE International Conference on Acoustics, 2007
Use of Differential Cepstra as Acoustic Features in Hidden Trajectory Modeling for Phonetic Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2007
High-performance HMM adaptation with joint compensation of additive and convolutive distortions via Vector Taylor Series.
Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007
2006
A bidirectional target-filtering model of speech coarticulation and reduction: two-stage implementation for phonetic recognition.
IEEE Trans. Speech Audio Process., 2006
A lattice search technique for a long-contextual-span hidden trajectory model of speech.
Speech Commun., 2006
An effective and efficient utterance verification technology using word n-gram filler models.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006
Use of incrementally regulated discriminative margins in MCE training for speech recognition.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006
A time-synchronous phonetic decoder for a long-contextual-span hidden trajectory model.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006
2005
J. VLSI Signal Process., 2005
Semiautomatic Improvements of System-Initiative Spoken Dialog Applications Using Interactive Clustering.
IEEE Trans. Speech Audio Process., 2005
Evaluation of a long-contextual-span hidden trajectory model and phonetic recognizer using A* lattice search.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005
Learning statistically characterized resonance targets in a hidden trajectory model of speech coarticulation and reduction.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005
A Hidden Trajectory Model with Bi-directional Target-Filtering: Cascaded vs. Integrated Implementation for Phonetic Recognition.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005
Alert confidence fusion in intrusion detection systems with extended Dempster-Shafer theory.
Proceedings of the 43rd Annual Southeast Regional Conference, 2005
2004
Proceedings of the 8th International Conference on Spoken Language Processing, 2004
Proceedings of the 37th Hawaii International Conference on System Sciences (HICSS-37 2004), 2004
Proceedings of the Applied Cryptography and Network Security, 2004
2003
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003