Dong Yu

Orcid: 0000-0003-0520-6844

Affiliations:
  • Tencent AI Lab, China
  • Microsoft Research, Redmond, WA, USA (1998 - 2017)
  • University of Idaho, Moscow, ID, USA (PhD)


According to our database, Dong Yu authored at least 471 papers between 2003 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Enhanced Acoustic Howling Suppression via Hybrid Kalman Filter and Deep Learning Models.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

SRC-gAudio: Sampling-Rate-Controlled Audio Generation.
CoRR, 2024

Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning.
CoRR, 2024

SePPO: Semi-Policy Preference Optimization for Diffusion Alignment.
CoRR, 2024

DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search.
CoRR, 2024

DivScene: Benchmarking LVLMs for Object Navigation with Diverse Scenes and Objects.
CoRR, 2024

DeFine: Enhancing LLM Decision-Making with Factor Profiles and Analogical Reasoning.
CoRR, 2024

Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks.
CoRR, 2024

HDFlow: Enhancing LLM Complex Problem-Solving with Hybrid Thinking and Dynamic Workflows.
CoRR, 2024

Video-to-Audio Generation with Fine-grained Temporal Semantics.
CoRR, 2024

Preference Alignment Improves Language Model-Based TTS.
CoRR, 2024

EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer.
CoRR, 2024

Cognitive Kernel: An Open-source Agent System towards Generalist Autopilots.
CoRR, 2024

Towards Diverse and Efficient Audio Captioning via Diffusion Models.
CoRR, 2024

STA-V2A: Video-to-Audio Generation with Semantic and Temporal Alignment.
CoRR, 2024

DSBench: How Far Are Data Science Agents to Becoming Data Science Experts?
CoRR, 2024

SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis.
CoRR, 2024

Comparing Discrete and Continuous Space LLMs for Speech Recognition.
CoRR, 2024

Advancing Multi-talker ASR Performance with Large Language Models.
CoRR, 2024

SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models.
CoRR, 2024

DOCBENCH: A Benchmark for Evaluating LLM-based Document Reading Systems.
CoRR, 2024

Video-to-Audio Generation with Hidden Alignment.
CoRR, 2024

Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning.
CoRR, 2024

LiteSearch: Efficacious Tree Search for LLM.
CoRR, 2024

Scaling Synthetic Data Creation with 1,000,000,000 Personas.
CoRR, 2024

Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning.
CoRR, 2024

MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions.
CoRR, 2024

Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing.
CoRR, 2024

Entropy Guided Extrapolative Decoding to Improve Factuality in Large Language Models.
CoRR, 2024

Conceptual and Unbiased Reasoning in Language Models.
CoRR, 2024

Self-Consistency Boosts Calibration for Math Reasoning.
CoRR, 2024

Can Large Language Models do Analytical Reasoning?
CoRR, 2024

Collaborative decoding of critical tokens for boosting factuality of large language models.
CoRR, 2024

Fine-Grained Self-Endorsement Improves Factuality and Reasoning.
CoRR, 2024

SPECTRUM: Speaker-Enhanced Pre-Training for Long Dialogue Summarization.
CoRR, 2024

From Language Modeling to Instruction Following: Understanding the Behavior Shift in LLMs after Instruction Tuning.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

MMC: Advancing Multimodal Chart Understanding with Large-scale Instruction Tuning.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

A Closer Look at the Self-Verification Abilities of Large Language Models in Logical Reasoning.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

Sub-Sentence Encoder: Contrastive Learning of Propositional Semantic Representations.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

Polarity Calibration for Opinion Summarization.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

Prompt-guided Precise Audio Editing with Diffusion Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

The Trickle-down Impact of Reward Inconsistency on RLHF.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

uSee: Unified Speech Enhancement And Editing with Conditional Diffusion Models.
Proceedings of the IEEE International Conference on Acoustics, 2024

SPATIALCODEC: Neural Spatial Speech Coding.
Proceedings of the IEEE International Conference on Acoustics, 2024

UniX-Encoder: A Universal X-Channel Speech Encoder for AD-HOC Microphone Array Speech Processing.
Proceedings of the IEEE International Conference on Acoustics, 2024

Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

When Reasoning Meets Information Aggregation: A Case Study with Sports Narratives.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Abstraction-of-Thought Makes Language Models Better Reasoners.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Skills-in-Context: Unlocking Compositionality in Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Dense X Retrieval: What Retrieval Granularity Should We Use?
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Inconsistent dialogue responses and how to recover from them.
Proceedings of the Findings of the Association for Computational Linguistics: EACL 2024, 2024

Event Semantic Classification in Context.
Proceedings of the Findings of the Association for Computational Linguistics: EACL 2024, 2024

MinT: Boosting Generalization in Mathematical Reasoning via Multi-view Fine-tuning.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

A Knowledge Plug-and-Play Test Bed for Open-domain Dialogue Generation.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

Generative Pre-trained Speech Language Model with Efficient Hierarchical Transformer.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Fact-and-Reflection (FaR) Improves Confidence Calibration of Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

MM-LLMs: Recent Advances in MultiModal Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

Improving LLM Generations via Fine-Grained Self-Endorsement.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

InFoBench: Evaluating Instruction Following Ability in Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

Make-A-Voice: Revisiting Voice Large Language Models as Scalable Multilingual and Multitask Learners.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

CLOMO: Counterfactual Logical Modification with Large Language Models.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

SportsMetrics: Blending Text and Numerical Data to Understand Information Fusion in LLMs.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
Twenty-Five Years of Evolution in Speech and Language Processing.
IEEE Signal Process. Mag., July, 2023

Search-engine-augmented dialogue response generation with cheaply supervised query production.
Artif. Intell., June, 2023

Neural Target Speech Extraction: An overview.
IEEE Signal Process. Mag., May, 2023

Diffsound: Discrete Diffusion Model for Text-to-Sound Generation.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

D²PSG: Multi-Party Dialogue Discourse Parsing as Sequence Generation.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Integrating Lattice-Free MMI Into End-to-End Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Unsupervised TTS Acoustic Modeling for TTS With Conditional Disentangled Sequential VAE.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Towards Unified All-Neural Beamforming for Time and Frequency Domain Speech Separation.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

OpenFact: Factuality Enhanced Open Knowledge Extraction.
Trans. Assoc. Comput. Linguistics, 2023

Discover, Explain, Improve: An Automatic Slice Detection Benchmark for Natural Language Processing.
Trans. Assoc. Comput. Linguistics, 2023

Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention.
CoRR, 2023

Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models.
CoRR, 2023

TencentLLMEval: A Hierarchical Evaluation of Real-World Capabilities for Human-Aligned LLMs.
CoRR, 2023

RIR-SF: Room Impulse Response Based Spatial Feature for Multi-channel Multi-talker ASR.
CoRR, 2023

A High Fidelity and Low Complexity Neural Audio Coding.
CoRR, 2023

The Trickle-down Impact of Reward (In-)consistency on RLHF.
CoRR, 2023

Stabilizing RLHF through Advantage Model and Selective Rehearsal.
CoRR, 2023

Unifying Robustness and Fidelity: A Comprehensive Study of Pretrained Generative Methods for Speech Enhancement in Adverse Conditions.
CoRR, 2023

LASER: LLM Agent with State-Space Exploration for Web Navigation.
CoRR, 2023

M3-AUDIODEC: Multi-channel multi-speaker multi-spatial audio codec.
CoRR, 2023

Skills-in-Context Prompting: Unlocking Compositionality in Large Language Models.
CoRR, 2023

A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation.
CoRR, 2023

Make-A-Voice: Unified Voice Synthesis With Discrete Representation.
CoRR, 2023

PIVOINE: Instruction Tuning for Open-world Information Extraction.
CoRR, 2023

Open-Domain Event Graph Induction for Mitigating Framing Bias.
CoRR, 2023

3D Neural Beamforming for Multi-channel Speech Separation Against Location Uncertainty.
CoRR, 2023

InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt.
CoRR, 2023

KalmanNet: A Learnable Kalman Filter for Acoustic Echo Cancellation.
CoRR, 2023

Thrust: Adaptively Propels Large Language Models with External Knowledge.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Hybrid AHS: A Hybrid of Kalman Filter and Deep Learning for Acoustic Howling Suppression.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Compressed MoE ASR Model Based on Knowledge Distillation and Quantization.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Multi-mode Neural Speech Coding Based on Deep Generative Networks.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Bayes Risk Transducer: Transducer with Controllable Alignment Prediction.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Zoneformer: On-device Neural Beamformer For In-car Multi-zone Speech Separation, Enhancement and Echo Cancellation.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Unsupervised Multi-document Summarization with Holistic Inference.
Proceedings of the Findings of the Association for Computational Linguistics: IJCNLP-AACL 2023, 2023

Bayes Risk CTC: Controllable CTC Alignment in Sequence-to-Sequence Tasks.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Knowledge-in-Context: Towards Knowledgeable Semi-Parametric Language Models.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Deep Neural Mel-Subband Beamformer for in-Car Speech Separation.
Proceedings of the IEEE International Conference on Acoustics, 2023

Trinet: Stabilizing Self-Supervised Learning From Complete or Slow Collapse.
Proceedings of the IEEE International Conference on Acoustics, 2023

More Than Spoken Words: Nonverbal Message Extraction and Generation.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

On the Dimensionality of Sentence Embeddings.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

PIVOINE: Instruction Tuning for Open-world Entity Profiling.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Bridging Continuous and Discrete Spaces: Interpretable Sentence Representation Learning via Compositional Operations.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Friend-training: Learning from Models of Different but Related Tasks.
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2023

How do Words Contribute to Sentence Semantics? Revisiting Sentence Embeddings with a Perturbation Method.
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2023

Neuralkalman: A Learnable Kalman Filter for Acoustic Echo Cancellation.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Neuralecho: Hybrid of Full-Band and Sub-Band Recurrent Neural Network For Acoustic Echo Cancellation and Speech Enhancement.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

SafeConv: Explaining and Correcting Conversational Unsafe Behavior.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

OASum: Large-Scale Open Domain Aspect-based Summarization.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple Tasks.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

Going Beyond Sentence Embeddings: A Token-Level Matching Algorithm for Calculating Semantic Textual Similarity.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2023

Bi-level Finetuning with Task-dependent Similarity Structure for Low-resource Training.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

Prosody-TTS: Improving Prosody with Masked Autoencoder and Conditional Diffusion Model For Expressive Text-to-Speech.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

Faithful Question Answering with Monte-Carlo Planning.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Generating User-Engaging News Headlines.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022
High-Fidelity 3D Digital Human Head Creation from RGB-D Selfies.
ACM Trans. Graph., 2022

Improving Mandarin End-to-End Speech Recognition With Word N-Gram Language Model.
IEEE Signal Process. Lett., 2022

C3-DINO: Joint Contrastive and Non-Contrastive Self-Supervised Learning for Speaker Verification.
IEEE J. Sel. Top. Signal Process., 2022

Deep learning based multi-source localization with source splitting and its effectiveness in multi-talker speech recognition.
Comput. Speech Lang., 2022

An investigation of neural uncertainty estimation for target speaker extraction equipped RNN transducer.
Comput. Speech Lang., 2022

Discover, Explanation, Improvement: Automatic Slice Detection Framework for Natural Language Processing.
CoRR, 2022

Cross-Lingual Speaker Identification Using Distant Supervision.
CoRR, 2022

UTTS: Unsupervised TTS with Conditional Disentangled Sequential Variational Auto-encoder.
CoRR, 2022

LAE: Language-Aware Encoder for Monolingual and Multilingual ASR.
CoRR, 2022

NeuralEcho: A Self-Attentive Recurrent Neural Network For Unified Acoustic Echo Suppression And Speech Enhancement.
CoRR, 2022

Integrate Lattice-Free MMI into End-to-End Speech Recognition.
CoRR, 2022

DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs.
CoRR, 2022

Meta-learning without data via Wasserstein distributionally-robust model fusion.
Proceedings of the Uncertainty in Artificial Intelligence, 2022

Efficient Text Analysis with Pre-Trained Neural Network Models.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Progressive Contrastive Learning for Self-Supervised Text-Independent Speaker Verification.
Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022

End-to-End Chinese Speaker Identification.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

DDAM '22: 1st International Workshop on Deepfake Detection for Audio Multimedia.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

3M: Multi-loss, Multi-path and Multi-level Neural Networks for speech recognition.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

LAE: Language-Aware Encoder for Monolingual and Multilingual ASR.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Towards Improved Zero-shot Voice Conversion with Conditional DSVAE.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Joint Neural AEC and Beamforming with Double-Talk Detection.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Automatic Prosody Annotation with Pre-Trained Text-Speech Model.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis.
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

BDDM: Bilateral Denoising Diffusion Models for Fast and High-Quality Speech Synthesis.
Proceedings of the Tenth International Conference on Learning Representations, 2022

ZeroKBC: A Comprehensive Benchmark for Zero-Shot Knowledge Base Completion.
Proceedings of the IEEE International Conference on Data Mining Workshops, 2022

Towards end-to-end Speaker Diarization with Generalized Neural Speaker Clustering.
Proceedings of the IEEE International Conference on Acoustics, 2022

Speechmoe2: Mixture-of-Experts Model with Improved Routing.
Proceedings of the IEEE International Conference on Acoustics, 2022

Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization.
Proceedings of the IEEE International Conference on Acoustics, 2022

VCVTS: Multi-Speaker Video-to-Speech Synthesis Via Cross-Modal Knowledge Transfer from Voice Conversion.
Proceedings of the IEEE International Conference on Acoustics, 2022

Consistent Training and Decoding for End-to-End Speech Recognition Using Lattice-Free MMI.
Proceedings of the IEEE International Conference on Acoustics, 2022

Multi-Channel Multi-Speaker ASR Using 3D Spatial Feature.
Proceedings of the IEEE International Conference on Acoustics, 2022

Fast-Rir: Fast Neural Diffuse Room Impulse Response Generator.
Proceedings of the IEEE International Conference on Acoustics, 2022

DP-DWA: Dual-Path Dynamic Weight Attention Network With Streaming Dfsmn-San For Automatic Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022

Referee: Towards Reference-Free Cross-Speaker Style Transfer with Low-Quality Data for Expressive Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2022

Robust Disentangled Variational Speech Representation Learning for Zero-Shot Voice Conversion.
Proceedings of the IEEE International Conference on Acoustics, 2022

FlowEval: A Consensus-Based Dialogue Evaluation Framework Using Segment Act Flows.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Efficient Zero-shot Event Extraction with Context-Definition Alignment.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

Learning a Grammar Inducer from Massive Uncurated Instructional Videos.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Z-LaVI: Zero-Shot Language Solver Fueled by Visual Imagination.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Salience Allocation as Guidance for Abstractive Summarization.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

MetaLogic: Logical Reasoning Explanations with Fine-Grained Structure.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Toward Unifying Text Segmentation and Long Document Summarization.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Cross-lingual Text-to-SQL Semantic Parsing with Representation Mixup.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

Learning-by-Narrating: Narrative Pre-Training for Zero-Shot Dialogue Comprehension.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2022

C-MORE: Pretraining to Answer Open-Domain Questions by Consulting Millions of References.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2022

Towards Abstractive Grounded Summarization of Podcast Transcripts.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

Variational Graph Autoencoding as Cheap Supervision for AMR Coreference Resolution.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

Improving Machine Reading Comprehension with Contextualized Commonsense Knowledge.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

Hierarchical Context Tagging for Utterance Rewriting.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
Multi-Channel Multi-Frame ADL-MVDR for Target Speech Separation.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Audio-Visual Multi-Channel Integration and Recognition of Overlapped Speech.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Conversational Semantic Role Labeling.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Complex Neural Spatial Filter: Enhancing Multi-Channel Target Speech Separation in Complex Domain.
IEEE Signal Process. Lett., 2021

Meta-Voice: Fast few-shot style transfer for expressive voice cloning using meta learning.
CoRR, 2021

Joint AEC AND Beamforming with Double-Talk Detection using RNN-Transformer.
CoRR, 2021

Bilateral Denoising Diffusion Models.
CoRR, 2021

Generalized RNN beamformer for target speech separation.
CoRR, 2021

WPD++: An Improved Neural Beamformer for Simultaneous Speech Separation and Dereverberation.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Neural Mask based Multi-channel Convolutional Beamforming for Joint Dereverberation, Echo Cancellation and Denoising.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Effective Low-Cost Time-Domain Audio Separation Using Globally Attentive Locally Recurrent Networks.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Distant Finetuning with Discourse Relations for Stance Classification.
Proceedings of the Natural Language Processing and Chinese Computing, 2021

Video-aided Unsupervised Grammar Induction.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

Exploring Cross-lingual Singing Voice Synthesis Using Speech Data.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

MetricNet: Towards Improved Modeling For Non-Intrusive Speech Quality Assessment.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

SpeechMoE: Scaling to Large Acoustic Models with Dynamic Routing Mixture of Experts.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Generalized Spatio-Temporal RNN Beamformer for Target Speech Separation.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

TeCANet: Temporal-Contextual Attention Network for Environment-Aware Speech Dereverberation.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

MIMO Self-Attentive RNN Beamformer for Multi-Speaker Speech Separation.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Raw Waveform Encoder with Multi-Scale Globally Attentive Locally Recurrent Networks for End-to-End Speech Recognition.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Multi-Channel Speaker Verification for Single and Multi-Talker Speech.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Towards Robust Speaker Verification with Target Speaker Enhancement.
Proceedings of the IEEE International Conference on Acoustics, 2021

ADL-MVDR: All Deep Learning MVDR Beamformer for Target Speech Separation.
Proceedings of the IEEE International Conference on Acoustics, 2021

Self-Supervised Text-Independent Speaker Verification Using Prototypical Momentum Contrastive Learning.
Proceedings of the IEEE International Conference on Acoustics, 2021

Contrastive Separative Coding for Self-Supervised Representation Learning.
Proceedings of the IEEE International Conference on Acoustics, 2021

Directional ASR: A New Paradigm for E2E Multi-Speaker Speech Recognition with Source Localization.
Proceedings of the IEEE International Conference on Acoustics, 2021

Improving RNN Transducer with Target Speaker Extraction and Neural Uncertainty Estimation.
Proceedings of the IEEE International Conference on Acoustics, 2021

Replay and Synthetic Speech Detection with Res2Net Architecture.
Proceedings of the IEEE International Conference on Acoustics, 2021

Sandglasset: A Light Multi-Granularity Self-Attentive Network for Time-Domain Speech Separation.
Proceedings of the IEEE International Conference on Acoustics, 2021

Learned Transferable Architectures Can Surpass Hand-Designed Architectures for Large Scale Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2021

Exophoric Pronoun Resolution in Dialogues with Topic Regularization.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Self-Teaching Machines to Read and Comprehend with Large-Scale Multi-Subject Question-Answering Data.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, 2021

Connect-the-Dots: Bridging Semantics between Words and Definitions via Aligning Word Sense Inventories.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Instance-adaptive training with noise-robust losses against noisy labels.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

RAST: Domain-Robust Dialogue Rewriting as Sequence Tagging.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Latency-Controlled Neural Architecture Search for Streaming Speech Recognition.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

3D Spatial Features for Multi-Channel Target Speech Separation.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

Speaker Independent and Multilingual/Mixlingual Speech-Driven Talking Head Generation Using Phonetic Posteriorgrams.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

Do Boat and Ocean Suggest Beach? Dialogue Summarization with External Knowledge.
Proceedings of the 3rd Conference on Automated Knowledge Base Construction, 2021

TexSmart: A System for Enhanced Natural Language Understanding.
Proceedings of the Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

NaturalConv: A Chinese Dialogue Dataset Towards Multi-turn Topic-driven Conversation.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

Tune-In: Training Under Negative Environments with Interference for Attention Networks Simulating Cocktail Party Effect.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
A Framework for Adapting DNN Speaker Embedding Across Languages.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Investigating Prior Knowledge for Challenging Chinese Machine Reading Comprehension.
Trans. Assoc. Comput. Linguistics, 2020

On the localness modeling for the self-attention based end-to-end speech synthesis.
Neural Networks, 2020

Audio-Visual Speech Separation and Dereverberation With a Two-Stage Multimodal Network.
IEEE J. Sel. Top. Signal Process., 2020

Multi-Modal Multi-Channel Target Speech Separation.
IEEE J. Sel. Top. Signal Process., 2020

TexSmart: A Text Understanding System for Fine-Grained NER and Enhanced Semantic Analysis.
CoRR, 2020

Robust Dialogue Utterance Rewriting as Sequence Tagging.
CoRR, 2020

Phonetic Posteriorgrams based Many-to-Many Singing Voice Conversion via Adversarial Training.
CoRR, 2020

Automatic Summarization of Open-Domain Podcast Episodes.
CoRR, 2020

High-Fidelity 3D Digital Human Creation from RGB-D Selfies.
CoRR, 2020

On the Role of Conceptualization in Commonsense Knowledge Graph Construction.
CoRR, 2020

Automatic Summarization of Open-Domain Podcast Episodes.
Proceedings of the Twenty-Ninth Text REtrieval Conference, 2020

Building Digital Human.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

DurIAN-SC: Duration Informed Attention Network Based Singing Voice Conversion System.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Audio-Visual Multi-Channel Recognition of Overlapped Speech.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

DurIAN: Duration Informed Attention Network for Speech Synthesis.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Neural Spatio-Temporal Beamformer for Target Speech Separation.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Peking Opera Synthesis via Duration Informed Attention Network.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Minimum Bayes Risk Training of RNN-Transducer for End-to-End Speech Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Transferring Source Style in Non-Parallel Voice Conversion.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Investigating Robustness of Adversarial Samples Detection for Automatic Speaker Verification.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

End-to-End Multi-Look Keyword Spotting.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Audio-Visual Recognition of Overlapped Speech for the LRS2 Dataset.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

DFSMN-SAN with Persistent Memory Model for Automatic Speech Recognition.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Improving Reverberant Speech Training Using Diffuse Acoustic Simulation.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Far-Field Location Guided Target Speech Extraction Using End-to-End Speech Recognition Objectives.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

End-To-End Accent Conversion Without Using Native Utterances.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Multi-Level Deep Neural Network Adaptation for Speaker Verification Using MMD and Consistency Regularization.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Mixup-breakdown: A Consistency Training Method for Improving Generalization of Speech Separation Models.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Speaker-Aware Target Speaker Enhancement by Jointly Learning with Speaker Embedding Extraction.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Integration of Multi-Look Beamformers for Multi-Channel Keyword Spotting.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

A Random Gossip BMUF Process for Neural Language Modeling.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Enhancing End-to-End Multi-Channel Speech Separation Via Spatial Feature Learning.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

PitchNet: Unsupervised Singing Voice Conversion with Pitch Adversarial Network.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Code-Switched Speech Synthesis Using Bilingual Phonetic Posteriorgram with Only Monolingual Corpora.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Semantic Role Labeling Guided Multi-turn Dialogue ReWriter.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Better Highlighting: Creating Sub-Sentence Summary Highlights.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Comprehensive Image Captioning via Scene Graph Decomposition.
Proceedings of the Computer Vision - ECCV 2020, 2020

The Tencent speech synthesis system for Blizzard Challenge 2020.
Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

Dialogue-Based Relation Extraction.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

Towards Faithful Neural Table-to-Text Generation with Content-Matching Constraints.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

ZPR2: Joint Zero Pronoun Recovery and Resolution using Multi-Task Learning and BERT.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

Structural Information Preserving for Graph-to-Text Generation.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

Recurrent Chunking Mechanisms for Long-Text Machine Reading Comprehension.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

Coordinated Reasoning for Cross-Lingual Knowledge Graph Alignment.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Joint Parsing and Generation for Abstractive Summarization.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Relation Extraction Exploiting Full Dependency Forests.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
DREAM: A Challenge Dataset and Models for Dialogue-Based Reading Comprehension.
Trans. Assoc. Comput. Linguistics, 2019

Erratum to: Past review, current progress, and challenges ahead on the cocktail party problem.
Frontiers Inf. Technol. Electron. Eng., 2019

Synthesising Expressiveness in Peking Opera via Duration Informed Attention Network.
CoRR, 2019

Learning Singing From Speech.
CoRR, 2019

A Unified Framework for Speech Separation.
CoRR, 2019

Multi-Document Summarization with Determinantal Point Processes and Contextualized Representations.
CoRR, 2019

Speech-XLNet: Unsupervised Acoustic Model Pretraining For Self-Attention Networks.
CoRR, 2019

Improving Pre-Trained Multilingual Models with Vocabulary Expansion.
CoRR, 2019

DurIAN: Duration Informed Attention Network For Multimodal Synthesis.
CoRR, 2019

Maximizing Mutual Information for Tacotron.
CoRR, 2019

Learning Word Embeddings with Domain Awareness.
CoRR, 2019

End-to-End Multi-Channel Speech Separation.
CoRR, 2019

Probing Prior Knowledge Needed in Challenging Chinese Machine Reading Comprehension.
CoRR, 2019

Improving Question Answering with External Knowledge.
CoRR, 2019

Improving Machine Reading Comprehension with General Reading Strategies.
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

Improved Speaker-Dependent Separation for CHiME-5 Challenge.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Large Margin Training for Attention Based End-to-End Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Extract, Adapt and Recognize: An End-to-End Neural Network for Corrupted Monaural Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Neural Spatial Filter: Target Speaker Speech Separation Assisted with Directional Information.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-Trained BERT.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

A Comprehensive Study of Speech Separation: Spectrogram vs Waveform Separation.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Unsupervised Neural Aspect Extraction with Sememes.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Unsupervised Speech Recognition via Segmental Empirical Output Distribution Matching.
Proceedings of the 7th International Conference on Learning Representations, 2019

A Fast and Accurate One-Stage Approach to Visual Grounding.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Seq2Seq Attentional Siamese Neural Networks for Text-dependent Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2019

Encrypted Speech Recognition Using Deep Polynomial Networks.
Proceedings of the IEEE International Conference on Acoustics, 2019

Teach an All-rounder with Experts in Different Domains.
Proceedings of the IEEE International Conference on Acoustics, 2019

Enhancing Hybrid Self-attention Structure with Relative-position-aware Bias for Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2019

Joint Training of Complex Ratio Mask Based Beamformer and Acoustic Model for Noise Robust ASR.
Proceedings of the IEEE International Conference on Acoustics, 2019

A Comparison of Lattice-free Discriminative Training Criteria for Purely Sequence-trained Neural Network Acoustic Models.
Proceedings of the IEEE International Conference on Acoustics, 2019

Quasi-fully Convolutional Neural Network with Variational Inference for Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2019

Learning Discriminative Features in Sequence Training without Requiring Framewise Labelled Data.
Proceedings of the IEEE International Conference on Acoustics, 2019

Token-wise Training for Attention Based End-to-end Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2019

Investigating End-to-end Speech Recognition for Mandarin-English Code-switching.
Proceedings of the IEEE International Conference on Acoustics, 2019

Component Fusion: Learning Replaceable Language Model Component for End-to-end Speech Recognition System.
Proceedings of the IEEE International Conference on Acoustics, 2019

Boundary Discriminative Large Margin Cosine Loss for Text-independent Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2019

Multi-band PIT and Model Integration for Improved Multi-channel Speech Separation.
Proceedings of the IEEE International Conference on Acoustics, 2019

Multiplex Word Embeddings for Selectional Preference Acquisition.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

Evidence Sentence Extraction for Machine Reading Comprehension.
Proceedings of the 23rd Conference on Computational Natural Language Learning, 2019

Improving Pre-Trained Multilingual Model with Vocabulary Expansion.
Proceedings of the 23rd Conference on Computational Natural Language Learning, 2019

Improving Speech Enhancement with Phonetic Embedding Features.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Time Domain Audio Visual Speech Separation.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Syllable-Dependent Discriminative Learning for Small Footprint Text-Dependent Speaker Verification.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Alleviate Cross-chunk Permutation through Chunk-level Speaker Embedding for Blind Speech Separation.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Automatic Prosodic Structure Labeling using DNN-BGRU-CRF Hybrid Neural Network.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Prosodic Structure Prediction using Deep Self-attention Neural Network.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Knowledge-aware Pronoun Coreference Resolution.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

Cross-lingual Knowledge Graph Alignment via Graph Matching Neural Network.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

Improving Question Answering with External Knowledge.
Proceedings of the 2nd Workshop on Machine Reading for Question Answering, 2019

2018
Single-channel multi-talker speech recognition with permutation invariant training.
Speech Commun., 2018

Past review, current progress, and challenges ahead on the cocktail party problem.
Frontiers Inf. Technol. Electron. Eng., 2018

Recent Progresses in Deep Learning based Acoustic Models (Updated).
CoRR, 2018

An Exploration of Directly Using Word as Acoustic Modeling Unit for Speech Recognition.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Improving Attention-Based End-to-End ASR Systems with Sequence-Based Loss Functions.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Speech Super-Resolution Using Parallel WaveNet.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

A Multistage Training Framework for Acoustic-to-Word Model.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Text-Dependent Speech Enhancement for Small-Footprint Robust Keyword Detection.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Rapid Style Adaptation Using Residual Error Embedding for Expressive Speech Synthesis.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Improving Attention Based Sequence-to-Sequence Models for End-to-End English Conversational Speech Recognition.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Deep Extractor Network for Target Speaker Recovery from Single Channel Speech Mixtures.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Deep Discriminative Embeddings for Duration Robust Speaker Verification.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Permutation Invariant Training of Generative Adversarial Network for Monaural Speech Separation.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Monaural Multi-Talker Speech Recognition with Attention Mechanism and Gated Convolutional Networks.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Adaptive Permutation Invariant Training with Auxiliary Information for Monaural Multi-Talker Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Knowledge Transfer in Permutation Invariant Training for Single-Channel Multi-Talker Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

XL-NBT: A Cross-lingual Neural Belief Tracking Framework.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018

2017
Toward Human Parity in Conversational Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

Multitalker Speech Separation With Utterance-Level Permutation Invariant Training of Deep Recurrent Neural Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

Recent progresses in deep learning based acoustic models.
IEEE CAA J. Autom. Sinica, 2017

Multi-talker Speech Separation and Tracing with Permutation Invariant Training of Deep Recurrent Neural Networks.
CoRR, 2017

Joint separation and denoising of noisy multi-talker speech using recurrent neural networks and permutation invariant training.
Proceedings of the 27th IEEE International Workshop on Machine Learning for Signal Processing, 2017

Recognizing Multi-Talker Speech with Permutation Invariant Training.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Empirical Evaluation of Parallel Training Algorithms on Acoustic Modeling.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Permutation invariant training of deep models for speaker-independent multi-talker speech separation.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

The Microsoft 2016 conversational speech recognition system.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Advanced Recurrent Neural Networks for Automatic Speech Recognition.
Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017

Discriminative Beamforming with Phase-Aware Neural Networks for Speech Enhancement and Recognition.
Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017

Sequence-Discriminative Training of Neural Networks.
Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017

2016
Neural Network Based Multi-Factor Aware Joint Training for Robust Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

Achieving Human Parity in Conversational Speech Recognition.
CoRR, 2016

Recurrent Support Vector Machines For Slot Tagging In Spoken Language Understanding.
Proceedings of the NAACL HLT 2016, 2016

Deep Convolutional Neural Networks with Layer-Wise Context Expansion and Attention.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Highway long short-term memory RNNs for distant speech recognition.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Prediction-adaptation-correction recurrent neural networks for low-resource language speech recognition.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Deep beamforming networks for multi-channel speech recognition.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Speaker-aware training of LSTM-RNNs for acoustic modelling.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Integrated adaptation with multi-factor joint-learning for far-field speech recognition.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

An investigation into using parallel data for far-field speech recognition.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015
Deep Neural Networks for Single-Channel Multi-Talker Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2015

Using Recurrent Neural Networks for Slot Filling in Spoken Language Understanding.
IEEE ACM Trans. Audio Speech Lang. Process., 2015

The Computational Network Toolkit [Best of the Web].
IEEE Signal Process. Mag., 2015

Speech recognition with prediction-adaptation-correction recurrent neural networks.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Improving speech recognition in reverberation using a room-aware deep neural network and multi-task learning.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Deep bi-directional recurrent networks over spectral windows.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

2014
Convolutional Neural Networks for Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2014

A fast maximum likelihood nonlinear feature transformation method for GMM-HMM speaker adaptation.
Neurocomputing, 2014

Deep Learning: Methods and Applications.
Found. Trends Signal Process., 2014

Spoken language understanding using long short-term memory neural networks.
Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

An introduction to computational networks and the computational network toolkit (invited talk).
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Multi-accent deep neural network acoustic model with accent-specific top layer using the KLD-regularized model adaptation.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

A comparative analytic study on the Gaussian mixture and context dependent deep neural network hidden Markov models.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Speech emotion recognition using deep neural network and extreme learning machine.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Recurrent conditional random field for language understanding.
Proceedings of the IEEE International Conference on Acoustics, 2014

Singular value decomposition based low-footprint speaker adaptation and personalization for deep neural network.
Proceedings of the IEEE International Conference on Acoustics, 2014

Recurrent deep neural networks for robust speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2014

Single-channel mixed speech recognition using deep neural networks.
Proceedings of the IEEE International Conference on Acoustics, 2014

On parallelizability of stochastic gradient descent for speech DNNs.
Proceedings of the IEEE International Conference on Acoustics, 2014

Phone sequence modeling with recurrent neural networks.
Proceedings of the IEEE International Conference on Acoustics, 2014

2013
The Deep Tensor Neural Network With Applications to Large Vocabulary Speech Recognition.
IEEE Trans. Speech Audio Process., 2013

Modeling Spectral Envelopes Using Restricted Boltzmann Machines and Deep Belief Networks for Statistical Parametric Speech Synthesis.
IEEE Trans. Speech Audio Process., 2013

Speech Recognition Using Long-Span Temporal Patterns in a Deep Network Model.
IEEE Signal Process. Lett., 2013

Tensor Deep Stacking Networks.
IEEE Trans. Pattern Anal. Mach. Intell., 2013

Exploiting deep neural networks for detection-based speech recognition.
Neurocomputing, 2013

Feature Learning in Deep Neural Networks - A Study on Speech Recognition Tasks.
Proceedings of the 1st International Conference on Learning Representations, 2013

Recurrent neural networks for language understanding.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Semi-supervised GMM and DNN acoustic model training with multi-system combination and confidence re-calibration.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Deep segmental neural networks for speech recognition.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Exploring convolutional neural network structures and optimization techniques for speech recognition.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

KL-divergence regularized deep neural network adaptation for improved large vocabulary speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2013

Error back propagation for sequence training of Context-Dependent Deep Networks for conversational speech transcription.
Proceedings of the IEEE International Conference on Acoustics, 2013

An investigation of deep neural networks for noise robust speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2013

Modeling spectral envelopes using restricted Boltzmann machines for statistical parametric speech synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2013

Cross-language knowledge transfer using multilingual deep neural network with shared hidden layers.
Proceedings of the IEEE International Conference on Acoustics, 2013

Recent advances in deep learning for speech research at Microsoft.
Proceedings of the IEEE International Conference on Acoustics, 2013

A deep convolutional neural network using heterogeneous pooling for trading acoustic invariance with phonetic confusion.
Proceedings of the IEEE International Conference on Acoustics, 2013

Large-scale malware classification using random projections and neural networks.
Proceedings of the IEEE International Conference on Acoustics, 2013

2012
Introduction to the Special Section on Deep Learning for Speech and Language Processing.
IEEE Trans. Speech Audio Process., 2012

Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition.
IEEE Trans. Speech Audio Process., 2012

Efficient and effective algorithms for training single-hidden-layer neural networks.
Pattern Recognit. Lett., 2012

Adaptation of context-dependent deep neural networks for automatic speech recognition.
Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), 2012

Context-dependent Deep Neural Networks for audio indexing of real-life data.
Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), 2012

Improving wideband speech recognition using mixed-bandwidth training data in CD-DNN-HMM.
Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), 2012

Large Vocabulary Speech Recognition Using Deep Tensor Neural Networks.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Parallel Training for Deep Stacking Networks.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Pipelined Back-Propagation for Context-Dependent Deep Neural Networks.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Exploiting sparseness in deep neural networks for large vocabulary speech recognition.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Boosting attribute and phone estimation accuracies with deep neural networks for detection-based speech recognition.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

A deep architecture with bilinear modeling of hidden representations: Applications to phonetic recognition.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Scalable stacking and learning for building deep architectures.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

2011
Calibration of Confidence Measures in Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2011

Deep Learning and Its Applications to Signal and Information Processing [Exploratory DSP].
IEEE Signal Process. Mag., 2011

In-Car Media Search.
IEEE Signal Process. Mag., 2011

Improved Bottleneck Features Using Pretrained Deep Neural Networks.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Deep Convex Net: A Scalable Architecture for Speech Pattern Classification.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Accelerated Parallelizable Neural Network Learning Algorithm for Speech Recognition.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Conversational Speech Transcription Using Context-Dependent Deep Neural Networks.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Large vocabulary continuous speech recognition with context-dependent DBN-HMMs.
Proceedings of the IEEE International Conference on Acoustics, 2011

Feature engineering in Context-Dependent Deep Neural Networks for conversational speech transcription.
Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

2010
Sequential Labeling Using Deep-Structured Conditional Random Fields.
IEEE J. Sel. Top. Signal Process., 2010

Active learning and semi-supervised learning for speech recognition: A unified framework using the global entropy reduction maximization criterion.
Comput. Speech Lang., 2010

Deep-structured hidden conditional random fields for phonetic recognition.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Investigation of full-sequence training of deep belief networks for speech recognition.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Unscented transform with online distortion estimation for HMM adaptation.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Binary coding of speech spectrograms using a deep auto-encoder.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Word confidence calibration using a maximum entropy model with constraints on confidence and word distributions.
Proceedings of the IEEE International Conference on Acoustics, 2010

Language recognition using deep-structured conditional random fields.
Proceedings of the IEEE International Conference on Acoustics, 2010

Semantic confidence calibration for spoken dialog applications.
Proceedings of the IEEE International Conference on Acoustics, 2010

2009
A Novel Framework and Training Algorithm for Variable-Parameter Hidden Markov Models.
IEEE Trans. Speech Audio Process., 2009

Solving nonlinear estimation problems using splines [Lecture Notes].
IEEE Signal Process. Mag., 2009

Using continuous features in the maximum entropy model.
Pattern Recognit. Lett., 2009

A unified framework of HMM adaptation with joint compensation of additive and convolutive distortions.
Comput. Speech Lang., 2009

Hidden conditional random field with distribution constraints for phone classification.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Cross-lingual speech recognition under runtime resource constraints.
Proceedings of the IEEE International Conference on Acoustics, 2009

Discriminative pronunciation learning using phonetic decoder and minimum-classification-error criterion.
Proceedings of the IEEE International Conference on Acoustics, 2009

Maximizing global entropy reduction for active learning in speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2009

Using collective information in semi-supervised learning for speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2009

A study on multilingual acoustic modeling for large vocabulary ASR.
Proceedings of the IEEE International Conference on Acoustics, 2009

2008
Robust Speech Recognition Using a Cepstral Minimum-Mean-Square-Error-Motivated Noise Suppressor.
IEEE Trans. Speech Audio Process., 2008

An Integrative and Discriminative Technique for Spoken Utterance Classification.
IEEE Trans. Speech Audio Process., 2008

An introduction to voice search.
IEEE Signal Process. Mag., 2008

Large-margin minimum classification error training: A theoretical risk minimization perspective.
Comput. Speech Lang., 2008

Improvements on Mel-Frequency Cepstrum Minimum-Mean-Square-Error Noise Suppressor for Robust Speech Recognition.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

Parameter clustering and sharing in variable-parameter HMMs for noise robust speech recognition.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Discriminative training of variable-parameter HMMs for noise robust speech recognition.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

A minimum-mean-square-error noise reduction algorithm on Mel-frequency cepstra for robust speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2008

Adaptation of compressed HMM parameters for resource-constrained speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2008

HMM adaptation using a phase-sensitive acoustic distortion model for environment-robust speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2008

2007
Speaker-adaptive learning of resonance targets in a hidden trajectory model of speech coarticulation.
Comput. Speech Lang., 2007

Improving the quality of alerts and predicting intruder's next goal with Hidden Colored Petri-Net.
Comput. Networks, 2007

Commute UX: Telephone Dialog System for Location-based Services.
Proceedings of the 8th SIGdial Workshop on Discourse and Dialogue, 2007

Large-Margin Discriminative Training of Hidden Markov Models for Speech Recognition.
Proceedings of the First IEEE International Conference on Semantic Computing (ICSC 2007), 2007

The voice-rate dialog system for consumer ratings.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Automated directory assistance system - from theory to practice.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Handling phonetic context and speaker variation in a structure-based speech recognizer.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Confidence measures for voice search applications.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Voicepedia: towards speech-based access to unstructured information.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Large-Margin Minimum Classification Error Training for Large-Scale Speech Recognition Tasks.
Proceedings of the IEEE International Conference on Acoustics, 2007

A Discriminative Training Framework using N-Best Speech Recognition Transcriptions and Scores for Spoken Utterance Classification.
Proceedings of the IEEE International Conference on Acoustics, 2007

Use of Differential Cepstra as Acoustic Features in Hidden Trajectory Modeling for Phonetic Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2007

High-performance HMM adaptation with joint compensation of additive and convolutive distortions via Vector Taylor Series.
Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007

2006
Structured speech modeling.
IEEE Trans. Speech Audio Process., 2006

A bidirectional target-filtering model of speech coarticulation and reduction: two-stage implementation for phonetic recognition.
IEEE Trans. Speech Audio Process., 2006

A lattice search technique for a long-contextual-span hidden trajectory model of speech.
Speech Commun., 2006

An effective and efficient utterance verification technology using word n-gram filler models.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Use of incrementally regulated discriminative margins in MCE training for speech recognition.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

A time-synchronous phonetic decoder for a long-contextual-Span hidden trajectory model.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

N-Gram Based Filler Model for Robust Grammar Authoring.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

2005
A Speech-Centric Perspective for Human-Computer Interface: A Case Study.
J. VLSI Signal Process., 2005

Semiautomatic Improvements of System-Initiative Spoken Dialog Applications Using Interactive Clustering.
IEEE Trans. Speech Audio Process., 2005

Evaluation of a long-contextual-Span hidden trajectory model and phonetic recognizer using A* lattice search.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Learning statistically characterized resonance targets in a hidden trajectory model of speech coarticulation and reduction.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Maximum Entropy Based Generic Filter for Language Model Adaptation.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

A Hidden Trajectory Model with Bi-directional Target-Filtering: Cascaded vs. Integrated Implementation for Phonetic Recognition.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

Alert confidence fusion in intrusion detection systems with extended Dempster-Shafer theory.
Proceedings of the 43rd Annual Southeast Regional Conference, 2005

2004
Unsupervised learning from users' error correction in speech dictation.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Towards Survivable Intrusion Detection System.
Proceedings of the 37th Hawaii International Conference on System Sciences (HICSS-37 2004), 2004

A Novel Framework for Alert Correlation and Understanding.
Proceedings of the Applied Cryptography and Network Security, 2004

2003
Improved name recognition with user modeling.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

