Lei Xie
ORCID: 0000-0001-8234-0823
Affiliations:
- Northwestern Polytechnical University, School of Computer Science, Xi'an, China
- The Chinese University of Hong Kong, Department of Systems Engineering and Engineering Management, Hong Kong (2006 - 2007)
- City University of Hong Kong, School of Creative Media, Hong Kong (2004 - 2006)
- Northwestern Polytechnical University, Xi'an, China (PhD 2004)
- Vrije Universiteit Brussel, Department of Electronics and Information Processing, Belgium (2001 - 2002)
According to our database, Lei Xie authored at least 405 papers between 2005 and 2024.
Bibliography
2024
METTS: Multilingual Emotional Text-to-Speech by Cross-Speaker and Cross-Lingual Emotion Transfer.
IEEE ACM Trans. Audio Speech Lang. Process., 2024
Distinctive and Natural Speaker Anonymization via Singular Value Transformation-Assisted Matrix.
IEEE ACM Trans. Audio Speech Lang. Process., 2024
Conversational Speech Recognition by Learning Audio-Textual Cross-Modal Contextual Representation.
IEEE ACM Trans. Audio Speech Lang. Process., 2024
IEEE ACM Trans. Audio Speech Lang. Process., 2024
Decoupling and Interacting Multi-Task Learning Network for Joint Speech and Accent Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2024
U-Style: Cascading U-Nets With Multi-Level Speaker and Style Modeling for Zero-Shot Voice Cloning.
IEEE ACM Trans. Audio Speech Lang. Process., 2024
IEEE Signal Process. Lett., 2024
MMGER: Multi-Modal and Multi-Granularity Generative Error Correction With LLM for Joint Accent and Speech Recognition.
IEEE Signal Process. Lett., 2024
Distil-DCCRN: A Small-Footprint DCCRN Leveraging Feature-Based Knowledge Distillation in Speech Enhancement.
IEEE Signal Process. Lett., 2024
Speech Commun., 2024
Takin-VC: Zero-shot Voice Conversion via Jointly Hybrid Content and Memory-Augmented Context-Aware Timbre Modeling.
CoRR, 2024
Ideal-LLM: Integrating Dual Encoders and Language-Adapted LLM for Multilingual Speech-to-Text.
CoRR, 2024
Optimizing Dysarthria Wake-Up Word Spotting: An End-to-End Approach for SLT 2024 LRDWWS Challenge.
CoRR, 2024
Findings of the 2024 Mandarin Stuttering Event Detection and Automatic Speech Recognition Challenge.
CoRR, 2024
CoRR, 2024
CoRR, 2024
CoRR, 2024
Streaming Decoder-Only Automatic Speech Recognition with Discrete Speech Units: A Pilot Study.
CoRR, 2024
Vec-Tok-VC+: Residual-enhanced Robust Zero-shot Voice Conversion with Progressive Constraints in a Dual-mode Training Strategy.
CoRR, 2024
RaD-Net 2: A causal two-stage repairing and denoising speech enhancement network with knowledge distillation and complex axial self-attention.
CoRR, 2024
AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection.
CoRR, 2024
CoRR, 2024
MMGER: Multi-modal and Multi-granularity Generative Error Correction with LLM for Joint Accent and Speech Recognition.
CoRR, 2024
CoRR, 2024
CoRR, 2024
Improving Multimodal Emotion Recognition by Leveraging Acoustic Adaptation and Visual Alignment.
Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing, 2024
UniStyle: Unified Style Modeling for Speaking Style Captioning and Stylistic Speech Synthesis.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024
Boosting Multi-Speaker Expressive Speech Synthesis with Semi-Supervised Contrastive Learning.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024
SSHR: Leveraging Self-supervised Hierarchical Representations for Multilingual Automatic Speech Recognition.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024
Bs-Plcnet: Band-Split Packet Loss Concealment Network with Multi-Task Learning Framework and Multi-Discriminators.
Proceedings of the IEEE International Conference on Acoustics, 2024
Promptvc: Flexible Stylistic Voice Conversion in Latent Space Driven by Natural Language Prompts.
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2024
Dualvc 2: Dynamic Masked Convolution for Unified Streaming and Non-Streaming Voice Conversion.
Proceedings of the IEEE International Conference on Acoustics, 2024
Automatic Channel Selection and Spatial Feature Integration for Multi-Channel Speech Recognition Across Various Array Topologies.
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
An Audio-Quality-Based Multi-Strategy Approach For Target Speaker Extraction in the Misp 2023 Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2024
StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024
2023
A novel evolutionary algorithm inspired from triangle search and its applications on parameters identification of photovoltaic models.
Soft Comput., October, 2023
Neural Networks, January, 2023
Look&listen: Multi-Modal Correlation Learning for Active Speaker Detection and Speech Enhancement.
IEEE Trans. Multim., 2023
IEEE Trans. Multim., 2023
IEEE ACM Trans. Audio Speech Lang. Process., 2023
MSM-VC: High-Fidelity Source Style Transfer for Non-Parallel Voice Conversion by Multi-Scale Style Modeling.
IEEE ACM Trans. Audio Speech Lang. Process., 2023
DiCLET-TTS: Diffusion Model Based Cross-Lingual Emotion Transfer for Text-to-Speech - A Study Between English and Mandarin.
IEEE ACM Trans. Audio Speech Lang. Process., 2023
IEEE Signal Process. Lett., 2023
Joint Training or Not: An Exploration of Pre-trained Speech Models in Audio-Visual Speaker Diarization.
CoRR, 2023
CoRR, 2023
CoRR, 2023
CoRR, 2023
Multi-level Temporal-channel Speaker Retrieval for Robust Zero-shot Voice Conversion.
CoRR, 2023
CoRR, 2023
The NPU-MSXF Speech-to-Speech Translation System for IWSLT 2023 Speech-to-Speech Translation Task.
Proceedings of the 20th International Conference on Spoken Language Translation, 2023
VISinger2: High-Fidelity End-to-End Singing Voice Synthesis Enhanced by Digital Signal Processing Synthesizer.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Two Stage Contextual Word Filtering for Context Bias in Unified Streaming and Non-streaming Transducer.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
TranUSR: Phoneme-to-word Transcoder Based Unified Speech Representation Learning for Cross-lingual Speech Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
DualVC: Dual-mode Voice Conversion using Intra-model Knowledge Distillation and Hybrid Predictive Coding.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
DCCRN-KWS: An Audio Bias Based Model for Noise Robust Small-Footprint Keyword Spotting.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
PromptStyle: Controllable Style Transfer for Text-to-Speech with Natural Language Descriptions.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Contextualized End-to-End Speech Recognition with Contextual Phrase Prediction Network.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Pseudo-Siamese Network based Timbre-reserved Black-box Adversarial Attack in Speaker Identification.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Distance-Based Weight Transfer for Fine-Tuning From Near-Field to Far-Field Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Distinguishable Speaker Anonymization Based on Formant and Fundamental Frequency Scaling.
Proceedings of the IEEE International Conference on Acoustics, 2023
Preserving Background Sound in Noise-Robust Voice Conversion Via Multi-Task Learning.
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Delivering Speaking Style in Low-Resource Voice Conversion with Multi-Factor Constraints.
Proceedings of the IEEE International Conference on Acoustics, 2023
DSPGAN: A Gan-Based Universal Vocoder for High-Fidelity TTS by Time-Frequency Domain Supervision from DSP.
Proceedings of the IEEE International Conference on Acoustics, 2023
Expressive-VC: Highly Expressive Voice Conversion with Attention Fusion of Bottleneck and Perturbation Features.
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the Workshop on Deepfake Audio Detection and Analysis co-located with 32th International Joint Conference on Artificial Intelligence (IJCAI 2023), 2023
Proceedings of the 18th Blizzard Challenge Workshop, Grenoble, France, August 29, 2023, 2023
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
An Exploration of Task-Decoupling on Two-Stage Neural Post Filter for Real-Time Personalized Acoustic Echo Cancellation.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
The Second Multi-Channel Multi-Party Meeting Transcription Challenge (M2MeT 2.0): A Benchmark for Speaker-Attributed ASR.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
HIGNN-TTS: Hierarchical Prosody Modeling With Graph Neural Networks for Expressive Long-Form TTS.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
BA-MoE: Boundary-Aware Mixture-of-Experts Adapter for Code-Switching Speech Recognition.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023
2022
ParaTTS: Learning Linguistic and Prosodic Cross-Sentence Information in Paragraph-Based TTS.
IEEE ACM Trans. Audio Speech Lang. Process., 2022
IEEE ACM Trans. Audio Speech Lang. Process., 2022
MsEmoTTS: Multi-Scale Emotion Transfer, Prediction, and Control for Emotional Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2022
IEEE ACM Trans. Audio Speech Lang. Process., 2022
Cross-Speaker Emotion Transfer Through Information Perturbation in Emotional Speech Synthesis.
IEEE Signal Process. Lett., 2022
Improving data augmentation for low resource speech-to-text translation with diverse paraphrasing.
Neural Networks, 2022
Two-stage streaming keyword detection and localization with multi-scale depthwise temporal convolution.
Neural Networks, 2022
Neural Networks, 2022
MSV Challenge 2022: NPU-HC Speaker Verification System for Low-resource Indian Languages.
CoRR, 2022
VISinger 2: High-Fidelity End-to-End Singing Voice Synthesis Enhanced by Digital Signal Processing Synthesizer.
CoRR, 2022
CoRR, 2022
MFCCA: Multi-Frame Cross-Channel attention for multi-speaker ASR in Multi-party meeting scenario.
CoRR, 2022
An Audio-Visual Attention Based Multimodal Network for Fake Talking Face Videos Detection.
CoRR, 2022
CoRR, 2022
Audio-visual speech separation based on joint feature representation with cross-modal attention.
CoRR, 2022
IQDUBBING: Prosody modeling based on discrete self-supervised speech representation for expressive voice conversion.
CoRR, 2022
MFCCA: Multi-Frame Cross-Channel Attention for Multi-Speaker ASR in Multi-Party Meeting Scenario.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022
Spatial-DCCRN: DCCRN Equipped with Frame-Level Angle Feature and Hybrid Filtering for Multi-Channel Speech Enhancement.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022
Proceedings of the IEEE Spoken Language Technology Workshop, 2022
The ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge (ICSRC): Dataset, Tracks, Baseline and Results.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022
AccentSpeech: Learning Accent from Crowd-sourced Data for Target Speaker TTS with Accents.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022
Multi-speaker Multi-style Text-to-speech Synthesis with Single-speaker Single-style Training Data Scenarios.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022
TSUP Speaker Diarization System for Conversational Short-phrase Speaker Diarization Challenge.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022
The Conversational Short-phrase Speaker Diarization (CSSD) Task: Dataset, Evaluation Metric and Baselines.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
A Comparative Study on Speaker-attributed Automatic Speech Recognition in Multi-party Meetings.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
CaTT-KWS: A Multi-stage Customized Keyword Spotting Framework based on Cascaded Transducer-Transformer.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Open Source MagicData-RAMC: A Rich Annotated Mandarin Conversational (RAMC) Speech Dataset.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Learning Noise-independent Speech Representation for High-quality Voice Conversion for Noisy Target Speakers.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Learn2Sing 2.0: Diffusion and Mutual Information-Based Target Speaker SVS by Learning from Singing Teacher.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Leveraging Acoustic Contextual Representation by Audio-textual Cross-modal Learning for Conversational ASR.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Opencpop: A High-Quality Open Source Chinese Popular Song Corpus for Singing Voice Synthesis.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Cross-speaker Emotion Transfer Based On Prosody Compensation for End-to-End Speech Synthesis.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Glow-WaveGAN 2: High-quality Zero-shot Text-to-speech Synthesis and Any-to-any Voice Conversion.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
VISinger: Variational Inference with Adversarial Learning for End-to-End Singing Voice Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2022
Summary on the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
S-DCCRN: Super Wide Band DCCRN with Learnable Complex Feature for Speech Enhancement.
Proceedings of the IEEE International Conference on Acoustics, 2022
TEA-PSE: Tencent-Ethereal-Audio-Lab Personalized Speech Enhancement System for ICASSP 2022 DNS Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2022
Uformer: A Unet Based Dilated Complex & Real Dual-Path Conformer Network for Simultaneous Speech Enhancement and Dereverberation.
Proceedings of the IEEE International Conference on Acoustics, 2022
2021
LET-Decoder: A WFST-Based Lazy-Evaluation Token-Group Decoder With Exact Lattice Generation.
IEEE Signal Process. Lett., 2021
An Improved Equilibrium Optimizer with Application in Unmanned Aerial Vehicle Path Planning.
Sensors, 2021
Neural Networks, 2021
Effective and direct control of neural TTS prosody by removing interactions between different attributes.
Neural Networks, 2021
CoRR, 2021
CoRR, 2021
Improving robustness of one-shot voice conversion with deep discriminative speaker encoder.
CoRR, 2021
Efficient Conformer with Prob-Sparse Attention Mechanism for End-to-End Speech Recognition.
CoRR, 2021
INTERSPEECH 2021 ConferencingSpeech Challenge: Towards Far-field Multi-Channel Speech Enhancement for Video Conferencing.
CoRR, 2021
CoRR, 2021
Tuna Swarm Optimization: A Novel Swarm-Based Metaheuristic Algorithm for Global Optimization.
Comput. Intell. Neurosci., 2021
Comput. Intell. Neurosci., 2021
Comput. Intell. Neurosci., 2021
IEEE Access, 2021
IEEE Access, 2021
The SLT 2021 Children Speech Recognition Challenge: Open Datasets, Rules and Baselines.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021
Proceedings of the IEEE Spoken Language Technology Workshop, 2021
Learn2Sing: Target Speaker Singing Voice Synthesis by Learning from a Singing Teacher.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021
Cascade RNN-Transducer: Syllable Based Streaming On-Device Mandarin Speech Recognition with a Syllable-To-Character Converter.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021
Proceedings of the IEEE Spoken Language Technology Workshop, 2021
Fine-Grained Emotion Strength Transfer, Control and Prediction for Emotional Speech Synthesis.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021
Proceedings of the IEEE Spoken Language Technology Workshop, 2021
Proceedings of the IEEE Spoken Language Technology Workshop, 2021
IEEE SLT 2021 Alpha-Mini Speech Challenge: Open Datasets, Tracks, Rules and Baselines.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021
DESNet: A Multi-Channel Network for Simultaneous Speech Dereverberation, Enhancement and Separation.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021
Proceedings of the IEEE Spoken Language Technology Workshop, 2021
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
F-T-LSTM Based Complex Network for Joint Acoustic Echo Cancellation and Speech Enhancement.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
WeNet: Production Oriented Streaming and Non-Streaming End-to-End Speech Recognition Toolkit.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Enriching Source Style Transfer in Recognition-Synthesis Based Non-Parallel Voice Conversion.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Efficient Conformer with Prob-Sparse Attention Mechanism for End-to-End Speech Recognition.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Multi-Speaker ASR Combining Non-Autoregressive Conformer CTC and Conditional Speaker Chain.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
AISHELL-4: An Open Source Dataset for Speech Enhancement, Separation, Recognition and Speaker Diarization in Conference Scenario.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Glow-WaveGAN: Learning Speech Representations from GAN-Based Variational Auto-Encoder for High Fidelity Flow-Based Speech Synthesis.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Improving Performance of Seen and Unseen Speech Style Transfer in End-to-End Neural TTS.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the ICMI '21 Companion: Companion Publication of the 2021 International Conference on Multimodal Interaction, Montreal, QC, Canada, October 18, 2021
Proceedings of the ICMI '21 Companion: Companion Publication of the 2021 International Conference on Multimodal Interaction, Montreal, QC, Canada, October 18, 2021
Proceedings of the ICMI '21: International Conference on Multimodal Interaction, 2021
Proceedings of the ICMI '21 Companion: Companion Publication of the 2021 International Conference on Multimodal Interaction, Montreal, QC, Canada, October 18, 2021
Proceedings of the ICMI '21 Companion: Companion Publication of the 2021 International Conference on Multimodal Interaction, Montreal, QC, Canada, October 18, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
The Accented English Speech Recognition Challenge 2020: Open Datasets, Tracks, Baselines, Results and Methods.
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
Duality Temporal-Channel-Frequency Attention Enhanced Speaker Representation Learning.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021
Conferencingspeech Challenge: Towards Far-Field Multi-Channel Speech Enhancement for Video Conferencing.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021
2020
IEEE Trans. Emerg. Top. Comput. Intell., 2020
IEEE ACM Trans. Audio Speech Lang. Process., 2020
ACM Trans. Asian Low Resour. Lang. Inf. Process., 2020
Adversarial Feature Learning and Unsupervised Clustering Based Speech Synthesis for Found Data With Acoustic and Textual Noise.
IEEE Signal Process. Lett., 2020
Neural Networks, 2020
Unified Streaming and Non-streaming Two-pass End-to-end Model for Speech Recognition.
CoRR, 2020
Phonetic Posteriorgrams based Many-to-Many Singing Voice Conversion via Adversarial Training.
CoRR, 2020
A flight maneuver recognition method based on multi-strategy affine canonical time warping.
Appl. Soft Comput., 2020
Sequence to Multi-Sequence Learning via Conditional Chain Mapping for Mixture Signals.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020
NPU Speaker Verification System for INTERSPEECH 2020 Far-Field Speaker Verification Challenge.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Channel-Wise Subband Input for Better Voice and Accompaniment Separation on High Resolution Music.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020
2019
IEEE ACM Trans. Audio Speech Lang. Process., 2019
IEEE Signal Process. Lett., 2019
Pre-Alignment Guided Attention for Improving Training Efficiency and Model Stability in End-to-End Speech Synthesis.
IEEE Access, 2019
Query-by-Example Speech Search Using Recurrent Neural Acoustic Word Embeddings With Temporal Context.
IEEE Access, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Unsupervised Adaptation with Adversarial Dropout Regularization for Robust Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the International Conference on Multimodal Interaction, 2019
Robust Audio-visual Speech Recognition Using Bimodal Dfsmn with Multi-condition Training and Dropout Regularization.
Proceedings of the IEEE International Conference on Acoustics, 2019
Enhancing Hybrid Self-attention Structure with Relative-position-aware Bias for Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2019
Proceedings of the IEEE International Conference on Acoustics, 2019
Adversarial Examples for Improving End-to-end Attention-based Small-footprint Keyword Spotting.
Proceedings of the IEEE International Conference on Acoustics, 2019
Proceedings of the IEEE International Conference on Acoustics, 2019
Component Fusion: Learning Replaceable Language Model Component for End-to-end Speech Recognition System.
Proceedings of the IEEE International Conference on Acoustics, 2019
Domain Adversarial Training for Improving Keyword Spotting Performance of ESL Speech.
Proceedings of the IEEE International Conference on Acoustics, 2019
Proceedings of the IEEE International Conference on Acoustics, 2019
Proceedings of the Blizzard Challenge 2019, Vienna, Austria, September 23, 2019, 2019
Proceedings of the Blizzard Challenge 2019, Vienna, Austria, September 23, 2019, 2019
Controlling Emotion Strength with Relative Attribute for End-to-End Speech Synthesis.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019
Improving Mandarin End-to-End Speech Synthesis by Self-Attention and Learnable Gaussian Bias.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019
Learning Hierarchical Representations for Expressive Speaking Style in End-to-End Speech Synthesis.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019
Multiple fixed beamformers with a spacial Wiener-form postfilter for far-field speech recognition.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019
2018
J. Signal Process. Syst., 2018
J. Signal Process. Syst., 2018
Signal Process., 2018
Unsupervised measure of Chinese lexical semantic similarity using correlated graph model for news story segmentation.
Neurocomputing, 2018
ASMMC-MMAC 2018: The Joint Workshop of 4th the Workshop on Affective Social Multimedia Computing and first Multi-Modal Affective Computing of Large-Scale Multimedia Data Workshop.
Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018
A Refined Query-by-Example Approach to Spoken-Term-Detection on ESL learners' Speech.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018
Learning Acoustic Word Embeddings with Temporal Context for Query-by-Example Speech Search.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Investigating Generative Adversarial Networks Based Speech Dereverberation for Robust Speech Recognition.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Study of Semi-supervised Approaches to Improving English-Mandarin Code-Switching Speech Recognition.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Unsupervised Domain Adaptation via Domain Adversarial Training for Speaker Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Proceedings of the Blizzard Challenge 2018, Hyderabad, India, September 8, 2018, 2018
Proceedings of the Advances in Brain Inspired Cognitive Systems, 2018
2017
Modeling Latent Topics and Temporal Distance for Story Segmentation of Broadcast News.
IEEE ACM Trans. Audio Speech Lang. Process., 2017
IEEE J. Sel. Top. Signal Process., 2017
J. Ambient Intell. Humaniz. Comput., 2017
A hybrid neural network hidden Markov model approach for automatic story segmentation.
J. Ambient Intell. Humaniz. Comput., 2017
J. Ambient Intell. Humaniz. Comput., 2017
Neurocomputing, 2017
Frontiers Comput. Sci., 2017
Frontiers Comput. Sci., 2017
Denoising Recurrent Neural Network for Deep Bidirectional LSTM Based Voice Conversion.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Pairwise learning using multi-lingual bottleneck features for low-resource query-by-example spoken term detection.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017
Proceedings of the Blizzard Challenge 2017, Stockholm, Sweden, August 25, 2017, 2017
Extracting bottleneck features and word-like pairs from untranscribed speech for feature representation.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017
Statistical parametric speech synthesis using generative adversarial networks under a multi-task learning framework.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017
2016
Real-time tracking-by-learning with high-order regularization fusion for big video abstraction.
Signal Process., 2016
Multim. Tools Appl., 2016
Deformable object tracking with spatiotemporal segmentation in big vision surveillance.
Neurocomputing, 2016
Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016
An Automatic Voice Conversion Evaluation Strategy Based on Perceptual Background Noise Distortion and Speaker Similarity.
Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016
Proceedings of the Working Notes Proceedings of the MediaEval 2016 Workshop, 2016
Investigating neural network based query-by-example keyword spotting approach for personalized wake-up word detection in Mandarin Chinese.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016
Learning Neural Network Representations Using Cross-Lingual Bottleneck Features with Word-Pair Information.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Deep Bidirectional LSTM Modeling of Timbre and Prosody for Emotional Voice Conversion.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Toward High-Performance Language-Independent Query-by-Example Spoken Term Detection for MediaEval 2015: Post-Evaluation Analysis.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Unsupervised Bottleneck Features for Low-Resource Query-by-Example Spoken Term Detection.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Proceedings of the 2016 IEEE International Conference on Multimedia & Expo Workshops, 2016
Approximate search of audio queries by using DTW with phone time boundary and data augmentation.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
Proceedings of the Blizzard Challenge 2016, Cupertino, CA, USA, September 16, 2016, 2016
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016
On the use of I-vectors and average voice model for voice conversion without parallel data.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016
Predicting articulatory movement from text using deep architecture with stacked bottleneck features.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016
2015
IEEE Trans. Multim., 2015
Multiple pedestrian tracking based on couple-states Markov chain with semantic topic learning for video surveillance.
Soft Comput., 2015
Soft Comput., 2015
Soft Comput., 2015
Multim. Tools Appl., 2015
Proceedings of the 23rd Annual ACM Conference on Multimedia Conference, MM '15, Brisbane, Australia, October 26, 2015
Proceedings of the Working Notes Proceedings of the MediaEval 2015 Workshop, 2015
Articulatory movement prediction using deep bidirectional long short-term memory based recurrent neural networks and word/phone embeddings.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Regularized non-negative matrix factorization using alternating direction method of multipliers and its application to source separation.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Parallel inference of Dirichlet process Gaussian mixture models for unsupervised acoustic modeling: a feasibility study.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Language independent query-by-example spoken term detection using N-best phone sequences and partial matching.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015
Automatic prosody prediction for Chinese speech synthesis using BLSTM-RNN and embedding features.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015
Non-negative matrix factorization using stable alternating direction method of multipliers for source separation.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015
A waveform representation framework for high-quality statistical parametric speech synthesis.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015
Proceedings of the 2015 International Conference on Affective Computing and Intelligent Interaction, 2015
2014
Multim. Tools Appl., 2014
Multimodal joint information processing in human machine interaction: recent advances.
Multim. Tools Appl., 2014
Proceedings of the Working Notes Proceedings of the MediaEval 2014 Workshop, 2014
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014
Experimental study on dereverberation and noise reduction for distant speech recognition.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014
Intrinsic spectral analysis based on temporal context features for query-by-example spoken term detection.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
Stereo acoustic echo suppression using widely linear filtering in the frequency domain.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
Proceedings of the 2014 IEEE International Conference on Image Processing, 2014
Unsupervised broadcast news story segmentation using distance dependent Chinese restaurant processes.
Proceedings of the IEEE International Conference on Acoustics, 2014
Sentence boundary detection in Chinese broadcast news using conditional random fields and prosodic features.
Proceedings of the IEEE China Summit & International Conference on Signal and Information Processing, 2014
Proceedings of the IEEE China Summit & International Conference on Signal and Information Processing, 2014
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014
Multi-view features in a DNN-CRF model for improved sentence unit detection on English broadcast news.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014
2013
Proceedings of the IEEE International Conference on Acoustics, 2013
Proceedings of the IEEE International Conference on Acoustics, 2013
Measuring semantic similarity by contextual word connections in Chinese news story segmentation.
Proceedings of the IEEE International Conference on Acoustics, 2013
Proceedings of the IEEE International Conference on Acoustics, 2013
Numerical calculation of the head-related transfer functions with Chinese dummy head.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013
Context-dependent deep neural networks for commercial Mandarin speech recognition applications.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013
Broadcast News Story Segmentation Using Manifold Learning on Latent Topic Distributions.
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, 2013
2012
IEEE Trans. Speech Audio Process., 2012
Broadcast News Story Segmentation Using Conditional Random Fields and Multimodal Features.
IEICE Trans. Inf. Syst., 2012
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012
Speech Pattern Discovery using Audio-Visual Fusion and Canonical Correlation Analysis.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2012
2011
Pitch-density-based features and an SVM binary tree approach for multi-class audio classification in broadcast news.
Multim. Syst., 2011
On the effectiveness of subwords for lexical cohesion based story segmentation of Chinese broadcast news.
Inf. Sci., 2011
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011
2010
Inf. Sci., 2010
Speech and Auditory Interfaces for Ubiquitous, Immersive and Personalized Applications.
Proceedings of the Symposia and Workshops on Ubiquitous, 2010
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010
2009
Audio-visual human recognition using semi-supervised spectral learning and hidden Markov models.
J. Vis. Lang. Comput., 2009
Noise robust features for speech/music discrimination in real-time telecommunication.
Proceedings of the 2009 IEEE International Conference on Multimedia and Expo, 2009
A Subword Normalized Cut Approach to Automatic Story Segmentation of Chinese Broadcast News.
Proceedings of the Information Retrieval Technology, 2009
Proceedings of the Computer Vision, 2009
2008
Proceedings of the Advances in Multimedia Information Processing, 2008
Subword Latent Semantic Analysis for Texttiling-Based Automatic Story Segmentation of Chinese Broadcast News.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008
Proceedings of the Advanced Intelligent Computing Theories and Applications. With Aspects of Theoretical and Methodological Issues, 2008
Proceedings of the Information Retrieval Technology, 2008
2007
Realistic Mouth-Synching for Speech-Driven Talking Face Using Articulatory Modelling.
IEEE Trans. Multim., 2007
Combined Use of Speaker- and Tone-Normalized Pitch Reset with Pause Duration for Automatic Story Segmentation in Mandarin Broadcast News.
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, 2007
Modeling the statistical behavior of lexical chains to capture word cohesiveness for automatic story segmentation.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007
2006
Proceedings of the Web Information Systems, 2006
Proceedings of the IEEE International Conference on Systems, 2006
The SOMN-HMM Model and Its Application to Automatic Synthesis of 3D Character Animations.
Proceedings of the IEEE International Conference on Systems, 2006
Supervised Learning of Motion Style for Real-time Synthesis of 3D Character Animations.
Proceedings of the IEEE International Conference on Systems, 2006
A Cantonese Speech-Driven Talking Face Using Translingual Audio-to-Visual Conversion.
Proceedings of the Chinese Spoken Language Processing, 5th International Symposium, 2006
Proceedings of the 18th International Conference on Pattern Recognition (ICPR 2006), 2006
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006
2005
Multi-stream Articulator Model with Adaptive Reliability Measure for Audio Visual Speech Recognition.
Proceedings of the Advances in Machine Learning and Cybernetics, 2005