Li-Rong Dai
ORCID: 0000-0002-0859-2827
Affiliations:
- University of Science and Technology of China, National Engineering Laboratory for Speech and Language Information Processing, Hefei, China
According to our database, Li-Rong Dai authored at least 352 papers between 2004 and 2024.
Bibliography
2024
Sketch-fusion: A gradient compression method with multi-layer fusion for communication-efficient distributed training.
J. Parallel Distributed Comput., March, 2024
MoESys: A Distributed and Efficient Mixture-of-Experts Training and Inference System for Internet Services.
IEEE Trans. Serv. Comput., 2024
VatLM: Visual-Audio-Text Pre-Training With Unified Masked Prediction for Speech Representation Learning.
IEEE Trans. Multim., 2024
IEEE ACM Trans. Audio Speech Lang. Process., 2024
IEEE Signal Process. Lett., 2024
LCM-SVC: Latent Diffusion Model Based Singing Voice Conversion with Inference Acceleration via Latent Consistency Distillation.
CoRR, 2024
LDM-SVC: Latent Diffusion Model Based Zero-Shot Any-to-Any Singing Voice Conversion with Singer Guidance.
CoRR, 2024
Exploring Semi-Supervised, Subcategory Classification and Subwords Alignment for Visual Wake Word Spotting.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
A Study of Multichannel Spatiotemporal Features and Knowledge Distillation on Robust Target Speaker Extraction.
Proceedings of the IEEE International Conference on Acoustics, 2024
Sifisinger: A High-Fidelity End-to-End Singing Voice Synthesizer Based on Source-Filter Model.
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
2023
Universal wavelength reuse mechanism for optical networks-on-chip based on a cooperative game.
J. Opt. Commun. Netw., June, 2023
A Joint Speech Enhancement and Self-Supervised Representation Learning Framework for Noise-Robust Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2023
SDW-SWF: Speech Distortion Weighted Single-Channel Wiener Filter for Noise Reduction.
IEEE ACM Trans. Audio Speech Lang. Process., 2023
Energy-Efficient Sparsity-Driven Speech Enhancement in Wireless Acoustic Sensor Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2023
CoRR, 2023
A Speech Distortion Weighted Single-Channel Wiener Filter Based STFT-Domain Noise Reduction.
Proceedings of the IEEE Statistical Signal Processing Workshop, 2023
Handwritten Chemical Structure Image to Structure-Specific Markup Using Random Conditional Guided Decoder.
Proceedings of the 31st ACM International Conference on Multimedia, 2023
Proceedings of the 20th International Conference on Spoken Language Translation, 2023
Proceedings of the 20th International Conference on Spoken Language Translation, 2023
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Real-Time Causal Spectro-Temporal Voice Activity Detection Based on Convolutional Encoding and Residual Decoding.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Fine-tuning Audio Spectrogram Transformer with Task-aware Adapters for Sound Event Detection.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the Image and Graphics - 12th International Conference, 2023
Proceedings of the Image and Graphics - 12th International Conference, 2023
Proceedings of the Image and Graphics - 12th International Conference, 2023
Speech4Mesh: Speech-Assisted Monocular 3D Facial Reconstruction for Speech-Driven 3D Facial Animation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
Robust Data2VEC: Noise-Robust Speech Representation Learning for ASR by Combining Regression and Improved Contrastive Learning.
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
A Multi-Scale Feature Aggregation Based Lightweight Network for Audio-Visual Speech Enhancement.
Proceedings of the IEEE International Conference on Acoustics, 2023
AST-SED: An Effective Sound Event Detection Method Based on Audio Spectrogram Transformer.
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
A Comparative Study on Multichannel Speaker-Attributed Automatic Speech Recognition in Multi-party Meetings.
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023
Learning Semantic Information from Machine Translation to Improve Speech-to-Text Translation.
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023
2022
Frequency-Invariant Sensor Selection for MVDR Beamforming in Wireless Acoustic Sensor Networks.
IEEE Trans. Wirel. Commun., 2022
Pattern Recognit., 2022
Cross-Lingual Self-training to Learn Multilingual Representation for Low-Resource Speech Recognition.
Circuits Syst. Signal Process., 2022
CoRR, 2022
A Complementary Joint Training Approach Using Unpaired Speech and Text for Low-Resource Automatic Speech Recognition.
CoRR, 2022
Proceedings of the 19th International Conference on Spoken Language Translation, 2022
External Text Based Data Augmentation for Low-Resource Speech Recognition in the Constrained Condition of OpenASR21 Challenge.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Differential Time-frequency Log-mel Spectrogram Features for Vision Transformer Based Infant Cry Recognition.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Class-Aware Distribution Alignment based Unsupervised Domain Adaptation for Speaker Verification.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
A Complementary Joint Training Approach Using Unpaired Speech and Text.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 26th International Conference on Pattern Recognition, 2022
An Experimental Comparison between Low-Resource Semi-Supervised and High-Resource Supervised Automatic Speech Recognition Models.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2022
Learning Contextually Fused Audio-Visual Representations For Audio-Visual Speech Recognition.
Proceedings of the 2022 IEEE International Conference on Image Processing, 2022
A Noise-Robust Self-Supervised Pre-Training Model Based Speech Representation Learning for Automatic Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
Supervised and Self-Supervised Pretraining Based Covid-19 Detection Using Acoustic Breathing/Cough/Speech Signals.
Proceedings of the IEEE International Conference on Acoustics, 2022
Reference Microphone Selection and Low-Rank Approximation Based Multichannel Wiener Filter with Application to Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022
Self-Supervised Representation Learning for Unsupervised Anomalous Sound Detection Under Domain Shift.
Proceedings of the IEEE International Conference on Acoustics, 2022
SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022
2021
SRD: A Tree Structure Based Decoder for Online Handwritten Mathematical Expression Recognition.
IEEE Trans. Multim., 2021
IEEE ACM Trans. Audio Speech Lang. Process., 2021
Sensor Selection for Relative Acoustic Transfer Function Steered Linearly-Constrained Beamformers.
IEEE ACM Trans. Audio Speech Lang. Process., 2021
IEEE ACM Trans. Audio Speech Lang. Process., 2021
Multi-Granularity Sequence Alignment Mapping for Encoder-Decoder Based End-to-End ASR.
IEEE ACM Trans. Audio Speech Lang. Process., 2021
Correlating subword articulation with lip shapes for embedding aware audio-visual speech enhancement.
Neural Networks, 2021
XLST: Cross-lingual Self-training to Learn Multilingual Representation for Low Resource Speech Recognition.
CoRR, 2021
Proceedings of the 18th International Conference on Spoken Language Translation, 2021
An Improved Wav2Vec 2.0 Pre-Training Approach Using Enhanced Local Dependency Modeling for Speech Recognition.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
An Effective Mutual Mean Teaching Based Domain Adaptation Method for Sound Event Detection.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
A Weight Moving Average Based Alternate Decoupled Learning Algorithm for Long-Tailed Language Identification.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Automatic Lip-Reading with Hierarchical Pyramidal Convolution and Self-Attention for Image Sequences with No Word Boundaries.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
An Improved Mean Teacher Based Method for Large Scale Weakly Labeled Semi-Supervised Sound Event Detection.
Proceedings of the IEEE International Conference on Acoustics, 2021
An Effective Deep Embedding Learning Method Based on Dense-Residual Networks for Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2021
TaLNet: Voice Reconstruction from Tongue and Lip Articulation with Transfer Learning from Text-to-Speech Synthesis.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021
2020
Non-Parallel Sequence-to-Sequence Voice Conversion With Disentangled Linguistic and Speaker Representations.
IEEE ACM Trans. Audio Speech Lang. Process., 2020
Learning and Modeling Unit Embeddings Using Deep Neural Networks for Unit-Selection-Based Mandarin Speech Synthesis.
ACM Trans. Asian Low Resour. Lang. Inf. Process., 2020
Pattern Recognit., 2020
Segment boundary detection directed attention for online end-to-end speech recognition.
EURASIP J. Audio Speech Music. Process., 2020
Attentive batch normalization for LSTM-based acoustic modeling of speech recognition.
CoRR, 2020
Effective Exploitation of Posterior Information for Attention-Based Speech Recognition.
IEEE Access, 2020
An Effective Perturbation Based Semi-Supervised Learning Method for Sound Event Detection.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Semi-Supervised End-to-End ASR via Teacher-Student Learning with Conditional Posterior Distribution.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
An Effective Speaker Recognition Method Based on Joint Identification and Verification Supervisions.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 37th International Conference on Machine Learning, 2020
Extracting Unit Embeddings Using Sequence-To-Sequence Acoustic Models for Unit Selection Speech Synthesis.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Task-Aware Mean Teacher Method for Large Scale Weakly Labeled Semi-Supervised Sound Event Detection.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
An Online Speaker-aware Speech Separation Approach Based on Time-domain Representation.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
An Improved Deep Neural Network for Modeling Speaker Characteristics at Different Temporal Scales.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Attention-Based Gated Scaling Adaptive Acoustic Model for CTC-Based Speech Recognition.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Voice Conversion by Cascading Automatic Speech Recognition and Text-to-Speech Synthesis with Prosody Transfer.
Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020
Non-Parallel Voice Conversion with Autoregressive Conversion Model and Duration Adjustment.
Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020
Attentive Fusion Enhanced Audio-Visual Encoding for Transformer Based Robust Speech Recognition.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020
2019
Track, Attend, and Parse (TAP): An End-to-End Framework for Online Handwritten Mathematical Expression Recognition.
IEEE Trans. Multim., 2019
IEEE ACM Trans. Audio Speech Lang. Process., 2019
Listening and Grouping: An Online Autoregressive Approach for Monaural Speech Separation.
IEEE ACM Trans. Audio Speech Lang. Process., 2019
Deep Neural Network Embedding Learning with High-Order Statistics for Text-Independent Speaker Verification.
CoRR, 2019
Deep Neural Network Embeddings with Gating Mechanisms for Text-Independent Speaker Verification.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Multi-Task Learning with High-Order Statistics for x-Vector Based Text-Independent Speaker Verification.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Singing Voice Synthesis Using Deep Autoregressive Neural Networks for Acoustic Modeling.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Improving Aggregation and Loss Function for Better Embedding Learning in End-to-End Speaker Verification System.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the 4th International Conference on Multimedia Systems and Signal Processing, 2019
Proceedings of the IEEE International Conference on Acoustics, 2019
A Region Based Attention Method for Weakly Supervised Sound Event Detection and Classification.
Proceedings of the IEEE International Conference on Acoustics, 2019
Proceedings of the Blizzard Challenge 2019, Vienna, Austria, September 23, 2019, 2019
Knowledge Distillation from Multilingual and Monolingual Teachers for End-to-End Multilingual Speech Recognition.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019
Speaker to Emotion: Domain Adaptation for Speech Emotion Recognition with Residual Adapters.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019
End-to-End Emotional Speech Synthesis Using Style Tokens and Semi-Supervised Training.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019
2018
Improving the Decoding Efficiency of Deep Neural Network Acoustic Models by Cluster-Based Senone Selection.
J. Signal Process. Syst., 2018
A Multiobjective Learning and Ensembling Approach to High-Performance Speech Enhancement With Compact Neural Network Architectures.
IEEE ACM Trans. Audio Speech Lang. Process., 2018
Waveform Modeling and Generation Using Hierarchical Recurrent Neural Networks for Speech Bandwidth Extension.
IEEE ACM Trans. Audio Speech Lang. Process., 2018
IEEE ACM Trans. Audio Speech Lang. Process., 2018
IEEE Signal Process. Lett., 2018
Articulatory-to-acoustic conversion using BLSTM-RNNs with augmented input representation.
Speech Commun., 2018
Circuits Syst. Signal Process., 2018
CoRR, 2018
CoRR, 2018
A Maximum Likelihood Approach to Masking-based Speech Enhancement Using Deep Neural Network.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018
Learning and Modeling Unit Embeddings for Improving HMM-based Unit Selection Speech Synthesis.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Acoustic Modeling with Densely Connected Residual Network for Multichannel Speech Recognition.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
An Attention Pooling Based Representation Learning Method for Speech Emotion Recognition.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Trajectory-based Radical Analysis Network for Online Handwritten Chinese Character Recognition.
Proceedings of the 24th International Conference on Pattern Recognition, 2018
Multi-Scale Attention with Dense Encoder for Handwritten Mathematical Expression Recognition.
Proceedings of the 24th International Conference on Pattern Recognition, 2018
Radical Analysis Network for Zero-Shot Learning in Printed Chinese Character Recognition.
Proceedings of the 2018 IEEE International Conference on Multimedia and Expo, 2018
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Proceedings of the Blizzard Challenge 2018, Hyderabad, India, September 8, 2018, 2018
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018
2017
IEEE ACM Trans. Audio Speech Lang. Process., 2017
A Gender Mixture Detection Approach to Unsupervised Single-Channel Speech Separation Based on Deep Neural Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2017
A unified DNN approach to speaker-dependent simultaneous speech enhancement and speech separation in low SNR environments.
Speech Commun., 2017
Watch, attend and parse: An end-to-end neural network based approach to handwritten mathematical expression recognition.
Pattern Recognit., 2017
Frontiers Inf. Technol. Electron. Eng., 2017
An information fusion framework with multi-channel feature concatenation and multi-perspective system combination for the deep-learning-based robust recognition of microphone array speech.
Comput. Speech Lang., 2017
CoRR, 2017
Exploring Question Understanding and Adaptation in Neural-Network-Based Question Answering.
CoRR, 2017
A Maximum Likelihood Approach to Deep Neural Network Based Nonlinear Spectral Mapping for Single-Channel Speech Separation.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
End-to-End Language Identification Using High-Order Utterance Representation with Bilinear Pooling.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
An investigation of high-resolution modeling units of deep neural networks for acoustic scene classification.
Proceedings of the 2017 International Joint Conference on Neural Networks, 2017
A GRU-Based Encoder-Decoder Approach with Attention for Online Handwritten Mathematical Expression Recognition.
Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition, 2017
Extracting structural spectral features using what-where auto-encoders for statistical parametric speech synthesis.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017
Joint noise and mask aware training for DNN-based speech enhancement with sub-band features.
Proceedings of the Hands-free Speech Communications and Microphone Arrays, 2017
Proceedings of the Hands-free Speech Communications and Microphone Arrays, 2017
Proceedings of the Blizzard Challenge 2017, Stockholm, Sweden, August 25, 2017, 2017
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017
Feedforward sequential memory networks based encoder-decoder model for machine translation.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017
2016
Speaker Adaptation of Hybrid NN/HMM Model for Speech Recognition Based on Singular Value Decomposition.
J. Signal Process. Syst., 2016
J. Signal Process. Syst., 2016
A Regression Approach to Single-Channel Speech Separation Via High-Resolution Deep Neural Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2016
Speech Commun., 2016
Hybrid Orthogonal Projection and Estimation (HOPE): A New Framework to Learn Neural Networks.
J. Mach. Learn. Res., 2016
Joint training of DNNs by incorporating an explicit dereverberation structure for distant speech recognition.
EURASIP J. Adv. Signal Process., 2016
Concept-to-Speech generation with knowledge sharing for acoustic modelling and utterance filtering.
Comput. Speech Lang., 2016
Proceedings of the 2016 Visual Communications and Image Processing, 2016
Improvements on Deep Bottleneck Network based I-Vector Representation for Spoken Language Identification.
Proceedings of the Odyssey 2016: The Speaker and Language Recognition Workshop, 2016
LID-senone Extraction via Deep Neural Networks for End-to-End Language Identification.
Proceedings of the Odyssey 2016: The Speaker and Language Recognition Workshop, 2016
Proceedings of the 12th NTCIR Conference on Evaluation of Information Access Technologies, 2016
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016
A speaker-dependent deep learning approach to joint speech separation and acoustic modeling for multi-talker automatic speech recognition.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016
Mismatched training data enhancement for automatic recognition of children's speech using DNN-HMM.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016
Cluster-based senone selection for the efficient calculation of deep neural network acoustic models.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016
Learning FOFE based FNN-LMs with noise contrastive estimation and part-of-speech features.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Compact Feedforward Sequential Memory Networks for Large Vocabulary Continuous Speech Recognition.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Articulatory-to-Acoustic Conversion with Cascaded Prediction of Spectral and Excitation Features Using Neural Networks.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Speech Bandwidth Extension Using Bottleneck Features and Deep Recurrent Neural Networks.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
The USTC System for Voice Conversion Challenge 2016: Neural Network Based Approaches for Spectrum, Aperiodicity and F0 Conversion.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Modeling spectral envelopes using deep conditional restricted Boltzmann machines for statistical parametric speech synthesis.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
Modulation spectrum compensation for HMM-based speech synthesis using line spectral pairs.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
Compact convolutional neural network transfer learning for small-scale image classification.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
Deep belief network-based post-filtering for statistical parametric speech synthesis.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
Content-aware local variability vector for speaker verification with short utterance.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
Proceedings of the Blizzard Challenge 2016, Cupertino, CA, USA, September 16, 2016, 2016
Unsupervised single-channel speech separation via deep neural network for different gender mixtures.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016
2015
State-Clustering Based Multiple Deep Neural Networks Modeling Approach for Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2015
IEEE ACM Trans. Audio Speech Lang. Process., 2015
Speech Commun., 2015
Feedforward Sequential Memory Networks: A New Structure to Learn Long-term Dependency.
CoRR, 2015
A Fixed-Size Encoding Method for Variable-Length Sequences with its Application to Neural Network Language Models.
CoRR, 2015
Proceedings of the 5th ACM on International Conference on Multimedia Retrieval, 2015
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Multi-objective learning and mask-based post-processing for deep neural network based speech enhancement.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
High-resolution acoustic modeling and compact language modeling of language-universal speech attributes for spoken language identification.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Automatic phrase boundary labeling of speech synthesis database using context-dependent HMMs and n-gram prior distributions.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Writer adaptive feature extraction based on convolutional neural networks for online handwritten Chinese character recognition.
Proceedings of the 13th International Conference on Document Analysis and Recognition, 2015
Unsupervised speaker adaptation of deep neural network based on the combination of speaker codes and singular value decomposition for speech recognition.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015
Speech Separation based on signal-noise-dependent deep neural networks for robust speech recognition.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015
Joint training of front-end and back-end deep neural networks for robust speech recognition.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015
Proceedings of the Latent Variable Analysis and Signal Separation, 2015
Lip movement generation using restricted Boltzmann machines for visual speech synthesis.
Proceedings of the IEEE China Summit and International Conference on Signal and Information Processing, 2015
A unified speaker-dependent speech separation and enhancement system based on deep neural networks.
Proceedings of the IEEE China Summit and International Conference on Signal and Information Processing, 2015
An information fusion approach to recognizing microphone array speech in the CHiME-3 challenge based on a deep learning framework.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015
The Fixed-Size Ordinally-Forgetting Encoding Method for Neural Network Language Models.
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, 2015
2014
Fast adaptation of deep neural network based on discriminant codes for speech recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2014
IEEE ACM Trans. Audio Speech Lang. Process., 2014
IEEE Signal Process. Lett., 2014
HMM-based unit selection speech synthesis using log likelihood ratios derived from perceptual data.
Speech Commun., 2014
Unsupervised Prosodic Labeling of Speech Synthesis Databases Using Context-Dependent HMMs.
IEICE Trans. Inf. Syst., 2014
Proceedings of the Odyssey 2014: The Speaker and Language Recognition Workshop, 2014
Speaker adaptation of hybrid NN/HMM model for speech recognition based on singular value decomposition.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014
A fusion approach to spoken language identification based on combining multiple phone recognizers and speech attribute detectors.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014
Speech separation based on improved deep neural networks with dual outputs of speech features for both target and interfering speakers.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014
Integrating global variance of log power spectrum derived from LSPs into MGE training for HMM-based parametric speech synthesis.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014
Speaker adaptive bottleneck features extraction for LVCSR based on discriminative learning of speaker codes.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014
Performance evaluation of deep bottleneck features for spoken language identification.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014
Improving F0 prediction using bidirectional associative memories and syllable-level F0 features for HMM-based Mandarin speech synthesis.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014
Modeling DCT parameterized F0 trajectory at intonation phrase level with DNN or decision tree.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
Concept-to-speech generation by integrating syntagmatic features into HMM-based speech synthesis.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
Voice conversion using generative trained deep neural networks with multiple frame spectral envelopes.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
A Study of Designing Compact Classifiers Using Deep Neural Networks for Online Handwritten Chinese Character Recognition.
Proceedings of the 22nd International Conference on Pattern Recognition, 2014
Writer Adaptation Using Bottleneck Features and Discriminative Linear Regression for Online Handwritten Chinese Character Recognition.
Proceedings of the 14th International Conference on Frontiers in Handwriting Recognition, 2014
Sequence training of multiple deep neural networks for better performance and faster training speed.
Proceedings of the IEEE International Conference on Acoustics, 2014
Proceedings of the IEEE International Conference on Acoustics, 2014
Spectral modeling using neural autoregressive distribution estimators for statistical parametric speech synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2014
Direct adaptation of hybrid DNN/HMM model for fast speaker adaptation in LVCSR based on speaker code.
Proceedings of the IEEE International Conference on Acoustics, 2014
Lattice based optimization of bottleneck feature extractor with linear transformation.
Proceedings of the IEEE International Conference on Acoustics, 2014
Using bidirectional associative memories for joint spectral envelope modeling in voice conversion.
Proceedings of the IEEE International Conference on Acoustics, 2014
Proceedings of the IEEE International Conference on Acoustics, 2014
Proceedings of the IEEE International Conference on Acoustics, 2014
Proceedings of the International Conference on Audio, 2014
Global variance equalization for improving deep neural network based speech enhancement.
Proceedings of the IEEE China Summit & International Conference on Signal and Information Processing, 2014
2013
Joint spectral distribution modeling using restricted Boltzmann machines for voice conversion.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013
A cluster-based multiple deep neural networks method for large vocabulary continuous speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2013
Unsupervised prosodic phrase boundary labeling of Mandarin speech synthesis database using context-dependent HMM.
Proceedings of the IEEE International Conference on Acoustics, 2013
Proceedings of the IEEE International Conference on Acoustics, 2013
Proceedings of the IEEE International Conference on Acoustics, 2013
Incoherent training of deep neural networks to de-correlate bottleneck features for speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2013
Proceedings of the Blizzard Challenge 2013, 2013
2012
Minimum Kullback-Leibler Divergence Parameter Generation for HMM-Based Speech Synthesis.
IEEE Trans. Speech Audio Process., 2012
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012
Improved unit selection speech synthesis method utilizing subjective evaluation results on synthetic speech.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012
Cross-stream dependency modeling using continuous F0 model for HMM-based speech synthesis.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012
Considering Global Variance of the Log Power Spectrum Derived from Mel-Cepstrum in HMM-based Parametric Speech Synthesis.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012
Proceedings of the Blizzard Challenge 2012, Portland, OR, USA, September 14, 2012, 2012
2011
Trust Region-Based Optimization for Maximum Mutual Information Estimation of HMMs in Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2011
Improvements in Speaker Characterization Using Spectral Subband Energy Based on Harmonic plus Noise Model.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011
Estimation of Window Coefficients for Dynamic Feature Extraction for HMM-Based Speech Synthesis.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011
Proceedings of the IEEE International Conference on Acoustics, 2011
Building HMM based unit-selection speech synthesis system using synthetic speech naturalness evaluation score.
Proceedings of the IEEE International Conference on Acoustics, 2011
Speaker characterization using spectral subband energy ratio based on Harmonic plus Noise Model.
Proceedings of the IEEE International Conference on Acoustics, 2011
Preserve ordering property of generated LSPs for minimum generation error training in HMM-based speech synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2011
Proceedings of the IEEE International Conference on Acoustics, 2011
Proceedings of the Blizzard Challenge 2011, Turin, Italy, September 2, 2011, 2011
Proceedings of the First Asian Conference on Pattern Recognition, 2011
2010
Cross-Validation and Minimum Generation Error based Decision Tree Pruning for HMM-based Speech Synthesis.
Int. J. Comput. Linguistics Chin. Lang. Process., 2010
Minimum generation error training for HMM-based prediction of articulatory movements.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010
Automatic phrase boundary labeling for Mandarin TTS corpus using context-dependent HMM.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010
The description of iFlyTek Speech Lab system for NIST2009 Language Recognition Evaluation.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010
Non-negative matrix factorization based discriminative features for speaker verification.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010
Statistical modeling of syllable-level F0 features for HMM-based unit selection speech synthesis.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010
The estimation and kernel metric of spectral correlation for text-independent speaker verification.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010
Automatic error detection for unit selection speech synthesis using log likelihood ratio based SVM classifier.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010
Global variance modeling on the log power spectrum of LSPs for HMM-based speech synthesis.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010
Proceedings of the 2010 IEEE International Conference on Multimedia and Expo, 2010
A bounded trust region optimization for discriminative training of HMMs in speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2010
Minimum generation error training with weighted Euclidean distance on LSP for HMM-based speech synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2010
Proceedings of the IEEE International Conference on Acoustics, 2010
Proceedings of the IEEE International Conference on Acoustics, 2010
Proceedings of the Blizzard Challenge 2010, Kansai Science City, Japan, September 25, 2010, 2010
2009
Comput. Vis. Image Underst., 2009
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009
Proceedings of the 2009 IEEE International Conference on Multimedia and Expo, 2009
Proceedings of the IEEE International Conference on Acoustics, 2009
Proceedings of the IEEE International Conference on Acoustics, 2009
Proceedings of the IEEE International Conference on Acoustics, 2009
Proceedings of the IEEE International Conference on Acoustics, 2009
Proceedings of the Blizzard Challenge 2009, Edinburgh, Scotland, UK, September 4, 2009, 2009
2008
Investigation on Adaptation Using Different Discriminative Training Criteria Based Linear Regression and MAP.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008
Exploiting Non-Target Region Information for Confidence Measure Based on Bayesian Information Criterion.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008
Minimum generation error criterion considering global/local variance for HMM-based speech synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2008
Minimum generation error linear regression based model adaptation for HMM-based speech synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2008
Proceedings of the Blizzard Challenge 2008, 2008
2007
Int. J. Semantic Comput., 2007
Proceedings of the First IEEE International Conference on Semantic Computing (ICSC 2007), 2007
Proceedings of the Advances in Multimedia Modeling, 2007
Proceedings of the 15th International Conference on Multimedia 2007, 2007
Proceedings of the 15th International Conference on Multimedia 2007, 2007
Proceedings of the 2007 IEEE International Conference on Multimedia and Expo, 2007
Proceedings of the 2007 IEEE International Conference on Multimedia and Expo, 2007
Proceedings of the IEEE International Conference on Acoustics, 2007
Proceedings of the Fourth International Conference on Fuzzy Systems and Knowledge Discovery, 2007
Proceedings of the Evaluation of text-to-speech systems: Blizzard Challenge 2007, 2007
2006
Proceedings of the 8th ACM SIGMM International Workshop on Multimedia Information Retrieval, 2006
Proceedings of the 5th International Symposium on Chinese Spoken Language Processing, 2006
Proceedings of the 5th International Symposium on Chinese Spoken Language Processing, 2006
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2006), 2006
Proceedings of the 2006 IEEE International Conference on Multimedia and Expo, 2006
Proceedings of the 2006 IEEE International Conference on Multimedia and Expo, 2006
Proceedings of the 6th IEEE International Conference on Data Mining (ICDM 2006), 2006
An Automatic Video Semantic Annotation Scheme Based on Combination of Complementary Predictors.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2006
2005
Semi-automatic video annotation based on active learning with multiple complementary predictors.
Proceedings of the 7th ACM SIGMM International Workshop on Multimedia Information Retrieval, 2005
An Improved Spectral and Prosodic Transformation Method in STRAIGHT-based Voice Conversion.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005
Sliding Window Smoothing For Maximum Entropy Based Intonational Phrase Prediction In Chinese.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005
2004
Proceedings of the Advances in Multimedia Information Processing - PCM 2004, 5th Pacific Rim Conference on Multimedia, Tokyo, Japan, November 30, 2004
Proceedings of the 2004 International Symposium on Chinese Spoken Language Processing, 2004
Proceedings of the 2004 International Symposium on Chinese Spoken Language Processing, 2004
A region based multiple frame-rate tradeoff of video streaming.
Proceedings of the 2004 International Conference on Image Processing, 2004
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004