Zhen-Hua Ling

ORCID: 0000-0001-7853-5273

According to our database, Zhen-Hua Ling authored at least 304 papers between 2002 and 2024.

Bibliography

2024
Dynamic facial expression recognition with pseudo-label guided multi-modal pre-training.
IET Comput. Vis., February, 2024

Incorporating Ultrasound Tongue Images for Audio-Visual Speech Enhancement.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Syntax-Augmented Hierarchical Interactive Encoder for Zero-Shot Cross-Lingual Information Extraction.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

PE-Wav2vec: A Prosody-Enhanced Speech Model for Self-Supervised Prosody Learning in TTS.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Low-Latency Neural Speech Phase Prediction Based on Parallel Estimation Architecture and Anti-Wrapping Losses for Speech Generation Tasks.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

APCodec: A Neural Audio Codec With Parallel Amplitude and Phase Spectrum Encoding and Decoding.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

MDCTCodec: A Lightweight MDCT-based Neural Audio Codec towards High Sampling Rate and Low Bitrate Scenarios.
CoRR, 2024

APCodec+: A Spectrum-Coding-Based High-Fidelity and High-Compression-Rate Neural Audio Codec with Staged Training Paradigm.
CoRR, 2024

Stage-Wise and Prior-Aware Neural Speech Phase Prediction.
CoRR, 2024

Geometry-Constrained EEG Channel Selection for Brain-Assisted Speech Enhancement.
CoRR, 2024

Refining Self-Supervised Learnt Speech Representation using Brain Activations.
CoRR, 2024

Asynchronous Voice Anonymization Using Adversarial Perturbation On Speaker Embedding.
CoRR, 2024

Multi-Stage Speech Bandwidth Extension with Flexible Sampling Rate Control.
CoRR, 2024

BiVocoder: A Bidirectional Neural Vocoder Integrating Feature Extraction and Waveform Generation.
CoRR, 2024

Perturbation-Restrained Sequential Model Editing.
CoRR, 2024

Voice Attribute Editing with Text Prompt.
CoRR, 2024

Corrective Retrieval Augmented Generation.
CoRR, 2024

Towards High-Quality and Efficient Speech Bandwidth Extension with Parallel Amplitude and Phase Prediction.
CoRR, 2024

Model Editing Can Hurt General Abilities of Large Language Models.
CoRR, 2024

Speech Reconstruction from Silent Lip and Tongue Articulation by Diffusion Models and Text-Guided Pseudo Target Generation.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Neighboring Perturbations of Knowledge Editing on Large Language Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

An End-to-End EEG Channel Selection Method with Residual Gumbel Softmax for Brain-Assisted Speech Enhancement.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Multiscale Matching Driven by Cross-Modal Similarity Consistency for Audio-Text Retrieval.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Considering Temporal Connection between Turns for Conversational Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Modeling Pseudo-Speaker Uncertainty in Voice Anonymization.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Adversarial Speech for Voice Privacy Protection from Personalized Speech Generation.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Retrieving, Rethinking and Revising: The Chain-of-Verification Can Improve Retrieval Augmented Generation.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Model Editing Harms General Abilities of Large Language Models: Regularization to the Rescue.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

X-ACE: Explainable and Multi-factor Audio Captioning Evaluation.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023
Pronunciation Dictionary-Free Multilingual Speech Synthesis Using Learned Phonetic Representations.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

APNet: An All-Frame-Level Neural Vocoder Incorporating Direct Prediction of Amplitude and Phase Spectra.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Emotion-Regularized Conditional Variational Autoencoder for Emotional Response Generation.
IEEE Trans. Affect. Comput., 2023

Exploring the Topics of Audio Words for Detecting Alzheimer's Disease From Spontaneous Speech.
IEEE Signal Process. Lett., 2023

Long-Frame-Shift Neural Speech Phase Prediction With Spectral Continuity Enhancement and Interpolation Error Compensation.
IEEE Signal Process. Lett., 2023

MoVQA: A Benchmark of Versatile Question-Answering for Long-Form Movie Understanding.
CoRR, 2023

Untying the Reversal Curse via Bidirectional Language Model Editing.
CoRR, 2023

Explicit Estimation of Magnitude and Phase Spectra in Parallel for High-Quality Speech Enhancement.
CoRR, 2023

MADNet: Maximizing Addressee Deduction Expectation for Multi-Party Conversation Generation.
CoRR, 2023

SHINE: Syntax-augmented Hierarchical Interactive Encoder for Zero-shot Cross-lingual Information Extraction.
CoRR, 2023

DiffuSIA: A Spiral Interaction Architecture for Encoder-Decoder Text Diffusion.
CoRR, 2023

Source-Filter-Based Generative Adversarial Neural Vocoder for High Fidelity Speech Synthesis.
CoRR, 2023

USTC-NELSLIP at SemEval-2023 Task 2: Statistical Construction and Dual Adaptation of Gazetteer for Multilingual Complex NER.
Proceedings of the 17th International Workshop on Semantic Evaluation, 2023

Face-Driven Zero-Shot Voice Conversion with Memory-based Face-Voice Alignment.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Incorporating Ultrasound Tongue Images for Audio-Visual Speech Enhancement through Knowledge Distillation.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

MP-SENet: A Speech Enhancement Model with Parallel Denoising of Magnitude and Phase Spectra.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Speech Synthesis with Self-Supervisedly Learnt Prosodic Representations.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

BASEN: Time-Domain Brain-Assisted Speech Enhancement Network with Convolutional Cross Attention in Multi-talker Conditions.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Speech Reconstruction from Silent Tongue and Lip Articulation by Pseudo Target Generation and Domain Adversarial Training.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Self-Supervised Audio-Visual Speech Representations Learning by Multimodal Self-Distillation.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Zero-Shot Personalized Lip-To-Speech Synthesis with Face Image Based Voice Control.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

The USTC System for ADReSS-M Challenge.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Neural Speech Phase Prediction Based on Parallel Estimation Architecture and Anti-Wrapping Losses.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Is ChatGPT a Good Multi-Party Conversation Solver?
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

MADNet: Maximizing Addressee Deduction Expectation for Multi-Party Conversation Generation.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Symbolization, Prompt, and Classification: A Framework for Implicit Speaker Identification in Novels.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

The USTC-NERCSLIP System for the Track 1.2 of Audio Deepfake Detection (ADD 2023) Challenge.
Proceedings of the Workshop on Deepfake Audio Detection and Analysis co-located with 32th International Joint Conference on Artificial Intelligence (IJCAI 2023), 2023

GIFT: Graph-Induced Fine-Tuning for Multi-Party Conversation Understanding.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Pre-training Language Model as a Multi-perspective Course Learner.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022
Denoising-and-Dereverberation Hierarchical Neural Vocoder for Statistical Parametric Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

Cognitive Diagnosis with Explicit Student Vector Estimation and Unsupervised Question Matrix Learning.
CoRR, 2022

USTC-NELSLIP at SemEval-2022 Task 11: Gazetteer-Adapted Integration Network for Multilingual Complex Named Entity Recognition.
Proceedings of the 16th International Workshop on Semantic Evaluation, SemEval@NAACL 2022, 2022

Decoupled Pronunciation and Prosody Modeling in Meta-Learning-based Multilingual Speech Synthesis.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Integrating Discrete Word-Level Style Variations into Non-Autoregressive Acoustic Models for Speech Synthesis.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Pronunciation Dictionary-Free Multilingual Speech Synthesis by Combining Unsupervised and Supervised Phonetic Representations.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Who Says What to Whom: A Survey of Multi-Party Conversations.
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

PoNet: Pooling Network for Efficient Token Mixing in Long Sequences.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Speaker Adaption with Intuitive Prosodic Features for Statistical Parametric Speech Synthesis.
Proceedings of the ICDSP 2022: 6th International Conference on Digital Signal Processing, Chengdu, China, February 25, 2022

Discourse-Level Prosody Modeling with a Variational Autoencoder for Non-Autoregressive Expressive Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Dementia Detection by Fusing Speech and Eye-Tracking Representation.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Using Multiple Reference Audios and Style Embedding Constraints for Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Neural Grapheme-To-Phoneme Conversion with Pre-Trained Grapheme Models.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Improving Recognition-Synthesis Based any-to-one Voice Conversion with Cyclic Training.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Wider & Closer: Mixture of Short-channel Distillers for Zero-shot Cross-lingual Named Entity Recognition.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Detecting Alzheimer's Disease Based on Acoustic Features Extracted from Pre-trained Models.
Proceedings of the Artificial Intelligence - Second CAAI International Conference, 2022

TegTok: Augmenting Text Generation via Task-specific and Open-world Knowledge.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

HeterMPC: A Heterogeneous Graph Neural Network for Response Generation in Multi-Party Conversations.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

Conversation- and Tree-Structure Losses for Dialogue Disentanglement.
Proceedings of the Second DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering, 2022

2021
UnitNet: A Sequence-to-Sequence Acoustic Model for Concatenative Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Extracting and Predicting Word-Level Style Variations for Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

A Multiple-Integration Encoder for Multi-Turn Text-to-SQL Semantic Parsing.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Deep Contextualized Utterance Representations for Response Selection and Dialogue Analysis.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Robustness of Speech Spoofing Detectors Against Adversarial Post-Processing of Voice Conversion.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Compressed Network in Network Models for Traffic Classification.
Proceedings of the IEEE Wireless Communications and Networking Conference, 2021

Denoising-and-Dereverberation Hierarchical Neural Vocoder for Robust Waveform Generation.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Partner Matters! An Empirical Study on Fusing Personas for Personalized Response Selection in Retrieval-Based Chatbots.
Proceedings of the SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021

SemEval-2021 Task 4: Reading Comprehension of Abstract Meaning.
Proceedings of the 15th International Workshop on Semantic Evaluation, 2021

Phase Spectrum Recovery for Enhancing Low-Quality Speech Captured by Laser Microphones.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

UnitNet-Based Hybrid Speech Synthesis.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Adversarial Voice Conversion Against Neural Spoofing Detectors.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

A Neural-Network-Based Approach to Identifying Speakers in Novels.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Learning Deep and Wide Contextual Representations Using BERT for Statistical Parametric Speech Synthesis.
Proceedings of the ICDSP 2021: 5th International Conference on Digital Signal Processing, 2021

Voice spoofing detection with raw waveform based on Dual Path Res2net.
Proceedings of the ICCSE '21: 5th International Conference on Crowd Science and Engineering, Jinan, China, October 16, 2021

Graph Attention and Interaction Network With Multi-Task Learning for Fact Verification.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Patnet : A Phoneme-Level Autoregressive Transformer Network for Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Detecting Alzheimer's Disease from Speech Using Neural Networks with Bottleneck Features and Data Augmentation.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Have You Made a Decision? Where? A Pilot Study on Interpretability of Polarity Analysis Based on Advising Problem.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Improving Naturalness and Controllability of Sequence-to-Sequence Speech Synthesis by Learning Local Prosody Representations.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Detecting Speaker Personas from Conversational Texts.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Selecting and Analyzing Speech Features for the Screening of Mild Cognitive Impairment.
Proceedings of the 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society, 2021

The Blizzard Challenge 2021.
Proceedings of the Blizzard Challenge 2021, virtual, October 23, 2021, 2021

A Deep Analysis of Speech Separation Guided Diarization Under Realistic Conditions.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

MPC-BERT: A Pre-Trained Language Model for Multi-Party Conversation Understanding.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

TaLNet: Voice Reconstruction from Tongue and Lip Articulation with Transfer Learning from Text-to-Speech Synthesis.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

Tracking Interaction States for Multi-Turn Text-to-SQL Semantic Parsing.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Non-Parallel Sequence-to-Sequence Voice Conversion With Disentangled Linguistic and Speaker Representations.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Utterance-to-Utterance Interactive Matching Network for Multi-Turn Response Selection in Retrieval-Based Chatbots.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

A Neural Vocoder With Hierarchical Generation of Amplitude and Phase Spectra for Statistical Parametric Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Learning and Modeling Unit Embeddings Using Deep Neural Networks for Unit-Selection-Based Mandarin Speech Synthesis.
ACM Trans. Asian Low Resour. Lang. Inf. Process., 2020

Condition-Transforming Variational Autoencoder for Generating Diverse Short Text Conversations.
ACM Trans. Asian Low Resour. Lang. Inf. Process., 2020

Bidirectional Attention for Text-Dependent Speaker Verification.
Sensors, 2020

ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech.
Comput. Speech Lang., 2020

Generating diverse conversation responses by creating and ranking multiple candidates.
Comput. Speech Lang., 2020

Learning to Retrieve Entity-Aware Knowledge and Generate Responses with Copy Mechanism for Task-Oriented Dialogue Systems.
CoRR, 2020

Filtering before Iteratively Referring for Knowledge-Grounded Response Selection in Retrieval-Based Chatbots.
CoRR, 2020

DialBERT: A Hierarchical Pre-Trained Model for Conversation Disentanglement.
CoRR, 2020

Pre-Trained and Attention-Based Neural Networks for Building Noetic Task-Oriented Dialogue Systems.
CoRR, 2020

Fine-Tuning BERT for Schema-Guided Zero-Shot Dialogue State Tracking.
CoRR, 2020

Encrypted Network Traffic Classification Using Deep and Parallel Network-in-Network Models.
IEEE Access, 2020

Recognition-Synthesis Based Non-Parallel Voice Conversion with Adversarial Learning.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

An Adaptive X-Vector Model for Text-Independent Speaker Verification.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Adaptive Speaker Normalization for CTC-Based Speech Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Unsupervised Regularization-Based Adaptive Training for Speech Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Reverberation Modeling for Source-Filter-Based Neural Vocoder.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Knowledge-and-Data-Driven Amplitude Spectrum Prediction for Hierarchical Neural Vocoders.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

DCDT: A Digital Clock Drawing Test System for Cognitive Impairment Screening.
Proceedings of the 36th IEEE International Conference on Data Engineering, 2020

Extracting Unit Embeddings Using Sequence-To-Sequence Acoustic Models for Unit Selection Speech Synthesis.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

WaveFFJORD: FFJORD-Based Vocoder for Statistical Parametric Speech Synthesis.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Joint Intent Detection and Entity Linking on Spatial Domain Queries.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, 2020

Filtering before Iteratively Referring for Knowledge-Grounded Response Selection in Retrieval-Based Chatbots.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, 2020

Text Classification by Contrastive Learning and Cross-lingual Data Augmentation for Alzheimer's Disease Detection.
Proceedings of the 28th International Conference on Computational Linguistics, 2020

Speaker-Aware BERT for Multi-Turn Response Selection in Retrieval-Based Chatbots.
Proceedings of the CIKM '20: The 29th ACM International Conference on Information and Knowledge Management, 2020

The Blizzard Challenge 2020.
Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

Voice Conversion by Cascading Automatic Speech Recognition and Text-to-Speech Synthesis with Prosody Transfer.
Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

Non-Parallel Voice Conversion with Autoregressive Conversion Model and Duration Adjustment.
Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

Predictions of Subjective Ratings and Spoofing Assessments of Voice Conversion Challenge 2020 Submissions.
Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

Voice Conversion Challenge 2020 -- Intra-lingual semi-parallel and cross-lingual voice conversion --.
Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

Online Speaker Adaptation for WaveNet-based Neural Vocoders.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

Adversarial Post-Processing of Voice Conversion against Spoofing Detection.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

2019
Sequence-to-Sequence Acoustic Modeling for Voice Conversion.
IEEE ACM Trans. Audio Speech Lang. Process., 2019

The ASVspoof 2019 database.
CoRR, 2019

Align, Mask and Select: A Simple Method for Incorporating Commonsense Knowledge into Language Representation Models.
CoRR, 2019

Exploring Unsupervised Pretraining and Sentence Structure Modelling for Winograd Schema Challenge.
CoRR, 2019

Promoting Diversity for End-to-End Conversation Response Generation.
CoRR, 2019

Knowledge Base Question Answering With Attentive Pooling for Question Representation.
IEEE Access, 2019

Distant Supervision Relation Extraction with Intra-Bag and Inter-Bag Attentions.
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

Singing Voice Synthesis Using Deep Autoregressive Neural Networks for Acoustic Modeling.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

A Chinese Dataset for Identifying Speakers in Novels.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Neural Text Clustering with Document-Level Attention Based on Dynamic Soft Labels.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Multi-Classification Model for Spoken Language Understanding.
Proceedings of the International Conference on Multimodal Interaction, 2019

Learning Latent Representations for Style Control and Transfer in End-to-end Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Improving Sequence-to-sequence Voice Conversion by Adding Text-supervision.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Condition-transforming Variational Autoencoder for Conversation Response Generation.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Channel Adversarial Training for Cross-channel Text-independent Speaker Recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

DNN-based Spectral Enhancement for Neural Waveform Generators with Low-bit Quantization.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Dually Interactive Matching Network for Personalized Response Selection in Retrieval-Based Chatbots.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

Interactive Matching Network for Multi-Turn Response Selection in Retrieval-Based Chatbots.
Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019

The USTC System for Blizzard Challenge 2019.
Proceedings of the Blizzard Challenge 2019, Vienna, Austria, September 23, 2019, 2019

Linguistic Steganography by Sampling-based Language Generation.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

End-to-End Emotional Speech Synthesis Using Style Tokens and Semi-Supervised Training.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Dementia Detection by Analyzing Spontaneous Mandarin Speech.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Multi-Level Matching and Aggregation Network for Few-Shot Relation Classification.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

2018
Improving the Decoding Efficiency of Deep Neural Network Acoustic Models by Cluster-Based Senone Selection.
J. Signal Process. Syst., 2018

Unit Selection Speech Synthesis Using Frame-Sized Speech Segments and Neural Network Based Acoustic Models.
J. Signal Process. Syst., 2018

A Sequential Neural Encoder With Latent Structured Description for Modeling Sentences.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

Waveform Modeling and Generation Using Hierarchical Recurrent Neural Networks for Speech Bandwidth Extension.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

Extracting Spectral Features Using Deep Autoencoders With Binary Distributed Hidden Units for Statistical Parametric Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

Statistical Parametric Speech Synthesis Using Generalized Distillation Framework.
IEEE Signal Process. Lett., 2018

Articulatory-to-acoustic conversion using BLSTM-RNNs with augmented input representation.
Speech Commun., 2018

Building Sequential Inference Models for End-to-End Response Selection.
CoRR, 2018

Improving Sequence-to-Sequence Acoustic Modeling by Adding Text-Supervision.
CoRR, 2018

Forward Attention in Sequence-to-sequence Acoustic Modelling for Speech Synthesis.
CoRR, 2018

The Voice Conversion Challenge 2018: Promoting Development of Parallel and Nonparallel Methods.
Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018

A Spoofing Benchmark for the 2018 Voice Conversion Challenge: Leveraging from Spoofing Countermeasures for Speech Artifact Assessment.
Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018

GTDNN-Based Voice Conversion Using DAEs with Binary Distributed Hidden Units.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Learning and Modeling Unit Embeddings for Improving HMM-based Unit Selection Speech Synthesis.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

WaveNet Vocoder with Limited Training Data for Voice Conversion.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Forward Attention in Sequence-to-Sequence Acoustic Modeling for Speech Synthesis.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Pseudo-Supervised Approach for Text Clustering Based on Consensus Analysis.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

SampleRNN-Based Neural Vocoder for Statistical Parametric Speech Synthesis.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Enhancing Sentence Embedding with Generalized Pooling.
Proceedings of the 27th International Conference on Computational Linguistics, 2018

A Study on Improving End-to-End Neural Coreference Resolution.
Proceedings of the Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data, 2018

The USTC System for Blizzard Challenge 2018.
Proceedings of the Blizzard Challenge 2018, Hyderabad, India, September 8, 2018, 2018

An Analysis of Speaker Diarization Fusion Methods For The First DIHARD Challenge.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

Hybrid semi-Markov CRF for Neural Sequence Labeling.
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018

Neural Natural Language Inference Models Enhanced with External Knowledge.
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018

2017
Natural Language Inference with External Knowledge.
CoRR, 2017

Recurrent Neural Network-Based Sentence Encoder with Gated Attention for Natural Language Inference.
Proceedings of the 2nd Workshop on Evaluating Vector Space Representations for NLP, 2017

Waveform Modeling Using Stacked Dilated Convolutional Neural Networks for Speech Bandwidth Extension.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Cause-Effect Knowledge Acquisition and Neural Association Model for Solving A Set of Winograd Schema Problems.
Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017

Extracting structural spectral features using what-where auto-encoders for statistical parametric speech synthesis.
Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

Question Answering with Character-Level LSTM Encoders and Model-Based Data Augmentation.
Proceedings of the Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data, 2017

The USTC System for Blizzard Challenge 2017.
Proceedings of the Blizzard Challenge 2017, Stockholm, Sweden, August 25, 2017, 2017

The iFLYTEK system for blizzard machine learning challenge 2017-ES1.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

The USTC system for blizzard machine learning challenge 2017-ES2.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Emotional statistical parametric speech synthesis using LSTM-RNNs.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

Enhanced LSTM for Natural Language Inference.
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017

Combing Context and Commonsense Knowledge Through Neural Networks for Solving Winograd Schema Problems.
Proceedings of the 2017 AAAI Spring Symposia, 2017

2016
Anti-Spoofing for Text-Independent Speaker Verification: An Initial Database, Comparison of Countermeasures, and Human Performance.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

DBN-based Spectral Feature Representation for Statistical Parametric Speech Synthesis.
IEEE Signal Process. Lett., 2016

Modeling F0 trajectories in hierarchically structured deep neural networks.
Speech Commun., 2016

Concept-to-Speech generation with knowledge sharing for acoustic modelling and utterance filtering.
Comput. Speech Lang., 2016

Part-of-Speech Relevance Weights for Learning Word Embeddings.
CoRR, 2016

Probabilistic Reasoning via Deep Learning: Neural Association Models.
CoRR, 2016

Distraction-Based Neural Networks for Document Summarization.
CoRR, 2016

Enhancing and Combining Sequential and Tree LSTM for Natural Language Inference.
CoRR, 2016

Intra-Topic Variability Normalization based on Linear Projection for Topic Classification.
Proceedings of the NAACL HLT 2016, 2016

DNN-based unit selection using frame-sized speech segments.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Cluster-based senone selection for the efficient calculation of deep neural network acoustic models.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Articulatory-to-Acoustic Conversion with Cascaded Prediction of Spectral and Excitation Features Using Neural Networks.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Speech Bandwidth Extension Using Bottleneck Features and Deep Recurrent Neural Networks.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

The USTC System for Voice Conversion Challenge 2016: Neural Network Based Approaches for Spectrum, Aperiodicity and F0 Conversion.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Distraction-Based Neural Networks for Modeling Document.
Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016

Modeling spectral envelopes using deep conditional restricted Boltzmann machines for statistical parametric speech synthesis.
Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

A full training framework of cross-stream dependence modelling for HMM-based singing voice synthesis.
Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

Modulation spectrum compensation for HMM-based speech synthesis using line spectral pairs.
Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

Deep belief network-based post-filtering for statistical parametric speech synthesis.
Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

Exploring Semantic Representation in Brain Activity Using Word Embeddings.
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016

The USTC System for Blizzard Challenge 2016.
Proceedings of the Blizzard Challenge 2016, Cupertino, CA, USA, September 16, 2016, 2016

2015
A Deep Generative Architecture for Postfiltering in Statistical Parametric Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2015

Deep Learning for Acoustic Modeling in Parametric Speech Generation: A systematic review of existing techniques and future trends.
IEEE Signal Process. Mag., 2015

Statistical parametric speech synthesis using a hidden trajectory model.
Speech Commun., 2015

Integrate Document Ranking Information into Confidence Measure Calculation for Spoken Term Detection.
CoRR, 2015

Automatic phrase boundary labeling of speech synthesis database using context-dependent HMMs and n-gram prior distributions.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Restoring high frequency spectral envelopes using neural networks for speech bandwidth extension.
Proceedings of the 2015 International Joint Conference on Neural Networks, 2015

Spectral conversion using deep neural networks trained with multi-source speakers.
Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

Lip movement generation using restricted Boltzmann machines for visual speech synthesis.
Proceedings of the IEEE China Summit and International Conference on Signal and Information Processing, 2015

The USTC System for Blizzard Challenge 2015.
Proceedings of the Blizzard Challenge 2015, 2015

Learning Semantic Word Embeddings based on Ordinal Knowledge Constraints.
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, 2015

2014
Voice conversion using deep neural networks with layer-wise generative training.
IEEE ACM Trans. Audio Speech Lang. Process., 2014

HMM-based unit selection speech synthesis using log likelihood ratios derived from perceptual data.
Speech Commun., 2014

Unsupervised Prosodic Labeling of Speech Synthesis Databases Using Context-Dependent HMMs.
IEICE Trans. Inf. Syst., 2014

Integrating global variance of log power spectrum derived from LSPs into MGE training for HMM-based parametric speech synthesis.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Improving F0 prediction using bidirectional associative memories and syllable-level F0 features for HMM-based Mandarin speech synthesis.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Modeling DCT parameterized F0 trajectory at intonation phrase level with DNN or decision tree.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Concept-to-speech generation by integrating syntagmatic features into HMM-based speech synthesis.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

DNN-based stochastic postfilter for HMM-based speech synthesis.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Voice conversion using generative trained deep neural networks with multiple frame spectral envelopes.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Formant-controlled speech synthesis using hidden trajectory model.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Spectral modeling using neural autoregressive distribution estimators for statistical parametric speech synthesis.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

Using bidirectional associative memories for joint spectral envelope modeling in voice conversion.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

The USTC System for Blizzard Challenge 2014.
Proceedings of the Blizzard Challenge 2014, Singapore, Singapore, September 19, 2014, 2014

2013
Articulatory Control of HMM-Based Parametric Speech Synthesis Using Feature-Space-Switched Multiple Regression.
IEEE Trans. Audio Speech Lang. Process., 2013

Modeling Spectral Envelopes Using Restricted Boltzmann Machines and Deep Belief Networks for Statistical Parametric Speech Synthesis.
IEEE Trans. Audio Speech Lang. Process., 2013

Mage - HMM-based speech synthesis reactively controlled by the articulators.
Proceedings of the Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, 2013

Mage - reactive articulatory feature control of HMM-based parametric speech synthesis.
Proceedings of the Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, 2013

On the evaluation of inversion mapping performance in the acoustic domain.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Joint spectral distribution modeling using restricted Boltzmann machines for voice conversion.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Unsupervised prosodic phrase boundary labeling of Mandarin speech synthesis database using context-dependent HMM.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

Modeling spectral envelopes using restricted Boltzmann machines for statistical parametric speech synthesis.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

The USTC System for Blizzard Challenge 2013.
Proceedings of the Blizzard Challenge 2013, 2013

2012
Minimum Kullback-Leibler Divergence Parameter Generation for HMM-Based Speech Synthesis.
IEEE Trans. Audio Speech Lang. Process., 2012

Improved unit selection speech synthesis method utilizing subjective evaluation results on synthetic speech.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Cross-stream dependency modeling using continuous F0 model for HMM-based speech synthesis.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Considering Global Variance of the Log Power Spectrum Derived from Mel-Cepstrum in HMM-based Parametric Speech Synthesis.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Vowel Creation by Articulatory Control in HMM-based Parametric Speech Synthesis.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

The USTC System for Blizzard Challenge 2012.
Proceedings of the Blizzard Challenge 2012, Portland, OR, USA, September 14, 2012, 2012

2011
Feature-Space Transform Tying in Unified Acoustic-Articulatory Modelling for Articulatory Control of HMM-Based Speech Synthesis.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Formant-Controlled HMM-Based Speech Synthesis.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Estimation of Window Coefficients for Dynamic Feature Extraction for HMM-Based Speech Synthesis.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Building HMM based unit-selection speech synthesis system using synthetic speech naturalness evaluation score.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2011

Preserve ordering property of generated LSPs for minimum generation error training in HMM-based speech synthesis.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2011

Non-parallel training for voice conversion based on FT-GMM.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2011

The USTC System for Blizzard Challenge 2011.
Proceedings of the Blizzard Challenge 2011, Turin, Italy, September 2, 2011, 2011

2010
An Analysis of HMM-based prediction of articulatory movements.
Speech Commun., 2010

Cross-Validation and Minimum Generation Error based Decision Tree Pruning for HMM-based Speech Synthesis.
Int. J. Comput. Linguistics Chin. Lang. Process., 2010

Minimum generation error training for HMM-based prediction of articulatory movements.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Automatic phrase boundary labeling for Mandarin TTS corpus using context-dependent HMM.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Statistical modeling of syllable-level F0 features for HMM-based unit selection speech synthesis.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

GMM-based voice conversion with explicit modelling on feature transform.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Automatic error detection for unit selection speech synthesis using log likelihood ratio based SVM classifier.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

HMM-based text-to-articulatory-movement prediction and analysis of critical articulators.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Global variance modeling on the log power spectrum of LSPs for HMM-based speech synthesis.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

A hierarchical F0 modeling method for HMM-based speech synthesis.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Minimum generation error training with weighted Euclidean distance on LSP for HMM-based speech synthesis.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2010

The USTC System for Blizzard Challenge 2010.
Proceedings of the Blizzard Challenge 2010, Kansai Science City, Japan, September 25, 2010, 2010

2009
Robust Speaker-Adaptive HMM-Based Text-to-Speech Synthesis.
IEEE Trans. Audio Speech Lang. Process., 2009

Integrating Articulatory Features Into HMM-Based Parametric Speech Synthesis.
IEEE Trans. Audio Speech Lang. Process., 2009

Asynchronous F0 and spectrum modeling for HMM-based speech synthesis.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

The USTC System for Blizzard Challenge 2009.
Proceedings of the Blizzard Challenge 2009, Edinburgh, Scotland, UK, September 4, 2009, 2009

2008
Model Adaptation for HMM-Based Speech Synthesis under Minimum Generation Error Criterion.
Proceedings of the Tenth IEEE International Symposium on Multimedia (ISM2008), 2008

Multi-Layer F0 Modeling for HMM-Based Speech Synthesis.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

Cross-Stream Dependency Modeling for HMM-Based Speech Synthesis.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

Heteronym Verification for Mandarin Speech Synthesis.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

Robustness of HMM-based speech synthesis.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Articulatory control of HMM-based parametric speech synthesis driven by phonetic knowledge.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Minimum generation error criterion considering global/local variance for HMM-based speech synthesis.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2008

Minimum generation error linear regression based model adaptation for HMM-based speech synthesis.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2008

Minimum unit selection error training for HMM-based unit selection speech synthesis system.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2008

The USTC System for Blizzard Challenge 2008.
Proceedings of the Blizzard Challenge 2008, 2008

2007
HMM-Based Hierarchical Unit Selection Combining Kullback-Leibler Divergence with Likelihood Criterion.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2007

The USTC and iFLYTEK speech synthesis systems for Blizzard Challenge 2007.
Proceedings of the Evaluation of text-to-speech systems: Blizzard Challenge 2007, 2007

2006
Applying SFC Model for Chinese Expressive Speech Synthesis.
Proceedings of the 5th International Symposium on Chinese Spoken Language Processing, 2006

HMM-Based Emotional Speech Synthesis Using Average Emotion Model.
Proceedings of the 5th International Symposium on Chinese Spoken Language Processing, 2006

Improving the performance of HMM-based voice conversion using context clustering decision tree and appropriate regression matrix format.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

HMM-based unit selection using frame sized speech segments.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

USTC System for Blizzard Challenge 2006: An Improved HMM-based Speech Synthesis Method.
Proceedings of the Blizzard Challenge 2006, Pittsburgh, PA, USA, September 16, 2006, 2006

2005
An Improved Spectral and Prosodic Transformation Method in STRAIGHT-based Voice Conversion.
Proceedings of the 2005 IEEE International Conference on Acoustics, Speech and Signal Processing, 2005

Emotional Speech Synthesis Based on Improved Codebook Mapping Voice Conversion.
Proceedings of the Affective Computing and Intelligent Interaction, 2005

A Novel Source Analysis Method by Matching Spectral Characters of LF Model with STRAIGHT Spectrum.
Proceedings of the Affective Computing and Intelligent Interaction, 2005

2004
Modeling glottal effect on the spectral envelope of STRAIGHT using mixture of Gaussians.
Proceedings of the 2004 International Symposium on Chinese Spoken Language Processing, 2004

A novel voice conversion system based on codebook mapping with phoneme-tied weighting.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Compression of speech database by feature separation and pattern clustering using STRAIGHT.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004

2002
Decision tree based unit pre-selection in Mandarin Chinese synthesis.
Proceedings of the 2002 International Symposium on Chinese Spoken Language Processing, 2002

A miniature Chinese TTS system based on tailored corpus.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002
