Zhengqi Wen

Orcid: 0000-0001-9430-7115

According to our database1, Zhengqi Wen authored at least 136 papers between 2009 and 2024.

Collaborative distances:
  • Dijkstra number2 of four.
  • Erdős number3 of four.

Timeline

Legend:

Book 
In proceedings 
Article 
PhD thesis 
Dataset
Other 

Links

On csauthors.net:

Bibliography

2024
VLP2MSA: Expanding vision-language pre-training to multimodal sentiment analysis.
Knowl. Based Syst., January, 2024

Dual-Branch Knowledge Distillation for Noise-Robust Synthetic Speech Detection.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Mixture of Experts Fusion for Fake Audio Detection Using Frozen wav2vec 2.0.
CoRR, 2024

DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech.
CoRR, 2024

Text Prompt is Not Enough: Sound Event Enhanced Prompt Adapter for Target Style Audio Generation.
CoRR, 2024

Exploring the Role of Audio in Multimodal Misinformation Detection.
CoRR, 2024

Does Current Deepfake Audio Detection Model Effectively Detect ALM-based Deepfake Audio?
CoRR, 2024

EELE: Exploring Efficient and Extensible LoRA Integration in Emotional Text-to-Speech.
CoRR, 2024

A Noval Feature via Color Quantisation for Fake Audio Detection.
CoRR, 2024

Temporal Variability and Multi-Viewed Self-Supervised Representations to Tackle the ASVspoof5 Deepfake Challenge.
CoRR, 2024

VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing.
CoRR, 2024

MDPE: A Multimodal Deception Dataset with Personality and Emotional Characteristics.
CoRR, 2024

ICAGC 2024: Inspirational and Convincing Audio Generation Challenge 2024.
CoRR, 2024

ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation.
CoRR, 2024

Fake News Detection and Manipulation Reasoning via Large Vision-Language Models.
CoRR, 2024

MINT: a Multi-modal Image and Narrative Text Dubbing Dataset for Foley Audio Content Planning and Generation.
CoRR, 2024

Codecfake: An Initial Dataset for Detecting LLM-based Deepfake Audio.
CoRR, 2024

TraceableSpeech: Towards Proactively Traceable Text-to-Speech with Watermarking.
CoRR, 2024

PPPR: Portable Plug-in Prompt Refiner for Text to Audio Generation.
CoRR, 2024

Genuine-Focused Learning using Mask AutoEncoder for Generalized Fake Audio Detection.
CoRR, 2024

Generalized Source Tracing: Detecting Novel Audio Deepfake Algorithm with Real Emphasis and Fake Dispersion Strategy.
CoRR, 2024

Generalized Fake Audio Detection via Deep Stable Learning.
CoRR, 2024

The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio.
CoRR, 2024

Emotion selectable end-to-end text-based speech editing.
Artif. Intell., 2024

Dual-View Multimodal Interaction in Multimodal Sentiment Analysis.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

2023
Learning to Behave Like Clean Speech: Dual-Branch Knowledge Distillation for Noise-Robust Fake Audio Detection.
CoRR, 2023

UnifySpeech: A Unified Framework for Zero-shot Text-to-Speech and Voice Conversion.
CoRR, 2023

Multimodal Cross-Lingual Features and Weight Fusion for Cross-Cultural Humor Detection.
Proceedings of the 4th on Multimodal Sentiment Analysis Challenge and Workshop: Mimicked Emotions, 2023

Learning From Yourself: A Self-Distillation Method For Fake Speech Detection.
Proceedings of the IEEE International Conference on Acoustics, 2023

ADD 2023: the Second Audio Deepfake Detection Challenge.
Proceedings of the Workshop on Deepfake Audio Detection and Analysis co-located with 32th International Joint Conference on Artificial Intelligence (IJCAI 2023), 2023

The VIBVG Speech Synthesis System for Blizzard Challenge 2023.
Proceedings of the 18th Blizzard Challenge Workshop, Grenoble, France, August 29, 2023, 2023

Hybrid Multi-Task Learning for End-To-End Multimodal Emotion Recognition.
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

2022
CampNet: Context-Aware Mask Prediction for End-to-End Text-Based Speech Editing.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

NeuralDPS: Neural Deterministic Plus Stochastic Model With Multiband Excitation for Noise-Controllable Waveform Generation.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

Hybrid Autoregressive and Non-Autoregressive Transformer Models for Speech Recognition.
IEEE Signal Process. Lett., 2022

ADD 2022: the First Audio Deep Synthesis Detection Challenge.
CoRR, 2022

Audio Deepfake Detection Based on a Combination of F0 Information and Real Plus Imaginary Spectrogram Features.
Proceedings of the DDAM@MM 2022: Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia, 2022

Singing-Tacotron: Global Duration Control Attention and Dynamic Filter for End-to-end Singing Voice Synthesis.
Proceedings of the DDAM@MM 2022: Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia, 2022

ADD 2022: the first Audio Deep Synthesis Detection Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2022

Context-Aware Mask Prediction Network for End-to-End Text-Based Speech Editing.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
Gated Recurrent Fusion With Joint Training Framework for Robust End-to-End Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Integrating Knowledge Into End-to-End Speech Recognition From External Text-Only Data.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Fast End-to-End Speech Recognition Via Non-Autoregressive Models and Cross-Modal Knowledge Transferring From BERT.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

TSNAT: Two-Step Non-Autoregressvie Transformer Models for Speech Recognition.
CoRR, 2021

Fast End-to-End Speech Recognition via a Non-Autoregressive Model and Cross-Modal Knowledge Transferring from BERT.
CoRR, 2021

Which Phonemes Will Distinguish the Different Regions Within the Same Dialect?
Proceedings of the 24th Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 2021

Text Enhancement for Paragraph Processing in End-to-End Code-switching TTS.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Towards Fine-Grained Prosody Control for Voice Conversion.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Deep Time Delay Neural Network for Speech Enhancement with Full Data Learning.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

FSR: Accelerating the Inference Process of Transducer-Based Models by Applying Fast-Skip Regularization.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

End-to-End Spelling Correction Conditioned on Acoustic Feature for Code-Switching Speech Recognition.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Prosody and Voice Factorization for Few-Shot Speaker Adaptation in the Challenge M2voc 2021.
Proceedings of the IEEE International Conference on Acoustics, 2021

Bi-Level Style and Prosody Decoupling Modeling for Personalized End-to-End Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2021

Decoupling Pronunciation and Language for End-to-End Code-Switching Automatic Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2021

Simulation Analysis on Seismic Capacity of 220kV GIS Switch Bay Mobile Load Transfer Equipment.
Proceedings of the EEET 2021: 4th International Conference on Electronics and Electrical Engineering Technology, Nanjing, China, December 3, 2021

One In A Hundred: Selecting the Best Predicted Sequence from Numerous Candidates for Speech Recognition.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

2020
A Public Chinese Dataset for Language Model Adaptation.
J. Signal Process. Syst., 2020

End-to-End Post-Filter for Speech Separation With Deep Attention Fusion Features.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Simultaneous Denoising and Dereverberation Using Deep Embedding Features.
CoRR, 2020

Deep Attention Fusion Feature for Speech Separation with End-to-End Post-filter Method.
CoRR, 2020

Spatial and spectral deep attention fusion for multi-channel speech separation using deep embedding features.
CoRR, 2020

Spoken Content and Voice Factorization for Few-Shot Speaker Adaptation.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Bi-Level Speaker Supervision for One-Shot Speech Synthesis.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Non-Autoregressive End-to-End TTS with Coarse-to-Fine Decoding.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

ARVC: An Auto-Regressive Voice Conversion System Without Parallel Training Data.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Dynamic Speaker Representations Adjustment and Decoder Factorization for Speaker Adaptation in End-to-End Speech Synthesis.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Dynamic Soft Windowing and Language Dependent Style Token for Code-Switching End-to-End Speech Synthesis.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Joint Training for Simultaneous Speech Denoising and Dereverberation with Deep Embedding Representations.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Gated Recurrent Fusion of Spatial and Spectral Features for Multi-Channel Speech Separation with Deep Embedding Representations.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Listen Attentively, and Spell Once: Whole Sentence Generation via a Non-Autoregressive Architecture for Low-Latency Speech Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Synchronous Transformers for end-to-end Speech Recognition.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Focusing on Attention: Prosody Transfer and Adaptative Optimization Strategy for Multi-Speaker End-to-End Speech Synthesis.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

CASIA Voice Conversion System for the Voice Conversion Challenge 2020.
Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

The NLPR Speech Synthesis entry for Blizzard Challenge 2020.
Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

2019
Forward-Backward Decoding Sequence for Regularizing End-to-End TTS.
IEEE ACM Trans. Audio Speech Lang. Process., 2019

Language-Adversarial Transfer Learning for Low-Resource Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2019

Integrating Whole Context to Sequence-to-sequence Speech Recognition.
CoRR, 2019

Towards Fine-Grained Prosody Control for Voice Conversion.
CoRR, 2019

Forward-Backward Decoding for Regularizing End-to-End TTS.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Self-Attention Transducers for End-to-End Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Discriminative Learning for Monaural Speech Separation Using Deep Embedding Features.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

A Time Delay Neural Network with Shared Weight Self-Attention for Small-Footprint Keyword Spotting.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Learn Spelling from Teachers: Transferring Knowledge from Language Models to Sequence-to-Sequence Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Phoneme Dependent Speaker Embedding and Model Factorization for Multi-speaker Speech Synthesis and Adaptation.
Proceedings of the IEEE International Conference on Acoustics, 2019

The NLPR Speech Synthesis entry for Blizzard Challenge 2019.
Proceedings of the Blizzard Challenge 2019, Vienna, Austria, September 23, 2019, 2019

Noise Prior Knowledge Learning for Speech Enhancement via Gated Convolutional Generative Adversarial Network.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Voice Activity Detection Based on Time-Delay Neural Networks.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

2018
Investigating Deep Neural Network Adaptation for Generating Exclamatory and Interrogative Speech in Mandarin.
J. Signal Process. Syst., 2018

CTC Regularized Model Adaptation for Improving LSTM RNN Based Multi-Accent Mandarin Speech Recognition.
J. Signal Process. Syst., 2018

Improving Deep Neural Network Based Speech Synthesis through Contextual Feature Parametrization and Multi-Task Learning.
J. Signal Process. Syst., 2018

Distilling Knowledge Using Parallel Data for Far-field Speech Recognition.
CoRR, 2018

Utterance-level Permutation Invariant Training with Discriminative Learning for Single Channel Speech Separation.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

CLMAD: A Chinese Language Model Adaptation Dataset.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

BLSTM-CRF Based End-to-End Prosodic Boundary Prediction with Context Sensitive Embeddings in a Text-to-Speech Front-End.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

On the Application and Compression of Deep Time Delay Neural Network for Embedded Statistical Parametric Speech Synthesis.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Deep Metric Learning for the Target Cost in Unit-Selection Speech Synthesizer.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Transfer Learning Based Progressive Neural Networks for Acoustic Modeling in Statistical Parametric Speech Synthesis.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Adversarial Multilingual Training for Low-Resource Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017
Continuous Multimodal Emotion Prediction Based on Long Short Term Memory Recurrent Neural Network.
Proceedings of the 7th Annual Workshop on Audio/Visual Emotion Challenge, Mountain View, CA, USA, October 23, 2017

Investigating Efficient Feature Representation Methods and Training Objective for BLSTM-Based Phone Duration Prediction.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Distilling Knowledge from an Ensemble of Models for Punctuation Prediction.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

The NLPR Speech Synthesis entry for Blizzard Challenge 2017.
Proceedings of the Blizzard Challenge 2017, Stockholm, Sweden, August 25, 2017, 2017

2016
Speech Enhancement Based on Analysis-Synthesis Framework with Improved Parameter Domain Enhancement.
J. Signal Process. Syst., 2016

Investigating Effect of Rich Syntactic Features on Mandarin Prosodic Boundaries Prediction.
J. Signal Process. Syst., 2016

Audio Visual Emotion Recognition with Temporal Alignment and Perception Attention.
CoRR, 2016

Text-based sentential stress prediction using continuous lexical embedding for Mandarin speech synthesis.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Learning auxiliary categorical information for speech synthesis based on deep and recurrent neural networks.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Improving accented Mandarin speech recognition by using recurrent neural network based language model adaptation.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

End-to-end keywords spotting based on connectionist temporal classification for Mandarin.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Improving Prosodic Boundaries Prediction for Mandarin Speech Synthesis by Using Enhanced Embedding Feature and Model Fusion Approach.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

The Parameterized Phoneme Identity Feature as a Continuous Real-Valued Vector for Neural Network Based Speech Synthesis.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Long short term memory recurrent neural network based encoding method for emotion recognition in video.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Recurrent Neural Network Based Language Model Adaptation for Accent Mandarin Speech.
Proceedings of the Pattern Recognition - 7th Chinese Conference, 2016

BLSTM Guided Unit Selection Synthesis System for Blizzard Challenge 2016.
Proceedings of the Blizzard Challenge 2016, Cuppertino, CA, USA, September 16, 2016, 2016

Improving BLSTM RNN based Mandarin speech recognition using accent dependent bottleneck features.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

Deep neural network based voice conversion with a large synthesized parallel corpus.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

2015
Long Short Term Memory Recurrent Neural Network based Multimodal Dimensional Emotion Recognition.
Proceedings of the 5th International Workshop on Audio/Visual Emotion Challenge, 2015

Combining extreme learning machine and decision tree for duration prediction in HMM based speech synthesis.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

A novel method of artificial bandwidth extension using deep architecture.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

2014
Pitch-Scaled Spectrum Based Excitation Model for HMM-based Speech Synthesis.
J. Signal Process. Syst., 2014

Multi-scale Temporal Modeling for Dimensional Emotion Recognition in Video.
Proceedings of the 4th International Workshop on Audio/Visual Emotion Challenge, 2014

Survey on discriminative feature selection for speech emotion recognition.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Context features based pre-selection and weight prediction in concatenation speech synthesis system.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Efficient voice activity detection algorithm based on sub-band temporal envelope and sub-band long-term signal variability.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Investigating effect of rich syntactic features on Mandarin prosodic phrase boundaries prediction.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

A hierarchical viterbi algorithm for Mandarin hybrid speech synthesis system.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

A novel hybrid mandarin speech synthesis system using different base units for model training and concatenation.
Proceedings of the IEEE International Conference on Acoustics, 2014

2013
A novel unit selection method for concatenation speech system using similarity measure.
Proceedings of the 2013 International Conference Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2013

2012
Statistical modification based post-filtering technique for HMM-based speech synthesis.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Amplitude Spectrum based Excitation Model for HMM-based Speech Synthesis.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Pitch-Scaled Analysis based Residual Reconstruction for Speech Analysis and Synthesis.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

2011
An excitation model based on inverse filtering for speech analysis and synthesis.
Proceedings of the 2011 IEEE International Workshop on Machine Learning for Signal Processing, 2011

Inverse Filtering Based Harmonic Plus Noise Excitation Model for HMM-Based Speech Synthesis.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

2010
The WISTON Text to Speech System for Blizzard Challenge 2010.
Proceedings of the Blizzard Challenge 2010, Kansai Science City, Japan, September 25, 2010, 2010

2009
The WISTON Text-to-Speech System for Blizzard Challenge 2009.
Proceedings of the Blizzard Challenge 2009, Edinburgh, Scotland, UK, September 4, 2009, 2009


  Loading...