Pengyuan Zhang
ORCID: 0000-0001-6838-5160
According to our database, Pengyuan Zhang authored at least 160 papers between 2006 and 2025.
Bibliography
2025
Semi-supervised sound event detection with dynamic convolution and confidence-aware mean teacher.
Digit. Signal Process., 2025
2024
Modality-collaborative Transformer with Hybrid Feature Reconstruction for Robust Emotion Recognition.
ACM Trans. Multim. Comput. Commun. Appl., May, 2024
An efficient loss function and deep learning approach for ranking stock returns in the absence of prior knowledge.
Inf. Process. Manag., January, 2024
IEEE ACM Trans. Audio Speech Lang. Process., 2024
IEEE ACM Trans. Audio Speech Lang. Process., 2024
IEEE Signal Process. Lett., 2024
IEEE Signal Process. Lett., 2024
CoRR, 2024
Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation.
CoRR, 2024
TRNet: Two-level Refinement Network leveraging Speech Enhancement for Noise Robust Speech Emotion Recognition.
CoRR, 2024
Novel audio characteristic-dependent feature extraction and data augmentation methods for cough-based respiratory disease classification.
Comput. Biol. Medicine, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
One-Epoch Training with Single Test Sample in Test Time for Better Generalization of Cough-Based COVID-19 Detection Model.
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Snore Sound Features Based on Percussive Enhancing and Positional Encoding Combined with Multi-Task Learning for OSAHS Detection.
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the Artificial Neural Networks and Machine Learning - ICANN 2024, 2024
Proceedings of the Advances in Internet, Data & Web Technologies, 2024
2023
SFA: Searching faster architectures for end-to-end automatic speech recognition models.
Comput. Speech Lang., June, 2023
IEEE ACM Trans. Audio Speech Lang. Process., 2023
IEEE ACM Trans. Audio Speech Lang. Process., 2023
IEEE Signal Process. Lett., 2023
First coarse, fine afterward: A lightweight two-stage complex approach for monaural speech enhancement.
Speech Commun., 2023
Expert Syst. Appl., 2023
DSNet: Disentangled Siamese Network with Neutral Calibration for Speech Emotion Recognition.
CoRR, 2023
Synthetic Speech Detection Based on Temporal Consistency and Distribution of Speaker Features.
CoRR, 2023
Expressive paragraph text-to-speech synthesis with multi-step variational autoencoder.
CoRR, 2023
Progressive Sub-Graph Clustering Algorithm for Semi-Supervised Domain Adaptation Speaker Verification.
CoRR, 2023
ForkNet: Simultaneous Time and Time-Frequency Domain Modeling for Speech Enhancement.
CoRR, 2023
CoRR, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Multi-Dimensional Frequency Dynamic Convolution with Confident Mean Teacher for Sound Event Detection.
Proceedings of the IEEE International Conference on Acoustics, 2023
Piecewise Position Encoding in Convolutional Neural Network for Cough-Based COVID-19 Detection.
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the Workshop on Deepfake Audio Detection and Analysis co-located with 32th International Joint Conference on Artificial Intelligence (IJCAI 2023), 2023
Proceedings of the Workshop on Deepfake Audio Detection and Analysis co-located with 32th International Joint Conference on Artificial Intelligence (IJCAI 2023), 2023
Proceedings of the Workshop on Deepfake Audio Detection and Analysis co-located with 32th International Joint Conference on Artificial Intelligence (IJCAI 2023), 2023
Proceedings of the 18th Blizzard Challenge Workshop, Grenoble, France, August 29, 2023, 2023
2022
A Two-Fold Cross-Validation Training Framework Combined with Meta-Learning for Code-Switching Speech Recognition.
IEICE Trans. Inf. Syst., September, 2022
IEICE Trans. Inf. Syst., August, 2022
IEEE ACM Trans. Audio Speech Lang. Process., 2022
IEEE Signal Process. Lett., 2022
An Adversarial Domain Adaptation Framework With KL-Constraint for Remote Sensing Land Cover Classification.
IEEE Geosci. Remote. Sens. Lett., 2022
Master-Teacher-Student: A Weakly Labelled Semi-Supervised Framework for Audio Tagging and Sound Event Detection.
IEICE Trans. Inf. Syst., 2022
CoRR, 2022
Audio-Visual Scene Classification Using A Transfer Learning Based Joint Optimization Strategy.
CoRR, 2022
CoRR, 2022
Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022
An IBC Reference Block Enhancement Model Based on GAN for Screen Content Video Coding.
Proceedings of the MultiMedia Modeling - 28th International Conference, 2022
Proceedings of the DDAM@MM 2022: Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia, 2022
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022
Acoustic or Pattern? Speech Spoofing Countermeasure based on Image Pre-training Models.
Proceedings of the DDAM@MM 2022: Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia, 2022
Proceedings of the DDAM@MM 2022: Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia, 2022
Integrating Cross-modal Interactions via Latent Representation Shift for Multi-modal Humor Detection.
Proceedings of the MuSe@MM 2022: Proceedings of the 3rd International on Multimodal Sentiment Analysis Workshop and Challenge, 2022
A Mandarin Prosodic Boundary Prediction Model Based on Multi-Source Semi-Supervision.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022
The Conversational Short-phrase Speaker Diarization (CSSD) Task: Dataset, Evaluation Metric and Baselines.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Robust Cough Feature Extraction and Classification Method for COVID-19 Cough Detection Based on Vocalization Characteristics.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Improving Recognition of Out-of-vocabulary Words in E2E Code-switching ASR by Fusing Speech Generation Methods.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Open Source MagicData-RAMC: A Rich Annotated Mandarin Conversational(RAMC) Speech Dataset.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
NAS-SCAE: Searching Compact Attention-based Encoders For End-to-end Automatic Speech Recognition.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
CTA-RNN: Channel and Temporal-wise Attention RNN leveraging Pre-trained ASR Embeddings for Speech Emotion Recognition.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Beam-Guided TasNet: An Iterative Speech Separation Framework with Multi-Channel Output.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Interrelate Training and Searching: A Unified Online Clustering Framework for Speaker Diarization.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Improving Non-Autoregressive End-to-End Speech Recognition with Pre-Trained Acoustic and Language Models.
Proceedings of the IEEE International Conference on Acoustics, 2022
Improving CTC-Based Speech Recognition Via Knowledge Transferring from Pre-Trained Language Models.
Proceedings of the IEEE International Conference on Acoustics, 2022
DPT-FSNet: Dual-Path Transformer Based Full-Band and Sub-Band Fusion Network for Speech Enhancement.
Proceedings of the IEEE International Conference on Acoustics, 2022
2021
Keyword Search Using Attention-Based End-to-End ASR and Frame-Synchronous Phoneme Alignments.
IEEE ACM Trans. Audio Speech Lang. Process., 2021
Speech Commun., 2021
D-MONA: A dilated mixed-order non-local attention network for speaker and language recognition.
Neural Networks, 2021
A dual-stream deep attractor network with multi-domain learning for speech dereverberation and separation.
Neural Networks, 2021
A Two-Stage Attention Based Modality Fusion Framework for Multi-Modal Speech Emotion Recognition.
IEICE Trans. Inf. Syst., 2021
Data Augmentation based Consistency Contrastive Pre-training for Automatic Speech Recognition.
CoRR, 2021
Improved Conformer-based End-to-End Speech Recognition Using Neural Architecture Search.
CoRR, 2021
Context-dependent Label Smoothing Regularization for Attention-based End-to-End Code-Switching Speech Recognition.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021
Cough-based COVID-19 Detection with Multi-band Long-Short Term Memory and Convolutional Neural Networks.
Proceedings of the ISAIMS 2021: 2nd International Symposium on Artificial Intelligence for Medicine Sciences, Beijing, China, October 29, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Improved Speech Enhancement Using a Complex-Domain GAN with Fused Time-Domain and Time-Frequency Domain Constraints.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
TVQVC: Transformer Based Vector Quantized Variational Autoencoder with CTC Loss for Voice Conversion.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Power Pooling: An Adaptive Pooling Function for Weakly Labelled Sound Event Detection.
Proceedings of the International Joint Conference on Neural Networks, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the Blizzard Challenge 2021, virtual, October 23, 2021, 2021
Far-Field Speech Recognition Based on Complex-Valued Neural Networks and Inter-Frame Similarity Difference Method.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021
2020
IEEE ACM Trans. Audio Speech Lang. Process., 2020
IEEE Geosci. Remote. Sens. Lett., 2020
End-to-End Multilingual Speech Recognition System with Language Supervision Training.
IEICE Trans. Inf. Syst., 2020
Power pooling: An adaptive pooling function for weakly labelled sound event detection.
CoRR, 2020
Exploring the time-domain deep attractor network with two-stream architectures in a reverberant environment.
CoRR, 2020
ACGAN-based Data Augmentation Integrated with Long-term Scalogram for Acoustic Scene Classification.
CoRR, 2020
Power Pooling Operators and Confidence Learning for Semi-Supervised Sound Event Detection.
CoRR, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Speaker Diarization System Based on DPCA Algorithm for Fearless Steps Challenge Phase-2.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Improved Guided Source Separation Integrated with a Strong Back-End for the CHiME-6 Dinner Party Scenario.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the ICIT 2020, 2020
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
2019
Long/Short-Term Utility Aware Optimal Selection of Manufacturing Service Composition Toward Industrial Internet Platforms.
IEEE Trans. Ind. Informatics, 2019
IEEE ACM Trans. Audio Speech Lang. Process., 2019
Aluminum alloy microstructural segmentation method based on simple noniterative clustering and adaptive density-based spatial clustering of applications with noise.
J. Electronic Imaging, 2019
Aluminum alloy microstructural segmentation in micrograph with hierarchical parameter transfer learning method.
J. Electronic Imaging, 2019
Speaker-Phonetic I-Vector Modeling for Text-Dependent Speaker Verification with Random Digit Strings.
IEICE Trans. Inf. Syst., 2019
IEICE Trans. Inf. Syst., 2019
Investigation of knowledge transfer approaches to improve the acoustic modeling of Vietnamese ASR system.
IEEE CAA J. Autom. Sinica, 2019
Integrating the Data Augmentation Scheme with Various Classifiers for Acoustic Scene Modeling.
CoRR, 2019
Consensus aware manufacturing service collaboration optimization under blockchain based Industrial Internet platform.
Comput. Ind. Eng., 2019
Weighted Feature Fusion Based Emotional Recognition for Variable-length Speech using DNN.
Proceedings of the 15th International Wireless Communications & Mobile Computing Conference, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Speaker-Invariant Feature-Mapping for Distant Speech Recognition via Adversarial Teacher-Student Learning.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Target Speaker Recovery and Recognition Network with Average x-Vector and Global Training.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the IEEE International Conference on Acoustics, 2019
An Audio Scene Classification Framework with Embedded Filters and a DCT-based Temporal Module.
Proceedings of the IEEE International Conference on Acoustics, 2019
Proceedings of the Blizzard Challenge 2019, Vienna, Austria, September 23, 2019, 2019
Utterance-level Permutation Invariant Training with Latency-controlled BLSTM for Single-channel Multi-talker Speech Separation.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019
Proceedings of the 2019 International Conference on Artificial Intelligence and Advanced Manufacturing, 2019
2018
IEICE Trans. Inf. Syst., 2018
Multichannel ASR with Knowledge Distillation and Generalized Cross Correlation Feature.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018
Multilingual Speech Recognition Training and Adaptation with Language-Specific Gate Units.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018
Improving Language Modeling with an Adversarial Critic for Automatic Speech Recognition.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Investigation on the Combination of Batch Normalization and Dropout in BLSTM-based Acoustic Modeling for ASR.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Improving Multichannel Speech Recognition with Generalized Cross Correlation Inputs and Multitask Learning.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
2017
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Proceedings of the 13th International Conference on Natural Computation, 2017
Fast variable-frame-rate decoding of speech recognition based on deep neural networks.
Proceedings of the 13th International Conference on Natural Computation, 2017
2016
Improved End-to-End Speech Recognition Using Adaptive Per-Dimensional Learning Rate Methods.
IEICE Trans. Inf. Syst., 2016
An unsupervised vocabulary selection technique for Chinese automatic speech recognition.
Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016
2015
Noise Robust IOA/CAS Speech Separation and Recognition System For The Third 'CHIME' Challenge.
CoRR, 2015
Proceedings of the 11th International Conference on Natural Computation, 2015
Proceedings of the 11th International Conference on Natural Computation, 2015
Proceedings of the Computational Social Networks - 4th International Conference, 2015
2014
Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014
Proceedings of the 2014 Tenth International Conference on Intelligent Information Hiding and Multimedia Signal Processing, 2014
Using neural network front-ends on far field multiple microphones based speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2014
2012
2010
Enhancing the Robustness of the Posterior-Based Confidence Measures Using Entropy Information for Speech Recognition.
IEICE Trans. Inf. Syst., 2010
2007
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007
Proceedings of the Third International Conference on Natural Computation, 2007
Proceedings of the Third International Conference on Natural Computation, 2007
Proceedings of the Human-Computer Interaction. HCI Intelligent Multimodal Interaction Environments, 2007
2006
Proceedings of the 5th International Symposium on Chinese Spoken Language Processing, 2006
Proceedings of the 5th International Symposium on Chinese Spoken Language Processing, 2006