Pengyuan Zhang

ORCID: 0000-0001-6838-5160

According to our database, Pengyuan Zhang authored at least 160 papers between 2006 and 2025.

Bibliography

2025
Semi-supervised sound event detection with dynamic convolution and confidence-aware mean teacher.
Digit. Signal Process., 2025

2024
Modality-collaborative Transformer with Hybrid Feature Reconstruction for Robust Emotion Recognition.
ACM Trans. Multim. Comput. Commun. Appl., May, 2024

An efficient loss function and deep learning approach for ranking stock returns in the absence of prior knowledge.
Inf. Process. Manag., January, 2024

Boosting Cross-Domain Speech Recognition With Self-Supervision.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Interrelate Training and Clustering for Online Speaker Diarization.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Prototype Division for Self-Supervised Speaker Verification.
IEEE Signal Process. Lett., 2024

Synthetic Speech Detection Based on the Temporal Consistency of Speaker Features.
IEEE Signal Process. Lett., 2024

SF-Speech: Straightened Flow for Zero-Shot Voice Clone on Small-Scale Dataset.
CoRR, 2024

Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation.
CoRR, 2024

TRNet: Two-level Refinement Network leveraging Speech Enhancement for Noise Robust Speech Emotion Recognition.
CoRR, 2024

Novel audio characteristic-dependent feature extraction and data augmentation methods for cough-based respiratory disease classification.
Comput. Biol. Medicine, 2024

Improving Short Utterance Anti-Spoofing with Aasist2.
Proceedings of the IEEE International Conference on Acoustics, 2024

One-Epoch Training with Single Test Sample in Test Time for Better Generalization of Cough-Based Covid-19 Detection Model.
Proceedings of the IEEE International Conference on Acoustics, 2024

One-Class Knowledge Distillation for Spoofing Speech Detection.
Proceedings of the IEEE International Conference on Acoustics, 2024

Snore Sound Features Based on Percussive Enhancing and Positional Encoding Combined with Multi-Task Learning for Osahs Detection.
Proceedings of the IEEE International Conference on Acoustics, 2024

Make Audio Solely Drive Lip in Talking Face Video Synthesis.
Proceedings of the Artificial Neural Networks and Machine Learning - ICANN 2024, 2024

Network Scanning Detection Based on Spatiotemporal Behavior.
Proceedings of the Advances in Internet, Data & Web Technologies, 2024

2023
SFA: Searching faster architectures for end-to-end automatic speech recognition models.
Comput. Speech Lang., June, 2023

How to make embeddings suitable for PLDA.
Comput. Speech Lang., June, 2023

Alternative Pseudo-Labeling for Semi-Supervised Automatic Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

The Impact of Silence on Speech Anti-Spoofing.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

So-DAS: A Two-Step Soft-Direction-Aware Speech Separation Framework.
IEEE Signal Process. Lett., 2023

First coarse, fine afterward: A lightweight two-stage complex approach for monaural speech enhancement.
Speech Commun., 2023

Enhancing stock movement prediction with market index and curriculum learning.
Expert Syst. Appl., 2023

DSNet: Disentangled Siamese Network with Neutral Calibration for Speech Emotion Recognition.
CoRR, 2023

Synthetic Speech Detection Based on Temporal Consistency and Distribution of Speaker Features.
CoRR, 2023

Expressive paragraph text-to-speech synthesis with multi-step variational autoencoder.
CoRR, 2023

Progressive Sub-Graph Clustering Algorithm for Semi-Supervised Domain Adaptation Speaker Verification.
CoRR, 2023

The HCCL system for VoxCeleb Speaker Recognition Challenge 2022.
CoRR, 2023

ForkNet: Simultaneous Time and Time-Frequency Domain Modeling for Speech Enhancement.
CoRR, 2023

Speech Corpora Divergence Based Unsupervised Data Selection for ASR.
CoRR, 2023

THLNet: two-stage heterogeneous lightweight network for monaural speech enhancement.
CoRR, 2023

PCF: ECAPA-TDNN with Progressive Channel Fusion for Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2023

Multi-Dimensional Frequency Dynamic Convolution with Confident Mean Teacher for Sound Event Detection.
Proceedings of the IEEE International Conference on Acoustics, 2023

Piecewise Position Encoding in Convolutional Neural Network for Cough-Based Covid-19 Detection.
Proceedings of the IEEE International Conference on Acoustics, 2023

Improving the Robustness of Deepfake Audio Detection through Confidence Calibration.
Proceedings of the Workshop on Deepfake Audio Detection and Analysis co-located with 32th International Joint Conference on Artificial Intelligence (IJCAI 2023), 2023

Detecting Unknown Speech Spoofing Algorithms with Nearest Neighbors.
Proceedings of the Workshop on Deepfake Audio Detection and Analysis co-located with 32th International Joint Conference on Artificial Intelligence (IJCAI 2023), 2023

Description of a Multi-Stage Audio Spoofing System in ADD Challenge 2023.
Proceedings of the Workshop on Deepfake Audio Detection and Analysis co-located with 32th International Joint Conference on Artificial Intelligence (IJCAI 2023), 2023

The IOA-ThinkIT system for Blizzard Challenge 2023.
Proceedings of the 18th Blizzard Challenge Workshop, Grenoble, France, August 29, 2023

2022
A Two-Fold Cross-Validation Training Framework Combined with Meta-Learning for Code-Switching Speech Recognition.
IEICE Trans. Inf. Syst., September, 2022

Label-Adversarial Jointly Trained Acoustic Word Embedding.
IEICE Trans. Inf. Syst., August, 2022

Self-Supervised Pre-Training for Attention-Based Encoder-Decoder ASR Model.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

An E2E-ASR-Based Iteratively-Trained Timestamp Estimator.
IEEE Signal Process. Lett., 2022

An Adversarial Domain Adaptation Framework With KL-Constraint for Remote Sensing Land Cover Classification.
IEEE Geosci. Remote. Sens. Lett., 2022

Master-Teacher-Student: A Weakly Labelled Semi-Supervised Framework for Audio Tagging and Sound Event Detection.
IEICE Trans. Inf. Syst., 2022

Deepfake Detection System for the ADD Challenge Track 3.2 Based on Score Fusion.
CoRR, 2022

Streaming non-autoregressive model for any-to-many voice conversion.
CoRR, 2022

Audio-Visual Scene Classification Using A Transfer Learning Based Joint Optimization Strategy.
CoRR, 2022

Back-ends Selection for Deep Speaker Embeddings.
CoRR, 2022

The HCCL-DKU system for fake audio generation task of the 2022 ICASSP ADD Challenge.
CoRR, 2022

Robust Cross-SubBand Countermeasure Against Replay Attacks.
Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022

An IBC Reference Block Enhancement Model Based on GAN for Screen Content Video Coding.
Proceedings of the MultiMedia Modeling - 28th International Conference, 2022

Deepfake Detection System for the ADD Challenge Track 3.2 Based on Score Fusion.
Proceedings of the DDAM@MM 2022: Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia, 2022

DDAM '22: 1st International Workshop on Deepfake Detection for Audio Multimedia.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Acoustic or Pattern? Speech Spoofing Countermeasure based on Image Pre-training Models.
Proceedings of the DDAM@MM 2022: Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia, 2022

Improving Spoofing Capability for End-to-end Any-to-many Voice Conversion.
Proceedings of the DDAM@MM 2022: Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia, 2022

Integrating Cross-modal Interactions via Latent Representation Shift for Multi-modal Humor Detection.
Proceedings of the MuSe@MM 2022: Proceedings of the 3rd International on Multimodal Sentiment Analysis Workshop and Challenge, 2022

A Mandarin Prosodic Boundary Prediction Model Based on Multi-Source Semi-Supervision.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

Sequence Distribution Matching for Unsupervised Domain Adaptation in ASR.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

Summary On The ISCSLP 2022 Chinese-English Code-Switching ASR Challenge.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

The Conversational Short-phrase Speaker Diarization (CSSD) Task: Dataset, Evaluation Metric and Baselines.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

Decoupled Federated Learning for ASR with Non-IID Data.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Wav2vec-S: Semi-Supervised Pre-Training for Low-Resource ASR.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Robust Cough Feature Extraction and Classification Method for COVID-19 Cough Detection Based on Vocalization Characteristics.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

SASV Based on Pre-trained ASV System and Integrated Scoring Module.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Improving Recognition of Out-of-vocabulary Words in E2E Code-switching ASR by Fusing Speech Generation Methods.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Open Source MagicData-RAMC: A Rich Annotated Mandarin Conversational(RAMC) Speech Dataset.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

NAS-SCAE: Searching Compact Attention-based Encoders For End-to-end Automatic Speech Recognition.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

CTA-RNN: Channel and Temporal-wise Attention RNN leveraging Pre-trained ASR Embeddings for Speech Emotion Recognition.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Beam-Guided TasNet: An Iterative Speech Separation Framework with Multi-Channel Output.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Interrelate Training and Searching: A Unified Online Clustering Framework for Speaker Diarization.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Improving Non-Autoregressive End-to-End Speech Recognition with Pre-Trained Acoustic and Language Models.
Proceedings of the IEEE International Conference on Acoustics, 2022

Improving CTC-Based Speech Recognition Via Knowledge Transferring from Pre-Trained Language Models.
Proceedings of the IEEE International Conference on Acoustics, 2022

DPT-FSNet: Dual-Path Transformer Based Full-Band and Sub-Band Fusion Network for Speech Enhancement.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
Keyword Search Using Attention-Based End-to-End ASR and Frame-Synchronous Phoneme Alignments.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

A unified system for multilingual speech recognition and language identification.
Speech Commun., 2021

D-MONA: A dilated mixed-order non-local attention network for speaker and language recognition.
Neural Networks, 2021

A dual-stream deep attractor network with multi-domain learning for speech dereverberation and separation.
Neural Networks, 2021

A Two-Stage Attention Based Modality Fusion Framework for Multi-Modal Speech Emotion Recognition.
IEICE Trans. Inf. Syst., 2021

Data Augmentation based Consistency Contrastive Pre-training for Automatic Speech Recognition.
CoRR, 2021

Wav2vec-S: Semi-Supervised Pre-Training for Speech Recognition.
CoRR, 2021

Improved Conformer-based End-to-End Speech Recognition Using Neural Architecture Search.
CoRR, 2021

Context-dependent Label Smoothing Regularization for Attention-based End-to-End Code-Switching Speech Recognition.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Non-autoregressive Deliberation-Attention based End-to-End ASR.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Cough-based COVID-19 Detection with Multi-band Long-Short Term Memory and Convolutional Neural Networks.
Proceedings of the ISAIMS 2021: 2nd International Symposium on Artificial Intelligence for Medicine Sciences, Beijing, China, October 29, 2021

The Effect of Silence and Dual-Band Fusion in Anti-Spoofing System.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

LinearSpeech: Parallel Text-to-Speech with Linear Complexity.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Adaptive Margin Circle Loss for Speaker Verification.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Incorporating Cross-Speaker Style Transfer for Multi-Language Text-to-Speech.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Improved Speech Enhancement Using a Complex-Domain GAN with Fused Time-Domain and Time-Frequency Domain Constraints.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

TVQVC: Transformer Based Vector Quantized Variational Autoencoder with CTC Loss for Voice Conversion.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Power Pooling: An Adaptive Pooling Function for Weakly Labelled Sound Event Detection.
Proceedings of the International Joint Conference on Neural Networks, 2021

The Thinkit System for Icassp2021 M2voc Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2021

RNN-T Based Open-Vocabulary Keyword Spotting in Mandarin with Multi-Level Detection.
Proceedings of the IEEE International Conference on Acoustics, 2021

Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Text Data.
Proceedings of the IEEE International Conference on Acoustics, 2021

History Utterance Embedding Transformer LM for Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2021

The IOA-ThinkIT system for Blizzard Challenge 2021.
Proceedings of the Blizzard Challenge 2021, virtual, October 23, 2021

Far-Field Speech Recognition Based on Complex-Valued Neural Networks and Inter-Frame Similarity Difference Method.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020
Online Hybrid CTC/Attention End-to-End Automatic Speech Recognition Architecture.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Domain Adaption for Fine-Grained Urban Village Extraction From Satellite Images.
IEEE Geosci. Remote. Sens. Lett., 2020

End-to-End Multilingual Speech Recognition System with Language Supervision Training.
IEICE Trans. Inf. Syst., 2020

Power pooling: An adaptive pooling function for weakly labelled sound event detection.
CoRR, 2020

Exploring the time-domain deep attractor network with two-stream architectures in a reverberant environment.
CoRR, 2020

ACGAN-based Data Augmentation Integrated with Long-term Scalogram for Acoustic Scene Classification.
CoRR, 2020

Power Pooling Operators and Confidence Learning for Semi-Supervised Sound Event Detection.
CoRR, 2020

Domain Adaptation Using Class Similarity for Robust Speech Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Speaker Diarization System Based on DPCA Algorithm for Fearless Steps Challenge Phase-2.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Improved Guided Source Separation Integrated with a Strong Back-End for the CHiME-6 Dinner Party Scenario.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Lingual-Agnostic Meta-Learning for Low-Resource Part-of-Speech Tagging.
Proceedings of the ICIT 2020, 2020

Transformer-Based Online CTC/Attention End-To-End Speech Recognition Architecture.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

CN-Celeb: A Challenging Chinese Speaker Recognition Dataset.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019
Long/Short-Term Utility Aware Optimal Selection of Manufacturing Service Composition Toward Industrial Internet Platforms.
IEEE Trans. Ind. Informatics, 2019

Tailoring an Interpretable Neural Language Model.
IEEE ACM Trans. Audio Speech Lang. Process., 2019

Aluminum alloy microstructural segmentation method based on simple noniterative clustering and adaptive density-based spatial clustering of applications with noise.
J. Electronic Imaging, 2019

Aluminum alloy microstructural segmentation in micrograph with hierarchical parameter transfer learning method.
J. Electronic Imaging, 2019

Speaker-Phonetic I-Vector Modeling for Text-Dependent Speaker Verification with Random Digit Strings.
IEICE Trans. Inf. Syst., 2019

Automatic Speech Recognition System with Output-Gate Projected Gated Recurrent Unit.
IEICE Trans. Inf. Syst., 2019

Investigation of knowledge transfer approaches to improve the acoustic modeling of Vietnamese ASR system.
IEEE CAA J. Autom. Sinica, 2019

Integrating the Data Augmentation Scheme with Various Classifiers for Acoustic Scene Modeling.
CoRR, 2019

Consensus aware manufacturing service collaboration optimization under blockchain based Industrial Internet platform.
Comput. Ind. Eng., 2019

Weighted Feature Fusion Based Emotional Recognition for Variable-length Speech using DNN.
Proceedings of the 15th International Wireless Communications & Mobile Computing Conference, 2019

Multi-Accent Adaptation Based on Gate Mechanism.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Speaker-Invariant Feature-Mapping for Distant Speech Recognition via Adversarial Teacher-Student Learning.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Online Hybrid CTC/Attention Architecture for End-to-End Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Character-Aware Sub-Word Level Language Modeling for Uyghur and Turkish ASR.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Target Speaker Recovery and Recognition Network with Average x-Vector and Global Training.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Self-attention Based Prosodic Boundary Prediction for Chinese Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2019

An Audio Scene Classification Framework with Embedded Filters and a DCT-based Temporal Module.
Proceedings of the IEEE International Conference on Acoustics, 2019

The IOA-ThinkIT system for Blizzard Challenge 2019.
Proceedings of the Blizzard Challenge 2019, Vienna, Austria, September 23, 2019

Utterance-level Permutation Invariant Training with Latency-controlled BLSTM for Single-channel Multi-talker Speech Separation.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

A Novel Method for Automatic Heart Murmur Diagnosis Using Phonocardiogram.
Proceedings of the 2019 International Conference on Artificial Intelligence and Advanced Manufacturing, 2019

2018
Improve Multichannel Speech Recognition with Temporal and Spatial Information.
IEICE Trans. Inf. Syst., 2018

Multichannel ASR with Knowledge Distillation and Generalized Cross Correlation Feature.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Space-Time Residual LSTM Architechture for Distant Speech Recognition.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Evaluating Modeling Units and Sub-word Features in Language Models for Turkish ASR.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Multilingual Speech Recognition Training and Adaptation with Language-Specific Gate Units.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Improving Language Modeling with an Adversarial Critic for Automatic Speech Recognition.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Investigation on the Combination of Batch Normalization and Dropout in BLSTM-based Acoustic Modeling for ASR.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Deep Convolutional Neural Network with Scalogram for Audio Scene Modeling.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Improving Multichannel Speech Recognition with Generalized Cross Correlation Inputs and Multitask Learning.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017
Attention-Based LSTM with Multi-Task Learning for Distant Speech Recognition.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

An improved lexicon generation method for mandarin speech recognition.
Proceedings of the 13th International Conference on Natural Computation, 2017

Fast variable-frame-rate decoding of speech recognition based on deep neural networks.
Proceedings of the 13th International Conference on Natural Computation, 2017

2016
Improved End-to-End Speech Recognition Using Adaptive Per-Dimensional Learning Rate Methods.
IEICE Trans. Inf. Syst., 2016

An unsupervised vocabulary selection technique for Chinese automatic speech recognition.
Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

2015
Noise Robust IOA/CAS Speech Separation and Recognition System For The Third 'CHIME' Challenge.
CoRR, 2015

A bi-scale method of link prediction.
Proceedings of the 11th International Conference on Natural Computation, 2015

An improvement of link prediction by combining local information and betweenness.
Proceedings of the 11th International Conference on Natural Computation, 2015

A Method of Link Prediction Based on Betweenness.
Proceedings of the Computational Social Networks - 4th International Conference, 2015

2014
Semi-supervised DNN training in meeting recognition.
Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

Enhanced Out of Vocabulary Word Detection Using Local Acoustic Information.
Proceedings of the 2014 Tenth International Conference on Intelligent Information Hiding and Multimedia Signal Processing, 2014

Using neural network front-ends on far field multiple microphones based speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2014

2012
Optimization of Spoken Term Detection System.
J. Appl. Math., 2012

2010
Enhancing the Robustness of the Posterior-Based Confidence Measures Using Entropy Information for Speech Recognition.
IEICE Trans. Inf. Syst., 2010

2007
A fast fuzzy keyword spotting algorithm based on syllable confusion network.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Keyword Spotting Based on Syllable Confusion Network.
Proceedings of the Third International Conference on Natural Computation, 2007

Real Context Model for Tone Recognition in Mandarin Conversational Telephone Speech.
Proceedings of the Third International Conference on Natural Computation, 2007

A Spoken Dialogue System Based on Keyword Spotting Technology.
Proceedings of the Human-Computer Interaction. HCI Intelligent Multimodal Interaction Environments, 2007

2006
Keyword Spotting Based on Phoneme Confusion Matrix.
Proceedings of the 5th International Symposium on Chinese Spoken Language Processing, 2006

Syllable Based Audio Search Using Confusion Network Arc as Indexing Unit.
Proceedings of the 5th International Symposium on Chinese Spoken Language Processing, 2006

