Pengyuan Zhang

CoRR, 2022

Audio-Visual Scene Classification Using A Transfer Learning Based Joint Optimization Strategy.

Chengxin Chen

Meng Wang

CoRR, 2022

Back-ends Selection for Deep Speaker Embeddings.

CoRR, 2022

The HCCL-DKU system for fake audio generation task of the 2022 ICASSP ADD Challenge.

CoRR, 2022

Robust Cross-SubBand Countermeasure Against Replay Attacks.

Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022

An IBC Reference Block Enhancement Model Based on GAN for Screen Content Video Coding.

Proceedings of the MultiMedia Modeling - 28th International Conference, 2022

Deepfake Detection System for the ADD Challenge Track 3.2 Based on Score Fusion.

Proceedings of DDAM@MM 2022: the 1st International Workshop on Deepfake Detection for Audio Multimedia, 2022

DDAM '22: 1st International Workshop on Deepfake Detection for Audio Multimedia.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Acoustic or Pattern? Speech Spoofing Countermeasure based on Image Pre-training Models.

Proceedings of DDAM@MM 2022: the 1st International Workshop on Deepfake Detection for Audio Multimedia, 2022

Improving Spoofing Capability for End-to-end Any-to-many Voice Conversion.

Proceedings of DDAM@MM 2022: the 1st International Workshop on Deepfake Detection for Audio Multimedia, 2022

Integrating Cross-modal Interactions via Latent Representation Shift for Multi-modal Humor Detection.

Chengxin Chen

Proceedings of the MuSe@MM 2022: Proceedings of the 3rd International on Multimodal Sentiment Analysis Workshop and Challenge, 2022

A Mandarin Prosodic Boundary Prediction Model Based on Multi-Source Semi-Supervision.

Peiyang Shi

Zengqiang Shang

Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

Sequence Distribution Matching for Unsupervised Domain Adaptation in ASR.

Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

Summary On The ISCSLP 2022 Chinese-English Code-Switching ASR Challenge.

Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

The Conversational Short-phrase Speaker Diarization (CSSD) Task: Dataset, Evaluation Metric and Baselines.

Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

Decoupled Federated Learning for ASR with Non-IID Data.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Wav2vec-S: Semi-Supervised Pre-Training for Low-Resource ASR.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Robust Cough Feature Extraction and Classification Method for COVID-19 Cough Detection Based on Vocalization Characteristics.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

SASV Based on Pre-trained ASV System and Integrated Scoring Module.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Improving Recognition of Out-of-vocabulary Words in E2E Code-switching ASR by Fusing Speech Generation Methods.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Open Source MagicData-RAMC: A Rich Annotated Mandarin Conversational(RAMC) Speech Dataset.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

NAS-SCAE: Searching Compact Attention-based Encoders For End-to-end Automatic Speech Recognition.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

CTA-RNN: Channel and Temporal-wise Attention RNN leveraging Pre-trained ASR Embeddings for Speech Emotion Recognition.

Chengxin Chen

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Beam-Guided TasNet: An Iterative Speech Separation Framework with Multi-Channel Output.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Interrelate Training and Searching: A Unified Online Clustering Framework for Speaker Diarization.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Improving Non-Autoregressive End-to-End Speech Recognition with Pre-Trained Acoustic and Language Models.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Improving CTC-Based Speech Recognition Via Knowledge Transferring from Pre-Trained Language Models.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

DPT-FSNet: Dual-Path Transformer Based Full-Band and Sub-Band Fusion Network for Speech Enhancement.

Feng Dang

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

2021

Keyword Search Using Attention-Based End-to-End ASR and Frame-Synchronous Phoneme Alignments.

IEEE ACM Trans. Audio Speech Lang. Process., 2021

A unified system for multilingual speech recognition and language identification.

Speech Commun., 2021

D-MONA: A dilated mixed-order non-local attention network for speaker and language recognition.

Neural Networks, 2021

A dual-stream deep attractor network with multi-domain learning for speech dereverberation and separation.

Neural Networks, 2021

A Two-Stage Attention Based Modality Fusion Framework for Multi-Modal Speech Emotion Recognition.

IEICE Trans. Inf. Syst., 2021

Data Augmentation based Consistency Contrastive Pre-training for Automatic Speech Recognition.

CoRR, 2021

Wav2vec-S: Semi-Supervised Pre-Training for Speech Recognition.

CoRR, 2021

Improved Conformer-based End-to-End Speech Recognition Using Neural Architecture Search.

CoRR, 2021

Context-dependent Label Smoothing Regularization for Attention-based End-to-End Code-Switching Speech Recognition.

Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Non-autoregressive Deliberation-Attention based End-to-End ASR.

Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Cough-based COVID-19 Detection with Multi-band Long-Short Term Memory and Convolutional Neural Networks.

Proceedings of the ISAIMS 2021: 2nd International Symposium on Artificial Intelligence for Medicine Sciences, Beijing, China, October 29, 2021

The Effect of Silence and Dual-Band Fusion in Anti-Spoofing System.

Yuxiang Zhang

Wenchao Wang

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

LinearSpeech: Parallel Text-to-Speech with Linear Complexity.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Adaptive Margin Circle Loss for Speaker Verification.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Incorporating Cross-Speaker Style Transfer for Multi-Language Text-to-Speech.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Improved Speech Enhancement Using a Complex-Domain GAN with Fused Time-Domain and Time-Frequency Domain Constraints.

Feng Dang

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

TVQVC: Transformer Based Vector Quantized Variational Autoencoder with CTC Loss for Voice Conversion.

Ziyi Chen

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Power Pooling: An Adaptive Pooling Function for Weakly Labelled Sound Event Detection.

Proceedings of the International Joint Conference on Neural Networks, 2021

The Thinkit System for Icassp2021 M2voc Challenge.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

RNN-T Based Open-Vocabulary Keyword Spotting in Mandarin with Multi-Level Detection.

Zuozhen Liu

Ta Li

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Text Data.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

History Utterance Embedding Transformer LM for Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

The IOA-ThinkIT system for Blizzard Challenge 2021.

Proceedings of the Blizzard Challenge 2021, virtual, October 23, 2021

Far-Field Speech Recognition Based on Complex-Valued Neural Networks and Inter-Frame Similarity Difference Method.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020

Online Hybrid CTC/Attention End-to-End Automatic Speech Recognition Architecture.

IEEE ACM Trans. Audio Speech Lang. Process., 2020

Domain Adaption for Fine-Grained Urban Village Extraction From Satellite Images.

IEEE Geosci. Remote. Sens. Lett., 2020

End-to-End Multilingual Speech Recognition System with Language Supervision Training.

Danyang Liu

Ji Xu

IEICE Trans. Inf. Syst., 2020

Power pooling: An adaptive pooling function for weakly labelled sound event detection.

CoRR, 2020

Exploring the time-domain deep attractor network with two-stream architectures in a reverberant environment.

CoRR, 2020

ACGAN-based Data Augmentation Integrated with Long-term Scalogram for Acoustic Scene Classification.

CoRR, 2020

Power Pooling Operators and Confidence Learning for Semi-Supervised Sound Event Detection.

Yuzhuo Liu

CoRR, 2020

Domain Adaptation Using Class Similarity for Robust Speech Recognition.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Speaker Diarization System Based on DPCA Algorithm for Fearless Steps Challenge Phase-2.

Xueshuai Zhang

Wenchao Wang

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Improved Guided Source Separation Integrated with a Strong Back-End for the CHiME-6 Dinner Party Scenario.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Lingual-Agnostic Meta-Learning for Low-Resource Part-of-Speech Tagging.

Proceedings of the ICIT 2020, 2020

Transformer-Based Online CTC/Attention End-To-End Speech Recognition Architecture.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

CN-Celeb: A Challenging Chinese Speaker Recognition Dataset.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

2019

Long/Short-Term Utility Aware Optimal Selection of Manufacturing Service Composition Toward Industrial Internet Platforms.

IEEE Trans. Ind. Informatics, 2019

Tailoring an Interpretable Neural Language Model.

Yike Zhang

IEEE ACM Trans. Audio Speech Lang. Process., 2019

Aluminum alloy microstructural segmentation method based on simple noniterative clustering and adaptive density-based spatial clustering of applications with noise.

J. Electronic Imaging, 2019

Aluminum alloy microstructural segmentation in micrograph with hierarchical parameter transfer learning method.

J. Electronic Imaging, 2019

Speaker-Phonetic I-Vector Modeling for Text-Dependent Speaker Verification with Random Digit Strings.

Shengyu Yao

Ruohua Zhou

IEICE Trans. Inf. Syst., 2019

Automatic Speech Recognition System with Output-Gate Projected Gated Recurrent Unit.

Gaofeng Cheng

Ji Xu

IEICE Trans. Inf. Syst., 2019

Investigation of knowledge transfer approaches to improve the acoustic modeling of Vietnamese ASR system.

IEEE CAA J. Autom. Sinica, 2019

Integrating the Data Augmentation Scheme with Various Classifiers for Acoustic Scene Modeling.

CoRR, 2019

Consensus aware manufacturing service collaboration optimization under blockchain based Industrial Internet platform.

Comput. Ind. Eng., 2019

Weighted Feature Fusion Based Emotional Recognition for Variable-length Speech using DNN.

Sifan Wu

Fei Li

Proceedings of the 15th International Wireless Communications & Mobile Computing Conference, 2019

Multi-Accent Adaptation Based on Gate Mechanism.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Speaker-Invariant Feature-Mapping for Distant Speech Recognition via Adversarial Teacher-Student Learning.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Online Hybrid CTC/Attention Architecture for End-to-End Speech Recognition.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Character-Aware Sub-Word Level Language Modeling for Uyghur and Turkish ASR.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Target Speaker Recovery and Recognition Network with Average x-Vector and Global Training.

Wenjie Li

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Self-attention Based Prosodic Boundary Prediction for Chinese Speech Synthesis.

Chunhui Lu

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

An Audio Scene Classification Framework with Embedded Filters and a DCT-based Temporal Module.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

The IOA-ThinkIT system for Blizzard Challenge 2019.

Proceedings of the Blizzard Challenge 2019, Vienna, Austria, September 23, 2019

Utterance-level Permutation Invariant Training with Latency-controlled BLSTM for Single-channel Multi-talker Speech Separation.

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

A Novel Method for Automatic Heart Murmur Diagnosis Using Phonocardiogram.

Proceedings of the 2019 International Conference on Artificial Intelligence and Advanced Manufacturing, 2019

2018

Improve Multichannel Speech Recognition with Temporal and Spatial Information.

Yu Zhang

Qingwei Zhao

IEICE Trans. Inf. Syst., 2018

Multichannel ASR with Knowledge Distillation and Generalized Cross Correlation Feature.

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Space-Time Residual LSTM Architechture for Distant Speech Recognition.

Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Evaluating Modeling Units and Sub-word Features in Language Models for Turkish ASR.

Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Multilingual Speech Recognition Training and Adaptation with Language-Specific Gate Units.

Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Improving Language Modeling with an Adversarial Critic for Automatic Speech Recognition.

Yike Zhang

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Investigation on the Combination of Batch Normalization and Dropout in BLSTM-based Acoustic Modeling for ASR.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Deep Convolutional Neural Network with Scalogram for Audio Scene Modeling.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Improving Multichannel Speech Recognition with Generalized Cross Correlation Inputs and Multitask Learning.

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

2017

Attention-Based LSTM with Multi-Task Learning for Distant Speech Recognition.

Yu Zhang

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

An improved lexicon generation method for mandarin speech recognition.

Proceedings of the 13th International Conference on Natural Computation, 2017

Fast variable-frame-rate decoding of speech recognition based on deep neural networks.

Proceedings of the 13th International Conference on Natural Computation, 2017

2016

Improved End-to-End Speech Recognition Using Adaptive Per-Dimensional Learning Rate Methods.

IEICE Trans. Inf. Syst., 2016

An unsupervised vocabulary selection technique for Chinese automatic speech recognition.

Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

2015

Noise Robust IOA/CAS Speech Separation and Recognition System For The Third 'CHIME' Challenge.

CoRR, 2015

A bi-scale method of link prediction.

Proceedings of the 11th International Conference on Natural Computation, 2015

An improvement of link prediction by combining local information and betweenness.

Proceedings of the 11th International Conference on Natural Computation, 2015

A Method of Link Prediction Based on Betweenness.

Proceedings of the Computational Social Networks - 4th International Conference, 2015

2014

Semi-supervised DNN training in meeting recognition.

Yulan Liu

Thomas Hain

Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

Enhanced Out of Vocabulary Word Detection Using Local Acoustic Information.

Proceedings of the 2014 Tenth International Conference on Intelligent Information Hiding and Multimedia Signal Processing, 2014

Using neural network front-ends on far field multiple microphones based speech recognition.

Yulan Liu

Thomas Hain

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

2012

Optimization of Spoken Term Detection System.

Chuanxu Wang

J. Appl. Math., 2012

2010

Enhancing the Robustness of the Posterior-Based Confidence Measures Using Entropy Information for Speech Recognition.

IEICE Trans. Inf. Syst., 2010

2007

A fast fuzzy keyword spotting algorithm based on syllable confusion network.

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Keyword Spotting Based on Syllable Confusion Network.

Proceedings of the Third International Conference on Natural Computation, 2007

Real Context Model for Tone Recognition in Mandarin Conversational Telephone Speech.

Proceedings of the Third International Conference on Natural Computation, 2007

A Spoken Dialogue System Based on Keyword Spotting Technology.

Qingwei Zhao