Jiangyan Yi

Orcid: 0000-0003-2422-4618

According to our database, Jiangyan Yi authored at least 134 papers between 2016 and 2024.

Bibliography

2024
Dual-Branch Knowledge Distillation for Noise-Robust Synthetic Speech Detection.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Dynamic Ensemble Teacher-Student Distillation Framework for Light-Weight Fake Audio Detection.
IEEE Signal Process. Lett., 2024

CFAD: A Chinese dataset for fake audio detection.
Speech Commun., 2024

SceneFake: An initial dataset and benchmarks for scene fake audio detection.
Pattern Recognit., 2024

DGSD: Dynamical graph self-distillation for EEG-based auditory spatial attention detection.
Neural Networks, 2024

Spatial reconstructed local attention Res2Net with F0 subband for fake speech detection.
Neural Networks, 2024

Open-vocabulary Multimodal Emotion Recognition: Dataset, Metric, and Benchmark.
CoRR, 2024

WMCodec: End-to-End Neural Speech Codec with Deep Watermarking for Authenticity Verification.
CoRR, 2024

VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing.
CoRR, 2024

ADD 2023: Towards Audio Deepfake Detection and Analysis in the Wild.
CoRR, 2024

Enhancing Partially Spoofed Audio Localization with Boundary-aware Attention Mechanism.
CoRR, 2024

An Unsupervised Domain Adaptation Method for Locating Manipulated Region in partially fake Audio.
CoRR, 2024

AffectGPT: Dataset and Framework for Explainable Multimodal Emotion Recognition.
CoRR, 2024

Frequency-mix Knowledge Distillation for Fake Speech Detection.
CoRR, 2024

RawBMamba: End-to-End Bidirectional State Space Model for Audio Deepfake Detection.
CoRR, 2024

TraceableSpeech: Towards Proactively Traceable Text-to-Speech with Watermarking.
CoRR, 2024

EVDA: Evolving Deepfake Audio Detection Continual Learning Benchmark.
CoRR, 2024

Emotion selectable end-to-end text-based speech editing.
Artif. Intell., 2024

MER 2024: Semi-Supervised Learning, Noise Robustness, and Open-Vocabulary Multimodal Emotion Recognition.
Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing, 2024

Utilizing Speaker Profiles for Impersonation Audio Detection.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

MSFNet: Multi-Scale Fusion Network for Brain-Controlled Speaker Extraction.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Multi-Scale Permutation Entropy for Audio Deepfake Detection.
Proceedings of the IEEE International Conference on Acoustics, 2024

Fewer-Token Neural Speech Codec with Time-Invariant Codes.
Proceedings of the IEEE International Conference on Acoustics, 2024

NLoPT: N-gram Enhanced Low-Rank Task Adaptive Pre-training for Efficient Language Model Adaption.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

EmoFake: An Initial Dataset for Emotion Fake Audio Detection.
Proceedings of the Chinese Computational Linguistics - 23rd China National Conference, 2024

Distinguishing Neural Speech Synthesis Models Through Fingerprints in Speech Waveforms.
Proceedings of the Chinese Computational Linguistics - 23rd China National Conference, 2024

What to Remember: Self-Adaptive Continual Learning for Audio Deepfake Detection.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Subband fusion of complex spectrogram for fake speech detection.
Speech Commun., November, 2023

Transfer knowledge for punctuation prediction via adversarial training.
Speech Commun., April, 2023

Adversarial Multi-Task Learning for Mandarin Prosodic Boundary Prediction With Multi-Modal Embeddings.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

What to Remember: Self-Adaptive Continual Learning for Audio Deepfake Detection.
CoRR, 2023

Learning to Behave Like Clean Speech: Dual-Branch Knowledge Distillation for Noise-Robust Fake Audio Detection.
CoRR, 2023

Fewer-token Neural Speech Codec with Time-invariant Codes.
CoRR, 2023

Controllable Residual Speaker Representation for Voice Conversion.
CoRR, 2023

Audio Deepfake Detection: A Survey.
CoRR, 2023

Do You Remember? Overcoming Catastrophic Forgetting for Fake Audio Detection.
CoRR, 2023

Low-rank Adaptation Method for Wav2vec2-based Fake Audio Detection.
CoRR, 2023

Adaptive Fake Audio Detection with Low-Rank Model Squeezing.
CoRR, 2023

TO-Rawnet: Improving RawNet with TCN and Orthogonal Regularization for Fake Audio Detection.
CoRR, 2023

MER 2023: Multi-label Learning, Modality Robustness, and Semi-Supervised Learning.
CoRR, 2023

UnifySpeech: A Unified Framework for Zero-shot Text-to-Speech and Voice Conversion.
CoRR, 2023

MER 2023: Multi-label Learning, Modality Robustness, and Semi-Supervised Learning.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

TO-Rawnet: Improving RawNet with TCN and Orthogonal Regularization for Fake Audio Detection.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Detection of Cross-Dataset Fake Audio Based on Prosodic and Pronunciation Features.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Do You Remember? Overcoming Catastrophic Forgetting for Fake Audio Detection.
Proceedings of the International Conference on Machine Learning, 2023

Learning From Yourself: A Self-Distillation Method For Fake Speech Detection.
Proceedings of the IEEE International Conference on Acoustics, 2023

GCC-Speaker: Target Speaker Localization with Optimal Speaker-Dependent Weighting in Multi-Speaker Scenarios.
Proceedings of the IEEE International Conference on Acoustics, 2023

Adaptive Fake Audio Detection with Low-Rank Model Squeezing.
Proceedings of the Workshop on Deepfake Audio Detection and Analysis co-located with 32nd International Joint Conference on Artificial Intelligence (IJCAI 2023), 2023

ADD 2023: the Second Audio Deepfake Detection Challenge.
Proceedings of the Workshop on Deepfake Audio Detection and Analysis co-located with 32nd International Joint Conference on Artificial Intelligence (IJCAI 2023), 2023

Low-rank Adaptation Method for Wav2vec2-based Fake Audio Detection.
Proceedings of the Workshop on Deepfake Audio Detection and Analysis co-located with 32nd International Joint Conference on Artificial Intelligence (IJCAI 2023), 2023

TST: Time-Sparse Transducer for Automatic Speech Recognition.
Proceedings of the Artificial Intelligence - Third CAAI International Conference, 2023

The VIBVG Speech Synthesis System for Blizzard Challenge 2023.
Proceedings of the 18th Blizzard Challenge Workshop, Grenoble, France, August 29, 2023, 2023

2022
CampNet: Context-Aware Mask Prediction for End-to-End Text-Based Speech Editing.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

NeuralDPS: Neural Deterministic Plus Stochastic Model With Multiband Excitation for Noise-Controllable Waveform Generation.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

Hybrid Autoregressive and Non-Autoregressive Transformer Models for Speech Recognition.
IEEE Signal Process. Lett., 2022

SceneFake: An Initial Dataset and Benchmarks for Scene Fake Audio Detection.
CoRR, 2022

EmoFake: An Initial Dataset for Emotion Fake Audio Detection.
CoRR, 2022

System Fingerprints Detection for DeepFake Audio: An Initial Dataset and Investigation.
CoRR, 2022

ADD 2022: the First Audio Deep Synthesis Detection Challenge.
CoRR, 2022

Reducing language context confusion for end-to-end code-switching automatic speech recognition.
CoRR, 2022

An Initial Investigation for Detecting Vocoder Fingerprints of Fake Audio.
Proceedings of the DDAM@MM 2022: Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia, 2022

Audio Deepfake Detection Based on a Combination of F0 Information and Real Plus Imaginary Spectrogram Features.
Proceedings of the DDAM@MM 2022: Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia, 2022

Fully Automated End-to-End Fake Audio Detection.
Proceedings of the DDAM@MM 2022: Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia, 2022

Singing-Tacotron: Global Duration Control Attention and Dynamic Filter for End-to-end Singing Voice Synthesis.
Proceedings of the DDAM@MM 2022: Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia, 2022

DDAM '22: 1st International Workshop on Deepfake Detection for Audio Multimedia.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Reducing multilingual context confusion for end-to-end code-switching automatic speech recognition.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

A Robust Deep Audio Splicing Detection Method via Singularity Detection Feature.
Proceedings of the IEEE International Conference on Acoustics, 2022

ADD 2022: the first Audio Deep Synthesis Detection Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2022

Context-Aware Mask Prediction Network for End-to-End Text-Based Speech Editing.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
Gated Recurrent Fusion With Joint Training Framework for Robust End-to-End Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Integrating Knowledge Into End-to-End Speech Recognition From External Text-Only Data.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Fast End-to-End Speech Recognition Via Non-Autoregressive Models and Cross-Modal Knowledge Transferring From BERT.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Half-Truth: A Partially Fake Audio Detection Dataset.
CoRR, 2021

TSNAT: Two-Step Non-Autoregressvie Transformer Models for Speech Recognition.
CoRR, 2021

Fast End-to-End Speech Recognition via a Non-Autoregressive Model and Cross-Modal Knowledge Transferring from BERT.
CoRR, 2021

Rnn-transducer With Language Bias For End-to-end Mandarin-English Code-switching Speech Recognition.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Hierarchically Attending Time-Frequency and Channel Features for Improving Speaker Verification.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Text Enhancement for Paragraph Processing in End-to-End Code-switching TTS.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Deep Time Delay Neural Network for Speech Enhancement with Full Data Learning.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Half-Truth: A Partially Fake Audio Detection Dataset.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

FSR: Accelerating the Inference Process of Transducer-Based Models by Applying Fast-Skip Regularization.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Continual Learning for Fake Audio Detection.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

End-to-End Spelling Correction Conditioned on Acoustic Feature for Code-Switching Speech Recognition.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Patnet: A Phoneme-Level Autoregressive Transformer Network for Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2021

Prosody and Voice Factorization for Few-Shot Speaker Adaptation in the Challenge M2voc 2021.
Proceedings of the IEEE International Conference on Acoustics, 2021

Bi-Level Style and Prosody Decoupling Modeling for Personalized End-to-End Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2021

Decoupling Pronunciation and Language for End-to-End Code-Switching Automatic Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2021

One In A Hundred: Selecting the Best Predicted Sequence from Numerous Candidates for Speech Recognition.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

2020
A Public Chinese Dataset for Language Model Adaptation.
J. Signal Process. Syst., 2020

End-to-End Post-Filter for Speech Separation With Deep Attention Fusion Features.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Simultaneous Denoising and Dereverberation Using Deep Embedding Features.
CoRR, 2020

Adversarial Transfer Learning for Punctuation Restoration.
CoRR, 2020

Deep Attention Fusion Feature for Speech Separation with End-to-End Post-filter Method.
CoRR, 2020

Spatial and spectral deep attention fusion for multi-channel speech separation using deep embedding features.
CoRR, 2020

Focal Loss for Punctuation Prediction.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Spoken Content and Voice Factorization for Few-Shot Speaker Adaptation.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Bi-Level Speaker Supervision for One-Shot Speech Synthesis.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Non-Autoregressive End-to-End TTS with Coarse-to-Fine Decoding.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Dynamic Speaker Representations Adjustment and Decoder Factorization for Speaker Adaptation in End-to-End Speech Synthesis.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Dynamic Soft Windowing and Language Dependent Style Token for Code-Switching End-to-End Speech Synthesis.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Joint Training for Simultaneous Speech Denoising and Dereverberation with Deep Embedding Representations.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Gated Recurrent Fusion of Spatial and Spectral Features for Multi-Channel Speech Separation with Deep Embedding Representations.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Listen Attentively, and Spell Once: Whole Sentence Generation via a Non-Autoregressive Architecture for Low-Latency Speech Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Synchronous Transformers for end-to-end Speech Recognition.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Focusing on Attention: Prosody Transfer and Adaptative Optimization Strategy for Multi-Speaker End-to-End Speech Synthesis.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019
Forward-Backward Decoding Sequence for Regularizing End-to-End TTS.
IEEE ACM Trans. Audio Speech Lang. Process., 2019

Language-Adversarial Transfer Learning for Low-Resource Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2019

Integrating Whole Context to Sequence-to-sequence Speech Recognition.
CoRR, 2019

Self-Attention Transducers for End-to-End Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Discriminative Learning for Monaural Speech Separation Using Deep Embedding Features.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

A Time Delay Neural Network with Shared Weight Self-Attention for Small-Footprint Keyword Spotting.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Learn Spelling from Teachers: Transferring Knowledge from Language Models to Sequence-to-Sequence Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Language-invariant Bottleneck Features from Adversarial End-to-end Acoustic Models for Low Resource Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2019

Self-attention Based Model for Punctuation Prediction Using Word and Speech Embeddings.
Proceedings of the IEEE International Conference on Acoustics, 2019

Focal Loss for End-to-end Short Utterances Chinese Dialect Identification.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Batch Normalization based Unsupervised Speaker Adaptation for Acoustic Models.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Distilling Knowledge for Distant Speech Recognition via Parallel Data.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Hypersphere Embedding and Additive Margin for Query-by-example Keyword Spotting.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Noise Prior Knowledge Learning for Speech Enhancement via Gated Convolutional Generative Adversarial Network.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Voice Activity Detection Based on Time-Delay Neural Networks.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

2018
CTC Regularized Model Adaptation for Improving LSTM RNN Based Multi-Accent Mandarin Speech Recognition.
J. Signal Process. Syst., 2018

Distilling Knowledge Using Parallel Data for Far-field Speech Recognition.
CoRR, 2018

Utterance-level Permutation Invariant Training with Discriminative Learning for Single Channel Speech Separation.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

CLMAD: A Chinese Language Model Adaptation Dataset.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Research on Dynamic and Static Fusion Polymorphic Gesture Recognition Algorithm for Interactive Teaching Interface.
Proceedings of the Cognitive Systems and Signal Processing - 4th International Conference, 2018

Adversarial Multilingual Training for Low-Resource Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

End-to-End Continuous Emotion Recognition from Video Using 3D Convlstm Networks.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017
Continuous Multimodal Emotion Prediction Based on Long Short Term Memory Recurrent Neural Network.
Proceedings of the 7th Annual Workshop on Audio/Visual Emotion Challenge, Mountain View, CA, USA, October 23, 2017

Distilling Knowledge from an Ensemble of Models for Punctuation Prediction.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

2016
Improving accented Mandarin speech recognition by using recurrent neural network based language model adaptation.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

End-to-end keywords spotting based on connectionist temporal classification for Mandarin.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Recurrent Neural Network Based Language Model Adaptation for Accent Mandarin Speech.
Proceedings of the Pattern Recognition - 7th Chinese Conference, 2016

Improving BLSTM RNN based Mandarin speech recognition using accent dependent bottleneck features.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

