Proceedings of the Workshop on Deepfake Audio Detection and Analysis co-located with the 32nd International Joint Conference on Artificial Intelligence (IJCAI 2023), 2023

ADD 2023: the Second Audio Deepfake Detection Challenge.

Jiangyan Yi

Proceedings of the Workshop on Deepfake Audio Detection and Analysis co-located with the 32nd International Joint Conference on Artificial Intelligence (IJCAI 2023), 2023

Low-rank Adaptation Method for Wav2vec2-based Fake Audio Detection.

Proceedings of the Workshop on Deepfake Audio Detection and Analysis co-located with the 32nd International Joint Conference on Artificial Intelligence (IJCAI 2023), 2023

TST: Time-Sparse Transducer for Automatic Speech Recognition.

Proceedings of the Artificial Intelligence - Third CAAI International Conference, 2023

The VIBVG Speech Synthesis System for Blizzard Challenge 2023.

Proceedings of the 18th Blizzard Challenge Workshop, Grenoble, France, August 29, 2023, 2023

2022

CampNet: Context-Aware Mask Prediction for End-to-End Text-Based Speech Editing.

IEEE ACM Trans. Audio Speech Lang. Process., 2022

NeuralDPS: Neural Deterministic Plus Stochastic Model With Multiband Excitation for Noise-Controllable Waveform Generation.

IEEE ACM Trans. Audio Speech Lang. Process., 2022

Hybrid Autoregressive and Non-Autoregressive Transformer Models for Speech Recognition.

IEEE Signal Process. Lett., 2022

SceneFake: An Initial Dataset and Benchmarks for Scene Fake Audio Detection.

CoRR, 2022

EmoFake: An Initial Dataset for Emotion Fake Audio Detection.

CoRR, 2022

System Fingerprints Detection for DeepFake Audio: An Initial Dataset and Investigation.

CoRR, 2022

ADD 2022: the First Audio Deep Synthesis Detection Challenge.

CoRR, 2022

Reducing language context confusion for end-to-end code-switching automatic speech recognition.

CoRR, 2022

An Initial Investigation for Detecting Vocoder Fingerprints of Fake Audio.

Proceedings of the DDAM@MM 2022: Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia, 2022

Audio Deepfake Detection Based on a Combination of F0 Information and Real Plus Imaginary Spectrogram Features.

Proceedings of the DDAM@MM 2022: Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia, 2022

Fully Automated End-to-End Fake Audio Detection.

Proceedings of the DDAM@MM 2022: Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia, 2022

Singing-Tacotron: Global Duration Control Attention and Dynamic Filter for End-to-end Singing Voice Synthesis.

Proceedings of the DDAM@MM 2022: Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia, 2022

DDAM '22: 1st International Workshop on Deepfake Detection for Audio Multimedia.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Reducing Multilingual Context Confusion for End-to-End Code-Switching Automatic Speech Recognition.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

A Robust Deep Audio Splicing Detection Method via Singularity Detection Feature.

Proceedings of the IEEE International Conference on Acoustics, 2022

ADD 2022: the first Audio Deep Synthesis Detection Challenge.

Proceedings of the IEEE International Conference on Acoustics, 2022

Context-Aware Mask Prediction Network for End-to-End Text-Based Speech Editing.

Proceedings of the IEEE International Conference on Acoustics, 2022

2021

Gated Recurrent Fusion With Joint Training Framework for Robust End-to-End Speech Recognition.

IEEE ACM Trans. Audio Speech Lang. Process., 2021

Integrating Knowledge Into End-to-End Speech Recognition From External Text-Only Data.

IEEE ACM Trans. Audio Speech Lang. Process., 2021

Fast End-to-End Speech Recognition Via Non-Autoregressive Models and Cross-Modal Knowledge Transferring From BERT.

IEEE ACM Trans. Audio Speech Lang. Process., 2021

Half-Truth: A Partially Fake Audio Detection Dataset.

CoRR, 2021

TSNAT: Two-Step Non-Autoregressvie Transformer Models for Speech Recognition.

CoRR, 2021

Fast End-to-End Speech Recognition via a Non-Autoregressive Model and Cross-Modal Knowledge Transferring from BERT.

CoRR, 2021

RNN-Transducer with Language Bias for End-to-End Mandarin-English Code-Switching Speech Recognition.

Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Hierarchically Attending Time-Frequency and Channel Features for Improving Speaker Verification.

Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Text Enhancement for Paragraph Processing in End-to-End Code-switching TTS.

Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Deep Time Delay Neural Network for Speech Enhancement with Full Data Learning.

Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Half-Truth: A Partially Fake Audio Detection Dataset.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

FSR: Accelerating the Inference Process of Transducer-Based Models by Applying Fast-Skip Regularization.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Continual Learning for Fake Audio Detection.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

End-to-End Spelling Correction Conditioned on Acoustic Feature for Code-Switching Speech Recognition.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Patnet: A Phoneme-Level Autoregressive Transformer Network for Speech Synthesis.

Proceedings of the IEEE International Conference on Acoustics, 2021

Prosody and Voice Factorization for Few-Shot Speaker Adaptation in the Challenge M2VoC 2021.

Proceedings of the IEEE International Conference on Acoustics, 2021

Bi-Level Style and Prosody Decoupling Modeling for Personalized End-to-End Speech Synthesis.

Proceedings of the IEEE International Conference on Acoustics, 2021

Decoupling Pronunciation and Language for End-to-End Code-Switching Automatic Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2021

One In A Hundred: Selecting the Best Predicted Sequence from Numerous Candidates for Speech Recognition.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

2020

A Public Chinese Dataset for Language Model Adaptation.

J. Signal Process. Syst., 2020

End-to-End Post-Filter for Speech Separation With Deep Attention Fusion Features.

IEEE ACM Trans. Audio Speech Lang. Process., 2020

Simultaneous Denoising and Dereverberation Using Deep Embedding Features.

CoRR, 2020

Adversarial Transfer Learning for Punctuation Restoration.

CoRR, 2020

Deep Attention Fusion Feature for Speech Separation with End-to-End Post-filter Method.

CoRR, 2020

Spatial and spectral deep attention fusion for multi-channel speech separation using deep embedding features.

CoRR, 2020

Focal Loss for Punctuation Prediction.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Spoken Content and Voice Factorization for Few-Shot Speaker Adaptation.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Bi-Level Speaker Supervision for One-Shot Speech Synthesis.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Non-Autoregressive End-to-End TTS with Coarse-to-Fine Decoding.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech Recognition.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Dynamic Speaker Representations Adjustment and Decoder Factorization for Speaker Adaptation in End-to-End Speech Synthesis.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Dynamic Soft Windowing and Language Dependent Style Token for Code-Switching End-to-End Speech Synthesis.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Joint Training for Simultaneous Speech Denoising and Dereverberation with Deep Embedding Representations.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Gated Recurrent Fusion of Spatial and Spectral Features for Multi-Channel Speech Separation with Deep Embedding Representations.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Listen Attentively, and Spell Once: Whole Sentence Generation via a Non-Autoregressive Architecture for Low-Latency Speech Recognition.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Synchronous Transformers for end-to-end Speech Recognition.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Focusing on Attention: Prosody Transfer and Adaptative Optimization Strategy for Multi-Speaker End-to-End Speech Synthesis.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

Forward-Backward Decoding Sequence for Regularizing End-to-End TTS.

IEEE ACM Trans. Audio Speech Lang. Process., 2019

Language-Adversarial Transfer Learning for Low-Resource Speech Recognition.

IEEE ACM Trans. Audio Speech Lang. Process., 2019

Integrating Whole Context to Sequence-to-sequence Speech Recognition.

CoRR, 2019

Self-Attention Transducers for End-to-End Speech Recognition.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Discriminative Learning for Monaural Speech Separation Using Deep Embedding Features.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

A Time Delay Neural Network with Shared Weight Self-Attention for Small-Footprint Keyword Spotting.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Learn Spelling from Teachers: Transferring Knowledge from Language Models to Sequence-to-Sequence Speech Recognition.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Language-invariant Bottleneck Features from Adversarial End-to-end Acoustic Models for Low Resource Speech Recognition.

Jiangyan Yi

Jianhua Tao

Ye Bai

Proceedings of the IEEE International Conference on Acoustics, 2019

Self-attention Based Model for Punctuation Prediction Using Word and Speech Embeddings.

Jiangyan Yi

Jianhua Tao

Proceedings of the IEEE International Conference on Acoustics, 2019

Focal Loss for End-to-end Short Utterances Chinese Dialect Identification.

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Batch Normalization based Unsupervised Speaker Adaptation for Acoustic Models.

Jiangyan Yi

Jianhua Tao

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Distilling Knowledge for Distant Speech Recognition via Parallel Data.

Jiangyan Yi

Jianhua Tao

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Hypersphere Embedding and Additive Margin for Query-by-example Keyword Spotting.

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Noise Prior Knowledge Learning for Speech Enhancement via Gated Convolutional Generative Adversarial Network.

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Voice Activity Detection Based on Time-Delay Neural Networks.

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

2018

CTC Regularized Model Adaptation for Improving LSTM RNN Based Multi-Accent Mandarin Speech Recognition.

J. Signal Process. Syst., 2018

Distilling Knowledge Using Parallel Data for Far-field Speech Recognition.

CoRR, 2018

Utterance-level Permutation Invariant Training with Discriminative Learning for Single Channel Speech Separation.

Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

CLMAD: A Chinese Language Model Adaptation Dataset.

Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Research on Dynamic and Static Fusion Polymorphic Gesture Recognition Algorithm for Interactive Teaching Interface.

Proceedings of the Cognitive Systems and Signal Processing - 4th International Conference, 2018

Adversarial Multilingual Training for Low-Resource Speech Recognition.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

End-to-End Continuous Emotion Recognition from Video Using 3D ConvLSTM Networks.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017

Continuous Multimodal Emotion Prediction Based on Long Short Term Memory Recurrent Neural Network.

Proceedings of the 7th Annual Workshop on Audio/Visual Emotion Challenge, Mountain View, CA, USA, October 23, 2017

Distilling Knowledge from an Ensemble of Models for Punctuation Prediction.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

2016

Improving accented Mandarin speech recognition by using recurrent neural network based language model adaptation.

Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

End-to-end keywords spotting based on connectionist temporal classification for Mandarin.

Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Recurrent Neural Network Based Language Model Adaptation for Accent Mandarin Speech.

Proceedings of the Pattern Recognition - 7th Chinese Conference, 2016

Improving BLSTM RNN based Mandarin speech recognition using accent dependent bottleneck features.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016
