Ye Bai

ORCID: 0000-0001-5533-6909

According to our database, Ye Bai authored at least 52 papers between 2011 and 2024.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2024

Seed-Music: A Unified Framework for High Quality and Controlled Music Generation.

CoRR, 2024

NEST-RQ: Next Token Prediction for Speech Self-Supervised Pre-Training.

CoRR, 2024

Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition.

CoRR, 2024

TraceableSpeech: Towards Proactively Traceable Text-to-Speech with Watermarking.

CoRR, 2024

Jointly Recognizing Speech and Singing Voices Based on Multi-Task Audio Source Separation.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

PolyVoice: Language Models for Speech to Speech Translation.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023

Transfer knowledge for punctuation prediction via adversarial training.

Speech Commun., April, 2023

PolyVoice: Language Models for Speech to Speech Translation.

CoRR, 2023

HoloSinger: Semantics and Music Driven Motion Generation with Octahedral Holographic Projection.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

Image-driven Audio-visual Universal Source Separation.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

2022

ADD 2022: the First Audio Deep Synthesis Detection Challenge.

CoRR, 2022

Parameter-Efficient Conformers via Sharing Sparsely-Gated Experts for End-to-End Speech Recognition.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

K-Converter: An Unsupervised Singing Voice Conversion System.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

ADD 2022: the first Audio Deep Synthesis Detection Challenge.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

2021

Integrating Knowledge Into End-to-End Speech Recognition From External Text-Only Data.

IEEE ACM Trans. Audio Speech Lang. Process., 2021

Fast End-to-End Speech Recognition Via Non-Autoregressive Models and Cross-Modal Knowledge Transferring From BERT.

IEEE ACM Trans. Audio Speech Lang. Process., 2021

Half-Truth: A Partially Fake Audio Detection Dataset.

CoRR, 2021

TSNAT: Two-Step Non-Autoregressvie Transformer Models for Speech Recognition.

CoRR, 2021

Fast End-to-End Speech Recognition via a Non-Autoregressive Model and Cross-Modal Knowledge Transferring from BERT.

CoRR, 2021

Rnn-transducer With Language Bias For End-to-end Mandarin-English Code-switching Speech Recognition.

Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Hierarchically Attending Time-Frequency and Channel Features for Improving Speaker Verification.

Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Half-Truth: A Partially Fake Audio Detection Dataset.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

FSR: Accelerating the Inference Process of Transducer-Based Models by Applying Fast-Skip Regularization.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Continual Learning for Fake Audio Detection.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

End-to-End Spelling Correction Conditioned on Acoustic Feature for Code-Switching Speech Recognition.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Decoupling Pronunciation and Language for End-to-End Code-Switching Automatic Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

One In A Hundred: Selecting the Best Predicted Sequence from Numerous Candidates for Speech Recognition.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

2020

A Public Chinese Dataset for Language Model Adaptation.

J. Signal Process. Syst., 2020

Deep imitator: Handwriting calligraphy imitation via deep attention networks.

Pattern Recognit., 2020

Adversarial Transfer Learning for Punctuation Restoration.

CoRR, 2020

Focal Loss for Punctuation Prediction.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech Recognition.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Listen Attentively, and Spell Once: Whole Sentence Generation via a Non-Autoregressive Architecture for Low-Latency Speech Recognition.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Synchronous Transformers for end-to-end Speech Recognition.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

2019

Language-Adversarial Transfer Learning for Low-Resource Speech Recognition.

IEEE ACM Trans. Audio Speech Lang. Process., 2019

Autonomous "Figure-8" Flights of a Quadcopter: Experimental Datasets.

Srikanth Gururajan, Ye Bai

Data, 2019

Integrating Whole Context to Sequence-to-sequence Speech Recognition.

CoRR, 2019

Research on the effect of psychological stress intervention in music students based on Diffie-Hellman key exchange algorithm.

Ye Bai

Clust. Comput., 2019

Self-Attention Transducers for End-to-End Speech Recognition.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

A Time Delay Neural Network with Shared Weight Self-Attention for Small-Footprint Keyword Spotting.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Learn Spelling from Teachers: Transferring Knowledge from Language Models to Sequence-to-Sequence Speech Recognition.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Language-invariant Bottleneck Features from Adversarial End-to-end Acoustic Models for Low Resource Speech Recognition.

Jiangyan Yi, Jianhua Tao, Ye Bai

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Hypersphere Embedding and Additive Margin for Query-by-example Keyword Spotting.

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Noise Prior Knowledge Learning for Speech Enhancement via Gated Convolutional Generative Adversarial Network.

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Voice Activity Detection Based on Time-Delay Neural Networks.

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

2018

Utterance-level Permutation Invariant Training with Discriminative Learning for Single Channel Speech Separation.

Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

CLMAD: A Chinese Language Model Adaptation Dataset.

Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Adversarial Multilingual Training for Low-Resource Speech Recognition.

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

2016

End-to-end keywords spotting based on connectionist temporal classification for Mandarin.

Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

2013

Study of a speech coding algorithm based on a contact conduction transmitter in a complicated water area.

Proceedings of the Conference on Underwater Networks and Systems, 2013

2011

Method of the Road Lines Recognition in the Maps of Digital Material Based on Improvemented BP Neural Network.

Proceedings of the Advances in Computer Science, 2011

The Heavy Mineral Analysis Based on Immune Self-organizing Neural Network.

Proceedings of the Advances in Computer Science, 2011
