Zejun Ma
ORCID: 0009-0009-6731-0541
According to our database, Zejun Ma authored at least 105 papers between 2010 and 2024.
Bibliography
2024
BiFSMNv2: Pushing Binary Neural Networks for Keyword Spotting to Real-Network Performance.
IEEE Trans. Neural Networks Learn. Syst., August, 2024
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models.
CoRR, 2024
MINT: Boosting Audio-Language Model via Multi-Target Pre-Training and Instruction Tuning.
CoRR, 2024
Proceedings of the Forty-first International Conference on Machine Learning, 2024
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Extending Multilingual ASR to New Languages Using Supplementary Encoder and Decoder Components.
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
2023
IEEE Trans. Pattern Anal. Mach. Intell., June, 2023
IEEE Trans. Pattern Anal. Mach. Intell., March, 2023
Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models.
CoRR, 2023
CoRR, 2023
Language-specific Acoustic Boundary Learning for Mandarin-English Code-switching Speech Recognition.
CoRR, 2023
CoRR, 2023
GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation.
CoRR, 2023
Unleashing Infinite-Length Input Capacity for Large-scale Language Models with Self-Controlled Memory System.
CoRR, 2023
Towards Building Voice-based Conversational Recommender Systems: Datasets, Potential Solutions and Prospects.
Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023
Proceedings of the 31st ACM International Conference on Multimedia, 2023
S2CD: Self-heuristic Speaker Content Disentanglement for Any-to-Any Voice Conversion.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Disentangling the Contribution of Non-native Speech in Automated Pronunciation Assessment.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Random Utterance Concatenation Based Data Augmentation for Improving Short-video Speech Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Phonetic and Prosody-aware Self-supervised Learning Approach for Non-native Fluency Scoring.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Language-specific Boundary Learning for Improving Mandarin-English Code-switching Speech Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
GenerTTS: Pronunciation Disentanglement for Timbre and Style Generalization in Cross-Lingual Text-to-Speech.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Improving Frame-level Classifier for Word Timings with Non-peaky CTC in End-to-End Automatic Speech Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023
Dynamics Analysis of Large-Scale Transmission Tower-Line Coupled System under Measured Typhoon Load.
Proceedings of the 6th International Conference on Information Technologies and Electrical Engineering, 2023
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Internal Language Model Estimation Based Adaptive Language Model Fusion for Domain Adaptation.
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Leveraging Phone-Level Linguistic-Acoustic Similarity For Utterance-Level Pronunciation Scoring.
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Improving Large-Scale Deep Biasing With Phoneme Features and Text-Only Data in Streaming Transducer.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
CIF-PT: Bridging Speech and Text Representations for Spoken Language Understanding via Continuous Integrate-and-Fire Pre-Training.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023
2022
Sequence-Level Speaker Change Detection With Difference-Based Continuous Integrate-and-Fire.
IEEE Signal Process. Lett., 2022
Direct Speech-to-speech Translation without Textual Annotation using Bottleneck Features.
CoRR, 2022
CoRR, 2022
A Novel Chinese Dialect TTS Frontend with Non-Autoregressive Neural Machine Translation.
CoRR, 2022
Improving Non-native Word-level Pronunciation Scoring with Phone-level Mixup Data Augmentation and Multi-source Information.
CoRR, 2022
CoRR, 2022
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2022, 2022
Proceedings of the ICMR '22: International Conference on Multimedia Retrieval, Newark, NJ, USA, June 27, 2022
Proceedings of the KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 14, 2022
Proceedings of the KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 14, 2022
Proceedings of the 23rd International Society for Music Information Retrieval Conference, 2022
Towards high-fidelity singing voice conversion with acoustic reference and contrastive predictive coding.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Internal Language Model Estimation Through Explicit Context Vector Learning for Attention-based Encoder-decoder ASR.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Using Fluency Representation Learned from Sequential Raw Features for Improving Non-native Fluency Scoring.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Token-level Speaker Change Detection Using Speaker Difference and Speech Content via Continuous Integrate-and-fire.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
The Volcspeech System for the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2022
Language Adaptive Cross-Lingual Speech Representation Learning with Sparse Sharing Sub-Networks.
Proceedings of the IEEE International Conference on Acoustics, 2022
Improving Pseudo-Label Training For End-To-End Speech Recognition Using Gradient Mask.
Proceedings of the IEEE International Conference on Acoustics, 2022
Improving End-to-End Contextual Speech Recognition with Fine-Grained Contextual Knowledge Selection.
Proceedings of the IEEE International Conference on Acoustics, 2022
Bytecover2: Towards Dimensionality Reduction of Latent Embedding for Efficient Cover Song Identification.
Proceedings of the IEEE International Conference on Acoustics, 2022
HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection.
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 2022
Zero-Shot Audio Source Separation through Query-Based Learning from Weakly-Labeled Data.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022
2021
Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech.
CoRR, 2021
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021
ByteSing: A Chinese Singing Voice Synthesis System Using Duration Allocated Encoder-Decoder Acoustic Models and WaveRNN Vocoders.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Attention-Based Cross-Modal Fusion for Audio-Visual Voice Activity Detection in Musical Video Streams.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Emitting Word Timings with HMM-Free End-to-End System in Automatic Speech Recognition.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
Rule-Embedded Network for Audio-Visual Voice Activity Detection in Live Musical Video Streams.
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
Singing Melody Extraction from Polyphonic Music based on Spectral Correlation Modeling.
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
2020
CoRR, 2020
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
A Unified Sequence-to-Sequence Front-End Model for Mandarin Text-to-Speech Synthesis.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
2019
Learning Hierarchical Representations for Expressive Speaking Style in End-to-End Speech Synthesis.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019
2017
2012
Unsupervised training of subspace gaussian mixture models for conversational telephone speech recognition.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012
2011
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011
2010
Distributed link-aware rate allocation for R-D optimal multiple video streaming over wireless networks.
Proceedings of the International Conference on Wireless Communications and Signal Processing, 2010