Ziyang Ma

ORCID: 0000-0002-8195-3262

According to our database, Ziyang Ma authored at least 63 papers between 2013 and 2024.

Bibliography

2024
Towards Weakly Supervised Text-to-Audio Grounding.
IEEE Trans. Multim., 2024

E³TTS: End-to-End Text-Based Speech Editing TTS System and Its Applications.
IEEE/ACM Trans. Audio Speech Lang. Process., 2024

VQTalker: Towards Multilingual Talking Avatars through Facial Motion Tokenization.
CoRR, 2024

CTC-Assisted LLM-Based Contextual ASR.
CoRR, 2024

OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup.
CoRR, 2024

Enhancing Low-Resource ASR through Versatile TTS: Bridging the Data Gap.
CoRR, 2024

SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs.
CoRR, 2024

DRCap: Decoding CLAP Latents with Retrieval-augmented Generation for Zero-shot Audio Captioning.
CoRR, 2024

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching.
CoRR, 2024

CoT-ST: Enhancing LLM-based Speech Translation with Multimodal Chain-of-Thought.
CoRR, 2024

NDVQ: Robust Neural Audio Codec with Normal Distribution-Based Vector Quantization.
CoRR, 2024

Progressive Residual Extraction based Pre-training for Speech Representation Learning.
CoRR, 2024

Foundation Models for Music: A Survey.
CoRR, 2024

Language Model Can Listen While Speaking.
CoRR, 2024

CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens.
CoRR, 2024

FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs.
CoRR, 2024

TacoLM: GaTed Attention Equipped Codec Language Model are Efficient Zero-Shot Text to Speech Synthesizers.
CoRR, 2024

GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement.
CoRR, 2024

EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark.
CoRR, 2024

LoRA-Whisper: Parameter-Efficient and Extensible Multilingual ASR.
CoRR, 2024

MaLa-ASR: Multimedia-Assisted LLM-Based ASR.
CoRR, 2024

MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series.
CoRR, 2024

MuPT: A Generative Symbolic Music Pretrained Transformer.
CoRR, 2024

The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge.
CoRR, 2024

Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model.
CoRR, 2024

HAM-TTS: Hierarchical Acoustic Modeling for Token-Based Zero-Shot Text-to-Speech with Model and Data Scaling.
CoRR, 2024

ChatMusician: Understanding and Generating Music Intrinsically with LLM.
CoRR, 2024

An Embarrassingly Simple Approach for LLM with Strong ASR Capacity.
CoRR, 2024

ELLA-V: Stable Neural Codec Language Modeling with Alignment-guided Sequence Reordering.
CoRR, 2024

1st Place Solution to Odyssey Emotion Recognition Challenge Task1: Tackling Class Imbalance Problem.
Proceedings of the Odyssey 2024: The Speaker and Language Recognition Workshop, 2024

MER 2024: Semi-Supervised Learning, Noise Robustness, and Open-Vocabulary Multimodal Emotion Recognition.
Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing, 2024

EAT: Self-Supervised Pre-Training with Efficient Audio Transformer.
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

BAT: Learning to Reason about Spatial Sounds with Large Language Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Hourglass-AVSR: Down-Up Sampling-Based Computational Efficiency Model for Audio-Visual Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Leveraging Speech PTM, Text LLM, And Emotional TTS For Speech Emotion Recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

VoiceFlow: Efficient Text-To-Speech with Rectified Flow Matching.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Source-free Domain Adaptation for Aspect-based Sentiment Analysis.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, 2024

emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation.
Findings of the Association for Computational Linguistics, 2024

2023
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT.
CoRR, 2023

Improving Code-Switching and Named Entity Recognition in ASR with Speech Editing based Data Augmentation.
CoRR, 2023

LTCR: Long-Text Chinese Rumor Detection Dataset.
CoRR, 2023

Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Pushing the Limits of Unsupervised Unit Discovery for SSL Speech Representation.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Improving Code-Switching and Name Entity Recognition in ASR with Speech Editing based Data Augmentation.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Front-End Adapter: Adapting Front-End Input of Speech Based Self-Supervised Learning for Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Improving Few-Shot Learning for Talking Face System with TTS Data Augmentation.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Fast-Hubert: an Efficient Training Framework for Self-Supervised Speech Representation Learning.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Exploring Effective Distillation of Self-Supervised Speech Models for Automatic Speech Recognition.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022
TESSP: Text-Enhanced Self-Supervised Speech Pre-training.
CoRR, 2022

2021
Feature-weighted ordinal classification for predicting drug response in multiple myeloma.
Bioinform., 2021

Joint Optimization of Computation Offloading, Data Compression, Energy Harvesting, and Application Scenarios in Fog Computing.
IEEE Access, 2021

Hierarchical Deep Residual Reasoning for Temporal Moment Localization.
Proceedings of the MMAsia '21: ACM Multimedia Asia, Gold Coast, Australia, December 1, 2021

2020
A Blockchain-Based Trust Management With Conditional Privacy-Preserving Announcement Scheme for VANETs.
IEEE Internet Things J., 2020

The Application of TED Talk Strategies in Freshmen Library Orientation Lecture.
Proceedings of the ICIEI 2020: The 5th International Conference on Information and Education Innovations, 2020

2015
Bounded-Distortion Metric Learning.
CoRR, 2015

Video Super-Resolution via Deep Draft-Ensemble Learning.
Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

Handling motion blur in multi-frame super-resolution.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

2014
Real-time and robust hand tracking with a single depth camera.
Vis. Comput., 2014

2013
Coherence-enhancing line drawing for color images.
Sci. China Inf. Sci., 2013

Constant Time Weighted Median Filtering for Stereo Matching and Beyond.
Proceedings of the IEEE International Conference on Computer Vision, 2013