Ziyang Ma

ORCID: 0000-0002-8195-3262

According to our database, Ziyang Ma authored at least 68 papers between 2013 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2025

Audio-CoT: Exploring Chain-of-Thought Reasoning in Large Audio Language Model.

CoRR, January, 2025

MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization.

CoRR, January, 2025

Multiscale Memory Autoencoder and Spatial Filtering for Hyperspectral Anomaly Detection.

IEEE Geosci. Remote. Sens. Lett., 2025

2024

Towards Weakly Supervised Text-to-Audio Grounding.

IEEE Trans. Multim., 2024

E$^{3}$TTS: End-to-End Text-Based Speech Editing TTS System and Its Applications.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

VQTalker: Towards Multilingual Talking Avatars through Facial Motion Tokenization.

CoRR, 2024

OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup.

CoRR, 2024

Enhancing Low-Resource ASR through Versatile TTS: Bridging the Data Gap.

CoRR, 2024

SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs.

CoRR, 2024

DRCap: Decoding CLAP Latents with Retrieval-augmented Generation for Zero-shot Audio Captioning.

CoRR, 2024

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching.

CoRR, 2024

CoT-ST: Enhancing LLM-based Speech Translation with Multimodal Chain-of-Thought.

CoRR, 2024

Progressive Residual Extraction based Pre-training for Speech Representation Learning.

CoRR, 2024

Foundation Models for Music: A Survey.

CoRR, 2024

Language Model Can Listen While Speaking.

CoRR, 2024

CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens.

CoRR, 2024

FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs.

CoRR, 2024

TacoLM: GaTed Attention Equipped Codec Language Model are Efficient Zero-Shot Text to Speech Synthesizers.

CoRR, 2024

GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement.

CoRR, 2024

EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark.

CoRR, 2024

LoRA-Whisper: Parameter-Efficient and Extensible Multilingual ASR.

CoRR, 2024

MaLa-ASR: Multimedia-Assisted LLM-Based ASR.

CoRR, 2024

MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series.

CoRR, 2024

MuPT: A Generative Symbolic Music Pretrained Transformer.

CoRR, 2024

The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge.

CoRR, 2024

Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model.

CoRR, 2024

HAM-TTS: Hierarchical Acoustic Modeling for Token-Based Zero-Shot Text-to-Speech with Model and Data Scaling.

CoRR, 2024

ChatMusician: Understanding and Generating Music Intrinsically with LLM.

CoRR, 2024

An Embarrassingly Simple Approach for LLM with Strong ASR Capacity.

CoRR, 2024

ELLA-V: Stable Neural Codec Language Modeling with Alignment-guided Sequence Reordering.

CoRR, 2024

CTC-Assisted LLM-Based Contextual ASR.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

NDVQ: Robust Neural Audio Codec With Normal Distribution-Based Vector Quantization.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

1st Place Solution to Odyssey Emotion Recognition Challenge Task1: Tackling Class Imbalance Problem.

Proceedings of the Odyssey 2024: The Speaker and Language Recognition Workshop, 2024

MER 2024: Semi-Supervised Learning, Noise Robustness, and Open-Vocabulary Multimodal Emotion Recognition.

Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing, 2024

Improving Emotion Recognition with Pre-Trained Models, Multimodality, and Contextual Information.

Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

The X-Lance Technical Report for Interspeech 2024 Speech Processing using Discrete Speech Unit Challenge.

Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

EAT: Self-Supervised Pre-Training with Efficient Audio Transformer.

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

BAT: Learning to Reason about Spatial Sounds with Large Language Models.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Hourglass-AVSR: Down-Up Sampling-Based Computational Efficiency Model for Audio-Visual Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2024

Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS.

Proceedings of the IEEE International Conference on Acoustics, 2024

Leveraging Speech PTM, Text LLM, And Emotional TTS For Speech Emotion Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2024

VoiceFlow: Efficient Text-To-Speech with Rectified Flow Matching.

Proceedings of the IEEE International Conference on Acoustics, 2024

Source-free Domain Adaptation for Aspect-based Sentiment Analysis.

Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

ChatMusician: Understanding and Generating Music Intrinsically with LLM.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023

LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT.

CoRR, 2023

Improving Code-Switching and Named Entity Recognition in ASR with Speech Editing based Data Augmentation.

CoRR, 2023

LTCR: Long-Text Chinese Rumor Detection Dataset.

CoRR, 2023

Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Pushing the Limits of Unsupervised Unit Discovery for SSL Speech Representation.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Improving Code-Switching and Name Entity Recognition in ASR with Speech Editing based Data Augmentation.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Front-End Adapter: Adapting Front-End Input of Speech Based Self-Supervised Learning for Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2023

Improving Few-Shot Learning for Talking Face System with TTS Data Augmentation.

Proceedings of the IEEE International Conference on Acoustics, 2023

Fast-Hubert: an Efficient Training Framework for Self-Supervised Speech Representation Learning.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Exploring Effective Distillation of Self-Supervised Speech Models for Automatic Speech Recognition.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022

TESSP: Text-Enhanced Self-Supervised Speech Pre-training.

CoRR, 2022

2021

Feature-weighted ordinal classification for predicting drug response in multiple myeloma.

Ziyang Ma

Jeongyoun Ahn

Bioinform., 2021

Joint Optimization of Computation Offloading, Data Compression, Energy Harvesting, and Application Scenarios in Fog Computing.

IEEE Access, 2021

Hierarchical Deep Residual Reasoning for Temporal Moment Localization.

Proceedings of the MMAsia '21: ACM Multimedia Asia, Gold Coast, Australia, December 1, 2021

2020

A Blockchain-Based Trust Management With Conditional Privacy-Preserving Announcement Scheme for VANETs.

IEEE Internet Things J., 2020

The Application of TED Talk Strategies in Freshmen Library Orientation Lecture.

Proceedings of the ICIEI 2020: The 5th International Conference on Information and Education Innovations, 2020

2015

Bounded-Distortion Metric Learning.

CoRR, 2015

Video Super-Resolution via Deep Draft-Ensemble Learning.

Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

Handling motion blur in multi-frame super-resolution.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

2014

Real-time and robust hand tracking with a single depth camera.

Ziyang Ma

Enhua Wu

Vis. Comput., 2014

2013

Coherence-enhancing line drawing for color images.

Sci. China Inf. Sci., 2013

Constant Time Weighted Median Filtering for Stereo Matching and Beyond.

Proceedings of the IEEE International Conference on Computer Vision, 2013
