Guangzhi Sun

Orcid: 0000-0002-5886-056X

According to our database, Guangzhi Sun authored at least 41 papers between 2019 and 2024.

Bibliography

2024
Graph Neural Networks for Contextual ASR With the Tree-Constrained Pointer Generator.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

SAML: Speaker Adaptive Mixture of LoRA Experts for End-to-End ASR.
CoRR, 2024

video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models.
CoRR, 2024

Can Large Language Models Understand Spatial Audio?
CoRR, 2024

Bayesian WeakS-to-Strong from Text Classification to Generation.
CoRR, 2024

Wav2Prompt: End-to-End Speech Prompt Generation and Tuning For LLM in Zero and Few-shot Learning.
CoRR, 2024

CrossCheckGPT: Universal Hallucination Ranking for Multimodal Foundation Models.
CoRR, 2024

Matching domain experts by training from scratch on domain knowledge.
CoRR, 2024

M³AV: A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset.
CoRR, 2024

Large language models surpass human experts in predicting neuroscience results.
CoRR, 2024

SALMONN: Towards Generic Hearing Abilities for Large Language Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Enhancing Quantised End-to-End ASR Models Via Personalisation.
Proceedings of the IEEE International Conference on Acoustics, 2024

Connecting Speech Encoder and Large Language Model for ASR.
Proceedings of the IEEE International Conference on Acoustics, 2024

Extending Large Language Models for Speech and Audio Captioning.
Proceedings of the IEEE International Conference on Acoustics, 2024

Parameter Efficient Finetuning for Speech Emotion Recognition and Domain Adaptation.
Proceedings of the IEEE International Conference on Acoustics, 2024

Building Better AI Agents: A Provocation on the Utilisation of Persona in LLM-based Conversational Agents.
Proceedings of the ACM Conversational User Interfaces 2024, 2024

Speech-based Slot Filling using Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

M³AV: A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
Minimising Biasing Word Errors for Contextual ASR With the Tree-Constrained Pointer Generator.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch.
CoRR, 2023

Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models.
CoRR, 2023

Conditional Diffusion Model for Target Speaker Extraction.
CoRR, 2023

Affect Recognition in Conversations Using Large Language Models.
CoRR, 2023

Enhancing Quantised End-to-End ASR Models via Personalisation.
CoRR, 2023

Cross-Utterance Conditioned VAE for Speech Generation.
CoRR, 2023

Knowledge-Aware Audio-Grounded Generative Slot Filling for Limited Annotated Data.
CoRR, 2023

Can Contextual Biasing Remain Effective with Whisper and GPT-2?
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

End-to-End Spoken Language Understanding with Tree-Constrained Pointer Generator.
Proceedings of the IEEE International Conference on Acoustics, 2023

Spectral Clustering-Aware Learning of Embeddings for Speaker Diarisation.
Proceedings of the IEEE International Conference on Acoustics, 2023

TorchAudio 2.1: Advancing Speech Recognition, Self-Supervised Learning, and Audio Processing Components for Pytorch.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022
Tree-constrained Pointer Generator with Graph Neural Network Encodings for Contextual Speech Recognition.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Cross-Utterance Conditioned VAE for Non-Autoregressive Text-to-Speech.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021
Combination of deep speaker embeddings for diarisation.
Neural Networks, 2021

Content-Aware Speaker Embeddings for Speaker Diarisation.
Proceedings of the IEEE International Conference on Acoustics, 2021

Transformer Language Models with LSTM-Based Cross-Utterance Information Representation.
Proceedings of the IEEE International Conference on Acoustics, 2021

Tree-Constrained Pointer Generator for End-to-End Contextual Speech Recognition.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020
Cross-Utterance Language Models with Acoustic Error Sampling.
CoRR, 2020

Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior.
CoRR, 2020

Fully-Hierarchical Fine-Grained Prosody Modeling For Interpretable Speech Synthesis.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Generating Diverse and Natural Text-to-Speech Samples Using a Quantized Fine-Grained VAE and Autoregressive Prosody Prior.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019
Speaker Diarisation Using 2D Self-attentive Combination of Embeddings.
Proceedings of the IEEE International Conference on Acoustics, 2019

