Siqi Zheng

Orcid: 0009-0002-6787-4223

According to our database, Siqi Zheng authored at least 57 papers between 2018 and 2024.

Bibliography

2024
UbiPhysio: Support Daily Functioning, Fitness, and Rehabilitation with Action Understanding and Feedback in Natural Language.
Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., March, 2024

Intercity connectivity and urban innovation.
Comput. Environ. Urban Syst., 2024

OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup.
CoRR, 2024

OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation.
CoRR, 2024

MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization.
CoRR, 2024

Exploring Text-Queried Sound Event Detection with Audio Source Separation.
CoRR, 2024

WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling.
CoRR, 2024

Integrating Audio, Visual, and Semantic Information for Enhanced Multimodal Speaker Diarization.
CoRR, 2024

CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens.
CoRR, 2024

FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs.
CoRR, 2024

Accompanied Singing Voice Synthesis with Fully Text-controlled Melody.
CoRR, 2024

Skip-Layer Attention: Bridging Abstract and Detailed Dependencies in Transformers.
CoRR, 2024

ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec.
CoRR, 2024

AudioLCM: Text-to-Audio Generation with Latent Consistency Models.
CoRR, 2024

An Embarrassingly Simple Approach for LLM with Strong ASR Capacity.
CoRR, 2024

AudioLCM: Efficient and High-Quality Text-to-Audio Generation with Minimal Inference Steps.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

VoiceTuner: Self-Supervised Pre-training and Efficient Fine-tuning For Voice Generation.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

FunCodec: A Fundamental, Reproducible and Integrable Open-Source Toolkit for Neural Speech Codec.
Proceedings of the IEEE International Conference on Acoustics, 2024

Loss Masking Is Not Needed In Decoder-Only Transformer For Discrete-Token-Based ASR.
Proceedings of the IEEE International Conference on Acoustics, 2024

PepperPose: Full-Body Pose Estimation with a Companion Robot.
Proceedings of the CHI Conference on Human Factors in Computing Systems, 2024

2023
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT.
CoRR, 2023

Improving Speaker Diarization using Semantic Information: Joint Pairwise Constraints Propagation.
CoRR, 2023

Self-Distillation Network with Ensemble Prototypes: Learning Robust Speaker Representations without Supervision.
CoRR, 2023

Improving BERT with Hybrid Pooling Network and Drop Mask.
CoRR, 2023

3D-Speaker: A Large-Scale Multi-Device, Multi-Distance, and Multi-Dialect Corpus for Speech Representation Disentanglement.
CoRR, 2023

CAM++: A Fast and Efficient Network for Speaker Verification Using Context-Aware Masking.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

An Enhanced Res2Net with Local and Global Feature Fusion for Speaker Verification.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Pushing the Limits of Self-Supervised Speaker Verification using Regularized Distillation Framework.
Proceedings of the IEEE International Conference on Acoustics, 2023

Ditto: A Simple and Efficient Approach to Improve Sentence Embeddings.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

A Two-Layer Human-in-the-Loop Optimization Framework for Customizing Lower-Limb Exoskeleton Assistance.
Proceedings of the American Control Conference, 2023

DopplerBAS: Binaural Audio Synthesis Addressing Doppler Effect.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

Exploring Speaker-Related Information in Spoken Language Understanding for Better Speaker Diarization.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022
Multi-Source Time Series Remote Sensing Feature Selection and Urban Forest Extraction Based on Improved Artificial Bee Colony.
Remote. Sens., 2022

Contextual Expressive Text-to-Speech.
CoRR, 2022

Speaker Embedding-aware Neural Diarization: an Efficient Framework for Overlapping Speech Diarization in Meeting Scenarios.
CoRR, 2022

Deep Representation Decomposition for Rate-Invariant Speaker Verification.
Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022

PRISM: Pre-trained Indeterminate Speaker Representation Model for Speaker Diarization and Speaker Verification.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Label-Dividing Gated Graph Neural Network for Hierarchical Text Classification.
Proceedings of the International Joint Conference on Neural Networks, 2022

PoNet: Pooling Network for Efficient Token Mixing in Long Sequences.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Reformulating Speaker Diarization As Community Detection With Emphasis On Topological Structure.
Proceedings of the IEEE International Conference on Acoustics, 2022

Summary on the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2022

M2Met: The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2022

Graph Convolutional Network Based Semi-Supervised Learning on Multi-Speaker Meeting Data.
Proceedings of the IEEE International Conference on Acoustics, 2022

Speaker Overlap-aware Neural Diarization for Multi-party Meeting Analysis.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

2021
Speaker Embedding-aware Neural Diarization for Flexible Number of Speakers with Textual Information.
CoRR, 2021

BeamTransformer: Microphone Array-based Overlapping Speech Detection.
CoRR, 2021

Measuring daily-life fear perception change: a computational study in the context of COVID-19.
CoRR, 2021

Estimating air quality co-benefits of energy transition using machine learning.
CoRR, 2021

Investigation of Spatial-Acoustic Features for Overlapping Speech Detection in Multiparty Meetings.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

A Real-Time Speaker Diarization System Based on Spatial Spectrum.
Proceedings of the IEEE International Conference on Acoustics, 2021

CAM: Context-Aware Masking for Robust Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2021

2020
Phonetically-Aware Coupled Network For Short Duration Text-Independent Speaker Verification.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

2019
Time-resolved protein activation by proximal decaging in living systems.
Nat., 2019

Autoencoder-Based Semi-Supervised Curriculum Learning for Out-of-Domain Speaker Verification.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Towards a Fault-Tolerant Speaker Verification System: A Regularization Approach to Reduce the Condition Number.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Factors Influencing University Students' Intention to Redeem Digital Takeaway Coupons - Analysis Based on A Survey in China.
Proceedings of the ICIT 2019, 2019

2018
A Noise-Robust Self-Adaptive Multitarget Speaker Detection System.
Proceedings of the 24th International Conference on Pattern Recognition, 2018

