Xulong Zhang

ORCID: 0000-0001-7005-992X

Affiliations:
  • Lab of Large Audio Model (LLAM), Shanghai, China
  • Ping An Technology, Shenzhen, China
  • Fudan University, Shanghai, China (PhD 2021)


According to our database, Xulong Zhang authored at least 69 papers between 2013 and 2024.

Bibliography

2024
Semi-Supervised Self-Learning Enhanced Music Emotion Recognition.
CoRR, 2024

EfficientASR: Speech Recognition Network Compression via Attention Redundancy and Chunk-Level FFN Optimization.
Proceedings of the International Joint Conference on Neural Networks, 2024

ConTuner: Singing Voice Beautifying with Pitch and Expressiveness Condition.
Proceedings of the International Joint Conference on Neural Networks, 2024

QLSC: A Query Latent Semantic Calibrator for Robust Extractive Question Answering.
Proceedings of the International Joint Conference on Neural Networks, 2024

EAD-VC: Enhancing Speech Auto-Disentanglement for Voice Conversion with IFUB Estimator and Joint Text-Guided Consistent Learning.
Proceedings of the International Joint Conference on Neural Networks, 2024

MAIN-VC: Lightweight Speech Representation Disentanglement for One-shot Voice Conversion.
Proceedings of the International Joint Conference on Neural Networks, 2024

Learning Expressive Disentangled Speech Representations with Soft Speech Units and Adversarial Style Augmentation.
Proceedings of the International Joint Conference on Neural Networks, 2024

RREH: Reconstruction Relations Embedded Hashing for Semi-paired Cross-Modal Retrieval.
Proceedings of the Advanced Intelligent Computing Technology and Applications, 2024

Enhancing Emotion Recognition in Conversation Through Emotional Cross-Modal Fusion and Inter-class Contrastive Learning.
Proceedings of the Advanced Intelligent Computing Technology and Applications, 2024

EmoTalker: Emotionally Editable Talking Face Generation via Diffusion Model.
Proceedings of the IEEE International Conference on Acoustics, 2024

ED-TTS: Multi-Scale Emotion Modeling Using Cross-Domain Emotion Diarization for Emotional Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2024

Learning Disentangled Speech Representations with Contrastive Learning and Time-Invariant Retrieval.
Proceedings of the IEEE International Conference on Acoustics, 2024

IDEAW: Robust Neural Audio Watermarking with Invertible Dual-Embedding.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Medical Speech Symptoms Classification via Disentangled Representation.
Proceedings of the 27th International Conference on Computer Supported Cooperative Work in Design, 2024

RSET: Remapping-Based Sorting Method for Emotion Transfer Speech Synthesis.
Proceedings of the Web and Big Data - 8th International Joint Conference, 2024

2023
Melody Generation from Lyrics with Local Interpretability.
ACM Trans. Multim. Comput. Commun. Appl., 2023

DiffTalker: Co-driven audio-image diffusion for talking faces via intermediate landmarks.
CoRR, 2023

Machine Unlearning Methodology base on Stochastic Teacher Network.
CoRR, 2023

Symbolic & Acoustic: Multi-domain Music Emotion Modeling for Instrumental Music.
CoRR, 2023

Sparks of Large Audio Models: A Survey and Outlook.
CoRR, 2023

PMVC: Data Augmentation-Based Prosody Modeling for Expressive Voice Conversion.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

EmoMix: Emotion Mixing via Diffusion Models for Emotional Speech Synthesis.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Investigation of Music Emotion Recognition Based on Segmented Semi-Supervised Learning.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

SAR: Self-Supervised Anti-Distortion Representation for End-To-End Speech Model.
Proceedings of the International Joint Conference on Neural Networks, 2023

FastGraphTTS: An Ultrafast Syntax-Aware Speech Synthesis Framework.
Proceedings of the 35th IEEE International Conference on Tools with Artificial Intelligence, 2023

AOSR-Net: All-in-One Sandstorm Removal Network.
Proceedings of the 35th IEEE International Conference on Tools with Artificial Intelligence, 2023

Contrastive Latent Space Reconstruction Learning for Audio-Text Retrieval.
Proceedings of the 35th IEEE International Conference on Tools with Artificial Intelligence, 2023

Improving EEG-based Emotion Recognition by Fusing Time-Frequency and Spatial Representations.
Proceedings of the IEEE International Conference on Acoustics, 2023

Dynamic Alignment Mask CTC: Improved Mask CTC With Aligned Cross Entropy.
Proceedings of the IEEE International Conference on Acoustics, 2023

QI-TTS: Questioning Intonation Control for Emotional Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2023

VQ-CL: Learning Disentangled Speech Representations with Contrastive Learning and Vector Quantization.
Proceedings of the IEEE International Conference on Acoustics, 2023

Learning Speech Representations with Flexible Hidden Feature Dimensions.
Proceedings of the IEEE International Conference on Acoustics, 2023

Improving Music Genre Classification from multi-modal Properties of Music and Genre Correlations Perspective.
Proceedings of the IEEE International Conference on Acoustics, 2023

DQR-TTS: Semi-supervised Text-to-speech Synthesis with Dynamic Quantized Representation.
Proceedings of the IEEE Intl Conf on Parallel & Distributed Processing with Applications, 2023

CP-EB: Talking Face Generation with Controllable Pose and Eye Blinking Embedding.
Proceedings of the IEEE Intl Conf on Parallel & Distributed Processing with Applications, 2023

CLN-VC: Text-Free Voice Conversion Based on Fine-Grained Style Control and Contrastive Learning with Negative Samples Augmentation.
Proceedings of the IEEE Intl Conf on Parallel & Distributed Processing with Applications, 2023

Research on the Impact of Executive Shareholding on New Investment in Enterprises Based on Multivariable Linear Regression Model.
Proceedings of the Web and Big Data - 7th International Joint Conference, 2023

A Hierarchy-Based Analysis Approach for Blended Learning: A Case Study with Chinese Students.
Proceedings of the Web and Big Data - 7th International Joint Conference, 2023

Stock Volatility Prediction Based on Transformer Model Using Mixed-Frequency Data.
Proceedings of the Web and Big Data - 7th International Joint Conference, 2023

An Empirical Study of Attention Networks for Semantic Segmentation.
Proceedings of the Web and Big Data - 7th International Joint Conference, 2023

Symbolic and Acoustic: Multi-domain Music Emotion Modeling for Instrumental Music.
Proceedings of the Advanced Data Mining and Applications - 19th International Conference, 2023

Voice Conversion with Denoising Diffusion Probabilistic GAN Models.
Proceedings of the Advanced Data Mining and Applications - 19th International Conference, 2023

Machine Unlearning Methodology Based on Stochastic Teacher Network.
Proceedings of the Advanced Data Mining and Applications - 19th International Conference, 2023

2022
Boosting Star-GANs for Voice Conversion with Contrastive Discriminator.
CoRR, 2022

Linguistic-Enhanced Transformer with CTC Embedding for Speech Recognition.
Proceedings of the 18th International Conference on Mobility, Sensing and Networking, 2022

Improving Speech Representation Learning via Speech-level and Phoneme-level Masking Approach.
Proceedings of the 18th International Conference on Mobility, Sensing and Networking, 2022

Improving Imbalanced Text Classification with Dynamic Curriculum Learning.
Proceedings of the 18th International Conference on Mobility, Sensing and Networking, 2022

Semi-Supervised Learning Based on Reference Model for Low-resource TTS.
Proceedings of the 18th International Conference on Mobility, Sensing and Networking, 2022

MetaSpeech: Speech Effects Switch Along with Environment for Metaverse.
Proceedings of the 18th International Conference on Mobility, Sensing and Networking, 2022

Adapitch: Adaption Multi-Speaker Text-to-Speech Conditioned on Pitch Disentangling with Untranscribed Data.
Proceedings of the 18th International Conference on Mobility, Sensing and Networking, 2022

Tiny-Sepformer: A Tiny Time-Domain Transformer Network For Speech Separation.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

MetaSID: Singer Identification with Domain Adaptation for Metaverse.
Proceedings of the International Joint Conference on Neural Networks, 2022

Singer Identification for Metaverse with Timbral and Middle-Level Perceptual Features.
Proceedings of the International Joint Conference on Neural Networks, 2022

TDASS: Target Domain Adaptation Speech Synthesis Framework for Multi-speaker Low-Resource TTS.
Proceedings of the International Joint Conference on Neural Networks, 2022

MDCNN-SID: Multi-scale Dilated Convolution Network for Singer Identification.
Proceedings of the International Joint Conference on Neural Networks, 2022

SUSing: SU-net for Singing Voice Synthesis.
Proceedings of the International Joint Conference on Neural Networks, 2022

Pre-Avatar: An Automatic Presentation Generation Framework Leveraging Talking Avatar.
Proceedings of the 34th IEEE International Conference on Tools with Artificial Intelligence, 2022

Boosting StarGANs for Voice Conversion with Contrastive Discriminator.
Proceedings of the Neural Information Processing - 29th International Conference, 2022

nnSpeech: Speaker-Guided Conditional Variational Autoencoder for Zero-Shot Multi-speaker text-to-speech.
Proceedings of the IEEE International Conference on Acoustics, 2022

DRVC: A Framework of Any-to-Any Voice Conversion with Self-Supervised Learning.
Proceedings of the IEEE International Conference on Acoustics, 2022

Avqvc: One-Shot Voice Conversion By Vector Quantization With Applying Contrastive Learning.
Proceedings of the IEEE International Conference on Acoustics, 2022

Shallow Diffusion Motion Model for Talking Face Generation from Speech.
Proceedings of the Web and Big Data - 6th International Joint Conference, 2022

2021
Singer Identification Using Deep Timbre Feature Learning with KNN-NET.
Proceedings of the IEEE International Conference on Acoustics, 2021

Cyclegean: Cycle Generative Enhanced Adversarial Network for Voice Conversion.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

TGAVC: Improving Autoencoder Voice Conversion with Text-Guided and Adversarial Training.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020
Music Artist Classification with WaveNet Classifier for Raw Waveform Audio Data.
CoRR, 2020

Comparison for Improvements of Singing Voice Detection System Based on Vocal Separation.
CoRR, 2020

2017
流行音乐主旋律提取技术综述 (Review on Main Melody Extraction from Pop Music).
计算机科学 (Computer Science), 2017

2013
Probability-Symmetric Storage Allocation for Distributed Storage Systems Based on Network Coding.
Int. J. Online Biomed. Eng., 2013
