Xixin Wu
Orcid: 0000-0001-9543-1572
According to our database, Xixin Wu authored at least 114 papers between 2012 and 2024.
Bibliography
2024
A multiscale analysis-assisted two-stage reduced-order deep learning approach for effective thermal conductivity of arbitrary contrast heterogeneous materials.
Eng. Appl. Artif. Intell., 2024
Towards Within-Class Variation in Alzheimer's Disease Detection from Spontaneous Speech.
CoRR, 2024
AudioComposer: Towards Fine-grained Audio Generation with Natural Language Descriptions.
CoRR, 2024
CoRR, 2024
Speaking from Coarse to Fine: Improving Neural Codec Language Model via Multi-Scale Speech Coding and Generation.
CoRR, 2024
Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions.
CoRR, 2024
SoCodec: A Semantic-Ordered Multi-Stream Speech Codec for Efficient Language Model Based Text-to-Speech Synthesis.
CoRR, 2024
SimpleSpeech 2: Towards Simple and Efficient Text-to-Speech with Flow-based Scalar Latent Transformer Diffusion Models.
CoRR, 2024
Spontaneous Style Text-to-Speech Synthesis with Controllable Spontaneous Behaviors Based on Language Models.
CoRR, 2024
Large Language Model-based FMRI Encoding of Language Functions for Subjects with Neurocognitive Disorder.
CoRR, 2024
Empowering Whisper as a Joint Multi-Talker and Target-Talker Speech Recognition System.
CoRR, 2024
Seamless Language Expansion: Enhancing Multilingual Mastery in Self-Supervised Models.
CoRR, 2024
UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner.
CoRR, 2024
CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal Dysarthric Speech Reconstruction.
CoRR, 2024
Addressing Index Collapse of Large-Codebook Speech Tokenizer with Dual-Decoding Product-Quantized Variational Auto-Encoder.
CoRR, 2024
SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models.
CoRR, 2024
IEEE Access, 2024
Rethinking Machine Ethics - Can LLMs Perform Moral Reasoning through the Lens of Moral Theories?
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024
Proceedings of the International Joint Conference on Neural Networks, 2024
Proceedings of the Forty-first International Conference on Machine Learning, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Unifying One-Shot Voice Conversion and Cloning with Disentangled Speech Representations.
Proceedings of the IEEE International Conference on Acoustics, 2024
Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts.
Proceedings of the IEEE International Conference on Acoustics, 2024
Stylespeech: Self-Supervised Style Enhancing with VQ-VAE-Based Pre-Training for Expressive Audiobook Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2024
Exploiting Audio-Visual Features with Pretrained AV-HuBERT for Multi-Modal Dysarthric Speech Reconstruction.
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Adaptive Query Rewriting: Aligning Rewriters through Marginal Probability of Conversational Answers.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
2023
IEEE ACM Trans. Audio Speech Lang. Process., 2023
MSStyleTTS: Multi-Scale Style Modeling With Hierarchical Context Information for Expressive Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2023
IEEE ACM Trans. Audio Speech Lang. Process., 2023
Estimating the Uncertainty in Emotion Class Labels With Utterance-Specific Dirichlet Priors.
IEEE Trans. Affect. Comput., 2023
QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning.
CoRR, 2023
SpeechTripleNet: End-to-End Disentangled Speech Representation Learning for Content, Timbre and Prosody.
Proceedings of the 31st ACM International Conference on Multimedia, 2023
Integrated and Enhanced Pipeline System to Support Spoken Language Analytics for Screening Neurocognitive Disorders.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Unified Modeling of Multi-Talker Overlapped Speech Recognition and Diarization with a Sidecar Separator.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
PunCantonese: A Benchmark Corpus for Low-Resource Cantonese Punctuation Restoration from Speech Transcripts.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
A Sidecar Separator Can Convert A Single-Talker Speech Recognition System to A Multi-Talker One.
Proceedings of the IEEE International Conference on Acoustics, 2023
VF-Taco2: Towards Fast and Lightweight Synthesis for Autoregressive Models with Variation Autoencoder and Feature Distillation.
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Leveraging Pretrained Representations With Task-Related Keywords for Alzheimer's Disease Detection.
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023
2022
Towards High-Quality Neural TTS for Low-Resource Languages by Learning Compact Speech Representations.
CoRR, 2022
Disentangled Speech Representation Learning for One-Shot Cross-lingual Voice Conversion Using β-VAE.
CoRR, 2022
Disentangled Speech Representation Learning for One-Shot Cross-Lingual Voice Conversion Using β-VAE.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022
Speech-Vision Based Multi-Modal AI Control of a Magnetic Anchored and Actuated Endoscope.
Proceedings of the IEEE International Conference on Robotics and Biomimetics, 2022
Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022
Inferring Speaking Styles from Multi-modal Conversational Context by Multi-scale Relational Graph Convolutional Networks.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022
Improving Rare Words Recognition through Homophone Extension and Unified Writing for Low-resource Cantonese Speech Recognition.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Exploring linguistic feature and model combination for speech recognition based automatic AD detection.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
A Multi-Scale Time-Frequency Spectrogram Discriminator for GAN-based Non-Autoregressive TTS.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
The CUHK-Tencent Speaker Diarization System for the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
Speaker Identity Preservation in Dysarthric Speech Reconstruction by Adversarial Speaker Adaptation.
Proceedings of the IEEE International Conference on Acoustics, 2022
A Multitask Learning Framework for Speaker Change Detection with Content Information from Unsupervised Speech Decomposition.
Proceedings of the IEEE International Conference on Acoustics, 2022
Grounded Dialogue Generation with Cross-encoding Re-ranker, Grounding Span Prediction, and Passage Dropout.
Proceedings of the Second DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering, 2022
2021
IEEE ACM Trans. Audio Speech Lang. Process., 2021
IEEE ACM Trans. Audio Speech Lang. Process., 2021
IEEE ACM Trans. Audio Speech Lang. Process., 2021
Improved End-to-End Dysarthric Speech Recognition via Meta-learning Based Model Re-initialization.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021
Learning Explicit Prosody Models and Deep Speaker Embeddings for Atypical Voice Conversion.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
VAENAR-TTS: Variational Auto-Encoder Based Non-AutoRegressive Text-to-Speech Synthesis.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
2020
Deep segmental phonetic posterior-grams based discovery of non-categories in L2 English speech.
CoRR, 2020
Bayesian x-vector: Bayesian Neural Network based x-vector System for Speaker Verification.
Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Investigating Robustness of Adversarial Samples Detection for Automatic Speaker Verification.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Non-Native Children's Automatic Speech Recognition: The INTERSPEECH 2020 Shared Task ALTA Systems.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
End-To-End Voice Conversion Via Cross-Modal Knowledge Distillation for Dysarthric Speech Reconstruction.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Code-Switched Speech Synthesis Using Bilingual Phonetic Posteriorgram with Only Monolingual Corpora.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
2019
Comparative Study of Parametric and Representation Uncertainty Modeling for Recurrent Neural Network Language Models.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Jointly Trained Conversion Model and WaveNet Vocoder for Non-Parallel Voice Conversion Using Mel-Spectrograms and Phonetic Posteriorgrams.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
LF-MMI Training of Bayesian and Gaussian Process Time Delay Neural Networks for Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-Trained BERT.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the IEEE International Conference on Acoustics, 2019
Proceedings of the IEEE International Conference on Acoustics, 2019
Quasi-fully Convolutional Neural Network with Variational Inference for Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2019
Bayesian and Gaussian Process Neural Networks for Large Vocabulary Continuous Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2019
Learning Discriminative Features from Spectrograms Using Center Loss for Speech Emotion Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2019
Proceedings of the IEEE International Conference on Acoustics, 2019
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019
2018
Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018
Development of the CUHK Dysarthric Speech Recognition System for the UA Speech Corpus.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Rapid Style Adaptation Using Residual Error Embedding for Expressive Speech Synthesis.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Voice Conversion Across Arbitrary Speakers Based on a Single Target-Speaker Utterance.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Unsupervised Discovery of Non-native Phonetic Patterns in L2 English Speech for Mispronunciation Detection and Diagnosis.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Integrating Articulatory Features into Acoustic-Phonemic Model for Mispronunciation Detection and Diagnosis in L2 English Speech.
Proceedings of the 2018 IEEE International Conference on Multimedia and Expo, 2018
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
2017
Intonation classification for L2 English speech using multi-distribution deep neural networks.
Comput. Speech Lang., 2017
2015
Understanding speaking styles of internet speech data with LSTM and low-resource training.
Proceedings of the 2015 International Conference on Affective Computing and Intelligent Interaction, 2015
2014
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014
2012
Adaptive named entity recognition based on conditional random fields with automatic updated dynamic gazetteers.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012