Xixin Wu

Orcid: 0000-0001-9543-1572

According to our database, Xixin Wu authored at least 114 papers between 2012 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
A multiscale analysis-assisted two-stage reduced-order deep learning approach for effective thermal conductivity of arbitrary contrast heterogeneous materials.
Eng. Appl. Artif. Intell., 2024

A Survey on the Honesty of Large Language Models.
CoRR, 2024

Towards Within-Class Variation in Alzheimer's Disease Detection from Spontaneous Speech.
CoRR, 2024

AudioComposer: Towards Fine-grained Audio Generation with Natural Language Descriptions.
CoRR, 2024

Disentangling Speakers in Multi-Talker Speech Recognition with Speaker-Aware CTC.
CoRR, 2024

Speaking from Coarse to Fine: Improving Neural Codec Language Model via Multi-Scale Speech Coding and Generation.
CoRR, 2024

Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions.
CoRR, 2024

SoCodec: A Semantic-Ordered Multi-Stream Speech Codec for Efficient Language Model Based Text-to-Speech Synthesis.
CoRR, 2024

SimpleSpeech 2: Towards Simple and Efficient Text-to-Speech with Flow-based Scalar Latent Transformer Diffusion Models.
CoRR, 2024

Spontaneous Style Text-to-Speech Synthesis with Controllable Spontaneous Behaviors Based on Language Models.
CoRR, 2024

Large Language Model-based FMRI Encoding of Language Functions for Subjects with Neurocognitive Disorder.
CoRR, 2024

Empowering Whisper as a Joint Multi-Talker and Target-Talker Speech Recognition System.
CoRR, 2024

Autoregressive Speech Synthesis without Vector Quantization.
CoRR, 2024

Purple-teaming LLMs with Adversarial Defender Training.
CoRR, 2024

Seamless Language Expansion: Enhancing Multilingual Mastery in Self-Supervised Models.
CoRR, 2024

UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner.
CoRR, 2024

CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal Dysarthric Speech Reconstruction.
CoRR, 2024

Addressing Index Collapse of Large-Codebook Speech Tokenizer with Dual-Decoding Product-Quantized Variational Auto-Encoder.
CoRR, 2024

SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models.
CoRR, 2024

Injecting Linguistic Knowledge Into BERT for Dialogue State Tracking.
IEEE Access, 2024

Rethinking Machine Ethics - Can LLMs Perform Moral Reasoning through the Lens of Moral Theories?
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

Target Speech Extraction with Pre-trained AV-HuBERT and Mask-And-Recover Strategy.
Proceedings of the International Joint Conference on Neural Networks, 2024

UniAudio: Towards Universal Audio Generation with Large Language Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

UNIT-DSR: Dysarthric Speech Reconstruction System Using Speech Unit Normalization.
Proceedings of the IEEE International Conference on Acoustics, 2024

Unifying One-Shot Voice Conversion and Cloning with Disentangled Speech Representations.
Proceedings of the IEEE International Conference on Acoustics, 2024

Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts.
Proceedings of the IEEE International Conference on Acoustics, 2024

Stylespeech: Self-Supervised Style Enhancing with VQ-VAE-Based Pre-Training for Expressive Audiobook Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2024

Exploiting Audio-Visual Features with Pretrained AV-HuBERT for Multi-Modal Dysarthric Speech Reconstruction.
Proceedings of the IEEE International Conference on Acoustics, 2024

Cross-Speaker Encoding Network for Multi-Talker Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2024

Adaptive Query Rewriting: Aligning Rewriters through Marginal Probability of Conversational Answers.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

SimCalib: Graph Neural Network Calibration Based on Similarity between Nodes.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Hiformer: Sequence Modeling Networks With Hierarchical Attention Mechanisms.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

MSStyleTTS: Multi-Scale Style Modeling With Hierarchical Context Information for Expressive Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

MSMC-TTS: Multi-Stage Multi-Codebook VQ-VAE Based Neural TTS.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Estimating the Uncertainty in Emotion Class Labels With Utterance-Specific Dirichlet Priors.
IEEE Trans. Affect. Comput., 2023

UniAudio: An Audio Foundation Model Toward Universal Audio Generation.
CoRR, 2023

QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning.
CoRR, 2023

SAIL: Search-Augmented Instruction Learning.
CoRR, 2023

Interpretable Unified Language Checking.
CoRR, 2023

SpeechTripleNet: End-to-End Disentangled Speech Representation Learning for Content, Timbre and Prosody.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Integrated and Enhanced Pipeline System to Support Spoken Language Analytics for Screening Neurocognitive Disorders.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Unified Modeling of Multi-Talker Overlapped Speech Recognition and Diarization with a Sidecar Separator.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

PunCantonese: A Benchmark Corpus for Low-Resource Cantonese Punctuation Restoration from Speech Transcripts.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

A Sidecar Separator Can Convert A Single-Talker Speech Recognition System to A Multi-Talker One.
Proceedings of the IEEE International Conference on Acoustics, 2023

VF-Taco2: Towards Fast and Lightweight Synthesis for Autoregressive Models with Variation Autoencoder and Feature Distillation.
Proceedings of the IEEE International Conference on Acoustics, 2023

A Hierarchical Regression Chain Framework for Affective Vocal Burst Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023

Leveraging Pretrained Representations With Task-Related Keywords for Alzheimer's Disease Detection.
Proceedings of the IEEE International Conference on Acoustics, 2023

Search Augmented Instruction Learning.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

2022
Towards High-Quality Neural TTS for Low-Resource Languages by Learning Compact Speech Representations.
CoRR, 2022

Disentangled Speech Representation Learning for One-Shot Cross-lingual Voice Conversion Using β-VAE.
CoRR, 2022

Disentangled Speech Representation Learning for One-Shot Cross-Lingual Voice Conversion Using β-VAE.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Speech-Vision Based Multi-Modal AI Control of a Magnetic Anchored and Actuated Endoscope.
Proceedings of the IEEE International Conference on Robotics and Biomimetics, 2022

Tackling Spoofing-Aware Speaker Verification with Multi-Model Fusion.
Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022

Inferring Speaking Styles from Multi-modal Conversational Context by Multi-scale Relational Graph Convolutional Networks.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Improving Rare Words Recognition through Homophone Extension and Unified Writing for Low-resource Cantonese Speech Recognition.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

HILvoice: Human-in-the-Loop Style Selection for Elder-Facing Speech Synthesis.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

Spoofing-Aware Speaker Verification by Multi-Level Fusion.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Exploring linguistic feature and model combination for speech recognition based automatic AD detection.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

A Multi-Stage Multi-Codebook VQ-VAE Approach to High-Performance Neural TTS.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

A Multi-Scale Time-Frequency Spectrogram Discriminator for GAN-based Non-Autoregressive TTS.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

The CUHK-Tencent Speaker Diarization System for the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2022

Characterizing the Adversarial Vulnerability of Speech self-Supervised Learning.
Proceedings of the IEEE International Conference on Acoustics, 2022

Neural Architecture Search for Speech Emotion Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022

Speaker Identity Preservation in Dysarthric Speech Reconstruction by Adversarial Speaker Adaptation.
Proceedings of the IEEE International Conference on Acoustics, 2022

A Multitask Learning Framework for Speaker Change Detection with Content Information from Unsupervised Speech Decomposition.
Proceedings of the IEEE International Conference on Acoustics, 2022

Grounded Dialogue Generation with Cross-encoding Re-ranker, Grounding Span Prediction, and Passage Dropout.
Proceedings of the Second DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering, 2022

2021
Speech Emotion Recognition Using Sequential Capsule Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Exemplar-Based Emotive Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Any-to-Many Voice Conversion With Location-Relative Sequence-to-Sequence Modeling.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Attention Forcing for Machine Translation.
CoRR, 2021

Should Ensemble Members Be Calibrated?
CoRR, 2021

Improved End-to-End Dysarthric Speech Recognition via Meta-learning Based Model Re-initialization.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Learning Explicit Prosody Models and Deep Speaker Embeddings for Atypical Voice Conversion.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

VAENAR-TTS: Variational Auto-Encoder Based Non-AutoRegressive Text-to-Speech Synthesis.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Channel-Wise Gated Res2Net: Towards Robust Detection of Synthetic Speech Attacks.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Deliberation-Based Multi-Pass Speech Synthesis.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

2020
Deep segmental phonetic posterior-grams based discovery of non-categories in L2 English speech.
CoRR, 2020

Bayesian x-vector: Bayesian Neural Network based x-vector System for Speaker Verification.
Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

Speaker-Aware Linear Discriminant Analysis in Speaker Verification.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Ensemble Approaches for Uncertainty in Spoken Language Assessment.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Investigating Robustness of Adversarial Samples Detection for Automatic Speaker Verification.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Non-Native Children's Automatic Speech Recognition: The INTERSPEECH 2020 Shared Task ALTA Systems.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

End-To-End Voice Conversion Via Cross-Modal Knowledge Distillation for Dysarthric Speech Reconstruction.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

End-To-End Accent Conversion Without Using Native Utterances.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Adversarial Attacks on GMM I-Vector Based Speaker Verification Systems.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Code-Switched Speech Synthesis Using Bilingual Phonetic Posteriorgram with Only Monolingual Corpora.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019
Maximizing Mutual Information for Tacotron.
CoRR, 2019

Comparative Study of Parametric and Representation Uncertainty Modeling for Recurrent Neural Network Language Models.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Unsupervised Methods for Audio Classification from Lecture Discussion Recordings.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Jointly Trained Conversion Model and WaveNet Vocoder for Non-Parallel Voice Conversion Using Mel-Spectrograms and Phonetic Posteriorgrams.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

LF-MMI Training of Bayesian and Gaussian Process Time Delay Neural Networks for Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-Trained BERT.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Recurrent Neural Network Language Model Training Using Natural Gradient.
Proceedings of the IEEE International Conference on Acoustics, 2019

Speech Emotion Recognition Using Capsule Networks.
Proceedings of the IEEE International Conference on Acoustics, 2019

Quasi-fully Convolutional Neural Network with Variational Inference for Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2019

Bayesian and Gaussian Process Neural Networks for Large Vocabulary Continuous Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2019

Learning Discriminative Features from Spectrograms Using Center Loss for Speech Emotion Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2019

End-to-end Code-switched TTS with Mix of Monolingual Recordings.
Proceedings of the IEEE International Conference on Acoustics, 2019

Coupling Global and Local Context for Unsupervised Aspect Extraction.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

2018
The HCCL-CUHK System for the Voice Conversion Challenge 2018.
Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018

Speech Super-Resolution Using Parallel WaveNet.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Development of the CUHK Dysarthric Speech Recognition System for the UA Speech Corpus.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Rapid Style Adaptation Using Residual Error Embedding for Expressive Speech Synthesis.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Voice Conversion Across Arbitrary Speakers Based on a Single Target-Speaker Utterance.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Unsupervised Discovery of Non-native Phonetic Patterns in L2 English Speech for Mispronunciation Detection and Diagnosis.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Integrating Articulatory Features into Acoustic-Phonemic Model for Mispronunciation Detection and Diagnosis in L2 English Speech.
Proceedings of the 2018 IEEE International Conference on Multimedia and Expo, 2018

Feature Based Adaptation for Speaking Style Synthesis.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017
Intonation classification for L2 English speech using multi-distribution deep neural networks.
Comput. Speech Lang., 2017

2015
Acoustic to articulatory mapping with deep neural network.
Multim. Tools Appl., 2015

Understanding speaking styles of internet speech data with LSTM and low-resource training.
Proceedings of the 2015 International Conference on Affective Computing and Intelligent Interaction, 2015

2014
Automatic speech data clustering with human perception based weighted distance.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

2012
Adaptive named entity recognition based on conditional random fields with automatic updated dynamic gazetteers.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

