Xin Wang
Orcid: 0000-0001-8246-0606Affiliations:
- Graduate University for Advanced Studies (SOKENDAI), National Institute of Informatics, Department of Informatics, Tokyo, Japan
According to our database1,
Xin Wang
authored at least 123 papers
between 2012 and 2024.
Collaborative distances:
Collaborative distances:
Timeline
Legend:
Book In proceedings Article PhD thesis Dataset OtherLinks
Online presence:
-
on orcid.org
On csauthors.net:
Bibliography
2024
IEEE ACM Trans. Audio Speech Lang. Process., 2024
ZMM-TTS: Zero-Shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-Supervised Discrete Speech Representations.
IEEE ACM Trans. Audio Speech Lang. Process., 2024
Application of Prompt Learning Models in Identifying the Collaborative Problem Solving Skills in an Online Task.
Proc. ACM Hum. Comput. Interact., 2024
Joint speaker encoder and neural back-end model for fully end-to-end automatic speaker verification with multiple enrollment utterances.
Comput. Speech Lang., 2024
CoRR, 2024
CoRR, 2024
CoRR, 2024
Malacopula: adversarial automatic speaker verification attacks using a neural-based generalised Hammerstein model.
CoRR, 2024
CoRR, 2024
Adapting General Disentanglement-Based Speaker Anonymization for Enhanced Emotion Preservation.
CoRR, 2024
Revisiting and Improving Scoring Fusion for Spoofing-aware Speaker Verification Using Compositional Data Analysis.
CoRR, 2024
An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios.
CoRR, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Spoofing Attack Augmentation: Can Differently-Trained Attack Models Improve Generalisation?
Proceedings of the IEEE International Conference on Acoustics, 2024
Can Large-Scale Vocoded Spoofed Data Improve Speech Spoofing Countermeasure with a Self-Supervised Front End?
Proceedings of the IEEE International Conference on Acoustics, 2024
2023
Using iterative adaptation and dynamic mask for child speech extraction under real-world multilingual conditions.
Speech Commun., July, 2023
The PartialSpoof Database and Countermeasures for the Detection of Short Fake Speech Segments Embedded in an Utterance.
IEEE ACM Trans. Audio Speech Lang. Process., 2023
IEEE ACM Trans. Audio Speech Lang. Process., 2023
IEEE ACM Trans. Audio Speech Lang. Process., 2023
DDSP-based Neural Waveform Synthesis of Polyphonic Guitar Performance from String-wise MIDI Input.
CoRR, 2023
Language-independent speaker anonymization using orthogonal Householder neural network.
CoRR, 2023
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Improving Generalization Ability of Countermeasures for New Mismatch Scenario by Combining Multiple Advanced Regularization Terms.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Spoofed Training Data for Speech Spoofing Countermeasure Can Be Efficiently Created Using Neural Vocoders.
Proceedings of the IEEE International Conference on Acoustics, 2023
Can Knowledge of End-to-End Text-to-Speech Models Improve Neural Midi-to-Audio Synthesis Systems?
Proceedings of the IEEE International Conference on Acoustics, 2023
Hiding Speaker's Sex in Speech Using Zero-Evidence Speaker Representation in an Analysis/Synthesis Pipeline.
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the 2023 Symposium on Eye Tracking Research and Applications, 2023
2022
IEEE ACM Trans. Audio Speech Lang. Process., 2022
The PartialSpoof Database and Countermeasures for the Detection of Short Generated Audio Segments Embedded in a Speech Utterance.
CoRR, 2022
Investigating Active-Learning-Based Training Data Selection for Speech Spoofing Countermeasure.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022
Automatic Speaker Verification Spoofing and Deepfake Detection Using Wav2vec 2.0 and Data Augmentation.
Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022
Language-Independent Speaker Anonymization Approach Using Self-Supervised Pre-Trained Models.
Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022
Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022
Analyzing Language-Independent Speaker Anonymization Framework under Unseen Conditions.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Attention Back-End for Automatic Speaker Verification with Multiple Enrollment Utterances.
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
2021
ASVspoof 2019: Spoofing Countermeasures for the Detection of Synthesized, Converted and Replayed Speech.
IEEE Trans. Biom. Behav. Identity Sci., 2021
Investigation of learning abilities on linguistic features in sequence-to-sequence text-to-speech synthesis.
Comput. Speech Lang., 2021
CoRR, 2021
ASVspoof 2021: Automatic Speaker Verification Spoofing and Countermeasures Challenge Evaluation Plan.
CoRR, 2021
CoRR, 2021
Attention Back-end for Automatic Speaker Verification with Multiple Enrollment Utterances.
CoRR, 2021
Proceedings of the 11th ISCA Speech Synthesis Workshop, 2021
Denoising-and-Dereverberation Hierarchical Neural Vocoder for Robust Waveform Generation.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
A Comparative Study on Recent Neural Spoofing Countermeasures for Synthetic Speech Detection.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Visualizing Classifier Adjacency Relations: A Case Study in Speaker Verification and Voice Anti-Spoofing.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
Combining Oculo-motor Indices to Measure Cognitive Load of Synthetic Speech in Noisy Listening Conditions.
Proceedings of the 2021 Symposium on Eye Tracking Research and Applications, 2021
Evaluating Synthetic Speech Workload with Oculo-motor Indices: Preliminary Observations for Japanese Speech.
Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies, 2021
Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021
2020
A Vector Quantized Variational Autoencoder (VQ-VAE) Autoregressive Neural F<sub>0</sub> Model for Statistical Parametric Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2020
IEEE ACM Trans. Audio Speech Lang. Process., 2020
Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification: Fundamentals.
IEEE ACM Trans. Audio Speech Lang. Process., 2020
ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech.
Comput. Speech Lang., 2020
Pretraining Strategies, Waveform Model Choice, and Acoustic Configurations for Multi-Speaker End-to-End Speech Synthesis.
CoRR, 2020
Modeling of Rakugo Speech and Its Limitations: Toward Speech Synthesis That Entertains Audiences.
IEEE Access, 2020
Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020
Using Cyclic Noise as the Source Signal for Neural Source-Filter-Based Speech Waveform Model.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Transferring Neural Speech Waveform Synthesizers to Musical Instrument Sounds Generation.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Effect of Choice of Probability Distribution, Randomness, and Search Methods for Alignment Modeling in Sequence-to-Sequence Text-to-Speech Synthesis Using Hard Alignment.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
A Study of Child Speech Extraction Using Joint Speech Enhancement and Separation in Realistic Conditions.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Zero-Shot Multi-Speaker Text-To-Speech with State-Of-The-Art Neural Speaker Embeddings.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
2019
Transformation of low-quality device-recorded speech to high-quality speech using improved SEGAN model.
CoRR, 2019
Initial investigation of an encoder-decoder end-to-end TTS framework using marginalization of monotonic hard latent alignments.
CoRR, 2019
Initial investigation of encoder-decoder end-to-end TTS using marginalization of monotonic hard alignments.
Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019
Rakugo speech synthesis using segment-to-segment neural transduction and style tokens - toward speech synthesis for entertaining audiences.
Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019
Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019
Neural Harmonic-plus-Noise Waveform Model with Trainable Maximum Voice Frequency for Text-to-Speech Synthesis.
Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Training Multi-Speaker Neural Text-to-Speech Systems Using Speaker-Imbalanced Speech Corpora.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Joint Training Framework for Text-to-Speech and Voice Conversion Using Multi-Source Tacotron and WaveNet.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Investigation of Enhanced Tacotron Text-to-speech Synthesis Systems with Self-attention for Pitch Accent Language.
Proceedings of the IEEE International Conference on Acoustics, 2019
Neural Source-filter-based Waveform Model for Statistical Parametric Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2019
Proceedings of the IEEE International Conference on Acoustics, 2019
Audiovisual Speaker Conversion: Jointly and Simultaneously Transforming Facial Expression and Acoustic Characteristics.
Proceedings of the IEEE International Conference on Acoustics, 2019
2018
Fundamental Frequency Modeling for Neural-Network-Based Statistical Parametric Speech Synthesis.
PhD thesis, 2018
IEEE ACM Trans. Audio Speech Lang. Process., 2018
Speech Commun., 2018
Deep Encoder-Decoder Models for Unsupervised Learning of Controllable Speech Synthesis.
CoRR, 2018
Can we steal your vocal identity from the Internet?: Initial investigation of cloning Obama's voice using GAN, WaveNet and low-quality found data.
Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018
Investigating Accuracy of Pitch-accent Annotations in Neural Network-based Speech Synthesis and Denoising Effects.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
A Comparison of Recent Waveform Generation and Acoustic Modeling Methods for Neural-Network-Based Speech Synthesis.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Cyborg Speech: Deep Multilingual Speech Synthesis for Generating Segmental Foreign Accent with Natural Prosody.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
2017
An RNN-Based Quantized F0 Model with Multi-Tier Feedback Links for Text-to-Speech Synthesis.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017
2016
Investigation of Using Continuous Representation of Various Linguistic Units in Neural Network Based Text-to-Speech Synthesis.
IEICE Trans. Inf. Syst., 2016
Concept-to-Speech generation with knowledge sharing for acoustic modelling and utterance filtering.
Comput. Speech Lang., 2016
A Comparative Study of the Performance of HMM, DNN, and RNN based Speech Synthesis Systems Trained on Very Large Speaker-Dependent Corpora.
Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016
Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016
Enhance the Word Vector with Prosodic Information for the Recurrent Neural Network Based TTS System.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Speech Enhancement for a Noise-Robust Text-to-Speech Synthesis System Using Deep Recurrent Neural Networks.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Using Text and Acoustic Features in Predicting Glottal Excitation Waveforms for Parametric Speech Synthesis with Recurrent Neural Networks.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
A full training framework of cross-stream dependence modelling for HMM-based singing voice synthesis.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
Proceedings of the Blizzard Challenge 2016, Cuppertino, CA, USA, September 16, 2016, 2016
2014
Concept-to-speech generation by integrating syntagmatic features into HMM-based speech synthesis.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
2013
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013
2012
Cross-stream dependency modeling using continuous F0 model for HMM-based speech synthesis.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012