Xin Wang

ORCID: 0000-0001-8246-0606

Affiliations:
  • Graduate University for Advanced Studies (SOKENDAI), National Institute of Informatics, Department of Informatics, Tokyo, Japan


According to our database, Xin Wang authored at least 123 papers between 2012 and 2024.

Bibliography

2024
The VoicePrivacy 2022 Challenge: Progress and Perspectives in Voice Anonymisation.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

ZMM-TTS: Zero-Shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-Supervised Discrete Speech Representations.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Application of Prompt Learning Models in Identifying the Collaborative Problem Solving Skills in an Online Task.
Proc. ACM Hum. Comput. Interact., 2024

Joint speaker encoder and neural back-end model for fully end-to-end automatic speaker verification with multiple enrollment utterances.
Comput. Speech Lang., 2024

SpoofCeleb: Speech Deepfake Detection and SASV In The Wild.
CoRR, 2024

Audio Codec Augmentation for Robust Collaborative Watermarking of Speech Synthesis.
CoRR, 2024

Text-To-Speech Synthesis In The Wild.
CoRR, 2024

Spoofing-Aware Speaker Verification Robust Against Domain and Channel Mismatches.
CoRR, 2024

A Preliminary Case Study on Long-Form In-the-Wild Audio Spoofing Detection.
CoRR, 2024

Malacopula: adversarial automatic speaker verification attacks using a neural-based generalised Hammerstein model.
CoRR, 2024

ASVspoof 5: Crowdsourced Speech Data, Deepfakes, and Adversarial Attacks at Scale.
CoRR, 2024

Adapting General Disentanglement-Based Speaker Anonymization for Enhanced Emotion Preservation.
CoRR, 2024

A Benchmark for Multi-speaker Anonymization.
CoRR, 2024

Revisiting and Improving Scoring Fusion for Spoofing-aware Speaker Verification Using Compositional Data Analysis.
CoRR, 2024

An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios.
CoRR, 2024

Spoof Diarization: "What Spoofed When" in Partially Spoofed Audio.
CoRR, 2024

To what extent can ASV systems naturally defend against spoofing attacks?
CoRR, 2024

The VoicePrivacy 2024 Challenge Evaluation Plan.
CoRR, 2024

Synvox2: Towards A Privacy-Friendly Voxceleb2 Dataset.
Proceedings of the IEEE International Conference on Acoustics, 2024

Collaborative Watermarking for Adversarial Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2024

Spoofing Attack Augmentation: Can Differently-Trained Attack Models Improve Generalisation?
Proceedings of the IEEE International Conference on Acoustics, 2024

Can Large-Scale Vocoded Spoofed Data Improve Speech Spoofing Countermeasure with a Self-Supervised Front End?
Proceedings of the IEEE International Conference on Acoustics, 2024

2023
Using iterative adaptation and dynamic mask for child speech extraction under real-world multilingual conditions.
Speech Commun., July, 2023

The PartialSpoof Database and Countermeasures for the Detection of Short Fake Speech Segments Embedded in an Utterance.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Speaker Anonymization Using Orthogonal Householder Neural Network.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

ASVspoof 2021: Towards Spoofed and Deepfake Speech Detection in the Wild.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Speaker-Text Retrieval via Contrastive Learning.
CoRR, 2023

DDSP-based Neural Waveform Synthesis of Polyphonic Guitar Performance from String-wise MIDI Input.
CoRR, 2023

Language-independent speaker anonymization using orthogonal Householder neural network.
CoRR, 2023

Range-Based Equal Error Rate for Spoof Localization.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Improving Generalization Ability of Countermeasures for New Mismatch Scenario by Combining Multiple Advanced Regularization Terms.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Towards Single Integrated Spoofing-aware Speaker Verification Embeddings.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Spoofed Training Data for Speech Spoofing Countermeasure Can Be Efficiently Created Using Neural Vocoders.
Proceedings of the IEEE International Conference on Acoustics, 2023

Can Knowledge of End-to-End Text-to-Speech Models Improve Neural Midi-to-Audio Synthesis Systems?
Proceedings of the IEEE International Conference on Acoustics, 2023

Hiding Speaker's Sex in Speech Using Zero-Evidence Speaker Representation in an Analysis/Synthesis Pipeline.
Proceedings of the IEEE International Conference on Acoustics, 2023

Modelling Attention Levels with Ocular Responses in a Speech-in-Noise Recall Task.
Proceedings of the 2023 Symposium on Eye Tracking Research and Applications, 2023

2022
Privacy and Utility of X-Vector Based Speaker Anonymization.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

The VoicePrivacy 2020 Challenge: Results and findings.
Comput. Speech Lang., 2022

The VoicePrivacy 2020 Challenge Evaluation Plan.
CoRR, 2022

The PartialSpoof Database and Countermeasures for the Detection of Short Generated Audio Segments Embedded in a Speech Utterance.
CoRR, 2022

The VoicePrivacy 2022 Challenge Evaluation Plan.
CoRR, 2022

A Practical Guide to Logical Access Voice Presentation Attack Detection.
CoRR, 2022

Investigating Active-Learning-Based Training Data Selection for Speech Spoofing Countermeasure.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Automatic Speaker Verification Spoofing and Deepfake Detection Using Wav2vec 2.0 and Data Augmentation.
Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022

Language-Independent Speaker Anonymization Approach Using Self-Supervised Pre-Trained Models.
Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022

Investigating Self-Supervised Front Ends for Speech Spoofing Countermeasures.
Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022

Analyzing Language-Independent Speaker Anonymization Framework under Unseen Conditions.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Attention Back-End for Automatic Speaker Verification with Multiple Enrollment Utterances.
Proceedings of the IEEE International Conference on Acoustics, 2022

Estimating the Confidence of Speech Spoofing Countermeasure.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
ASVspoof 2019: Spoofing Countermeasures for the Detection of Synthesized, Converted and Replayed Speech.
IEEE Trans. Biom. Behav. Identity Sci., 2021

Investigation of learning abilities on linguistic features in sequence-to-sequence text-to-speech synthesis.
Comput. Speech Lang., 2021

ASVspoof 2021: accelerating progress in spoofed and deepfake speech detection.
CoRR, 2021

ASVspoof 2021: Automatic Speaker Verification Spoofing and Countermeasures Challenge Evaluation Plan.
CoRR, 2021

Benchmarking and challenges in security and privacy for voice biometrics.
CoRR, 2021

Multi-Task Learning in Utterance-Level and Segmental-Level Spoof Detection.
CoRR, 2021

Attention Back-end for Automatic Speaker Verification with Multiple Enrollment Utterances.
CoRR, 2021

Text-to-Speech Synthesis Techniques for MIDI-to-Audio Synthesis.
Proceedings of the 11th ISCA Speech Synthesis Workshop, 2021

Denoising-and-Dereverberation Hierarchical Neural Vocoder for Robust Waveform Generation.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

An Initial Investigation for Detecting Partially Spoofed Audio.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

A Comparative Study on Recent Neural Spoofing Countermeasures for Synthetic Speech Detection.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Visualizing Classifier Adjacency Relations: A Case Study in Speaker Verification and Voice Anti-Spoofing.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

End-to-End Text-to-Speech Using Latent Duration Based on VQ-VAE.
Proceedings of the IEEE International Conference on Acoustics, 2021

How Similar or Different is Rakugo Speech Synthesizer to Professional Performers?
Proceedings of the IEEE International Conference on Acoustics, 2021

Combining Oculo-motor Indices to Measure Cognitive Load of Synthetic Speech in Noisy Listening Conditions.
Proceedings of the 2021 Symposium on Eye Tracking Research and Applications, 2021

Evaluating Synthetic Speech Workload with Oculo-motor Indices: Preliminary Observations for Japanese Speech.
Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies, 2021

A Multi-Level Attention Model for Evidence-Based Fact Checking.
Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021

2020
A Vector Quantized Variational Autoencoder (VQ-VAE) Autoregressive Neural F0 Model for Statistical Parametric Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Neural Source-Filter Waveform Models for Statistical Parametric Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification: Fundamentals.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech.
Comput. Speech Lang., 2020

Pretraining Strategies, Waveform Model Choice, and Acoustic Configurations for Multi-Speaker End-to-End Speech Synthesis.
CoRR, 2020

Modeling of Rakugo Speech and Its Limitations: Toward Speech Synthesis That Entertains Audiences.
IEEE Access, 2020

Fine-Grained Similarity Measurement between Educational Videos and Exercises.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Using Cyclic Noise as the Source Signal for Neural Source-Filter-Based Speech Waveform Model.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Introducing the VoicePrivacy Initiative.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Design Choices for X-Vector Based Speaker Anonymization.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Reverberation Modeling for Source-Filter-Based Neural Vocoder.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Transferring Neural Speech Waveform Synthesizers to Musical Instrument Sounds Generation.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Effect of Choice of Probability Distribution, Randomness, and Search Methods for Alignment Modeling in Sequence-to-Sequence Text-to-Speech Synthesis Using Hard Alignment.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

A Study of Child Speech Extraction Using Joint Speech Enhancement and Separation in Realistic Conditions.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Zero-Shot Multi-Speaker Text-To-Speech with State-Of-The-Art Neural Speaker Embeddings.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019
Transformation of low-quality device-recorded speech to high-quality speech using improved SEGAN model.
CoRR, 2019

The ASVspoof 2019 database.
CoRR, 2019

Initial investigation of an encoder-decoder end-to-end TTS framework using marginalization of monotonic hard latent alignments.
CoRR, 2019

Initial investigation of encoder-decoder end-to-end TTS using marginalization of monotonic hard alignments.
Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

Rakugo speech synthesis using segment-to-segment neural transduction and style tokens - toward speech synthesis for entertaining audiences.
Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

Speaker Anonymization Using X-vector and Neural Waveform Models.
Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

Neural Harmonic-plus-Noise Waveform Model with Trainable Maximum Voice Frequency for Text-to-Speech Synthesis.
Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

ASVspoof 2019: Future Horizons in Spoofed and Fake Audio Detection.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Training Multi-Speaker Neural Text-to-Speech Systems Using Speaker-Imbalanced Speech Corpora.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

MOSNet: Deep Learning-Based Objective Assessment for Voice Conversion.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Joint Training Framework for Text-to-Speech and Voice Conversion Using Multi-Source Tacotron and WaveNet.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Investigation of Enhanced Tacotron Text-to-speech Synthesis Systems with Self-attention for Pitch Accent Language.
Proceedings of the IEEE International Conference on Acoustics, 2019

Neural Source-filter-based Waveform Model for Statistical Parametric Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2019

STFT Spectral Loss for Training a Neural Speech Waveform Model.
Proceedings of the IEEE International Conference on Acoustics, 2019

Audiovisual Speaker Conversion: Jointly and Simultaneously Transforming Facial Expression and Acoustic Characteristics.
Proceedings of the IEEE International Conference on Acoustics, 2019

2018
Fundamental Frequency Modeling for Neural-Network-Based Statistical Parametric Speech Synthesis.
PhD thesis, 2018

Autoregressive Neural F0 Model for Statistical Parametric Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

Investigating very deep highway networks for parametric speech synthesis.
Speech Commun., 2018

Deep Encoder-Decoder Models for Unsupervised Learning of Controllable Speech Synthesis.
CoRR, 2018

Can we steal your vocal identity from the Internet?: Initial investigation of cloning Obama's voice using GAN, WaveNet and low-quality found data.
Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018

A Progressive Deep Learning Approach to Child Speech Separation.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Investigating Accuracy of Pitch-accent Annotations in Neural Network-based Speech Synthesis and Denoising Effects.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

A Comparison of Recent Waveform Generation and Acoustic Modeling Methods for Neural-Network-Based Speech Synthesis.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Speech Waveform Synthesis from MFCC Sequences with Generative Adversarial Networks.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Cyborg Speech: Deep Multilingual Speech Synthesis for Generating Segmental Foreign Accent with Natural Prosody.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017
An RNN-Based Quantized F0 Model with Multi-Tier Feedback Links for Text-to-Speech Synthesis.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Principles for Learning Controllable TTS from Annotated and Latent Variation.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

An autoregressive recurrent mixture density network for parametric speech synthesis.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

A maximum likelihood approach to deep neural network based speech dereverberation.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

2016
Investigation of Using Continuous Representation of Various Linguistic Units in Neural Network Based Text-to-Speech Synthesis.
IEICE Trans. Inf. Syst., 2016

Concept-to-Speech generation with knowledge sharing for acoustic modelling and utterance filtering.
Comput. Speech Lang., 2016

A Comparative Study of the Performance of HMM, DNN, and RNN based Speech Synthesis Systems Trained on Very Large Speaker-Dependent Corpora.
Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016

Investigating RNN-based speech enhancement methods for noise-robust Text-to-Speech.
Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016

Enhance the Word Vector with Prosodic Information for the Recurrent Neural Network Based TTS System.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Speech Enhancement for a Noise-Robust Text-to-Speech Synthesis System Using Deep Recurrent Neural Networks.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Using Text and Acoustic Features in Predicting Glottal Excitation Waveforms for Parametric Speech Synthesis with Recurrent Neural Networks.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

A full training framework of cross-stream dependence modelling for HMM-based singing voice synthesis.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

The NII speech synthesis entry for Blizzard Challenge 2016.
Proceedings of the Blizzard Challenge 2016, Cupertino, CA, USA, September 16, 2016

2014
Concept-to-speech generation by integrating syntagmatic features into HMM-based speech synthesis.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

2013
An anisotropic diffusion filter based on multidirectional separability.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

2012
Cross-stream dependency modeling using continuous F0 model for HMM-based speech synthesis.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

