Junichi Yamagishi
ORCID: 0000-0003-2752-3955
Affiliations:
- National Institute of Informatics, Tokyo, Japan
- University of Edinburgh, Scotland, UK (former)
According to our database, Junichi Yamagishi authored at least 406 papers between 2002 and 2025.
Bibliography
2025
2024
IEEE ACM Trans. Audio Speech Lang. Process., 2024
ZMM-TTS: Zero-Shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-Supervised Discrete Speech Representations.
IEEE ACM Trans. Audio Speech Lang. Process., 2024
Joint speaker encoder and neural back-end model for fully end-to-end automatic speaker verification with multiple enrollment utterances.
Comput. Speech Lang., 2024
CoRR, 2024
CoRR, 2024
CoRR, 2024
Disentangling the Prosody and Semantic Information with Pre-trained Model for In-Context Learning based Zero-Shot Voice Conversion.
CoRR, 2024
CoRR, 2024
CoRR, 2024
Revisiting and Improving Scoring Fusion for Spoofing-aware Speaker Verification Using Compositional Data Analysis.
CoRR, 2024
An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios.
CoRR, 2024
Generating Speakers by Prompting Listener Impressions for Pre-trained Multi-Speaker Text-to-Speech Systems.
CoRR, 2024
Analysis of Fine-Grained Counting Methods for Masked Face Counting: A Comparative Study.
IEEE Access, 2024
IEEE Access, 2024
Exploring Self-Supervised Vision Transformers for Deepfake Detection: A Comparative Analysis.
Proceedings of the IEEE International Joint Conference on Biometrics, 2024
Uncertainty as a Predictor: Leveraging Self-Supervised Learning for Zero-Shot MOS Prediction.
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Spoofing Attack Augmentation: Can Differently-Trained Attack Models Improve Generalisation?
Proceedings of the IEEE International Conference on Acoustics, 2024
Can Large-Scale Vocoded Spoofed Data Improve Speech Spoofing Countermeasure with a Self-Supervised Front End?
Proceedings of the IEEE International Conference on Acoustics, 2024
Bridging Textual and Tabular Worlds for Fact Verification: A Lightweight, Attention-Based Model.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024
2023
Dataset, October, 2023
ACM Trans. Graph., August, 2023
The PartialSpoof Database and Countermeasures for the Detection of Short Fake Speech Segments Embedded in an Utterance.
IEEE ACM Trans. Audio Speech Lang. Process., 2023
IEEE ACM Trans. Audio Speech Lang. Process., 2023
IEEE ACM Trans. Audio Speech Lang. Process., 2023
DDSP-based Neural Waveform Synthesis of Polyphonic Guitar Performance from String-wise MIDI Input.
CoRR, 2023
Language-independent speaker anonymization using orthogonal Householder neural network.
CoRR, 2023
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Improving Generalization Ability of Countermeasures for New Mismatch Scenario by Combining Multiple Advanced Regularization Terms.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Controlling Multi-Class Human Vocalization Generation via a Simple Segment-based Labeling Scheme.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Investigating Range-Equalizing Bias in Mean Opinion Score Ratings of Synthesized Speech.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the IEEE International Joint Conference on Biometrics, 2023
Spoofed Training Data for Speech Spoofing Countermeasure Can Be Efficiently Created Using Neural Vocoders.
Proceedings of the IEEE International Conference on Acoustics, 2023
Can Knowledge of End-to-End Text-to-Speech Models Improve Neural MIDI-to-Audio Synthesis Systems?
Proceedings of the IEEE International Conference on Acoustics, 2023
Hiding Speaker's Sex in Speech Using Zero-Evidence Speaker Representation in an Analysis/Synthesis Pipeline.
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Partial Rank Similarity Minimization Method for Quality MOS Prediction of Unseen Speech Synthesis Systems in Zero-Shot and Semi-Supervised Setting.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
The VoiceMOS Challenge 2023: Zero-Shot Subjective Speech Quality Prediction for Multiple Domains.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
Exploring Isolated Musical Notes as Pre-training Data for Predominant Instrument Recognition in Polyphonic Music.
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023
2022
IEEE Trans. Biom. Behav. Identity Sci., 2022
IEEE ACM Trans. Audio Speech Lang. Process., 2022
Use of Speaker Recognition Approaches for Learning and Evaluating Embedding Representations of Musical Instrument Sounds.
IEEE ACM Trans. Audio Speech Lang. Process., 2022
IEEE ACM Trans. Audio Speech Lang. Process., 2022
IEEE Signal Process. Lett., 2022
Effects of Image Processing Operations on Adversarial Noise and Their Use in Detecting and Correcting Adversarial Images.
IEICE Trans. Inf. Syst., 2022
The PartialSpoof Database and Countermeasures for the Detection of Short Generated Audio Segments Embedded in a Speech Utterance.
CoRR, 2022
Investigating Active-Learning-Based Training Data Selection for Speech Spoofing Countermeasure.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022
Automatic Speaker Verification Spoofing and Deepfake Detection Using Wav2vec 2.0 and Data Augmentation.
Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022
Language-Independent Speaker Anonymization Approach Using Self-Supervised Pre-Trained Models.
Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022
Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022
Proceedings of the DDAM@MM 2022: Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia, 2022
Spoofing-Aware Attention based ASV Back-end with Multiple Enrollment Utterances and a Sampling Strategy for the SASV Challenge 2022.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Analyzing Language-Independent Speaker Anonymization Framework under Unseen Conditions.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing, 2022
Attention Back-End for Automatic Speaker Verification with Multiple Enrollment Utterances.
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
On the Interplay between Sparsity, Naturalness, Intelligibility, and Prosody in Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the 29th International Conference on Computational Linguistics, 2022
Proceedings of the Handbook of Digital Face Manipulation and Detection, 2022
Proceedings of the Handbook of Digital Face Manipulation and Detection, 2022
2021
ASVspoof 2019: Spoofing Countermeasures for the Detection of Synthesized, Converted and Replayed Speech.
IEEE Trans. Biom. Behav. Identity Sci., 2021
An Overview of Voice Conversion and Its Challenges: From Statistical Modeling to Deep Learning.
IEEE ACM Trans. Audio Speech Lang. Process., 2021
Multi-Metric Optimization Using Generative Adversarial Networks for Near-End Speech Intelligibility Enhancement.
IEEE ACM Trans. Audio Speech Lang. Process., 2021
IEICE Trans. Inf. Syst., 2021
Investigation of learning abilities on linguistic features in sequence-to-sequence text-to-speech synthesis.
Comput. Speech Lang., 2021
Effectiveness of Detection-based and Regression-based Approaches for Estimating Mask-Wearing Ratio.
CoRR, 2021
LaughNet: synthesizing laughter utterances from waveform silhouettes and a single laughter example.
CoRR, 2021
CoRR, 2021
ASVspoof 2021: Automatic Speaker Verification Spoofing and Countermeasures Challenge Evaluation Plan.
CoRR, 2021
CoRR, 2021
Use of speaker recognition approaches for learning timbre representations of musical instrument sounds from raw waveforms.
CoRR, 2021
Attention Back-end for Automatic Speaker Verification with Multiple Enrollment Utterances.
CoRR, 2021
Preliminary study on using vector quantization latent spaces for TTS/VC systems with consistent performance.
Proceedings of the 11th ISCA Speech Synthesis Workshop, 2021
Proceedings of the 11th ISCA Speech Synthesis Workshop, 2021
Proceedings of the 11th ISCA Speech Synthesis Workshop, 2021
Proceedings of the 11th ISCA Speech Synthesis Workshop, 2021
Enhancing Low-Quality Voice Recordings Using Disentangled Channel Factor and Neural Waveform Model.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021
Denoising-and-Dereverberation Hierarchical Neural Vocoder for Robust Waveform Generation.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
A Comparative Study on Recent Neural Spoofing Countermeasures for Synthetic Speech Detection.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Visualizing Classifier Adjacency Relations: A Case Study in Speaker Verification and Voice Anti-Spoofing.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
OpenForensics: Large-Scale Challenging Dataset For Multi-Face Forgery Detection And Segmentation In-The-Wild.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
Learning Disentangled Phone and Speaker Representations in a Semi-Supervised VQ-VAE Paradigm.
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
Effectiveness of Detection-based and Regression-based Approaches for Estimating Mask-Wearing Ratio.
Proceedings of the 16th IEEE International Conference on Automatic Face and Gesture Recognition, 2021
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2021
Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021
2020
A Vector Quantized Variational Autoencoder (VQ-VAE) Autoregressive Neural F0 Model for Statistical Parametric Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2020
IEEE ACM Trans. Audio Speech Lang. Process., 2020
IEEE ACM Trans. Audio Speech Lang. Process., 2020
Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification: Fundamentals.
IEEE ACM Trans. Audio Speech Lang. Process., 2020
ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech.
Comput. Speech Lang., 2020
Introduction to the special issue "Speaker and language characterization and recognition: Voice modeling, conversion, synthesis and ethical aspects".
Comput. Speech Lang., 2020
Pretraining Strategies, Waveform Model Choice, and Acoustic Configurations for Multi-Speaker End-to-End Speech Synthesis.
CoRR, 2020
CoRR, 2020
Modeling of Rakugo Speech and Its Limitations: Toward Speech Synthesis That Entertains Audiences.
IEEE Access, 2020
An Initial Investigation on Optimizing Tandem Speaker Verification and Countermeasure Systems Using Reinforcement Learning.
Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020
Using Cyclic Noise as the Source Signal for Neural Source-Filter-Based Speech Waveform Model.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Noise Tokens: Learning Neural Noise Templates for Environment-Aware Speech Enhancement.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
iMetricGAN: Intelligibility Enhancement for Speech-in-Noise Using Generative Adversarial Network-Based Metric Learning.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Improved Prosody from Learned F0 Codebook Representations for VQ-VAE Speech Waveform Reconstruction.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the IEEE International Conference on Image Processing, 2020
Generating Master Faces for Use in Performing Wolf Attacks on Face Recognition Systems.
Proceedings of the 2020 IEEE International Joint Conference on Biometrics, 2020
Transferring Neural Speech Waveform Synthesizers to Musical Instrument Sounds Generation.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Effect of Choice of Probability Distribution, Randomness, and Search Methods for Alignment Modeling in Sequence-to-Sequence Text-to-Speech Synthesis Using Hard Alignment.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Zero-Shot Multi-Speaker Text-To-Speech with State-Of-The-Art Neural Speaker Embeddings.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020
Predictions of Subjective Ratings and Spoofing Assessments of Voice Conversion Challenge 2020 Submissions.
Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020
Voice Conversion Challenge 2020 -- Intra-lingual semi-parallel and cross-lingual voice conversion --.
Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020
A Method for Identifying Origin of Digital Images Using a Convolutional Neural Network.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020
Generating Sentiment-Preserving Fake Online Reviews Using Neural Language Models and Their Human- and Machine-Based Detection.
Proceedings of the Advanced Information Networking and Applications, 2020
2019
Proceedings of the Handbook of Biometric Anti-Spoofing, 2019
Complex-Valued Restricted Boltzmann Machine for Speaker-Dependent Speech Parameterization From Complex Spectra.
IEEE ACM Trans. Audio Speech Lang. Process., 2019
J. Inf. Secur. Appl., 2019
CoRR, 2019
Transformation of low-quality device-recorded speech to high-quality speech using improved SEGAN model.
CoRR, 2019
A Method for Identifying Origin of Digital Images Using a Convolution Neural Network.
CoRR, 2019
Initial investigation of an encoder-decoder end-to-end TTS framework using marginalization of monotonic hard latent alignments.
CoRR, 2019
A Unified Speaker Adaptation Method for Speech Synthesis using Transcribed and Untranscribed Speech with Backpropagation.
CoRR, 2019
Training a Neural Speech Waveform Model using Spectral Losses of Short-Time Fourier Transform and Continuous Wavelet Transform.
CoRR, 2019
Initial investigation of encoder-decoder end-to-end TTS using marginalization of monotonic hard alignments.
Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019
Rakugo speech synthesis using segment-to-segment neural transduction and style tokens - toward speech synthesis for entertaining audiences.
Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019
Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019
Neural Harmonic-plus-Noise Waveform Model with Trainable Maximum Voice Frequency for Text-to-Speech Synthesis.
Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Training Multi-Speaker Neural Text-to-Speech Systems Using Speaker-Imbalanced Speech Corpora.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Does the Lombard Effect Improve Emotional Communication in Noise? - Analysis of Emotional Speech Acted in Noise.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Joint Training Framework for Text-to-Speech and Voice Conversion Using Multi-Source Tacotron and WaveNet.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Investigation of Enhanced Tacotron Text-to-speech Synthesis Systems with Self-attention for Pitch Accent Language.
Proceedings of the IEEE International Conference on Acoustics, 2019
Neural Source-filter-based Waveform Model for Statistical Parametric Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2019
Proceedings of the IEEE International Conference on Acoustics, 2019
Cycle-consistent Adversarial Networks for Non-parallel Vocal Effort Based Speaking Style Conversion.
Proceedings of the IEEE International Conference on Acoustics, 2019
Proceedings of the IEEE International Conference on Acoustics, 2019
Proceedings of the IEEE International Conference on Acoustics, 2019
Waveform Generation for Text-to-speech Synthesis Using Pitch-synchronous Multi-scale Generative Adversarial Networks.
Proceedings of the IEEE International Conference on Acoustics, 2019
Audiovisual Speaker Conversion: Jointly and Simultaneously Transforming Facial Expression and Acoustic Characteristics.
Proceedings of the IEEE International Conference on Acoustics, 2019
Multi-task Learning for Detecting and Segmenting Manipulated Facial Images and Videos.
Proceedings of the 10th IEEE International Conference on Biometrics Theory, 2019
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019
2018
IEEE ACM Trans. Audio Speech Lang. Process., 2018
IEEE ACM Trans. Audio Speech Lang. Process., 2018
A Comparison Between STRAIGHT, Glottal, and Sinusoidal Vocoding in Statistical Parametric Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2018
Speech Commun., 2018
Investigating different representations for modeling and controlling multiple emotions in DNN-based speech synthesis.
Speech Commun., 2018
Deep Encoder-Decoder Models for Unsupervised Learning of Controllable Speech Synthesis.
CoRR, 2018
Complex-Valued Restricted Boltzmann Machine for Direct Speech Parameterization from Complex Spectra.
CoRR, 2018
Wasserstein GAN and Waveform Loss-Based Acoustic Model Training for Multi-Speaker Text-to-Speech Synthesis Systems Using a WaveNet Vocoder.
IEEE Access, 2018
Transforming acoustic characteristics to deceive playback spoofing countermeasures of speaker verification systems.
Proceedings of the 2018 IEEE International Workshop on Information Forensics and Security, 2018
Proceedings of the 2018 IEEE International Workshop on Information Forensics and Security, 2018
Scaling and Bias Codes for Modeling Speaker-Adaptive DNN-Based Speech Synthesis Systems.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation, 2018
The Voice Conversion Challenge 2018: Promoting Development of Parallel and Nonparallel Methods.
Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018
Can we steal your vocal identity from the Internet?: Initial investigation of cloning Obama's voice using GAN, WaveNet and low-quality found data.
Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018
A Spoofing Benchmark for the 2018 Voice Conversion Challenge: Leveraging from Spoofing Countermeasures for Speech Artifact Assessment.
Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018
t-DCF: a Detection Cost Function for the Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification.
Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018
Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018
Integrated Presentation Attack Detection and Automatic Speaker Verification: Common Features and Gaussian Back-end Fusion.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Investigating Accuracy of Pitch-accent Annotations in Neural Network-based Speech Synthesis and Denoising Effects.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Transformation on Computer-Generated Facial Image to Avoid Detection by Spoofing Detector.
Proceedings of the 2018 IEEE International Conference on Multimedia and Expo, 2018
A Comparison of Recent Waveform Generation and Acoustic Modeling Methods for Neural-Network-Based Speech Synthesis.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Cyborg Speech: Deep Multilingual Speech Synthesis for Generating Segmental Foreign Accent with Natural Prosody.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
High-Quality Nonparallel Voice Conversion Based on Cycle-Consistent Adversarial Network.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018
Modular Convolutional Neural Network for Discriminating between Computer-Generated Images and Photographic Images.
Proceedings of the 13th International Conference on Availability, Reliability and Security, 2018
2017
Introduction to the Issue on Spoofing and Countermeasures for Automatic Speaker Verification.
IEEE J. Sel. Top. Signal Process., 2017
IEEE J. Sel. Top. Signal Process., 2017
Influence of speaker familiarity on blind and visually impaired children's and young adults' perception of synthetic voices.
Comput. Speech Lang., 2017
Proceedings of the 2017 IEEE Workshop on Information Forensics and Security, 2017
Distinguishing computer graphics from natural images using convolution neural networks.
Proceedings of the 2017 IEEE Workshop on Information Forensics and Security, 2017
An RNN-Based Quantized F0 Model with Multi-Tier Feedback Links for Text-to-Speech Synthesis.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Speech Intelligibility in Cars: The Effect of Speaking Style, Noise and Listener Age.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Direct Modeling of Frequency Spectra and Waveform Generation Based on Phase Recovery for DNN-Based Speech Synthesis.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Complex-Valued Restricted Boltzmann Machine for Direct Learning of Frequency Spectra.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
The ASVspoof 2017 Challenge: Assessing the Limits of Replay Spoofing Attack Detection.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Reducing Mismatch in Training of DNN-Based Glottal Excitation Models in a Statistical Parametric Text-to-Speech System.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017
Non-parallel voice conversion using i-vector PLDA: towards unifying speaker verification and transformation.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017
Proceedings of the Human-Harmonized Information Technology, Volume 2, 2017
2016
Constructing a Deep Neural Network Based Spectral Model for Statistical Speech Synthesis.
Proceedings of the Recent Advances in Nonlinear Speech Processing, 2016
Anti-Spoofing for Text-Independent Speaker Verification: An Initial Database, Comparison of Countermeasures, and Human Performance.
IEEE ACM Trans. Audio Speech Lang. Process., 2016
Investigation of Using Continuous Representation of Various Linguistic Units in Neural Network Based Text-to-Speech Synthesis.
IEICE Trans. Inf. Syst., 2016
Comput. Speech Lang., 2016
Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016
A Comparative Study of the Performance of HMM, DNN, and RNN based Speech Synthesis Systems Trained on Very Large Speaker-Dependent Corpora.
Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016
Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016
Speaker Adaptation of Various Components in Deep Neural Network based Speech Synthesis.
Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016
Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016
Development of a statistical parametric synthesis system for operatic singing in German.
Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016
Voice Liveness Detection for Speaker Verification based on a Tandem Single/Double-channel Pop Noise Detector.
Proceedings of the Odyssey 2016: The Speaker and Language Recognition Workshop, 2016
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Enhance the Word Vector with Prosodic Information for the Recurrent Neural Network Based TTS System.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Applying Spectral Normalisation and Efficient Envelope Estimation and Statistical Transformation for the Voice Conversion Challenge 2016.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Speech Enhancement for a Noise-Robust Text-to-Speech Synthesis System Using Deep Recurrent Neural Networks.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Syllable-Level Representations of Suprasegmental Features for DNN-Based Text-to-Speech Synthesis.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Using Text and Acoustic Features in Predicting Glottal Excitation Waveforms for Parametric Speech Synthesis with Recurrent Neural Networks.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Majorisation-Minimisation Based Optimisation of the Composite Autoregressive System with Application to Glottal Inverse Filtering.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
A deep auto-encoder based low-dimensional feature extraction from FFT spectral envelopes for statistical parametric speech synthesis.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
Wavelet-based decomposition of F0 as a secondary task for DNN-based speech synthesis with multi-task learning.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
Testing the consistency assumption: Pronunciation variant forced alignment in read and spontaneous speech synthesis.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
Proceedings of the COLING 2016, 2016
Proceedings of the Blizzard Challenge 2016, Cupertino, CA, USA, September 16, 2016, 2016
2015
Proceedings of the Encyclopedia of Biometrics, Second Edition, 2015
A Deep Generative Architecture for Postfiltering in Statistical Parametric Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2015
Speech Commun., 2015
Intelligibility of time-compressed synthetic speech: Compression method and speaking style.
Speech Commun., 2015
Comput. Speech Lang., 2015
A Comparison of Manual and Automatic Voice Repair for Individual with Vocal Disabilities.
Proceedings of the 6th Workshop on Speech and Language Processing for Assistive Technologies, 2015
Automatic speaker verification spoofing and countermeasures (ASVspoof 2015): open discussion and future plans.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
ASVspoof 2015: the first automatic speaker verification spoofing and countermeasures challenge.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Multiple feed-forward deep neural networks for statistical parametric speech synthesis.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Voice liveness detection algorithms based on pop noise caused by human breath for automatic speaker verification.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
A perceptual investigation of wavelet-based decomposition of f0 for text-to-speech synthesis.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Influence of speaker familiarity on blind and visually impaired children's perception of synthetic voices in audio games.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Deep neural network context embeddings for model selection in rich-context HMM synthesis.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Fusion of multiple parameterisations for DNN-based sinusoidal speech synthesis with multi-task learning.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015
Methods for applying dynamic sinusoidal models to statistical parametric speech synthesis.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015
2014
Proceedings of the Handbook of Biometric Anti-Spoofing, 2014
IEEE J. Sel. Top. Signal Process., 2014
IEEE J. Sel. Top. Signal Process., 2014
Intelligibility enhancement of HMM-generated speech in additive noise by modifying Mel cepstral coefficients to increase the glimpse proportion.
Comput. Speech Lang., 2014
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
Development of a genre-dependent TTS system with cross-speaker speaking-style transplantation.
Proceedings of the 2nd International Workshop on Speech, Language and Audio in Multimedia, 2014
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
An investigation of the application of dynamic sinusoidal models to statistical parametric speech synthesis.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
Neural net word representations for phrase-break prediction without a part of speech tagger.
Proceedings of the IEEE International Conference on Acoustics, 2014
Proceedings of the IEEE International Conference on Acoustics, 2014
Proceedings of the IEEE International Conference on Acoustics, 2014
Proceedings of the Advances in Speech and Language Technologies for Iberian Languages, 2014
2013
Articulatory Control of HMM-Based Parametric Speech Synthesis Using Feature-Space-Switched Multiple Regression.
IEEE Trans. Speech Audio Process., 2013
Personalising speech-to-speech translation: Unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis.
Comput. Speech Lang., 2013
Building personalised synthetic voices for individuals with severe speech impairment.
Comput. Speech Lang., 2013
Unsupervised and lightly-supervised learning for rapid construction of TTS systems in multiple languages from 'found' data: evaluation and analysis.
Proceedings of the Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, 2013
Proceedings of the Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, 2013
Using neighbourhood density and selective SNR boosting to increase the intelligibility of synthetic speech in noise.
Proceedings of the Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, 2013
Using adaptation to improve speech transcription alignment in noisy and reverberant environments.
Proceedings of the Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, 2013
Proceedings of the Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, 2013
Proceedings of the Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, 2013
Proceedings of the Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, 2013
Mage - reactive articulatory feature control of HMM-based parametric speech synthesis.
Proceedings of the Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, 2013
Towards Personalised Synthesised Voices for Individuals with Vocal Disabilities: Voice Banking and Reconstruction.
Proceedings of the Fourth Workshop on Speech and Language Processing for Assistive Technologies, 2013
The voice bank corpus: Design, collection and data analysis of a large regional accent speech database.
Proceedings of the 2013 International Conference Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2013
Combining perceptually-motivated spectral shaping with loudness and duration modification for intelligibility enhancement of HMM-based synthetic speech in noise.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013
TUNDRA: a multilingual corpus of found data for TTS research created with light supervision.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013
Lightly supervised discriminative training of grapheme models for improved sentence-level alignment of speech and text data.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013
Improving intelligibility in noise of HMM-generated speech via noise-dependent and -independent methods.
Proceedings of the IEEE International Conference on Acoustics, 2013
Proceedings of the IEEE International Conference on Acoustics, 2013
2012
Evaluation of Speaker Verification Security and Detection of HMM-Based Synthetic Speech.
IEEE Trans. Speech Audio Process., 2012
Analysis of unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis using KLD-based transform mapping.
Speech Commun., 2012
Speech Commun., 2012
Synthesis and evaluation of conversational characteristics in HMM-based speech synthesis.
Speech Commun., 2012
Noise-robust whispered speech recognition using a non-audible-murmur microphone with VTS compensation.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012
Using HMM-based Speech Synthesis to Reconstruct the Voice of Individuals with Degenerative Speech Disorders.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012
Mel cepstral coefficient modification based on the Glimpse Proportion measure for improving the intelligibility of HMM-generated synthetic speech in noise.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012
Evaluating speech intelligibility enhancement for HMM-based synthetic speech in noise.
Proceedings of the ISCA Workshop on Statistical And Perceptual Audition, 2012
Towards an Unsupervised Speaking Style Voice Building Framework: Multi-Style Speaker Diarization.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012
Synthetic Speech Discrimination using Pitch Pattern Statistics Derived from Image Analysis.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012
Cepstral analysis based on the glimpse proportion measure for improving the intelligibility of HMM-based synthetic speech in noise.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012
2011
IEEE Trans. Speech Audio Process., 2011
The Romanian speech synthesis (RSS) corpus: Building a high quality HMM-based speech synthesis system using a high sampling rate.
Speech Commun., 2011
Unsupervised Continuous-Valued Word Features for Phrase-Break Prediction without a Part-of-Speech Tagger.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011
Can Objective Measures Predict the Intelligibility of Modified HMM-Based Synthetic Speech in Noise?
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011
Feature-Space Transform Tying in Unified Acoustic-Articulatory Modelling for Articulatory Control of HMM-Based Speech Synthesis.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011
Evaluation of objective measures for intelligibility prediction of HMM-based synthetic speech in noise.
Proceedings of the IEEE International Conference on Acoustics, 2011
Proceedings of the IEEE International Conference on Acoustics, 2011
An analysis of machine translation and speech synthesis in speech-to-speech translation system.
Proceedings of the IEEE International Conference on Acoustics, 2011
Proceedings of the IEEE International Conference on Acoustics, 2011
Proceedings of the IEEE International Conference on Acoustics, 2011
Proceedings of the 13th International ACM SIGACCESS Conference on Computers and Accessibility, 2011
2010
Thousands of Voices for HMM-Based Speech Synthesis - Analysis and Application of TTS Systems Built on Various ASR Corpora.
IEEE Trans. Speech Audio Process., 2010
IEEE Trans. Speech Audio Process., 2010
Modeling and interpolation of Austrian German and Viennese dialect in HMM-based speech synthesis.
Speech Commun., 2010
Analysis of statistical parametric and unit selection speech synthesis systems applied to emotional speech.
Speech Commun., 2010
IEEE J. Sel. Top. Signal Process., 2010
Speaker adaptation and the evaluation of speaker similarity in the EMIME speech-to-speech translation project.
Proceedings of the Seventh ISCA Tutorial and Research Workshop on Speech Synthesis, 2010
Proceedings of the Seventh ISCA Tutorial and Research Workshop on Speech Synthesis, 2010
Proceedings of the Seventh ISCA Tutorial and Research Workshop on Speech Synthesis, 2010
Proceedings of the Seventh ISCA Tutorial and Research Workshop on Speech Synthesis, 2010
Proceedings of the Seventh ISCA Tutorial and Research Workshop on Speech Synthesis, 2010
Proceedings of the Odyssey 2010: The Speaker and Language Recognition Workshop, Brno, Czech Republic, June 28, 2010
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010
Synthesis of fast speech with interpolation of adapted HSMMs and its evaluation by blind and sighted listeners.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010
HMM-based text-to-articulatory-movement prediction and analysis of critical articulators.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010
Proceedings of the IEEE International Conference on Acoustics, 2010
Proceedings of the IEEE International Conference on Acoustics, 2010
Revisiting the security of speaker verification systems against imposture using synthetic speech.
Proceedings of the IEEE International Conference on Acoustics, 2010
Proceedings of the Blizzard Challenge 2010, Kansai Science City, Japan, September 25, 2010, 2010
2009
IEEE Trans. Speech Audio Process., 2009
Analysis of Speaker Adaptation Algorithms for HMM-Based Speech Synthesis and a Constrained SMAPLR Adaptation Algorithm.
IEEE Trans. Speech Audio Process., 2009
IEEE Trans. Speech Audio Process., 2009
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009
Identification of contrast and its emphatic realization in HMM based speech synthesis.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009
Analysis of Unsupervised and Noise-Robust Speaker-Adaptive HMM-Based Speech Synthesis Systems toward a Unified ASR and TTS Framework.
Proceedings of the Blizzard Challenge 2009, Edinburgh, Scotland, UK, September 4, 2009, 2009
Glottal Source and Prosodic Prominence Modelling in HMM-based Speech Synthesis for the Blizzard Challenge 2009.
Proceedings of the Blizzard Challenge 2009, Edinburgh, Scotland, UK, September 4, 2009, 2009
2008
Proceedings of the First Workshop on Child, Computer and Interaction, 2008
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008
Articulatory control of HMM-based parametric speech synthesis driven by phonetic knowledge.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008
Performance evaluation of the speaker-independent HMM-based speech synthesis system "HTS 2007" for the Blizzard Challenge 2007.
Proceedings of the IEEE International Conference on Acoustics, 2008
The HTS-2008 System: Yet Another Evaluation of the Speaker-Adaptive HMM-based Speech Synthesis System in The 2008 Blizzard Challenge.
Proceedings of the Blizzard Challenge 2008, 2008
2007
Average-Voice-Based Speech Synthesis Using HSMM-Based Speaker Adaptation and Adaptive Training.
IEICE Trans. Inf. Syst., 2007
IEICE Trans. Inf. Syst., 2007
Proceedings of the Sixth ISCA Workshop on Speech Synthesis, 2007
Improved average-voice-based speech synthesis using gender-mixed modeling and a parameter generation algorithm considering GV.
Proceedings of the Sixth ISCA Workshop on Speech Synthesis, 2007
Utilization of an HMM-based feature generation module in 5 ms segment concatenative speech synthesis.
Proceedings of the Sixth ISCA Workshop on Speech Synthesis, 2007
Towards an improved modeling of the glottal source in statistical parametric speech synthesis.
Proceedings of the Sixth ISCA Workshop on Speech Synthesis, 2007
Proceedings of the International Conference on Computer Graphics and Interactive Techniques, 2007
Performance evaluation of HMM-based style classification with a small amount of training data.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007
Proceedings of the IEEE International Conference on Acoustics, 2007
Speaker-independent HMM-based speech synthesis system - HTS-2007 system for the Blizzard Challenge 2007.
Proceedings of the Evaluation of text-to-speech systems: Blizzard Challenge 2007, 2007
Proceedings of the Evaluation of text-to-speech systems: Blizzard Challenge 2007, 2007
2006
A Style Adaptation Technique for Speech Synthesis Using HSMM and Suprasegmental Features.
IEICE Trans. Inf. Syst., 2006
A technique for controlling voice quality of synthetic speech using multiple regression HSMM.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006
Acoustic model training based on linear transformation and MAP modification for HSMM-based speech synthesis.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006
Constrained structural maximum a posteriori linear regression for average-voice-based speech synthesis.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006
Developing a Test Bed of English Text-to-Speech System XIMERA for the Blizzard Challenge 2006.
Proceedings of the Blizzard Challenge 2006, Pittsburgh, PA, USA, September 16, 2006, 2006
2005
Acoustic Modeling of Speaking Styles and Emotional Expressions in HMM-Based Speech Synthesis.
IEICE Trans. Inf. Syst., 2005
Speech Synthesis with Various Emotional Expressions and Speaking Styles by Style Interpolation and Morphing.
IEICE Trans. Inf. Syst., 2005
IEICE Trans. Inf. Syst., 2005
Performance evaluation of style adaptation for hidden semi-Markov model based speech synthesis.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005
Model adaptation and adaptive training using ESAT algorithm for HMM-based speech synthesis.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005
Proceedings of the 4th International Conference on Cyberworlds (CW 2005), 2005
2004
Proceedings of the 8th International Conference on Spoken Language Processing, 2004
Speaking style adaptation using context clustering decision tree for HMM-based speech synthesis.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004
2003
IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2003
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003
A training method for average voice model based on shared decision tree context clustering and speaker adaptive training.
Proceedings of the 2003 IEEE International Conference on Acoustics, 2003
2002
A context clustering technique for average voice model in HMM-based speech synthesis.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002