Tomoki Toda

ORCID: 0000-0001-8146-1279

According to our database, Tomoki Toda authored at least 429 papers between 2000 and 2024.

Dual-Channel Target Speaker Extraction Based on Conditional Variational Autoencoder and Directional Information.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Pretraining and Adaptation Techniques for Electrolaryngeal Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Unequally Spaced Sound Field Interpolation for Rotation-Robust Beamforming.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Multi-Speaker Text-to-Speech Training With Speaker Anonymized Data.
IEEE Signal Process. Lett., 2024

Improved Architecture for High-resolution Piano Transcription to Efficiently Capture Acoustic Characteristics of Music Signals.
CoRR, 2024

Two-stage Framework for Robust Speech Emotion Recognition Using Target Speaker Extraction in Human Speech Noise Conditions.
CoRR, 2024

Improvements of Discriminative Feature Space Training for Anomalous Sound Detection in Unlabeled Conditions.
CoRR, 2024

The VoiceMOS Challenge 2024: Beyond Speech Quality Prediction.
CoRR, 2024

SVDD 2024: The Inaugural Singing Voice Deepfake Detection Challenge.
CoRR, 2024

Quantifying the effect of speech pathology on automatic and human speaker verification.
CoRR, 2024

2DP-2MRC: 2-Dimensional Pointer-based Machine Reading Comprehension Method for Multimodal Moment Retrieval.
CoRR, 2024

CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection.
CoRR, 2024

SVDD Challenge 2024: A Singing Voice Deepfake Detection Challenge Evaluation Plan.
CoRR, 2024

Learning Multidimensional Disentangled Representations of Instrumental Sounds for Musical Similarity Assessment.
CoRR, 2024

Automatic design optimization of preference-based subjective evaluation with online learning in crowdsourcing environment.
CoRR, 2024

Fast Neural Speech Waveform Generative Models With Fully-Connected Layer-Based Upsampling.
IEEE Access, 2024

An Investigation of Fundamental Frequency Pattern Prediction for Japanese Electrolaryngeal Speech Enhancement Based on Frame-Wise Phoneme Representations.
IEEE Access, 2024

Electrolaryngeal Speech Intelligibility Enhancement through Robust Linguistic Encoders.
Proceedings of the IEEE International Conference on Acoustics, 2024

ConvNeXt-TTS And ConvNeXt-VC: ConvNeXt-Based Fast End-To-End Sequence-To-Sequence Text-To-Speech And Voice Conversion.
Proceedings of the IEEE International Conference on Acoustics, 2024

FIRNet: Fundamental Frequency Controllable Fast Neural Vocoder With Trainable Finite Impulse Response Filter.
Proceedings of the IEEE International Conference on Acoustics, 2024

Audio Difference Learning for Audio Captioning.
Proceedings of the IEEE International Conference on Acoustics, 2024

MF-AED-AEC: Speech Emotion Recognition by Leveraging Multimodal Fusion, ASR Error Detection, and ASR Error Correction.
Proceedings of the IEEE International Conference on Acoustics, 2024

Unsupervised Training of Neural Network-Based Virtual Microphone Estimator.
Proceedings of the 32nd European Signal Processing Conference, 2024

Discriminative Neighborhood Smoothing for Generative Anomalous Sound Detection.
Proceedings of the 32nd European Signal Processing Conference, 2024

High-Fidelity and Pitch-Controllable Neural Vocoder Based on Unified Source-Filter Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Noisy-to-Noisy Voice Conversion Under Variations of Noisy Condition.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Harmonic-Net: Fundamental Frequency and Speech Rate Controllable Fast Neural Vocoder.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

On the Effectiveness of ASR Representations in Real-world Noisy Speech Emotion Recognition.
CoRR, 2023

AAS-VC: On the Generalization Ability of Automatic Alignment Search based Non-autoregressive Sequence-to-sequence Voice Conversion.
CoRR, 2023

The Singing Voice Conversion Challenge 2023.
CoRR, 2023

Directional Target Speaker Extraction under Noisy Underdetermined Conditions through Conditional Variational Autoencoder with Global Style Tokens.
Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2023

Differentiable Representation of Warping Based on Lie Group Theory.
Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2023

Semi-supervised Multimodal Emotion Recognition with Consensus Decision-making and Label Correction.
Proceedings of the 1st International Workshop on Multimodal and Responsible Affective Computing, 2023

Sequence-to-Sequence Network Training Methods for Automatic Guitar Transcription With Tokenized Outputs.
Proceedings of the 24th International Society for Music Information Retrieval Conference, 2023

Analysis of Mean Opinion Scores in Subjective Evaluation of Synthetic Speech Based on Tail Probabilities.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Emotion Awareness in Multi-utterance Turn for Improving Emotion Prediction in Multi-Speaker Conversation.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

E2E-S2S-VC: End-To-End Sequence-To-Sequence Voice Conversion.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Preference-based training framework for automatic speech quality assessment using deep neural network.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Reverberation-Controllable Voice Conversion Using Reverberation Time Estimator.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Source-Filter HiFi-GAN: Fast and Pitch Controllable High-Fidelity Neural Vocoder.
Proceedings of the IEEE International Conference on Acoustics, 2023

Text-To-Speech Synthesis Based on Latent Variable Conversion Using Diffusion Probabilistic Model and Variational Autoencoder.
Proceedings of the IEEE International Conference on Acoustics, 2023

NNSVS: A Neural Network-Based Singing Voice Synthesis Toolkit.
Proceedings of the IEEE International Conference on Acoustics, 2023

Intermediate Fine-Tuning Using Imperfect Synthetic Speech for Improving Electrolaryngeal Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023

Representation of Vocal Tract Length Transformation Based on Group Theory.
Proceedings of the IEEE International Conference on Acoustics, 2023

Low-Latency Electrolaryngeal Speech Enhancement Based on FastSpeech2-Based Voice Conversion and Self-Supervised Speech Representation.
Proceedings of the IEEE International Conference on Acoustics, 2023

Analysis Of Noisy-Target Training For DNN-Based Speech Enhancement.
Proceedings of the IEEE International Conference on Acoustics, 2023

Sound Field Interpolation with Unsupervised Calibration for Freely Spaced Circular Microphone Array in Rotation-Robust Beamforming.
Proceedings of the 31st European Signal Processing Conference, 2023

A Comparative Study of Voice Conversion Models With Large-Scale Speech and Singing Data: The T13 Systems for the Singing Voice Conversion Challenge 2023.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

WaveNeXt: ConvNeXt-Based Fast Neural Vocoder Without ISTFT layer.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

The Singing Voice Conversion Challenge 2023.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

ED-CEC: Improving Rare word Recognition Using ASR Postprocessing Based on Error Detection and Context-Aware Error Correction.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Improving Severity Preservation of Healthy-to-Pathological Voice Conversion With Global Style Tokens.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

The VoiceMOS Challenge 2023: Zero-Shot Subjective Speech Quality Prediction for Multiple Domains.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

An Analysis of Personalized Speech Recognition System Development for the Deaf and Hard-of-Hearing.
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

Evaluating Methods for Ground-Truth-Free Foreign Accent Conversion.
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

Neural speech-rate conversion with multispeaker WaveNet vocoder.
Speech Commun., 2022

Investigation of Japanese PnG BERT Language Model in Text-to-Speech Synthesis for Pitch Accent Language.
IEEE J. Sel. Top. Signal Process., 2022

A Comparative Study of Self-Supervised Speech Representation Based Voice Conversion.
IEEE J. Sel. Top. Signal Process., 2022

Music Similarity Calculation of Individual Instrumental Sounds Using Metric Learning.
CoRR, 2022

A Cyclical Approach to Synthetic and Natural Speech Mismatch Refinement of Neural Post-filter for Low-cost Text-to-speech System.
CoRR, 2022

Two-Stage Training Method for Japanese Electrolaryngeal Speech Enhancement Based on Sequence-to-Sequence Voice Conversion.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Spoken-Text-Style Transfer with Conditional Variational Autoencoder and Content Word Storage.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Unified Source-Filter GAN with Harmonic-plus-Noise Source Excitation Generation.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Investigating Self-supervised Pretraining Frameworks for Pathological Speech Recognition.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

The VoiceMOS Challenge 2022.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

An Evaluation of Three-Stage Voice Conversion Framework for Noisy and Reverberant Conditions.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Direct Noisy Speech Modeling for Noisy-To-Noisy Voice Conversion.
Proceedings of the IEEE International Conference on Acoustics, 2022

S3PRL-VC: Open-Source Voice Conversion Framework with Self-Supervised Speech Representations.
Proceedings of the IEEE International Conference on Acoustics, 2022

Towards Identity Preserving Normal to Dysarthric Voice Conversion.
Proceedings of the IEEE International Conference on Acoustics, 2022

LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech.
Proceedings of the IEEE International Conference on Acoustics, 2022

An Investigation of Streaming Non-Autoregressive sequence-to-sequence Voice Conversion.
Proceedings of the IEEE International Conference on Acoustics, 2022

Generalization Ability of MOS Prediction Networks.
Proceedings of the IEEE International Conference on Acoustics, 2022

Modified Sound Field Interpolation Method for Rotation-robust Beamforming with Unequally Spaced Circular Microphone Array.
Proceedings of the 30th European Signal Processing Conference, 2022

Improvement of Serial Approach to Anomalous Sound Detection by Incorporating Two Binary Cross-Entropies for Outlier Exposure.
Proceedings of the 30th European Signal Processing Conference, 2022

Note-level Automatic Guitar Transcription Using Attention Mechanism.
Proceedings of the 30th European Signal Processing Conference, 2022

Quasi-Periodic WaveNet: An Autoregressive Raw Waveform Generative Model With Pitch-Dependent Dilated Convolution Neural Network.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Quasi-Periodic Parallel WaveGAN: A Non-Autoregressive Raw Waveform Generative Model With Pitch-Dependent Dilated Convolution Neural Network.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Many-to-Many Voice Transformer Network.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Pretraining Techniques for Sequence-to-Sequence Voice Conversion.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

The AS-NU System for the M2VoC Challenge.
CoRR, 2021

Full-Band LPCNet: A Real-Time Neural Vocoder for 48 kHz Audio With a CPU.
IEEE Access, 2021

Low-latency real-time non-parallel voice conversion based on cyclic variational autoencoder and multiband WaveRNN with data-driven linear prediction.
Proceedings of the 11th ISCA Speech Synthesis Workshop, 2021

Singing Fundamental Frequency Contour Generation Using Generalized Command-Response Model and Score-Conditional Variational Autoencoder.
Proceedings of the 2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP), 2021

Unified Source-Filter GAN: Unified Source-Filter Network Based On Factorization of Quasi-Periodic Parallel WaveGAN.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Relational Data Selection for Data Augmentation of Speaker-Dependent Multi-Band MelGAN Vocoder.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

High-Fidelity and Low-Latency Universal Neural Vocoder Based on Multiband WaveRNN with Data-Driven Linear Prediction for Discrete Waveform Modeling.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

A Preliminary Study of a Two-Stage Paradigm for Preserving Speaker Identity in Dysarthric Voice Conversion.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Noise Level Limited Sub-Modeling for Diffusion Probabilistic Vocoders.
Proceedings of the IEEE International Conference on Acoustics, 2021

High-Intelligibility Speech Synthesis for Dysarthric Speakers with LPCNet-Based TTS and CycleVAE-Based VC.
Proceedings of the IEEE International Conference on Acoustics, 2021

Crank: An Open-Source Software for Nonparallel Voice Conversion Based on Vector-Quantized Variational Autoencoder.
Proceedings of the IEEE International Conference on Acoustics, 2021

Speech Recognition by Simply Fine-Tuning BERT.
Proceedings of the IEEE International Conference on Acoustics, 2021

Non-Autoregressive Sequence-To-Sequence Voice Conversion.
Proceedings of the IEEE International Conference on Acoustics, 2021

Speech Emotion Recognition Based on Listener Adaptive Models.
Proceedings of the IEEE International Conference on Acoustics, 2021

Anomalous Sound Detection Using a Binary Classification Model and Class Centroids.
Proceedings of the 29th European Signal Processing Conference, 2021

An Ensemble Approach to Anomalous Sound Detection Based on Conformer-Based Autoencoder and Binary Classifier Incorporated with Metric Learning.
Proceedings of the 6th Workshop on Detection and Classification of Acoustic Scenes and Events 2021 (DCASE 2021), 2021

Mandarin Electrolaryngeal Speech Voice Conversion with Sequence-to-Sequence Modeling.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

Multi-Stream HiFi-GAN with Data-Driven Waveform Decomposition.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

On Prosody Modeling for ASR+TTS Based Voice Conversion.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

HASA-Net: A Non-Intrusive Hearing-Aid Speech Assessment Network.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

Noisy-to-Noisy Voice Conversion Framework with Denoising Model.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

Mandarin Electro-Laryngeal Speech Enhancement based on Statistical Voice Conversion and Manual Tone Control.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

Investigation of Text-to-Speech-based Synthetic Parallel Data for Sequence-to-Sequence Non-Parallel Voice Conversion.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

Time Alignment using Lip Images for Frame-based Electrolaryngeal Voice Conversion.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

Customer Satisfaction Estimation in Contact Center Calls Based on a Hierarchical Multi-Task Model.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Any-to-One Sequence-to-Sequence Voice Conversion using Self-Supervised Discrete Speech Representations.
CoRR, 2020

Non-Parallel Voice Conversion System With WaveNet Vocoder and Collapsed Speech Suppression.
IEEE Access, 2020

A Cyclical Post-Filtering Approach to Mismatch Refinement of Neural Vocoder for Text-to-Speech Systems.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Quasi-Periodic Parallel WaveGAN Vocoder: A Non-Autoregressive Pitch-Dependent Dilated Convolution Model for Parametric Speech Generation.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Cyclic Spectral Modeling for Unsupervised Unit Discovery into Voice Conversion with Excitation and Waveform Modeling.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Semi-Supervised Self-Produced Speech Enhancement and Suppression Based on Joint Source Modeling of Air- and Body-Conducted Signals Using Variational Autoencoder.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Voice Transformer Network: Sequence-to-Sequence Voice Conversion Using Transformer with Text-to-Speech Pretraining.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Intelligibility Enhancement Based on Speech Waveform Modification Using Hearing Impairment.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Efficient Shallow WaveNet Vocoder Using Multiple Samples Output Based on Laplacian Distribution and Linear Prediction.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Transformer-Based Text-to-Speech with Weighted Forced Attention.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Weakly-Supervised Sound Event Detection with Self-Attention.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Semi-Supervised Enhancement and Suppression of Self-Produced Speech Using Correspondence between Air- and Body-Conducted Signals.
Proceedings of the 28th European Signal Processing Conference, 2020

Implementation of low-latency electrolaryngeal speech enhancement based on multi-task CLDNN.
Proceedings of the 28th European Signal Processing Conference, 2020

Conformer-Based Sound Event Detection with Semi-Supervised Learning and Data Augmentation.
Proceedings of the 5th Workshop on Detection and Classification of Acoustic Scenes and Events 2020 (DCASE 2020), 2020

Baseline System of Voice Conversion Challenge 2020 with Cyclic Variational Autoencoder and Parallel WaveGAN.
Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

The NU Voice Conversion System for the Voice Conversion Challenge 2020: On the Effectiveness of Sequence-to-sequence Models and Autoregressive Neural Vocoders.
Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

The Sequence-to-Sequence Baseline for the Voice Conversion Challenge 2020: Cascading ASR and TTS.
Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

Predictions of Subjective Ratings and Spoofing Assessments of Voice Conversion Challenge 2020 Submissions.
Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

Voice Conversion Challenge 2020 -- Intra-lingual semi-parallel and cross-lingual voice conversion --.
Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

Cross-Lingual Voice Conversion using a Cyclic Variational Auto-encoder and a WaveNet Vocoder.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

Phoneme Embeddings on Predicting Fundamental Frequency Pattern for Electrolaryngeal Speech.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

Speech-to-Singing Voice Conversion: The Challenges and Strategies for Improving Vocal Conversion Processes.
IEEE Signal Process. Mag., 2019

The ASVspoof 2019 database.
CoRR, 2019

Voice Conversion With CycleRNN-Based Spectral Mapping and Finely Tuned WaveNet Vocoder.
IEEE Access, 2019

Underdetermined Source Separation Based on Generalized Multichannel Variational Autoencoder.
IEEE Access, 2019

Statistical Voice Conversion with Quasi-periodic WaveNet Vocoder.
Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

Generalization of Spectrum Differential based Direct Waveform Modification for Voice Conversion.
Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

An Investigation of Features for Fundamental Frequency Pattern Prediction in Electrolaryngeal Speech Enhancement.
Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

Improving Singing Aid System for Laryngectomees With Statistical Voice Conversion and VAE-SPACE.
Proceedings of the 20th International Society for Music Information Retrieval Conference, 2019

Quasi-Periodic WaveNet Vocoder: A Pitch Dependent Dilated Convolution Model for Parametric Speech Generation.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Non-Parallel Voice Conversion with Cyclic Variational Autoencoder.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Real-Time Neural Text-to-Speech with Sequence-to-Sequence Acoustic Model and WaveGlow or Single Gaussian WaveRNN Vocoders.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Robustness of Statistical Voice Conversion Based on Direct Waveform Modification Against Background Sounds.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Investigation of F0 Conditioning and Fully Convolutional Networks in Variational Autoencoder Based Voice Conversion.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Pre-Trained Text Embeddings for Enhanced Text-to-Speech Synthesis.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Voice Conversion with Cyclic Recurrent Neural Network and Fine-tuned WaveNet Vocoder.
Proceedings of the IEEE International Conference on Acoustics, 2019

Investigations of Real-time Gaussian FFTNet and Parallel WaveNet Neural Vocoders with Simple Acoustic Features.
Proceedings of the IEEE International Conference on Acoustics, 2019

Scene-dependent Anomalous Acoustic-event Detection Based on Conditional WaveNet and I-vector.
Proceedings of the IEEE International Conference on Acoustics, 2019

Generalized Multichannel Variational Autoencoder for Underdetermined Source Separation.
Proceedings of the 27th European Signal Processing Conference, 2019

Refined WaveNet Vocoder for Variational Autoencoder Based Voice Conversion.
Proceedings of the 27th European Signal Processing Conference, 2019

Development of a Real-time Bionic Voice Generation System based on Statistical Excitation Prediction.
Proceedings of the 21st International ACM SIGACCESS Conference on Computers and Accessibility, 2019

Investigation of Shallow WaveNet Vocoder with Laplacian Distribution Output.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Tacotron-Based Acoustic Model Using Phoneme Alignment for Practical Neural Text-to-Speech Systems.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Intra-gender statistical singing voice conversion with direct waveform modification using log-spectral differential.
Speech Commun., 2018

An end-to-end model for cross-lingual transformation of paralinguistic information.
Mach. Transl., 2018

Stereophonic Music Separation Based on Non-Negative Tensor Factorization with Cepstral Distance Regularization.
IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2018

Daily Activity Recognition with Large-Scaled Real-Life Recording Datasets Based on Deep Neural Network Using Multi-Modal Signals.
IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2018

Frequency domain variants of velvet noise and their application to speech processing and synthesis: with appendices.
CoRR, 2018

An Evaluation of Deep Spectral Mappings and WaveNet Vocoder for Voice Conversion.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Improving FFTNet Vocoder with Noise Shaping and Subband Approaches.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Back-Translation-Style Data Augmentation for end-to-end ASR.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

The NU Non-Parallel Voice Conversion System for the Voice Conversion Challenge 2018.
Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018

NU Voice Conversion System for the Voice Conversion Challenge 2018.
Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018

The Voice Conversion Challenge 2018: Promoting Development of Parallel and Nonparallel Methods.
Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018

sprocket: Open-Source Voice Conversion Software.
Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018

A Spoofing Benchmark for the 2018 Voice Conversion Challenge: Leveraging from Spoofing Countermeasures for Speech Artifact Assessment.
Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018

Collapsed Speech Segment Detection and Suppression for WaveNet Vocoder.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Audio-visual Voice Conversion Using Deep Canonical Correlation Analysis for Deep Bottleneck Features.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Frequency Domain Variants of Velvet Noise and Their Application to Speech Processing and Synthesis.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Multi-Head Decoder for End-to-End Speech Recognition.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Designing a Pneumatic Bionic Voice Prosthesis - A Statistical Approach for Source Excitation Generation.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

An Investigation of Noise Shaping with Perceptual Weighting for WaveNet-Based Speech Generation.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

An Investigation of Subband WaveNet Vocoder Covering Entire Audible Frequency Range with Limited Acoustic Features.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Connectionist Temporal Classification-based Sound Event Encoder for Converting Sound Events into Onomatopoeic Representations.
Proceedings of the 26th European Signal Processing Conference, 2018

Electrolaryngeal Speech Enhancement with Statistical Voice Conversion based on CLDNN.
Proceedings of the 26th European Signal Processing Conference, 2018

Anomalous Sound Event Detection Based on WaveNet.
Proceedings of the 26th European Signal Processing Conference, 2018

Development of "KamiRepo" system with automatic student identification to handle handwritten assignments on LMS.
Proceedings of the 2018 IEEE Global Engineering Education Conference, 2018

Self-Produced Speech Enhancement and Suppression Method using Air- and Body-Conductive Microphones.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

Articulatory Controllable Speech Modification Based on Statistical Inversion and Production Mappings.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

Duration-Controlled LSTM for Polyphonic Sound Event Detection.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

Preserving Word-Level Emphasis in Speech-to-Speech Translation.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

A Vibration Control Method of an Electrolarynx Based on Statistical F0 Pattern Prediction.
IEICE Trans. Inf. Syst., 2017

A modulation property of time-frequency derivatives of filtered phase and its application to aperiodicity and fo estimation.
CoRR, 2017

Missing component restoration for masked speech signals based on time-domain spectrogram factorization.
Proceedings of the 27th IEEE International Workshop on Machine Learning for Signal Processing, 2017

Physically Constrained Statistical F0 Prediction for Electrolaryngeal Speech Enhancement.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Speaker-Dependent WaveNet Vocoder.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Speech Enhancement Using Non-Negative Spectrogram Models with Mel-Generalized Cepstral Regularization.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Statistical Voice Conversion with WaveNet-Based Waveform Generation.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

A New Cosine Series Antialiasing Function and its Application to Aliasing-Free Glottal Source Models for Speech and Singing Synthesis.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

A Modulation Property of Time-Frequency Derivatives of Filtered Phase and its Application to Aperiodicity and fo Estimation.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

A noise suppression method for body-conducted soft speech based on non-negative tensor factorization of air- and body-conducted signals.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

BLSTM-HMM hybrid system combined with sound activity detection network for polyphonic Sound Event Detection.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Stereophonic music separation based on non-negative tensor factorization with cepstrum regularization.
Proceedings of the 25th European Signal Processing Conference, 2017

Subband WaveNet with overlapped single-sideband filterbanks.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

An investigation of multi-speaker training for WaveNet vocoder.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Deep acoustic-to-articulatory inversion mapping with latent trajectory modeling.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

An investigation of recurrent neural network for daily activity recognition using multi-modal signals.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

Electrolaryngeal speech modification towards singing aid system for laryngectomees.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

An investigation of how to design control parameters for statistical voice timbre control.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

Accurate estimation of f0 and aperiodicity based on periodicity detector residuals and deviations of phase derivatives.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

Teaching Social Communication Skills Through Human-Agent Interaction.
ACM Trans. Interact. Intell. Syst., 2016

Anti-Spoofing for Text-Independent Speaker Verification: An Initial Database, Comparison of Countermeasures, and Human Performance.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

Postfilters to Modify the Modulation Spectrum for Statistical Parametric Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

Learning cooperative persuasive dialogue policies using framing.
Speech Commun., 2016

A Statistical Sample-Based Approach to GMM-Based Voice Conversion Using Tied-Covariance Acoustic Models.
IEICE Trans. Inf. Syst., 2016

Non-Native Text-to-Speech Preserving Speaker Individuality Based on Partial Correction of Prosodic and Phonetic Characteristics.
IEICE Trans. Inf. Syst., 2016

Enhancing Event-Related Potentials Based on Maximum a Posteriori Estimation with a Spatial Correlation Prior.
IEICE Trans. Inf. Syst., 2016

Improvements of Voice Timbre Control Based on Perceived Age in Singing Voice Conversion.
IEICE Trans. Inf. Syst., 2016

Nonaudible murmur enhancement based on statistical voice conversion and noise suppression with external noise monitoring.
Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016

F0 transformation techniques for statistical voice conversion with direct waveform modification with spectral differential.
Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

Active Learning for Example-Based Dialog Systems.
Proceedings of the Dialogues with Social Robots, 2016

The Voice Conversion Challenge 2016.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Acoustic-to-Articulatory Inversion Mapping Based on Latent Trajectory Gaussian Mixture Model.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Model Integration for HMM- and DNN-Based Speech Synthesis Using Product-of-Experts Framework.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

The NU-NAIST Voice Conversion System for the Voice Conversion Challenge 2016.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

A Hybrid System for Continuous Word-Level Emphasis Modeling Based on HMM State Clustering and Adaptive Training.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

An estimation method of voice timbre evaluation values using feature extraction with Gaussian mixture model based on reference singer.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Statistical F0 prediction for electrolaryngeal speech enhancement considering generative process of F0 contours within product of experts framework.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Noise suppression method for body-conducted soft speech enhancement based on external noise monitoring.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Implementation of F0 transformation for statistical singing voice conversion based on direct waveform modification.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Real-time vibration control of an electrolarynx based on statistical F0 contour prediction.
Proceedings of the 24th European Signal Processing Conference, 2016

Removing noise from event-related potentials using a probabilistic generative model with grouped covariance matrices.
Proceedings of the 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2016

Bidirectional LSTM-HMM Hybrid System for Polyphonic Sound Event Detection.
Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events, 2016

Semantic Parsing of Ambiguous Input through Paraphrasing and Verification.
Trans. Assoc. Comput. Linguistics, 2015

NOCOA+: Multimodal Computer-Based Training for Social and Communication Skills.
IEICE Trans. Inf. Syst., 2015

An Investigation of Machine Translation Evaluation Metrics in Cross-lingual Question Answering.
Proceedings of the Tenth Workshop on Statistical Machine Translation, 2015

Construction and analysis of social-affective interaction corpus in English and Indonesian.
Proceedings of the 2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2015

Ckylark: A More Robust PCFG-LA Parser.
Proceedings of the NAACL HLT 2015, The 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Denver, Colorado, USA, May 31, 2015

Learning to Generate Pseudo-Code from Source Code Using Statistical Machine Translation (T).
Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering, 2015

Pseudogen: A Tool to Automatically Generate Pseudo-Code from Source Code.
Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering, 2015

Improving translation of emphasis with pause prediction in speech-to-speech translation systems.
Proceedings of the 12th International Workshop on Spoken Language Translation: Papers, 2015

Automated Social Skills Trainer.
Proceedings of the 20th International Conference on Intelligent User Interfaces, 2015

Articulatory controllable speech modification based on Gaussian mixture models with direct waveform modification using spectrum differential.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Modulation spectrum-constrained trajectory training algorithm for HMM-based speech synthesis.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Non-audible murmur enhancement based on statistical conversion using air- and body-conductive microphones in noisy environments.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Non-native speech synthesis preserving speaker individuality based on partial correction of prosodic and phonetic characteristics.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

A latent variable model for joint pause prediction and dependency parsing.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Speed or accuracy? a study in evaluation of simultaneous speech translation.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Statistical singing voice conversion based on direct waveform modification with global variance.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Preserving word-level emphasis in speech-to-speech translation using linear regression HSMMs.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

SAS: A speaker verification spoofing database containing diverse attacks.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Combination of two-dimensional cochleogram and spectrogram features for deep learning-based ASR.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Modulation spectrum-constrained trajectory training algorithm for GMM-based Voice Conversion.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Parameter generation algorithm considering Modulation Spectrum for HMM-based speech synthesis.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

EEG signal enhancement using multi-channel wiener filter with a spatial correlation prior.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

An evaluation of EEG ocular artifact removal with a multi-channel wiener filter based on probabilistic generative model.
Proceedings of the 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2015

The NAIST Text-to-Speech System for the Blizzard Challenge 2015.
Proceedings of the Blizzard Challenge 2015, 2015

An Enhanced Electrolarynx with Automatic Fundamental Frequency Control based on Statistical Prediction.
Proceedings of the 17th International ACM SIGACCESS Conference on Computers & Accessibility, 2015

Incremental sentence compression using LSTM recurrent networks.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

Adaptive selection from multiple response candidates in example-based dialogue.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

A study of social-affective communication: Automatic prediction of emotion triggers and responses in television talk shows.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

The NAIST ASR system for the 2015 Multi-Genre Broadcast challenge: On combination of deep learning systems using a rank-score function.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

Aliasing-free implementation of discrete-time glottal source models and their applications to speech synthesis and F0 extractor evaluation.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015

Syntax-based Simultaneous Translation through Prediction of Unseen Syntactic Constituents.
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, 2015

Improving Pivot Translation by Remembering the Pivot.
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, 2015

An Analysis Towards Dialogue-Based Deception Detection.
Proceedings of the Natural Language Dialog Systems and Intelligent Assistants, 2015

Unknown Word Detection Based on Event-Related Brain Desynchronization Responses.
Proceedings of the Natural Language Dialog Systems and Intelligent Assistants, 2015

Linguistic Individuality Transformation for Spoken Language.
Proceedings of the Natural Language Dialog Systems and Intelligent Assistants, 2015

A Study on Natural Expressive Speech: Automatic Memorable Spoken Quote Detection.
Proceedings of the Natural Language Dialog Systems and Intelligent Assistants, 2015

Evaluation of a Fully Automatic Cooperative Persuasive Dialogue System.
Proceedings of the Natural Language Dialog Systems and Intelligent Assistants, 2015

Alaryngeal Speech Enhancement Based on One-to-Many Eigenvoice Conversion.
IEEE ACM Trans. Audio Speech Lang. Process., 2014

Parameter Generation Methods With Rich Context Models for High-Quality and Flexible Text-To-Speech Synthesis.
IEEE J. Sel. Top. Signal Process., 2014

A Hybrid Approach to Electrolaryngeal Speech Enhancement Based on Noise Reduction and Statistical Excitation Generation.
IEICE Trans. Inf. Syst., 2014

Utilizing Human-to-Human Conversation Examples for a Multi Domain Chat-Oriented Dialog System.
IEICE Trans. Inf. Syst., 2014

Structured Adaptive Regularization of Weight Vectors for a Robust Grapheme-to-Phoneme Conversion Model.
IEICE Trans. Inf. Syst., 2014

Voice Timbre Control Based on Perceived Age in Singing Voice Conversion.
IEICE Trans. Inf. Syst., 2014

Rule-based Syntactic Preprocessing for Syntax-based Machine Translation.
Proceedings of SSST@EMNLP 2014, 2014

Improving the robustness of example-based dialog retrieval using recursive neural network paraphrase identification.
Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

Conversation dialog corpora from television and movie scripts.
Proceedings of the 2014 17th Oriental Chapter of the International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques (COCOSDA), 2014

Building a free, general-domain paraphrase database for Japanese.
Proceedings of the 2014 17th Oriental Chapter of the International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques (COCOSDA), 2014

Memorable spoken quote corpora of TED public speaking.
Proceedings of the 2014 17th Oriental Chapter of the International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques (COCOSDA), 2014

Collection and analysis of a Japanese-English emphasized speech corpora.
Proceedings of the 2014 17th Oriental Chapter of the International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques (COCOSDA), 2014

Collection of a Simultaneous Translation Corpus for Comparative Analysis.
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

Towards Multilingual Conversations in the Medical Domain: Development of Multilingual Medical Data and A Network-based ASR System.
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

Emotion and Its Triggers in Human Spoken Dialogue: Recognition and Analysis.
Proceedings of the Situated Dialog in Speech-Based Human-Computer Interaction, 2014

Construction and Analysis of a Persuasive Dialogue Corpus.
Proceedings of the Situated Dialog in Speech-Based Human-Computer Interaction, 2014

Articulatory controllable speech modification based on statistical feature mapping with Gaussian mixture models.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Direct F0 control of an electrolarynx based on statistical excitation feature prediction and its evaluation through simulation.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Data-driven generation of text balloons based on linguistic and acoustic features of a comics-anime corpus.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Structured soft margin confidence weighted learning for grapheme-to-phoneme conversion.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Statistical singing voice conversion with direct waveform modification based on the spectrum differential.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Excitation source analysis for high-quality speech manipulation systems based on an interference-free representation of group delay with minimum phase response compensation.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

A hearing impairment simulation method using audiogram-based approximation of auditory characteristics.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

An evaluation of excitation feature prediction in a hybrid approach to electrolaryngeal speech enhancement.
Proceedings of the IEEE International Conference on Acoustics, 2014

A postfilter to modify the modulation spectrum in HMM-based speech synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2014

Narrow Adaptive Regularization of weights for grapheme-to-phoneme conversion.
Proceedings of the IEEE International Conference on Acoustics, 2014

Regression approaches to perceptual age control in singing voice conversion.
Proceedings of the IEEE International Conference on Acoustics, 2014

Augmented speech production based on real-time statistical voice conversion.
Proceedings of the 2014 IEEE Global Conference on Signal and Information Processing, 2014

Modified post-filter to recover modulation spectrum for HMM-based speech synthesis.
Proceedings of the 2014 IEEE Global Conference on Signal and Information Processing, 2014

Acquiring a Dictionary of Emotion-Provoking Events.
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, 2014

Reinforcement Learning of Cooperative Persuasive Dialogue Policies using Framing.
Proceedings of the COLING 2014, 2014

Discriminative Language Models as a Tool for Machine Translation Error Analysis.
Proceedings of the COLING 2014, 2014

Unnecessary utterance detection for avoiding digressions in discussion.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

An evaluation of target speech for a nonaudible murmur enhancement system in noisy environments.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

An inter-speaker evaluation through simulation of electrolarynx control based on statistical F0 prediction.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

Modulation spectrum-based post-filter for GMM-based Voice Conversion.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

An event-related brain potential study on the impact of speech recognition errors.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

Recursive neural network paraphrase identification for example-based dialog retrieval.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

The use of semantic and acoustic features for open-domain TED talk summarization.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

Gender-dependent spectrum differential models for perceived age control based on direct waveform modification in singing voice conversion.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

Excitation source design for high-quality speech manipulation systems based on a temporally static group delay representation of periodic signals.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

Optimizing Segmentation Strategies for Simultaneous Speech Translation.
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 2014

Linguistic and Acoustic Features for Automatic Identification of Autism Spectrum Disorders in Children's Narrative.
Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, 2014

Speech Synthesis Based on Hidden Markov Models.
Proc. IEEE, 2013

Investigation of intra-speaker spectral parameter variation and its prediction towards improvement of spectral conversion metric.
Proceedings of the Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, 2013

Constructing a speech translation system using simultaneous interpretation data.
Proceedings of the 10th International Workshop on Spoken Language Translation: Papers, 2013

The NAIST English speech recognition system for IWSLT 2013.
Proceedings of the 10th International Workshop on Spoken Language Translation: Evaluation Campaign@IWSLT 2013, 2013

A hybrid approach to electrolaryngeal speech enhancement based on spectral subtraction and statistical voice conversion.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Improvements to HMM-based speech synthesis based on parameter generation with rich context models.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

An empirical comparison of joint optimization techniques for speech translation.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

A digital signal processor implementation of silent/electrolaryngeal speech enhancement based on real-time statistical voice conversion.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Grapheme-to-phoneme conversion based on adaptive regularization of weight vectors.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

An investigation of acoustic features for singing voice conversion based on perceptual age.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Beyond bandlimited sampling of speech spectral envelope imposed by the harmonic structure of voiced sounds.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Generalizing continuous-space translation of paralinguistic information.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Simple, lexicalized choice of translation timing for simultaneous speech translation.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Evaluation of a singing voice conversion method based on many-to-many eigenvoice conversion.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Modality and contextual differences in computer based non-verbal communication training.
Proceedings of the IEEE 4th International Conference on Cognitive Infocommunications, 2013

NAIST at the CLEF 2013 QA4MRE Pilot Task.
Proceedings of the Working Notes for CLEF 2013 Conference, 2013

Inter-Sentence Features and Thresholded Minimum Error Rate Training: NAIST at CLEF 2013 QA4MRE.
Proceedings of the Working Notes for CLEF 2013 Conference, 2013

Dialogue management for leading the conversation in persuasive dialogue systems.
Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

Towards High-Reliability Speech Translation in the Medical Domain.
Proceedings of the First Workshop on Natural Language Processing for Medical and Healthcare Fields@IJCNLP 2013, 2013

Statistical Voice Conversion Techniques for Body-Conducted Unvoiced Speech Enhancement.
IEEE Trans. Speech Audio Process., 2012

Speaking-aid systems using GMM-based voice conversion for electrolaryngeal speech.
Speech Commun., 2012

Learning Novel Objects for Extended Mobile Manipulation.
J. Intell. Robotic Syst., 2012

The 2012 KIT and KIT-NAIST English ASR systems for the IWSLT evaluation.
Proceedings of the 2012 International Workshop on Spoken Language Translation, 2012

The NAIST machine translation system for IWSLT2012.
Proceedings of the 2012 International Workshop on Spoken Language Translation, 2012

A method for translation of paralinguistic information.
Proceedings of the 2012 International Workshop on Spoken Language Translation, 2012

The KIT-NAIST (contrastive) English ASR system for IWSLT 2012.
Proceedings of the 2012 International Workshop on Spoken Language Translation, 2012

Developing Non-goal Dialog System Based on Examples of Drama Television.
Proceedings of the Natural Interaction with Robots, 2012

Blind speech extraction for Non-Audible Murmur speech with speaker's movement noise.
Proceedings of the IEEE International Symposium on Signal Processing and Information Technology, 2012

Implementation of Computationally Efficient Real-Time Voice Conversion.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

An Evaluation of Parameter Generation Methods with Rich Context Models in HMM-Based Speech Synthesis.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Statistical approach to voice quality control in esophageal speech enhancement.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Non-verbal cognitive skills and autistic conditions: An analysis and training tool.
Proceedings of the IEEE 3rd International Conference on Cognitive Infocommunications, 2012

Singing voice conversion method based on many-to-many eigenvoice conversion and training data generation using a singing-to-singing synthesis system.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2012

Speaker-Adaptive Speech Synthesis Based on Eigenvoice Conversion and Language-Dependent Prosodic Conversion in Speech-to-Speech Translation.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

An evaluation of alaryngeal speech enhancement methods based on voice conversion techniques.
Proceedings of the IEEE International Conference on Acoustics, 2011

Acoustic model training for non-audible murmur recognition using transformed normal speech data.
Proceedings of the IEEE International Conference on Acoustics, 2011

Blind noise suppression for Non-Audible Murmur recognition with stereo signal processing.
Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

Introduction to the Special Section on Voice Transformation.
IEEE Trans. Speech Audio Process., 2010

Improvement to a NAM-captured whisper-to-speech system.
Speech Commun., 2010

Silent-speech enhancement using body-conducted vocal-tract resonance signals.
Speech Commun., 2010

Improvements of the One-to-Many Eigenvoice Conversion System.
IEICE Trans. Inf. Syst., 2010

Adaptive Training for Voice Conversion Based on Eigenvoices.
IEICE Trans. Inf. Syst., 2010

Evaluation of Extremely Small Sound Source Signals Used in Speaking-Aid System with Statistical Voice Conversion.
IEICE Trans. Inf. Syst., 2010

Esophageal Speech Enhancement Based on Statistical Voice Conversion with Gaussian Mixture Models.
IEICE Trans. Inf. Syst., 2010

Linear transformation approaches to many-to-one voice conversion.
Proceedings of the Seventh ISCA Tutorial and Research Workshop on Speech Synthesis, 2010

Improved training of excitation for HMM-based parametric speech synthesis.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Adaptive voice-quality control based on one-to-many eigenvoice conversion.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

The use of air-pressure sensor in electrolaryngeal speech enhancement based on statistical voice conversion.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Non-parallel training for many-to-many eigenvoice conversion.
Proceedings of the IEEE International Conference on Acoustics, 2010

Statistical approach to enhancing esophageal speech based on Gaussian mixture models.
Proceedings of the IEEE International Conference on Acoustics, 2010

NICT Blizzard Challenge 2010 Entry.
Proceedings of the Blizzard Challenge 2010, Kansai Science City, Japan, September 25, 2010, 2010

Robust Speaker-Adaptive HMM-Based Text-to-Speech Synthesis.
IEEE Trans. Speech Audio Process., 2009

Techniques in rapid unsupervised speaker adaptation based on HMM-Sufficient Statistics.
Speech Commun., 2009

Multimodal HMM-based NAM-to-speech conversion.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Technologies for processing body-conducted speech detected with non-audible murmur microphone.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Many-to-many eigenvoice conversion with reference voice.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Electrolaryngeal speech enhancement based on statistical voice conversion.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

A decision tree-based clustering approach to state definition in an excitation modeling framework for HMM-based speech synthesis.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Cross-language voice conversion based on eigenvoices.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Probabilistic modelling of F0 in unvoiced regions in HMM based speech synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2009

Trajectory training considering global variance for HMM-based speech synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2009

Voice conversion for various types of body transmitted speech.
Proceedings of the IEEE International Conference on Acoustics, 2009

Acoustic compensation methods for body transmitted speech conversion.
Proceedings of the IEEE International Conference on Acoustics, 2009

The NICT Entry for the Blizzard Challenge 2009: an Enhanced HMM-based Speech Synthesis System with Trajectory Training considering Global Variance and State-Dependent Mixed Excitation.
Proceedings of the Blizzard Challenge 2009, Edinburgh, Scotland, UK, September 4, 2009, 2009

Statistical mapping between articulatory movements and acoustic spectrum using a Gaussian mixture model.
Speech Commun., 2008

The Nitech-NAIST HMM-Based Speech Synthesis System for the Blizzard Challenge 2006.
IEICE Trans. Inf. Syst., 2008

Building an Effective Speech Corpus by Utilizing Statistical Multidimensional Scaling Method.
IEICE Trans. Inf. Syst., 2008

Cost Reduction of Acoustic Modeling for Real-Environment Applications Using Unsupervised and Selective Training.
IEICE Trans. Inf. Syst., 2008

Simultaneous Acoustic, Prosodic, and Phrasing Model Training for TTS Conversion Systems.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

Simultaneous conversion of duration and spectrum based on statistical models including time-sequence matching.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Maximum a posteriori adaptation for many-to-one eigenvoice conversion.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

An improved one-to-many eigenvoice conversion system.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Evaluation of speaking-aid system with voice conversion for laryngectomees toward its use in practical environments.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Low-delay voice conversion based on maximum likelihood estimation of spectral parameter trajectory.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Performance evaluation of the speaker-independent HMM-based speech synthesis system "HTS 2007" for the Blizzard Challenge 2007.
Proceedings of the IEEE International Conference on Acoustics, 2008

Statistical approach to vocal tract transfer function estimation based on factor analyzed trajectory HMM.
Proceedings of the IEEE International Conference on Acoustics, 2008

On the state definition for a trainable excitation model in HMM-based speech synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2008

The HTS-2008 System: Yet Another Evaluation of the Speaker-Adaptive HMM-based Speech Synthesis System in The 2008 Blizzard Challenge.
Proceedings of the Blizzard Challenge 2008, 2008

The NICT/ATR speech synthesis system for the Blizzard Challenge 2008.
Proceedings of the Blizzard Challenge 2008, 2008

Voice Conversion Based on Maximum-Likelihood Estimation of Spectral Parameter Trajectory.
IEEE Trans. Speech Audio Process., 2007

Details of the Nitech HMM-Based Speech Synthesis System for the Blizzard Challenge 2005.
IEICE Trans. Inf. Syst., 2007

A Speech Parameter Generation Algorithm Considering Global Variance for HMM-Based Speech Synthesis.
IEICE Trans. Inf. Syst., 2007

Reducing Computation Time of the Rapid Unsupervised Speaker Adaptation Based on HMM-Sufficient Statistics.
IEICE Trans. Inf. Syst., 2007

Improved average-voice-based speech synthesis using gender-mixed modeling and a parameter generation algorithm considering GV.
Proceedings of the Sixth ISCA Workshop on Speech Synthesis, 2007

An evaluation of many-to-one voice conversion algorithms with pre-stored speaker data sets.
Proceedings of the Sixth ISCA Workshop on Speech Synthesis, 2007

Communicative speech synthesis with XIMERA: a first step.
Proceedings of the Sixth ISCA Workshop on Speech Synthesis, 2007

Regression approaches to voice quality control based on one-to-many eigenvoice conversion.
Proceedings of the Sixth ISCA Workshop on Speech Synthesis, 2007

Spectral conversion based on statistical models including time-sequence matching.
Proceedings of the Sixth ISCA Workshop on Speech Synthesis, 2007

An excitation model for HMM-based speech synthesis based on residual modeling.
Proceedings of the Sixth ISCA Workshop on Speech Synthesis, 2007

Speaker adaptive training for one-to-many eigenvoice conversion based on Gaussian mixture model.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Impact of various small sound source signals on voice conversion accuracy in speech communication aid for laryngectomees.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

A trainable excitation model for HMM-based speech synthesis.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Rapid unsupervised speaker adaptation using single utterance based on MLLR and speaker selection.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Development of preschool children subsystem for ASR and Q&A in a real-environment speech-oriented guidance task.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

One-to-Many and Many-to-One Voice Conversion Based on Eigenvoices.
Proceedings of the IEEE International Conference on Acoustics, 2007

Speaker-independent HMM-based speech synthesis system - HTS-2007 system for the Blizzard Challenge 2007.
Proceedings of the Evaluation of text-to-speech systems: Blizzard Challenge 2007, 2007

ATRECSS - ATR English speech corpus for speech synthesis.
Proceedings of the Evaluation of text-to-speech systems: Blizzard Challenge 2007, 2007

An evaluation of cost functions sensitively capturing local degradation of naturalness for segment selection in concatenative speech synthesis.
Speech Commun., 2006

Improving Rapid Unsupervised Speaker Adaptation Based on HMM-Sufficient Statistics in Noisy Environments Using Multi-Template Models.
IEICE Trans. Inf. Syst., 2006

Utterance-Based Selective Training for the Automatic Creation of Task-Dependent Acoustic Models.
IEICE Trans. Inf. Syst., 2006

Voice conversion based on mixtures of factor analyzers.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Eigenvoice conversion based on Gaussian mixture model.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Maximum likelihood voice conversion based on GMM with STRAIGHT mixed excitation.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Speaking aid system for total laryngectomees using voice conversion of body transmitted artificial speech.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Improving body transmitted unvoiced speech with statistical voice conversion.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Acoustic modeling for spoken dialogue systems based on unsupervised utterance-based selective training.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

On the Use of Phonetic Information for Mapping from Articulatory Movements to Vocal Tract Spectrum.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

Improving Rapid Unsupervised Speaker Adaptation Based on HMM Sufficient Statistics.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

Developing a Test Bed of English Text-to-Speech System XIMERA for the Blizzard Challenge 2006.
Proceedings of the Blizzard Challenge 2006, Pittsburgh, PA, USA, September 16, 2006, 2006

Designing Target Cost Function Based on Prosody of Speech Database.
IEICE Trans. Inf. Syst., 2005

An overview of Nitech HMM-based speech synthesis system for Blizzard Challenge 2005.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

NAM-to-speech conversion with Gaussian mixture models.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Spectral Conversion Based on Maximum Likelihood Estimation Considering Global Variance of Converted Parameter.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

Mapping from articulatory movements to vocal tract spectrum with Gaussian mixture model for articulatory speech synthesis.
Proceedings of the Fifth ISCA ITRW on Speech Synthesis, 2004

XIMERA: a new TTS from ATR based on corpus-based technologies.
Proceedings of the Fifth ISCA ITRW on Speech Synthesis, 2004

Perceptual Evaluation of Quality Deterioration Owing to Prosody Modification.
Proceedings of the Fourth International Conference on Language Resources and Evaluation, 2004

Acoustic-to-articulatory inversion mapping with Gaussian mixture model.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Optimizing sub-cost functions for segment selection based on perceptual evaluations in concatenative speech synthesis.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

An evaluation of automatic phone segmentation for concatenative speech synthesis.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

Optimizing integrated cost function for segment selection in concatenative speech synthesis based on perceptual evaluations.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Simple designing methods of corpus-based visual speech synthesis.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

GMM-based voice conversion applied to emotional speech synthesis.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Segment selection considering local degradation of naturalness in concatenative speech synthesis.
Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

Designing speech database with prosodic variety for expressive TTS system.
Proceedings of the Third International Conference on Language Resources and Evaluation, 2002

Evaluation of cross-language voice conversion using bilingual and non-bilingual databases.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Designing Japanese speech database covering wide range in prosody for hybrid speech synthesizer.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Unit selection algorithm for Japanese speech synthesis based on both phoneme unit and diphone unit.
Proceedings of the IEEE International Conference on Acoustics, 2002

High quality voice conversion based on Gaussian mixture model with dynamic frequency warping.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

Evaluation of cross-language voice conversion based on GMM and STRAIGHT.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

Voice conversion algorithm based on Gaussian mixture model with dynamic frequency warping of STRAIGHT spectrum.
Proceedings of the IEEE International Conference on Acoustics, 2001

STRAIGHT-based voice conversion algorithm based on Gaussian mixture model.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000
