Kou Tanaka

Orcid: 0009-0003-7107-607X

According to our database, Kou Tanaka authored at least 55 papers between 2013 and 2024.

Bibliography

2024
VoiceGrad: Non-Parallel Any-to-Many Voice Conversion With Annealed Langevin Dynamics.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

FastVoiceGrad: One-step Diffusion-Based Voice Conversion with Adversarial Conditional Diffusion Distillation.
CoRR, 2024

Selecting N-Lowest Scores for Training MOS Prediction Models.
Proceedings of the IEEE International Conference on Acoustics, 2024

Training Generative Adversarial Network-Based Vocoder with Limited Data Using Augmentation-Conditional Discriminator.
Proceedings of the IEEE International Conference on Acoustics, 2024

Learning to Assess Subjective Impressions from Speech.
Proceedings of the 32nd European Signal Processing Conference, 2024

2023
Non-Parallel Whisper-to-Normal Speaking Style Conversion Using Auxiliary Classifier Variational Autoencoder.
IEEE Access, 2023

PRVAE-VC: Non-Parallel Many-to-Many Voice Conversion with Perturbation-Resistant Variational Autoencoder.
Proceedings of the 12th ISCA Speech Synthesis Workshop, 2023

CFVC: Conditional Filtering for Controllable Voice Conversion.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

iSTFTNet2: Faster and More Lightweight iSTFT-Based Neural Vocoder Using 1D-2D CNN.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

JSV-VC: Jointly Trained Speaker Verification and Voice Conversion Models.
Proceedings of the IEEE International Conference on Acoustics, 2023

Wave-U-Net Discriminator: Fast and Lightweight Discriminator for Generative Adversarial Network-Based Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2023

W2N-AVSC: Audiovisual Extension For Whisper-To-Normal Speech Conversion.
Proceedings of the 31st European Signal Processing Conference, 2023

2022
Distilling Sequence-to-Sequence Voice Conversion Models for Streaming Conversion Applications.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

MISRNet: Lightweight Neural Vocoder Using Multi-Input Single Shared Residual Blocks.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

CAUSE: Crossmodal Action Unit Sequence Estimation from Speech.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

iSTFTNet: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating Inverse Short-Time Fourier Transform.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
Many-to-Many Voice Transformer Network.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

FastS2S-VC: Streaming Non-Autoregressive Sequence-to-Sequence Voice Conversion.
CoRR, 2021

MaskCycleGAN-VC: Learning Non-Parallel Voice Conversion with Filling in Frames.
Proceedings of the IEEE International Conference on Acoustics, 2021

2020
ConvS2S-VC: Fully Convolutional Sequence-to-Sequence Voice Conversion.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Nonparallel Voice Conversion With Augmented Classifier Star Generative Adversarial Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Collection and Analysis of Dialogues Provided by Two Speakers Acting as One.
Proceedings of the 21st Annual Meeting of the Special Interest Group on Discourse and Dialogue, 2020

CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-Spectrogram Conversion.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Phoneme Embeddings on Predicting Fundamental Frequency Pattern for Electrolaryngeal Speech.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

2019
ACVAE-VC: Non-Parallel Voice Conversion With Auxiliary Classifier Variational Autoencoder.
IEEE ACM Trans. Audio Speech Lang. Process., 2019

The ASVspoof 2019 database.
CoRR, 2019

Crossmodal Voice Conversion.
CoRR, 2019

WaveCycleGAN2: Time-domain Neural Post-filter for Speech Waveform Generation.
CoRR, 2019

An Investigation of Features for Fundamental Frequency Pattern Prediction in Electrolaryngeal Speech Enhancement.
Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

StarGAN-VC2: Rethinking Conditional Methods for StarGAN-Based Voice Conversion.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

ATTS2S-VC: Sequence-to-sequence Voice Conversion with Attention and Context Preservation Mechanisms.
Proceedings of the IEEE International Conference on Acoustics, 2019

CycleGAN-VC2: Improved CycleGAN-based Non-parallel Voice Conversion.
Proceedings of the IEEE International Conference on Acoustics, 2019

2018
ConvS2S-VC: Fully convolutional sequence-to-sequence voice conversion.
CoRR, 2018

WaveCycleGAN: Synthetic-to-natural speech waveform conversion using cycle-consistent adversarial networks.
CoRR, 2018

ACVAE-VC: Non-parallel many-to-many voice conversion with auxiliary classifier variational autoencoder.
CoRR, 2018

StarGAN-VC: Non-parallel many-to-many voice conversion with star generative adversarial networks.
CoRR, 2018

Generative adversarial network-based approach to signal reconstruction from magnitude spectrograms.
CoRR, 2018

Synthetic-to-Natural Speech Waveform Conversion Using Cycle-Consistent Adversarial Networks.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

StarGAN-VC: non-parallel many-to-many Voice Conversion Using Star Generative Adversarial Networks.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

VAE-SPACE: Deep Generative Model of Voice Fundamental Frequency Contours.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Generative adversarial network-based approach to signal reconstruction from magnitude spectrogram.
Proceedings of the 26th European Signal Processing Conference, 2018

Automatic Speech Pronunciation Correction with Dynamic Frequency Warping-Based Spectral Conversion.
Proceedings of the 26th European Signal Processing Conference, 2018

2017
A Vibration Control Method of an Electrolarynx Based on Statistical F0 Pattern Prediction.
IEICE Trans. Inf. Syst., 2017

Physically Constrained Statistical F0 Prediction for Electrolaryngeal Speech Enhancement.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

2016
Statistical F0 prediction for electrolaryngeal speech enhancement considering generative process of F0 contours within product of experts framework.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Real-time vibration control of an electrolarynx based on statistical F0 contour prediction.
Proceedings of the 24th European Signal Processing Conference, 2016

2015
Non-audible murmur enhancement based on statistical conversion using air- and body-conductive microphones in noisy environments.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

The NAIST Text-to-Speech System for the Blizzard Challenge 2015.
Proceedings of the Blizzard Challenge 2015, 2015

An Enhanced Electrolarynx with Automatic Fundamental Frequency Control based on Statistical Prediction.
Proceedings of the 17th International ACM SIGACCESS Conference on Computers & Accessibility, 2015

2014
A Hybrid Approach to Electrolaryngeal Speech Enhancement Based on Noise Reduction and Statistical Excitation Generation.
IEICE Trans. Inf. Syst., 2014

Direct F0 control of an electrolarynx based on statistical excitation feature prediction and its evaluation through simulation.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

An evaluation of excitation feature prediction in a hybrid approach to electrolaryngeal speech enhancement.
Proceedings of the IEEE International Conference on Acoustics, 2014

An evaluation of target speech for a nonaudible murmur enhancement system in noisy environments.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

An inter-speaker evaluation through simulation of electrolarynx control based on statistical F0 prediction.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

2013
A hybrid approach to electrolaryngeal speech enhancement based on spectral subtraction and statistical voice conversion.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

