Shinnosuke Takamichi
ORCID: 0000-0003-0520-7847
According to our database, Shinnosuke Takamichi authored at least 141 papers between 2012 and 2024.
Bibliography
2024
JNV corpus: A corpus of Japanese nonverbal vocalizations with diverse phrases and emotions.
Speech Commun., January, 2024
IEEE ACM Trans. Audio Speech Lang. Process., 2024
CoRR, 2024
J-CHAT: Japanese Large-scale Spoken Dialogue Corpus for Spoken Dialogue Language Modeling.
CoRR, 2024
CoRR, 2024
Spatial Voice Conversion: Voice Conversion Preserving Spatial Information and Non-target Signals.
CoRR, 2024
Noise-Robust Voice Conversion by Conditional Denoising Training Using Latent Variables of Recording Quality and Environment.
CoRR, 2024
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis.
CoRR, 2024
Building speech corpus with diverse voice characteristics for its prompt-based representation.
CoRR, 2024
SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech Generation Leveraging NLP Evaluation Metrics.
CoRR, 2024
JVNV: A Corpus of Japanese Emotional Speech With Verbal Content and Nonverbal Expressions.
IEEE Access, 2024
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024
Diversity-Based Core-Set Selection for Text-to-Speech with Linguistic and Acoustic Features.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024
2023
JVNV: A Corpus of Japanese Emotional Speech with Verbal Content and Nonverbal Expressions.
Dataset, October, 2023
CoRR, 2023
IEEE Access, 2023
Improving robustness of spontaneous speech synthesis with linguistic speech regularization and pseudo-filled-pause insertion.
Proceedings of the 12th ISCA Speech Synthesis Workshop, 2023
TimToShape: Supporting Practice of Musical Instruments by Visualizing Timbre with 2D Shapes based on Crossmodal Correspondences.
Proceedings of the 28th International Conference on Intelligent User Interfaces, 2023
Laughter Synthesis using Pseudo Phonetic Tokens with a Large-scale In-the-wild Laughter Corpus.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
ChatGPT-EDSS: Empathetic Dialogue Speech Synthesis Trained from ChatGPT-derived Context Word Embeddings.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
CALLS: Japanese Empathetic Dialogue Speech Corpus of Complaint Handling and Attentive Listening in Customer Center.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
How Generative Spoken Language Modeling Encodes Noisy Speech: Investigation from Phonetics to Syntactics.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023
Improving Speech Prosody of Audiobook Text-To-Speech Synthesis with Acoustic and Textual Contexts.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023
MID-Attribute Speaker Generation Using Optimal-Transport-Based Interpolation of Gaussian Mixture Models.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023
Visual Onoma-to-Wave: Environmental Sound Synthesis from Visual Onomatopoeias and Sound-Source Images.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023
COCO-NUT: Corpus of Japanese Utterance and Voice Characteristics Description for Prompt-Based Control.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
2022
Lightweight and irreversible speech pseudonymization based on data-driven optimization of cascaded voice modification modules.
Comput. Speech Lang., 2022
CoRR, 2022
Spontaneous speech synthesis with linguistic-speech consistency training using pseudo-filled pauses.
CoRR, 2022
Empirical Study Incorporating Linguistic Knowledge on Filled Pauses for Personalized Spontaneous Speech Synthesis.
CoRR, 2022
Exploring the Effectiveness of Self-supervised Learning and Classifier Chains in Emotion Recognition of Nonverbal Vocalizations.
CoRR, 2022
Proceedings of the IEEE Spoken Language Technology Workshop, 2022
Robustness of Signal Processing-Based Pseudonymization Method Against Decryption Attack.
Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
SelfRemaster: Self-Supervised Speech Restoration with Analysis-by-Synthesis Approach Using Channel Modeling.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Acoustic Modeling for End-to-End Empathetic Dialogue Speech Synthesis Using Linguistic and Prosodic Contexts of Dialogue History.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Predicting VQVAE-based Character Acting Style from Quotation-Annotated Text for Audiobook Speech Synthesis.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
2021
Perceptual-Similarity-Aware Deep Speaker Representation Learning for Multi-Speaker Generative Modeling.
IEEE ACM Trans. Audio Speech Lang. Process., 2021
Incremental Text-to-Speech Synthesis Using Pseudo Lookahead With Large Pretrained Language Model.
IEEE Signal Process. Lett., 2021
Real-Time Full-Band Voice Conversion with Sub-Band Modeling and Data-Driven Phase Estimation of Spectral Differentials.
IEICE Trans. Inf. Syst., 2021
DNN-Based Low-Musical-Noise Single-Channel Speech Enhancement Based on Higher-Order-Moments Matching.
IEICE Trans. Inf. Syst., 2021
Noise Robust Acoustic Anomaly Detection System with Nonnegative Matrix Factorization Based on Generalized Gaussian Distribution.
IEICE Trans. Inf. Syst., 2021
JTubeSpeech: corpus of Japanese speech collected from YouTube for speech recognition and speaker verification.
CoRR, 2021
Accent Modeling of Low-Resourced Dialect in Pitch Accent Language Using Variational Autoencoder.
Proceedings of the 11th ISCA Speech Synthesis Workshop, 2021
Audiobook Speech Synthesis Conditioned by Cross-Sentence Context-Aware Word Embeddings.
Proceedings of the 11th ISCA Speech Synthesis Workshop, 2021
Lightweight Voice Anonymization Based on Data-Driven Optimization of Cascaded Voice Modification Modules.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021
Cross-Lingual Speaker Adaptation Using Domain Adaptation and Speaker Consistency Loss for Text-To-Speech Synthesis.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Digital Speech Makeup: Voice Conversion Based Altered Auditory Feedback for Transforming Self-Representation.
Proceedings of the ICMI '21: International Conference on Multimodal Interaction, 2021
Disentangled Speaker and Language Representations Using Mutual Information Minimization and Domain Adaptation for Cross-Lingual TTS.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021
HumanACGAN: Conditional Generative Adversarial Network with Human-Based Auxiliary Classifier and its Evaluation in Phoneme Perception.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021
Japanese EFL Learners' Speaking Practice Utilizing Text-to-Speech Technology Within a Team-Based Flipped Learning Framework.
Proceedings of the Learning and Collaboration Technologies: New Challenges and Learning Experiences, 2021
Low-Latency Incremental Text-to-Speech Synthesis with Distilled Context Prediction Network.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021
Emotion-Controllable Speech Synthesis Using Emotion Soft Labels and Fine-Grained Prosody Factors.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021
2020
Acoustic model-based subword tokenization and prosodic-context extraction without language knowledge for text-to-speech synthesis.
Speech Commun., 2020
Phase reconstruction from amplitude spectrograms based on directional-statistics deep neural networks.
Signal Process., 2020
IEICE Trans. Inf. Syst., 2020
Generative Moment Matching Network-Based Neural Double-Tracking for Synthesized and Natural Singing Voices.
IEICE Trans. Inf. Syst., 2020
Proceedings of The 12th Language Resources and Evaluation Conference, 2020
SMASH Corpus: A Spontaneous Speech Corpus Recording Third-person Audio Commentaries on Gameplay.
Proceedings of The 12th Language Resources and Evaluation Conference, 2020
Investigating Effective Additional Contextual Factors in DNN-Based Spontaneous Speech Synthesis.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Cross-Lingual Text-To-Speech Synthesis via Domain Adaptation and Perceptual Similarity Regression in Speaker Space.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
End-to-End Text-to-Speech Synthesis with Unaligned Multiple Language Units Based on Attention.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Lifter Training and Sub-Band Modeling for Computationally Efficient and High-Quality Voice Conversion Using Spectral Differentials.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020
HumanGAN: Generative Adversarial Network With Human-Based Discriminator And Its Evaluation In Speech Perception Modeling.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020
Proceedings of the 5th Workshop on Detection and Classification of Acoustic Scenes and Events 2020 (DCASE 2020), 2020
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020
2019
IEEE ACM Trans. Audio Speech Lang. Process., 2019
Prosody Correction Preserving Speaker Individuality for Chinese-Accented Japanese HMM-Based Text-to-Speech Synthesis.
IEICE Trans. Inf. Syst., 2019
Vocoder-free text-to-speech synthesis incorporating generative adversarial networks using low-/multi-frequency STFT amplitude spectra.
Comput. Speech Lang., 2019
Overview of Tasks and Investigation of Subjective Evaluation Methods in Environmental Sound Synthesis and Conversion.
CoRR, 2019
TransVoice: Real-Time Voice Conversion for Augmenting Near-Field Speech Communication.
Adjunct Proceedings of the 32nd Annual ACM Symposium on User Interface Software and Technology, 2019
DNN-based Speaker Embedding Using Subjective Inter-speaker Similarity for Multi-speaker Modeling in Speech Synthesis.
Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019
Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019
Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019
Subword tokenization based on DNN-based acoustic model for end-to-end prosody generation.
Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019
Implementation of DNN-based real-time voice conversion and its improvements by audio data augmentation and mask-shaped device.
Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019
Proceedings of the Increasing Naturalness and Flexibility in Spoken Dialogue Interaction, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Generative Moment Matching Network-based Random Modulation Post-filter for DNN-based Singing Voice Synthesis and Neural Double-tracking.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019
Estimating confidence in voices using crowdsourcing for alleviating tension with altered auditory feedback.
Proceedings of the AsianCHI@CHI 2019: Asian CHI Symposium, 2019
2018
Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2018
Mach. Transl., 2018
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018
Phase Reconstruction from Amplitude Spectrograms Based on Von-Mises-Distribution Deep Neural Network.
Proceedings of the 16th International Workshop on Acoustic Signal Enhancement, 2018
Text-to-Speech Synthesis Using STFT Spectra Based on Low-/Multi-Resolution Generative Adversarial Networks.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018
Non-Parallel Voice Conversion Using Variational Autoencoders Conditioned by Phonetic Posteriorgrams and D-Vectors.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018
Proceedings of the 26th European Signal Processing Conference, 2018
Generative approach using the noise generation models for DNN-based speech synthesis trained from noisy speech.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018
A Revisit to Feature Handling for High-quality Voice Conversion Based on Gaussian Mixture Model.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018
Data augmentation with moment-matching networks for i-vector based speaker verification.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018
Prosody-aware subword embedding considering Japanese intonation systems and its application to DNN-based multi-dialect speech synthesis.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018
2017
IEICE Trans. Inf. Syst., 2017
JSUT corpus: free large-scale Japanese speech corpus for end-to-end speech synthesis.
CoRR, 2017
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Voice Conversion Using Sequence-to-Sequence Learning of Context Posterior Probabilities.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Training algorithm to deceive Anti-Spoofing Verification for DNN-based speech synthesis.
Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017
Blind source separation based on independent low-rank matrix analysis with sparse regularization for time-series activity.
Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017
Proceedings of the Blizzard Challenge 2017, Stockholm, Sweden, August 25, 2017
Modulation spectrum-based speech parameter trajectory smoothing for DNN-based speech synthesis using FFT spectra.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017
2016
Postfilters to Modify the Modulation Spectrum for Statistical Parametric Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2016
A Statistical Sample-Based Approach to GMM-Based Voice Conversion Using Tied-Covariance Acoustic Models.
IEICE Trans. Inf. Syst., 2016
Non-Native Text-to-Speech Preserving Speaker Individuality Based on Partial Correction of Prosodic and Phonetic Characteristics.
IEICE Trans. Inf. Syst., 2016
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
2015
Modulation spectrum-constrained trajectory training algorithm for HMM-based speech synthesis.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Non-native speech synthesis preserving speaker individuality based on partial correction of prosodic and phonetic characteristics.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Preserving word-level emphasis in speech-to-speech translation using linear regression HSMMs.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Modulation spectrum-constrained trajectory training algorithm for GMM-based Voice Conversion.
Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015
Parameter generation algorithm considering Modulation Spectrum for HMM-based speech synthesis.
Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015
Proceedings of the Blizzard Challenge 2015, 2015
2014
Parameter Generation Methods With Rich Context Models for High-Quality and Flexible Text-To-Speech Synthesis.
IEEE J. Sel. Top. Signal Process., 2014
A hearing impairment simulation method using audiogram-based approximation of auditory characteristics.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014
Proceedings of the 2014 IEEE Global Conference on Signal and Information Processing, 2014
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014
2013
Improvements to HMM-based speech synthesis based on parameter generation with rich context models.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013
2012
Proceedings of the 2012 International Workshop on Spoken Language Translation, 2012
An Evaluation of Parameter Generation Methods with Rich Context Models in HMM-Based Speech Synthesis.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012