Tomoki Hayashi

Orcid: 0000-0001-8782-4093

According to our database, Tomoki Hayashi authored at least 84 papers between 2010 and 2023.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.







Unsupervised Data Selection for TTS: Using Arabic Broadcast News as a Case Study.
CoRR, 2023

Low-Latency Electrolaryngeal Speech Enhancement Based on Fastspeech2-Based Voice Conversion and Self-Supervised Speech Representation.
Proceedings of the IEEE International Conference on Acoustics, 2023

ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2023

A Comparative Study of Self-Supervised Speech Representation Based Voice Conversion.
IEEE J. Sel. Top. Signal Process., 2022

Muskits: an End-to-End Music Processing Toolkit for Singing Voice Synthesis.
CoRR, 2022

Efficient Training Method for Point Cloud-Based Object Detection Models by Combining Environmental Transitions and Active Learning.
Proceedings of the Robot Intelligence Technology and Applications 7, 2022

Muskits: an End-to-end Music Processing Toolkit for Singing Voice Synthesis.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

S3PRL-VC: Open-Source Voice Conversion Framework with Self-Supervised Speech Representations.
Proceedings of the IEEE International Conference on Acoustics, 2022

An Investigation of Streaming Non-Autoregressive sequence-to-sequence Voice Conversion.
Proceedings of the IEEE International Conference on Acoustics, 2022

Improvement of Serial Approach to Anomalous Sound Detection by Incorporating Two Binary Cross-Entropies for Outlier Exposure.
Proceedings of the 30th European Signal Processing Conference, 2022

Note-level Automatic Guitar Transcription Using Attention Mechanism.
Proceedings of the 30th European Signal Processing Conference, 2022

Improving Dense Representation Learning by Superpixelization and Contrasting Cluster Assignment.
Proceedings of the 33rd British Machine Vision Conference 2022, 2022

Quasi-Periodic WaveNet: An Autoregressive Raw Waveform Generative Model With Pitch-Dependent Dilated Convolution Neural Network.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Quasi-Periodic Parallel WaveGAN: A Non-Autoregressive Raw Waveform Generative Model With Pitch-Dependent Dilated Convolution Neural Network.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Pretraining Techniques for Sequence-to-Sequence Voice Conversion.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Discretization and Re-synthesis: an alternative method to solve the Cocktail Party Problem.
CoRR, 2021

ViCE: Self-Supervised Visual Concept Embeddings as Contextual and Pixel Appearance Invariant Semantic Representations.
CoRR, 2021

ESPnet2-TTS: Extending the Edge of TTS Research.
CoRR, 2021

ESPnet-SE: End-To-End Speech Enhancement and Separation Toolkit Designed for ASR Integration.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Acoustic Event Detection with Classifier Chains.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Crank: An Open-Source Software for Nonparallel Voice Conversion Based on Vector-Quantized Variational Autoencoder.
Proceedings of the IEEE International Conference on Acoustics, 2021

Any-to-One Sequence-to-Sequence Voice Conversion Using Self-Supervised Discrete Speech Representations.
Proceedings of the IEEE International Conference on Acoustics, 2021

Non-Autoregressive Sequence-To-Sequence Voice Conversion.
Proceedings of the IEEE International Conference on Acoustics, 2021

Recent Developments on Espnet Toolkit Boosted By Conformer.
Proceedings of the IEEE International Conference on Acoustics, 2021

Anomalous Sound Detection Using a Binary Classification Model and Class Centroids.
Proceedings of the 29th European Signal Processing Conference, 2021

Spontaneous Speech Summarization: Transformers All The Way Through.
Proceedings of the 29th European Signal Processing Conference, 2021

Leveraging State-of-the-art ASR Techniques to Audio Captioning.
Proceedings of the 6th Workshop on Detection and Classification of Acoustic Scenes and Events 2021 (DCASE 2021), 2021

An Ensemble Approach to Anomalous Sound Detection Based on Conformer-Based Autoencoder and Binary Classifier Incorporated with Metric Learning.
Proceedings of the 6th Workshop on Detection and Classification of Acoustic Scenes and Events 2021 (DCASE 2021), 2021

On Prosody Modeling for ASR+TTS Based Voice Conversion.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans.
CoRR, 2020

Any-to-One Sequence-to-Sequence Voice Conversion using Self-Supervised Discrete Speech Representations.
CoRR, 2020

DiscreTalk: Text-to-Speech as a Machine Translation Problem.
CoRR, 2020

Non-Parallel Voice Conversion System With WaveNet Vocoder and Collapsed Speech Suppression.
IEEE Access, 2020

Quasi-Periodic Parallel WaveGAN Vocoder: A Non-Autoregressive Pitch-Dependent Dilated Convolution Model for Parametric Speech Generation.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Cyclic Spectral Modeling for Unsupervised Unit Discovery into Voice Conversion with Excitation and Waveform Modeling.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Voice Transformer Network: Sequence-to-Sequence Voice Conversion Using Transformer with Text-to-Speech Pretraining.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Intelligibility Enhancement Based on Speech Waveform Modification Using Hearing Impairment.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

End-to-End Automatic Speech Recognition Integrated with CTC-Based Voice Activity Detection.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Efficient Shallow Wavenet Vocoder Using Multiple Samples Output Based on Laplacian Distribution and Linear Prediction.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Weakly-Supervised Sound Event Detection with Self-Attention.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Semi-Supervised Speaker Adaptation for End-to-End Speech Synthesis with Pretrained Models.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Espnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Conformer-Based Sound Event Detection with Semi-Supervised Learning and Data Augmentation.
Proceedings of the 5th Workshop on Detection and Classification of Acoustic Scenes and Events 2020 (DCASE 2020), 2020

The Sequence-to-Sequence Baseline for the Voice Conversion Challenge 2020: Cascading ASR and TTS.
Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

ESPnet-ST: All-in-One Speech Translation Toolkit.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2020

Voice Conversion With CycleRNN-Based Spectral Mapping and Finely Tuned WaveNet Vocoder.
IEEE Access, 2019

Statistical Voice Conversion with Quasi-periodic WaveNet Vocoder.
Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

Quasi-Periodic WaveNet Vocoder: A Pitch Dependent Dilated Convolution Model for Parametric Speech Generation.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Non-Parallel Voice Conversion with Cyclic Variational Autoencoder.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Investigation of F0 Conditioning and Fully Convolutional Networks in Variational Autoencoder Based Voice Conversion.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Pre-Trained Text Embeddings for Enhanced Text-to-Speech Synthesis.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Voice Conversion with Cyclic Recurrent Neural Network and Fine-tuned Wavenet Vocoder.
Proceedings of the IEEE International Conference on Acoustics, 2019

Scene-dependent Anomalous Acoustic-event Detection Based on Conditional Wavenet and I-vector.
Proceedings of the IEEE International Conference on Acoustics, 2019

Cycle-consistency Training for End-to-end Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2019

Refined WaveNet Vocoder for Variational Autoencoder Based Voice Conversion.
Proceedings of the 27th European Signal Processing Conference, 2019

Investigation of Shallow Wavenet Vocoder with Laplacian Distribution Output.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Attention-Based Speech Recognition Using Gaze Information.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

A Comparative Study on Transformer vs RNN in Speech Applications.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Daily Activity Recognition with Large-Scaled Real-Life Recording Datasets Based on Deep Neural Network Using Multi-Modal Signals.
IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2018

An Evaluation of Deep Spectral Mappings and WaveNet Vocoder for Voice Conversion.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Back-Translation-Style Data Augmentation for end-to-end ASR.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

The NU Non-Parallel Voice Conversion System for the Voice Conversion Challenge 2018.
Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018

NU Voice Conversion System for the Voice Conversion Challenge 2018.
Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018

Collapsed Speech Segment Detection and Suppression for WaveNet Vocoder.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

ESPnet: End-to-End Speech Processing Toolkit.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Multi-Head Decoder for End-to-End Speech Recognition.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Connectionist Temporal Classification-based Sound Event Encoder for Converting Sound Events into Onomatopoeic Representations.
Proceedings of the 26th European Signal Processing Conference, 2018

Anomalous Sound Event Detection Based on WaveNet.
Proceedings of the 26th European Signal Processing Conference, 2018

Duration-Controlled LSTM for Polyphonic Sound Event Detection.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

Hybrid CTC/Attention Architecture for End-to-End Speech Recognition.
IEEE J. Sel. Top. Signal Process., 2017

Speaker-Dependent WaveNet Vocoder.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Statistical Voice Conversion with WaveNet-Based Waveform Generation.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

BLSTM-HMM hybrid system combined with sound activity detection network for polyphonic Sound Event Detection.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

An investigation of multi-speaker training for wavenet vocoder.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

An investigation of recurrent neural network for daily activity recognition using multi-modal signals.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

Bidirectional LSTM-HMM Hybrid System for Polyphonic Sound Event Detection.
Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events, 2016

Exploring multi-channel features for denoising-autoencoder-based speech enhancement.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Daily activity recognition based on DNN using environmental sound and acceleration signals.
Proceedings of the 23rd European Signal Processing Conference, 2015

Noisy speech recognition using blind spatial subtraction array technique and deep bottleneck features.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

Non-rigid Surface Tracking for Virtual Fitting System.
Proceedings of the VISAPP 2013, 2013

Dream board: a visualization system by handwriting recognition.
Proceedings of the SIGGRAPH Asia 2013, 2013

Texture Overlay onto Non-rigid Surface using Commodity Depth Camera.
Proceedings of the VISAPP 2012, 2012

Skeleton Features Distribution for 3D Object Retrieval.
Proceedings of the IAPR Conference on Machine Vision Applications (IAPR MVA 2011), 2011

An Augmented Reality Setup with an Omnidirectional Camera Based on Multiple Object Detection.
Proceedings of the 20th International Conference on Pattern Recognition, 2010
