Katsutoshi Itoyama

ORCID: 0000-0002-7098-3896

According to our database, Katsutoshi Itoyama authored at least 122 papers between 2006 and 2024.

Can all variations within the unified mask-based beamformer framework achieve identical peak extraction performance?
EURASIP J. Audio Speech Music. Process., December, 2024

SLAM-Based Joint Calibration of Multiple Asynchronous Microphone Arrays and Sound Source Localization.
IEEE Trans. Robotics, 2024

From Blurry to Brilliant Detection: YOLOv5-Based Aerial Object Detection with Super Resolution.
CoRR, 2024

Real Time Sound Source Localization Using von-Mises ResNet.
Proceedings of the IEEE/SICE International Symposium on System Integration, 2024

Improving Impressions of Response Delay in AI-based Spoken Dialogue Systems.
Proceedings of the 33rd IEEE International Conference on Robot and Human Interactive Communication, 2024

Improving Noise Robustness of Automatic Speech Recognition Based on a Parallel Adapter Model with Near-Identity Initialization.
Proceedings of the Advances and Trends in Artificial Intelligence. Theory and Applications, 2024

UAV-Enhanced Combination to Application: Comprehensive Analysis and Benchmarking of a Human Detection Dataset for Disaster Scenarios.
Proceedings of the Pattern Recognition - 27th International Conference, 2024

A Video Vision Transformer for Sound Source Localization.
Proceedings of the 32nd European Signal Processing Conference, 2024

FPGA-based Low Power Acceleration of HARK Sound Source Localization.
Proceedings of the IEEE Symposium in Low-Power and High-Speed Chips, 2024

Audio-Visual Class Association Based on Two-stage Self-supervised Contrastive Learning towards Robust Scene Analysis.
Proceedings of the IEEE/SICE International Symposium on System Integration, 2023

Assessment of Simultaneous Calibration for Positions, Orientations, and Time Offsets in Multiple Microphone Arrays Systems.
Proceedings of the IEEE/SICE International Symposium on System Integration, 2023

Metric-Based Multimodal Meta-Learning for Human Movement Identification Via Footstep Recognition.
Proceedings of the IEEE/SICE International Symposium on System Integration, 2023

Reconstruction of Depth Scenes Based on Echolocation.
Proceedings of the IEEE/SICE International Symposium on System Integration, 2023

FPGA based Power-Efficient Edge Server to Accelerate Speech Interface for Socially Assistive Robotics.
Proceedings of the IEEE/SICE International Symposium on System Integration, 2023

An Ensemble Method for Multiple Speech Enhancement Using Deep Learning.
Proceedings of the IEEE/SICE International Symposium on System Integration, 2023

Improving Sign Language Understanding Introducing Label Smoothing.
Proceedings of the 32nd IEEE International Conference on Robot and Human Interactive Communication, 2023

Unsupervised Domain Adaptation of Universal Source Separation Based on Neural Full-Rank Spatial Covariance Analysis.
Proceedings of the 33rd IEEE International Workshop on Machine Learning for Signal Processing, 2023

miniStreamer: Enhancing Small Conformer with Chunked-Context Masking for Streaming ASR Applications on the Edge.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Is the Ideal Ratio Mask Really the Best? - Exploring the Best Extraction Performance and Optimal Mask of Mask-based Beamformers.
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

Outdoor evaluation of sound source localization for drone groups using microphone arrays.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2022

Spotforming by NMF Using Multiple Microphone Arrays.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2022

Weakly-Supervised Neural Full-Rank Spatial Covariance Analysis for a Front-End System of Distant Speech Recognition.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Multichannel environmental sound segmentation.
Appl. Intell., 2021

Detecting earthquakes: a novel deep learning-based approach for effective disaster response.
Appl. Intell., 2021

Assessment of a Beamforming Implementation Developed for Surface Sound Source Separation.
Proceedings of the IEEE/SICE International Symposium on System Integration, 2021

Sound Source Tracking Using Integrated Direction Likelihood for Drones with Microphone Arrays.
Proceedings of the IEEE/SICE International Symposium on System Integration, 2021

Multi-channel Environmental Sound Segmentation utilizing Sound Source Localization and Separation U-Net.
Proceedings of the IEEE/SICE International Symposium on System Integration, 2021

EMC: Earthquake Magnitudes Classification on Seismic Signals via Convolutional Recurrent Networks.
Proceedings of the IEEE/SICE International Symposium on System Integration, 2021

Assessment of von Mises-Bernoulli Deep Neural Network in Sound Source Localization.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Bayesian Singing Transcription Based on a Hierarchical Generative Model of Keys, Musical Notes, and F0 Trajectories.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Sound event aware environmental sound segmentation with Mask U-Net.
Adv. Robotics, 2020

Design and Assessment of a Scan-and-sum Beamformer for Surface Sound Source Separation.
Proceedings of the 2020 IEEE/SICE International Symposium on System Integration, 2020

Sound Source Tracking by Drones with Microphone Arrays.
Proceedings of the 2020 IEEE/SICE International Symposium on System Integration, 2020

Multi-channel Environmental sound segmentation.
Proceedings of the 2020 IEEE/SICE International Symposium on System Integration, 2020

Sound Source Localization Based on von-Mises-Bernoulli Deep Neural Network.
Proceedings of the 2020 IEEE/SICE International Symposium on System Integration, 2020

Audio-Visual 3D Reconstruction Framework for Dynamic Scenes.
Proceedings of the 2020 IEEE/SICE International Symposium on System Integration, 2020

Synchronization of Microphones Based on Rank Minimization of Warped Spectrum for Asynchronous Distributed Recording.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2020

Calibration of a Microphone Array Based on a Probabilistic Model of Microphone Positions.
Proceedings of the Trends in Artificial Intelligence Theory and Applications. Artificial Intelligence Practices, 2020

Detection of Ball Spin Direction using Hitting Sound in Tennis.
Proceedings of the 8th International Conference on Sport Sciences Research and Technology Support, 2020

Development of Tough Snake Robot Systems.
Proceedings of the Disaster Robotics - Results from the ImPACT Tough Robotics Challenge, 2019

ImPACT-TRC Thin Serpentine Robot Platform for Urban Search and Rescue.
Proceedings of the Disaster Robotics - Results from the ImPACT Tough Robotics Challenge, 2019

Unsupervised Speech Enhancement Based on Multichannel NMF-Informed Beamforming for Noise-Robust Automatic Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2019

2D sound source position estimation using microphone arrays and its application to a VR-based bird song analysis system.
Adv. Robotics, 2019

Design and assessment of multiple-sound source localization using microphone arrays.
Proceedings of the IEEE/SICE International Symposium on System Integration, 2019

Environmental sound segmentation utilizing Mask U-Net.
Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2019

Joint Transcription of Lead, Bass, and Rhythm Guitars Based on a Factorial Hidden Semi-Markov Model.
Proceedings of the IEEE International Conference on Acoustics, 2019

Improvement of DOA Estimation by using Quaternion Output in Sound Event Localization and Detection.
Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events 2019 (DCASE 2019), 2019

Bayesian Multichannel Audio Source Separation Based on Integrated Source and Spatial Models.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

Speech Enhancement Based on Bayesian Low-Rank and Sparse Decomposition of Multichannel Magnitude Spectrograms.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

Signal Restoration based on Bi-directional LSTM with Spectral Filtering for Robot Audition.
Proceedings of the 27th IEEE International Symposium on Robot and Human Interactive Communication, 2018

Interactive Arrangement of Chords and Melodies Based on a Tree-Structured Generative Model.
Proceedings of the 19th International Society for Music Information Retrieval Conference, 2018

Unsupervised Beamforming Based on Multichannel Nonnegative Matrix Factorization for Noisy Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Statistical Speech Enhancement Based on Probabilistic Integration of Variational Autoencoder and Non-Negative Matrix Factorization.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Sequential Generation of Singing F0 Contours from Musical Note Sequences Based on WaveNet.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

Simultaneous Identification and Localization of Still and Mobile Speakers Based on Binaural Robot Audition.
J. Robotics Mechatronics, 2017

Layout Optimization of Cooperative Distributed Microphone Arrays Based on Estimation of Source Separation Performance.
J. Robotics Mechatronics, 2017

Audio-Visual Beat Tracking Based on a State-Space Model for a Robot Dancer Performing with a Human Dancer.
J. Robotics Mechatronics, 2017

Low Latency and High Quality Two-Stage Human-Voice-Enhancement System for a Hose-Shaped Rescue Robot.
J. Robotics Mechatronics, 2017

Generative Statistical Models with Self-Emergent Grammar of Chord Sequences.
CoRR, 2017

Infinite probabilistic latent component analysis for audio source separation.
Proceedings of the 27th IEEE International Workshop on Machine Learning for Signal Processing, 2017

Semi-Blind speech enhancement based on recurrent neural network for source separation and dereverberation.
Proceedings of the 27th IEEE International Workshop on Machine Learning for Signal Processing, 2017

Function- and Rhythm-Aware Melody Harmonization Based on Tree-Structured Parsing and Split-Merge Sampling of Chord Sequences.
Proceedings of the 18th International Society for Music Information Retrieval Conference, 2017

Scale- and Rhythm-Aware Musical Note Estimation for Vocal F0 Trajectories Based on a Semi-Tatum-Synchronous Hierarchical Hidden Semi-Markov Model.
Proceedings of the 18th International Society for Music Information Retrieval Conference, 2017

Bayesian multichannel nonnegative matrix factorization for audio source separation and localization.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Singing Voice Separation and Vocal F0 Estimation Based on Mutual Combination of Robust Principal Component Analysis and Subharmonic Summation.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

Sound-based online localization for an in-pipe snake robot.
Proceedings of the 2016 IEEE International Symposium on Safety, 2016

Parallel Speech Corpora of Japanese Dialects.
Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016

Student's t multichannel nonnegative matrix factorization for blind source separation.
Proceedings of the IEEE International Workshop on Acoustic Signal Enhancement, 2016

A Hierarchical Bayesian Model of Chords, Pitches, and Spectrograms for Multipitch Analysis.
Proceedings of the 17th International Society for Music Information Retrieval Conference, 2016

Musical Note Estimation for F0 Trajectories of Singing Voices Based on a Bayesian Semi-Beat-Synchronous HMM.
Proceedings of the 17th International Society for Music Information Retrieval Conference, 2016

Online simultaneous localization and mapping of multiple sound sources and asynchronous microphone arrays.
Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2016

Student's T nonnegative matrix factorization and positive semidefinite tensor factorization for single-channel audio source separation.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Rhythm transcription of MIDI performances based on hierarchical Bayesian modelling of repetition and modification of musical note patterns.
Proceedings of the 24th European Signal Processing Conference, 2016

A unified Bayesian model of time-frequency clustering and low-rank approximation for multi-channel source separation.
Proceedings of the 24th European Signal Processing Conference, 2016

Variational Bayesian multi-channel robust NMF for human-voice enhancement with a deformable and partially-occluded microphone array.
Proceedings of the 24th European Signal Processing Conference, 2016

Automatic Speech Recognition for Mixed Dialect Utterances by Mixing Dialect Language Models.
IEEE ACM Trans. Audio Speech Lang. Process., 2015

HMM-based Attacks on Google's ReCAPTCHA with Continuous Visual and Audio Symbols.
J. Inf. Process., 2015

Toward a quizmaster robot for speech-based multiparty interaction.
Adv. Robotics, 2015

Posture estimation of hose-shaped robot by using active microphone array.
Adv. Robotics, 2015

Unified inter- and intra-recording duration model for multiple music audio alignment.
Proceedings of the 2015 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2015

Human-voice enhancement based on online RPCA for a hose-shaped rescue robot with a microphone array.
Proceedings of the 2015 IEEE International Symposium on Safety, 2015

Identification and Localization of One or Two Concurrent Speakers in a Binaural Robotic Context.
Proceedings of the 2015 IEEE International Conference on Systems, 2015

Infinite Superimposed Discrete All-Pole Modeling for Multipitch Analysis of Wavelet Spectrograms.
Proceedings of the 16th International Society for Music Information Retrieval Conference, 2015

Optimizing the layout of multiple mobile robots for cooperative sound source separation.
Proceedings of the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2015

Audio-visual beat tracking based on a state-space model for a music robot dancing with humans.
Proceedings of the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2015

Microphone-accelerometer based 3D posture estimation for a hose-shaped rescue robot.
Proceedings of the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2015

Bayesian integration of sound source separation and speech recognition: a new approach to simultaneous speech recognition.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

A feedback framework for improved chord recognition based on NMF-based approximate note transcription.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Singing voice analysis and editing based on mutually dependent F0 estimation and source separation.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Challenges in deploying a microphone array to localize and separate sound sources in real auditory scenes.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Recognition of In-Field Frog Chorusing Using Bayesian Nonparametric Microphone Array Processing.
Proceedings of the Computational Sustainability, 2015

Nonparametric Bayesian dereverberation of power spectrograms based on infinite-order autoregressive processes.
IEEE ACM Trans. Audio Speech Lang. Process., 2014

A sound-based online method for estimating the time-varying posture of a hose-shaped robot.
Proceedings of the 2014 IEEE International Symposium on Safety, 2014

Sound annotation tool for multidirectional sounds based on spatial information extracted by HARK robot audition software.
Proceedings of the 2014 IEEE International Conference on Systems, Man, and Cybernetics, 2014

Bayesian Audio Alignment based on a Unified Model of Music Composition and Performance.
Proceedings of the 15th International Society for Music Information Retrieval Conference, 2014

Visualization of auditory awareness based on sound source positions estimated by depth sensor and microphone array.
Proceedings of the 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2014

Transferring Vocal Expression of F0 Contour Using Singing Voice Synthesizer.
Proceedings of the Modern Advances in Applied Intelligence, 2014

Parameter Estimation of Virtual Musical Instrument Synthesizers.
Proceedings of the Music Technology meets Philosophy, 2014

Automatic transcription of guitar tablature from audio signals in accordance with player's proficiency.
Proceedings of the IEEE International Conference on Acoustics, 2014

Transcribing vocal expression from polyphonic music.
Proceedings of the IEEE International Conference on Acoustics, 2014

A robot quizmaster that can localize, separate, and recognize simultaneous utterances for a fastest-voice-first quiz game.
Proceedings of the 14th IEEE-RAS International Conference on Humanoid Robots, 2014

Robust Multipitch Analyzer against Initialization based on Latent Harmonic Allocation using Overtone Corpus.
J. Inf. Process., 2013

Noise correlation matrix estimation for improving sound source localization by multirotor UAV.
Proceedings of the 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2013

Posture estimation of hose-shaped robot using microphone array localization.
Proceedings of the 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2013

Automatic estimation of dialect mixing ratio for dialect speech recognition.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Audio-based guitar tablature transcription using multipitch analysis and playability constraints.
Proceedings of the IEEE International Conference on Acoustics, 2013

Initialization-robust Bayesian multipitch analyzer based on psychoacoustical and musical criteria.
Proceedings of the IEEE International Conference on Acoustics, 2013

Multiple index combination for Japanese spoken term detection with optimum index selection based on OOV-region classifier.
Proceedings of the IEEE International Conference on Acoustics, 2013

Automated Violin Fingering Transcription Through Analysis of an Audio Recording.
Comput. Music. J., 2012

Bayesian Nonnegative Harmonic-Temporal Factorization and Its Application to Multipitch Analysis.
Proceedings of the 13th International Society for Music Information Retrieval Conference, 2012

Automatic Chord Recognition Based on Probabilistic Integration of Acoustic Features, Bass Sounds, and Chord Transition.
Proceedings of the Advanced Research in Applied Artificial Intelligence, 2012

Initialization-robust multipitch estimation based on latent harmonic allocation using overtone corpus.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

A musical mood trajectory estimation method using lyrics and acoustic features.
Proceedings of the 1st international ACM workshop on Music information retrieval with user-centered and multimodal strategies, Scottsdale, AZ, USA, November 28, 2011

Simultaneous processing of sound source separation and musical instrument identification using Bayesian spectral modeling.
Proceedings of the IEEE International Conference on Acoustics, 2011

Violin Fingering Estimation Based on Violin Pedagogical Fingering Model Constrained by Bowed Sequence Estimation from Audio Input.
Proceedings of the Trends in Applied Intelligent Systems, 2010

Parameter Estimation for Harmonic and Inharmonic Models by Using Timbre Feature Distributions.
J. Inf. Process., 2009

Changing timbre and phrase in existing musical performances as you like: manipulations of single part using harmonic and inharmonic models.
Proceedings of the 17th International Conference on Multimedia 2009, 2009

Bowed String Sequence Estimation of a Violin Based on Adaptive Audio Signal Classification and Context-Dependent Error Correction.
Proceedings of the 11th IEEE International Symposium on Multimedia, 2009

Automatic Chord Recognition Based on Probabilistic Integration of Chord Transition and Bass Pitch Estimation.
Proceedings of the ISMIR 2008, 2008

Instrument Equalizer for Query-by-Example Retrieval: Improving Sound Source Separation Based on Integrated Harmonic and Inharmonic Models.
Proceedings of the ISMIR 2008, 2008

Integration and Adaptation of Harmonic and Inharmonic Models for Separating Polyphonic Musical Signals.
Proceedings of the IEEE International Conference on Acoustics, 2007

Automatic Feature Weighting in Automatic Transcription of Specified Part in Polyphonic Music.
Proceedings of the ISMIR 2006, 2006
