Hirokazu Kameoka

Orcid: 0000-0003-3102-0162

According to our database, Hirokazu Kameoka authored at least 193 papers between 2004 and 2024.

Bibliography

2024
VoiceGrad: Non-Parallel Any-to-Many Voice Conversion With Annealed Langevin Dynamics.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

FastVoiceGrad: One-step Diffusion-Based Voice Conversion with Adversarial Conditional Diffusion Distillation.
CoRR, 2024

GE2E-AC: Generalized End-to-End Loss Training for Accent Classification.
CoRR, 2024

Selecting N-Lowest Scores for Training MOS Prediction Models.
Proceedings of the IEEE International Conference on Acoustics, 2024

Training Generative Adversarial Network-Based Vocoder with Limited Data Using Augmentation-Conditional Discriminator.
Proceedings of the IEEE International Conference on Acoustics, 2024

Learning to Assess Subjective Impressions from Speech.
Proceedings of the 32nd European Signal Processing Conference, 2024

2023
FastMVAE2: On Improving and Accelerating the Fast Variational Autoencoder-Based Source Separation Algorithm for Determined Mixtures.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Non-Parallel Whisper-to-Normal Speaking Style Conversion Using Auxiliary Classifier Variational Autoencoder.
IEEE Access, 2023

PRVAE-VC: Non-Parallel Many-to-Many Voice Conversion with Perturbation-Resistant Variational Autoencoder.
Proceedings of the 12th ISCA Speech Synthesis Workshop, 2023

CFVC: Conditional Filtering for Controllable Voice Conversion.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

iSTFTNet2: Faster and More Lightweight iSTFT-Based Neural Vocoder Using 1D-2D CNN.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

JSV-VC: Jointly Trained Speaker Verification and Voice Conversion Models.
Proceedings of the IEEE International Conference on Acoustics, 2023

Wave-U-Net Discriminator: Fast and Lightweight Discriminator for Generative Adversarial Network-Based Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2023

W2N-AVSC: Audiovisual Extension For Whisper-To-Normal Speech Conversion.
Proceedings of the 31st European Signal Processing Conference, 2023

DisC-VC: Disentangled and F0-Controllable Neural Voice Conversion.
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

Multiple Sound Source Tracking Based on Generative Modeling and Recursive Bayesian Filtering of Spatial Gradient Spectra.
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

2022
Speak Like a Dog: Human to Non-human creature Voice Conversion.
CoRR, 2022

Application of non-negative matrix factorization in oncology: one approach for establishing precision medicine.
Briefings Bioinform., 2022

Distilling Sequence-to-Sequence Voice Conversion Models for Streaming Conversion Applications.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

MISRNet: Lightweight Neural Vocoder Using Multi-Input Single Shared Residual Blocks.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

CAUSE: Crossmodal Action Unit Sequence Estimation from Speech.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Investigation And Comparison of Optimization Methods for Variational Autoencoder-Based Underdetermined Multichannel Source Separation.
Proceedings of the IEEE International Conference on Acoustics, 2022

HBP: An Efficient Block Permutation Solver Using Hungarian Algorithm and Spectrogram Inpainting for Multichannel Audio Source Separation.
Proceedings of the IEEE International Conference on Acoustics, 2022

iSTFTNet: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating Inverse Short-Time Fourier Transform.
Proceedings of the IEEE International Conference on Acoustics, 2022

AttentionPIT: Soft Permutation Invariant Training for Audio Source Separation with Attention Mechanism.
Proceedings of the IEEE International Conference on Acoustics, 2022

Multiple Sound Source Localization Based on Stochastic Modeling of Spatial Gradient Spectra.
Proceedings of the 30th European Signal Processing Conference, 2022

2021
Harmonic-Temporal Factor Decomposition for Unsupervised Monaural Separation of Harmonic Sounds.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Many-to-Many Voice Transformer Network.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Pretraining Techniques for Sequence-to-Sequence Voice Conversion.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

X-DC: Explainable Deep Clustering Based on Learnable Spectrogram Templates.
Neural Comput., 2021

FastS2S-VC: Streaming Non-Autoregressive Sequence-to-Sequence Voice Conversion.
CoRR, 2021

StarGAN-VC+ASR: StarGAN-Based Non-Parallel Voice Conversion Regularized by Automatic Speech Recognition.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

MaskCycleGAN-VC: Learning Non-Parallel Voice Conversion with Filling in Frames.
Proceedings of the IEEE International Conference on Acoustics, 2021

SepNet: A Deep Separation Matrix Prediction Network for Multichannel Audio Source Separation.
Proceedings of the IEEE International Conference on Acoustics, 2021

StarGAN-based Emotional Voice Conversion for Japanese Phrases.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

2020
ConvS2S-VC: Fully Convolutional Sequence-to-Sequence Voice Conversion.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Nonparallel Voice Conversion With Augmented Classifier Star Generative Adversarial Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Majorization-Minimization Algorithm for Discriminative Non-Negative Matrix Factorization.
IEEE Access, 2020

FastMVAE: A Fast Optimization Algorithm for the Multichannel Variational Autoencoder Method.
IEEE Access, 2020

Determined Audio Source Separation with Multichannel Star Generative Adversarial Network.
Proceedings of the 30th IEEE International Workshop on Machine Learning for Signal Processing, 2020

CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-Spectrogram Conversion.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Voice Transformer Network: Sequence-to-Sequence Voice Conversion Using Transformer with Text-to-Speech Pretraining.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Phoneme Embeddings on Predicting Fundamental Frequency Pattern for Electrolaryngeal Speech.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

2019
ACVAE-VC: Non-Parallel Voice Conversion With Auxiliary Classifier Variational Autoencoder.
IEEE ACM Trans. Audio Speech Lang. Process., 2019

Supervised Determined Source Separation with Multichannel Variational Autoencoder.
Neural Comput., 2019

Independent Low-Rank Matrix Analysis Based on Generalized Kullback-Leibler Divergence.
IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2019

The ASVspoof 2019 database.
CoRR, 2019

Crossmodal Voice Conversion.
CoRR, 2019

WaveCycleGAN2: Time-domain Neural Post-filter for Speech Waveform Generation.
CoRR, 2019

Training a Neural Speech Waveform Model using Spectral Losses of Short-Time Fourier Transform and Continuous Wavelet Transform.
CoRR, 2019

Underdetermined Source Separation Based on Generalized Multichannel Variational Autoencoder.
IEEE Access, 2019

An Investigation of Features for Fundamental Frequency Pattern Prediction in Electrolaryngeal Speech Enhancement.
Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

A Modified Algorithm for Multiple Input Spectrogram Inversion.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

StarGAN-VC2: Rethinking Conditional Methods for StarGAN-Based Voice Conversion.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

ATTS2S-VC: Sequence-to-sequence Voice Conversion with Attention and Context Preservation Mechanisms.
Proceedings of the IEEE International Conference on Acoustics, 2019

Fast MVAE: Joint Separation and Classification of Mixed Sources Based on Multichannel Variational Autoencoder with Auxiliary Classifier.
Proceedings of the IEEE International Conference on Acoustics, 2019

CycleGAN-VC2: Improved CycleGAN-based Non-parallel Voice Conversion.
Proceedings of the IEEE International Conference on Acoustics, 2019

Seeing through Sounds: Predicting Visual Semantic Segmentation Results from Multichannel Audio Signals.
Proceedings of the IEEE International Conference on Acoustics, 2019

Joint Separation and Dereverberation of Reverberant Mixtures with Multichannel Variational Autoencoder.
Proceedings of the IEEE International Conference on Acoustics, 2019

Generalized Multichannel Variational Autoencoder for Underdetermined Source Separation.
Proceedings of the 27th European Signal Processing Conference, 2019

2018
Nonnegative Matrix Factorization With Basis Clustering Using Cepstral Distance Regularization.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

ConvS2S-VC: Fully convolutional sequence-to-sequence voice conversion.
CoRR, 2018

WaveCycleGAN: Synthetic-to-natural speech waveform conversion using cycle-consistent adversarial networks.
CoRR, 2018

ACVAE-VC: Non-parallel many-to-many voice conversion with auxiliary classifier variational autoencoder.
CoRR, 2018

Semi-blind source separation with multichannel variational autoencoder.
CoRR, 2018

StarGAN-VC: Non-parallel many-to-many voice conversion with star generative adversarial networks.
CoRR, 2018

Generative adversarial network-based approach to signal reconstruction from magnitude spectrograms.
CoRR, 2018

Synthetic-to-Natural Speech Waveform Conversion Using Cycle-Consistent Adversarial Networks.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

StarGAN-VC: non-parallel many-to-many Voice Conversion Using Star Generative Adversarial Networks.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

VAE-SPACE: Deep Generative Model of Voice Fundamental Frequency Contours.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Deep Clustering with Gated Convolutional Networks.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Joint Separation and Dereverberation of Reverberant Mixtures with Determined Multichannel Non-Negative Matrix Factorization.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Speech Waveform Synthesis from MFCC Sequences with Generative Adversarial Networks.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Generative adversarial network-based approach to signal reconstruction from magnitude spectrogram.
Proceedings of the 26th European Signal Processing Conference, 2018

CycleGAN-VC: Non-parallel Voice Conversion Using Cycle-Consistent Adversarial Networks.
Proceedings of the 26th European Signal Processing Conference, 2018

Automatic Speech Pronunciation Correction with Dynamic Frequency Warping-Based Spectral Conversion.
Proceedings of the 26th European Signal Processing Conference, 2018

2017
Parallel-Data-Free Voice Conversion Using Cycle-Consistent Adversarial Networks.
CoRR, 2017

Missing component restoration for masked speech signals based on time-domain spectrogram factorization.
Proceedings of the 27th IEEE International Workshop on Machine Learning for Signal Processing, 2017

Mel-Generalized cepstral regularization for discriminative non-negative matrix factorization.
Proceedings of the 27th IEEE International Workshop on Machine Learning for Signal Processing, 2017

Physically Constrained Statistical F0 Prediction for Electrolaryngeal Speech Enhancement.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Direct Modeling of Frequency Spectra and Waveform Generation Based on Phase Recovery for DNN-Based Speech Synthesis.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Speech Enhancement Using Non-Negative Spectrogram Models with Mel-Generalized Cepstral Regularization.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Generative Adversarial Network-Based Postfilter for STFT Spectrograms.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Sequence-to-Sequence Voice Conversion with Similarity Metric Learned Using Generative Adversarial Networks.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

DNN-SPACE: DNN-HMM-Based Generative Model of Voice F0 Contours for Statistical Phrase/Accent Command Estimation.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

A noise suppression method for body-conducted soft speech based on non-negative tensor factorization of air- and body-conducted signals.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Fast algorithm for statistical phrase/accent command estimation based on generative model incorporating spectral features.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Generative adversarial network-based postfilter for statistical parametric speech synthesis.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Complex NMF with the generalized Kullback-Leibler divergence.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

A majorization-minimization algorithm with projected gradient updates for time-domain spectrogram factorization.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Discriminative non-negative matrix factorization with majorization-minimization.
Proceedings of the Hands-free Speech Communications and Microphone Arrays, 2017

Deep acoustic-to-articulatory inversion mapping with latent trajectory modeling.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

Non-native speech conversion with consistency-aware recursive network and generative adversarial network.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

2016
Determined Blind Source Separation Unifying Independent Vector Analysis and Nonnegative Matrix Factorization.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

Acoustic-to-Articulatory Inversion Mapping Based on Latent Trajectory Gaussian Mixture Model.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Semi-Supervised Joint Enhancement of Spectral and Cepstral Sequences of Noisy Speech.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Majorisation-Minimisation Based Optimisation of the Composite Autoregressive System with Application to Glottal Inverse Filtering.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Statistical F0 prediction for electrolaryngeal speech enhancement considering generative process of F0 contours within product of experts framework.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Shifted and convolutive source-filter non-negative matrix factorization for monaural audio source separation.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Sparse sound field decomposition with multichannel extension of complex NMF.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Reverberation-robust underdetermined source separation with non-negative tensor double deconvolution.
Proceedings of the 24th European Signal Processing Conference, 2016

Self-localization and channel synchronization of smartphone arrays using sound emissions.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

Non-negative periodic component analysis for music source separation.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

2015
Optimal Coding of Generalized-Gaussian-Distributed Frequency Spectra for Low-Delay Audio Coder With Powered All-Pole Spectrum Estimation.
IEEE ACM Trans. Audio Speech Lang. Process., 2015

Resolution Warped Spectral Representation for Low-Delay and Low-Bit-Rate Audio Coder.
IEEE ACM Trans. Audio Speech Lang. Process., 2015

Multichannel Signal Separation Combining Directional Clustering and Nonnegative Matrix Factorization with Spectrogram Restoration.
IEEE ACM Trans. Audio Speech Lang. Process., 2015

Generative Modeling of Voice Fundamental Frequency Contours.
IEEE ACM Trans. Audio Speech Lang. Process., 2015

Modeling speech parameter sequences with latent trajectory Hidden Markov model.
Proceedings of the 25th IEEE International Workshop on Machine Learning for Signal Processing, 2015

Lp-norm non-negative matrix factorization and its application to singing voice enhancement.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Efficient multichannel nonnegative matrix factorization exploiting rank-1 spatial model.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Multi-resolution signal decomposition with time-domain spectrogram factorization.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Relaxation of rank-1 spatial constraint in overdetermined blind source separation.
Proceedings of the 23rd European Signal Processing Conference, 2015

Unified approach for audio source separation with multichannel factorial HMM and DOA mixture model.
Proceedings of the 23rd European Signal Processing Conference, 2015

2014
Harmonic/percussive sound separation based on anisotropic smoothness of spectrograms.
IEEE ACM Trans. Audio Speech Lang. Process., 2014

Maximum reconstruction probability training of Restricted Boltzmann machines with auxiliary function approach.
Proceedings of the IEEE International Workshop on Machine Learning for Signal Processing, 2014

Training Restricted Boltzmann Machines with auxiliary function approach.
Proceedings of the IEEE International Workshop on Machine Learning for Signal Processing, 2014

Joint audio source separation and dereverberation based on multichannel factorial hidden Markov model.
Proceedings of the IEEE International Workshop on Machine Learning for Signal Processing, 2014

Harmonic-Temporal Factor Decomposition Incorporating Music Prior Information for Informed Monaural Source Separation.
Proceedings of the 15th International Society for Music Information Retrieval Conference, 2014

Speech prosody generation for text-to-speech synthesis based on generative model of F0 contours.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

A unified approach for underdetermined blind signal separation and source activity detection by multichannel factorial hidden Markov models.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Mixture of Gaussian process experts for predicting sung melodic contour with expressive dynamic fluctuations.
Proceedings of the IEEE International Conference on Acoustics, 2014

Mondrian hidden Markov model for music signal processing.
Proceedings of the IEEE International Conference on Acoustics, 2014

Timbre replacement of harmonic and drum components for music audio signals.
Proceedings of the IEEE International Conference on Acoustics, 2014

Underdetermined blind separation and tracking of moving sources based on DOA-HMM.
Proceedings of the IEEE International Conference on Acoustics, 2014

Divergence optimization in nonnegative matrix factorization with spectrogram restoration for multichannel signal separation.
Proceedings of the 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays, 2014

Golomb-rice coding optimized via LPC for frequency domain audio coder.
Proceedings of the 2014 IEEE Global Conference on Signal and Information Processing, 2014

Unified approach for underdetermined BSS, VAD, dereverberation and DOA estimation with multichannel factorial HMM.
Proceedings of the 2014 IEEE Global Conference on Signal and Information Processing, 2014

Direct linear conversion of LSP parameters for perceptual control in speech and audio coding.
Proceedings of the 22nd European Signal Processing Conference, 2014

Representation of spectral envelope with warped frequency resolution for audio coder.
Proceedings of the 22nd European Signal Processing Conference, 2014

Fast Signal Reconstruction from Magnitude Spectrogram of Continuous Wavelet Transform Based on Spectrogram Consistency.
Proceedings of the 17th International Conference on Digital Audio Effects, 2014

Hybrid multichannel signal separation using supervised nonnegative matrix factorization with spectrogram restoration.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

2013
Multichannel Extensions of Non-Negative Matrix Factorization With Complex-Valued Data.
IEEE Trans. Speech Audio Process., 2013

Input-Output HMM Applied to Automatic Arrangement for Guitars.
J. Inf. Process., 2013

Designing Various Multivariate Analysis at Will via Generalized Pairwise Expression.
Inf. Media Technol., 2013

SemiCCA: Efficient Semi-supervised Learning of Canonical Correlations.
Inf. Media Technol., 2013

Bayesian Nonparametric Approach to Blind Separation of Infinitely Many Sparse Sources.
IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2013

Text-to-speech synthesizer based on combination of composite wavelet and hidden Markov models.
Proceedings of the Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, 2013

Generative modeling of speech F0 contours.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Probabilistic speech F0 contour model incorporating statistical vocabulary model of phrase-accent command sequence.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Bayesian semi-supervised audio event transcription based on Markov Indian buffet process.
Proceedings of the IEEE International Conference on Acoustics, 2013

Probabilistic model of two-dimensional rhythm tree structure representation for automatic transcription of polyphonic MIDI signals.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013

2012
Blind Separation of Infinitely Many Sparse Sources.
Proceedings of the IWAENC 2012 - International Workshop on Acoustic Signal Enhancement, Proceedings, RWTH Aachen University, Germany, September 4th, 2012

Context-free 2D Tree Structure Model of Musical Notes for Bayesian Modeling of Polyphonic Spectrograms.
Proceedings of the 13th International Society for Music Information Retrieval Conference, 2012

Hidden Markov Convolutive Mixture Model for Pitch Contour Analysis of Speech.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

A Stochastic Model of Singing Voice F0 Contours for Characterizing Expressive Dynamic Components.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Designing various component analysis at will.
Proceedings of the 21st International Conference on Pattern Recognition, 2012

Comparative evaluations of various harmonic/percussive sound separation algorithms based on anisotropic continuity of spectrogram.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Efficient algorithms for multichannel extensions of Itakura-Saito nonnegative matrix factorization.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Explicit beat structure modeling for non-negative matrix factorization-based multipitch analysis.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Bayesian nonparametric music parser.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Constrained and regularized variants of non-negative matrix factorization incorporating music-specific constraints.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

2011
Computational auditory induction as a missing-data model-fitting problem with Bregman divergence.
Speech Commun., 2011

New formulations and efficient algorithms for multichannel NMF.
Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2011

Bayesian nonparametric spectrogram modeling based on infinite factorial infinite hidden Markov model.
Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2011

I-Divergence-based dereverberation method with auxiliary function approach.
Proceedings of the IEEE International Conference on Acoustics, 2011

Automatic audio tag classification via semi-supervised canonical density estimation.
Proceedings of the IEEE International Conference on Acoustics, 2011

Formulations and algorithms for multichannel complex NMF.
Proceedings of the IEEE International Conference on Acoustics, 2011

Infinite-state spectrum model for music signal analysis.
Proceedings of the IEEE International Conference on Acoustics, 2011

Automatic video annotation via Hierarchical Topic Trajectory Model considering cross-modal correlations.
Proceedings of the IEEE International Conference on Acoustics, 2011

2010
Harmonic and Percussive Sound Separation and Its Application to MIR-Related Tasks.
Proceedings of the Advances in Music Information Retrieval, 2010

Speech Spectrum Modeling for Joint Estimation of Spectral Envelope and Fundamental Frequency.
IEEE Trans. Speech Audio Process., 2010

Semantic Indexing and Known Item Search Based on a Unified Model with Topic Transition Representation.
Proceedings of the TRECVID 2010 workshop participants notebook papers, 2010

Statistical modeling of F0 dynamics in singing voices based on Gaussian processes with multiple oscillation bases.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

A statistical model of speech F0 contours.
Proceedings of the ISCA Workshop on Statistical And Perceptual Audition, 2010

A sparse component model of source signals and its application to blind source separation.
Proceedings of the IEEE International Conference on Acoustics, 2010

Consistent Wiener Filtering: Generalized Time-Frequency Masking Respecting Spectrogram Consistency.
Proceedings of the Latent Variable Analysis and Signal Separation, 2010

Nonnegative Matrix Factorization with Markov-Chained Bases for Modeling Time-Varying Patterns in Music Spectrograms.
Proceedings of the Latent Variable Analysis and Signal Separation, 2010

Statistical Model of Speech Signals Based on Composite Autoregressive System with Application to Blind Source Separation.
Proceedings of the Latent Variable Analysis and Signal Separation, 2010

2009
Statistical models for speech dereverberation.
Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2009

Automatic Identification for Singing Style based on Sung Melodic Contour Characterized in Phase Plane.
Proceedings of the 10th International Society for Music Information Retrieval Conference, 2009

Composite Autoregressive System for Sparse Source-filter Representation of speech.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2009), 2009

Complex NMF: A new sparse representation for acoustic signals.
Proceedings of the IEEE International Conference on Acoustics, 2009

Robust speech dereverberation based on non-negativity and sparse nature of speech spectrograms.
Proceedings of the IEEE International Conference on Acoustics, 2009

2008
Specmurt Analysis of Polyphonic Music Signals.
IEEE Trans. Speech Audio Process., 2008

A Real-time Equalizer of Harmonic and Percussive Components in Music Signals.
Proceedings of the ISMIR 2008, 2008

Computational auditory induction by missing-data non-negative matrix factorization.
Proceedings of the ISCA Tutorial and Research Workshop on Statistical and Perceptual Audition, 2008

Parameter estimation method of F0 control model for singing voices.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Modulation analysis of speech through orthogonal FIR filterbank optimization.
Proceedings of the IEEE International Conference on Acoustics, 2008

Harmonic-Temporal-Timbral Clustering (HTTC) for the analysis of multi-instrument polyphonic music signals.
Proceedings of the IEEE International Conference on Acoustics, 2008

Auxiliary function approach to parameter estimation of constrained sinusoidal model for monaural speech separation.
Proceedings of the IEEE International Conference on Acoustics, 2008

Separation of a monaural audio signal into harmonic/percussive components by complementary diffusion on spectrogram.
Proceedings of the 2008 16th European Signal Processing Conference, 2008

2007
Single and Multiple F0 Contour Estimation Through Parametric Spectrogram Modeling of Speech in Noisy Environments.
IEEE Trans. Speech Audio Process., 2007

A Multipitch Analyzer Based on Harmonic Temporal Structured Clustering.
IEEE Trans. Speech Audio Process., 2007

Automatic Decision of Piano Fingering Based on a Hidden Markov Models.
Proceedings of the IJCAI 2007, 2007

Harmonic-Temporal Clustering of Speech for Single and Multiple F0 Contour Estimation in Noisy Environments.
Proceedings of the IEEE International Conference on Acoustics, 2007

Probabilistic Approach to Automatic Music Transcription from Audio Signals.
Proceedings of the IEEE International Conference on Acoustics, 2007

2006
Speech analyzer using a joint estimation model of spectral envelope and fine structure.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

2005
Specmurt Analysis of Multi-Pitch Music Signals with Adaptive Estimation of Common Harmonic Structure.
Proceedings of the ISMIR 2005, 2005

Harmonic-Temporal Clustering via Deterministic Annealing EM Algorithm for Audio Feature Extraction.
Proceedings of the ISMIR 2005, 2005

Audio stream segregation of multi-pitch music signal based on time-space clustering using Gaussian kernel 2-dimensional model.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

2004
Specmurt anasylis: a piano-roll-visualization of polyphonic music signal by deconvolution of log-frequency spectrum.
Proceedings of the ISCA Tutorial and Research Workshop on Statistical and Perceptual Audio Processing, 2004

Multi-pitch trajectory estimation of concurrent speech based on harmonic GMM and nonlinear kalman filtering.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Separation of harmonic structures based on tied Gaussian mixture model and information criterion for concurrent sounds.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

