Takuya Yoshioka

Orcid: 0009-0003-7791-3545

According to our database, Takuya Yoshioka authored at least 159 papers between 2004 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Target conversation extraction: Source separation using turn-taking dynamics.
CoRR, 2024

Knowledge boosting during low-latency inference.
CoRR, 2024

Anatomy of Industrial Scale Multilingual ASR.
CoRR, 2024

i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

Diarist: Streaming Speech Translation with Speaker Diarization.
Proceedings of the IEEE International Conference on Acoustics, 2024

T-SOT FNT: Streaming Multi-Talker ASR with Text-Only Domain Adaptation Capability.
Proceedings of the IEEE International Conference on Acoustics, 2024

Profile-Error-Tolerant Target-Speaker Voice Activity Detection.
Proceedings of the IEEE International Conference on Acoustics, 2024

Look Once to Hear: Target Speech Hearing with Noisy Examples.
Proceedings of the CHI Conference on Human Factors in Computing Systems, 2024

2023
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer.
CoRR, 2023

i-Code Studio: A Configurable and Composable Framework for Integrative AI.
CoRR, 2023

i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data.
CoRR, 2023

Semantic Hearing: Programming Acoustic Scenes with Binaural Hearables.
Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, 2023

Experimental Demonstration of Fermionic QAOA with One-Dimensional Cyclic Driver Hamiltonian.
Proceedings of the IEEE International Conference on Quantum Computing and Engineering, 2023

Speaker Diarization for ASR Output with T-vectors: A Sequence Classification Approach.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Adapting Multi-Lingual ASR Models for Handling Multiple Talkers.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Factual Consistency Oriented Speech Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Real-Time Joint Personalized Speech Enhancement and Acoustic Echo Cancellation.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Simulating Realistic Speech Overlaps Improves Multi-Talker ASR.
Proceedings of the IEEE International Conference on Acoustics, 2023

Target Speaker Voice Activity Detection with Transformers and Its Integration with End-To-End Neural Diarization.
Proceedings of the IEEE International Conference on Acoustics, 2023

DATA2VEC-SG: Improving Self-Supervised Learning Representations for Speech Generation Tasks.
Proceedings of the IEEE International Conference on Acoustics, 2023

Real-Time Target Sound Extraction.
Proceedings of the IEEE International Conference on Acoustics, 2023

Breaking the Trade-Off in Personalized Speech Enhancement With Cross-Task Knowledge Distillation.
Proceedings of the IEEE International Conference on Acoustics, 2023

Target Sound Extraction with Variable Cross-Modality Clues.
Proceedings of the IEEE International Conference on Acoustics, 2023

VarArray Meets T-SOT: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023

Self-Supervised Learning with Bi-Label Masked Speech Prediction for Streaming Multi-Talker Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023

Speech Separation with Large-Scale Self-Supervised Learning.
Proceedings of the IEEE International Conference on Acoustics, 2023

i-Code: An Integrative and Composable Multimodal Learning Framework.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing.
IEEE J. Sel. Top. Signal Process., 2022

Breaking trade-offs in speech separation with sparsely-gated mixture of experts.
CoRR, 2022

Real-Time Joint Personalized Speech Enhancement and Acoustic Echo Cancellation with E3Net.
CoRR, 2022

Target Speaker Voice Activity Detection with Transformers and Its Integration with End-to-End Neural Diarization.
CoRR, 2022

Exploring WavLM on Speech Enhancement.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Separating Long-Form Speech with Group-wise Permutation Invariant Training.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Leveraging Real Conversational Data for Multi-Channel Continuous Speech Separation.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Fast Real-time Personalized Speech Enhancement: End-to-End Enhancement Network (E3Net) and Knowledge Distillation.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Streaming Multi-Talker ASR with Token-Level Serialized Output Training.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

All-Neural Beamformer for Continuous Speech Separation.
Proceedings of the IEEE International Conference on Acoustics, 2022

Continuous Speech Separation with Recurrent Selective Attention Network.
Proceedings of the IEEE International Conference on Acoustics, 2022

VarArray: Array-Geometry-Agnostic Continuous Speech Separation.
Proceedings of the IEEE International Conference on Acoustics, 2022

PickNet: Real-Time Channel Selection for Ad Hoc Microphone Arrays.
Proceedings of the IEEE International Conference on Acoustics, 2022

Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction.
Proceedings of the IEEE International Conference on Acoustics, 2022

One Model to Enhance Them All: Array Geometry Agnostic Multi-Channel Personalized Speech Enhancement.
Proceedings of the IEEE International Conference on Acoustics, 2022

Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers Using End-to-End Speaker-Attributed ASR.
Proceedings of the IEEE International Conference on Acoustics, 2022

Personalized Speech Enhancement: New Models and Comprehensive Evaluation.
Proceedings of the IEEE International Conference on Acoustics, 2022

ICASSP 2022 Deep Noise Suppression Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing.
CoRR, 2021

Exploring End-to-End Multi-Channel ASR with Bias Information for Meeting Transcription.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Integration of Speech Separation, Diarization, and Recognition for Multi-Speaker Meetings: System Description, Comparison, and Analysis.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Dual-Path RNN for Long Recording Speech Separation.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Investigation of End-to-End Speaker-Attributed ASR for Continuous Multi-Talker Recordings.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Investigation of Practical Aspects of Single Channel Speech Separation for ASR.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

End-to-End Speaker-Attributed ASR with Transformer.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Human Listening and Live Captioning: Multi-Task Training for Speech Enhancement.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Ultra Fast Speech Separation Model with Teacher Student Learning.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Microsoft Speaker Diarization System for the Voxceleb Speaker Recognition Challenge 2020.
Proceedings of the IEEE International Conference on Acoustics, 2021

Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR.
Proceedings of the IEEE International Conference on Acoustics, 2021

Continuous Speech Separation with Conformer.
Proceedings of the IEEE International Conference on Acoustics, 2021

Don't Shoot Butterfly with Rifles: Multi-Channel Continuous Speech Separation with Early Exit Transformer.
Proceedings of the IEEE International Conference on Acoustics, 2021

Hypothesis Stitcher for End-to-End Speaker-Attributed ASR on Long-Form Multi-Talker Recordings.
Proceedings of the IEEE International Conference on Acoustics, 2021

Continuous Speech Separation with Ad Hoc Microphone Arrays.
Proceedings of the 29th European Signal Processing Conference, 2021

A Comparative Study of Modular and Joint Approaches for Speaker-Attributed ASR on Monaural Long-Form Audio.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020
Don't shoot butterfly with rifles: Multi-channel Continuous Speech Separation with Early Exit Transformer.
CoRR, 2020

Continuous speech separation: dataset and analysis.
CoRR, 2020

An End-to-End Architecture of Online Multi-Channel Speech Separation.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Neural Speech Separation Using Spatially Distributed Microphones.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Serialized Output Training for End-to-End Overlapped Speech Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of any Number of Speakers.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Dual-Path RNN: Efficient Long Sequence Modeling for Time-Domain Single-Channel Speech Separation.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

End-to-end Microphone Permutation and Number Invariant Multi-channel Speech Separation.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Continuous Speech Separation: Dataset and Analysis.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019
Meeting Transcription Using Virtual Microphone Arrays.
CoRR, 2019

Meeting Transcription Using Asynchronous Distant Microphones.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Low-latency Speaker-independent Continuous Speech Separation.
Proceedings of the IEEE International Conference on Acoustics, 2019

Single-channel Speech Extraction Using Speaker Inventory and Attention Network.
Proceedings of the IEEE International Conference on Acoustics, 2019


Speech Separation Using Speaker Inventory.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

DOVER: A Method for Combining Diarization Outputs.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018
Multi-Channel Overlapped Speech Recognition with Location Guided Speech Extraction Network.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Recognizing Overlapped Speech in Meetings: A Multichannel Separation Approach Using Neural Networks.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Investigations on Data Augmentation and Loss Functions for Deep Learning Based Speech-Background Separation.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Multi-Microphone Neural Speech Separation for Far-Field Multi-Talker Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Efficient Integration of Fixed Beamformers and Speech Separation Networks for Multi-Channel Far-Field Speech Separation.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Exploring Practical Aspects of Neural Mask-Based Beamforming for Far-Field Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017
Online MVDR Beamformer Based on Complex Gaussian Mixture Model With Spatial Prior for Noise Robust ASR.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

Unsupervised utterance-wise beamformer estimation with speech recognition-level criterion.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Online meeting recognition in noisy environments with time-frequency mask based MVDR beamforming.
Proceedings of the Hands-free Speech Communications and Microphone Arrays, 2017

Cracking the cocktail party problem by multi-beam deep attractor network.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

The REVERB Challenge: A Benchmark Task for Reverberation-Robust ASR Techniques.
Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017

Multichannel Speech Enhancement Approaches to DNN-Based Far-Field Speech Recognition.
Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017

2016
A summary of the REVERB challenge: state-of-the-art and remaining challenges in reverberant speech processing research.
EURASIP J. Adv. Signal Process., 2016

Sparseness-based multichannel nonnegative matrix factorization for blind source separation.
Proceedings of the IEEE International Workshop on Acoustic Signal Enhancement, 2016

Robust Example Search Using Bottleneck Features for Example-Based Speech Enhancement.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Optimization of Speech Enhancement Front-End with Speech Recognition-Level Criterion.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Context Adaptive Neural Network for Rapid Adaptation of Deep CNN Based Acoustic Models.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Noise robust speech recognition using recent developments in neural networks for computer vision.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Robust MVDR beamforming using time-frequency masks for online/offline ASR in noise.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Context adaptive deep neural networks for fast acoustic model adaptation in noisy conditions.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015
Strategies for distant speech recognition in reverberant environments.
EURASIP J. Adv. Signal Process., 2015

Environmentally robust ASR front-end for deep neural network acoustic models.
Comput. Speech Lang., 2015

Robust i-vector extraction for neural network adaptation in noisy environment.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Far-field speech recognition using CNN-DNN-HMM with convolution in time.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

The NTT CHiME-3 system: Advances in speech enhancement and recognition for mobile multi-microphone devices.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

2014
Multichannel sound source dereverberation and separation for arbitrary number of sources based on Bayesian nonparametrics.
IEEE ACM Trans. Audio Speech Lang. Process., 2014

Relaxed disjointness based clustering for joint blind source separation and dereverberation.
Proceedings of the 14th International Workshop on Acoustic Signal Enhancement, 2014

Investigation of unsupervised adaptation of DNN acoustic models with filter bank input.
Proceedings of the IEEE International Conference on Acoustics, 2014

Impact of single-microphone dereverberation on DNN-based meeting transcription systems.
Proceedings of the IEEE International Conference on Acoustics, 2014

Defeating reverberation: Advanced dereverberation and recognition techniques for hands-free speech recognition.
Proceedings of the 2014 IEEE Global Conference on Signal and Information Processing, 2014

2013
Noise Model Transfer: Novel Approach to Robustness Against Nonstationary Noise.
IEEE Trans. Speech Audio Process., 2013

Feature Enhancement With Joint Use of Consecutive Corrupted and Noise Feature Vectors With Discriminative Region Weighting.
IEEE Trans. Speech Audio Process., 2013

Dominance Based Integration of Spatial and Spectral Features for Speech Enhancement.
IEEE ACM Trans. Audio Speech Lang. Process., 2013

Speech recognition in living rooms: Integrated speech enhancement and recognition system based on spatial, spectral and temporal modeling of sounds.
Comput. Speech Lang., 2013

The REVERB challenge: A common evaluation framework for dereverberation and recognition of reverberant speech.
Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2013

Conditional emission densities for combining speech enhancement and recognition systems.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Formulation of the REMOS concept from an uncertainty decoding perspective.
Proceedings of the 18th International Conference on Digital Signal Processing, 2013

Noise model transfer using affine transformation with application to large vocabulary reverberant speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2013

Coupling beamforming with spatial and spectral feature based spectral enhancement and its application to meeting recognition.
Proceedings of the IEEE International Conference on Acoustics, 2013

Dereverberation for reverberation-robust microphone arrays.
Proceedings of the 21st European Signal Processing Conference, 2013

2012
Generalization of Multi-Channel Linear Prediction Methods for Blind MIMO Impulse Response Shortening.
IEEE Trans. Speech Audio Process., 2012

Low-Latency Real-Time Meeting Recognition and Understanding Using Distant Microphones and Omni-Directional Camera.
IEEE Trans. Speech Audio Process., 2012

Making Machines Understand Us in Reverberant Rooms: Robustness Against Reverberation for Automatic Speech Recognition.
IEEE Signal Process. Mag., 2012

Noise Power Spectral Density Tracking: A Maximum Likelihood Perspective.
IEEE Signal Process. Lett., 2012

Log-normal matrix factorization with application to speech-music separation.
Proceedings of the ISCA Workshop on Statistical And Perceptual Audition, 2012

Time-varying residual noise feature model estimation for multi-microphone speech recognition.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

MFCC enhancement using joint corrupted and noise feature space for highly non-stationary noise environments.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

LogMax observation model with MFCC-based spectral prior for reduction of highly nonstationary ambient noise.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Survey on approaches to speech recognition in reverberant environments.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2012

2011
Blind Separation and Dereverberation of Speech Mixtures by Joint Optimization.
IEEE Trans. Speech Audio Process., 2011

Reduction of Highly Nonstationary Ambient Noise by Integrating Spectral and Locational Characteristics of Speech and Noise for Robust ASR.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Speech enhancement based on log spectral envelope model and harmonicity-derived spectral mask, and its coupling with feature compensation.
Proceedings of the IEEE International Conference on Acoustics, 2011

I-Divergence-based dereverberation method with auxiliary function approach.
Proceedings of the IEEE International Conference on Acoustics, 2011

Joint unsupervised learning of hidden Markov source models and source location models for multichannel source separation.
Proceedings of the IEEE International Conference on Acoustics, 2011

2010
Speech Dereverberation Based on Variance-Normalized Delayed Linear Prediction.
IEEE Trans. Speech Audio Process., 2010

Real-time meeting recognition and understanding using distant microphones and omni-directional camera.
Proceedings of the 2010 IEEE Spoken Language Technology Workshop, 2010

Multichannel source separation based on source location cue with log-spectral shaping by hidden Markov source model.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Noisy speech enhancement based on prior knowledge about spectral envelope and harmonic structure.
Proceedings of the IEEE International Conference on Acoustics, 2010

Music dereverberation using harmonic structure source model and Wiener filter.
Proceedings of the IEEE International Conference on Acoustics, 2010

Statistical Model of Speech Signals Based on Composite Autoregressive System with Application to Blind Source Separation.
Proceedings of the Latent Variable Analysis and Signal Separation, 2010

Inverse Filtering for Speech Dereverberation Without the Use of Room Acoustics Information.
Proceedings of the Speech Dereverberation, 2010

2009
Integrated Speech Enhancement Method Using Noise Suppression and Dereverberation.
IEEE Trans. Speech Audio Process., 2009

Statistical models for speech dereverberation.
Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2009

Adaptive dereverberation of speech signals with speaker-position change detection.
Proceedings of the IEEE International Conference on Acoustics, 2009

Real-time speech enhancement in noisy reverberant multi-talker environments based on a location-independent room acoustics model.
Proceedings of the IEEE International Conference on Acoustics, 2009

Robust speech dereverberation based on non-negativity and sparse nature of speech spectrograms.
Proceedings of the IEEE International Conference on Acoustics, 2009

Fast algorithm for conditional separation and dereverberation.
Proceedings of the 17th European Signal Processing Conference, 2009

2008
Speech Dereverberation Based on Maximum-Likelihood Estimation With Time-Varying Gaussian Source Model.
IEEE Trans. Speech Audio Process., 2008

Maximum likelihood approach to speech enhancement for noisy reverberant signals.
Proceedings of the IEEE International Conference on Acoustics, 2008

Adaptive suppression of non-stationary noise by using the variational Bayesian method.
Proceedings of the IEEE International Conference on Acoustics, 2008

Blind speech dereverberation with multi-channel linear prediction based on short time Fourier transform representation.
Proceedings of the IEEE International Conference on Acoustics, 2008

An integrated method for blind separation and dereverberation of convolutive audio mixtures.
Proceedings of the 2008 16th European Signal Processing Conference, 2008

Principles and applications of dereverberation for noisy and reverberant audio signals.
Proceedings of the 42nd Asilomar Conference on Signals, Systems and Computers, 2008

2007
Dereverberation by Using Time-Variant Nature of Speech Production System.
EURASIP J. Adv. Signal Process., 2007

Robust blind dereverberation of speech signals based on characteristics of short-time speech segments.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2007), 2007

Study on Speech Dereverberation with Autocorrelation Codebook.
Proceedings of the IEEE International Conference on Acoustics, 2007

2006
Common Acoustical Pole Estimation from Multi-Channel Musical Audio Signals.
IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2006

Robust decomposition of inverse filter of channel and prediction error filter of speech signal for dereverberation.
Proceedings of the 14th European Signal Processing Conference, 2006

2004
Automatic Chord Transcription with Concurrent Recognition of Chord Symbols and Boundaries.
Proceedings of the ISMIR 2004, 2004

