Tetsuji Ogawa

Proceedings of the Pattern Recognition - 27th International Conference, 2024

Parody Detection Using Source-Target Attention with Teacher-Forced Lyrics.

Proceedings of the IEEE International Conference on Acoustics, 2024

Normal with Occasional Anomalies: Feature Extraction for Detecting Non-Stationary Abnormal Events in Wind Turbines.

Proceedings of the 32nd European Signal Processing Conference, 2024

Exploring Robust and Explainable Design for Facial Expression-Based Emotional State Estimation in Children with Profound Intellectual Multiple Disabilities.

Proceedings of the 32nd European Signal Processing Conference, 2024

Differences Between Singer and Speaker Verification: Training Singer Feature Representation Extractor Utilizing Singing Voice Characteristics.

Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2024

What to Refer and How? - Exploring Handling of Auxiliary Information in Target Speaker Extraction.

Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2024

2023

Harnessing the Zero-Shot Power of Instruction-Tuned Large Language Model in End-to-End Speech Recognition.

Yosuke Higuchi

CoRR, 2023

Deep Multi-stream Network for Video-based Calving Sign Detection.

Ryosuke Hyodo

Teppei Nakano

CoRR, 2023

Video Surveillance System Incorporating Expert Decision-making Process: A Case Study on Detecting Calving Signs in Cattle.

CoRR, 2023

Self-Conditioning via Intermediate Predictions for End-to-End Neural Speaker Diarization.

Yusuke Fujita

IEEE Access, 2023

Remixing-based Unsupervised Source Separation from Scratch.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Thermal Gait Dataset for Deep Learning-Oriented Gait Recognition.

Proceedings of the International Joint Conference on Neural Networks, 2023

Learning Discriminative Feature Representations via Metric Learning for Early Operation of Wind Turbine Anomaly Detection Systems.

Proceedings of the International Conference on Machine Learning and Applications, 2023

Masry: A Text-to-Speech System for the Egyptian Arabic.

Proceedings of the 20th International Conference on Informatics in Control, 2023

Conversation-Oriented ASR with Multi-Look-Ahead CBS Architecture.

Proceedings of the IEEE International Conference on Acoustics, 2023

Self-Remixing: Unsupervised Speech Separation via Separation and Remixing.

Proceedings of the IEEE International Conference on Acoustics, 2023

BECTRA: Transducer-Based End-to-End ASR with BERT-Enhanced Encoder.

Proceedings of the IEEE International Conference on Acoustics, 2023

InterMPL: Momentum Pseudo-Labeling with Intermediate CTC Loss.

Proceedings of the IEEE International Conference on Acoustics, 2023

Neural Diarization with Non-Autoregressive Intermediate Attractors.

Proceedings of the IEEE International Conference on Acoustics, 2023

Mask-CTC-Based Encoder Pre-Training for Streaming End-to-End Speech Recognition.

Proceedings of the 31st European Signal Processing Conference, 2023

Voice or Content? - Exploring Impact of Speech Content on Age Estimation from Voice.

Proceedings of the 31st European Signal Processing Conference, 2023

Spotting Parodies: Detecting Alignment Collapse Between Lyrics and Singing Voice.

Proceedings of the 31st European Signal Processing Conference, 2023

A Single Speech Enhancement Model Unifying Dereverberation, Denoising, Speaker Counting, Separation, And Extraction.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022

Multi-Source Domain Generalization Using Domain Attributes for Recurrent Neural Network Language Models.

IEICE Trans. Inf. Syst., 2022

Text-Only Domain Adaptation Based on Intermediate CTC.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Unsupervised Training of Sequential Neural Beamformer Using Coarsely-separated and Non-separated Signals.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Confusion Detection for Adaptive Conversational Strategies of An Oral Proficiency Assessment Interview Agent.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Can Humans Correct Errors From System? Investigating Error Tendencies in Speaker Identification Using Crowdsourcing.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Remix-Cycle-Consistent Learning on Adversarially Learned Separator for Accurate and Stable Unsupervised Speech Separation.

Proceedings of the IEEE International Conference on Acoustics, 2022

Hierarchical Conditional End-to-End ASR with CTC and Multi-Granular Subword Units.

Proceedings of the IEEE International Conference on Acoustics, 2022

BERT Meets CTC: New Formulation of End-to-End Speech Recognition with Pre-trained Masked Language Model.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

PostMe: Unsupervised Dynamic Microtask Posting For Efficient and Reliable Crowdsourcing.

Proceedings of the IEEE International Conference on Big Data, 2022

2021

SIA-GAN: Scrambling Inversion Attack Using Generative Adversarial Network.

IEEE Access, 2021

Analysis of Multimodal Features for Speaking Proficiency Scoring in an Interview Dialogue.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

VocalTurk: Exploring Feasibility of Crowdsourced Speaker Identification.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Efficient and Stable Adversarial Learning Using Unpaired Data for Unsupervised Multichannel Speech Separation.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Improved Mask-CTC for Non-Autoregressive End-to-End ASR.

Proceedings of the IEEE International Conference on Acoustics, 2021

Scrambling Parameter Generation to Improve Perceptual Information Hiding.

Proceedings of the Human Vision and Electronic Imaging 2021, Virtual Event, January 2021, 2021

An Investigation of Enhancing CTC Model for Triggered Attention-based Streaming ASR.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

Comparative Study on DNN-based Minimum Variance Beamforming Robust to Small Movements of Sound Sources.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

2020

Exploring Effectiveness of Inter-Microtask Qualification Tests in Crowdsourcing.

CoRR, 2020

Block-wise Scrambled Image Recognition Using Adaptation Network.

CoRR, 2020

Mentoring-Reverse Mentoring for Unsupervised Multi-Channel Speech Source Separation.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Mask CTC: Non-Autoregressive End-to-End ASR with CTC and Mask Predict.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Crowdsourced Verification for Operating Calving Surveillance Systems at an Early Stage.

Proceedings of the 25th International Conference on Pattern Recognition, 2020

Toward Building a Data-Driven System For Detecting Mounting Actions of Black Beef Cattle.

Proceedings of the 25th International Conference on Pattern Recognition, 2020

Feature Representation Learning for Calving Detection of Cows Using Video Frames.

Ryosuke Hyodo

Teppei Nakano

Proceedings of the 25th International Conference on Pattern Recognition, 2020

Frame-Level Phoneme-Invariant Speaker Embedding for Text-Independent Speaker Recognition on Extremely Short Utterances.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Deep Speech Extraction with Time-Varying Spatial Filtering Guided By Desired Direction Attractor.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Noise-robust Attention Learning for End-to-End Speech Recognition.

Proceedings of the 28th European Signal Processing Conference, 2020

Investigation of Network Architecture for Single-Channel End-to-End Denoising.

Takuya Hasumi

Proceedings of the 28th European Signal Processing Conference, 2020

Exploiting Narrative Context and A Priori Knowledge of Categories in Textual Emotion Classification.

Proceedings of the 28th International Conference on Computational Linguistics, 2020

Efficient Human-In-The-Loop Object Detection using Bi-Directional Deep SORT and Annotation-Free Segment Identification.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

2019

SemSeq: A Regime for Training Widely-Applicable Word-Sequence Encoders.

Proceedings of the Computational Linguistics, 2019

Multi-Channel Speech Enhancement Using Time-Domain Convolutional Denoising Autoencoder.

Naohiro Tawara

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Speaker Adversarial Training of DPGMM-Based Feature Extractor for Zero-Resource Languages.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Visual explanation of neural network based rotation machinery anomaly detection system.

Proceedings of the 2019 IEEE International Conference on Prognostics and Health Management, 2019

Postfiltering Using an Adversarial Denoising Autoencoder with Noise-aware Training.

Proceedings of the IEEE International Conference on Acoustics, 2019

2018

Waseda_Meisei at TRECVID 2018: Ad-hoc Video Search.

Proceedings of the 2018 TREC Video Retrieval Evaluation, 2018

Sequential Fish Catch Forecasting Using Bayesian State Space Models.

Proceedings of the 24th International Conference on Pattern Recognition, 2018

Tandem Connectionist Anomaly Detection: Use of Faulty Vibration Signals in Feature Representation Learning.

Proceedings of the 2018 IEEE International Conference on Prognostics and Health Management, 2018

Speaker Invariant Feature Extraction for Zero-Resource Languages with Adversarial Learning.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Language Model Domain Adaptation Via Recurrent Neural Networks with Domain-Shared and Domain-Specific Representations.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Adversarial autoencoder for reducing nonlinear distortion.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

2017

Real-Time Large-Scale Map Matching Using Mobile Phone Data.

Essam Algizawy

Ahmed El-Mahdy

ACM Trans. Knowl. Discov. Data, 2017

Associative Memory Model-Based Linear Filtering and Its Application to Tandem Connectionist Blind Source Separation.

Motoi Omachi

IEEE ACM Trans. Audio Speech Lang. Process., 2017

Waseda_Meisei at TRECVID 2017: Ad-hoc Video Search.

Proceedings of the 2017 TREC Video Retrieval Evaluation, 2017

Exploiting end of sentences and speaker alternations in language modeling for multiparty conversations.

Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

2016

A new efficient measure for accuracy prediction and its application to multistream-based unsupervised adaptation.

Proceedings of the 23rd International Conference on Pattern Recognition, 2016

Video semantic indexing using object detection-derived features.

Proceedings of the 24th European Signal Processing Conference, 2016

2015

Bilinear map of filter-bank outputs for DNN-based speech recognition.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Autoencoder based multi-stream combination for noise robust speech recognition.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

A comparative study of spectral clustering for i-vector-based speaker clustering under noisy conditions.

Naohiro Tawara

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Towards machines that know when they do not know: Summary of work done at 2014 Frederick Jelinek Memorial Workshop.

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Separation matrix optimization using associative memory model for blind source separation.

Proceedings of the 23rd European Signal Processing Conference, 2015

Uncertainty estimation of DNN classifiers.

Hynek Hermansky

Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

2014

Effect of frequency weighting on MLP-based speaker canonicalization.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

2013

Blocked Gibbs sampling based multi-scale mixture model for speaker clustering on noisy data.

Proceedings of the IEEE International Workshop on Machine Learning for Signal Processing, 2013

Stream selection and integration in multistream ASR using GMM-based performance monitoring.

Feipeng Li

Hynek Hermansky

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Integration of MKL-Based and I-Vector-Based Speaker Verification by Short Utterances.

Hideitsu Hino

Proceedings of the 2nd IAPR Asian Conference on Pattern Recognition, 2013

2012

Fully Bayesian speaker clustering based on hierarchically structured utterance-oriented Dirichlet process mixture model.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

An improved entropy-based multiple kernel learning.

Hideitsu Hino

Proceedings of the 21st International Conference on Pattern Recognition, 2012

Fully Bayesian inference of multi-mixture Gaussian model and its evaluation using speaker clustering.

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

2011

Class-Distance-Based Discriminant Analysis and Its Application to Supervised Automatic Age Estimation.

Kazuya Ueki

IEICE Trans. Inf. Syst., 2011

Speaker Clustering Based on Utterance-Oriented Dirichlet Process Mixture Model.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Spatial Filter Calibration Based on Minimization of Modified LSD.

Nobuaki Tanaka

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Speaker Verification Robust to Talking Style Variation Using Multiple Kernel Learning Based on Conditional Entropy Minimization.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Speaker recognition using multiple kernel learning based on conditional entropy minimization.

Proceedings of the IEEE International Conference on Acoustics, 2011

2010

Speech Enhancement Using a Square Microphone Array in the Presence of Directional and Diffuse Noise.

IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2010

Development of zonal beamformer and its application to robot audition.

Proceedings of the 18th European Signal Processing Conference, 2010

CENSREC-1-AV: an audio-visual corpus for noisy bimodal speech recognition.

Proceedings of the Auditory-Visual Speech Processing, 2010

2009

Influence of Lombard Effect: Accuracy Analysis of Simulation-Based Assessments of Noisy Speech Recognition Systems for Various Recognition Conditions.

IEICE Trans. Inf. Syst., 2009

Robot auditory system using head-mounted square microphone array.

Kosuke Hosoya

Proceedings of the 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2009

Direction-of-arrival estimation under noisy condition using four-line omni-directional microphones mounted on a robot head.

Proceedings of the 17th European Signal Processing Conference, 2009

2008

Ears of the Robot: Direction of Arrival Estimation Based on Pattern Recognition Using Robot-Mounted Microphones.

Naoya Mochiki

IEICE Trans. Inf. Syst., 2008

CENSREC-4: development of evaluation framework for distant-talking speech recognition under reverberant environments.

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Speech enhancement using square microphone array for mobile devices.

Proceedings of the IEEE International Conference on Acoustics, 2008

Kernel PCA-based resolution enhancement approach of still images using different levels of pyramid structure.

Miki Haseyama

Proceedings of the IEEE International Conference on Acoustics, 2008

A Kalman filter based restoration method for in-vehicle camera images in foggy conditions.

Tomoki Hiramatsu

Miki Haseyama

Proceedings of the IEEE International Conference on Acoustics, 2008

Ears of the robot: Noise reduction using four-line ultra-micro omni-directional microphones mounted on a robot head.

Proceedings of the 2008 16th European Signal Processing Conference, 2008

2007

Ears of the Robot: Three Simultaneous Speech Segregation and Recognition Using Robot-Mounted Microphones.

Naoya Mochiki

IEICE Trans. Inf. Syst., 2007

Adequacy Analysis of Simulation-Based Assessment of Speech Recognition System.

Satoshi Kanba

Proceedings of the IEEE International Conference on Acoustics, 2007

2006

Genetic Algorithm Based Optimization of Partly-Hidden Markov Model Structure Using Discriminative Criterion.

IEICE Trans. Inf. Syst., 2006

Manifold HLDA and its application to robust speech recognition.

Toshiaki Kubo

Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

A method for solving the permutation problem of frequency-domain BSS using reference signal.

Proceedings of the 14th European Signal Processing Conference, 2006

Source separation using multiple directivity patterns produced by ICA-based BSS.

Proceedings of the 14th European Signal Processing Conference, 2006

2005

An extension of the state-observation dependency in partly hidden Markov models and its application to continuous speech recognition.

Syst. Comput. Jpn., 2005

Extension of Hidden Markov Models for Multiple Candidates and Its Application to Gesture Recognition.

Yosuke Sato

IEICE Trans. Inf. Syst., 2005

Optimizing the structure of partly-hidden Markov models using weighted likelihood-ratio maximization criterion.

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

2004

Recognition of three simultaneous utterance of speech by four-line directivity microphone mounted on head of robot.

Proceedings of the 8th International Conference on Spoken Language Processing, 2004

2003

Speech recognition of double talk using SAFIA-based audio segregation.

Toshiyuki Sekiya

Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Hybrid modeling of PHMM and HMM for speech recognition.

Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

2002

Generalization of state-observation-dependency in partly hidden Markov models.
