Tetsunori Kobayashi

According to our database, Tetsunori Kobayashi authored at least 208 papers between 1982 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.







Predictive Speech Recognition and End-of-Utterance Detection Towards Spoken Dialog Systems.
CoRR, 2024

Harnessing the Zero-Shot Power of Instruction-Tuned Large Language Model in End-to-End Speech Recognition.
CoRR, 2023

Self-Conditioning via Intermediate Predictions for End-to-End Neural Speaker Diarization.
IEEE Access, 2023

Improving the response timing estimation for spoken dialogue systems by reducing the effect of speech recognition delay.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Conversation-Oriented ASR with Multi-Look-Ahead CBS Architecture.
Proceedings of the IEEE International Conference on Acoustics, 2023

BECTRA: Transducer-Based End-To-End ASR with BERT-Enhanced Encoder.
Proceedings of the IEEE International Conference on Acoustics, 2023

InterMPL: Momentum Pseudo-Labeling With Intermediate CTC Loss.
Proceedings of the IEEE International Conference on Acoustics, 2023

Mask-CTC-Based Encoder Pre-Training for Streaming End-to-End Speech Recognition.
Proceedings of the 31st European Signal Processing Conference, 2023

A Single Speech Enhancement Model Unifying Dereverberation, Denoising, Speaker Counting, Separation, And Extraction.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Multi-Source Domain Generalization Using Domain Attributes for Recurrent Neural Network Language Models.
IEICE Trans. Inf. Syst., 2022

Response Timing Estimation for Spoken Dialog Systems Based on Syntactic Completeness Prediction.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Response Timing Estimation for Spoken Dialog System using Dialog Act Estimation.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Confusion Detection for Adaptive Conversational Strategies of An Oral Proficiency Assessment Interview Agent.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Hierarchical Conditional End-to-End ASR with CTC and Multi-Granular Subword Units.
Proceedings of the IEEE International Conference on Acoustics, 2022

BERT Meets CTC: New Formulation of End-to-End Speech Recognition with Pre-trained Masked Language Model.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

Phrase-Level Localization of Inconsistency Errors in Summarization by Weak Supervision.
Proceedings of the 29th International Conference on Computational Linguistics, 2022

PostMe: Unsupervised Dynamic Microtask Posting For Efficient and Reliable Crowdsourcing.
Proceedings of the IEEE International Conference on Big Data, 2022

Personalized Extractive Summarization for a News Dialogue System.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Analysis of Multimodal Features for Speaking Proficiency Scoring in an Interview Dialogue.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Personalized Extractive Summarization with Discourse Structure Constraints Towards Efficient and Coherent Dialog-Based News Delivery.
Proceedings of the Conversational AI for Natural Human-Centric Interaction, 2021

A WoZ Study for an Incremental Proficiency Scoring Interview Agent Eliciting Ratable Samples.
Proceedings of the Conversational AI for Natural Human-Centric Interaction, 2021

Efficient and Stable Adversarial Learning Using Unpaired Data for Unsupervised Multichannel Speech Separation.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Timing Generating Networks: Neural Network Based Precise Turn-Taking Timing Prediction in Multiparty Conversation.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Improved Mask-CTC for Non-Autoregressive End-to-End ASR.
Proceedings of the IEEE International Conference on Acoustics, 2021

An Investigation of Enhancing CTC Model for Triggered Attention-based Streaming ASR.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

Comparative Study on DNN-based Minimum Variance Beamforming Robust to Small Movements of Sound Sources.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

Exploring Effectiveness of Inter-Microtask Qualification Tests in Crowdsourcing.
CoRR, 2020

Word Attribute Prediction Enhanced by Lexical Entailment Tasks.
Proceedings of The 12th Language Resources and Evaluation Conference, 2020

Mentoring-Reverse Mentoring for Unsupervised Multi-Channel Speech Source Separation.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Mask CTC: Non-Autoregressive End-to-End ASR with CTC and Mask Predict.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Exploring and Exploiting the Hierarchical Structure of a Scene for Scene Graph Generation.
Proceedings of the 25th International Conference on Pattern Recognition, 2020

Deep Speech Extraction with Time-Varying Spatial Filtering Guided By Desired Direction Attractor.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Noise-robust Attention Learning for End-to-End Speech Recognition.
Proceedings of the 28th European Signal Processing Conference, 2020

Investigation of Network Architecture for Single-Channel End-to-End Denoising.
Proceedings of the 28th European Signal Processing Conference, 2020

Exploiting Narrative Context and A Priori Knowledge of Categories in Textual Emotion Classification.
Proceedings of the 28th International Conference on Computational Linguistics, 2020

Sentiment Analysis for Emotional Speech Synthesis in a News Dialogue System.
Proceedings of the 28th International Conference on Computational Linguistics, 2020

Efficient Human-In-The-Loop Object Detection using Bi-Directional Deep SORT and Annotation-Free Segment Identification.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

Predicting the Working Time of Microtasks Based on Workers' Perception of Prediction Errors.
Hum. Comput., 2019

TurkScanner: Predicting the Hourly Wage of Microtasks.
Proceedings of the World Wide Web Conference, 2019

Waseda_Meisei_SoftBank at TRECVID 2019: Ad-hoc Video Search.
Proceedings of the 2019 TREC Video Retrieval Evaluation, 2019

SemSeq: A Regime for Training Widely-Applicable Word-Sequence Encoders.
Proceedings of the Computational Linguistics, 2019

Multi-Channel Speech Enhancement Using Time-Domain Convolutional Denoising Autoencoder.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Recognition of Intentions of Users' Short Responses for Conversational News Delivery System.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Speaker Adversarial Training of DPGMM-Based Feature Extractor for Zero-Resource Languages.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Regularized Adversarial Training for Single-Shot Virtual Try-On.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshops, 2019

Postfiltering Using an Adversarial Denoising Autoencoder with Noise-aware Training.
Proceedings of the IEEE International Conference on Acoustics, 2019

MicroLapse: Measuring Workers' Leniency to Prediction Errors of Microtasks' Working Times.
Proceedings of the Companion Publication of the 2019 ACM Conference on Computer Supported Cooperative Work and Social Computing, 2019

Towards Answer-unaware Conversational Question Generation.
Proceedings of the 2nd Workshop on Machine Reading for Question Answering, 2019

Waseda_Meisei at TRECVID 2018: Ad-hoc Video Search.
Proceedings of the 2018 TREC Video Retrieval Evaluation, 2018

Investigation of Users' Short Responses in Actual Conversation System and Automatic Recognition of their Intentions.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Social Image Tags as a Source of Word Embeddings: A Task-oriented Evaluation.
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

Fine-grained Video Retrieval using Query Phrases - Waseda_Meisei TRECVID 2017 AVS System -.
Proceedings of the 24th International Conference on Pattern Recognition, 2018

Sequential Fish Catch Forecasting Using Bayesian State Space Models.
Proceedings of the 24th International Conference on Pattern Recognition, 2018

Speaker Invariant Feature Extraction for Zero-Resource Languages with Adversarial Learning.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Language Model Domain Adaptation Via Recurrent Neural Networks with Domain-Shared and Domain-Specific Representations.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Answerable or Not: Devising a Dataset for Extending Machine Reading Comprehension.
Proceedings of the 27th International Conference on Computational Linguistics, 2018

Adversarial autoencoder for reducing nonlinear distortion.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

Ad-hoc Video Search Improved by the Word Sense Filtering of Query Terms.
Proceedings of the Information Retrieval Technology, 2018

Associative Memory Model-Based Linear Filtering and Its Application to Tandem Connectionist Blind Source Separation.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

Object Detection Oriented Feature Pooling for Video Semantic Indexing.
Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2017) - Volume 5: VISAPP, Porto, Portugal, February 27, 2017

Waseda_Meisei at TRECVID 2017: Ad-hoc Video Search.
Proceedings of the 2017 TREC Video Retrieval Evaluation, 2017

Incorporating visual features into word embeddings: A bimodal autoencoder-based approach.
Proceedings of the IWCS 2017 - 12th International Conference on Computational Semantics - Short papers, Montpellier, France, September 19, 2017

Prosody Control of Utterance Sequence for Information Delivering.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Exploiting end of sentences and speaker alternations in language modeling for multiparty conversations.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

Waseda at TRECVID 2016: Ad-hoc Video Search.
Proceedings of the 2016 TREC Video Retrieval Evaluation, 2016

Evaluation of Collaborative Video Surveillance Platform: Prototype Development of Abandoned Object Detection.
Proceedings of the 10th International Conference on Distributed Smart Camera, 2016

Improving semantic video indexing: Efforts in Waseda TRECVID 2015 SIN system.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Image retrieval under very noisy annotations.
Proceedings of the 24th European Signal Processing Conference, 2016

Video semantic indexing using object detection-derived features.
Proceedings of the 24th European Signal Processing Conference, 2016

Towards a Framework for Collaborative Video Surveillance System Using Crowdsourcing.
Proceedings of the 19th ACM Conference on Computer Supported Cooperative Work and Social Computing, 2016

A Spoken Dialog System for Coordinating Information Consumption and Exploration.
Proceedings of the 2016 ACM Conference on Human Information Interaction and Retrieval, 2016

Multi-feature based fast depth decision in HEVC inter prediction for VLSI implementation.
Proceedings of the 9th International Congress on Image and Signal Processing, 2016

Automatic Expressive Opinion Sentence Generation for Enjoyable Conversational Systems.
IEEE ACM Trans. Audio Speech Lang. Process., 2015

Four-participant group conversation: A facilitation robot controlling engagement density as the fourth participant.
Comput. Speech Lang., 2015

Waseda @ TRECVID 2015: Semantic Indexing.
Proceedings of the 2015 TREC Video Retrieval Evaluation, 2015

Multi-layer feature extractions for image classification - Knowledge from deep CNNs.
Proceedings of the International Conference on Systems, Signals and Image Processing, 2015

Bilinear map of filter-bank outputs for DNN-based speech recognition.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Multiscale recurrent neural network based language model.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

A comparative study of spectral clustering for i-vector-based speaker clustering under noisy conditions.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Separation matrix optimization using associative memory model for blind source separation.
Proceedings of the 23rd European Signal Processing Conference, 2015

Towards a Computational Model of Small Group Facilitation.
Proceedings of the 2015 AAAI Spring Symposia, 2015

Effect of frequency weighting on MLP-based speaker canonicalization.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Expression of speaker's intentions through sentence-final particle/ intonation combinations in Japanese conversational speech synthesis.
Proceedings of the Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, 2013

A Four-Participant Group Facilitation Framework for Conversational Robots.
Proceedings of the SIGDIAL 2013 Conference, 2013

Blocked Gibbs sampling based multi-scale mixture model for speaker clustering on noisy data.
Proceedings of the IEEE International Workshop on Machine Learning for Signal Processing, 2013

Speaker's intentions conveyed to listeners by sentence-final particles and their intonations in Japanese conversational speech.
Proceedings of the IEEE International Conference on Acoustics, 2013

Fully Bayesian speaker clustering based on hierarchically structured utterance-oriented Dirichlet process mixture model.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Expressing Speaker's Intentions through Sentence-Final Intonations for Japanese Conversational Speech Synthesis.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Fully Bayesian inference of multi-mixture Gaussian model and its evaluation using speaker clustering.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

AAM fitting using shape parameter distribution.
Proceedings of the 20th European Signal Processing Conference, 2012

Class-Distance-Based Discriminant Analysis and Its Application to Supervised Automatic Age Estimation.
IEICE Trans. Inf. Syst., 2011

Multiparty Conversation Facilitation Strategy Using Combination of Question Answering and Spontaneous Utterances.
Proceedings of the Paralinguistic Information and its Integration in Spoken Dialogue Systems, 2011

Conversational Speech Synthesis System with Communication Situation Dependent HMMs.
Proceedings of the Paralinguistic Information and its Integration in Spoken Dialogue Systems, 2011

Speaker Clustering Based on Utterance-Oriented Dirichlet Process Mixture Model.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Spatial Filter Calibration Based on Minimization of Modified LSD.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Speaker Verification Robust to Talking Style Variation Using Multiple Kernel Learning Based on Conditional Entropy Minimization.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Speaker recognition using multiple kernel learning based on conditional entropy minimization.
Proceedings of the IEEE International Conference on Acoustics, 2011

Subspace pursuit method for kernel-log-linear models.
Proceedings of the IEEE International Conference on Acoustics, 2011

A Sequential Pattern Classifier Based on Hidden Markov Kernel Machine and Its Application to Phoneme Classification.
IEEE J. Sel. Top. Signal Process., 2010

Speech Enhancement Using a Square Microphone Array in the Presence of Directional and Diffuse Noise.
IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2010

Psychological evaluation of a group communication activation robot in a party game.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

A regularized discriminative training method of acoustic models derived by minimum relative entropy discrimination.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Development of zonal beamformer and its application to robot audition.
Proceedings of the 18th European Signal Processing Conference, 2010

Robot as a multimodal human interface device.
Proceedings of the Auditory-Visual Speech Processing, 2010

Framework of Communication Activation Robot Participating in Multiparty Conversation.
Proceedings of the Dialog with Robots, 2010

Influence of Lombard Effect: Accuracy Analysis of Simulation-Based Assessments of Noisy Speech Recognition Systems for Various Recognition Conditions.
IEICE Trans. Inf. Syst., 2009

SCHEMA: multi-party interaction-oriented humanoid robot.
Proceedings of the International Conference on Computer Graphics and Interactive Techniques, 2009

Upper-Body Contour Extraction Using Face and Body Shape Variance Information.
Proceedings of the Advances in Image and Video Technology, Third Pacific Rim Symposium, 2009

Robot auditory system using head-mounted square microphone array.
Proceedings of the 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2009

Conversation robot participating in and activating a group communication.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

System design of group communication activator: an entertainment task for elderly care.
Proceedings of the 4th ACM/IEEE International Conference on Human Robot Interaction, 2009

Direction-of-arrival estimation under noisy condition using four-line omni-directional microphones mounted on a robot head.
Proceedings of the 17th European Signal Processing Conference, 2009

Social Robots that Interact with People.
Proceedings of the Springer Handbook of Robotics, 2008

Gender Classification Based on Integration of Multiple Classifiers Using Various Features of Facial and Neck Images.
Inf. Media Technol., 2008

Mutual Information Based Dynamic Integration of Multiple Feature Streams for Robust Real-Time LVCSR.
IEICE Trans. Inf. Syst., 2008

Ears of the Robot: Direction of Arrival Estimation Based on Pattern Recognition Using Robot-Mounted Microphones.
IEICE Trans. Inf. Syst., 2008

Design and formulation for speech interface based on flexible shortcuts.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Speech enhancement using square microphone array for mobile devices.
Proceedings of the IEEE International Conference on Acoustics, 2008

Incorporation of phrase intonation to context clustering for average voice models in HMM-based Thai speech synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2008

Designing communication activation system in group communication.
Proceedings of the 8th IEEE-RAS International Conference on Humanoid Robots, 2008

Upper-body contour extraction and tracking using face and body shape variance information.
Proceedings of the 8th IEEE-RAS International Conference on Humanoid Robots, 2008

Multi-modal integration for personalized conversation: Towards a humanoid in daily life.
Proceedings of the 8th IEEE-RAS International Conference on Humanoid Robots, 2008

An ASM fitting method based on machine learning that provides a robust parameter initialization for AAM fitting.
Proceedings of the 8th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2008), 2008

Ears of the robot: Noise reduction using four-line ultra-micro omni-directional microphones mounted on a robot head.
Proceedings of the 2008 16th European Signal Processing Conference, 2008

Spectrum conversion using prosodic information.
Syst. Comput. Jpn., 2007

Fusion-Based Age-Group Classification Method Using Multiple Two-Dimensional Feature Extraction Algorithms.
IEICE Trans. Inf. Syst., 2007

Ears of the Robot: Three Simultaneous Speech Segregation and Recognition Using Robot-Mounted Microphones.
IEICE Trans. Inf. Syst., 2007

Dynamic integration of multiple feature streams for robust real-time LVCSR.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Adequacy Analysis of Simulation-Based Assessment of Speech Recognition System.
Proceedings of the IEEE International Conference on Acoustics, 2007

Extensible speech recognition system using proxy-agent.
Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007

Introduction of the METI project "development of fundamental speech recognition technology".
Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007

Adaptive understanding of proposal-requesting expressions for conversational information retrieval system.
Syst. Comput. Jpn., 2006

Recognition of positive/negative attitude and its application to a spoken dialogue system.
Syst. Comput. Jpn., 2006

Hybrid Voice Conversion of Unit Selection and Generation Using Prosody Dependent HMM.
IEICE Trans. Inf. Syst., 2006

Genetic Algorithm Based Optimization of Partly-Hidden Markov Model Structure Using Discriminative Criterion.
IEICE Trans. Inf. Syst., 2006

Manifold HLDA and its application to robust speech recognition.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

MONEA: Message-oriented Networked-robot Architecture.
Proceedings of the 2006 IEEE International Conference on Robotics and Automation, 2006

Two-dimensional Heteroscedastic Linear Discriminant Analysis for Age-group Classification.
Proceedings of the 18th International Conference on Pattern Recognition (ICPR 2006), 2006

Conversation Robot with the Function of Gaze Recognition.
Proceedings of the 2006 6th IEEE-RAS International Conference on Humanoid Robots, 2006

Subspace-based Age-group Classification Using Facial Images under Various Lighting Conditions.
Proceedings of the Seventh IEEE International Conference on Automatic Face and Gesture Recognition (FGR 2006), 2006

A method for solving the permutation problem of frequency-domain BSS using reference signal.
Proceedings of the 14th European Signal Processing Conference, 2006

Source separation using multiple directivity patterns produced by ICA-based BSS.
Proceedings of the 14th European Signal Processing Conference, 2006

An extension of the state-observation dependency in partly hidden Markov models and its application to continuous speech recognition.
Syst. Comput. Jpn., 2005

Extension of Hidden Markov Models for Multiple Candidates and Its Application to Gesture Recognition.
IEICE Trans. Inf. Syst., 2005

Optimizing the structure of partly-hidden Markov models using weighted likelihood-ratio maximization criterion.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Back-channel feedback generation using linguistic and nonlinguistic information and its application to spoken dialogue system.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Speech recognition in the blind condition based on multiple directivity patterns using a microphone array.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

Design and implementation of data sharing architecture for multifunctional robot development.
Syst. Comput. Jpn., 2004

A Low-Band Spectrum Envelope Reconstruction Method for PSOLA-Based F0 Modification.
IEICE Trans. Inf. Syst., 2004

Speech-Recognition Interfaces for Music Information Retrieval: 'Speech Completion' and 'Speech Spotter'.
Proceedings of the ISMIR 2004, 2004

Recognition of three simultaneous utterance of speech by four-line directivity microphone mounted on head of robot.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Speech spotter: on-demand speech recognition in human-human conversation on the telephone or in face-to-face situations.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Prosody based attitude recognition with feature selection and its application to spoken dialog system as para-linguistic information.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004

A Method of Gender Classification by Integrating Facial, Hairstyle, and Clothing Images.
Proceedings of the 17th International Conference on Pattern Recognition, 2004

Speech enhancement based on multiple directivity patterns using a microphone array.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

A low-band spectrum envelope modeling for high quality pitch modification.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

Robust language modeling for a small corpus of target tasks using class-combined word statistics and selective use of a general corpus.
Syst. Comput. Jpn., 2003

Dictation of multiparty conversation considering speaker individuality and turn taking.
Syst. Comput. Jpn., 2003

Speech recognition of double talk using SAFIA-based audio segregation.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Speech starter: noise-robust endpoint detection by using filled pauses.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Speech shift: direct speech-input-mode switching through intentional control of voice pitch.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Hybrid modeling of PHMM and HMM for speech recognition.
Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

Humanoid Robots in Waseda University-Hadaly-2 and WABIAN.
Auton. Robots, 2002

Inter-module cooperation architecture for interactive robot.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Lausanne, Switzerland, September 30, 2002

Generalization of state-observation-dependency in partly hidden Markov models.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Media-Integrated Biometric Person Recognition Based on the Dempster-Shafer Theory.
Proceedings of the 16th International Conference on Pattern Recognition, 2002

Extension of Hidden Markov Models to Deal with Multiple Candidates of Observations and its Application to Mobile-Robot-Oriented Gesture Recognition.
Proceedings of the 16th International Conference on Pattern Recognition, 2002

Design and collection of acoustic sound data for hands-free speech recognition and sound scene understanding.
Proceedings of the 2002 IEEE International Conference on Multimedia and Expo, 2002

Modeling of conversational strategy for the robot participating in the group conversation.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

Estimating positions of multiple adjacent speakers based on MUSIC spectra correlation using a microphone array.
Proceedings of the IEEE International Conference on Acoustics, 2001

A conversational robot utilizing facial and body expressions.
Proceedings of the IEEE International Conference on Systems, 2000

IPA Japanese Dictation Free Software Project.
Proceedings of the Second International Conference on Language Resources and Evaluation, 2000

Free software toolkit for Japanese large vocabulary continuous speech recognition.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Dictation of multiparty conversation using statistical turn taking model and speaker model.
Proceedings of the IEEE International Conference on Acoustics, 2000

Multi-person conversation via multi-modal interface - a robot who communicate with multi-user -.
Proceedings of the Sixth European Conference on Speech Communication and Technology, 1999

Class-combined word n-gram for robust language modeling.
Proceedings of the Sixth European Conference on Speech Communication and Technology, 1999

Partly hidden Markov model and its application to speech recognition.
Proceedings of the 1999 IEEE International Conference on Acoustics, 1999

Controlling gaze of humanoid in communication with human.
Proceedings of the 1998 IEEE/RSJ International Conference on Intelligent Robots and Systems. Innovations in Theory, 1998

Source-extended language model for large vocabulary continuous speech recognition.
Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998

Sharable software repository for Japanese large vocabulary continuous speech recognition.
Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998

The design of the newspaper-based Japanese large vocabulary continuous speech recognition corpus.
Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998

Partly-hidden Markov model and its application to gesture recognition.
Proceedings of the 1997 IEEE International Conference on Acoustics, 1997

Speech recognition in nonstationary noise based on parallel HMMs and spectral subtraction.
Syst. Comput. Jpn., 1996

ALICE: acquisition of language in conversational environment - an approach to weakly supervised training of spoken language system for language porting.
Proceedings of the 4th International Conference on Spoken Language Processing, 1996

Handling of user interruption to achieve timing-free utterances for spoken dialogue interface.
Syst. Comput. Jpn., 1995

Generation of prosody in speech synthesis using large speech data-base.
Proceedings of the 3rd International Conference on Spoken Language Processing, 1994

Phoneme recognition in various styles of utterance based on mutual information criterion.
Proceedings of the 3rd International Conference on Spoken Language Processing, 1994

Multimodal drawing tool using speech, mouse and key-board.
Proceedings of the 3rd International Conference on Spoken Language Processing, 1994

Automatic training of phoneme dictionary based on mutual information criterion.
Proceedings of ICASSP '94: IEEE International Conference on Acoustics, 1994

Markov model based noise modeling and its application to noisy speech recognition using dynamical features of speech.
Proceedings of ICASSP '94: IEEE International Conference on Acoustics, 1994

Word spotting in conversational speech based on phonemic unit likelihood by mutual information criterion.
Proceedings of the Third European Conference on Speech Communication and Technology, 1993

Speech recognition under the unstationary noise based on the noise Markov model and spectral-subtraction.
Proceedings of the Third European Conference on Speech Communication and Technology, 1993

Phoneme recognition in continuous speech based on mutual information considering phonemic duration and connectivity.
Proceedings of the Second International Conference on Spoken Language Processing, 1992

Spectral mapping onto probabilistic domain using neural networks and its application to speaker adaptive phoneme recognition.
Proceedings of the Second International Conference on Spoken Language Processing, 1992

Speaker adaptive phoneme recognition based on feature mapping from spectral domain to probabilistic domain.
Proceedings of the 1992 IEEE International Conference on Acoustics, 1992

Text-to-speech synthesizer using superposition of sinusoidal waves generated by synchronized oscillators.
Proceedings of the Second European Conference on Speech Communication and Technology, 1991

Application of neural networks to articulatory motion estimation.
Proceedings of the 1991 International Conference on Acoustics, 1991

Dependence of phonemic feature on context.
Proceedings of the 1990 International Conference on Acoustics, 1990

Statistical properties of fluctuation of pitch intervals and its modeling for natural synthetic speech.
Proceedings of the 1990 International Conference on Acoustics, 1990

Contextual factor analysis of vowel distribution.
Proceedings of the First European Conference on Speech Communication and Technology, 1989

The robot musician 'wabot-2' (waseda robot-2).
Robotics, 1987

Description of task dependent knowledge for speech understanding system.
Proceedings of the European Conference on Speech Technology, 1987

Estimating articulatory motion from speech wave.
Speech Commun., 1986

Estimation of articulatory parameters by table look-up method and its application for speaker independent phoneme recognition.
Proceedings of the IEEE International Conference on Acoustics, 1986

A network model dealing with focus of conversation for speech understanding system.
Proceedings of the IEEE International Conference on Acoustics, 1986

Phrase speech recognition of large vocabulary using feature in articulatory domain.
Proceedings of the IEEE International Conference on Acoustics, 1984

Considerations on articulatory dynamics for continuous speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 1983

Recognition of semivowels and consonants in continuous speech using articulatory parameters.
Proceedings of the IEEE International Conference on Acoustics, 1982
