Satoshi Nakamura

ORCID: 0000-0001-6956-3803

Affiliations:
  • Nara Institute of Science and Technology, Ikoma, Japan
  • ATR Spoken Language Communication Labs, Kyoto, Japan
  • National Institute of Information and Communications Technology (NICT), Spoken Language Communication Group, Keihanna Science City, Japan
  • Sharp Corporation, Nara, Japan
  • Kyoto University, Japan (PhD 1992)


According to our database, Satoshi Nakamura authored at least 719 papers between 1988 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Continual few-shot patch-based learning for anime-style colorization.
Comput. Vis. Media, August, 2024

Improving Speech Translation Accuracy and Time Efficiency With Fine-Tuned wav2vec 2.0-Based Speech Segmentation.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Adaptive virtual agent: Design and evaluation for real-time human-agent interaction.
Int. J. Hum. Comput. Stud., 2024

Neural End-To-End Speech Translation Leveraged by ASR Posterior Distribution.
IEICE Trans. Inf. Syst., 2024

A Neural Transformer Framework for Simultaneous Tasks of Segmentation, Classification, and Caller Identification of Marmoset Vocalization.
CoRR, 2024

A Word Order Synchronization Metric for Evaluating Simultaneous Interpretation and Translation.
CoRR, 2024

NAIST Simultaneous Speech Translation System for IWSLT 2024.
CoRR, 2024

Word Order in English-Japanese Simultaneous Interpretation: Analyses and Evaluation using Chunk-wise Monotonic Translation.
CoRR, 2024

Response Generation for Cognitive Behavioral Therapy with Large Language Models: Comparative Study with Socratic Questioning.
CoRR, 2024

Do as I Demand, Not as I Say: A Dataset for Developing a Reflective Life-Support Robot.
IEEE Access, 2024

Applying Syntax-Prosody Mapping Hypothesis and Boundary-Driven Theory to Neural Sequence-to-Sequence Speech Synthesis.
IEEE Access, 2024

Subspace Representations for Soft Set Operations and Sentence Similarities.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

TransLLaMa: LLM-based Simultaneous Translation System.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

LLMs Are Zero-Shot Context-Aware Simultaneous Translators.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

NAIST-SIC-Aligned: An Aligned English-Japanese Simultaneous Interpretation Corpus.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

Automated Essay Scoring Using Grammatical Variety and Errors with Multi-Task Learning and Item Response Theory.
Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications, 2024

2023
Eye-movement analysis on facial expression for identifying children and adults with neurodevelopmental disorders.
Frontiers Digit. Health, March, 2023

End-to-end dialogue structure parsing on multi-floor dialogue based on multi-task learning.
Frontiers Robotics AI, February, 2023

Reflective action selection based on positive-unlabeled learning and causality detection model.
Comput. Speech Lang., 2023

Average Token Delay: A Duration-aware Latency Metric for Simultaneous Translation.
CoRR, 2023

Arukikata Travelogue Dataset.
CoRR, 2023

NAIST-SIC-Aligned: Automatically-Aligned English-Japanese Simultaneous Interpretation Corpus.
CoRR, 2023

Modeling Multiple User Interests using Hierarchical Knowledge for Conversational Recommender System.
CoRR, 2023

What's New? Identifying the Unfolding of New Events in Narratives.
CoRR, 2023

SpeeChain: A Speech Toolkit for Large-Scale Machine Speech Chain.
CoRR, 2023

Japanese Neural Incremental Text-to-Speech Synthesis Framework With an Accent Phrase Input.
IEEE Access, 2023

Content Order-Controllable MR-to-Text.
IEEE Access, 2023

Improved Automatic Colorization by Optimal Pre-colorization.
Proceedings of the ACM SIGGRAPH 2023 Posters, 2023

Emotion Prediction Using Multi-source Biosignals During Cognitive Behavior Therapy with Conversational Virtual Agents.
Proceedings of the 26th Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 2023

Investigation of Validity of Paradigmatic Diagnosis for Downstep in Japanese.
Proceedings of the 26th Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 2023

E2E Refined Dataset.
Proceedings of the 26th Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 2023

Tagged End-to-End Simultaneous Speech Translation Training Using Simultaneous Interpretation Data.
Proceedings of the 20th International Conference on Spoken Language Translation, 2023

NAIST Simultaneous Speech-to-speech Translation System for IWSLT 2023.
Proceedings of the 20th International Conference on Spoken Language Translation, 2023

Inter-connection: Effective Connection between Pre-trained Encoder and Decoder for Speech Translation.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Average Token Delay: A Latency Metric for Simultaneous Translation.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

4th Workshop on Social Affective Multimodal Interaction for Health (SAMIH).
Proceedings of the 25th International Conference on Multimodal Interaction, 2023

An Adaptive Virtual Agent Platform for Automated Social Skills Training.
Proceedings of the International Conference on Multimodal Interaction, 2023

Computational analyses of linguistic features with schizophrenic and autistic traits along with formal thought disorders.
Proceedings of the 25th International Conference on Multimodal Interaction, 2023

Self-Adaptive Incremental Machine Speech Chain for Lombard TTS with High-Granularity ASR Feedback in Dynamic Noise Condition.
Proceedings of the IEEE International Conference on Acoustics, 2023

Multimodal Voice Activity Prediction: Turn-taking Events Detection in Expert-Novice Conversation.
Proceedings of the International Conference on Human-Agent Interaction, 2023

Acceptability and Trustworthiness of Virtual Agents by Effects of Theory of Mind and Social Skills Training.
Proceedings of the 17th IEEE International Conference on Automatic Face and Gesture Recognition, 2023

Predicting Autistic Traits Using Eye Movement during Visual Perspective Taking and Facial Emotion Identification.
Proceedings of the 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society, 2023

Evaluating the Robustness of Discrete Prompts.
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2023

Social Performance Rating During Social Skills Training in Adults with Autism Spectrum Disorder and Schizophrenia.
Proceedings of the 11th International Conference on Affective Computing and Intelligent Interaction, ACII 2023, 2023

2022
Modeling Unsupervised Empirical Adaptation by DPGMM and DPGMM-RNN Hybrid Model to Extract Perceptual Features for Low-Resource ASR.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

A Machine Speech Chain Approach for Dynamically Adaptive Lombard TTS in Static and Dynamic Noise Environments.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

Tackling multiple object tracking with complicated motions - Re-designing the integration of motion and appearance.
Image Vis. Comput., 2022

Multimodal Prediction of Social Responsiveness Score with BERT-Based Text Features.
IEICE Trans. Inf. Syst., 2022

Online EEG-Based Emotion Prediction and Music Generation for Inducing Affective States.
IEICE Trans. Inf. Syst., 2022

Applying Meta-Learning and Iso Principle for Development of EEG-Based Emotion Induction System.
Frontiers Digit. Health, 2022

Automatic Thoughts and Facial Expressions in Cognitive Restructuring With Virtual Agents.
Frontiers Comput. Sci., 2022

Subspace-based Set Operations on a Pre-trained Word Embedding Space.
CoRR, 2022

Actor-identified Spatiotemporal Action Detection - Detecting Who Is Doing What in Videos.
CoRR, 2022

USB: A Unified Semi-supervised Learning Benchmark.
CoRR, 2022

USB: A Unified Semi-supervised Learning Benchmark for Classification.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Simultaneous Neural Machine Translation with Prefix Alignment.
Proceedings of the 19th International Conference on Spoken Language Translation, 2022

NAIST Simultaneous Speech-to-Text Translation System for IWSLT 2022.
Proceedings of the 19th International Conference on Spoken Language Translation, 2022

Improved Consistency Training for Semi-Supervised Sequence-to-Sequence ASR via Speech Chain Reconstruction and Self-Transcribing.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Multimodal Persuasive Dialogue Corpus using Teleoperated Android.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Applying Syntax-Prosody Mapping Hypothesis and Prosodic Well-Formedness Constraints to Neural Sequence-to-Sequence Speech Synthesis.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Speech Segmentation Optimization using Segmented Bilingual Speech Corpus for End-to-end Speech Translation.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

3rd Workshop on Social Affective Multimodal Interaction for Health (SAMIH).
Proceedings of the International Conference on Multimodal Interaction, 2022

Linguistic Features of Clients and Counselors for Early Detection of Mental Health Issues in Online Text-based Counseling.
Proceedings of the 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society, 2022

Analysis of Feedback Contents and Estimation of Subjective Scores in Social Skills Training.
Proceedings of the 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society, 2022

Pseudo Ambiguous and Clarifying Questions Based on Sentence Structures Toward Clarifying Question Answering System.
Proceedings of the Second DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering, 2022

2021
Instance-Level Heterogeneous Domain Adaptation for Limited-Labeled Sketch-to-Photo Retrieval.
IEEE Trans. Multim., 2021

Tackling Perception Bias in Unsupervised Phoneme Discovery Using DPGMM-RNN Hybrid Model and Functional Load.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Towards Tokenization and Part-of-Speech Tagging for Khmer: Data and Discussion.
ACM Trans. Asian Low Resour. Lang. Inf. Process., 2021

ReMOT: A model-agnostic refinement for multiple object tracking.
Image Vis. Comput., 2021

Neural Incremental Speech Recognition Toward Real-Time Machine Speech Translation.
IEICE Trans. Inf. Syst., 2021

Code-Switching ASR and TTS Using Semisupervised Learning with Machine Speech Chain.
IEICE Trans. Inf. Syst., 2021

Using Perturbed Length-aware Positional Encoding for Non-autoregressive Neural Machine Translation.
CoRR, 2021

Constituency Parsing by Cross-Lingual Delexicalization.
IEEE Access, 2021

Multimodal Chain: Cross-Modal Collaboration Through Listening, Speaking, and Visualizing.
IEEE Access, 2021

End-to-End Image-to-Speech Generation for Untranscribed Unknown Languages.
IEEE Access, 2021

Multilingual Machine Translation Evaluation Metrics Fine-tuned on Pseudo-Negative Examples for WMT 2021 Metrics Task.
Proceedings of the Sixth Conference on Machine Translation, 2021

Simultaneous Neural Machine Translation with Constituent Label Prediction.
Proceedings of the Sixth Conference on Machine Translation, 2021

Incorporating Discriminative DPGMM Posteriorgrams for Low-Resource ASR.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Transformer-Based Direct Speech-To-Speech Translation with Transcoder.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Anime Character Colorization using Few-shot Learning.
Proceedings of the SA '21: SIGGRAPH Asia 2021 Technical Communications, Tokyo, Japan, December 14, 2021

ARTA: Collection and Classification of Ambiguous Requests and Thoughtful Actions.
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue, 2021

Multi-Encoder Sequential Attention Network for Context-Aware Speech Recognition in Japanese Dialog Conversation.
Proceedings of the 24th Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 2021

Using Local Phrase Dependency Structure Information in Neural Sequence-to-Sequence Speech Synthesis.
Proceedings of the 24th Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 2021

Simultaneous Speech-to-Speech Translation System with Transformer-Based Incremental ASR, MT, and TTS.
Proceedings of the 24th Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 2021

On Knowledge Distillation for Translating Erroneous Speech Transcriptions.
Proceedings of the 18th International Conference on Spoken Language Translation, 2021

NAIST English-to-Japanese Simultaneous Translation System for IWSLT 2021 Simultaneous Text-to-text Task.
Proceedings of the 18th International Conference on Spoken Language Translation, 2021

Large-Scale English-Japanese Simultaneous Interpretation Corpus: Construction and Analyses with Sentence-Aligned Data.
Proceedings of the 18th International Conference on Spoken Language Translation, 2021

Eliciting Cooperative Persuasive Dialogue by Multimodal Emotional Robot.
Proceedings of the Conversational AI for Natural Human-Centric Interaction, 2021

Transcribing Paralinguistic Acoustic Cues to Target Language Text in Transformer-Based Speech-to-Text Translation.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Unsupervised Neural-Based Graph Clustering for Variable-Length Speech Representation Discovery of Zero-Resource Languages.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Dynamically Adaptive Machine Speech Chain Inference for TTS in Noisy Environment: Listen and Speak Louder.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

ASR Posterior-Based Loss for Multi-Task End-to-End Speech Translation.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Weakly-Supervised Speech-to-Text Mapping with Visually Connected Non-Parallel Speech-Text Data Using Cyclic Partially-Aligned Transformer.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Named Entity-Factored Transformer for Proper Noun Translation.
Proceedings of the 18th International Conference on Natural Language Processing (ICON 2021), National Institute of Technology Silchar, Silchar, India, December 16, 2021

Multi-Source Cross-Lingual Constituency Parsing.
Proceedings of the 18th International Conference on Natural Language Processing (ICON 2021), National Institute of Technology Silchar, Silchar, India, December 16, 2021

2nd Workshop on Social Affective Multimodal Interaction for Health (SAMIH).
Proceedings of the ICMI '21: International Conference on Multimodal Interaction, 2021

Multimodal Dataset of Social Skills Training in Natural Conversational Setting.
Proceedings of the ICMI '21 Companion: Companion Publication of the 2021 International Conference on Multimodal Interaction, Montreal, QC, Canada, October 18, 2021

Meta-Learning for Emotion Prediction from EEG while Listening to Music.
Proceedings of the ICMI '21 Companion: Companion Publication of the 2021 International Conference on Multimodal Interaction, Montreal, QC, Canada, October 18, 2021

Virtual Agent Design for Social Skills Training Considering Autistic Traits.
Proceedings of the 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society, 2021

Clustering of Human Movement Trajectories based on Distributional Representations Derived from Bi-directional LSTM Network with Geographical Coordinates.
Proceedings of the 2021 IEEE International Conference on Big Data (Big Data), 2021

Emotion Estimation from EEG Signals and Expected Subjective Evaluation.
Proceedings of the 9th International Winter Conference on Brain-Computer Interface, 2021

Relationship between Mood Improvement and Questioning to Evaluate Automatic Thoughts in Cognitive Restructuring with a Virtual Agent.
Proceedings of the 2021 9th International Conference on Affective Computing and Intelligent Interaction, 2021

2020
Corrections to "Machine Speech Chain".
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Machine Speech Chain.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Multi-Source Neural Machine Translation With Missing Data.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

End-to-End Speech Translation With Transcoding by Multi-Task Learning for Distant Language Pairs.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Improving neural machine translation through phrase-based soft forced decoding.
Mach. Transl., 2020

Analysis of conversational listening skills toward agent-based social skills training.
J. Multimodal User Interfaces, 2020

Recurrent Neural Network Compression Based on Low-Rank Tensor Representation.
IEICE Trans. Inf. Syst., 2020

Leveraging Neural Caption Translation with Visually Grounded Paraphrase Augmentation.
IEICE Trans. Inf. Syst., 2020

Simultaneous Speech-to-Speech Translation System with Neural Incremental ASR, MT, and TTS.
CoRR, 2020

Image Captioning with Visual Object Representations Grounded in the Textual Modality.
CoRR, 2020

ReMOTS: Self-Supervised Refining Multi-Object Tracking and Segmentation.
CoRR, 2020

An Interactive Image Editing System Using an Uncertainty-Based Confirmation Strategy.
IEEE Access, 2020

Policy Reuse for Dialog Management Using Action-Relation Probability.
IEEE Access, 2020

Entrainable Neural Conversation Model Based on Reinforcement Learning.
IEEE Access, 2020

Cross-Lingual Machine Speech Chain for Javanese, Sundanese, Balinese, and Bataks Speech Recognition and Synthesis.
Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages and Collaboration and Computing for Under-Resourced Languages, 2020

Confidence-aware Practical Anime-style Colorization.
Proceedings of the SIGGRAPH '20: Special Interest Group on Computer Graphics and Interactive Techniques Conference, 2020

Towards Speech Entrainment: Considering ASR Information in Speaking Rate Variation of TTS Waveform Generation.
Proceedings of the 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 2020

Emotional Speech Corpus for Persuasive Dialogue System.
Proceedings of The 12th Language Resources and Evaluation Conference, 2020

NAIST's Machine Translation Systems for IWSLT 2020 Conversational Speech Translation Task.
Proceedings of the 17th International Conference on Spoken Language Translation, 2020

Caption Generation of Robot Behaviors Based on Unsupervised Learning of Action Segments.
Proceedings of the Conversational Dialogue Systems for the Next Decade, 2020

Neural Speech Completion.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Transformer VQ-VAE for Unsupervised Unit Discovery and Speech Synthesis: ZeroSpeech 2020 Challenge.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Combining Audio and Brain Activity for Predicting Speech Quality.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Incremental Machine Speech Chain Towards Enabling Listening While Speaking in Real-Time.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Augmenting Images for ASR and TTS Through Single-Loop and Dual-Loop Multimodal Chain Framework.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Social Affective Multimodal Interaction for Health.
Proceedings of the ICMI '20: International Conference on Multimodal Interaction, 2020

Analysis of Mood Changes and Facial Expressions during Cognitive Behavior Therapy through a Virtual Agent.
Proceedings of the Companion Publication of the 2020 International Conference on Multimodal Interaction, 2020

Objective Prediction of Social Skills Level for Automated Social Skills Training Using Audio and Text Information.
Proceedings of the Companion Publication of the 2020 International Conference on Multimodal Interaction, 2020

Music Generation and Emotion Estimation from EEG Signals for Inducing Affective States.
Proceedings of the Companion Publication of the 2020 International Conference on Multimodal Interaction, 2020

DEJA-VU: Double Feature Presentation and Iterated Loss in Deep Transformer Networks.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Using Panoramic Videos for Multi-Person Localization and Tracking In A 3D Panoramic Coordinate.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Analysis of selective attention processing on experienced simultaneous interpreters using EEG phase synchronization.
Proceedings of the 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society, 2020

Sequential Attention-based Detection of Semantic Incongruities from EEG While Listening to Speech.
Proceedings of the 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society, 2020

Improving Spoken Language Understanding by Wisdom of Crowds.
Proceedings of the 28th International Conference on Computational Linguistics, 2020

Incorporating Noisy Length Constraints into Transformer with Length-aware Positional Encodings.
Proceedings of the 28th International Conference on Computational Linguistics, 2020

Automatic Machine Translation Evaluation using Source Language Inputs and Cross-lingual Language Model.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

Reflection-based Word Attribute Transfer.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, 2020

2019
Positive Emotion Elicitation in Chat-Based Dialogue Systems.
IEEE ACM Trans. Audio Speech Lang. Process., 2019

Additional Operations of Simple HITs on Microtask Crowdsourcing for Worker Quality Prediction.
J. Inf. Process., 2019

Neural Oscillation-Based Classification of Japanese Spoken Sentences During Speech Perception.
IEICE Trans. Inf. Syst., 2019

Electroencephalogram-Based Single-Trial Detection of Language Expectation Violations in Listening to Speech.
Frontiers Comput. Neurosci., 2019

Associative knowledge feature vector inferred on external knowledge base for dialog state tracking.
Comput. Speech Lang., 2019

Simultaneous Neural Machine Translation using Connectionist Temporal Classification.
CoRR, 2019

Deja-vu: Double Feature Presentation in Deep Transformer Networks.
CoRR, 2019

Conversational Response Re-ranking Based on Event Causality and Role Factored Tensor Event Embedding.
CoRR, 2019

From Speech Chain to Multimodal Chain: Leveraging Cross-modal Data Augmentation for Semi-supervised Learning.
CoRR, 2019

Classification of alkaloids according to the starting substances of their biosynthetic pathways using graph convolutional neural networks.
BMC Bioinform., 2019

A Framework for Knowing Who is Doing What in Aerial Surveillance Videos.
IEEE Access, 2019

End-to-End Speech Recognition Sequence Training With Reinforcement Learning.
IEEE Access, 2019

Neural iTTS: Toward Synthesizing Speech in Real-time with End-to-end Neural Text-to-Speech Framework.
Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

Graph matching based anime colorization with multiple references.
Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference, 2019

Phoneme-level speaking rate variation on waveform generation using GAN-TTS.
Proceedings of the 22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 2019

Recognition and translation of code-switching speech utterances.
Proceedings of the 22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 2019

Oriental-COCOSDA 2019 Japan country report.
Proceedings of the 22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 2019

Make Skeleton-based Action Recognition Model Smaller, Faster and Better.
Proceedings of the MMAsia '19: ACM Multimedia Asia, Beijing, China, December 16-18, 2019

Spoken Dialogue Robot for Watching Daily Life of Elderly People.
Proceedings of the Increasing Naturalness and Flexibility in Spoken Dialogue Interaction, 2019

VQVAE Unsupervised Unit Discovery and Multi-Scale Code2Spec Inverter for Zerospeech Challenge 2019.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Speech Quality Evaluation of Synthesized Japanese Speech Using EEG.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Sequence-to-Sequence Learning via Attention Transfer for Incremental Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

An Incremental Turn-Taking Model for Task-Oriented Dialog Systems.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Neural Conversation Model Controllable by Given Dialogue Act Based on Adversarial Learning and Label-aware Objective.
Proceedings of the 12th International Conference on Natural Language Generation, 2019

Detecting Dementia from Face in Human-Agent Interaction.
Proceedings of the Adjunct of the 2019 International Conference on Multimodal Interaction, 2019

Detecting Syntactic Violations from Single-trial EEG using Recurrent Neural Networks.
Proceedings of the Adjunct of the 2019 International Conference on Multimodal Interaction, 2019

Measuring Affective Sharing between Two People by EEG Hyperscanning.
Proceedings of the Adjunct of the 2019 International Conference on Multimodal Interaction, 2019

Cross-lingual Speech-based ToBI Label Generation Using Bidirectional LSTM.
Proceedings of the IEEE International Conference on Acoustics, 2019

End-to-end Feedback Loss in Speech Chain Framework via Straight-through Estimator.
Proceedings of the IEEE International Conference on Acoustics, 2019

Speech Artifact Removal from EEG Recordings of Spoken Word Production with Tensor Decomposition.
Proceedings of the IEEE International Conference on Acoustics, 2019

Speech-to-Speech Translation Between Untranscribed Unknown Languages.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Zero-Shot Code-Switching ASR and TTS with Multilingual Machine Speech Chain.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Neural Machine Translation with Acoustic Embedding.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Listening While Speaking and Visualizing: Improving ASR Through Multimodal Chain.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018
Dirichlet Process Mixture of Mixtures Model for Unsupervised Subword Modeling.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

Sequence-to-Sequence Models for Emphasis Speech Translation.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

Intra-gender statistical singing voice conversion with direct waveform modification using log-spectral differential.
Speech Commun., 2018

An end-to-end model for cross-lingual transformation of paralinguistic information.
Mach. Transl., 2018

Construction of Spontaneous Emotion Corpus from Indonesian TV Talk Shows and Its Application on Multimodal Emotion Recognition.
IEICE Trans. Inf. Syst., 2018

Semantically Readable Distributed Representation Learning and Its Expandability Using a Word Semantic Vector Dictionary.
IEICE Trans. Inf. Syst., 2018

Learning Supervised Feature Transformations on Zero Resources for Improved Acoustic Unit Discovery.
IEICE Trans. Inf. Syst., 2018

Optimization of Information-Seeking Dialogue Strategy for Argumentation-Based Dialogue System.
CoRR, 2018

Another Diversity-Promoting Objective Function for Neural Dialogue Generation.
CoRR, 2018

Training Neural Machine Translation using Word Embedding-based Loss.
CoRR, 2018

Interactive Image Manipulation with Natural Language Instruction Commands.
CoRR, 2018

Optimizing DPGMM Clustering in Zero Resource Setting Based on Functional Load.
Proceedings of the 6th Intl. Workshop on Spoken Language Technologies for Under-Resourced Languages, 2018

Corpus Construction and Semantic Analysis of Indonesian Image Description.
Proceedings of the 6th Intl. Workshop on Spoken Language Technologies for Under-Resourced Languages, 2018

Multi-Scale Alignment and Contextual History for Attention Mechanism in Sequence-to-Sequence Model.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Adaptive Wavenet Vocoder for Residual Compensation in GAN-Based Voice Conversion.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Speech Chain for Semi-Supervised Learning of Japanese-English Code-Switching ASR and TTS.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Optimizing Neural Response Generator with Emotional Impact Information.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Toward Multi-Features Emphasis Speech Translation: Assessment of Human Emphasis Production and Perception with Speech and Text Clues.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Pre- and post-processes for automatic colorization using a fully convolutional network.
Proceedings of the SIGGRAPH Asia 2018 Posters, Tokyo, Japan, December 04-07, 2018

Unsupervised Counselor Dialogue Clustering for Positive Emotion Elicitation in Neural Dialogue System.
Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue, 2018

Multi-Modal Multi-Task Deep Learning For Speaker And Emotion Recognition Of TV-Series Data.
Proceedings of the 2018 Oriental COCOSDA, 2018

Japanese-English Code-Switching Speech Data Construction.
Proceedings of the 2018 Oriental COCOSDA, 2018

Guiding Neural Machine Translation with Retrieved Translation Pieces.
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018

Japanese Dialogue Corpus of Information Navigation and Attentive Listening Annotated with Extended ISO-24617-2 Dialogue Act Tags.
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

Dialogue Scenario Collection of Persuasive Dialogue with Emotional Expressions via Crowdsourcing.
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

Construction of English-French Multimodal Affective Conversational Corpus from TV Dramas.
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

Using Spoken Word Posterior Features in Neural Machine Translation.
Proceedings of the 15th International Conference on Spoken Language Translation, 2018

Multi-Source Neural Machine Translation with Data Augmentation.
Proceedings of the 15th International Conference on Spoken Language Translation, 2018

Multi-paraphrase Augmentation to Leverage Neural Caption Translation.
Proceedings of the 15th International Conference on Spoken Language Translation, 2018

Impact of Deception Information on Negotiation Dialog Management: A Case Study on Doctor-Patient Conversations.
Proceedings of the 9th International Workshop on Spoken Dialogue System Technology, 2018

Dialogue Act Classification in Reference Interview Using Convolutional Neural Network with Byte Pair Encoding.
Proceedings of the 9th International Workshop on Spoken Dialogue System Technology, 2018

Incremental TTS for Japanese Language.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Detection of Dementia from Responses to Atypical Questions Asked by Embodied Conversational Agents.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Machine Speech Chain with One-shot Speaker Adaptation.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Compressing End-to-end ASR Networks by Tensor-Train Decomposition.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Tensor Decomposition for Compressing Recurrent Neural Network.
Proceedings of the 2018 International Joint Conference on Neural Networks, 2018

Listening Skills Assessment through Computer Agents.
Proceedings of the 2018 on International Conference on Multimodal Interaction, 2018

Sequence-to-Sequence ASR Optimization Via Reinforcement Learning.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Graph Regularized Tensor Factorization for Single-Trial EEG Analysis.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Single-Trial Detection of Semantic Anomalies From EEG During Listening to Spoken Sentences.
Proceedings of the 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2018

Information Filtering Method for Twitter Streaming Data Using Human-in-the-Loop Machine Learning.
Proceedings of the Database and Expert Systems Applications, 2018

TRANS-AM: Discovery Method of Optimal Input Vectors Corresponding to Objective Variables.
Proceedings of the Big Data Analytics and Knowledge Discovery, 2018

Detecting suppression of negative emotion by time series change of cerebral blood flow using fNIRS.
Proceedings of the 2018 IEEE EMBS International Conference on Biomedical & Health Informatics, 2018

Deception Detection and Analysis in Spoken Dialogues based on FastText.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

Eliciting Positive Emotion through Affect-Sensitive Dialogue Response Generation: A Neural Network Approach.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017
Preserving Word-Level Emphasis in Speech-to-Speech Translation.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

Transcribing against time.
Speech Commun., 2017

A Vibration Control Method of an Electrolarynx Based on Statistical F0 Pattern Prediction.
IEICE Trans. Inf. Syst., 2017

Development of the "VoiceTra" Multi-Lingual Speech Translation System.
IEICE Trans. Inf. Syst., 2017

Analysis of the Effect of Dependency Information on Predicate-Argument Structure Analysis and Zero Anaphora Resolution.
CoRR, 2017

Local Monotonic Attention Mechanism for End-to-End Speech Recognition.
CoRR, 2017

NICT-NAIST System for WMT17 Multimodal Translation Task.
Proceedings of the Second Conference on Machine Translation, 2017

Tree as a Pivot: Syntactic Matching Methods in Pivot Translation.
Proceedings of the Second Conference on Machine Translation, 2017

Semantically readable distributed representation learning for social media mining.
Proceedings of the International Conference on Web Intelligence, 2017

Recognizing Emotionally Coloured Dialogue Speech Using Speaker-Adapted DNN-CNN Bottleneck Features.
Proceedings of the Speech and Computer - 19th International Conference, 2017

Information Navigation System with Discovering User Interests.
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue, 2017

Creation of a multi-paraphrase corpus based on various elementary operations.
Proceedings of the 20th Conference of the Oriental Chapter of the International Coordinating Committee on Speech Databases and Speech I/O Systems and Assessment, 2017

A k-anonymized Text Generation Method.
Proceedings of the Advances in Network-Based Information Systems, 2017

Initial response time measurement in eye movement for dementia screening test.
Proceedings of the Fifteenth IAPR International Conference on Machine Vision Applications, 2017

Speech recognition features based on deep latent Gaussian models.
Proceedings of the 27th IEEE International Workshop on Machine Learning for Signal Processing, 2017

Eliciting Positive Emotional Impact in Dialogue Response Selection.
Proceedings of the Advanced Social Interaction with Agents, 2017

Subject-Independent Classification of Japanese Spoken Sentences by Multiple Frequency Bands Phase Pattern of EEG Response During Speech Perception.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Physically Constrained Statistical F0 Prediction for Electrolaryngeal Speech Enhancement.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Structured-Based Curriculum Learning for End-to-End English-Japanese Speech Translation.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Ensembles of Multi-Scale VGG Acoustic Models.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Toward Expressive Speech Translation: A Unified Sequence-to-Sequence LSTMs Approach for Translating Words and Emphasis.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Compressing recurrent neural network with tensor train.
Proceedings of the 2017 International Joint Conference on Neural Networks, 2017

Improving Neural Machine Translation through Phrase-based Forced Decoding.
Proceedings of the Eighth International Joint Conference on Natural Language Processing, 2017

Local Monotonic Attention Mechanism for End-to-End Speech And Language Processing.
Proceedings of the Eighth International Joint Conference on Natural Language Processing, 2017

Acquisition and Assessment of Semantic Content for the Generation of Elaborateness and Indirectness in Spoken Dialogue Systems.
Proceedings of the Eighth International Joint Conference on Natural Language Processing, 2017

Tracking liking state in brain activity while watching multiple movies.
Proceedings of the 19th ACM International Conference on Multimodal Interaction, 2017

A trade-off between estimation accuracy of worker quality and task complexity.
Proceedings of the 2017 IEEE International Conference on Big Data (IEEE BigData 2017), 2017

Attention-based Wav2Text with feature transfer learning.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Listening while speaking: Speech chain by deep learning.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Feature optimized DPGMM clustering for unsupervised subword modeling: A contribution to zerospeech 2017.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

An investigation of how to design control parameters for statistical voice timbre control.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

A Simple and Strong Baseline: NAIST-NICT Neural Machine Translation System for WAT2017 English-Japanese Translation Task.
Proceedings of the 4th Workshop on Asian Translation, 2017

An Empirical Study of Mini-Batch Creation Strategies for Neural Machine Translation.
Proceedings of the First Workshop on Neural Machine Translation, 2017

Neural Machine Translation via Binary Code Prediction.
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017

Processing negative emotions through social communication: Multimodal database construction and analysis.
Proceedings of the Seventh International Conference on Affective Computing and Intelligent Interaction, 2017

2016
Teaching Social Communication Skills Through Human-Agent Interaction.
ACM Trans. Interact. Intell. Syst., 2016

Postfilters to Modify the Modulation Spectrum for Statistical Parametric Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

Learning cooperative persuasive dialogue policies using framing.
Speech Commun., 2016

Learning local word reorderings for hierarchical phrase-based statistical machine translation.
Mach. Transl., 2016

Reinforcement Learning of Multi-Party Trading Dialog Policies.
Inf. Media Technol., 2016

A Statistical Sample-Based Approach to GMM-Based Voice Conversion Using Tied-Covariance Acoustic Models.
IEICE Trans. Inf. Syst., 2016

Non-Native Text-to-Speech Preserving Speaker Individuality Based on Partial Correction of Prosodic and Phonetic Characteristics.
IEICE Trans. Inf. Syst., 2016

Neural Network Approaches to Dialog Response Retrieval and Generation.
IEICE Trans. Inf. Syst., 2016

Enhancing Event-Related Potentials Based on Maximum a Posteriori Estimation with a Spatial Correlation Prior.
IEICE Trans. Inf. Syst., 2016

Improvements of Voice Timbre Control Based on Perceived Age in Singing Voice Conversion.
IEICE Trans. Inf. Syst., 2016

Assessing the Quality of Wikipedia Editors through Crowdsourcing.
Proceedings of the 25th International Conference on World Wide Web, 2016

Unsupervised Linear Discriminant Analysis for Supporting DPGMM Clustering in the Zero Resource Scenario.
Proceedings of the SLTU-2016, 2016

Deep bottleneck features and sound-dependent i-vectors for simultaneous recognition of speech and environmental sounds.
Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

F0 transformation techniques for statistical voice conversion with direct waveform modification with spectral differential.
Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

Iterative training of a DPGMM-HMM acoustic unit recognizer in a zero resource scenario.
Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

Analyzing the Effect of Entrainment on Dialogue Acts.
Proceedings of the SIGDIAL 2016 Conference, 2016

Cultural Communication Idiosyncrasies in Human-Computer Interaction.
Proceedings of the SIGDIAL 2016 Conference, 2016

Selecting Syntactic, Non-redundant Segments in Active Learning for Machine Translation.
Proceedings of the NAACL HLT 2016, 2016

Optimizing Computer-Assisted Transcription Quality with Iterative User Interfaces.
Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016

Construction of Japanese Audio-Visual Emotion Database and Its Application in Emotion Recognition.
Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016

Active Learning for Example-Based Dialog Systems.
Proceedings of the Dialogues with Social Robots, 2016

Unsupervised Phoneme Segmentation of Previously Unseen Languages.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Unsupervised Joint Estimation of Grapheme-to-Phoneme Conversion Systems and Acoustic Model Adaptation for Non-Native Speech Recognition.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Acoustic-to-Articulatory Inversion Mapping Based on Latent Trajectory Gaussian Mixture Model.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

The NU-NAIST Voice Conversion System for the Voice Conversion Challenge 2016.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Supervised Learning of Acoustic Models in a Zero Resource Setting to Improve DPGMM Clustering.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

A Hybrid System for Continuous Word-Level Emphasis Modeling Based on HMM State Clustering and Adaptive Training.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Transferring Emphasis in Speech Translation Using Hard-Attentional Neural Network Models.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Gated Recurrent Neural Tensor Network.
Proceedings of the 2016 International Joint Conference on Neural Networks, 2016

Fast text anonymization using k-anonymity.
Proceedings of the 18th International Conference on Information Integration and Web-based Applications and Services, 2016

Automatic detection of very early stage of dementia through multimodal interaction with computer avatars.
Proceedings of the 18th ACM International Conference on Multimodal Interaction, 2016

Personalized unknown word detection in non-native language reading using eye gaze.
Proceedings of the 18th ACM International Conference on Multimodal Interaction, 2016

An estimation method of voice timbre evaluation values using feature extraction with Gaussian mixture model based on reference singer.
Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

Statistical F0 prediction for electrolaryngeal speech enhancement considering generative process of F0 contours within product of experts framework.
Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

Noise suppression method for body-conducted soft speech enhancement based on external noise monitoring.
Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

Implementation of F0 transformation for statistical singing voice conversion based on direct waveform modification.
Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

Real-time vibration control of an electrolarynx based on statistical F0 contour prediction.
Proceedings of the 24th European Signal Processing Conference, 2016

Incorporating Discrete Translation Lexicons into Neural Machine Translation.
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016

Learning a Lexicon and Translation Model from Phoneme Lattices.
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016

Automated social skills training with audiovisual information.
Proceedings of the 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2016

Removing noise from event-related potentials using a probabilistic generative model with grouped covariance matrices.
Proceedings of the 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2016

A Continuous Space Rule Selection Model for Syntax-based Statistical Machine Translation.
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, 2016

2015
Multichannel Signal Separation Combining Directional Clustering and Nonnegative Matrix Factorization with Spectrogram Restoration.
IEEE ACM Trans. Audio Speech Lang. Process., 2015

Semantic Parsing of Ambiguous Input through Paraphrasing and Verification.
Trans. Assoc. Comput. Linguistics, 2015

NOCOA+: Multimodal Computer-Based Training for Social and Communication Skills.
IEICE Trans. Inf. Syst., 2015

An Investigation of Machine Translation Evaluation Metrics in Cross-lingual Question Answering.
Proceedings of the Tenth Workshop on Statistical Machine Translation, 2015

Reinforcement Learning in Multi-Party Trading Dialog.
Proceedings of the SIGDIAL 2015 Conference, 2015

Keynote speech 3: Toward simultaneous, natural and multimodal speech-to-speech translation.
Proceedings of the 2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2015

Message of the O-COCOSDA Convener.
Proceedings of the 2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2015

Construction and analysis of social-affective interaction corpus in English and Indonesian.
Proceedings of the 2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2015

Ckylark: A More Robust PCFG-LA Parser.
Proceedings of the NAACL HLT 2015, The 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Denver, Colorado, USA, May 31, 2015

Learning to Generate Pseudo-Code from Source Code Using Statistical Machine Translation (T).
Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering, 2015

Pseudogen: A Tool to Automatically Generate Pseudo-Code from Source Code.
Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering, 2015

Parser self-training for syntax-based machine translation.
Proceedings of the 12th International Workshop on Spoken Language Translation: Papers, 2015

The NAIST English speech recognition system for IWSLT 2015.
Proceedings of the 12th International Workshop on Spoken Language Translation: Evaluation Campaign@IWSLT 2015, 2015

Improving translation of emphasis with pause prediction in speech-to-speech translation systems.
Proceedings of the 12th International Workshop on Spoken Language Translation: Papers, 2015

Automated Social Skills Trainer.
Proceedings of the 20th International Conference on Intelligent User Interfaces, 2015

Context awareness and priority control for ITS based on automatic speech recognition.
Proceedings of the 14th International Conference on ITS Telecommunications, 2015

Articulatory controllable speech modification based on Gaussian mixture models with direct waveform modification using spectrum differential.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Modulation spectrum-constrained trajectory training algorithm for HMM-based speech synthesis.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Non-audible murmur enhancement based on statistical conversion using air- and body-conductive microphones in noisy environments.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Non-native speech synthesis preserving speaker individuality based on partial correction of prosodic and phonetic characteristics.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

A latent variable model for joint pause prediction and dependency parsing.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Speed or accuracy? A study in evaluation of simultaneous speech translation.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Statistical singing voice conversion based on direct waveform modification with global variance.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Preserving word-level emphasis in speech-to-speech translation using linear regression HSMMs.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Combination of two-dimensional cochleogram and spectrogram features for deep learning-based ASR.
Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

Modulation spectrum-constrained trajectory training algorithm for GMM-based Voice Conversion.
Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

Parameter generation algorithm considering Modulation Spectrum for HMM-based speech synthesis.
Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

Statistical modeling of binaural signal and its application to binaural source separation.
Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

EEG signal enhancement using multi-channel Wiener filter with a spatial correlation prior.
Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

WFST-based structural classification integrating DNN acoustic features and RNN language features for speech recognition.
Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

A Binarized Neural Network Joint Model for Machine Translation.
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015

An evaluation of EEG ocular artifact removal with a multi-channel Wiener filter based on probabilistic generative model.
Proceedings of the 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2015

The NAIST Text-to-Speech System for the Blizzard Challenge 2015.
Proceedings of the Blizzard Challenge 2015, 2015

An Enhanced Electrolarynx with Automatic Fundamental Frequency Control based on Statistical Prediction.
Proceedings of the 17th International ACM SIGACCESS Conference on Computers & Accessibility, 2015

Stochastic Gradient Variational Bayes for deep learning-based ASR.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

Incremental sentence compression using LSTM recurrent networks.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

Adaptive selection from multiple response candidates in example-based dialogue.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

A study of social-affective communication: Automatic prediction of emotion triggers and responses in television talk shows.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

The NAIST ASR system for the 2015 Multi-Genre Broadcast challenge: On combination of deep learning systems using a rank-score function.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

Neural Reranking Improves Subjective Quality of Machine Translation: NAIST at WAT2015.
Proceedings of the 2nd Workshop on Asian Translation, 2015

Syntax-based Simultaneous Translation through Prediction of Unseen Syntactic Constituents.
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, 2015

Improving Pivot Translation by Remembering the Pivot.
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, 2015

An Analysis Towards Dialogue-Based Deception Detection.
Proceedings of the Natural Language Dialog Systems and Intelligent Assistants, 2015

Unknown Word Detection Based on Event-Related Brain Desynchronization Responses.
Proceedings of the Natural Language Dialog Systems and Intelligent Assistants, 2015

Linguistic Individuality Transformation for Spoken Language.
Proceedings of the Natural Language Dialog Systems and Intelligent Assistants, 2015

A Study on Natural Expressive Speech: Automatic Memorable Spoken Quote Detection.
Proceedings of the Natural Language Dialog Systems and Intelligent Assistants, 2015

Evaluation of a Fully Automatic Cooperative Persuasive Dialogue System.
Proceedings of the Natural Language Dialog Systems and Intelligent Assistants, 2015

2014
Segmentation for Efficient Supervised Language Annotation with an Explicit Cost-Utility Tradeoff.
Trans. Assoc. Comput. Linguistics, 2014

Musical-noise-free blind speech extraction integrating microphone array and iterative spectral subtraction.
Signal Process., 2014

Parameter Generation Methods With Rich Context Models for High-Quality and Flexible Text-To-Speech Synthesis.
IEEE J. Sel. Top. Signal Process., 2014

Variable Selection Linear Regression for Robust Speech Recognition.
IEICE Trans. Inf. Syst., 2014

A Hybrid Approach to Electrolaryngeal Speech Enhancement Based on Noise Reduction and Statistical Excitation Generation.
IEICE Trans. Inf. Syst., 2014

Utilizing Human-to-Human Conversation Examples for a Multi Domain Chat-Oriented Dialog System.
IEICE Trans. Inf. Syst., 2014

Structured Adaptive Regularization of Weight Vectors for a Robust Grapheme-to-Phoneme Conversion Model.
IEICE Trans. Inf. Syst., 2014

Voice Timbre Control Based on Perceived Age in Singing Voice Conversion.
IEICE Trans. Inf. Syst., 2014

Rule-based Syntactic Preprocessing for Syntax-based Machine Translation.
Proceedings of SSST@EMNLP 2014, 2014

Recent progress in developing grapheme-based speech recognition for Indonesian ethnic languages: Javanese, Sundanese, Balinese and Bataks.
Proceedings of the 4th Workshop on Spoken Language Technologies for Under-resourced Languages, 2014

Towards real-time multilingual multimodal speech-to-speech translation.
Proceedings of the 4th Workshop on Spoken Language Technologies for Under-resourced Languages, 2014

On-the-fly user modeling for cost-sensitive correction of speech transcripts.
Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

Improving the robustness of example-based dialog retrieval using recursive neural network paraphrase identification.
Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

Emotion recognition on Indonesian television talk shows.
Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

Conversation dialog corpora from television and movie scripts.
Proceedings of the 2014 17th Oriental Chapter of the International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques (COCOSDA), 2014

Building a free, general-domain paraphrase database for Japanese.
Proceedings of the 2014 17th Oriental Chapter of the International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques (COCOSDA), 2014

Construction and analysis of Indonesian Emotional Speech Corpus.
Proceedings of the 2014 17th Oriental Chapter of the International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques (COCOSDA), 2014

Memorable spoken quote corpora of TED public speaking.
Proceedings of the 2014 17th Oriental Chapter of the International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques (COCOSDA), 2014

Collection and analysis of a Japanese-English emphasized speech corpora.
Proceedings of the 2014 17th Oriental Chapter of the International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques (COCOSDA), 2014

Collection of a Simultaneous Translation Corpus for Comparative Analysis.
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

Towards Multilingual Conversations in the Medical Domain: Development of Multilingual Medical Data and A Network-based ASR System.
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

Emotion and Its Triggers in Human Spoken Dialogue: Recognition and Analysis.
Proceedings of the Situated Dialog in Speech-Based Human-Computer Interaction, 2014

Construction and Analysis of a Persuasive Dialogue Corpus.
Proceedings of the Situated Dialog in Speech-Based Human-Computer Interaction, 2014

Articulatory controllable speech modification based on statistical feature mapping with Gaussian mixture models.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Direct F0 control of an electrolarynx based on statistical excitation feature prediction and its evaluation through simulation.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Data-driven generation of text balloons based on linguistic and acoustic features of a comics-anime corpus.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Structured soft margin confidence weighted learning for grapheme-to-phoneme conversion.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Statistical singing voice conversion with direct waveform modification based on the spectrum differential.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

A hearing impairment simulation method using audiogram-based approximation of auditory characteristics.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

An evaluation of excitation feature prediction in a hybrid approach to electrolaryngeal speech enhancement.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

A postfilter to modify the modulation spectrum in HMM-based speech synthesis.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

Music signal separation based on Bayesian spectral amplitude estimator with automatic target prior adaptation.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

Narrow Adaptive Regularization of weights for grapheme-to-phoneme conversion.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

Regression approaches to perceptual age control in singing voice conversion.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

Theoretical analysis of biased MMSE short-time spectral amplitude estimator and its extension to musical-noise-free speech enhancement.
Proceedings of the 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays, 2014

Divergence optimization in nonnegative matrix factorization with spectrogram restoration for multichannel signal separation.
Proceedings of the 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays, 2014

Optimized joint noise suppression and dereverberation based on blind signal extraction for hands-free speech recognition system.
Proceedings of the 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays, 2014

Modified post-filter to recover modulation spectrum for HMM-based speech synthesis.
Proceedings of the 2014 IEEE Global Conference on Signal and Information Processing, 2014

Acquiring a Dictionary of Emotion-Provoking Events.
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, 2014

Reinforcement Learning of Cooperative Persuasive Dialogue Policies using Framing.
Proceedings of the COLING 2014, 2014

Discriminative Language Models as a Tool for Machine Translation Error Analysis.
Proceedings of the COLING 2014, 2014

Unnecessary utterance detection for avoiding digressions in discussion.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

An evaluation of target speech for a nonaudible murmur enhancement system in noisy environments.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

An inter-speaker evaluation through simulation of electrolarynx control based on statistical F0 prediction.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

Modulation spectrum-based post-filter for GMM-based Voice Conversion.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

An event-related brain potential study on the impact of speech recognition errors.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

Recursive neural network paraphrase identification for example-based dialog retrieval.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

The use of semantic and acoustic features for open-domain TED talk summarization.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

Gender-dependent spectrum differential models for perceived age control based on direct waveform modification in singing voice conversion.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

Hybrid multichannel signal separation using supervised nonnegative matrix factorization with spectrogram restoration.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

Optimizing Segmentation Strategies for Simultaneous Speech Translation.
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 2014

Linguistic and Acoustic Features for Automatic Identification of Autism Spectrum Disorders in Children's Narrative.
Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, 2014

2013
A-STAR: Toward translating Asian spoken languages.
Comput. Speech Lang., 2013

Investigation of intra-speaker spectral parameter variation and its prediction towards improvement of spectral conversion metric.
Proceedings of the Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, 2013

Towards language preservation: Design and collection of graphemically balanced and parallel speech corpora of Indonesian ethnic languages.
Proceedings of the 2013 International Conference Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2013

Multilingual Speech-to-Speech Translation System: VoiceTra.
Proceedings of the 2013 IEEE 14th International Conference on Mobile Data Management, Milan, Italy, June 3-6, 2013, 2013

Constructing a speech translation system using simultaneous interpretation data.
Proceedings of the 10th International Workshop on Spoken Language Translation: Papers, 2013

The NAIST English speech recognition system for IWSLT 2013.
Proceedings of the 10th International Workshop on Spoken Language Translation: Evaluation Campaign@IWSLT 2013, 2013

Incremental unsupervised training for university lecture recognition.
Proceedings of the 10th International Workshop on Spoken Language Translation: Papers, 2013

A hybrid approach to electrolaryngeal speech enhancement based on spectral subtraction and statistical voice conversion.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Improvements to HMM-based speech synthesis based on parameter generation with rich context models.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Efficient speech transcription through respeaking.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

An empirical comparison of joint optimization techniques for speech translation.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

A digital signal processor implementation of silent/electrolaryngeal speech enhancement based on real-time statistical voice conversion.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Grapheme-to-phoneme conversion based on adaptive regularization of weight vectors.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

An investigation of acoustic features for singing voice conversion based on perceptual age.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Generalizing continuous-space translation of paralinguistic information.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Simple, lexicalized choice of translation timing for simultaneous speech translation.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Evaluation of a singing voice conversion method based on many-to-many eigenvoice conversion.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Modality and contextual differences in computer based non-verbal communication training.
Proceedings of the IEEE 4th International Conference on Cognitive Infocommunications, 2013

NAIST at the CLEF 2013 QA4MRE Pilot Task.
Proceedings of the Working Notes for CLEF 2013 Conference, 2013

Inter-Sentence Features and Thresholded Minimum Error Rate Training: NAIST at CLEF 2013 QA4MRE.
Proceedings of the Working Notes for CLEF 2013 Conference, 2013

Dialogue management for leading the conversation in persuasive dialogue systems.
Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

Toward musical-noise-free blind speech extraction: Concept and its applications.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013

Semi-blind algorithm for joint noise suppression and dereverberation based on higher-order statistics and acoustic model likelihood.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013

Towards High-Reliability Speech Translation in the Medical Domain.
Proceedings of the First Workshop on Natural Language Processing for Medical and Healthcare Fields@IJCNLP 2013, 2013

2012
Distributed speech translation technologies for multiparty multilingual communication.
ACM Trans. Speech Lang. Process., 2012

Sequence-Based Pronunciation Variation Modeling for Spontaneous ASR Using a Noisy Channel Approach.
IEICE Trans. Inf. Syst., 2012

Minimum Bayes-Risk decoding extended with similar examples: NAIST-NICT at IWSLT 2012.
Proceedings of the 2012 International Workshop on Spoken Language Translation, 2012

The 2012 KIT and KIT-NAIST English ASR systems for the IWSLT evaluation.
Proceedings of the 2012 International Workshop on Spoken Language Translation, 2012

The NAIST machine translation system for IWSLT2012.
Proceedings of the 2012 International Workshop on Spoken Language Translation, 2012

A method for translation of paralinguistic information.
Proceedings of the 2012 International Workshop on Spoken Language Translation, 2012

The KIT-NAIST (contrastive) English ASR system for IWSLT 2012.
Proceedings of the 2012 International Workshop on Spoken Language Translation, 2012

Developing Non-goal Dialog System Based on Examples of Drama Television.
Proceedings of the Natural Interaction with Robots, 2012

An Evaluation of Parameter Generation Methods with Rich Context Models in HMM-Based Speech Synthesis.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

A bootstrapping approach for SLU portability to a new language by inducting unannotated user queries.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Non-verbal cognitive skills and autistic conditions: An analysis and training tool.
Proceedings of the IEEE 3rd International Conference on Cognitive Infocommunications, 2012

Singing voice conversion method based on many-to-many eigenvoice conversion and training data generation using a singing-to-singing synthesis system.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2012

2011
Modeling spoken decision support dialogue and optimization of its dialogue strategy.
ACM Trans. Speech Lang. Process., 2011

Temporal modulation normalization for robust speech feature extraction and recognition.
Multim. Tools Appl., 2011

A Bayesian Model of Transliteration and Its Human Evaluation When Integrated into a Machine Translation System.
IEICE Trans. Inf. Syst., 2011

Sub-band temporal modulation envelopes and their normalization for automatic speech recognition in reverberant environments.
Comput. Speech Lang., 2011

Situated Spoken Dialogue with Robots Using Active Learning.
Adv. Robotics, 2011

Learning, Generation and Recognition of Motions by Reference-Point-Dependent Probabilistic Models.
Adv. Robotics, 2011

Toward Construction of Spoken Dialogue System that Evokes Users' Spontaneous Backchannels.
Proceedings of the SIGDIAL 2011 Conference, 2011

Conditional Random Fields for Modeling Korean Pronunciation Variation.
Proceedings of the Paralinguistic Information and its Integration in Spoken Dialogue Systems, 2011

Analysis on Effects of Text-to-Speech and Avatar Agent in Evoking Users' Spontaneous Listener's Reactions.
Proceedings of the Paralinguistic Information and its Integration in Spoken Dialogue Systems, 2011

User Study of Spoken Decision Support System.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Adaptive Regularization Framework for Robust Voice Activity Detection.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

A sampling-based environment population projection approach for rapid acoustic model adaptation.
Proceedings of the IEEE International Conference on Acoustics, 2011

Increasing discriminative capability on MAP-based mapping function estimation for acoustic model adaptation.
Proceedings of the IEEE International Conference on Acoustics, 2011

Unsupervised determination of efficient Korean LVCSR units using a Bayesian Dirichlet process model.
Proceedings of the IEEE International Conference on Acoustics, 2011

Providing Immersive Virtual Experience with First-Person Perspective Omnidirectional Movies and Three Dimensional Sound Field.
Proceedings of the Virtual and Mixed Reality - New Trends, 2011

3-D Sound Reproduction System for Immersive Environments Based on the Boundary Surface Control Principle.
Proceedings of the Virtual and Mixed Reality - New Trends, 2011

Blind noise suppression for Non-Audible Murmur recognition with stereo signal processing.
Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

Dialogue Acts Annotation to Construct Dialogue Systems for Consulting.
Proceedings of the Spoken Dialogue Systems Technology and Design, 2011

Online Learning of Bayes Risk-Based Optimization of Dialogue Management for Document Retrieval Systems with Speech Interface.
Proceedings of the Spoken Dialogue Systems Technology and Design, 2011

2010
Temporal contrast normalization and edge-preserved smoothing of temporal modulation structures of speech for robust speech recognition.
Speech Commun., 2010

An Unsupervised Model of Redundancy for Answer Validation.
IEICE Trans. Inf. Syst., 2010

Dialogue strategy optimization to assist user's decision for spoken consulting dialogue systems.
Proceedings of the 2010 IEEE Spoken Language Technology Workshop, 2010

Integrating lip-synch into game production workflow: "Sengoku BASARA 3".
Proceedings of the ACM SIGGRAPH ASIA 2010 Sketches, 2010

Modeling Spoken Decision Making Dialogue and Optimization of its Dialogue Strategy.
Proceedings of the SIGDIAL 2010 Conference, 2010

Dialogue Acts Annotation for NICT Kyoto Tour Dialogue Corpus to Construct Statistical Dialogue Systems.
Proceedings of the International Conference on Language Resources and Evaluation, 2010

A Study Toward an Evaluation Method for Spoken Dialogue Systems Considering User Criteria.
Proceedings of the Spoken Dialogue Systems for Ambient Environments, 2010

Sightseeing Guidance Systems Based on WFST-Based Dialogue Manager.
Proceedings of the Spoken Dialogue Systems for Ambient Environments, 2010

Construction and Experiment of a Spoken Consulting Dialogue System.
Proceedings of the Spoken Dialogue Systems for Ambient Environments, 2010

Evaluation of Facial Direction Estimation from Cameras for Multi-modal Spoken Dialog System.
Proceedings of the Spoken Dialogue Systems for Ambient Environments, 2010

Expansion of WFST-Based Dialog Management for Handling Multiple ASR Hypotheses.
Proceedings of the Spoken Dialogue Systems for Ambient Environments, 2010

Sequence-Based Pronunciation Modeling Using a Noisy-Channel Approach.
Proceedings of the Spoken Dialogue Systems for Ambient Environments, 2010

Korean pronunciation variation modeling with probabilistic Bayesian networks.
Proceedings of the 4th International Universal Communication Symposium, 2010

Web text classification for response generation in spoken decision support dialogue systems.
Proceedings of the 4th International Universal Communication Symposium, 2010

Improving spontaneous English ASR using a joint-sequence pronunciation model.
Proceedings of the 4th International Universal Communication Symposium, 2010

An environment structuring framework to facilitating suitable prior density estimation for MAPLR on robust speech recognition.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Speech enhancement as a functional approximation and generalization.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Active learning of confidence measure function in robot language acquisition framework.
Proceedings of the 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2010

Utilizing a noisy-channel approach for Korean LVCSR.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Voice activity detection in a regularized reproducing kernel Hilbert space.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Construction and evaluations of an annotated Chinese conversational corpus in travel domain for the language model of speech recognition.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Cluster-based language model for spoken document retrieval using NMF-based document clustering.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Brazilian Portuguese acoustic model training based on data borrowing from other language.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Spoken Dialog System on Plasma Display Panel Estimating Users' Interest by Image Processing.
Proceedings of the Workshops Proceedings of the 6th International Conference on Intelligent Environments, 2010

NICT Blizzard Challenge 2010 Entry.
Proceedings of the Blizzard Challenge 2010, Kansai Science City, Japan, September 25, 2010, 2010

CENSREC-1-AV: an audio-visual corpus for noisy bimodal speech recognition.
Proceedings of the Auditory-Visual Speech Processing, 2010

Active Learning for Generating Motion and Utterances in Object Manipulation Dialogue Tasks.
Proceedings of the Dialog with Robots, 2010

2009
Incorporating Knowledge Sources into Statistical Speech Recognition.
Lecture Notes in Electrical Engineering 42, Springer, ISBN: 978-0-387-85829-6, 2009

Consolidation-Based Speech Translation and Evaluation Approach.
IEICE Trans. Inf. Syst., 2009

Class-Dependent Modeling for Dialog Translation.
IEICE Trans. Inf. Syst., 2009

Automatic pronunciation scoring of words and sentences independent from the non-native's first language.
Comput. Speech Lang., 2009

On the Importance of Pivot Language Selection for Statistical Machine Translation.
Proceedings of the Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, Proceedings, May 31, 2009

Network-based speech-to-speech translation.
Proceedings of the 2009 International Workshop on Spoken Language Translation, 2009

Soft margin estimation on improving environment structures for ensemble speaker and speaking environment modeling.
Proceedings of the 3rd International Universal Communication Symposium, 2009

Dialogue act annotation for consulting dialogue corpus.
Proceedings of the 3rd International Universal Communication Symposium, 2009

Hyperbolic structure of fundamental frequency contour.
Proceedings of the 3rd International Universal Communication Symposium, 2009

Normalization on the modulation spectrum of the subband temporal envelopes for automatic speech recognition in reverberant environments.
Proceedings of the 3rd International Universal Communication Symposium, 2009

Spoken document retrieval using topic models.
Proceedings of the 3rd International Universal Communication Symposium, 2009

Evaluation for WFST-based dialog management.
Proceedings of the 3rd International Universal Communication Symposium, 2009

Bayesian learning of confidence measure function for generation of utterances and motions in object manipulation dialogue task.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

A close look into the probabilistic concatenation model for corpus-based speech synthesis.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Annotating communicative function and semantic content in dialogue act for construction of consulting dialogue systems.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

A study on soft margin estimation of linear regression parameters for speaker adaptation.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

A decision tree-based clustering approach to state definition in an excitation modeling framework for HMM-based speech synthesis.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Subband temporal modulation spectrum normalization for automatic speech recognition in reverberant environments.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Recent advances in WFST-based dialog system.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Optimal learning of P-Layer additive F0 models with cross-validation.
Proceedings of the IEEE International Conference on Acoustics, 2009

CART-based modeling of Chinese tonal patterns with a functional model tracing the fundamental frequency trajectories.
Proceedings of the IEEE International Conference on Acoustics, 2009

Temporal contrast normalization and edge-preserved smoothing on temporal modulation structure for robust speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2009

Statistical dialog management applied to WFST-based dialog systems.
Proceedings of the IEEE International Conference on Acoustics, 2009

Automatic voice assignment tool for Instant Casting movie System.
Proceedings of the IEEE International Conference on Acoustics, 2009

The NICT Entry for the Blizzard Challenge 2009: an Enhanced HMM-based Speech Synthesis System with Trajectory Training considering Global Variance and State-Dependent Mixed Excitation.
Proceedings of the Blizzard Challenge 2009, Edinburgh, Scotland, UK, September 4, 2009, 2009

MAP estimation of online mapping parameters in ensemble speaker and speaking environment modeling.
Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009

The Asian network-based speech-to-speech translation system.
Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009

Weighted finite state transducer based statistical dialog management.
Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009

Japanese Spontaneous Spoken Document Retrieval Using NMF-Based Topic Models.
Proceedings of the Information Retrieval Technology, 2009

Annotating Dialogue Acts to Construct Dialogue Systems for Consulting.
Proceedings of the 7th Workshop on Asian Language Resources, 2009

Construction of Chinese Segmented and POS-tagged Conversational Corpora and Their Evaluations on Spontaneous Speech Recognitions.
Proceedings of the 7th Workshop on Asian Language Resources, 2009

2008
A Robust Speech Recognition System for Communication Robots in Noisy Environments.
IEEE Trans. Robotics, 2008

Efficient lip-synch tool for 3D cartoon animation.
Comput. Animat. Virtual Worlds, 2008

A Study on Cross Transformation of Mongolian Language.
Inf. Media Technol., 2008

An Improved Greedy Search Algorithm for the Development of a Phonetically Rich Speech Corpus.
IEICE Trans. Inf. Syst., 2008

Using Mutual Information Criterion to Design an Efficient Phoneme Set for Chinese Speech Recognition.
IEICE Trans. Inf. Syst., 2008

Post-recording tool for instant casting movie system.
Proceedings of the 16th International Conference on Multimedia 2008, 2008

Spoken Dialog System for Next Generation Knowledge Access.
Proceedings of the 9th International Conference on Mobile Data Management (MDM 2008), 2008

Evaluation Framework for Distant-talking Speech Recognition under Reverberant Environments: newest Part of the CENSREC Series.
Proceedings of the International Conference on Language Resources and Evaluation, 2008

Probabilistic Pronunciation Variation Model Based on Bayesian Network for Conversational Speech Recognition.
Proceedings of the ISUC 2008, 2008

Dialogue Act Annotation for Statistically Managed Spoken Dialogue Systems.
Proceedings of the ISUC 2008, 2008

Prosody Modeling from Tone to Intonation in Chinese using a Functional F0 Model.
Proceedings of the ISUC 2008, 2008

Normalization on Temporal Modulation Transfer Function for Robust Speech Recognition.
Proceedings of the ISUC 2008, 2008

A Statistical Approach to Expandable Spoken Dialog Systems using WFSTs.
Proceedings of the ISUC 2008, 2008

Simultaneous Acoustic, Prosodic, and Phrasing Model Training for TTS Conversion Systems.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

Frequency Modulation Technique for Prosodic Modification.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

Noise Reduction Based Random Matrix Theory.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

CENSREC-4: development of evaluation framework for distant-talking speech recognition under reverberant environments.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Improved novelty detection for online GMM based speaker diarization.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Dialog management using weighted finite-state transducers.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Development of Indonesian Large Vocabulary Continuous Speech Recognition System within A-STAR Project.
Proceedings of the Third International Joint Conference on Natural Language Processing, 2008

Multilingual Mobile-Phone Translation Services for World Travelers.
Proceedings of the COLING 2008, 2008

The NICT/ATR speech synthesis system for the Blizzard Challenge 2008.
Proceedings of the Blizzard Challenge 2008, 2008

2007
Incorporating Knowledge Sources Into a Statistical Acoustic Model for Spoken Language Communication Systems.
IEEE Trans. Computers, 2007

Out-of-Domain Utterance Detection Using Classification Confidences of Multiple Topics.
IEEE Trans. Speech Audio Process., 2007

Multichannel Bin-Wise Robust Frequency-Domain Adaptive Filtering and Its Application to Adaptive Beamforming.
IEEE Trans. Speech Audio Process., 2007

Communicative speech synthesis with XIMERA: a first step.
Proceedings of the Sixth ISCA Workshop on Speech Synthesis, 2007

Data-driven efficient production of cartoon character animation.
Proceedings of the International Conference on Computer Graphics and Interactive Techniques, 2007

Acoustic Features for Estimation of Perceptional Similarity.
Proceedings of the Advances in Multimedia Information Processing, 2007

An HMM acoustic model incorporating various additional knowledge sources.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Never-ending learning with dynamic hidden Markov network.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Use of Poisson Processes to Generate Fundamental Frequency Contours.
Proceedings of the IEEE International Conference on Acoustics, 2007

A method to integrate additional knowledge sources into HMM based on junction tree decomposition.
Proceedings of the 15th European Signal Processing Conference, 2007

ATRECSS - ATR English speech corpus for speech synthesis.
Proceedings of the Evaluation of text-to-speech systems: Blizzard Challenge 2007, 2007

Never-ending learning system for on-line speaker diarization.
Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007

Development of VAD evaluation framework CENSREC-1-C and investigation of relationship between VAD and speech recognition performance.
Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007

NICT-ATR Speech-to-Speech Translation System.
Proceedings of the ACL 2007, 2007

2006
The ATR multilingual speech-to-speech translation system.
IEEE Trans. Speech Audio Process., 2006

HMM-based noise-robust feature compensation.
Speech Commun., 2006

Integration of articulatory and spectrum features based on the hybrid HMM/BN modeling framework.
Speech Commun., 2006

Improving Acoustic Model Precision by Incorporating a Wide Phonetic Context Based on a Bayesian Framework.
IEICE Trans. Inf. Syst., 2006

A Hybrid HMM/BN Acoustic Model Utilizing Pentaphone-Context Dependency.
IEICE Trans. Inf. Syst., 2006

Special Section on Statistical Modeling for Speech Processing.
IEICE Trans. Inf. Syst., 2006

ATR Parallel Decoding Based Speech Recognition System Robust to Noise and Speaking Styles.
IEICE Trans. Inf. Syst., 2006

Using Hybrid HMM/BN Acoustic Models: Design and Implementation Issues.
IEICE Trans. Inf. Syst., 2006

CENSREC-3: An Evaluation Framework for Japanese Speech Recognition in Real Car-Driving Environments.
IEICE Trans. Inf. Syst., 2006

A Non-stationary Noise Suppression Method Based on Particle Filtering and Polyak Averaging.
IEICE Trans. Inf. Syst., 2006

Lip-sync animation from HMM using dynamic features.
Proceedings of the International Conference on Computer Graphics and Interactive Techniques, 2006

Key-frame removal method for blendshape-based cartoon lip-sync animation.
Proceedings of the International Conference on Computer Graphics and Interactive Techniques, 2006

Developing Client-Server Speech Translation Platform.
Proceedings of the 7th International Conference on Mobile Data Management (MDM 2006), 2006

Oriental COCOSDA: Past, Present and Future.
Proceedings of the Fifth International Conference on Language Resources and Evaluation, 2006

Development of client-server speech translation system on a multi-lingual speech communication platform.
Proceedings of the 2006 International Workshop on Spoken Language Translation, 2006

Speech recognition of foreign out-of-vocabulary words using a hierarchical language model.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

The use of Bayesian network for incorporating accent, gender and wide-context dependency information.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

CENSREC2: corpus and evaluation environments for in car continuous digit speech recognition.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Forward-backwards training of hybrid HMM/BN acoustic models.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Automatic Derivation of a Phoneme Set with Tone Information for Chinese Speech Recognition Based on Mutual Information Criterion.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

Incorporation of Pentaphone-Context Dependency Based on Hybrid HMM/BN Acoustic Modeling Framework.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

Sequential Non-Stationary Noise Tracking Using Particle Filtering with Switching Dynamical System.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

Robust Speech Recognition System for Communication Robots in Real Environments.
Proceedings of the 2006 6th IEEE-RAS International Conference on Humanoid Robots, 2006

Developing a Test Bed of English Text-to-Speech System XIMERA for the Blizzard Challenge 2006.
Proceedings of the Blizzard Challenge 2006, Pittsburgh, PA, USA, September 16, 2006, 2006

2005
Maximum likelihood sub-band adaptation for robust speech recognition.
Speech Commun., 2005

Tone nucleus-based multi-level robust acoustic tonal modeling of sentential F0 variations for Chinese continuous speech tone recognition.
Speech Commun., 2005

Construction of Audio-Visual Speech Corpus Using Motion-Capture System and Corpus Based Facial Animation.
IEICE Trans. Inf. Syst., 2005

AURORA-2J: An Evaluation Framework for Japanese Noisy Speech Recognition.
IEICE Trans. Inf. Syst., 2005

Dialogue Speech Recognition by Combining Hierarchical Topic Classification and Language Model Switching.
IEICE Trans. Inf. Syst., 2005

Automatic Generation of Non-uniform and Context-Dependent HMMs Based on the Variational Bayesian Approach.
IEICE Trans. Inf. Syst., 2005

Speech to talking heads system based on hidden Markov models.
Proceedings of the International Conference on Computer Graphics and Interactive Techniques, 2005

Automatic head-movement control for emotional speech.
Proceedings of the International Conference on Computer Graphics and Interactive Techniques, 2005

SoundWeb: Hyperlinked Voice Data for Wearable Computing Environment.
Proceedings of the Ninth IEEE International Symposium on Wearable Computers (ISWC 2005), 2005

Incorporating a Bayesian wide phonetic context model for acoustic rescoring.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Outlier detection for acoustic model training using robust statistics.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Spoken dialog system and its evaluation of geographic information system for elderly persons' mobility support.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

CENSREC-3: Data Collection for In-Car Speech Recognition and Its Common Evaluation Framework.
Proceedings of the 21st International Conference on Data Engineering Workshops, 2005

Online cepstral filtering using a sequential EM approach with Polyak averaging and feedback.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

Modeling Successive Frame Dependencies with Hybrid HMM/BN Acoustic Model.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

Joint optimization of LCMV beamforming and acoustic echo cancellation for automatic speech recognition.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

Particle Filter Based Non-Stationary Noise Tracking for Robust Speech Recognition.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

2004
Galatea: Open-Source Software for Developing Anthropomorphic Spoken Dialog Agents.
Proceedings of the Life-like characters - tools, affective functions, and applications, 2004

A Robust Bimodal Speech Section Detection.
J. VLSI Signal Process., 2004

Simultaneous Recognition of Distant-Talking Speech of Multiple Talkers Based on the 3-D N-Best Search Method.
J. VLSI Signal Process., 2004

Introduction to the Special Issue on Spontaneous Speech Processing.
IEEE Trans. Speech Audio Process., 2004

Noise adaptive speech recognition based on sequential noise parameter estimation.
Speech Commun., 2004

Automatic Generation of Non-uniform HMM Topologies Based on the MDL Criterion.
IEICE Trans. Inf. Syst., 2004

Missing Feature Theory Applied to Robust Speech Recognition over IP Network.
IEICE Trans. Inf. Syst., 2004

Multimodal Translation System Using Texture-Mapped Lip-Sync Images for Video Mail and Automatic Dubbing Applications.
EURASIP J. Adv. Signal Process., 2004

Face expression synthesis based on a facial motion distribution chart.
Proceedings of the International Conference on Computer Graphics and Interactive Techniques, 2004

Multi-lingual speech recognition system for speech-to-speech translation.
Proceedings of the 2004 International Workshop on Spoken Language Translation, 2004

Generalized posterior probability for minimizing verification errors at subword, word and sentence levels.
Proceedings of the 2004 International Symposium on Chinese Spoken Language Processing, 2004

Efficient tone classification of speaker independent continuous Chinese speech using anchoring based discriminating features.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Optimal acoustic and language model weights for minimizing word verification errors.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004

HMM-based feature compensation method: an evaluation using the AURORA2.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Indonesian speech recognition for hearing and speaking impaired people.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Online minimum mean square error filtering of noisy cepstral coefficients using a sequential EM algorithm.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Speech recognition system robust to noise and speaking styles.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Integration of articulatory dynamic parameters in HMM/BN based speech recognition system.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Robust verification of recognized words in noise.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Topic classification and verification modeling for out-of-domain utterance detection.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Increasing the mixture components of non-uniform HMM structures based on a variational Bayesian approach.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004

A statistical lexicon for non-native speech recognition.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Speech recognition for multiple non-native accent groups with speaker-group-dependent acoustic models.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Minimum mean square error filtering of noisy cepstral coefficients with applications to ASR.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

Out-of-domain detection based on confidence measures from multiple topic classification.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

Automatic generation of non-uniform HMM structures based on variational Bayesian approach.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

Joint optimization of LCMV beamforming and acoustic echo cancellation.
Proceedings of the 2004 12th European Signal Processing Conference, 2004

2003
Cepstrum derived from differentiated power spectrum for robust speech recognition.
Speech Commun., 2003

Multiple beamforming with source localization based on CSP analysis.
Syst. Comput. Jpn., 2003

Model-based talking face synthesis for anthropomorphic spoken dialog agent system.
Proceedings of the Eleventh ACM International Conference on Multimedia, 2003

Maximum likelihood sub-band weighting for robust speech recognition.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Model based noisy speech recognition with environment parameters estimated by noise adaptive speech recognition with prior.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Integration of noise reduction algorithms for Aurora2 task.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Adaptation of acoustic model using the gain-adapted HMM decomposition method.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Environmental sound source identification based on hidden Markov model for robust speech recognition.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Noise reduction using paired-microphones on non-equally-spaced microphone arrangement.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Hybrid HMM/BN ASR system integrating spectrum and articulatory features.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Hierarchical topic classification for dialog speech recognition based on language model switching.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Automatic generation of non-uniform context-dependent HMM topologies based on the MDL criterion.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

A semi-blind source separation method for hands-free speech recognition of multiple talkers.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Detection and separation of speech segment using audio and video information fusion.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

A multilevel framework to model the inherently confounding nature of sentential F0 contours for recognizing Chinese lexical tones.
Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

An evaluation of adaptive beamformer based on average speech spectrum for noisy speech recognition.
Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

Hybrid HMM/BN LVCSR system integrating multiple acoustic features.
Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

2002
Statistical multimodal integration for audio-visual speech processing.
IEEE Trans. Neural Networks, 2002

Distant-talking speech recognition based on a 3-D Viterbi search using a microphone array.
IEEE Trans. Speech Audio Process., 2002

The Present Status of Speech Database in Japan: Development, Management, and Application to Speech Research.
Proceedings of the Third International Conference on Language Resources and Evaluation, 2002

Modeling varying pauses to develop robust acoustic models for recognizing noisy conversational speech.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Evaluation of a noise adaptive speech recognition system on the Aurora 3 database.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Noise adaptive speech recognition with acoustic models trained from noisy speech evaluated on Aurora-2 database.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Speaking rate compensation based on likelihood criterion in acoustic model training and decoding.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Suitable design of adaptive beamformer based on average speech spectrum for noisy speech recognition.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

The 2ch hybrid subtractive beamformer applied to line sound sources.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Modeling HMM state distributions with Bayesian networks.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

HMM composition-based rapid model adaptation using a priori noise GMM adaptation evaluation on Aurora2 corpus.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Weighted graph based decision tree optimization for high accuracy acoustic modeling.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Multi-Modal Temporal Asynchronicity Modeling by Product HMMs for Robust.
Proceedings of the 4th IEEE International Conference on Multimodal Interfaces (ICMI 2002), 2002

3-D N-Best Search for Simultaneous Recognition of Distant-Talking Speech of Multiple Talkers.
Proceedings of the 4th IEEE International Conference on Multimodal Interfaces (ICMI 2002), 2002

Multi-Modal Translation System and Its Evaluation.
Proceedings of the 4th IEEE International Conference on Multimodal Interfaces (ICMI 2002), 2002

An evaluation of sound source identification with RWCP sound scene database in real acoustic environments.
Proceedings of the 2002 IEEE International Conference on Multimedia and Expo, 2002

Design and collection of acoustic sound data for hands-free speech recognition and sound scene understanding.
Proceedings of the 2002 IEEE International Conference on Multimedia and Expo, 2002

Real time face detection for multimodal speech recognition.
Proceedings of the 2002 IEEE International Conference on Multimedia and Expo, 2002

Noise adaptive speech recognition in time-varying noise based on sequential Kullback proximal algorithm.
Proceedings of the IEEE International Conference on Acoustics, 2002

Talker localization in a real acoustic environment based on DOA estimation and statistical sound source identification.
Proceedings of the IEEE International Conference on Acoustics, 2002

Robust bi-modal speech recognition based on state synchronous modeling and stream weight optimization.
Proceedings of the IEEE International Conference on Acoustics, 2002

Audio-visual speech translation with automatic lip synchronization and face tracking based on 3-D head model.
Proceedings of the IEEE International Conference on Acoustics, 2002

2001
Speech-to-Lip Movement Synthesis by Maximizing Audio-Visual Joint Probability Based on the EM Algorithm.
J. VLSI Signal Process., 2001

HMM-separation-based speech recognition for a distant moving speaker.
IEEE Trans. Speech Audio Process., 2001

A Method of Key Input with Two Mice.
Proceedings of the 5th International Symposium on Wearable Computers (ISWC 2001), 2001

A hybrid approach to enhance task portability of acoustic models in Chinese speech recognition.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

Sequential noise compensation by a sequential Kullback proximal algorithm.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

Feature extraction and model-based noise compensation for noisy speech recognition evaluated on AURORA 2 task.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

Towards the creation of acoustic models for stressed Japanese speech.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

Statistical sound source identification in a real acoustic environment for robust speech recognition using a microphone array.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

Noise reduction using paired-microphones for both far-field and near-field sound sources.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

Sub-band based additive noise removal for robust speech recognition.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

Model-Based Lip Synchronization With Automatically Translated Synthetic Voice Toward A Multi-Modal Translation System.
Proceedings of the 2001 IEEE International Conference on Multimedia and Expo, 2001

Automatic Steering Of Microphone Array And Video Camera Toward Multi-Lingual Tele-Conference Through Speech-To-Speech Translation.
Proceedings of the 2001 IEEE International Conference on Multimedia and Expo, 2001

Speech Detection By Facial Image For Multimodal Speech Recognition.
Proceedings of the 2001 IEEE International Conference on Multimedia and Expo, 2001

Trends of Learning Technology Standard.
Proceedings of the 2001 IEEE International Conference on Multimedia and Expo, 2001

Automatic Face Tracking And Model Match-Move In Video Sequence Using 3D Face Model.
Proceedings of the 2001 IEEE International Conference on Multimedia and Expo, 2001

An Adaptive Integration Based On Product HMM For Audio-Visual Speech Recognition.
Proceedings of the 2001 IEEE International Conference on Multimedia and Expo, 2001

Discriminative training of HMM using maximum normalized likelihood algorithm.
Proceedings of the IEEE International Conference on Acoustics, 2001

A microphone array-based 3-D N-best search algorithm for the simultaneous recognition of multiple sound sources in real environments.
Proceedings of the IEEE International Conference on Acoustics, 2001

Multimodal translation.
Proceedings of the Auditory-Visual Speech Processing, 2001

Fusion of Audio-Visual Information for Integrated Speech Processing.
Proceedings of the Audio- and Video-Based Biometric Person Authentication, 2001

2000
Speech enhancement based on the subspace method.
IEEE Trans. Speech Audio Process., 2000

Model adaptation by HMM decomposition and composition in noisy reverberant environments.
Syst. Comput. Jpn., 2000

Acoustical Sound Database in Real Environments for Sound Scene Understanding and Hands-Free Speech Recognition.
Proceedings of the Second International Conference on Language Resources and Evaluation, 2000

Discriminating Chinese lexical tones by anchoring F0 features.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Residual noise compensation by a sequential EM algorithm for robust speech recognition in nonstationary noise.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Multimodal corpora for human-machine interaction research.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Stream weight optimization of speech and lip image sequence for audio-visual speech recognition.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Design of robust subtractive beamformer for noisy speech recognition.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Analysis of acoustic models trained on a large-scale Japanese speech database.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Frame level likelihood transformations for ASR and utterance verification.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Cellular-phone based speech-to-speech translation system ATR-MATRIX.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

A block cosine transform and its application in speech recognition.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Robust fundamental frequency estimation using instantaneous frequencies of harmonic components.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Speech-to-Face Movement Synthesis based on HMMs.
Proceedings of the 2000 IEEE International Conference on Multimedia and Expo, 2000

Speech recognition for a distant moving speaker based on HMM composition and separation.
Proceedings of the IEEE International Conference on Acoustics, 2000

Localization of multiple sound sources based on a CSP analysis with a microphone array.
Proceedings of the IEEE International Conference on Acoustics, 2000

1999
Data collection in real acoustical environments for sound scene understanding and hands-free speech recognition.
Proceedings of the Sixth European Conference on Speech Communication and Technology, 1999

Simultaneous recognition of multiple sound sources based on 3-D N-best search using microphone array.
Proceedings of the Sixth European Conference on Speech Communication and Technology, 1999

1998
Lip movement synthesis from speech based on Hidden Markov Models.
Speech Commun., 1998

Speech-to-lip movement synthesis maximizing audio-visual joint probability based on EM algorithm.
Proceedings of the Second IEEE Workshop on Multimedia Signal Processing, 1998

Compression algorithm of trigram language models based on maximum likelihood estimation.
Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998

Speech-to-lip movement synthesis based on the EM algorithm using audio-visual HMMs.
Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998

An effect of adaptive beamforming on hands-free speech recognition based on 3-D Viterbi search.
Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998

Evaluation of model adaptation by HMM decomposition on telephone speech recognition.
Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998

Creating speaker independent HMM models for restricted database using STRAIGHT-TEMPO morphing.
Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998

Hands-free speech recognition based on 3-D Viterbi search using a microphone array.
Proceedings of the 1998 IEEE International Conference on Acoustics, 1998

Robust speech recognition in car environments.
Proceedings of the 1998 IEEE International Conference on Acoustics, 1998

Efficient representation of short-time phase based on group delay.
Proceedings of the 1998 IEEE International Conference on Acoustics, 1998

Subjective Evaluation for HMM-Based Speech-To-Lip Movement Synthesis.
Proceedings of the Auditory-Visual Speech Processing, 1998

1997
A non-iterative model-adaptive e-CMN/PMC approach for speech recognition in car environments.
Proceedings of the Fifth European Conference on Speech Communication and Technology, 1997

Room acoustics and reverberation: impact on hands-free recognition.
Proceedings of the Fifth European Conference on Speech Communication and Technology, 1997

Improved bimodal speech recognition using tied-mixture HMMs and 5000 word audio-visual synchronous database.
Proceedings of the Fifth European Conference on Speech Communication and Technology, 1997

Microphone array design measures for hands-free speech recognition.
Proceedings of the Fifth European Conference on Speech Communication and Technology, 1997

Maximum likelihood successive state splitting algorithm for tied-mixture HMNET.
Proceedings of the Fifth European Conference on Speech Communication and Technology, 1997

Model adaptation based on HMM decomposition for reverberant speech recognition.
Proceedings of the 1997 IEEE International Conference on Acoustics, 1997

Speech to lip movement synthesis by HMM.
Proceedings of the ESCA Workshop on Audio-Visual Speech Processing, 1997

1996
Robust speech recognition with speaker localization by a microphone array.
Proceedings of the 4th International Conference on Spoken Language Processing, 1996

Noise and room acoustics distorted speech recognition by HMM composition.
Proceedings of the 1996 IEEE International Conference on Acoustics, 1996

1993
Robust word spotting in adverse car environments.
Proceedings of the Third European Conference on Speech Communication and Technology, 1993

1991
A neural speaker model for speaker clustering.
Proceedings of the 1991 International Conference on Acoustics, 1991

1990
Speaker weighted training of HMM using multiple reference speakers.
Proceedings of the First International Conference on Spoken Language Processing, 1990

A comparative study of spectral mapping for speaker adaptation.
Proceedings of the 1990 International Conference on Acoustics, 1990

Supplementation of HMM for articulatory variation in speaker adaptation.
Proceedings of the 1990 International Conference on Acoustics, 1990

ATR HMM-LR continuous speech recognition system.
Proceedings of the 1990 International Conference on Acoustics, 1990

1989
Speaker adaptation applied to HMM and neural networks.
Proceedings of the IEEE International Conference on Acoustics, 1989

1988
Voice conversion through vector quantization.
Proceedings of the IEEE International Conference on Acoustics, 1988

