Sakriani Sakti

ORCID: 0000-0001-5509-8963

According to our database, Sakriani Sakti authored at least 284 papers between 2004 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Neural End-To-End Speech Translation Leveraged by ASR Posterior Distribution.
IEICE Trans. Inf. Syst., 2024

A Neural Transformer Framework for Simultaneous Tasks of Segmentation, Classification, and Caller Identification of Marmoset Vocalization.
CoRR, 2024

Enhancing Indonesian Automatic Speech Recognition: Evaluating Multilingual Models with Diverse Speech Variabilities.
CoRR, 2024

On the Problem of Text-To-Speech Model Selection for Synthetic Data Generation in Automatic Speech Recognition.
CoRR, 2024

Contrastive Feedback Mechanism for Simultaneous Speech Translation.
CoRR, 2024

NAIST Simultaneous Speech Translation System for IWSLT 2024.
CoRR, 2024

MAG-BERT-ARL for Fair Automated Video Interview Assessment.
IEEE Access, 2024

Applying Syntax-Prosody Mapping Hypothesis and Boundary-Driven Theory to Neural Sequence-to-Sequence Speech Synthesis.
IEEE Access, 2024

Refining rtMRI Landmark-Based Vocal Tract Contour Labels with FCN-Based Smoothing and Point-to-Curve Projection.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

2023
SpeeChain: A Speech Toolkit for Large-Scale Machine Speech Chain.
CoRR, 2023

Japanese Neural Incremental Text-to-Speech Synthesis Framework With an Accent Phrase Input.
IEEE Access, 2023

Exploring Difficulties Encountered by Professional Interpreters in Japanese-to-English and English-to-Japanese Simultaneous Translation.
Proceedings of the 26th Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 2023

Generating Speech with Prosodic Prominence based on SSL-Visually Grounded Models.
Proceedings of the 26th Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 2023

NAIST Simultaneous Speech-to-speech Translation System for IWSLT 2023.
Proceedings of the 20th International Conference on Spoken Language Translation, 2023

STEN-TTS: Improving Zero-shot Cross-Lingual Transfer for Multi-Lingual TTS with Style-Enhanced Normalization Diffusion Framework.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Unsupervised Learning of Discrete Latent Representations with Data-Adaptive Dimensionality from Continuous Speech Streams.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Self-Adaptive Incremental Machine Speech Chain for Lombard TTS with High-Granularity ASR Feedback in Dynamic Noise Condition.
Proceedings of the IEEE International Conference on Acoustics, 2023

An Isotropy Analysis for Self-Supervised Acoustic Unit Embeddings on the Zero Resource Speech Challenge 2021 Framework.
Proceedings of the IEEE International Conference on Acoustics, 2023

Speech Recognition and Meaning Interpretation: Towards Disambiguation of Structurally Ambiguous Spoken Utterances in Indonesian.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Leveraging the Multilingual Indonesian Ethnic Languages Dataset In Self-Supervised Models for Low-Resource ASR Task.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022
Modeling Unsupervised Empirical Adaptation by DPGMM and DPGMM-RNN Hybrid Model to Extract Perceptual Features for Low-Resource ASR.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

A Machine Speech Chain Approach for Dynamically Adaptive Lombard TTS in Static and Dynamic Noise Environments.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

Tackling multiple object tracking with complicated motions - Re-designing the integration of motion and appearance.
Image Vis. Comput., 2022

NusaCrowd: Open Source Initiative for Indonesian NLP Resources.
CoRR, 2022

Actor-identified Spatiotemporal Action Detection - Detecting Who Is Doing What in Videos.
CoRR, 2022

NIX-TTS: Lightweight and End-to-End Text-to-Speech Via Module-Wise Distillation.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

NAIST Simultaneous Speech-to-Text Translation System for IWSLT 2022.
Proceedings of the 19th International Conference on Spoken Language Translation, 2022

Improved Consistency Training for Semi-Supervised Sequence-to-Sequence ASR via Speech Chain Reconstruction and Self-Transcribing.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

2021
Instance-Level Heterogeneous Domain Adaptation for Limited-Labeled Sketch-to-Photo Retrieval.
IEEE Trans. Multim., 2021

Tackling Perception Bias in Unsupervised Phoneme Discovery Using DPGMM-RNN Hybrid Model and Functional Load.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

ReMOT: A model-agnostic refinement for multiple object tracking.
Image Vis. Comput., 2021

Neural Incremental Speech Recognition Toward Real-Time Machine Speech Translation.
IEICE Trans. Inf. Syst., 2021

Code-Switching ASR and TTS Using Semisupervised Learning with Machine Speech Chain.
IEICE Trans. Inf. Syst., 2021

Multimodal Chain: Cross-Modal Collaboration Through Listening, Speaking, and Visualizing.
IEEE Access, 2021

End-to-End Image-to-Speech Generation for Untranscribed Unknown Languages.
IEEE Access, 2021

Incorporating Discriminative DPGMM Posteriorgrams for Low-Resource ASR.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Transformer-Based Direct Speech-To-Speech Translation with Transcoder.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Multi-Encoder Sequential Attention Network for Context-Aware Speech Recognition in Japanese Dialog Conversation.
Proceedings of the 24th Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 2021

Using Local Phrase Dependency Structure Information in Neural Sequence-to-Sequence Speech Synthesis.
Proceedings of the 24th Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 2021

Simultaneous Speech-to-Speech Translation System with Transformer-Based Incremental ASR, MT, and TTS.
Proceedings of the 24th Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 2021

NAIST English-to-Japanese Simultaneous Translation System for IWSLT 2021 Simultaneous Text-to-text Task.
Proceedings of the 18th International Conference on Spoken Language Translation, 2021

Eliciting Cooperative Persuasive Dialogue by Multimodal Emotional Robot.
Proceedings of the Conversational AI for Natural Human-Centric Interaction, 2021

Transcribing Paralinguistic Acoustic Cues to Target Language Text in Transformer-Based Speech-to-Text Translation.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Unsupervised Neural-Based Graph Clustering for Variable-Length Speech Representation Discovery of Zero-Resource Languages.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Dynamically Adaptive Machine Speech Chain Inference for TTS in Noisy Environment: Listen and Speak Louder.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

ASR Posterior-Based Loss for Multi-Task End-to-End Speech Translation.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Weakly-Supervised Speech-to-Text Mapping with Visually Connected Non-Parallel Speech-Text Data Using Cyclic Partially-Aligned Transformer.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

2020
Corrections to "Machine Speech Chain".
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Machine Speech Chain.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

End-to-End Speech Translation With Transcoding by Multi-Task Learning for Distant Language Pairs.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Recurrent Neural Network Compression Based on Low-Rank Tensor Representation.
IEICE Trans. Inf. Syst., 2020

Leveraging Neural Caption Translation with Visually Grounded Paraphrase Augmentation.
IEICE Trans. Inf. Syst., 2020

Simultaneous Speech-to-Speech Translation System with Neural Incremental ASR, MT, and TTS.
CoRR, 2020

ReMOTS: Self-Supervised Refining Multi-Object Tracking and Segmentation.
CoRR, 2020

An Interactive Image Editing System Using an Uncertainty-Based Confirmation Strategy.
IEEE Access, 2020

Policy Reuse for Dialog Management Using Action-Relation Probability.
IEEE Access, 2020

Cross-Lingual Machine Speech Chain for Javanese, Sundanese, Balinese, and Bataks Speech Recognition and Synthesis.
Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages and Collaboration and Computing for Under-Resourced Languages, 2020

Towards Speech Entrainment: Considering ASR Information in Speaking Rate Variation of TTS Waveform Generation.
Proceedings of the 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 2020

Emotional Speech Corpus for Persuasive Dialogue System.
Proceedings of The 12th Language Resources and Evaluation Conference, 2020

Neural Speech Completion.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Transformer VQ-VAE for Unsupervised Unit Discovery and Speech Synthesis: ZeroSpeech 2020 Challenge.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Combining Audio and Brain Activity for Predicting Speech Quality.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Incremental Machine Speech Chain Towards Enabling Listening While Speaking in Real-Time.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Augmenting Images for ASR and TTS Through Single-Loop and Dual-Loop Multimodal Chain Framework.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

The Zero Resource Speech Challenge 2020: Discovering Discrete Subword and Word Units.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Using Panoramic Videos for Multi-Person Localization and Tracking In A 3D Panoramic Coordinate.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019
Positive Emotion Elicitation in Chat-Based Dialogue Systems.
IEEE ACM Trans. Audio Speech Lang. Process., 2019

Neural Oscillation-Based Classification of Japanese Spoken Sentences During Speech Perception.
IEICE Trans. Inf. Syst., 2019

Electroencephalogram-Based Single-Trial Detection of Language Expectation Violations in Listening to Speech.
Frontiers Comput. Neurosci., 2019

From Speech Chain to Multimodal Chain: Leveraging Cross-modal Data Augmentation for Semi-supervised Learning.
CoRR, 2019

A Framework for Knowing Who is Doing What in Aerial Surveillance Videos.
IEEE Access, 2019

End-to-End Speech Recognition Sequence Training With Reinforcement Learning.
IEEE Access, 2019

Neural iTTS: Toward Synthesizing Speech in Real-time with End-to-end Neural Text-to-Speech Framework.
Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

Phoneme-level speaking rate variation on waveform generation using GAN-TTS.
Proceedings of the 22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 2019

Recognition and translation of code-switching speech utterances.
Proceedings of the 22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 2019

Make Skeleton-based Action Recognition Model Smaller, Faster and Better.
Proceedings of the MMAsia '19: ACM Multimedia Asia, Beijing, China, December 16-18, 2019, 2019

Spoken Dialogue Robot for Watching Daily Life of Elderly People.
Proceedings of the Increasing Naturalness and Flexibility in Spoken Dialogue Interaction, 2019

VQVAE Unsupervised Unit Discovery and Multi-Scale Code2Spec Inverter for Zerospeech Challenge 2019.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Speech Quality Evaluation of Synthesized Japanese Speech Using EEG.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Sequence-to-Sequence Learning via Attention Transfer for Incremental Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

The Zero Resource Speech Challenge 2019: TTS Without T.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Cross-lingual Speech-based ToBI Label Generation Using Bidirectional LSTM.
Proceedings of the IEEE International Conference on Acoustics, 2019

End-to-end Feedback Loss in Speech Chain Framework via Straight-through Estimator.
Proceedings of the IEEE International Conference on Acoustics, 2019

Speech Artifact Removal from EEG Recordings of Spoken Word Production with Tensor Decomposition.
Proceedings of the IEEE International Conference on Acoustics, 2019

Speech-to-Speech Translation Between Untranscribed Unknown Languages.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Zero-Shot Code-Switching ASR and TTS with Multilingual Machine Speech Chain.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Neural Machine Translation with Acoustic Embedding.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Listening While Speaking and Visualizing: Improving ASR Through Multimodal Chain.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018
Dirichlet Process Mixture of Mixtures Model for Unsupervised Subword Modeling.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

Sequence-to-Sequence Models for Emphasis Speech Translation.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

An end-to-end model for cross-lingual transformation of paralinguistic information.
Mach. Transl., 2018

Construction of Spontaneous Emotion Corpus from Indonesian TV Talk Shows and Its Application on Multimodal Emotion Recognition.
IEICE Trans. Inf. Syst., 2018

Learning Supervised Feature Transformations on Zero Resources for Improved Acoustic Unit Discovery.
IEICE Trans. Inf. Syst., 2018

Interactive Image Manipulation with Natural Language Instruction Commands.
CoRR, 2018

Optimizing DPGMM Clustering in Zero Resource Setting Based on Functional Load.
Proceedings of the 6th Intl. Workshop on Spoken Language Technologies for Under-Resourced Languages, 2018

Corpus Construction and Semantic Analysis of Indonesian Image Description.
Proceedings of the 6th Intl. Workshop on Spoken Language Technologies for Under-Resourced Languages, 2018

Multi-Scale Alignment and Contextual History for Attention Mechanism in Sequence-to-Sequence Model.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Adaptive Wavenet Vocoder for Residual Compensation in GAN-Based Voice Conversion.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Speech Chain for Semi-Supervised Learning of Japanese-English Code-Switching ASR and TTS.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Optimizing Neural Response Generator with Emotional Impact Information.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Toward Multi-Features Emphasis Speech Translation: Assessment of Human Emphasis Production and Perception with Speech and Text Clues.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Unsupervised Counselor Dialogue Clustering for Positive Emotion Elicitation in Neural Dialogue System.
Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue, 2018

Multi-Modal Multi-Task Deep Learning For Speaker And Emotion Recognition Of TV-Series Data.
Proceedings of the 2018 Oriental COCOSDA, 2018

Japanese-English Code-Switching Speech Data Construction.
Proceedings of the 2018 Oriental COCOSDA, 2018

Dialogue Scenario Collection of Persuasive Dialogue with Emotional Expressions via Crowdsourcing.
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

Construction of English-French Multimodal Affective Conversational Corpus from TV Dramas.
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

Using Spoken Word Posterior Features in Neural Machine Translation.
Proceedings of the 15th International Conference on Spoken Language Translation, 2018

Multi-paraphrase Augmentation to Leverage Neural Caption Translation.
Proceedings of the 15th International Conference on Spoken Language Translation, 2018

Impact of Deception Information on Negotiation Dialog Management: A Case Study on Doctor-Patient Conversations.
Proceedings of the 9th International Workshop on Spoken Dialogue System Technology, 2018

Incremental TTS for Japanese Language.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Machine Speech Chain with One-shot Speaker Adaptation.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Compressing End-to-end ASR Networks by Tensor-Train Decomposition.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Tensor Decomposition for Compressing Recurrent Neural Network.
Proceedings of the 2018 International Joint Conference on Neural Networks, 2018

Sequence-to-Sequence ASR Optimization Via Reinforcement Learning.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Graph Regularized Tensor Factorization for Single-Trial EEG Analysis.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Rude-Words Detection for Indonesian Speech Using Support Vector Machine.
Proceedings of the 2018 International Conference on Asian Language Processing, 2018

Single-Trial Detection of Semantic Anomalies From EEG During Listening to Spoken Sentences.
Proceedings of the 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2018

Detecting suppression of negative emotion by time series change of cerebral blood flow using fNIRS.
Proceedings of the 2018 IEEE EMBS International Conference on Biomedical & Health Informatics, 2018

Deception Detection and Analysis in Spoken Dialogues based on FastText.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

Eliciting Positive Emotion through Affect-Sensitive Dialogue Response Generation: A Neural Network Approach.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017
Preserving Word-Level Emphasis in Speech-to-Speech Translation.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

Local Monotonic Attention Mechanism for End-to-End Speech Recognition.
CoRR, 2017

Recognizing Emotionally Coloured Dialogue Speech Using Speaker-Adapted DNN-CNN Bottleneck Features.
Proceedings of the Speech and Computer - 19th International Conference, 2017

Creation of a multi-paraphrase corpus based on various elementary operations.
Proceedings of the 20th Conference of the Oriental Chapter of the International Coordinating Committee on Speech Databases and Speech I/O Systems and Assessment, 2017

Speech recognition features based on deep latent Gaussian models.
Proceedings of the 27th IEEE International Workshop on Machine Learning for Signal Processing, 2017

Eliciting Positive Emotional Impact in Dialogue Response Selection.
Proceedings of the Advanced Social Interaction with Agents, 2017

Subject-Independent Classification of Japanese Spoken Sentences by Multiple Frequency Bands Phase Pattern of EEG Response During Speech Perception.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Structured-Based Curriculum Learning for End-to-End English-Japanese Speech Translation.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Toward Expressive Speech Translation: A Unified Sequence-to-Sequence LSTMs Approach for Translating Words and Emphasis.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Compressing recurrent neural network with tensor train.
Proceedings of the 2017 International Joint Conference on Neural Networks, 2017

Local Monotonic Attention Mechanism for End-to-End Speech And Language Processing.
Proceedings of the Eighth International Joint Conference on Natural Language Processing, 2017

Tracking liking state in brain activity while watching multiple movies.
Proceedings of the 19th ACM International Conference on Multimodal Interaction, 2017

Attention-based Wav2Text with feature transfer learning.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Listening while speaking: Speech chain by deep learning.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Feature optimized DPGMM clustering for unsupervised subword modeling: A contribution to zerospeech 2017.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

An investigation of how to design control parameters for statistical voice timbre control.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

Processing negative emotions through social communication: Multimodal database construction and analysis.
Proceedings of the Seventh International Conference on Affective Computing and Intelligent Interaction, 2017

2016
Teaching Social Communication Skills Through Human-Agent Interaction.
ACM Trans. Interact. Intell. Syst., 2016

Postfilters to Modify the Modulation Spectrum for Statistical Parametric Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

Learning cooperative persuasive dialogue policies using framing.
Speech Commun., 2016

A Statistical Sample-Based Approach to GMM-Based Voice Conversion Using Tied-Covariance Acoustic Models.
IEICE Trans. Inf. Syst., 2016

Non-Native Text-to-Speech Preserving Speaker Individuality Based on Partial Correction of Prosodic and Phonetic Characteristics.
IEICE Trans. Inf. Syst., 2016

Neural Network Approaches to Dialog Response Retrieval and Generation.
IEICE Trans. Inf. Syst., 2016

Enhancing Event-Related Potentials Based on Maximum a Posteriori Estimation with a Spatial Correlation Prior.
IEICE Trans. Inf. Syst., 2016

Unsupervised Linear Discriminant Analysis for Supporting DPGMM Clustering in the Zero Resource Scenario.
Proceedings of the SLTU-2016, 2016

Deep bottleneck features and sound-dependent i-vectors for simultaneous recognition of speech and environmental sounds.
Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

Iterative training of a DPGMM-HMM acoustic unit recognizer in a zero resource scenario.
Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

Construction of Japanese Audio-Visual Emotion Database and Its Application in Emotion Recognition.
Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016

Unsupervised Joint Estimation of Grapheme-to-Phoneme Conversion Systems and Acoustic Model Adaptation for Non-Native Speech Recognition.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Supervised Learning of Acoustic Models in a Zero Resource Setting to Improve DPGMM Clustering.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

A Hybrid System for Continuous Word-Level Emphasis Modeling Based on HMM State Clustering and Adaptive Training.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Transferring Emphasis in Speech Translation Using Hard-Attentional Neural Network Models.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Gated Recurrent Neural Tensor Network.
Proceedings of the 2016 International Joint Conference on Neural Networks, 2016

Personalized unknown word detection in non-native language reading using eye gaze.
Proceedings of the 18th ACM International Conference on Multimodal Interaction, 2016

Automated social skills training with audiovisual information.
Proceedings of the 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2016

Removing noise from event-related potentials using a probabilistic generative model with grouped covariance matrices.
Proceedings of the 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2016

2015
Semantic Parsing of Ambiguous Input through Paraphrasing and Verification.
Trans. Assoc. Comput. Linguistics, 2015

The Future of Human-Robot Spoken Dialogue: from Information Services to Virtual Assistants (NII Shonan Meeting 2015-7).
NII Shonan Meet. Rep., 2015

NOCOA+: Multimodal Computer-Based Training for Social and Communication Skills.
IEICE Trans. Inf. Syst., 2015

An Investigation of Machine Translation Evaluation Metrics in Cross-lingual Question Answering.
Proceedings of the Tenth Workshop on Statistical Machine Translation, 2015

Construction and analysis of social-affective interaction corpus in English and Indonesian.
Proceedings of the 2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2015

Ckylark: A More Robust PCFG-LA Parser.
Proceedings of the NAACL HLT 2015, The 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Denver, Colorado, USA, May 31, 2015

Learning to Generate Pseudo-Code from Source Code Using Statistical Machine Translation (T).
Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering, 2015

Pseudogen: A Tool to Automatically Generate Pseudo-Code from Source Code.
Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering, 2015

The NAIST English speech recognition system for IWSLT 2015.
Proceedings of the 12th International Workshop on Spoken Language Translation: Evaluation Campaign@IWSLT 2015, 2015

Improving translation of emphasis with pause prediction in speech-to-speech translation systems.
Proceedings of the 12th International Workshop on Spoken Language Translation: Papers, 2015

Automated Social Skills Trainer.
Proceedings of the 20th International Conference on Intelligent User Interfaces, 2015

Context awareness and priority control for ITS based on automatic speech recognition.
Proceedings of the 14th International Conference on ITS Telecommunications, 2015

Articulatory controllable speech modification based on Gaussian mixture models with direct waveform modification using spectrum differential.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Non-audible murmur enhancement based on statistical conversion using air- and body-conductive microphones in noisy environments.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Non-native speech synthesis preserving speaker individuality based on partial correction of prosodic and phonetic characteristics.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

A latent variable model for joint pause prediction and dependency parsing.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Speed or accuracy? a study in evaluation of simultaneous speech translation.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Statistical singing voice conversion based on direct waveform modification with global variance.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Preserving word-level emphasis in speech-to-speech translation using linear regression HSMMs.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Combination of two-dimensional cochleogram and spectrogram features for deep learning-based ASR.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

EEG signal enhancement using multi-channel wiener filter with a spatial correlation prior.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

An evaluation of EEG ocular artifact removal with a multi-channel wiener filter based on probabilistic generative model.
Proceedings of the 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2015

An Enhanced Electrolarynx with Automatic Fundamental Frequency Control based on Statistical Prediction.
Proceedings of the 17th International ACM SIGACCESS Conference on Computers & Accessibility, 2015

Stochastic Gradient Variational Bayes for deep learning-based ASR.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

Incremental sentence compression using LSTM recurrent networks.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

Adaptive selection from multiple response candidates in example-based dialogue.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

A study of social-affective communication: Automatic prediction of emotion triggers and responses in television talk shows.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

The NAIST ASR system for the 2015 Multi-Genre Broadcast challenge: On combination of deep learning systems using a rank-score function.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

Syntax-based Simultaneous Translation through Prediction of Unseen Syntactic Constituents.
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, 2015

Improving Pivot Translation by Remembering the Pivot.
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, 2015

An Analysis Towards Dialogue-Based Deception Detection.
Proceedings of the Natural Language Dialog Systems and Intelligent Assistants, 2015

Unknown Word Detection Based on Event-Related Brain Desynchronization Responses.
Proceedings of the Natural Language Dialog Systems and Intelligent Assistants, 2015

Linguistic Individuality Transformation for Spoken Language.
Proceedings of the Natural Language Dialog Systems and Intelligent Assistants, 2015

A Study on Natural Expressive Speech: Automatic Memorable Spoken Quote Detection.
Proceedings of the Natural Language Dialog Systems and Intelligent Assistants, 2015

Evaluation of a Fully Automatic Cooperative Persuasive Dialogue System.
Proceedings of the Natural Language Dialog Systems and Intelligent Assistants, 2015

2014
Parameter Generation Methods With Rich Context Models for High-Quality and Flexible Text-To-Speech Synthesis.
IEEE J. Sel. Top. Signal Process., 2014

Variable Selection Linear Regression for Robust Speech Recognition.
IEICE Trans. Inf. Syst., 2014

A Hybrid Approach to Electrolaryngeal Speech Enhancement Based on Noise Reduction and Statistical Excitation Generation.
IEICE Trans. Inf. Syst., 2014

Utilizing Human-to-Human Conversation Examples for a Multi Domain Chat-Oriented Dialog System.
IEICE Trans. Inf. Syst., 2014

Structured Adaptive Regularization of Weight Vectors for a Robust Grapheme-to-Phoneme Conversion Model.
IEICE Trans. Inf. Syst., 2014

Voice Timbre Control Based on Perceived Age in Singing Voice Conversion.
IEICE Trans. Inf. Syst., 2014

Rule-based Syntactic Preprocessing for Syntax-based Machine Translation.
Proceedings of SSST@EMNLP 2014, 2014

Recent progress in developing grapheme-based speech recognition for Indonesian ethnic languages: Javanese, Sundanese, Balinese and Bataks.
Proceedings of the 4th Workshop on Spoken Language Technologies for Under-resourced Languages, 2014

Improving the robustness of example-based dialog retrieval using recursive neural network paraphrase identification.
Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

Emotion recognition on Indonesian television talk shows.
Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

Conversation dialog corpora from television and movie scripts.
Proceedings of the 2014 17th Oriental Chapter of the International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques (COCOSDA), 2014

Building a free, general-domain paraphrase database for Japanese.
Proceedings of the 2014 17th Oriental Chapter of the International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques (COCOSDA), 2014

Construction and analysis of Indonesian Emotional Speech Corpus.
Proceedings of the 2014 17th Oriental Chapter of the International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques (COCOSDA), 2014

Memorable spoken quote corpora of TED public speaking.
Proceedings of the 2014 17th Oriental Chapter of the International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques (COCOSDA), 2014

Collection and analysis of a Japanese-English emphasized speech corpora.
Proceedings of the 2014 17th Oriental Chapter of the International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques (COCOSDA), 2014

Collection of a Simultaneous Translation Corpus for Comparative Analysis.
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

Towards Multilingual Conversations in the Medical Domain: Development of Multilingual Medical Data and A Network-based ASR System.
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

Emotion and Its Triggers in Human Spoken Dialogue: Recognition and Analysis.
Proceedings of the Situated Dialog in Speech-Based Human-Computer Interaction, 2014

Construction and Analysis of a Persuasive Dialogue Corpus.
Proceedings of the Situated Dialog in Speech-Based Human-Computer Interaction, 2014

Articulatory controllable speech modification based on statistical feature mapping with Gaussian mixture models.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Direct F0 control of an electrolarynx based on statistical excitation feature prediction and its evaluation through simulation.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Data-driven generation of text balloons based on linguistic and acoustic features of a comics-anime corpus.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Structured soft margin confidence weighted learning for grapheme-to-phoneme conversion.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Statistical singing voice conversion with direct waveform modification based on the spectrum differential.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

A hearing impairment simulation method using audiogram-based approximation of auditory characteristics.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

An evaluation of excitation feature prediction in a hybrid approach to electrolaryngeal speech enhancement.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

A postfilter to modify the modulation spectrum in HMM-based speech synthesis.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

Narrow Adaptive Regularization of weights for grapheme-to-phoneme conversion.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

Regression approaches to perceptual age control in singing voice conversion.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

Acquiring a Dictionary of Emotion-Provoking Events.
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, 2014

Reinforcement Learning of Cooperative Persuasive Dialogue Policies using Framing.
Proceedings of the COLING 2014, 2014

Discriminative Language Models as a Tool for Machine Translation Error Analysis.
Proceedings of the COLING 2014, 2014

Unnecessary utterance detection for avoiding digressions in discussion.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

An evaluation of target speech for a nonaudible murmur enhancement system in noisy environments.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

An inter-speaker evaluation through simulation of electrolarynx control based on statistical F0 prediction.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

An event-related brain potential study on the impact of speech recognition errors.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

Recursive neural network paraphrase identification for example-based dialog retrieval.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

The use of semantic and acoustic features for open-domain TED talk summarization.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

Gender-dependent spectrum differential models for perceived age control based on direct waveform modification in singing voice conversion.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

Optimizing Segmentation Strategies for Simultaneous Speech Translation.
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 2014

Linguistic and Acoustic Features for Automatic Identification of Autism Spectrum Disorders in Children's Narrative.
Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, 2014

2013
A-STAR: Toward translating Asian spoken languages.
Comput. Speech Lang., 2013

Investigation of intra-speaker spectral parameter variation and its prediction towards improvement of spectral conversion metric.
Proceedings of the Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, 2013

Towards language preservation: Design and collection of graphemically balanced and parallel speech corpora of Indonesian ethnic languages.
Proceedings of the 2013 International Conference Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2013

Constructing a speech translation system using simultaneous interpretation data.
Proceedings of the 10th International Workshop on Spoken Language Translation: Papers, 2013

The NAIST English speech recognition system for IWSLT 2013.
Proceedings of the 10th International Workshop on Spoken Language Translation: Evaluation Campaign@IWSLT 2013, 2013

Incremental unsupervised training for university lecture recognition.
Proceedings of the 10th International Workshop on Spoken Language Translation: Papers, 2013

A hybrid approach to electrolaryngeal speech enhancement based on spectral subtraction and statistical voice conversion.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Improvements to HMM-based speech synthesis based on parameter generation with rich context models.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

An empirical comparison of joint optimization techniques for speech translation.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

A digital signal processor implementation of silent/electrolaryngeal speech enhancement based on real-time statistical voice conversion.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Grapheme-to-phoneme conversion based on adaptive regularization of weight vectors.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

An investigation of acoustic features for singing voice conversion based on perceptual age.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Generalizing continuous-space translation of paralinguistic information.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Simple, lexicalized choice of translation timing for simultaneous speech translation.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Modality and contextual differences in computer based non-verbal communication training.
Proceedings of the IEEE 4th International Conference on Cognitive Infocommunications, 2013

NAIST at the CLEF 2013 QA4MRE Pilot Task.
Proceedings of the Working Notes for CLEF 2013 Conference, 2013

Inter-Sentence Features and Thresholded Minimum Error Rate Training: NAIST at CLEF 2013 QA4MRE.
Proceedings of the Working Notes for CLEF 2013 Conference, 2013

Dialogue management for leading the conversation in persuasive dialogue systems.
Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

Towards High-Reliability Speech Translation in the Medical Domain.
Proceedings of the First Workshop on Natural Language Processing for Medical and Healthcare Fields@IJCNLP 2013, 2013

2012
Distributed speech translation technologies for multiparty multilingual communication.
ACM Trans. Speech Lang. Process., 2012

Sequence-Based Pronunciation Variation Modeling for Spontaneous ASR Using a Noisy Channel Approach.
IEICE Trans. Inf. Syst., 2012

The 2012 KIT and KIT-NAIST English ASR systems for the IWSLT evaluation.
Proceedings of the 2012 International Workshop on Spoken Language Translation, 2012

The NAIST machine translation system for IWSLT2012.
Proceedings of the 2012 International Workshop on Spoken Language Translation, 2012

A method for translation of paralinguistic information.
Proceedings of the 2012 International Workshop on Spoken Language Translation, 2012

The KIT-NAIST (contrastive) English ASR system for IWSLT 2012.
Proceedings of the 2012 International Workshop on Spoken Language Translation, 2012

Developing Non-goal Dialog System Based on Examples of Drama Television.
Proceedings of the Natural Interaction with Robots, 2012

An Evaluation of Parameter Generation Methods with Rich Context Models in HMM-Based Speech Synthesis.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Non-verbal cognitive skills and autistic conditions: An analysis and training tool.
Proceedings of the IEEE 3rd International Conference on Cognitive Infocommunications, 2012

2011
Conditional Random Fields for Modeling Korean Pronunciation Variation.
Proceedings of the Paralinguistic Information and its Integration in Spoken Dialogue Systems, 2011

Unsupervised determination of efficient Korean LVCSR units using a Bayesian Dirichlet process model.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2011

Blind noise suppression for Non-Audible Murmur recognition with stereo signal processing.
Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

2010
Sequence-Based Pronunciation Modeling Using a Noisy-Channel Approach.
Proceedings of the Spoken Dialogue Systems for Ambient Environments, 2010

Korean pronunciation variation modeling with probabilistic Bayesian networks.
Proceedings of the 4th International Universal Communication Symposium, 2010

Improving spontaneous English ASR using a joint-sequence pronunciation model.
Proceedings of the 4th International Universal Communication Symposium, 2010

Utilizing a noisy-channel approach for Korean LVCSR.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Brazilian Portuguese acoustic model training based on data borrowing from other language.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

2009
Incorporating Knowledge Sources into Statistical Speech Recognition
Lecture Notes in Electrical Engineering 42, Springer, ISBN: 978-0-387-85829-6, 2009

Network-based speech-to-speech translation.
Proceedings of the 2009 International Workshop on Spoken Language Translation, 2009

The Asian network-based speech-to-speech translation system.
Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009

2008
Incorporating knowledge into statistical acoustic models for spoken language dialog systems.
PhD thesis, 2008

Probabilistic Pronunciation Variation Model Based on Bayesian Network for Conversational Speech Recognition.
Proceedings of the ISUC 2008, 2008

Development of Indonesian Large Vocabulary Continuous Speech Recognition System within A-STAR Project.
Proceedings of the Third International Joint Conference on Natural Language Processing, 2008

2007
Incorporating Knowledge Sources Into a Statistical Acoustic Model for Spoken Language Communication Systems.
IEEE Trans. Computers, 2007

An HMM acoustic model incorporating various additional knowledge sources.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

A method to integrate additional knowledge sources into HMM based on junction tree decomposition.
Proceedings of the 15th European Signal Processing Conference, 2007

2006
Improving Acoustic Model Precision by Incorporating a Wide Phonetic Context Based on a Bayesian Framework.
IEICE Trans. Inf. Syst., 2006

A Hybrid HMM/BN Acoustic Model Utilizing Pentaphone-Context Dependency.
IEICE Trans. Inf. Syst., 2006

The use of Bayesian network for incorporating accent, gender and wide-context dependency information.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Incorporation of Pentaphone-Context Dependency Based on Hybrid HMM/BN Acoustic Modeling Framework.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

2005
Incorporating a Bayesian wide phonetic context model for acoustic rescoring.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

2004
Indonesian speech recognition for hearing and speaking impaired people.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004
