Sheng Li

Chenhui Chu

Int. J. Asian Lang. Process., March, 2024

Robust voice activity detection using an auditory-inspired masked modulation encoder based convolutional attention network.

Speech Commun., 2024

Phantom in the opera: adversarial music attack for robot dialogue system.

Yang Cao

Frontiers Comput. Sci., 2024

Extracting Spatiotemporal Data from Gradients with Large Language Models.

CoRR, 2024

Benchmarking Japanese Speech Recognition on ASR-LLM Setups with Multi-Pass Augmented Generative Error Correction.

CoRR, 2024

Investigating Effective Speaker Property Privacy Protection in Federated Learning for Speech Emotion Recognition.

Proceedings of the 6th ACM International Conference on Multimedia in Asia, 2024

Reproducibility Companion Paper: Stable Diffusion for Content-Style Disentanglement in Art Analysis.

Proceedings of the 2024 International Conference on Multimedia Retrieval, 2024

MOS-FAD: Improving Fake Audio Detection Via Automatic Mean Opinion Score Prediction.

Proceedings of the IEEE International Conference on Acoustics, 2024

Enhancing Realism in 3D Facial Animation Using Conformer-Based Generation and Automated Post-Processing.

Proceedings of the IEEE International Conference on Acoustics, 2024

Revisiting Generative Adversarial Network for Downstream Task of Speech Recognition.

Bei Liu

Jianlong Fu

Proceedings of the IEEE Gaming, Entertainment, and Media Conference, 2024

Enhancing Privacy of Spatiotemporal Federated Learning Against Gradient Inversion Attacks.

Proceedings of the Database Systems for Advanced Applications, 2024

Automatic Post-editing of Speech Recognition System Output Using Large Language Models.

Yang Cao

Proceedings of the Database Systems for Advanced Applications. DASFAA 2024 International Workshops, 2024

Low-resource Language Adaptation with Ensemble of PEFT Approaches.

Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2024

Data Selection using Spoken Language Identification for Low-Resource and Zero-Resource Speech Recognition.

Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2024

LLM as decoder: Investigating Lattice-based Speech Recognition Hypotheses Rescoring Using LLM.

Yuka Ko

Akinori Ito

Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2024

2023

Finetuning Pretrained Model with Embedding of Domain and Language Information for ASR of Very Low-Resource Settings.

Int. J. Asian Lang. Process., December, 2023

Disordered speech recognition considering low resources and abnormal articulation.

Speech Commun., November, 2023

KyotoMOS: An Automatic MOS Scoring System for Speech Synthesis.

Proceedings of the ACM Multimedia Asia Workshops, 2023

GhostVec: A New Threat to Speaker Privacy of End-to-End Speech Recognition System.

Proceedings of the ACM Multimedia Asia 2023, 2023

Reprogramming Self-supervised Learning-based Speech Representations for Speaker Anonymization.

Proceedings of the ACM Multimedia Asia 2023, 2023

The Kyoto Speech-to-Speech Translation System for IWSLT 2023.

Proceedings of the 20th International Conference on Spoken Language Translation, 2023

Speech-Text Based Multi-Modal Training with Bidirectional Attention for Improved Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2023

SpeakerAugment: Data Augmentation for Generalizable Source Separation via Speaker Parameter Manipulation.

Proceedings of the IEEE International Conference on Acoustics, 2023

General or Specific? Investigating Effective Privacy Protection in Federated Learning for Speech Emotion Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2023

Hierarchical Softmax for End-To-End Low-Resource Multilingual Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2023

Development of a Pain Signaling System Using Machine Learning.

Helen Korving

Di Zhou

Paula Sophia Sterkenburg

Panos Markopoulos

Emilia I. Barakova

Proceedings of the IEEE International Conference on Acoustics, 2023

Domain and Language Adaptation Using Heterogeneous Datasets for Wav2vec2.0-Based Speech Recognition of Low-Resource Language.

Proceedings of the IEEE International Conference on Acoustics, 2023

Correction while Recognition: Combining Pretrained Language Model for Taiwan-Accented Speech Recognition.

Proceedings of the Artificial Neural Networks and Machine Learning - ICANN 2023, 2023

FedCPC: An Effective Federated Contrastive Learning Method for Privacy Preserving Early-Stage Alzheimer's Speech Detection.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

LE-SSL-MOS: Self-Supervised Learning MOS Prediction with Listener Enhancement.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Multi-Domain Dialogue State Tracking with Disentangled Domain-Slot Attention.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

Towards Speech Dialogue Translation Mediating Speakers of Different Languages.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022

Improving low-resource Tibetan end-to-end ASR by multilingual and multilevel unit modeling.

EURASIP J. Audio Speech Music. Process., 2022

Hierarchical Softmax for End-to-End Low-resource Multilingual Speech Recognition.

CoRR, 2022

Multi-Domain Dialogue State Tracking with Top-K Slot Self Attention.

Proceedings of the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue, 2022

Self-Adaptive Multilingual ASR Rescoring with Language Identification and Unified Language Model.

Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022

NICT-Tib1: A Public Speech Corpus Of Lhasa Dialect For Benchmarking Tibetan Language Speech Recognition Systems.

Soky Kak

Zhuo Gong

Proceedings of the 25th Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 2022

Adversarial Speech Generation and Natural Speech Recovery for Speech Content Protection.

Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

Fusion of Self-supervised Learned Models for MOS Prediction.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Augmented Adversarial Self-Supervised Learning for Early-Stage Alzheimer's Speech Detection.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Monaural Speech Enhancement Based on Spectrogram Decomposition for Convolutional Neural Network-sensitive Feature Extraction.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Finer-grained Modeling units-based Meta-Learning for Low-resource Tibetan Speech Recognition.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Global Signal-to-noise Ratio Estimation Based on Multi-subband Processing Using Convolutional Neural Network.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Data Augmentation Using McAdams-Coefficient-Based Speaker Anonymization for Fake Audio Detection.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Leveraging Simultaneous Translation for Enhancing Transcription of Low-resource Language via Cross Attention Mechanism.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Investigating Effective Domain Adaptation Method for Speaker Verification Task.

Proceedings of the Neural Information Processing - 29th International Conference, 2022

An End-to-End Chinese and Japanese Bilingual Speech Recognition Systems with Shared Character Decomposition.

Proceedings of the Neural Information Processing - 29th International Conference, 2022

GhostVec: Directly Extracting Speaker Embedding from End-to-End Speech Recognition Model Using Adversarial Examples.

Xiaojiao Chen

Hao Huang

Proceedings of the Neural Information Processing - 29th International Conference, 2022

Mining Hard Samples Locally And Globally For Improved Speech Separation.

Proceedings of the IEEE International Conference on Acoustics, 2022

Compressing Transformer-Based ASR Model by Task-Driven Loss and Attention-Based Multi-Level Feature Distillation.

Proceedings of the IEEE International Conference on Acoustics, 2022

Relationship Between Speakers' Physiological Structure and Acoustic Speech Signals: Data-Driven Study Based on Frequency-Wise Attentional Neural Network.

Proceedings of the 30th European Signal Processing Conference, 2022

2021

TriECCC: Trilingual Corpus of the Extraordinary Chambers in the Courts of Cambodia for Speech Recognition and Translation Studies.

Int. J. Asian Lang. Process., 2021

Khmer Speech Translation Corpus of the Extraordinary Chambers in the Courts of Cambodia (ECCC).

Proceedings of the 24th Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 2021

An End-to-End Dialect Identification System with Transfer Learning from a Multilingual Automatic Speech Recognition Model.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

End-to-End Speech Separation Using Orthogonal Representation in Complex and Real Time-Frequency Domain.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Simultaneous Progressive Filtering-Based Monaural Speech Enhancement.

Proceedings of the Neural Information Processing - 28th International Conference, 2021

Speech Dereverberation Based on Scale-Aware Mean Square Error Loss.

Proceedings of the Neural Information Processing - 28th International Conference, 2021

Exploring Effective Speech Representation via ASR for High-Quality End-to-End Multispeaker TTS.

Proceedings of the Neural Information Processing - 28th International Conference, 2021

Robust Voice Activity Detection Using a Masked Auditory Encoder Based Convolutional Neural Network.

Proceedings of the IEEE International Conference on Acoustics, 2021

Encoder-Decoder Based Pitch Tracking and Joint Model Training for Mandarin Tone Classification.

Proceedings of the IEEE International Conference on Acoustics, 2021

An Investigation of Using Hybrid Modeling Units for Improving End-to-End Speech Recognition System.

Proceedings of the IEEE International Conference on Acoustics, 2021

Spectrograms Fusion-based End-to-end Robust Automatic Speech Recognition.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

Multilingual Approach to Joint Speech and Accent Recognition with DNN-HMM Framework.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

On the Use of Speaker Information for Automatic Speech Recognition in Speaker-imbalanced Corpora.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

2020

Automatic Speech Recognition.

Xugang Lu

Masakiyo Fujimoto

Proceedings of the Speech-to-Speech Translation, 2020

Knowledge Distillation-Based Representation Learning for Short-Utterance Spoken Language Identification.

IEEE ACM Trans. Audio Speech Lang. Process., 2020

Compensation on x-vector for Short Utterance Spoken Language Identification.

Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

Joint Training End-to-End Speech Recognition Systems with Speaker Attributes.

Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

VOIS: The First Speech Therapy App Specifically Designed for Myanmar Hearing-Impaired Children.

Proceedings of the 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 2020

Singing Voice Extraction with Attention-Based Spectrograms Fusion.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Staged Knowledge Distillation for End-to-End Dysarthric Speech Recognition and Speech Attribute Transcription.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Investigation of Effectively Synthesizing Code-Switched Speech Using Highly Imbalanced Mix-Lingual Data.

Proceedings of the Neural Information Processing - 27th International Conference, 2020

Voice-Indistinguishability: Protecting Voiceprint In Privacy-Preserving Speech Data Release.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2020

Spectrograms Fusion with Minimum Difference Masks Estimation for Monaural Speech Dereverberation.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

End-to-End Articulatory Modeling for Dysarthric Articulatory Attribute Detection.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Voice-Indistinguishability - Protecting Voiceprint with Differential Privacy under an Untrusted Server.

Proceedings of the CCS '20: 2020 ACM SIGSAC Conference on Computer and Communications Security, 2020

2019

Deep progressive multi-scale attention for acoustic event classification.

CoRR, 2019

Class-Wise Centroid Distance Metric Learning for Acoustic Event Detection.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Improving Transformer-Based Speech Recognition Systems with Compressed Structure and Speech Attributes Augmentation.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Investigating Radical-Based End-to-End Speech Recognition Systems for Chinese Dialects and Japanese.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

End-to-End Articulatory Attribute Modeling for Low-Resource Multilingual Speech Recognition.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Investigation of Sequence-level Knowledge Distillation Methods for CTC Acoustic Models.

Ryoichi Takashima

Hisashi Kawai

Proceedings of the IEEE International Conference on Acoustics, 2019

Interactive Learning of Teacher-student Model for Short Utterance Spoken Language Identification.

Proceedings of the IEEE International Conference on Acoustics, 2019

Multi-lingual Transformer Training for Khmer Automatic Speech Recognition.

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Effective Training End-to-End ASR systems for Low-resource Lhasa Dialect of Tibetan Language.

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

2018

Improving Very Deep Time-Delay Neural Network With Vertical-Attention For Effectively Training CTC-Based ASR Systems.

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Feature Representation of Short Utterances Based on Knowledge Distillation for Spoken Language Identification.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Temporal Attentive Pooling for Acoustic Event Detection.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Improving CTC-based Acoustic Model with Very Deep Residual Time-delay Neural Networks.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

CTC Loss Function with a Unit-Level Ambiguity Penalty.

Ryoichi Takashima

Hisashi Kawai

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

An Investigation of a Knowledge Distillation Method for CTC Acoustic Models.

Ryoichi Takashima

Hisashi Kawai

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017

Conditional Generative Adversarial Nets Classifier for Spoken Language Identification.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Semi-supervised ensemble DNN acoustic model training.

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Incremental training and constructing the very deep convolutional residual network acoustic models.

Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

2016

Speech Recognition Enhanced by Lightly-supervised and Semi-supervised Acoustic Model Training.

PhD thesis, 2016

Semi-Supervised Acoustic Model Training by Discriminative Data Selection From Multiple ASR Systems' Hypotheses.

IEEE ACM Trans. Audio Speech Lang. Process., 2016

Confidence estimation for speech recognition systems using conditional random fields trained with partially annotated data.

Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Data selection from multiple ASR systems' hypotheses for unsupervised acoustic model training.

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015

Automatic Lecture Transcription Based on Discriminative Data Selection for Lightly Supervised Acoustic Model Training.

IEICE Trans. Inf. Syst., 2015

Ensemble speaker modeling using speaker adaptive training deep neural network for speaker adaptation.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Discriminative data selection for lightly supervised training of acoustic model using closed caption texts.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

2014

Corpus and transcription system of Chinese Lecture Room.

Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

2012

Phoneme-level articulatory animation in pronunciation training.

Speech Commun., 2012

Cross Linguistic Comparison of Mandarin and English EMA Articulatory Data.

Lan Wang

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

2011

The Phoneme-Level Articulator Dynamics for Pronunciation Animation.
