Sheng Li

ORCID: 0000-0001-7636-3797

Affiliations:
  • National Institute of Information and Communications Technology (NICT), Universal Communication Research Institute (UCRI), Kyoto, Japan
  • Kyoto University, Graduate School of Informatics, Japan (2012-2017, PhD 2016)
  • Shenzhen Institutes of Advanced Technology, Shenzhen, China (2008-2012)
  • Chinese Academy of Sciences, Beijing, China (2008-2012)
  • Chinese University of Hong Kong, Hong Kong (2008-2012)
  • Nanjing University, China (2002-2009)


According to our database, Sheng Li authored at least 99 papers between 2011 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Voices of the Himalayas: Benchmarking Speech Recognition Systems for the Tibetan Language.
Int. J. Asian Lang. Process., March, 2024

Robust voice activity detection using an auditory-inspired masked modulation encoder based convolutional attention network.
Speech Commun., 2024

Phantom in the opera: adversarial music attack for robot dialogue system.
Frontiers Comput. Sci., 2024

Benchmarking Japanese Speech Recognition on ASR-LLM Setups with Multi-Pass Augmented Generative Error Correction.
CoRR, 2024

Reproducibility Companion Paper: Stable Diffusion for Content-Style Disentanglement in Art Analysis.
Proceedings of the 2024 International Conference on Multimedia Retrieval, 2024

MOS-FAD: Improving Fake Audio Detection Via Automatic Mean Opinion Score Prediction.
Proceedings of the IEEE International Conference on Acoustics, 2024

Enhancing Realism in 3D Facial Animation Using Conformer-Based Generation and Automated Post-Processing.
Proceedings of the IEEE International Conference on Acoustics, 2024

Revisiting Generative Adversarial Network for Downstream Task of Speech Recognition.
Proceedings of the IEEE Gaming, Entertainment, and Media Conference, 2024

Enhancing Privacy of Spatiotemporal Federated Learning Against Gradient Inversion Attacks.
Proceedings of the Database Systems for Advanced Applications, 2024

2023
Finetuning Pretrained Model with Embedding of Domain and Language Information for ASR of Very Low-Resource Settings.
Int. J. Asian Lang. Process., December, 2023

Disordered speech recognition considering low resources and abnormal articulation.
Speech Commun., November, 2023

KyotoMOS: An Automatic MOS Scoring System for Speech Synthesis.
Proceedings of the ACM Multimedia Asia Workshops, 2023

GhostVec: A New Threat to Speaker Privacy of End-to-End Speech Recognition System.
Proceedings of the ACM Multimedia Asia 2023, 2023

Reprogramming Self-supervised Learning-based Speech Representations for Speaker Anonymization.
Proceedings of the ACM Multimedia Asia 2023, 2023

The Kyoto Speech-to-Speech Translation System for IWSLT 2023.
Proceedings of the 20th International Conference on Spoken Language Translation, 2023

Speech-Text Based Multi-Modal Training with Bidirectional Attention for Improved Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023

Speakeraugment: Data Augmentation for Generalizable Source Separation via Speaker Parameter Manipulation.
Proceedings of the IEEE International Conference on Acoustics, 2023

General or Specific? Investigating Effective Privacy Protection in Federated Learning for Speech Emotion Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023

Hierarchical Softmax for End-To-End Low-Resource Multilingual Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023

Development of a Pain Signaling System Using Machine Learning.
Proceedings of the IEEE International Conference on Acoustics, 2023

Domain and Language Adaptation Using Heterogeneous Datasets for Wav2vec2.0-Based Speech Recognition of Low-Resource Language.
Proceedings of the IEEE International Conference on Acoustics, 2023

Correction while Recognition: Combining Pretrained Language Model for Taiwan-Accented Speech Recognition.
Proceedings of the Artificial Neural Networks and Machine Learning - ICANN 2023, 2023

FedCPC: An Effective Federated Contrastive Learning Method for Privacy Preserving Early-Stage Alzheimer's Speech Detection.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

LE-SSL-MOS: Self-Supervised Learning MOS Prediction with Listener Enhancement.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Multi-Domain Dialogue State Tracking with Disentangled Domain-Slot Attention.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

Towards Speech Dialogue Translation Mediating Speakers of Different Languages.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022
Improving low-resource Tibetan end-to-end ASR by multilingual and multilevel unit modeling.
EURASIP J. Audio Speech Music. Process., 2022

Hierarchical Softmax for End-to-End Low-resource Multilingual Speech Recognition.
CoRR, 2022

Multi-Domain Dialogue State Tracking with Top-K Slot Self Attention.
Proceedings of the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue, 2022

Self-Adaptive Multilingual ASR Rescoring with Language Identification and Unified Language Model.
Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022

Nict-Tib1: A Public Speech Corpus Of Lhasa Dialect For Benchmarking Tibetan Language Speech Recognition Systems.
Proceedings of the 25th Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 2022

Adversarial Speech Generation and Natural Speech Recovery for Speech Content Protection.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

Fusion of Self-supervised Learned Models for MOS Prediction.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Augmented Adversarial Self-Supervised Learning for Early-Stage Alzheimer's Speech Detection.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Monaural Speech Enhancement Based on Spectrogram Decomposition for Convolutional Neural Network-sensitive Feature Extraction.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Finer-grained Modeling units-based Meta-Learning for Low-resource Tibetan Speech Recognition.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Global Signal-to-noise Ratio Estimation Based on Multi-subband Processing Using Convolutional Neural Network.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Data Augmentation Using McAdams-Coefficient-Based Speaker Anonymization for Fake Audio Detection.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Leveraging Simultaneous Translation for Enhancing Transcription of Low-resource Language via Cross Attention Mechanism.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Investigating Effective Domain Adaptation Method for Speaker Verification Task.
Proceedings of the Neural Information Processing - 29th International Conference, 2022

An End-to-End Chinese and Japanese Bilingual Speech Recognition Systems with Shared Character Decomposition.
Proceedings of the Neural Information Processing - 29th International Conference, 2022

GhostVec: Directly Extracting Speaker Embedding from End-to-End Speech Recognition Model Using Adversarial Examples.
Proceedings of the Neural Information Processing - 29th International Conference, 2022

Mining Hard Samples Locally And Globally For Improved Speech Separation.
Proceedings of the IEEE International Conference on Acoustics, 2022

Compressing Transformer-Based ASR Model by Task-Driven Loss and Attention-Based Multi-Level Feature Distillation.
Proceedings of the IEEE International Conference on Acoustics, 2022

Relationship Between Speakers' Physiological Structure and Acoustic Speech Signals: Data-Driven Study Based on Frequency-Wise Attentional Neural Network.
Proceedings of the 30th European Signal Processing Conference, 2022

2021
TriECCC: Trilingual Corpus of the Extraordinary Chambers in the Courts of Cambodia for Speech Recognition and Translation Studies.
Int. J. Asian Lang. Process., 2021

Khmer Speech Translation Corpus of the Extraordinary Chambers in the Courts of Cambodia (ECCC).
Proceedings of the 24th Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 2021

An End-to-End Dialect Identification System with Transfer Learning from a Multilingual Automatic Speech Recognition Model.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

End-to-End Speech Separation Using Orthogonal Representation in Complex and Real Time-Frequency Domain.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Simultaneous Progressive Filtering-Based Monaural Speech Enhancement.
Proceedings of the Neural Information Processing - 28th International Conference, 2021

Speech Dereverberation Based on Scale-Aware Mean Square Error Loss.
Proceedings of the Neural Information Processing - 28th International Conference, 2021

Exploring Effective Speech Representation via ASR for High-Quality End-to-End Multispeaker TTS.
Proceedings of the Neural Information Processing - 28th International Conference, 2021

Robust Voice Activity Detection Using a Masked Auditory Encoder Based Convolutional Neural Network.
Proceedings of the IEEE International Conference on Acoustics, 2021

Encoder-Decoder Based Pitch Tracking and Joint Model Training for Mandarin Tone Classification.
Proceedings of the IEEE International Conference on Acoustics, 2021

An Investigation of Using Hybrid Modeling Units for Improving End-to-End Speech Recognition System.
Proceedings of the IEEE International Conference on Acoustics, 2021

Spectrograms Fusion-based End-to-end Robust Automatic Speech Recognition.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

Multilingual Approach to Joint Speech and Accent Recognition with DNN-HMM Framework.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

On the Use of Speaker Information for Automatic Speech Recognition in Speaker-imbalanced Corpora.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

2020
Automatic Speech Recognition.
Proceedings of the Speech-to-Speech Translation, 2020

Knowledge Distillation-Based Representation Learning for Short-Utterance Spoken Language Identification.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Compensation on x-vector for Short Utterance Spoken Language Identification.
Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

Joint Training End-to-End Speech Recognition Systems with Speaker Attributes.
Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

VOIS: The First Speech Therapy App Specifically Designed for Myanmar Hearing-Impaired Children.
Proceedings of the 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 2020

Singing Voice Extraction with Attention-Based Spectrograms Fusion.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Staged Knowledge Distillation for End-to-End Dysarthric Speech Recognition and Speech Attribute Transcription.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Investigation of Effectively Synthesizing Code-Switched Speech Using Highly Imbalanced Mix-Lingual Data.
Proceedings of the Neural Information Processing - 27th International Conference, 2020

Voice-Indistinguishability: Protecting Voiceprint In Privacy-Preserving Speech Data Release.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2020

Spectrograms Fusion with Minimum Difference Masks Estimation for Monaural Speech Dereverberation.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

End-to-End Articulatory Modeling for Dysarthric Articulatory Attribute Detection.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Voice-Indistinguishability - Protecting Voiceprint with Differential Privacy under an Untrusted Server.
Proceedings of the CCS '20: 2020 ACM SIGSAC Conference on Computer and Communications Security, 2020

2019
Deep progressive multi-scale attention for acoustic event classification.
CoRR, 2019

Class-Wise Centroid Distance Metric Learning for Acoustic Event Detection.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Improving Transformer-Based Speech Recognition Systems with Compressed Structure and Speech Attributes Augmentation.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Investigating Radical-Based End-to-End Speech Recognition Systems for Chinese Dialects and Japanese.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

End-to-End Articulatory Attribute Modeling for Low-Resource Multilingual Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Investigation of Sequence-level Knowledge Distillation Methods for CTC Acoustic Models.
Proceedings of the IEEE International Conference on Acoustics, 2019

Interactive Learning of Teacher-student Model for Short Utterance Spoken Language Identification.
Proceedings of the IEEE International Conference on Acoustics, 2019

Multi-lingual Transformer Training for Khmer Automatic Speech Recognition.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Effective Training End-to-End ASR systems for Low-resource Lhasa Dialect of Tibetan Language.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

2018
Improving Very Deep Time-Delay Neural Network With Vertical-Attention For Effectively Training CTC-Based ASR Systems.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Feature Representation of Short Utterances Based on Knowledge Distillation for Spoken Language Identification.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Temporal Attentive Pooling for Acoustic Event Detection.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Improving CTC-based Acoustic Model with Very Deep Residual Time-delay Neural Networks.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

CTC Loss Function with a Unit-Level Ambiguity Penalty.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

An Investigation of a Knowledge Distillation Method for CTC Acoustic Models.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017
Conditional Generative Adversarial Nets Classifier for Spoken Language Identification.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Semi-supervised ensemble DNN acoustic model training.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Incremental training and constructing the very deep convolutional residual network acoustic models.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

2016
Speech Recognition Enhanced by Lightly-supervised and Semi-supervised Acoustic Model Training.
PhD thesis, 2016

Semi-Supervised Acoustic Model Training by Discriminative Data Selection From Multiple ASR Systems' Hypotheses.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

Confidence estimation for speech recognition systems using conditional random fields trained with partially annotated data.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Data selection from multiple ASR systems' hypotheses for unsupervised acoustic model training.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015
Automatic Lecture Transcription Based on Discriminative Data Selection for Lightly Supervised Acoustic Model Training.
IEICE Trans. Inf. Syst., 2015

Ensemble speaker modeling using speaker adaptive training deep neural network for speaker adaptation.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Discriminative data selection for lightly supervised training of acoustic model using closed caption texts.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

2014
Corpus and transcription system of Chinese Lecture Room.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

2012
Phoneme-level articulatory animation in pronunciation training.
Speech Commun., 2012

Cross Linguistic Comparison of Mandarin and English EMA Articulatory Data.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

2011
The Phoneme-Level Articulator Dynamics for Pronunciation Animation.
Proceedings of the International Conference on Asian Language Processing, 2011

