Sheng Li
ORCID: 0000-0001-7636-3797
Affiliations:
- National Institute of Information and Communications Technology (NICT), Universal Communication Research Institute (UCRI), Kyoto, Japan
- Kyoto University, Graduate School of Informatics, Japan (2012-2017, PhD 2016)
- Shenzhen Institutes of Advanced Technology, Shenzhen, China (2008-2012)
- Chinese Academy of Sciences, Beijing, China (2008-2012)
- Chinese University of Hong Kong, Hong Kong (2008-2012)
- Nanjing University, China (2002-2009)
According to our database, Sheng Li authored at least 99 papers between 2011 and 2024.
Bibliography
2024
Voices of the Himalayas: Benchmarking Speech Recognition Systems for the Tibetan Language.
Int. J. Asian Lang. Process., March, 2024
Robust voice activity detection using an auditory-inspired masked modulation encoder based convolutional attention network.
Speech Commun., 2024
Frontiers Comput. Sci., 2024
Benchmarking Japanese Speech Recognition on ASR-LLM Setups with Multi-Pass Augmented Generative Error Correction.
CoRR, 2024
Reproducibility Companion Paper: Stable Diffusion for Content-Style Disentanglement in Art Analysis.
Proceedings of the 2024 International Conference on Multimedia Retrieval, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Enhancing Realism in 3D Facial Animation Using Conformer-Based Generation and Automated Post-Processing.
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the IEEE Gaming, Entertainment, and Media Conference, 2024
Enhancing Privacy of Spatiotemporal Federated Learning Against Gradient Inversion Attacks.
Proceedings of the Database Systems for Advanced Applications, 2024
2023
Finetuning Pretrained Model with Embedding of Domain and Language Information for ASR of Very Low-Resource Settings.
Int. J. Asian Lang. Process., December, 2023
Speech Commun., November, 2023
Proceedings of the ACM Multimedia Asia Workshops, 2023
Proceedings of the ACM Multimedia Asia 2023, 2023
Reprogramming Self-supervised Learning-based Speech Representations for Speaker Anonymization.
Proceedings of the ACM Multimedia Asia 2023, 2023
Proceedings of the 20th International Conference on Spoken Language Translation, 2023
Speech-Text Based Multi-Modal Training with Bidirectional Attention for Improved Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023
Speakeraugment: Data Augmentation for Generalizable Source Separation via Speaker Parameter Manipulation.
Proceedings of the IEEE International Conference on Acoustics, 2023
General or Specific? Investigating Effective Privacy Protection in Federated Learning for Speech Emotion Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Domain and Language Adaptation Using Heterogeneous Datasets for Wav2vec2.0-Based Speech Recognition of Low-Resource Language.
Proceedings of the IEEE International Conference on Acoustics, 2023
Correction while Recognition: Combining Pretrained Language Model for Taiwan-Accented Speech Recognition.
Proceedings of the Artificial Neural Networks and Machine Learning - ICANN 2023, 2023
FedCPC: An Effective Federated Contrastive Learning Method for Privacy Preserving Early-Stage Alzheimers Speech Detection.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023
2022
Improving low-resource Tibetan end-to-end ASR by multilingual and multilevel unit modeling.
EURASIP J. Audio Speech Music. Process., 2022
CoRR, 2022
Proceedings of the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue, 2022
Self-Adaptive Multilingual ASR Rescoring with Language Identification and Unified Language Model.
Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022
Nict-Tib1: A Public Speech Corpus Of Lhasa Dialect For Benchmarking Tibetan Language Speech Recognition Systems.
Proceedings of the 25th Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 2022
Adversarial Speech Generation and Natural Speech Recovery for Speech Content Protection.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Augmented Adversarial Self-Supervised Learning for Early-Stage Alzheimer's Speech Detection.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Monaural Speech Enhancement Based on Spectrogram Decomposition for Convolutional Neural Network-sensitive Feature Extraction.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Finer-grained Modeling units-based Meta-Learning for Low-resource Tibetan Speech Recognition.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Global Signal-to-noise Ratio Estimation Based on Multi-subband Processing Using Convolutional Neural Network.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Data Augmentation Using McAdams-Coefficient-Based Speaker Anonymization for Fake Audio Detection.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Leveraging Simultaneous Translation for Enhancing Transcription of Low-resource Language via Cross Attention Mechanism.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the Neural Information Processing - 29th International Conference, 2022
An End-to-End Chinese and Japanese Bilingual Speech Recognition Systems with Shared Character Decomposition.
Proceedings of the Neural Information Processing - 29th International Conference, 2022
GhostVec: Directly Extracting Speaker Embedding from End-to-End Speech Recognition Model Using Adversarial Examples.
Proceedings of the Neural Information Processing - 29th International Conference, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
Compressing Transformer-Based ASR Model by Task-Driven Loss and Attention-Based Multi-Level Feature Distillation.
Proceedings of the IEEE International Conference on Acoustics, 2022
Relationship Between Speakers' Physiological Structure and Acoustic Speech Signals: Data-Driven Study Based on Frequency-Wise Attentional Neural Network.
Proceedings of the 30th European Signal Processing Conference, 2022
2021
TriECCC: Trilingual Corpus of the Extraordinary Chambers in the Courts of Cambodia for Speech Recognition and Translation Studies.
Int. J. Asian Lang. Process., 2021
Khmer Speech Translation Corpus of the Extraordinary Chambers in the Courts of Cambodia (ECCC).
Proceedings of the 24th Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 2021
An End-to-End Dialect Identification System with Transfer Learning from a Multilingual Automatic Speech Recognition Model.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
End-to-End Speech Separation Using Orthogonal Representation in Complex and Real Time-Frequency Domain.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the Neural Information Processing - 28th International Conference, 2021
Proceedings of the Neural Information Processing - 28th International Conference, 2021
Exploring Effective Speech Representation via ASR for High-Quality End-to-End Multispeaker TTS.
Proceedings of the Neural Information Processing - 28th International Conference, 2021
Robust Voice Activity Detection Using a Masked Auditory Encoder Based Convolutional Neural Network.
Proceedings of the IEEE International Conference on Acoustics, 2021
Encoder-Decoder Based Pitch Tracking and Joint Model Training for Mandarin Tone Classification.
Proceedings of the IEEE International Conference on Acoustics, 2021
An Investigation of Using Hybrid Modeling Units for Improving End-to-End Speech Recognition System.
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021
On the Use of Speaker Information for Automatic Speech Recognition in Speaker-imbalanced Corpora.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021
2020
Knowledge Distillation-Based Representation Learning for Short-Utterance Spoken Language Identification.
IEEE ACM Trans. Audio Speech Lang. Process., 2020
Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020
Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020
VOIS: The First Speech Therapy App Specifically Designed for Myanmar Hearing-Impaired Children.
Proceedings of the 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Staged Knowledge Distillation for End-to-End Dysarthric Speech Recognition and Speech Attribute Transcription.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Investigation of Effectively Synthesizing Code-Switched Speech Using Highly Imbalanced Mix-Lingual Data.
Proceedings of the Neural Information Processing - 27th International Conference, 2020
Voice-Indistinguishability: Protecting Voiceprint In Privacy-Preserving Speech Data Release.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2020
Spectrograms Fusion with Minimum Difference Masks Estimation for Monaural Speech Dereverberation.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Voice-Indistinguishability - Protecting Voiceprint with Differential Privacy under an Untrusted Server.
Proceedings of the CCS '20: 2020 ACM SIGSAC Conference on Computer and Communications Security, 2020
2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Improving Transformer-Based Speech Recognition Systems with Compressed Structure and Speech Attributes Augmentation.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Investigating Radical-Based End-to-End Speech Recognition Systems for Chinese Dialects and Japanese.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
End-to-End Articulatory Attribute Modeling for Low-Resource Multilingual Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Investigation of Sequence-level Knowledge Distillation Methods for CTC Acoustic Models.
Proceedings of the IEEE International Conference on Acoustics, 2019
Interactive Learning of Teacher-student Model for Short Utterance Spoken Language Identification.
Proceedings of the IEEE International Conference on Acoustics, 2019
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019
Effective Training End-to-End ASR systems for Low-resource Lhasa Dialect of Tibetan Language.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019
2018
Improving Very Deep Time-Delay Neural Network With Vertical-Attention For Effectively Training CTC-Based ASR Systems.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018
Feature Representation of Short Utterances Based on Knowledge Distillation for Spoken Language Identification.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Improving CTC-based Acoustic Model with Very Deep Residual Time-delay Neural Networks.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
2017
Conditional Generative Adversarial Nets Classifier for Spoken Language Identification.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017
Incremental training and constructing the very deep convolutional residual network acoustic models.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017
2016
Speech Recognition Enhanced by Lightly-supervised and Semi-supervised Acoustic Model Training.
PhD thesis, 2016
Semi-Supervised Acoustic Model Training by Discriminative Data Selection From Multiple ASR Systems' Hypotheses.
IEEE ACM Trans. Audio Speech Lang. Process., 2016
Confidence estimation for speech recognition systems using conditional random fields trained with partially annotated data.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016
Data selection from multiple ASR systems' hypotheses for unsupervised acoustic model training.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
2015
Automatic Lecture Transcription Based on Discriminative Data Selection for Lightly Supervised Acoustic Model Training.
IEICE Trans. Inf. Syst., 2015
Ensemble speaker modeling using speaker adaptive training deep neural network for speaker adaptation.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Discriminative data selection for lightly supervised training of acoustic model using closed caption texts.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
2014
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014
2012
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012
2011
Proceedings of the International Conference on Asian Language Processing, 2011