Wei Li

Orcid: 0000-0001-7824-4839

Affiliations:

ByteDance AI-Lab
Georgia Institute of Technology, School of Electrical and Computer Engineering, Atlanta, GA, USA
Beijing Language and Culture University, College of Information Science, Beijing, China

According to our database, Wei Li authored at least 27 papers between 2010 and 2024.

Collaborative distances:

Dijkstra number of five.
Erdős number of four.

Bibliography

2024

A Comprehensive Solution to Connect Speech Encoder and Large Language Model for ASR.

CoRR, 2024

Can Large Language Models Understand Spatial Audio?

CoRR, 2024

video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

SALMONN: Towards Generic Hearing Abilities for Large Language Models.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Connecting Speech Encoder and Large Language Model for ASR.

Proceedings of the IEEE International Conference on Acoustics, 2024

Extending Large Language Models for Speech and Audio Captioning.

Proceedings of the IEEE International Conference on Acoustics, 2024

2023

Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models.

CoRR, 2023

Disentangling the Contribution of Non-native Speech in Automated Pronunciation Assessment.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Phonetic and Prosody-aware Self-supervised Learning Approach for Non-native Fluency Scoring.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

An ASR-Free Fluency Scoring Approach with Self-Supervised Learning.

Proceedings of the IEEE International Conference on Acoustics, 2023

Leveraging Phone-Level Linguistic-Acoustic Similarity For Utterance-Level Pronunciation Scoring.

Proceedings of the IEEE International Conference on Acoustics, 2023

2022

Improving Non-native Word-level Pronunciation Scoring with Phone-level Mixup Data Augmentation and Multi-source Information.

CoRR, 2022

A Transfer and Multi-Task Learning based Approach for MOS Prediction.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Using Fluency Representation Learned from Sequential Raw Features for Improving Non-native Fluency Scoring.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

2020

Improving Mispronunciation Detection and Enriching Diagnostic Feedback for Non-Native Learners of Mandarin.

PhD thesis, 2020

Improving Accent Conversion with Reference Encoder and End-To-End Text-To-Speech.

CoRR, 2020

A Cross-Task Transfer Learning Approach to Adapting Deep Speech Enhancement Models to Unseen Background Noise Using Paired Senone Classifiers.

Sabato Marco Siniscalchi

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

Improving Mispronunciation Detection of Mandarin Tones for Non-Native Learners With Soft-Target Tone Labels and BLSTM-Based Deep Tone Models.

Sabato Marco Siniscalchi

IEEE ACM Trans. Audio Speech Lang. Process., 2019

Improving Audio-visual Speech Recognition Performance with Cross-modal Student-teacher Training.

Sabato Marco Siniscalchi

Proceedings of the IEEE International Conference on Acoustics, 2019

2018

Improving Mandarin Tone Recognition Based on DNN by Combining Acoustic and Articulatory Features Using Extended Recognition Networks.

Sabato Marco Siniscalchi

J. Signal Process. Syst., 2018

Improving Mandarin Tone Mispronunciation Detection for Non-Native Learners with Soft-Target Tone Labels and BLSTM-Based Deep Models.

Sabato Marco Siniscalchi

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017

Improving Mispronunciation Detection for Non-Native Learners with Multisource Information and LSTM-Based Deep Models.

Sabato Marco Siniscalchi

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

2016

Detecting Mispronunciations of L2 Learners and Providing Corrective Feedback Using Knowledge-Guided and Data-Driven Decision Trees.

Sabato Marco Siniscalchi

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Improving non-native mispronunciation detection and enriching diagnostic feedback with DNN-based speech attribute modeling.

Sabato Marco Siniscalchi

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Using tone-based extended recognition network to detect non-native Mandarin tone mispronunciations.

Sabato Marco Siniscalchi

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

2013

Using Mutual Information Criterion to Design an Effective Lexicon for Chinese Pinyin-to-Character Conversion.

Masafumi Nishida, Seiichi Yamamoto

Proceedings of the 2013 International Conference on Asian Language Processing, 2013

2010

A study on Functional Loads of phonetic contrasts under context based on Mutual Information of Chinese text and phonemes.

Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010
