Shang-Wen Li

ORCID: 0000-0003-0656-9874

Affiliations:
  • Apple Inc., Cupertino, CA, USA
  • Amazon, Seattle, WA, USA (former)
  • Massachusetts Institute of Technology, Cambridge, USA (PhD 2017)


According to our database, Shang-Wen Li authored at least 75 papers between 2010 and 2024.

Bibliography

2024
DINOv2: Learning Robust Visual Features without Supervision.
Trans. Mach. Learn. Res., 2024

A Large-Scale Evaluation of Speech Foundation Models.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

SpeechPrompt: Prompting Speech Language Models for Speech Processing Tasks.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Text Quality-Based Pruning for Efficient Training of Language Models.
CoRR, 2024

Demystifying CLIP Data.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models.
Proceedings of the IEEE International Conference on Acoustics, 2024

SpeechDPR: End-To-End Spoken Passage Retrieval For Open-Domain Spoken Question Answering.
Proceedings of the IEEE International Conference on Acoustics, 2024

SD-HuBERT: Sentence-Level Self-Distillation Induces Syllabic Organization in HuBERT.
Proceedings of the IEEE International Conference on Acoustics, 2024

Altogether: Image Captioning via Re-aligning Alt-text.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

MoDE: CLIP Data Experts via Clustering.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
GSQA: An End-to-End Model for Generative Spoken Question Answering.
CoRR, 2023

An Exploration of In-Context Learning for Speech Language Model.
CoRR, 2023

SD-HuBERT: Self-Distillation Induces Syllabic Organization in HuBERT.
CoRR, 2023

Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond.
CoRR, 2023

AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models.
CoRR, 2023

Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning.
CoRR, 2023

Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model.
CoRR, 2023

DINOv2: Learning Robust Visual Features without Supervision.
CoRR, 2023

SpeechPrompt v2: Prompt Tuning for Speech Classification Tasks.
CoRR, 2023

MAViL: Masked Audio-Video Learners.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Improving Textless Spoken Language Understanding with Discrete Units as Intermediate Target.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

ML-SUPERB: Multilingual Speech Universal PERformance Benchmark.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Disentangled Training with Adversarial Examples for Robust Small-Footprint Keyword Spotting.
Proceedings of the IEEE International Conference on Acoustics, 2023

FLAP: Fast Language-Audio Pre-Training.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Prompting and Adapter Tuning For Self-Supervised Encoder-Decoder Speech Model.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Introducing Semantics into Speech Encoders.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Expand, Rerank, and Retrieve: Query Reranking for Open-Domain Question Answering.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022
Self-Supervised Speech Representation Learning: A Review.
IEEE J. Sel. Top. Signal Process., 2022

Introducing Semantics into Speech Encoders.
CoRR, 2022

Meta Learning for Natural Language Processing: A Survey.
CoRR, 2022

QaNER: Prompting Question Answering Models for Few-shot Named Entity Recognition.
CoRR, 2022

SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Exploring Efficient-Tuning Methods in Self-Supervised Speech Models.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Cooperative Self-training of Machine Reading Comprehension.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Meta Learning for Natural Language Processing: A Survey.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Lifelong Pretraining: Continually Adapting Language Models to Emerging Corpora.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

DiffCSE: Difference-based Contrastive Learning for Sentence Embeddings.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

DUAL: Discrete Spoken Unit Adaptive Learning for Textless Spoken Question Answering.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Listen, Adapt, Better WER: Source-free Single-utterance Test-time Adaptation for Automatic Speech Recognition.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021
TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Mitigating Biases in Toxic Language Detection through Invariant Rationalization.
CoRR, 2021

Meta-learning for downstream aware and agnostic pretraining.
CoRR, 2021

Cooperative Learning of Zero-Shot Machine Reading Comprehension.
CoRR, 2021

Audio ALBERT: A Lite BERT for Self-Supervised Learning of Audio Representation.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Meta Learning to Classify Intent and Slot Labels with Noisy Few Shot Examples.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Supporting Clustering with Contrastive Learning.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

SUPERB: Speech Processing Universal PERformance Benchmark.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Joint Retrieval-Extraction Training for Evidence-Aware Dialog Response Selection.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Semi-Supervised Spoken Language Understanding via Self-Supervised Speech and Language Model Pretraining.
Proceedings of the IEEE International Conference on Acoustics, 2021

Pairwise Supervised Contrastive Learning of Sentence Representations.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Zero-shot Generalization in Dialog State Tracking through Generative Question Answering.
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021

2020
Improving Learning Experience in MOOCs with Educational Content Linking.
CoRR, 2020

Towards Semi-Supervised Semantics Understanding from Speech.
CoRR, 2020

Audio ALBERT: A Lite BERT for Self-supervised Learning of Audio Representation.
CoRR, 2020

Prototypical Q Networks for Automatic Conversational Diagnosis and Few-Shot New Disease Adaption.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Style Attuned Pre-Training and Parameter Efficient Fine-Tuning for Spoken Language Understanding.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Knowledge Grounded Conversational Symptom Detection with Graph Memory Networks.
Proceedings of the 3rd Clinical Natural Language Processing Workshop, 2020

2017
Improving learning experience in MOOCs with educational content linking.
PhD thesis, 2017

Learning Robust Dialog Policies in Noisy Environments.
CoRR, 2017

2016
Thread Structure Prediction for MOOC Discussion Forum.
Proceedings of the Social Computing, 2016

Automated Segmentation of MOOC Lectures towards Customized Learning.
Proceedings of the 16th IEEE International Conference on Advanced Learning Technologies, 2016

2015
Linking MOOC courseware to accommodate diverse learner backgrounds.
Proceedings of the ISCA International Workshop on Speech and Language Technology in Education, 2015

Structuring lectures in massive open online courses (MOOCs) for efficient learning by linking similar sections and predicting prerequisites.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Would Linked MOOC Courseware Enhance Information Search?
Proceedings of the 15th IEEE International Conference on Advanced Learning Technologies, 2015

Learnersourced Recommendations for Remediation.
Proceedings of the 15th IEEE International Conference on Advanced Learning Technologies, 2015

2014
Data-driven interaction techniques for improving navigation of educational videos.
Proceedings of the 27th Annual ACM Symposium on User Interface Software and Technology, 2014

2013
An Experimental Analysis on Integrating Multi-Stream Spectro-Temporal, Cepstral and Pitch Information for Mandarin Speech Recognition.
IEEE Trans. Speech Audio Process., 2013

2011
Improved Tonal Language Speech Recognition by Integrating Spectro-Temporal Evidence and Pitch Information with Properly Chosen Tonal Acoustic Units.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Multi-stream spectro-temporal and cepstral features based on data-driven hierarchical phoneme clusters.
Proceedings of the IEEE International Conference on Acoustics, 2011

2010
Improved phoneme recognition by integrating evidence from spectro-temporal and cepstral features.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

