Wei-Ning Hsu

Orcid: 0000-0001-5546-5217

According to our database, Wei-Ning Hsu authored at least 89 papers between 2015 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization.
CoRR, 2024

XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception.
CoRR, 2024

2023
Audiobox: Unified Audio Generation with Natural Language Prompts.
CoRR, 2023

Attention or Convolution: Transformer Encoders in Audio Language Models for Inference Efficiency.
CoRR, 2023

Generative Pre-training for Speech with Flow Matching.
CoRR, 2023

Low-Resource Self-Supervised Learning with SSL-Enhanced TTS.
CoRR, 2023

EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis.
CoRR, 2023

Scaling Speech Technology to 1,000+ Languages.
CoRR, 2023

MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation.
CoRR, 2023

Efficient Speech Representation Learning with Low-Bit Quantization.
CoRR, 2023

DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language.
Proceedings of the International Conference on Machine Learning, 2023

Scaling Laws for Generative Mixed-Modal Language Models.
Proceedings of the International Conference on Machine Learning, 2023

Measuring the Impact of Domain Factors in Self-Supervised Pre-Training.
Proceedings of the IEEE International Conference on Acoustics, 2023

Cocktail Hubert: Generalized Self-Supervised Pre-Training for Mixture and Single-Source Speech.
Proceedings of the IEEE International Conference on Acoustics, 2023

Do Coarser Units Benefit Cluster Prediction-Based Speech Pre-Training?
Proceedings of the IEEE International Conference on Acoustics, 2023

Continual Learning for On-Device Speech Recognition Using Disentangled Conformers.
Proceedings of the IEEE International Conference on Acoustics, 2023

Toward Joint Language Modeling for Speech Units and Text.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Regeneration.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Av-Data2Vec: Self-Supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Simple and Effective Unsupervised Speech Translation.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Speech-to-Speech Translation for a Real-world Unwritten Language.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022
ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Enhancement.
CoRR, 2022

STOP: A dataset for Spoken Task Oriented Semantic Parsing.
CoRR, 2022

A Single Self-Supervised Model for Many Speech Modalities Enables Zero-Shot Modality Transfer.
CoRR, 2022

Generative Spoken Dialogue Language Modeling.
CoRR, 2022

Measuring the Impact of Individual Domain Factors in Self-Supervised Pre-Training.
CoRR, 2022

textless-lib: a Library for Textless Spoken Language Processing.
CoRR, 2022

Stop: A Dataset for Spoken Task Oriented Semantic Parsing.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Towards End-to-End Unsupervised Speech Recognition.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

u-HuBERT: Unified Mixed-Modal Speech Pretraining And Zero-Shot Transfer to Unlabeled Modality.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Textless Speech-to-Speech Translation on Real Data.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

On-demand compute reduction with stochastic wav2vec 2.0.
Proceedings of the Interspeech 2022, 2022

Learning Lip-Based Audio-Visual Speaker Embeddings with AV-HuBERT.
Proceedings of the Interspeech 2022, 2022

Robust Self-Supervised Audio-Visual Speech Recognition.
Proceedings of the Interspeech 2022, 2022

Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation.
Proceedings of the Interspeech 2022, 2022

Simple and Effective Unsupervised Speech Synthesis.
Proceedings of the Interspeech 2022, 2022

data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language.
Proceedings of the International Conference on Machine Learning, 2022

Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Textless Speech Emotion Conversion using Discrete & Decomposed Representations.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Unified Speech-Text Pre-training for Speech Translation and Recognition.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

Direct Speech-to-Speech Translation With Discrete Units.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

Text-Free Prosody-Aware Generative Spoken Language Modeling.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Textless Speech-to-Speech Translation on Real Data.
CoRR, 2021

Textless Speech Emotion Conversion using Decomposed and Discrete Representations.
CoRR, 2021

Direct simultaneous speech to speech translation.
CoRR, 2021

Direct speech-to-speech translation with discrete units.
CoRR, 2021

Generative Spoken Language Modeling from Raw Audio.
CoRR, 2021

Semi-Supervised end-to-end Speech Recognition via Local Prior Matching.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Unsupervised Speech Recognition.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Speech Resynthesis from Discrete Disentangled Self-Supervised Representations.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Hubert: How Much Can a Bad Teacher Benefit ASR Pre-Training?
Proceedings of the IEEE International Conference on Acoustics, 2021

fairseq S^2: A Scalable and Integrable Speech Synthesis Toolkit.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, 2021

Kaizen: Continuously Improving Teacher Using Exponential Moving Average for Semi-Supervised Speech Recognition.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

Text-Free Image-to-Speech Synthesis Using Learned Segmental Units.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

2020
Text-Free Image-to-Speech Synthesis Using Learned Segmental Units.
CoRR, 2020

Differentiable Weighted Finite-State Transducers.
CoRR, 2020

Semi-Supervised Speech Recognition via Local Prior Matching.
CoRR, 2020

A Convolutional Deep Markov Model for Unsupervised Speech Representation Learning.
Proceedings of the Interspeech 2020, 2020

Unsupervised Methods for Evaluating Speech Representations.
Proceedings of the Interspeech 2020, 2020

Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech.
Proceedings of the 8th International Conference on Learning Representations, 2020

2019
Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling.
CoRR, 2019

Transfer Learning from Audio-Visual Grounding to Speech Recognition.
Proceedings of the Interspeech 2019, 2019

An Unsupervised Autoregressive Model for Speech Representation Learning.
Proceedings of the Interspeech 2019, 2019

Hierarchical Generative Modeling for Controllable Speech Synthesis.
Proceedings of the 7th International Conference on Learning Representations, 2019

Disentangling Correlated Speaker and Noise for Speech Synthesis via Data Augmentation and Adversarial Factorization.
Proceedings of the IEEE International Conference on Acoustics, 2019

Semi-supervised Training for Improving Data Efficiency in End-to-end Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2019

2018
Disentangling by Partitioning: A Representation Learning Framework for Multimodal Sensory Data.
CoRR, 2018

Unsupervised Representation Learning of Speech for Dialect Identification.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

A Study of Enhancement, Augmentation and Autoencoder Methods for Domain Adaptation in Distant Speech Recognition.
Proceedings of the Interspeech 2018, 2018

Unsupervised Adaptation with Interpretable Disentangled Representations for Distant Conversational Speech Recognition.
Proceedings of the Interspeech 2018, 2018

Scalable Factorized Hierarchical Variational Autoencoder Training.
Proceedings of the Interspeech 2018, 2018

A Noise-Robust Self-Adaptive Multitarget Speaker Detection System.
Proceedings of the 24th International Conference on Pattern Recognition, 2018

Extracting Domain Invariant Features by Unsupervised Learning for Robust Automatic Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017
Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Learning Latent Representations for Speech Generation and Transformation.
Proceedings of the Interspeech 2017, 2017

Automatic speech recognition of Arabic multi-genre broadcast media.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Unsupervised domain adaptation for robust speech recognition via variational autoencoder-based data augmentation.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

2016
Recurrent Neural Network Encoder with Attention for Community Question Answering.
CoRR, 2016

A prioritized grid long short-term memory RNN for speech recognition.
Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

Development of the MIT ASR system for the 2016 Arabic Multi-genre Broadcast Challenge.
Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

SLS at SemEval-2016 Task 3: Neural-based Approaches for Ranking in Community Question Answering.
Proceedings of the 10th International Workshop on Semantic Evaluation, 2016

Exploiting Depth and Highway Connections in Convolutional Recurrent Deep Neural Networks for Speech Recognition.
Proceedings of the Interspeech 2016, 2016

Neural Attention for Learning to Rank Questions in Community Question Answering.
Proceedings of the COLING 2016, 2016

2015
Enhancing automatically discovered multi-level acoustic patterns considering context consistency with applications in spoken term detection.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Active Learning by Learning.
Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015

