We stand with Ukraine

Michael Zeng

ORCID: 0000-0001-5302-5883

Affiliations:

Microsoft, Redmond, WA, USA

According to our database¹, Michael Zeng authored at least 92 papers between 2018 and 2024.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.

Links

Online presence:

on microsoft.com
on orcid.org

Bibliography

2024

Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation.

Co-authors include: Chung-Hsien Tsai

CoRR, 2024

TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation.

CoRR, 2024

CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations.

CoRR, 2024

Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like.

Co-authors include: Sefik Emre Eskimez, Manthan Thakker, Chung-Hsien Tsai

CoRR, 2024

i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data.

Co-authors include: Mahmoud Khademi, Takuya Yoshioka

Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

Adapting Large Language Model with Speech for Fully Formatted End-to-End Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

Improving Readability for Automatic Speech Recognition Transcription.

Co-authors include: Sefik Emre Eskimez

ACM Trans. Asian Low Resour. Lang. Inf. Process., May, 2023

MACSum: Controllable Summarization with Mixed Attributes.

Trans. Assoc. Comput. Linguistics, 2023

Diffusion Conditional Expectation Model for Efficient and Robust Target Speech Extraction.

CoRR, 2023

ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation.

CoRR, 2023

i-Code Studio: A Configurable and Composable Framework for Integrative AI.

Co-authors include: Mahmoud Khademi, Takuya Yoshioka

CoRR, 2023

LMGQS: A Large-scale Dataset for Query-focused Summarization.

CoRR, 2023

i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data.

Co-authors include: Mahmoud Khademi, Takuya Yoshioka

CoRR, 2023

MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action.

Co-authors include: Ehsan Azarnasab

CoRR, 2023

Any-to-Any Generation via Composable Diffusion.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Adapting Multi-Lingual ASR Models for Handling Multiple Talkers.

Co-authors include: Takuya Yoshioka

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Generate rather than Retrieve: Large Language Models are Strong Context Generators.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Code-Switching Text Generation and Injection in Mandarin-English ASR.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

DATA2VEC-SG: Improving Self-Supervised Learning Representations for Speech Generation Tasks.

Co-authors include: Takuya Yoshioka

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Target Sound Extraction with Variable Cross-Modality Clues.

Co-authors include: Takuya Yoshioka

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

InheritSumm: A General, Versatile and Compact Summarizer by Distilling from GPT.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

LMGQS: A Large-scale Dataset for Query-focused Summarization.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Automatic Prompt Optimization with "Gradient Descent" and Beam Search.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

ReCo: Region-Controlled Text-to-Image Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Unifying Vision, Text, and Layout for Universal Document Processing.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Z-Code++: A Pre-trained Language Model Optimized for Abstractive Summarization.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

UniSumm and SummZoo: Unified Model and Diverse Benchmark for Few-Shot Summarization.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

i-Code: An Integrative and Composable Multimodal Learning Framework.

Co-authors include: Takuya Yoshioka

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing.

Co-authors include: Takuya Yoshioka

IEEE J. Sel. Top. Signal Process., 2022

UniSumm: Unified Few-shot Summarization with Multi-Task Pre-Training and Prefix-Tuning.

CoRR, 2022

Z-Code++: A Pre-trained Language Model Optimized for Abstractive Summarization.

Co-authors include: Hany Hassan Awadalla

CoRR, 2022

Impossible Triangle: What's Next for Pre-trained Language Models?

CoRR, 2022

Unsupervised Summarization with Customized Granularities.

CoRR, 2022

A Comprehensive Study on Self-Supervised Distillation for Speaker Representation Learning.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Visual Clues: Bridging Vision and Language Foundations for Image Paragraph Captioning.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Human Parity on CommonsenseQA: Augmenting Self-Attention with External Attention.

Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

Optimizing Alignment of Speech and Language Latent Spaces for End-To-End Speech Recognition and Understanding.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Large-Scale Self-Supervised Speech Representation Learning for Automatic Speaker Verification.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Unsupervised Multi-Granularity Summarization.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

Narrate Dialogues for Better Summarization.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

ParaTag: A Dataset of Paraphrase Tagging for Fine-Grained Labels, NLG Evaluation, and Data Augmentation.

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Automatic Rule Induction for Efficient Semi-Supervised Learning.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

Task Compass: Scaling Multi-task Pre-training with Task Prefix.

Co-authors include: Zhuosheng Zhang

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

AdaPrompt: Adaptive Model Training for Prompt-based NLP.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

CLIP-Event: Connecting Text and Images with Event Structures.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

An Empirical Study of Training End-to-End Vision-and-Language Transformers.

Co-authors include: Pengchuan Zhang

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

KG-FiD: Infusing Knowledge Graph in Fusion-in-Decoder for Open-Domain Question Answering.

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

Training Data is More Valuable than You Think: A Simple and Effective Method by Retrieving from Training Data.

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

End-to-End Segmentation-based News Summarization.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

Leveraging Knowledge in Multilingual Commonsense Reasoning.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

Dict-BERT: Enhancing Language Model Pre-training with Dictionary.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

DialogLM: Pre-trained Model for Long Dialogue Understanding and Summarization.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

JAKET: Joint Pre-training of Knowledge Graph and Language Understanding.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021

MLP Architectures for Vision-and-Language Modeling: An Empirical Study.

CoRR, 2021

Florence: A New Foundation Model for Computer Vision.

Co-authors include: Pengchuan Zhang

CoRR, 2021

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing.

Co-authors include: Takuya Yoshioka

CoRR, 2021

Does Knowledge Help General NLU? An Empirical Study.

CoRR, 2021

A Joint and Domain-Adaptive Approach to Spoken Language Understanding.

CoRR, 2021

Leveraging Lead Bias for Zero-shot Abstractive News Summarization.

Proceedings of the SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021

MediaSum: A Large-scale Media Interview Dataset for Dialogue Summarization.

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

Enhancing Factual Consistency of Abstractive Summarization.

Co-authors include: William Hinthorn

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

SPLAT: Speech-Language Joint Pre-Training for Spoken Language Understanding.

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

Data Augmentation for Spoken Language Understanding via Pretrained Language Models.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Improving Zero-shot Neural Machine Translation on Language-specific Encoders-Decoders.

Proceedings of the International Joint Conference on Neural Networks, 2021

UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data.

Co-authors include: Ken'ichi Kumatani

Proceedings of the 38th International Conference on Machine Learning, 2021

Speech-Language Pre-Training for End-to-End Spoken Language Understanding.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Generating Human Readable Transcript for Automatic Speech Recognition with Pre-Trained Language Model.

Co-authors include: Sefik Emre Eskimez

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Want To Reduce Labeling Cost? GPT-3 Can Help.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, 2021

Fusing Context Into Knowledge Graph for Commonsense Question Answering.

Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021

Retrieval Enhanced Model for Commonsense Generation.

Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021

2020

Fusing Context Into Knowledge Graph for Commonsense Reasoning.

CoRR, 2020

LSTM-LM with Long-Term History for First-Pass Decoding in Conversational Speech Recognition.

Co-authors include: Sarangarajan Parthasarathy

CoRR, 2020

Semi-Supervised Speech-Language Joint Pre-Training for Spoken Language Understanding.

CoRR, 2020

Mind The Facts: Knowledge-Boosted Coherent Abstractive Text Summarization.

CoRR, 2020

Meta Dialogue Policy Learning.

CoRR, 2020

Data Augmentation for Spoken Language Understanding via Pretrained Models.

CoRR, 2020

End-to-End Abstractive Summarization for Meetings.

CoRR, 2020

Boosting Factual Correctness of Abstractive Summarization with Knowledge Graph.

Co-authors include: William Hinthorn

CoRR, 2020

Discriminative Transfer Learning for Optimizing ASR and Semantic Labeling in Task-Oriented Spoken Dialog.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Sequence-Level Self-Learning with Multiple Hypotheses.

Co-authors include: Ken'ichi Kumatani, Dimitrios Dimitriadis, Sefik Emre Eskimez

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Mixed-Lingual Pre-training for Cross-lingual Summarization.

Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, 2020

A Hierarchical Network for Abstractive Meeting Summarization with Cross-Domain Pretraining.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, 2020

TED: A Pretrained Unsupervised Summarization Model with Theme Modeling and Denoising.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, 2020

Few-shot Natural Language Generation for Task-Oriented Dialog.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, 2020

2019

Make Lead Bias in Your Favor: A Simple and Effective Method for News Summarization.

CoRR, 2019

Meeting Transcription Using Virtual Microphone Arrays.

Co-authors include: Takuya Yoshioka, Dimitrios Dimitriadis, William Hinthorn, Andreas Stolcke

CoRR, 2019

SIM: A Slot-Independent Neural Model for Dialogue State Tracking.

Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue, 2019

Meeting Transcription Using Asynchronous Distant Microphones.

Co-authors include: Takuya Yoshioka, Dimitrios Dimitriadis, Andreas Stolcke, William Hinthorn

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Multi-task Learning for Natural Language Generation in Task-Oriented Dialogue.

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

2018

SDNet: Contextualized Attention-based Deep Network for Conversational Question Answering.

CoRR, 2018
