We stand with Ukraine

Mohammad Shoeybi

According to our database¹, Mohammad Shoeybi authored at least 55 papers between 2006 and 2024.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.

Bibliography

2024

MIND: Math Informed syNthetic Dialogues for Pretraining LLMs.

Syeda Nahida Akter, Shrimai Prabhumoye, [...], Sanjeev Satheesh, [...], Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro

CoRR, 2024

Upcycling Large Language Models into Mixture of Experts.

[...], Abhinav Khattar, [...], Vijay Korthikanti, [...], Mohammad Shoeybi, Bryan Catanzaro

CoRR, 2024

NVLM: Open Frontier-Class Multimodal LLMs.

[...], Tuomas Rintamaki, Mohammad Shoeybi, Bryan Catanzaro, [...]

CoRR, 2024

LLM Pruning and Distillation in Practice: The Minitron Approach.

Sharath Turuvekere Sreenivas, Saurav Muralidharan, [...], Marcin Chochowski, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro, [...], Pavlo Molchanov

CoRR, 2024

Compact Language Models via Pruning and Knowledge Distillation.

Saurav Muralidharan, Sharath Turuvekere Sreenivas, [...], Marcin Chochowski, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro, [...], Pavlo Molchanov

CoRR, 2024

ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities.

[...], Mohammad Shoeybi, Bryan Catanzaro

CoRR, 2024

Reuse, Don't Retrain: A Recipe for Continued Pretraining of Language Models.

Jupinder Parmar, Sanjeev Satheesh, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro

CoRR, 2024

Data, Data Everywhere: A Guide for Pretraining Dataset Construction.

Jupinder Parmar, Shrimai Prabhumoye, Joseph Jennings, [...], Aastha Jhunjhunwala, [...], Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro

CoRR, 2024

RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs.

[...], Mohammad Shoeybi, Bryan Catanzaro

CoRR, 2024

Nemotron-4 340B Technical Report.

[...], Pallab Bhattacharya, [...], Bryan Catanzaro, [...], Jonathan M. Cohen, [...], Ayush Dattagupta, Olivier Delalleau, Leon Derczynski, [...], Aleksander Ficek, [...], Tomasz Grzegorzek, [...], Joseph Jennings, Aastha Jhunjhunwala, [...], Oleksii Kuchaiev, Patrick LeGresley, [...], Ameya Sunil Mahabaleshwarkar, Somshubra Majumdar, [...], Miguel Martinez, Maer Rodrigues de Melo, [...], Deepak Narayanan, Sean Narenthiran, [...], Guruprasad Nutheti, Christopher Parisien, Jupinder Parmar, Mostofa Patwary, Krzysztof Pawelec, [...], Shrimai Prabhumoye, [...], Vasanth Rao Naik Sabavat, Sanjeev Satheesh, Jane Polak Scowcroft, [...], Mohammad Shoeybi, [...], Misha Smelyanskiy, [...], Makesh Narsimhan Sreedhar, [...], Sandeep Subramanian, [...], Shubham Toshniwal, [...]

CoRR, 2024

An Empirical Study of Mamba-based Language Models.

[...], Vijay Korthikanti, [...], Ali Hatamizadeh, [...], Deepak Narayanan, Garvit Kulshreshtha, [...], Mohammad Shoeybi, Bryan Catanzaro

CoRR, 2024

NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models.

[...], Jonathan Raiman, Mohammad Shoeybi, Bryan Catanzaro, [...]

CoRR, 2024

Nemotron-4 15B Technical Report.

CoRR, 2024

ChatQA: Building GPT-4 Level Conversational QA Models.

[...], Mohammad Shoeybi, Bryan Catanzaro

CoRR, 2024

InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining.

[...], Lawrence McAfee, [...], Mohammad Shoeybi, Bryan Catanzaro

Proceedings of the Forty-first International Conference on Machine Learning, 2024

ODIN: Disentangled Reward Mitigates Hacking in RLHF.

[...], Mohammad Shoeybi, Bryan Catanzaro

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Retrieval meets Long Context Large Language Models.

[...], Lawrence McAfee, [...], Sandeep Subramanian, Evelina Bakhturina, Mohammad Shoeybi, Bryan Catanzaro

Proceedings of the Twelfth International Conference on Learning Representations, 2024

LLM-Evolve: Evaluation for LLM's Evolving Capability on Benchmarks.

[...], Shrimai Prabhumoye, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Data, Data Everywhere: A Guide for Pretraining Dataset Construction.

Jupinder Parmar, Shrimai Prabhumoye, Joseph Jennings, [...], Aastha Jhunjhunwala, [...], Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

VILA: On Pre-training for Visual Language Models.

[...], Pavlo Molchanov, Mohammad Shoeybi, [...]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

VILA: On Pre-training for Visual Language Models.

[...], Pavlo Molchanov, [...], Mohammad Shoeybi, [...]

CoRR, 2023

RAVEN: In-Context Learning with Retrieval Augmented Encoder-Decoder Language Models.

[...], Mohammad Shoeybi, Kevin Chen-Chuan Chang, Bryan Catanzaro

CoRR, 2023

Reducing Activation Recomputation in Large Transformer Models.

Vijay Anand Korthikanti, [...], Lawrence McAfee, Michael Andersch, Mohammad Shoeybi, Bryan Catanzaro

Proceedings of the Sixth Conference on Machine Learning and Systems, 2023

Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning.

[...], Vijay Korthikanti, [...], Mohammad Shoeybi, [...], Bryan Catanzaro, [...], Anima Anandkumar

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study.

[...], Lawrence McAfee, [...], Mohammad Shoeybi, [...], Oleksii Kuchaiev, [...], Anima Anandkumar, Bryan Catanzaro

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Context Generation Improves Open Domain Question Answering.

[...], Mostofa Patwary, Shrimai Prabhumoye, [...], Mohammad Shoeybi, [...], Anima Anandkumar, Bryan Catanzaro

Proceedings of the Findings of the Association for Computational Linguistics: EACL 2023, 2023

Adding Instructions during Pretraining: Effective way of Controlling Toxicity in Language Models.

Shrimai Prabhumoye, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro

Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2023

2022

FP8 Formats for Deep Learning.

Paulius Micikevicius, [...], Richard Grisenthwaite, [...], Alexander Heinecke, [...], Naveen Mellempudi, Stuart F. Oberman, Mohammad Shoeybi, [...]

CoRR, 2022

Factuality Enhanced Language Models for Open-Ended Text Generation.

[...], Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro

CoRR, 2022

Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model.

CoRR, 2022

Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models.

[...], Mostofa Patwary, Mohammad Shoeybi, [...], Anima Anandkumar, Bryan Catanzaro

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Factuality Enhanced Language Models for Open-Ended Text Generation.

[...], Mostofa Patwary, [...], Mohammad Shoeybi, Bryan Catanzaro

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Prompt Compression and Contrastive Conditioning for Controllability and Toxicity Reduction in Language Models.

[...], Mohammad Shoeybi, Taylor Sorensen

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

Evaluating Parameter Efficient Learning for Generation.

[...], Mostofa Patwary, Shrimai Prabhumoye, [...], Mohammad Shoeybi, Bryan Catanzaro

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Multi-Stage Prompting for Knowledgeable Dialogue Generation.

[...], Mostofa Patwary, [...], Shrimai Prabhumoye, [...], Mohammad Shoeybi, Bryan Catanzaro

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

2021

Few-shot Instruction Prompts for Pretrained Language Models to Detect Social Biases.

Shrimai Prabhumoye, Rafal Kocielnik, Mohammad Shoeybi, Anima Anandkumar, Bryan Catanzaro

CoRR, 2021

Efficient Large-Scale Language Model Training on GPU Clusters.

Deepak Narayanan, Mohammad Shoeybi, [...], Patrick LeGresley, Mostofa Patwary, Vijay Korthikanti, Dmitri Vainbrand, Prethvi Kashinkunti, [...], Bryan Catanzaro, Amar Phanishayee, [...]

CoRR, 2021

Efficient large-scale language model training on GPU clusters using megatron-LM.

Deepak Narayanan, Mohammad Shoeybi, [...], Patrick LeGresley, Mostofa Patwary, Vijay Korthikanti, Dmitri Vainbrand, Prethvi Kashinkunti, [...], Bryan Catanzaro, Amar Phanishayee, [...]

Proceedings of the International Conference for High Performance Computing, 2021

Long-Short Transformer: Efficient Transformers for Language and Vision.

[...], Mohammad Shoeybi, [...], Anima Anandkumar, Bryan Catanzaro

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

End-to-End Training of Neural Retrievers for Open-Domain Question Answering.

Devendra Singh Sachan, Mostofa Patwary, Mohammad Shoeybi, [...], William L. Hamilton, Bryan Catanzaro

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

2020

Local Knowledge Powered Conversational Agents.

Sashank Santhanam, [...], Mohammad Shoeybi, Mostofa Patwary, Bryan Catanzaro

CoRR, 2020

Style Example-Guided Text Generation using Generative Adversarial Transformers.

[...], Mohammad Shoeybi, [...]

CoRR, 2020

MEGATRON-CNTRL: Controllable Story Generation with External Knowledge Using Large-Scale Language Models.

[...], Mostofa Patwary, Mohammad Shoeybi, [...], Anima Anandkumar, Bryan Catanzaro

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

BioMegatron: Larger Biomedical Domain Language Model.

[...], Evelina Bakhturina, [...], Mostofa Patwary, Mohammad Shoeybi, [...]

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Training Question Answering Models From Synthetic Data.

[...], Mohammad Shoeybi, Mostofa Patwary, Bryan Catanzaro

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Large Scale Multi-Actor Generative Dialog Modeling.

[...], Mohammad Shoeybi, Mostofa Patwary, Bryan Catanzaro

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

2019

Neural ODEs for Image Segmentation with Level Sets.

[...], Mohammad Shoeybi, Patrick LeGresley, [...], Bryan Catanzaro

CoRR, 2019

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism.

Mohammad Shoeybi, Mostofa Patwary, [...], Patrick LeGresley, [...], Bryan Catanzaro

CoRR, 2019

Unsupervised Video Interpolation Using Cycle Consistency.

[...], Mohammad Shoeybi, [...], Bryan Catanzaro

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

2017

Trace norm regularization and faster inference for embedded speech recognition RNNs.

[...], Siddharth Goyal, [...], Mohammad Shoeybi

CoRR, 2017

Deep Voice: Real-time Neural Text-to-Speech.

Sercan Ömer Arik, Mike Chrzanowski, [...], Andrew Gibiansky, [...], Jonathan Raiman, Shubho Sengupta, Mohammad Shoeybi

CoRR, 2017

Deep Voice: Real-time Neural Text-to-Speech.

Sercan Ömer Arik, Mike Chrzanowski, [...], Gregory Frederick Diamos, Andrew Gibiansky, [...], Jonathan Raiman, Shubho Sengupta, Mohammad Shoeybi

Proceedings of the 34th International Conference on Machine Learning, 2017

2010

An adaptive implicit-explicit scheme for the DNS and LES of compressible flows on unstructured grids.

Mohammad Shoeybi, [...]

J. Comput. Phys., 2010

2008

Stable and accurate schemes for the compressible Navier-Stokes equations.

[...], Mohammad Shoeybi

J. Comput. Phys., 2008

2006

Towards Wall-Normal Filtering for Large-Eddy Simulation.

Jeremy A. Templeton, Mohammad Shoeybi

Multiscale Model. Simul., 2006
