Bryan Catanzaro

ORCID: 0000-0003-0034-7728

Affiliations:
  • Baidu Inc., Sunnyvale, USA
  • University of California, Berkeley, Department of Electrical Engineering and Computer Sciences
  • Brigham Young University, Electrical and Computer Engineering Department


According to our database, Bryan Catanzaro authored at least 138 papers between 2005 and 2024.

Bibliography

2024
Progressive Learning of 3D Reconstruction Network From 2D GAN Data.
IEEE Trans. Pattern Anal. Mach. Intell., February, 2024

MIND: Math Informed syNthetic Dialogues for Pretraining LLMs.
CoRR, 2024

OMCAT: Omni Context Aware Transformer.
CoRR, 2024

Upcycling Large Language Models into Mixture of Experts.
CoRR, 2024

Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data.
CoRR, 2024

PHI-S: Distribution Balancing for Label-Free Multi-Teacher Distillation.
CoRR, 2024

NVLM: Open Frontier-Class Multimodal LLMs.
CoRR, 2024

Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders.
CoRR, 2024

LLM Pruning and Distillation in Practice: The Minitron Approach.
CoRR, 2024

Effective Large Language Model Debugging with Best-first Tree Search.
CoRR, 2024

Compact Language Models via Pruning and Knowledge Distillation.
CoRR, 2024

ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities.
CoRR, 2024

Reuse, Don't Retrain: A Recipe for Continued Pretraining of Language Models.
CoRR, 2024

Data, Data Everywhere: A Guide for Pretraining Dataset Construction.
CoRR, 2024

RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs.
CoRR, 2024

Improving Text-To-Audio Models with Synthetic Captions.
CoRR, 2024

Nemotron-4 340B Technical Report.
CoRR, 2024

An Empirical Study of Mamba-based Language Models.
CoRR, 2024

NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models.
CoRR, 2024

Audio Dialogues: Dialogues dataset for audio and music understanding.
CoRR, 2024

Nemotron-4 15B Technical Report.
CoRR, 2024

ChatQA: Building GPT-4 Level Conversational QA Models.
CoRR, 2024

Leveraging Bitstream Metadata for Fast, Accurate, Generalized Compressed Video Quality Enhancement.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

ODIN: Disentangled Reward Mitigates Hacking in RLHF.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Retrieval meets Long Context Large Language Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Scaling Nvidia's Multi-Speaker Multi-Lingual TTS Systems With Zero-Shot TTS to Indic Languages.
Proceedings of the IEEE International Conference on Acoustics, 2024

LLM-Evolve: Evaluation for LLM's Evolving Capability on Benchmarks.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Data, Data Everywhere: A Guide for Pretraining Dataset Construction.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

CircuitVAE: Efficient and Scalable Latent Circuit Optimization.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

2023
Fine Detailed Texture Learning for 3D Meshes With Generative Models.
IEEE Trans. Pattern Anal. Mach. Intell., December, 2023

Partial Convolution for Padding, Inpainting, and Image Synthesis.
IEEE Trans. Pattern Anal. Mach. Intell., May, 2023

ChipNeMo: Domain-Adapted LLMs for Chip Design.
CoRR, 2023

RAVEN: In-Context Learning with Retrieval Augmented Encoder-Decoder Language Models.
CoRR, 2023

Multilingual Multiaccented Multispeaker TTS with RADTTS.
CoRR, 2023

P-Flow: A Fast and Data-Efficient Zero-Shot TTS through Speech Prompting.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Reducing Activation Recomputation in Large Transformer Models.
Proceedings of the Sixth Conference on Machine Learning and Systems, 2023

CleanUNet 2: A Hybrid Speech Denoising Model on Waveform and Spectrogram.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

RAD-MMM: Multilingual Multiaccented Multispeaker Text To Speech.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

BigVGAN: A Universal Neural Vocoder with Large-Scale Training.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

GraPhSyM: Graph Physical Synthesis Model.
Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 2023

High-Acoustic Fidelity Text To Speech Synthesis With Fine-Grained Control Of Speech Attributes.
Proceedings of the IEEE International Conference on Acoustics, 2023

Any-to-Any Voice Conversion with F0 and Timbre Disentanglement and Novel Timbre Conditioning.
Proceedings of the IEEE International Conference on Acoustics, 2023

Vani: Very-Lightweight Accent-Controllable TTS for Native And Non-Native Speakers With Identity Preservation.
Proceedings of the IEEE International Conference on Acoustics, 2023

Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Context Generation Improves Open Domain Question Answering.
Proceedings of the Findings of the Association for Computational Linguistics: EACL 2023, 2023

Adding Instructions during Pretraining: Effective way of Controlling Toxicity in Language Models.
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2023

Language Models: The Most Important Compute Challenge of Our Time (Keynote).
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2022
Unsupervised Disentanglement of Pose, Appearance and Background from Images and Videos.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers.
CoRR, 2022

Factuality Enhanced Language Models for Open-Ended Text Generation.
CoRR, 2022

Generative Modeling for Low Dimensional Speech Attributes with Neural Spline Flows.
CoRR, 2022

Leveraging Bitstream Metadata for Fast and Accurate Video Compression Correction.
CoRR, 2022

Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model.
CoRR, 2022

Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Factuality Enhanced Language Models for Open-Ended Text Generation.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Efficient Token Mixing for Transformers via Adaptive Fourier Neural Operators.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Speech Denoising in the Waveform Domain With Self-Attention.
Proceedings of the IEEE International Conference on Acoustics, 2022

One TTS Alignment to Rule Them All.
Proceedings of the IEEE International Conference on Acoustics, 2022

Evaluating Parameter Efficient Learning for Generation.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Multi-Stage Prompting for Knowledgeable Dialogue Generation.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

2021
Few-shot Instruction Prompts for Pretrained Language Models to Detect Social Biases.
CoRR, 2021

Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers.
CoRR, 2021

Guiding Global Placement With Reinforcement Learning.
CoRR, 2021

Efficient Large-Scale Language Model Training on GPU Clusters.
CoRR, 2021

Efficient large-scale language model training on GPU clusters using Megatron-LM.
Proceedings of the International Conference for High Performance Computing, 2021

Long-Short Transformer: Efficient Transformers for Language and Vision.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis.
Proceedings of the 9th International Conference on Learning Representations, 2021

DiffWave: A Versatile Diffusion Model for Audio Synthesis.
Proceedings of the 9th International Conference on Learning Representations, 2021

Dual Contrastive Loss and Attention for GANs.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

PrefixRL: Optimization of Parallel Prefix Circuits using Deep Reinforcement Learning.
Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021

View Generalization for Single Image Textured 3D Models.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

End-to-End Training of Neural Retrievers for Open-Domain Question Answering.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

2020
Accelerating Chip Design With Machine Learning.
IEEE Micro, 2020

Local Knowledge Powered Conversational Agents.
CoRR, 2020

Transposer: Universal Texture Synthesis Using Feature Maps as Transposed Convolution Filter.
CoRR, 2020

Hierarchical Multi-Scale Attention for Semantic Segmentation.
CoRR, 2020

Neural FFTs for Universal Texture Image Synthesis.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Can Q-Learning with Graph Networks Learn a Generalizable Branching Heuristic for a SAT Solver?
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Mellotron: Multispeaker Expressive Voice Synthesis by Conditioning on Rhythm, Pitch and Global Style Tokens.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

MEGATRON-CNTRL: Controllable Story Generation with External Knowledge Using Large-Scale Language Models.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Training Question Answering Models From Synthetic Data.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Panoptic-Based Image Synthesis.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Large Scale Multi-Actor Generative Dialog Modeling.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

2019
Neural ODEs for Image Segmentation with Level Sets.
CoRR, 2019

Zero-shot Text Classification With Generative Language Models.
CoRR, 2019

Improving SAT Solver Heuristics with Graph Networks and Reinforcement Learning.
CoRR, 2019

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism.
CoRR, 2019

Video Interpolation and Prediction with Unsupervised Landmarks.
CoRR, 2019

SysML: The New Frontier of Machine Learning Systems.
CoRR, 2019

Graphical Contrastive Losses for Scene Graph Generation.
CoRR, 2019

CongestionNet: Routing Congestion Prediction Using Deep Graph Neural Networks.
Proceedings of the 27th IFIP/IEEE International Conference on Very Large Scale Integration, 2019

Few-shot Video-to-Video Synthesis.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Unsupervised Video Interpolation Using Cycle Consistency.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

WaveGlow: A Flow-based Generative Network for Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2019

Graphical Contrastive Losses for Scene Graph Parsing.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Improving Semantic Segmentation via Video Propagation and Label Relaxation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018
Practical Text Classification With Large Pre-Trained Language Models.
CoRR, 2018

Partial Convolution based Padding.
CoRR, 2018

An Interpretable Model for Scene Graph Generation.
CoRR, 2018

SDCNet: Video Prediction Using Spatially-Displaced Convolution.
CoRR, 2018

Introduction to the 1st Place Winning Model of OpenImages Relationship Detection Challenge.
CoRR, 2018

Video-to-Video Synthesis.
CoRR, 2018

Large Scale Language Modeling: Converging on 40GB of Text in Four Hours.
Proceedings of the 30th International Symposium on Computer Architecture and High Performance Computing, 2018

Video-to-Video Synthesis.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

SDC-Net: Video Prediction Using Spatially-Displaced Convolution.
Proceedings of the Computer Vision - ECCV 2018, 2018

Image Inpainting for Irregular Holes Using Partial Convolutions.
Proceedings of the Computer Vision - ECCV 2018, 2018

High-Resolution Image Synthesis and Semantic Manipulation With Conditional GANs.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Malware Detection by Eating a Whole EXE.
Proceedings of the Workshops of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017
DSD: Dense-Sparse-Dense Training for Deep Neural Networks.
Proceedings of the 5th International Conference on Learning Representations, 2017

2016
DSD: Regularizing Deep Neural Networks with Dense-Sparse-Dense Training Flow.
CoRR, 2016

Persistent RNNs: Stashing Recurrent Weights On-Chip.
Proceedings of the 33rd International Conference on Machine Learning, 2016


2015
Deep Speech 2: End-to-End Speech Recognition in English and Mandarin.
CoRR, 2015

A collection-oriented programming model for performance portability.
Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2015

2014
Deep Speech: Scaling up end-to-end speech recognition.
CoRR, 2014

cuDNN: Efficient Primitives for Deep Learning.
CoRR, 2014

A decomposition for in-place matrix transposition.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2014

Nitro: A Framework for Adaptive Code Variant Tuning.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

2013
GPU Scripting and Code Generation with PyCUDA.
CoRR, 2013

Deep learning with COTS HPC systems.
Proceedings of the 30th International Conference on Machine Learning, 2013

2012
PyCUDA and PyOpenCL: A scripting-based approach to GPU run-time code generation.
Parallel Comput., 2012

2011
Compilation Techniques for Embedded Data Parallel Languages.
PhD thesis, 2011

Copperhead: compiling an embedded data parallel language.
Proceedings of the 16th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2011

Considerations When Evaluating Microprocessor Platforms.
Proceedings of the 3rd USENIX Workshop on Hot Topics in Parallelism, 2011

PALLAS: Mapping Applications onto Manycore.
Proceedings of the Multiprocessor System-on-Chip - Hardware Design and Tool Integration, 2011

2010
Ubiquitous Parallel Computing from Berkeley, Illinois, and Stanford.
IEEE Micro, 2010

Parallel computing with patterns and frameworks.
XRDS, 2010

2009
PyCUDA: GPU Run-Time Code Generation for High-Performance Computing.
CoRR, 2009

Efficient, high-quality image contour detection.
Proceedings of the IEEE 12th International Conference on Computer Vision, ICCV 2009, Kyoto, Japan, September 27, 2009

2008
Fast support vector machine training and classification on graphics processors.
Proceedings of the Machine Learning, 2008

Parallelizing CAD: a timely research agenda for EDA.
Proceedings of the 45th Design Automation Conference, 2008

2007
Efficient Parallelization of H.264 Decoding with Macro Block Level Scheduling.
Proceedings of the 2007 IEEE International Conference on Multimedia and Expo, 2007

2005
Choice of base revisited: higher radices for FPGA-based floating-point computation (abstract only).
Proceedings of the ACM/SIGDA 13th International Symposium on Field Programmable Gate Arrays, 2005

Higher Radix Floating-Point Representations for FPGA-Based Arithmetic.
Proceedings of the 13th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2005), 2005

