Quentin Anthony

Orcid: 0000-0002-6823-9080

According to our database, Quentin Anthony authored at least 36 papers between 2019 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Simple and Scalable Strategies to Continually Pre-train Large Language Models.
Transactions on Machine Learning Research (TMLR), 2024

Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU Clusters.
CoRR, 2024

Zyda: A 1.3T Dataset for Open Language Modeling.
CoRR, 2024

Zamba: A Compact 7B SSM Hybrid Model.
CoRR, 2024

Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence.
CoRR, 2024

BlackMamba: Mixture of Experts for State-Space Models.
CoRR, 2024

Infer-HiRes: Accelerating Inference for High-Resolution Images with Quantization and Distributed Deep Learning.
Proceedings of the Practice and Experience in Advanced Research Computing 2024: Human Powered Computing, 2024

Comparative Study of Large Language Model Architectures on Frontier.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

The Case for Co-Designing Model Architectures with Hardware.
Proceedings of the 53rd International Conference on Parallel Processing, 2024

Demystifying the Communication Characteristics for Distributed Transformer Models.
Proceedings of the IEEE Symposium on High-Performance Interconnects, 2024

Accelerating Large Language Model Training with Hybrid GPU-based Compression.
Proceedings of the 24th IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 2024

2023
Continual Pre-Training of Large Language Models: How to (re)warm your model?
CoRR, 2023

RWKV: Reinventing RNNs for the Transformer Era.
CoRR, 2023

Emergent and Predictable Memorization in Large Language Models.
Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023

Accelerating Distributed Deep Learning Training with Compression Assisted Allgather and Reduce-Scatter Communication.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

MCR-DL: Mix-and-Match Communication Runtime for Deep Learning.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling.
Proceedings of the International Conference on Machine Learning, 2023

trlX: A Framework for Large Scale Reinforcement Learning from Human Feedback.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

ScaMP: Scalable Meta-Parallelism for Deep Learning Search.
Proceedings of the 23rd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 2023

2022
GPT-NeoX-20B: An Open-Source Autoregressive Language Model.
CoRR, 2022

Accelerating MPI All-to-All Communication with Online Compression on Modern GPU Clusters.
Proceedings of the High Performance Computing - 37th International Conference (ISC High Performance), 2022

Hy-Fi: Hybrid Five-Dimensional Parallel DNN Training on High-Performance GPU Clusters.
Proceedings of the High Performance Computing - 37th International Conference (ISC High Performance), 2022

Highly Efficient Alltoall and Alltoallv Communication Algorithms for GPU Systems.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022

Accelerating Broadcast Communication with GPU Compression for Deep Learning Workloads.
Proceedings of the 29th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC), 2022

2021
Cross-layer Visualization and Profiling of Network and I/O Communication for HPC Clusters.
CoRR, 2021

Evaluating Multi-Level Checkpointing for Distributed Deep Neural Network Training.
SC Workshops Supplementary Proceedings (SC 2021), 2021

Scaling Single-Image Super-Resolution Training on Modern HPC Clusters: Early Experiences.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2021

Adaptive and Hierarchical Large Message All-to-all Communication Algorithms for Large-scale Dense GPU Systems.
Proceedings of the 21st IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 2021

2020
HyPar-Flow: Exploiting MPI and Keras for Scalable Hybrid-Parallel DNN Training with TensorFlow.
Proceedings of the High Performance Computing - 35th International Conference (ISC High Performance), 2020

GEMS: GPU-enabled memory-aware model-parallelism system for distributed DNN training.
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2020

Accelerating GPU-based Machine Learning in Python using MPI Library: A Case Study with MVAPICH2-GDR.
Proceedings of the 6th IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments, 2020

Efficient Training of Semantic Image Segmentation on Summit using Horovod and MVAPICH2-GDR.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

2019
HyPar-Flow: Exploiting MPI and Keras for Scalable Hybrid-Parallel DNN Training using TensorFlow.
CoRR, 2019

Performance Characterization of DNN Training using TensorFlow and PyTorch on Modern Clusters.
Proceedings of the 2019 IEEE International Conference on Cluster Computing, 2019
