Minjia Zhang

Proceedings of the SC22: International Conference for High Performance Computing, 2022

ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers.

Zhewei Yao

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

XTC: Extreme Compression for Pre-trained Transformers Made Simple and Efficient.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

The Stability-Efficiency Dilemma: Investigating Sequence Length Warmup for Training GPT Models.

Conglong Li

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale.

Ammar Ahmad Awan

Jeff Rasley

Proceedings of the International Conference on Machine Learning, 2022

CarM: hierarchical episodic memory for continual learning.

Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

Adversarial Data Augmentation for Task-Specific Knowledge Distillation of Pre-trained Transformers.

Uma-Naresh Niranjan

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021

A Survey of Large-Scale Deep Learning Serving System Optimization: Challenges and Opportunities.

CoRR, 2021

Carousel Memory: Rethinking the Design of Episodic Memory for Continual Learning.

CoRR, 2021

Curriculum Learning: A Regularization Method for Efficient and Stable Billion-Scale GPT Model Pre-Training.

Conglong Li

CoRR, 2021

Understanding and Generalizing Monotonic Proximity Graphs for Approximate Nearest Neighbor Search.

Dantong Zhu

CoRR, 2021

DL Inference and Training Optimization Towards Speed and Scale.

Proceedings of the Companion of The Web Conference 2021, 2021

ZeRO-Offload: Democratizing Billion-Scale Model Training.

Jie Ren

Samyam Rajbhandari

Proceedings of the 2021 USENIX Annual Technical Conference, 2021

NxMTransformer: Semi-Structured Sparsification for Natural Language Understanding via ADMM.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

DUET: A Compiler-Runtime Subgraph Scheduling Approach for Tensor Programs on a Coupled CPU-GPU Architecture.

Zehua Hu

Mingqin Li

Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

Vertical Scaling of Resource for OpenMP Application.

Junfeng Zhao

Hongji Yang

Proceedings of the Service-Oriented Computing - 19th International Conference, 2021

DynaTune: Dynamic Tensor Program Optimization in Deep Neural Network Compilation.

Proceedings of the 9th International Conference on Learning Representations, 2021

Sentinel: Efficient Tensor Migration and Allocation on Heterogeneous Memory Systems for Deep Learning.

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

2020

Improving Approximate Nearest Neighbor Search through Learned Adaptive Early Termination.

Proceedings of the 2020 International Conference on Management of Data, 2020

Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

AdaTune: Adaptive Tensor Program Compilation Made Efficient.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

HM-ANN: Efficient Billion-Point Nearest Neighbor Search on Heterogeneous Memory.

Jie Ren

Dong Li

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

2019

LSTM-Sharp: An Adaptable, Energy-Efficient Hardware Accelerator for Long Short-Term Memory.

CoRR, 2019

Sentinel: Runtime Data Management on Heterogeneous Main Memory Systems for Deep Learning.

CoRR, 2019

Code Refactoring from OpenMP to MapReduce Model for Big Data Processing.

Junfeng Zhao

Hongji Yang

Proceedings of the 2019 IEEE SmartWorld, 2019

Accelerating Large Scale Deep Learning Inference through DeepCPU at Microsoft.

Proceedings of the 2019 USENIX Conference on Operational Machine Learning, 2019

GRIP: Multi-Store Capacity-Optimized High-Performance Nearest Neighbor Search for Vector Search Engine.

Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019

2018

Zoom: SSD-based Vector Search for Optimizing Accuracy, Latency and Memory.

CoRR, 2018

DeepCPU: Serving RNN-based Deep Learning Models 10x Faster.

Proceedings of the 2018 USENIX Annual Technical Conference, 2018

Navigating with Graph Representations for Fast and Scalable Decoding of Neural Language Models.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Refactoring OpenMP Code Based on MapReduce Model.

Junfeng Zhao

Proceedings of the IEEE International Conference on Parallel & Distributed Processing with Applications, 2018

Learning Intrinsic Sparse Structures within Long Short-Term Memory.

Proceedings of the 6th International Conference on Learning Representations, 2018

2017

Hybridizing and Relaxing Dependence Tracking for Efficient Parallel Runtime Support.

ACM Trans. Parallel Comput., 2017

POSTER: On the Problem of Consistency Exceptions in the Context of Strong Memory Models.

Swarnendu Biswas

Michael D. Bond

Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017

Avoiding consistency exceptions under strong memory models.

Swarnendu Biswas

Michael D. Bond

Proceedings of the 2017 ACM SIGPLAN International Symposium on Memory Management, 2017

Lightweight data race detection for production runs.

Proceedings of the 26th International Conference on Compiler Construction, 2017

2016

Drinking from both glasses: combining pessimistic and optimistic tracking of cross-thread dependences.

Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2016

Relaxed dependence tracking for parallel runtime support.

Swarnendu Biswas

Michael D. Bond

Proceedings of the 25th International Conference on Compiler Construction, 2016

2015

Low-overhead software transactional memory with progress guarantees and strong semantics.

Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2015

SIRe: an efficient snapshot isolation-based memory model for detecting and tolerating region conflicts.
