Minjia Zhang

Orcid: 0000-0002-8165-166X

According to our database, Minjia Zhang authored at least 81 papers between 2010 and 2025.

Bibliography

2025
FedCust: Offloading hyperparameter customization for federated learning.
Perform. Evaluation, 2025

2024
Vexless: A Serverless Vector Data Management System Using Cloud Functions.
Proc. ACM Manag. Data, 2024

Improving Retrieval-Augmented Generation in Medicine with Iterative Follow-up Questions.
CoRR, 2024

Model Tells You Where to Merge: Adaptive KV Cache Merging for LLMs on Long-Context Tasks.
CoRR, 2024

UltraEdit: Instruction-based Fine-Grained Image Editing at Scale.
CoRR, 2024

Universal Checkpointing: Efficient and Flexible Checkpointing for Large Scale Distributed Training.
CoRR, 2024

Computing in the Era of Large Generative Models: From Cloud-Native to AI-Native.
CoRR, 2024

System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models.
Proceedings of the 43rd ACM Symposium on Principles of Distributed Computing, 2024

Parcae: Proactive, Liveput-Optimized DNN Training on Preemptible Instances.
Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, 2024

System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
SHARP: An Adaptable, Energy-Efficient Accelerator for Recurrent Neural Networks.
ACM Trans. Embed. Comput. Syst., March, 2023

iQAN: Fast and Accurate Vector Search with Efficient Intra-Query Parallelism on Multi-Core Architectures.
IEEE Data Eng. Bull., 2023

DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies.
CoRR, 2023

DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models.
CoRR, 2023

DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention.
CoRR, 2023

RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model.
CoRR, 2023

Cost-effective On-device Continual Learning over Memory Hierarchy with Miro.
CoRR, 2023

DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales.
CoRR, 2023

FedHC: A Scalable Federated Learning Framework for Heterogeneous and Resource-Constrained Clients.
CoRR, 2023

iQAN: Fast and Accurate Vector Search with Efficient Intra-Query Parallelism on Multi-Core Architectures.
Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2023

Bamboo: Making Preemptible Instances Resilient for Affordable Training of Large DNNs.
Proceedings of the 20th USENIX Symposium on Networked Systems Design and Implementation, 2023

Cost-effective On-device Continual Learning over Memory Hierarchy with Miro.
Proceedings of the 29th Annual International Conference on Mobile Computing and Networking, 2023

Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Revisiting the Efficiency-Accuracy Tradeoff in Adapting Transformer Models via Adversarial Fine-Tuning.
Proceedings of the ECAI 2023 - 26th European Conference on Artificial Intelligence, September 30 - October 4, 2023, Kraków, Poland, 2023

Betty: Enabling Large-Scale GNN Training with Batch-Level Graph Partitioning.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2022
DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing.
CoRR, 2022

Random-LTD: Random and Layerwise Token Dropping Brings Efficient Training for Large-scale Transformers.
CoRR, 2022

Compressing Pre-trained Transformers via Low-Bit NxM Sparsity for Natural Language Understanding.
CoRR, 2022

Extreme Compression for Pre-trained Transformers Made Simple and Efficient.
CoRR, 2022

A Survey of Multi-Tenant Deep Learning Inference on GPU.
CoRR, 2022

Speed-ANN: Low-Latency and High-Accuracy Nearest Neighbor Search via Intra-Query Parallelism.
CoRR, 2022

ScaLA: Accelerating Adaptation of Pre-Trained Transformer-Based Language Models via Efficient Large-Batch Adversarial Noise.
CoRR, 2022

Powering Multi-Task Federated Learning with Competitive GPU Resource Sharing.
Proceedings of the Companion of The Web Conference 2022, Virtual Event / Lyon, France, April 25, 2022

GraSP: Optimizing Graph-based Nearest Neighbor Search with Subgraph Sampling and Pruning.
Proceedings of the WSDM '22: The Fifteenth ACM International Conference on Web Search and Data Mining, Virtual Event / Tempe, AZ, USA, February 21, 2022

DeepSpeed-Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale.
Proceedings of the SC22: International Conference for High Performance Computing, 2022

ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

XTC: Extreme Compression for Pre-trained Transformers Made Simple and Efficient.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

The Stability-Efficiency Dilemma: Investigating Sequence Length Warmup for Training GPT Models.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale.
Proceedings of the International Conference on Machine Learning, 2022

CarM: hierarchical episodic memory for continual learning.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

Adversarial Data Augmentation for Task-Specific Knowledge Distillation of Pre-trained Transformers.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
A Survey of Large-Scale Deep Learning Serving System Optimization: Challenges and Opportunities.
CoRR, 2021

Carousel Memory: Rethinking the Design of Episodic Memory for Continual Learning.
CoRR, 2021

Curriculum Learning: A Regularization Method for Efficient and Stable Billion-Scale GPT Model Pre-Training.
CoRR, 2021

Understanding and Generalizing Monotonic Proximity Graphs for Approximate Nearest Neighbor Search.
CoRR, 2021

DL Inference and Training Optimization Towards Speed and Scale.
Proceedings of the Companion of The Web Conference 2021, 2021

ZeRO-Offload: Democratizing Billion-Scale Model Training.
Proceedings of the 2021 USENIX Annual Technical Conference, 2021

NxMTransformer: Semi-Structured Sparsification for Natural Language Understanding via ADMM.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

DUET: A Compiler-Runtime Subgraph Scheduling Approach for Tensor Programs on a Coupled CPU-GPU Architecture.
Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

Vertical Scaling of Resource for OpenMP Application.
Proceedings of the Service-Oriented Computing - 19th International Conference, 2021

DynaTune: Dynamic Tensor Program Optimization in Deep Neural Network Compilation.
Proceedings of the 9th International Conference on Learning Representations, 2021

Sentinel: Efficient Tensor Migration and Allocation on Heterogeneous Memory Systems for Deep Learning.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

2020
Improving Approximate Nearest Neighbor Search through Learned Adaptive Early Termination.
Proceedings of the 2020 International Conference on Management of Data, 2020

Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

AdaTune: Adaptive Tensor Program Compilation Made Efficient.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

HM-ANN: Efficient Billion-Point Nearest Neighbor Search on Heterogeneous Memory.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

2019
LSTM-Sharp: An Adaptable, Energy-Efficient Hardware Accelerator for Long Short-Term Memory.
CoRR, 2019

Sentinel: Runtime Data Management on Heterogeneous Main Memory Systems for Deep Learning.
CoRR, 2019

Code Refactoring from OpenMP to MapReduce Model for Big Data Processing.
Proceedings of the 2019 IEEE SmartWorld, 2019

Accelerating Large Scale Deep Learning Inference through DeepCPU at Microsoft.
Proceedings of the 2019 USENIX Conference on Operational Machine Learning, 2019

GRIP: Multi-Store Capacity-Optimized High-Performance Nearest Neighbor Search for Vector Search Engine.
Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019

2018
Zoom: SSD-based Vector Search for Optimizing Accuracy, Latency and Memory.
CoRR, 2018

DeepCPU: Serving RNN-based Deep Learning Models 10x Faster.
Proceedings of the 2018 USENIX Annual Technical Conference, 2018

Navigating with Graph Representations for Fast and Scalable Decoding of Neural Language Models.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Refactoring OpenMP Code Based on MapReduce Model.
Proceedings of the IEEE International Conference on Parallel & Distributed Processing with Applications, 2018

Learning Intrinsic Sparse Structures within Long Short-Term Memory.
Proceedings of the 6th International Conference on Learning Representations, 2018

2017
Hybridizing and Relaxing Dependence Tracking for Efficient Parallel Runtime Support.
ACM Trans. Parallel Comput., 2017

POSTER: On the Problem of Consistency Exceptions in the Context of Strong Memory Models.
Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017

Avoiding consistency exceptions under strong memory models.
Proceedings of the 2017 ACM SIGPLAN International Symposium on Memory Management, 2017

Lightweight data race detection for production runs.
Proceedings of the 26th International Conference on Compiler Construction, 2017

2016
Drinking from both glasses: combining pessimistic and optimistic tracking of cross-thread dependences.
Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2016

Relaxed dependence tracking for parallel runtime support.
Proceedings of the 25th International Conference on Compiler Construction, 2016

2015
Low-overhead software transactional memory with progress guarantees and strong semantics.
Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2015

SIRe: an efficient snapshot isolation-based memory model for detecting and tolerating region conflicts.
Companion Proceedings of the 2015 ACM SIGPLAN International Conference on Systems, 2015

Valor: efficient, software-only region conflict exceptions.
Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, 2015

Hybrid Static-Dynamic Analysis for Statically Bounded Region Serializability.
Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, 2015

2013
OCTET: capturing and controlling cross-thread dependences efficiently.
Proceedings of the 2013 ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages & Applications, 2013

2011
Memcached Design on High Performance RDMA Capable Interconnects.
Proceedings of the International Conference on Parallel Processing, 2011

2010
VirtCFT: A Transparent VM-Level Fault-Tolerant System for Virtual Clusters.
Proceedings of the 16th IEEE International Conference on Parallel and Distributed Systems, 2010
