Minjia Zhang
Orcid: 0000-0002-8165-166X
According to our database, Minjia Zhang authored at least 81 papers between 2010 and 2025.
Bibliography
2025
Perform. Evaluation, 2025
2024
Proc. ACM Manag. Data, 2024
Improving Retrieval-Augmented Generation in Medicine with Iterative Follow-up Questions.
CoRR, 2024
Model Tells You Where to Merge: Adaptive KV Cache Merging for LLMs on Long-Context Tasks.
CoRR, 2024
Universal Checkpointing: Efficient and Flexible Checkpointing for Large Scale Distributed Training.
CoRR, 2024
CoRR, 2024
System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models.
Proceedings of the 43rd ACM Symposium on Principles of Distributed Computing, 2024
Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, 2024
System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024
Proceedings of the Twelfth International Conference on Learning Representations, 2024
DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
2023
ACM Trans. Embed. Comput. Syst., March, 2023
iQAN: Fast and Accurate Vector Search with Efficient Intra-Query Parallelism on Multi-Core Architectures.
IEEE Data Eng. Bull., 2023
DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies.
CoRR, 2023
DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models.
CoRR, 2023
DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention.
CoRR, 2023
CoRR, 2023
CoRR, 2023
DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales.
CoRR, 2023
FedHC: A Scalable Federated Learning Framework for Heterogeneous and Resource-Constrained Clients.
CoRR, 2023
iQAN: Fast and Accurate Vector Search with Efficient Intra-Query Parallelism on Multi-Core Architectures.
Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2023
Bamboo: Making Preemptible Instances Resilient for Affordable Training of Large DNNs.
Proceedings of the 20th USENIX Symposium on Networked Systems Design and Implementation, 2023
Proceedings of the 29th Annual International Conference on Mobile Computing and Networking, 2023
Proceedings of the Eleventh International Conference on Learning Representations, 2023
Revisiting the Efficiency-Accuracy Tradeoff in Adapting Transformer Models via Adversarial Fine-Tuning.
Proceedings of the ECAI 2023 - 26th European Conference on Artificial Intelligence, September 30 - October 4, 2023, Kraków, Poland, 2023
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023
2022
DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing.
CoRR, 2022
Random-LTD: Random and Layerwise Token Dropping Brings Efficient Training for Large-scale Transformers.
CoRR, 2022
Compressing Pre-trained Transformers via Low-Bit NxM Sparsity for Natural Language Understanding.
CoRR, 2022
CoRR, 2022
Speed-ANN: Low-Latency and High-Accuracy Nearest Neighbor Search via Intra-Query Parallelism.
CoRR, 2022
ScaLA: Accelerating Adaptation of Pre-Trained Transformer-Based Language Models via Efficient Large-Batch Adversarial Noise.
CoRR, 2022
Proceedings of the Companion of The Web Conference 2022, Virtual Event / Lyon, France, April 25, 2022
GraSP: Optimizing Graph-based Nearest Neighbor Search with Subgraph Sampling and Pruning.
Proceedings of the WSDM '22: The Fifteenth ACM International Conference on Web Search and Data Mining, Virtual Event / Tempe, AZ, USA, February 21, 2022
DeepSpeed-Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale.
Proceedings of the SC22: International Conference for High Performance Computing, 2022
ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
The Stability-Efficiency Dilemma: Investigating Sequence Length Warmup for Training GPT Models.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale.
Proceedings of the International Conference on Machine Learning, 2022
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022
Adversarial Data Augmentation for Task-Specific Knowledge Distillation of Pre-trained Transformers.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022
2021
A Survey of Large-Scale Deep Learning Serving System Optimization: Challenges and Opportunities.
CoRR, 2021
CoRR, 2021
Curriculum Learning: A Regularization Method for Efficient and Stable Billion-Scale GPT Model Pre-Training.
CoRR, 2021
Understanding and Generalizing Monotonic Proximity Graphs for Approximate Nearest Neighbor Search.
CoRR, 2021
Proceedings of the Companion of The Web Conference 2021, 2021
Proceedings of the 2021 USENIX Annual Technical Conference, 2021
NxMTransformer: Semi-Structured Sparsification for Natural Language Understanding via ADMM.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021
DUET: A Compiler-Runtime Subgraph Scheduling Approach for Tensor Programs on a Coupled CPU-GPU Architecture.
Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021
Proceedings of the Service-Oriented Computing - 19th International Conference, 2021
Proceedings of the 9th International Conference on Learning Representations, 2021
Sentinel: Efficient Tensor Migration and Allocation on Heterogeneous Memory Systems for Deep Learning.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021
2020
Improving Approximate Nearest Neighbor Search through Learned Adaptive Early Termination.
Proceedings of the 2020 International Conference on Management of Data, 2020
Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020
2019
LSTM-Sharp: An Adaptable, Energy-Efficient Hardware Accelerator for Long Short-Term Memory.
CoRR, 2019
Sentinel: Runtime Data Management on Heterogeneous Main Memory Systems for Deep Learning.
CoRR, 2019
Proceedings of the 2019 IEEE SmartWorld, 2019
Proceedings of the 2019 USENIX Conference on Operational Machine Learning, 2019
GRIP: Multi-Store Capacity-Optimized High-Performance Nearest Neighbor Search for Vector Search Engine.
Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019
2018
CoRR, 2018
Proceedings of the 2018 USENIX Annual Technical Conference, 2018
Navigating with Graph Representations for Fast and Scalable Decoding of Neural Language Models.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018
Proceedings of the IEEE International Conference on Parallel & Distributed Processing with Applications, 2018
Proceedings of the 6th International Conference on Learning Representations, 2018
2017
ACM Trans. Parallel Comput., 2017
POSTER: On the Problem of Consistency Exceptions in the Context of Strong Memory Models.
Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017
Proceedings of the 2017 ACM SIGPLAN International Symposium on Memory Management, 2017
Proceedings of the 26th International Conference on Compiler Construction, 2017
2016
Drinking from both glasses: combining pessimistic and optimistic tracking of cross-thread dependences.
Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2016
Proceedings of the 25th International Conference on Compiler Construction, 2016
2015
Low-overhead software transactional memory with progress guarantees and strong semantics.
Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2015
SIRe: an efficient snapshot isolation-based memory model for detecting and tolerating region conflicts.
Proceedings of the Companion Proceedings of the 2015 ACM SIGPLAN International Conference on Systems, 2015
Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, 2015
Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, 2015
2013
Proceedings of the 2013 ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages & Applications, 2013
2011
Proceedings of the International Conference on Parallel Processing, 2011
2010
Proceedings of the 16th IEEE International Conference on Parallel and Distributed Systems, 2010