Yuxiong He
ORCID: 0000-0001-8887-7752
According to our database, Yuxiong He authored at least 142 papers between 2004 and 2025.
Bibliography
2025
Perform. Evaluation, 2025
2024
SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation.
CoRR, 2024
FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design.
CoRR, 2024
DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference.
CoRR, 2024
Quant-LLM: Accelerating the Serving of Large Language Models via FP6-Centric Algorithm-System Co-Design on Modern GPUs.
Proceedings of the 2024 USENIX Annual Technical Conference, 2024
System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models.
Proceedings of the 43rd ACM Symposium on Principles of Distributed Computing, 2024
System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
2023
ACM Trans. Embed. Comput. Syst., March, 2023
ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks.
CoRR, 2023
ZeroQuant-HERO: Hardware-Enhanced Robust Optimized Post-Training Quantization Framework for W8A8 Transformers.
CoRR, 2023
DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies.
CoRR, 2023
DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models.
CoRR, 2023
DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention.
CoRR, 2023
CoRR, 2023
DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales.
CoRR, 2023
ZeroQuant-FP: A Leap Forward in LLMs Post-Training W4A8 Quantization Using Floating-Point Formats.
CoRR, 2023
CoRR, 2023
CoRR, 2023
CoRR, 2023
A Novel Tensor-Expert Hybrid Parallelism Approach to Scale Mixture-of-Experts Training.
CoRR, 2023
Understanding INT4 Quantization for Transformer Models: Latency Speedup, Composability, and Failure Cases.
CoRR, 2023
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023
A Hybrid Tensor-Expert-Data Parallelism Approach to Optimize Mixture-of-Experts Training.
Proceedings of the 37th International Conference on Supercomputing, 2023
HEAT: A Highly Efficient and Affordable Training System for Collaborative Filtering Based Recommendation on CPUs.
Proceedings of the 37th International Conference on Supercomputing, 2023
Understanding Int4 Quantization for Language Models: Latency Speedup, Composability, and Failure Cases.
Proceedings of the International Conference on Machine Learning, 2023
Proceedings of the Eleventh International Conference on Learning Representations, 2023
Proceedings of the Eleventh International Conference on Learning Representations, 2023
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023
Revisiting the Efficiency-Accuracy Tradeoff in Adapting Transformer Models via Adversarial Fine-Tuning.
Proceedings of the ECAI 2023 - 26th European Conference on Artificial Intelligence, September 30 - October 4, 2023, Kraków, Poland, 2023
2022
DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing.
CoRR, 2022
Random-LTD: Random and Layerwise Token Dropping Brings Efficient Training for Large-scale Transformers.
CoRR, 2022
Compressing Pre-trained Transformers via Low-Bit NxM Sparsity for Natural Language Understanding.
CoRR, 2022
CoRR, 2022
ScaLA: Accelerating Adaptation of Pre-Trained Transformer-Based Language Models via Efficient Large-Batch Adversarial Noise.
CoRR, 2022
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model.
CoRR, 2022
GraSP: Optimizing Graph-based Nearest Neighbor Search with Subgraph Sampling and Pruning.
Proceedings of the WSDM '22: The Fifteenth ACM International Conference on Web Search and Data Mining, Virtual Event / Tempe, AZ, USA, February 21, 2022
DeepSpeed-Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale.
Proceedings of the SC22: International Conference for High Performance Computing, 2022
ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
The Stability-Efficiency Dilemma: Investigating Sequence Length Warmup for Training GPT Models.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale.
Proceedings of the International Conference on Machine Learning, 2022
1-bit LAMB: Communication Efficient Large-Scale Large-Batch Training with LAMB's Convergence Speed.
Proceedings of the 29th IEEE International Conference on High Performance Computing, 2022
Proceedings of the 2022 6th International Conference on Electronic Information Technology and Computer Engineering, 2022
Adversarial Data Augmentation for Task-Specific Knowledge Distillation of Pre-trained Transformers.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022
2021
Curriculum Learning: A Regularization Method for Efficient and Stable Billion-Scale GPT Model Pre-Training.
CoRR, 2021
Proceedings of the 2021 USENIX Annual Technical Conference, 2021
Proceedings of the International Conference for High Performance Computing, 2021
SimiGrad: Fine-Grained Adaptive Batching for Large Scale Training using Gradient Similarity Measurement.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021
NxMTransformer: Semi-Structured Sparsification for Natural Language Understanding via ADMM.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021
1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed.
Proceedings of the 38th International Conference on Machine Learning, 2021
2020
Knowl. Inf. Syst., 2020
Local trend discovery on real-time microblogs with uncertain locations in tight memory environments.
GeoInformatica, 2020
CoRR, 2020
Improving Approximate Nearest Neighbor Search through Learned Adaptive Early Termination.
Proceedings of the 2020 International Conference on Management of Data, 2020
Proceedings of the International Conference for High Performance Computing, 2020
Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020
DeepSpeed: System Optimizations Enable Training Deep Learning Models with Over 100 Billion Parameters.
Proceedings of the KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2020
2019
SIGMETRICS Perform. Evaluation Rev., 2019
LSTM-Sharp: An Adaptable, Energy-Efficient Hardware Accelerator for Long Short-Term Memory.
CoRR, 2019
Proceedings of the 2019 USENIX Conference on Operational Machine Learning, 2019
Proceedings of the 2019 USENIX Conference on Operational Machine Learning, 2019
Proceedings of the 2019 IEEE International Conference on Data Mining, 2019
Proceedings of the Fourteenth EuroSys Conference 2019, Dresden, Germany, March 25-28, 2019, 2019
GRIP: Multi-Store Capacity-Optimized High-Performance Nearest Neighbor Search for Vector Search Engine.
Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019
2018
IEEE Trans. Netw. Serv. Manag., 2018
CoRR, 2018
Proceedings of the 2018 World Wide Web Conference on World Wide Web, 2018
Proceedings of the 2018 USENIX Annual Technical Conference, 2018
Navigating with Graph Representations for Fast and Scalable Decoding of Neural Language Models.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018
Proceedings of the 6th International Conference on Learning Representations, 2018
2017
ACM Trans. Model. Perform. Evaluation Comput. Syst., 2017
Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures, 2017
Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2017
Proceedings of the 18th ACM/IFIP/USENIX Middleware Conference, Las Vegas, NV, USA, December 11, 2017
Swayam: distributed autoscaling to meet SLAs of machine learning inference services with resource efficiency.
Proceedings of the 18th ACM/IFIP/USENIX Middleware Conference, Las Vegas, NV, USA, December 11, 2017
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017
Proceedings of the 2017 Symposium on Cloud Computing, SoCC 2017, Santa Clara, CA, USA, 2017
Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017
2016
IEEE Trans. Knowl. Data Eng., 2016
SERF: efficient scheduling for fast deep neural network serving via judicious parallelism.
Proceedings of the International Conference for High Performance Computing, 2016
Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2016
Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, GIS 2016, Burlingame, California, USA, October 31, 2016
TPC: Target-Driven Parallelism Combining Prediction and Correction to Reduce Tail Latency in Interactive Services.
Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, 2016
2015
Proceedings of the Handbook on Data Centers, 2015
Proc. VLDB Endow., 2015
Delayed-Dynamic-Selective (DDS) Prediction for Reducing Extreme Tail Latency in Web Search.
Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, 2015
Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2015
Proceedings of the 23rd IEEE International Symposium on Modeling, 2015
Performance Modeling and Scalability Optimization of Distributed Deep Learning Systems.
Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015
Proceedings of the 2015 IEEE International Conference on Autonomic Computing, 2015
Few-to-Many: Incremental Parallelism for Reducing Tail Latency in Interactive Services.
Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, 2015
2014
Theor. Comput. Sci., 2014
A Theoretical Foundation for Scheduling and Designing Heterogeneous Processors for Interactive Applications.
Proceedings of the Distributed Computing - 28th International Symposium, 2014
Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2014
Proceedings of the IEEE 30th International Conference on Data Engineering, Chicago, 2014
Proceedings of the IEEE 30th International Conference on Data Engineering, Chicago, 2014
2013
Horton+: A Distributed System for Processing Declarative Reachability Queries over Partitioned Graphs.
Proc. VLDB Endow., 2013
Proceedings of the String Processing and Information Retrieval, 2013
COCA: online distributed resource management for cost minimization and carbon neutrality in data centers.
Proceedings of the International Conference for High Performance Computing, 2013
Energy-Efficient Scheduling for Best-Effort Interactive Services to Achieve High Response Quality.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013
Proceedings of the 2013 IFIP/IEEE International Symposium on Integrated Network Management (IM 2013), 2013
Proceedings of the 10th International Conference on Autonomic Computing, 2013
Proceedings of the 10th International Conference on Autonomic Computing, 2013
Proceedings of the Eighth Eurosys Conference 2013, 2013
Proceedings of the Euro-Par 2013 Parallel Processing, 2013
Proceedings of the ACM Cloud and Autonomic Computing Conference, 2013
2012
Proceedings of the IEEE 28th International Conference on Data Engineering (ICDE 2012), 2012
Provably-Efficient Job Scheduling for Energy and Fairness in Geographically Distributed Data Centers.
Proceedings of the 2012 IEEE 32nd International Conference on Distributed Computing Systems, 2012
Proceedings of the 9th International Conference on Autonomic Computing, 2012
Proceedings of the ACM Symposium on Cloud Computing, SOCC '12, 2012
Proceedings of the 21st ACM International Conference on Information and Knowledge Management, 2012
2011
Proceedings of the Theory and Practice of Algorithms in (Computer) Systems, 2011
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011
Proceedings of the 2011 International Conference on Distributed Computing Systems, 2011
Proceedings of the 49th Annual Allerton Conference on Communication, 2011
Position Paper: Embracing Heterogeneity - Improving Energy Efficiency for Interactive Services on Heterogeneous Data Center Hardware.
Proceedings of the AI for Data Center Management and Cloud Computing, 2011
2010
Improved results for scheduling batched parallel jobs by using a generalized analysis framework.
J. Parallel Distributed Comput., 2010
Proceedings of the SPAA 2010: Proceedings of the 22nd Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2010
2008
IEEE Trans. Parallel Distributed Syst., 2008
2007
Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2007
Proceedings of the 21th International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007
Proceedings of the 21th International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007
Proceedings of the 2007 International Conference on Parallel Processing (ICPP 2007), 2007
2006
Proceedings of the Job Scheduling Strategies for Parallel Processing, 2006
Proceedings of the 26th IEEE International Conference on Distributed Computing Systems (ICDCS 2006), 2006
2004