Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design.
Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
FFN-SkipLLM: A Hidden Gem for Autoregressive Decoding with Adaptive Feed Forward Skipping.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024
RingLeader: Efficiently Offloading Intra-Server Orchestration to NICs.
Proceedings of the 20th USENIX Symposium on Networked Systems Design and Implementation, 2023
Lowering the Pre-training Tax for Gradient-based Subset Training: A Lightweight Distributed Pre-Training Toolkit.
Proceedings of the International Conference on Machine Learning, 2023
Dataset Efficient Training with Model Ensembling.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Mr.BiQ: Post-Training Non-Uniform Quantization based on Minimizing the Reconstruction Error.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
Sequential Encryption of Sparse Neural Networks Toward Optimum Representation of Irregular Sparsity.
CoRR, 2021
Q-Rater: Non-Convex Optimization for Post-Training Uniform Quantization.
CoRR, 2021
Ghost Routing to Enable Oblivious Computation on Memory-centric Networks.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021
Multi-dimensional Parallel Training of Winograd Layer on Memory-Centric Architecture.
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018