2024

Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design.

Ruisi Cai

Yeonju Ro

Geon-Woo Kim

Peihao Wang

Babak Ehteshami Bejnordi

Aditya Akella

Zhangyang Wang

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

FFN-SkipLLM: A Hidden Gem for Autoregressive Decoding with Adaptive Feed Forward Skipping.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

2023

RingLeader: Efficiently Offloading Intra-Server Orchestration to NICs.

Proceedings of the 20th USENIX Symposium on Networked Systems Design and Implementation, 2023

Lowering the Pre-training Tax for Gradient-based Subset Training: A Lightweight Distributed Pre-Training Toolkit.

Proceedings of the International Conference on Machine Learning, 2023

Dataset Efficient Training with Model Ensembling.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Mr.BiQ: Post-Training Non-Uniform Quantization based on Minimizing the Reconstruction Error.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

Sequential Encryption of Sparse Neural Networks Toward Optimum Representation of Irregular Sparsity.

CoRR, 2021

Q-Rater: Non-Convex Optimization for Post-Training Uniform Quantization.

CoRR, 2021

Ghost Routing to Enable Oblivious Computation on Memory-centric Networks.

Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

2018

Multi-dimensional Parallel Training of Winograd Layer on Memory-Centric Architecture.

Byungchul Hong

Yeonju Ro

John Kim

Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018