Minsoo Rhu
ORCID: 0000-0003-3303-8681
According to our database, Minsoo Rhu authored at least 57 papers between 2009 and 2024.
Bibliography
2024
A Quantitative Analysis of State Space Model-Based Large Language Model: Study of Hungry Hungry Hippos.
IEEE Comput. Archit. Lett., 2024
IEEE Comput. Archit. Lett., 2024
IEEE Comput. Archit. Lett., 2024
PIM-MMU: A Memory Management Unit for Accelerating Data Transfers in Commercial PIM Systems.
Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024
Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024
vTrain: A Simulation Framework for Evaluating Cost-Effective and Compute-Optimal Large Language Model Training.
Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024
ElasticRec: A Microservice-based Model Serving Architecture Enabling Elastic Resource Scaling for Recommendation Models.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024
LazyDP: Co-Designing Algorithm-Software for Scalable Training of Differentially Private Recommendation Models.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024
2023
Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference.
CoRR, 2023
Hera: A Heterogeneity-Aware Multi-Tenant Inference Server for Personalized Recommendations.
CoRR, 2023
HAMMER: Hardware-Friendly Approximate Computing for Self-Attention With Mean-Redistribution And Linearization.
IEEE Comput. Archit. Lett., 2023
GROW: A Row-Stationary Sparse-Dense GEMM Accelerator for Memory-Efficient Graph Convolutional Neural Networks.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023
2022
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022
ARK: Fully Homomorphic Encryption Accelerator with Runtime Data Generation and Inter-Operation Key Reuse.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022
SmartSAGE: training large-scale graph neural networks using in-storage processing architectures.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022
Training personalized recommendation systems from (GPU) scratch: look forward not backwards.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022
PARIS and ELSA: an elastic scheduling algorithm for reconfigurable multi-GPU inference servers.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022
2021
Understanding the Implication of Non-Volatile Memory for Large-Scale Graph Neural Network Training.
IEEE Comput. Archit. Lett., 2021
IEEE Comput. Archit. Lett., 2021
TRiM: Enhancing Processor-Memory Interfaces with Scalable Tensor Reduction in Memory.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021
Tensor Casting: Co-Designing Algorithm-Architecture for Personalized Recommendation Training.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021
Trident: A Hybrid Correlation-Collision GPU Cache Timing Attack for AES Key Recovery.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021
2020
CoRR, 2020
Centaur: A Chiplet-based, Hybrid Sparse-Dense Accelerator for Personalized Recommendations.
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020
PREMA: A Predictive Multi-Task Scheduling Algorithm For Preemptible Neural Processing Units.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020
NeuMMU: Architectural Support for Efficient Address Translations in Neural Processing Units.
Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020
Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020
2019
TensorDIMM: A Practical Near-Memory Processing Architecture for Embeddings and Tensor Operations in Deep Learning.
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019
2018
Structurally Sparsified Backward Propagation for Faster Long Short-Term Memory Training.
CoRR, 2018
IEEE Comput. Archit. Lett., 2018
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018
Compressing DMA Engine: Leveraging Activation Sparsity for Training Deep Neural Networks.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018
Accelerator-centric deep learning systems for enhanced scalability, energy-efficiency, and programmability.
Proceedings of the 23rd Asia and South Pacific Design Automation Conference, 2018
2017
Compressing DMA Engine: Leveraging Activation Sparsity for Training Deep Neural Networks.
CoRR, 2017
GPUpd: a fast and scalable multi-GPU architecture using cooperative projection and distribution.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017
Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017
Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017
2016
CoRR, 2016
vDNN: Virtualized deep neural networks for scalable, memory-efficient neural network design.
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016
2015
Proceedings of the 48th International Symposium on Microarchitecture, 2015
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015
2014
Proceedings of the International Symposium on Low Power Electronics and Design, 2014
2013
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013
Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013
Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013
2012
CAPRI: Prediction of compaction-adequacy for handling control-divergence in GPGPU architectures.
Proceedings of the 39th International Symposium on Computer Architecture (ISCA 2012), 2012
2010
IEEE Trans. Circuits Syst. Video Technol., 2010
2009
Proceedings of the IEEE Workshop on Signal Processing Systems, 2009
Memory-less bit-plane coder architecture for JPEG2000 with concurrent column-stripe coding.
Proceedings of the International Conference on Image Processing, 2009
Architecture design of a high-performance dual-symbol binary arithmetic coder for JPEG2000.
Proceedings of the International Conference on Image Processing, 2009