Jaewoong Sim

Orcid: 0000-0002-0403-9928

According to our database, Jaewoong Sim authored at least 31 papers between 2012 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
CuPBoP: Making CUDA a Portable Language.
ACM Trans. Design Autom. Electr. Syst., 2024

InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management.
Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation, 2024

Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

MoNDE: Mixture of Near-Data Experts for Large-Scale Sparse Models.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

GSCore: Efficient Radiance Field Rendering via Architectural Support for 3D Gaussian Splatting.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023
CuPBoP: A Framework to Make CUDA Portable.
Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2023

NeuRex: A Case for Neural Rendering Acceleration.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

SDM: Sharing-Enabled Disaggregated Memory System with Cache Coherent Compute Express Link.
Proceedings of the 32nd International Conference on Parallel Architectures and Compilation Techniques, 2023

2022
COX: Exposing CUDA Warp-level Functions to CPUs.
ACM Trans. Archit. Code Optim., 2022

CuPBoP: CUDA for Parallelized and Broad-range Processors.
CoRR, 2022

2021
Specializing FGPU for Persistent Deep Learning.
ACM Trans. Reconfigurable Technol. Syst., 2021

COX: CUDA on X86 by Exposing Warp-Level Functions to CPUs.
CoRR, 2021

Supporting CUDA for an extended RISC-V GPU architecture.
CoRR, 2021

2020
Batch-Aware Unified Memory Management in GPUs for Irregular Workloads.
Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020

2019
Thermal-aware processing-in-memory instruction offloading.
J. Parallel Distributed Comput., 2019

Evaluating and Enhancing Intel® Stratix® 10 FPGAs for Persistent Real-Time AI.
Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

Why Compete When You Can Work Together: FPGA-ASIC Integration for Persistent RNNs.
Proceedings of the 27th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2019

2018
CoolPIM: Thermal-Aware Source Throttling for Efficient PIM Instruction Offloading.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

A Customizable Matrix Multiplication Framework for the Intel HARPv2 Xeon+FPGA Platform: A Deep Learning Case Study.
Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2018

2017
GraphPIM: Enabling Instruction-Level PIM Offloading in Graph Computing Frameworks.
Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

High performance binary neural networks on the Xeon+FPGA™ platform.
Proceedings of the 27th International Conference on Field Programmable Logic and Applications, 2017

Can FPGAs Beat GPUs in Accelerating Next-Generation Deep Neural Networks?
Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017

2016
Accelerating Binarized Neural Networks: Comparison of FPGA, CPU, GPU, and ASIC.
Proceedings of the 2016 International Conference on Field-Programmable Technology, 2016

Accelerating recurrent neural networks in analytics servers: Comparison of FPGA, CPU, GPU, and ASIC.
Proceedings of the 26th International Conference on Field Programmable Logic and Applications, 2016

2015
BSSync: Processing Near Memory for Machine Learning Workloads with Bounded Staleness Consistency Models.
Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

2014
A Configurable and Strong RAS Solution for Die-Stacked DRAM Caches.
IEEE Micro, 2014

Transparent Hardware Management of Stacked DRAM as Part of Memory.
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

2013
Resilient die-stacked DRAM caches.
Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013

2012
A performance analysis framework for identifying potential benefits in GPGPU applications.
Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2012

A Mostly-Clean DRAM Cache for Effective Hit Speculation and Self-Balancing Dispatch.
Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012

FLEXclusion: Balancing cache capacity and on-chip bandwidth via Flexible Exclusion.
Proceedings of the 39th International Symposium on Computer Architecture (ISCA 2012), 2012

