Jaewoong Sim

ORCID: 0000-0002-0403-9928

According to our database, Jaewoong Sim authored at least 31 papers between 2012 and 2024.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2024

CuPBoP: Making CUDA a Portable Language.

ACM Trans. Design Autom. Electr. Syst., 2024

InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management.

Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation, 2024

Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization.

Jungi Lee, Wonbeom Lee, Jaewoong Sim

Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

MoNDE: Mixture of Near-Data Experts for Large-Scale Sparse Models.

Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

GSCore: Efficient Radiance Field Rendering via Architectural Support for 3D Gaussian Splatting.

Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023

CuPBoP: A Framework to Make CUDA Portable.

Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2023

NeuRex: A Case for Neural Rendering Acceleration.

Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

SDM: Sharing-Enabled Disaggregated Memory System with Cache Coherent Compute Express Link.

Proceedings of the 32nd International Conference on Parallel Architectures and Compilation Techniques, 2023

2022

COX: Exposing CUDA Warp-level Functions to CPUs.

ACM Trans. Archit. Code Optim., 2022

CuPBoP: CUDA for Parallelized and Broad-range Processors.

CoRR, 2022

2021

Specializing FGPU for Persistent Deep Learning.

ACM Trans. Reconfigurable Technol. Syst., 2021

COX: CUDA on X86 by Exposing Warp-Level Functions to CPUs.

CoRR, 2021

Supporting CUDA for an extended RISC-V GPU architecture.

CoRR, 2021

2020

Batch-Aware Unified Memory Management in GPUs for Irregular Workloads.

Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020

2019

Thermal-aware processing-in-memory instruction offloading.

J. Parallel Distributed Comput., 2019

Evaluating and Enhancing Intel® Stratix® 10 FPGAs for Persistent Real-Time AI.

Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

Why Compete When You Can Work Together: FPGA-ASIC Integration for Persistent RNNs.

Proceedings of the 27th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2019

2018

CoolPIM: Thermal-Aware Source Throttling for Efficient PIM Instruction Offloading.

Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

A Customizable Matrix Multiplication Framework for the Intel HARPv2 Xeon+FPGA Platform: A Deep Learning Case Study.

Philip Heng Wai Leong

Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2018

2017

GraphPIM: Enabling Instruction-Level PIM Offloading in Graph Computing Frameworks.

Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

High performance binary neural networks on the Xeon+FPGA™ platform.

Philip Heng Wai Leong

Proceedings of the 27th International Conference on Field Programmable Logic and Applications, 2017

Can FPGAs Beat GPUs in Accelerating Next-Generation Deep Neural Networks?

Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017

2016

Accelerating Binarized Neural Networks: Comparison of FPGA, CPU, GPU, and ASIC.

Proceedings of the 2016 International Conference on Field-Programmable Technology, 2016

Accelerating recurrent neural networks in analytics servers: Comparison of FPGA, CPU, GPU, and ASIC.

Proceedings of the 26th International Conference on Field Programmable Logic and Applications, 2016

2015

BSSync: Processing Near Memory for Machine Learning Workloads with Bounded Staleness Consistency Models.

Joo Hwan Lee, Jaewoong Sim, Hyesoon Kim

Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

2014

A Configurable and Strong RAS Solution for Die-Stacked DRAM Caches.

IEEE Micro, 2014

Transparent Hardware Management of Stacked DRAM as Part of Memory.

Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

2013

Resilient die-stacked DRAM caches.

Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013

2012

A performance analysis framework for identifying potential benefits in GPGPU applications.

Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2012

A Mostly-Clean DRAM Cache for Effective Hit Speculation and Self-Balancing Dispatch.

Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012

FLEXclusion: Balancing cache capacity and on-chip bandwidth via Flexible Exclusion.

Proceedings of the 39th International Symposium on Computer Architecture (ISCA 2012), 2012
