Hyesoon Kim

Orcid: 0000-0002-6061-7825

According to our database, Hyesoon Kim authored at least 164 papers between 2004 and 2024.

Bibliography

2024
CuPBoP: Making CUDA a Portable Language.
ACM Trans. Design Autom. Electr. Syst., 2024

Quantifying CO₂ Emission Reduction Through Spatial Partitioning in Deep Learning Recommendation System Workloads.
IEEE Micro, 2024

Hydro: Adaptive Query Processing of ML Queries.
CoRR, 2024

Unleashing CPU Potential for Executing GPU Programs Through Compiler/Runtime Optimizations.
Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024

Barre Chord: Efficient Virtual Memory Translation for Multi-Chip-Module GPUs.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

Comparative Analysis of Executing GPU Applications on FPGA: HLS vs. Soft GPU Approaches.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

Understanding Performance Implications of LLM Inference on CPUs.
Proceedings of the IEEE International Symposium on Workload Characterization, 2024

Towards "True" GPU Performance Scaling for OpenGPU.
Proceedings of the 36th IEEE Hot Chips Symposium, 2024

Enabling Fine-Grained Incremental Builds by Making Compiler Stateful.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2024

Exponentially Expanding the Phase-Ordering Search Space via Dormant Information.
Proceedings of the 33rd ACM SIGPLAN International Conference on Compiler Construction, 2024

2023
GPU Database Systems Characterization and Optimization.
Proc. VLDB Endow., November, 2023

RV-CURE: A RISC-V Capability Architecture for Full Memory Safety.
CoRR, 2023

Revisiting Query Performance in GPU Database Systems.
CoRR, 2023

Hardware-Assisted Code-Pointer Tagging for Forward-Edge Control-Flow Integrity.
IEEE Comput. Archit. Lett., 2023

Mitigating Timing-Based NoC Side-Channel Attacks With LLC Remapping.
IEEE Comput. Archit. Lett., 2023

Unified Co-Simulation Framework for Autonomous UAVs.
Proceedings of the Practice and Experience in Advanced Research Computing, 2023

CuPBoP-AMD: Extending CUDA to AMD Platforms.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

CuPBoP: A Framework to Make CUDA Portable.
Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2023

Extending the Life of Old Systems with More Memory.
Proceedings of the International Symposium on Memory Systems, 2023

EHT-SR: An Entropy-Based Hybrid Approach for Faster Super-Resolution.
Proceedings of the IEEE International Symposium on Multimedia, 2023

Traversing Large Compressed Graphs on GPUs.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

VEGETA: Vertically-Integrated Extensions for Sparse/Dense GEMM Tile Acceleration on CPUs.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

Spica: Exploring FPGA Optimizations to Enable an Efficient SpMV Implementation for Computations at Edge.
Proceedings of the IEEE International Conference on Edge Computing and Communications, 2023

Context-Aware Task Handling in Resource-Constrained Robots with Virtualization.
Proceedings of the IEEE International Conference on Edge Computing and Communications, 2023

Reducing Inference Latency with Concurrent Architectures for Image Recognition at Edge.
Proceedings of the IEEE International Conference on Edge Computing and Communications, 2023

Creating Robust Deep Neural Networks with Coded Distributed Computing for IoT.
Proceedings of the IEEE International Conference on Edge Computing and Communications, 2023

Skybox: Open-Source Graphic Rendering on Programmable RISC-V GPUs.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2022
COX: Exposing CUDA Warp-level Functions to CPUs.
ACM Trans. Archit. Code Optim., 2022

CuPBoP: CUDA for Parallelized and Broad-range Processors.
CoRR, 2022

FiGO: Fine-Grained Query Optimization in Video Analytics.
Proceedings of the SIGMOD '22: International Conference on Management of Data, Philadelphia, PA, USA, June 12, 2022

Securing GPU via region-based bounds checking.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

Accelerating Graphic Rendering on Programmable RISC-V GPUs.
Proceedings of the 2022 IEEE Hot Chips 34 Symposium, 2022

Maia: Matrix Inversion Acceleration Near Memory.
Proceedings of the 32nd International Conference on Field-Programmable Logic and Applications, 2022

2021
Efficiently Solving Partial Differential Equations in a Partially Reconfigurable Specialized Hardware.
IEEE Trans. Computers, 2021

COX: CUDA on X86 by Exposing Warp-Level Functions to CPUs.
CoRR, 2021

Vortex: Extending the RISC-V ISA for GPGPU and 3D-Graphics Research.
CoRR, 2021

Supporting CUDA for an extended RISC-V GPU architecture.
CoRR, 2021

Creating Robust Deep Neural Networks With Coded Distributed Computing for IoT Systems.
CoRR, 2021

THIA: Accelerating Video Analytics using Early Inference and Fine-Grained Query Planning.
CoRR, 2021

SmaQ: Smart Quantization for DNN Training by Exploiting Value Clustering.
IEEE Comput. Archit. Lett., 2021

Vortex: Extending the RISC-V ISA for GPGPU and 3D-Graphics.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

Copernicus: Characterizing the Performance Implications of Compression Formats Used in Sparse Workloads.
Proceedings of the IEEE International Symposium on Workload Characterization, 2021

FAFNIR: Accelerating Sparse Gathering by Using Efficient Near-Memory Intelligent Reduction.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

RASA: Efficient Register-Aware Systolic Array Matrix Engine for CPU.
Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021

Quantifying the design-space tradeoffs in autonomous drones.
Proceedings of the ASPLOS '21: 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2021

2020
Traversing Large Graphs on GPUs with Unified Memory.
Proc. VLDB Endow., 2020

The 2019 Top Picks in Computer Architecture.
IEEE Micro, 2020

Toward Collaborative Inferencing of Deep Neural Networks on Internet-of-Things Devices.
IEEE Internet Things J., 2020

Copernicus: Characterizing the Performance Implications of Compression Formats Used in Sparse Workloads.
CoRR, 2020

Secure Location-Aware Authentication and Communication for Intelligent Transportation Systems.
CoRR, 2020

Reducing Inference Latency with Concurrent Architectures for Image Recognition.
CoRR, 2020

Edge-Tailored Perception: Fast Inferencing in-the-Edge with Efficient Model Distribution.
CoRR, 2020

Vortex: OpenCL Compatible RISC-V GPGPU.
CoRR, 2020

Hardware-based Always-On Heap Memory Safety.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

Parallel Hash Table Design for NDP Systems.
Proceedings of the MEMSYS 2020: The International Symposium on Memory Systems, 2020

Things to Consider to Enable Dynamic Graphs in Processing-in-Memory.
Proceedings of the MEMSYS 2020: The International Symposium on Memory Systems, 2020

Neural Network Weight Compression with NNW-BDI.
Proceedings of the MEMSYS 2020: The International Symposium on Memory Systems, 2020

Understanding the Software and Hardware Stacks of a General-Purpose Cognitive Drone.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2020

MEISSA: Multiplying Matrices Efficiently in a Scalable Systolic Architecture.
Proceedings of the 38th IEEE International Conference on Computer Design, 2020

ALRESCHA: A Lightweight Reconfigurable Sparse-Computation Accelerator.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020

RISC-V FPGA Platform Toward ROS-Based Robotics Application.
Proceedings of the 30th International Conference on Field-Programmable Logic and Applications, 2020

Productive Hardware Designs using Hybrid HLS-RTL Development.
Proceedings of the FPGA '20: The 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2020

Cash: A Single-Source Hardware-Software Codesign Framework for Rapid Prototyping.
Proceedings of the FPGA '20: The 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2020

Proposing a Fast and Scalable Systolic Array for Matrix Multiplication.
Proceedings of the 28th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2020

Tango: An Optimizing Compiler for Just-In-Time RTL Simulation.
Proceedings of the 2020 Design, Automation & Test in Europe Conference & Exhibition, 2020

ASCELLA: Accelerating Sparse Computation by Enabling Stream Accesses to Memory.
Proceedings of the 2020 Design, Automation & Test in Europe Conference & Exhibition, 2020

PISCES: Power-Aware Implementation of SLAM by Customizing Efficient Sparse Algebra.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

Batch-Aware Unified Memory Management in GPUs for Irregular Workloads.
Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020

2019
ERIDANUS: Efficiently Running Inference of DNNs Using Systolic Arrays.
IEEE Micro, 2019

Thermal-aware processing-in-memory instruction offloading.
J. Parallel Distributed Comput., 2019

A Case Study: Exploiting Neural Machine Translation to Translate CUDA to OpenCL.
CoRR, 2019

Collaborative Execution of Deep Neural Networks on Internet of Things Devices.
CoRR, 2019

Characterizing the Execution of Deep Neural Networks on Collaborative Robots and Edge Devices.
Proceedings of the Practice and Experience in Advanced Research Computing on Rise of the Machines (learning), 2019

Empirical Investigation of Stale Value Tolerance on Parallel RNN Training.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2019

Characterizing the Deployment of Deep Neural Networks on Commercial Edge Devices.
Proceedings of the IEEE International Symposium on Workload Characterization, 2019

Capella: Customizing Perception for Edge Devices by Efficiently Allocating FPGAs to DNNs.
Proceedings of the 29th International Conference on Field Programmable Logic and Applications, 2019

FlashGPU: Placing New Flash Next to GPU Cores.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

Robustly Executing DNNs in IoT Systems Using Coded Distributed Computing.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

LODESTAR: Creating Locally-Dense CNNs for Efficient Inference on Systolic Arrays.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

Video analytics from edge to server: work-in-progress.
Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis Companion, 2019

Translating CUDA to OpenCL for Hardware Generation using Neural Machine Translation.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2019

POSTER: Tango: An Optimizing Compiler for Just-In-Time RTL Simulation.
Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques, 2019

2018
StaleLearn: Learning Acceleration with Asynchronous Synchronization Between Model Replicas on PIM.
IEEE Trans. Computers, 2018

CODA: Enabling Co-location of Computation and Data for Multiple GPU Systems.
ACM Trans. Archit. Code Optim., 2018

Distributed Perception by Collaborative Robots.
IEEE Robotics Autom. Lett., 2018

Musical Chair: Efficient Real-Time Recognition Using Collaborative IoT Devices.
CoRR, 2018

Performance Implications of NoCs on 3D-Stacked Memories: Insights from the Hybrid Memory Cube.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2018

Performance Characterisation and Simulation of Intel's Integrated GPU Architecture.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2018

CoolPIM: Thermal-Aware Source Throttling for Efficient PIM Instruction Offloading.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

Real-Time Image Recognition Using Collaborative IoT Devices.
Proceedings of the 1st on Reproducible Quality-Efficient Systems Tournament on Co-designing Pareto-efficient Deep Learning, 2018

2017
CAIRO: A Compiler-Assisted Technique for Enabling Instruction-Level Offloading of Processing-In-Memory.
ACM Trans. Archit. Code Optim., 2017

Exploring big graph computing - An empirical study from architectural perspective.
J. Parallel Distributed Comput., 2017

Louvre: Light-weight Ordering Using Versioning for Release Consistency.
CoRR, 2017

CODA: Enabling Co-location of Computation and Data for Near-Data Processing.
CoRR, 2017

Inferring Fine-grained Control Flow Inside SGX Enclaves with Branch Shadowing.
Proceedings of the 26th USENIX Security Symposium, 2017

Lightweight SIMT core designs for intelligent 3D stacked DRAM.
Proceedings of the International Symposium on Memory Systems, 2017

SimProf: A Sampling Framework for Data Analytic Workloads.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Demystifying the characteristics of 3D-stacked memories: A case study for Hybrid Memory Cube.
Proceedings of the 2017 IEEE International Symposium on Workload Characterization, 2017

GraphPIM: Enabling Instruction-Level PIM Offloading in Graph Computing Frameworks.
Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

2016
On the Internet of Things.
IEEE Micro, 2016

Analyzing Consistency Issues in HMC Atomics.
Proceedings of the Second International Symposium on Memory Systems, 2016

2015
GREEN Cache: Exploiting the Disciplined Memory Model of OpenCL on GPUs.
IEEE Trans. Computers, 2015

Block-Precise Processors: Low-Power Processors with Reduced Operand Store Accesses and Result Broadcasts.
IEEE Trans. Computers, 2015

OpenCL Performance Evaluation on Modern Multicore CPUs.
Sci. Program., 2015

SP-CNN: A Scalable and Programmable CNN-Based Accelerator.
IEEE Micro, 2015

Accelerating Application Start-up with Nonvolatile Memory in Android Systems.
IEEE Micro, 2015

Hardware Support for Safe Execution of Native Client Applications.
IEEE Comput. Archit. Lett., 2015

GraphBIG: understanding graph computing in the context of industrial solutions.
Proceedings of the International Conference for High Performance Computing, 2015

Instruction Offloading with HMC 2.0 Standard: A Case Study for Graph Traversals.
Proceedings of the 2015 International Symposium on Memory Systems, 2015

Understanding Energy Aspects of Processing-near-Memory for HPC Workloads.
Proceedings of the 2015 International Symposium on Memory Systems, 2015

SIMT-based Logic Layers for Stacked DRAM Architectures: A Prototype.
Proceedings of the 2015 International Symposium on Memory Systems, 2015

BSSync: Processing Near Memory for Machine Learning Workloads with Bounded Staleness Consistency Models.
Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

2014
Power Modeling for GPU Architectures Using McPAT.
ACM Trans. Design Autom. Electr. Syst., 2014

Transparent Hardware Management of Stacked DRAM as Part of Memory.
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

GPUMech: GPU Performance Modeling Technique Based on Interval Analysis.
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

TBPoint: Reducing Simulation Time for Large-Scale GPGPU Kernels.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Spare register aware prefetching for graph algorithms on GPUs.
Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture, 2014

Harmonica: An FPGA-Based Data Parallel Soft Core.
Proceedings of the 22nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2014

2013
Adaptive virtual channel partitioning for network-on-chip in heterogeneous architectures.
ACM Trans. Design Autom. Electr. Syst., 2013

SD3: An Efficient Dynamic Data-Dependence Profiling Mechanism.
IEEE Trans. Computers, 2013

Design space exploration of on-chip ring interconnection for a CPU-GPU heterogeneous architecture.
J. Parallel Distributed Comput., 2013

SESH Framework: A Space Exploration Framework for GPU Application and Hardware Codesign.
Proceedings of the High Performance Computing Systems. Performance Modeling, Benchmarking and Simulation, 2013

OpenCL Performance Evaluation on Modern Multi Core CPUs.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

CHiP: A Profiler to Measure the Effect of Cache Contention on Scalability.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

2012
Performance Analysis and Tuning for General Purpose Graphics Processing Units (GPGPU).
Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, ISBN: 978-3-031-01737-7, 2012

When Prefetching Works, When It Doesn't, and Why.
ACM Trans. Archit. Code Optim., 2012

DRAM Scheduling Policy for GPGPU Architectures Based on a Potential Function.
IEEE Comput. Archit. Lett., 2012

A performance analysis framework for identifying potential benefits in GPGPU applications.
Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2012

Design space exploration of memory model for heterogeneous computing.
Proceedings of the 2012 ACM SIGPLAN workshop on Memory Systems Performance and Correctness: held in conjunction with PLDI '12, 2012

Supporting virtual memory in GPGPU without supporting precise exceptions.
Proceedings of the 2012 ACM SIGPLAN workshop on Memory Systems Performance and Correctness: held in conjunction with PLDI '12, 2012

A Mostly-Clean DRAM Cache for Effective Hit Speculation and Self-Balancing Dispatch.
Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012

FLEXclusion: Balancing cache capacity and on-chip bandwidth via Flexible Exclusion.
Proceedings of the 39th International Symposium on Computer Architecture (ISCA 2012), 2012

Predicting Potential Speedup of Serial Code via Lightweight Profiling and Emulations with Memory Performance Model.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

TAP: A TLP-aware cache management policy for a CPU-GPU heterogeneous architecture.
Proceedings of the 18th IEEE International Symposium on High Performance Computer Architecture, 2012

2010
Many-Thread Aware Prefetching Mechanisms for GPGPU Applications.
Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

SD3: A Scalable Approach to Dynamic Data-Dependence Profiling.
Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

An integrated GPU power and performance model.
Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010

Design space exploration of the turbo decoding algorithm on GPUs.
Proceedings of the 2010 International Conference on Compilers, 2010

2009
Virtual Program Counter (VPC) Prediction: Very Low Cost Indirect Branch Prediction Using Conditional Branch Prediction Hardware.
IEEE Trans. Computers, 2009

Age based scheduling for asymmetric multiprocessors.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping.
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness.
Proceedings of the 36th International Symposium on Computer Architecture (ISCA 2009), 2009

2008
Dynamic Predication of Indirect Jumps.
IEEE Comput. Archit. Lett., 2008

Understanding performance, power and energy behavior in asymmetric multiprocessors.
Proceedings of the 26th International Conference on Computer Design, 2008

Performance-aware speculation control using wrong path usefulness prediction.
Proceedings of the 14th International Conference on High-Performance Computer Architecture (HPCA-14 2008), 2008

Improving the performance of object-oriented languages with dynamic predication of indirect jumps.
Proceedings of the 13th International Conference on Architectural Support for Programming Languages and Operating Systems, 2008

2007
Diverge-Merge Processor: Generalized and Energy-Efficient Dynamic Predication.
IEEE Micro, 2007

VPC prediction: reducing the cost of indirect branches via hardware-based dynamic devirtualization.
Proceedings of the 34th International Symposium on Computer Architecture (ISCA 2007), 2007

Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers.
Proceedings of the 13th International Conference on High-Performance Computer Architecture (HPCA-13 2007), 2007

Profile-assisted Compiler Support for Dynamic Predication in Diverge-Merge Processors.
Proceedings of the Fifth International Symposium on Code Generation and Optimization (CGO 2007), 2007

2006
Address-Value Delta (AVD) Prediction: A Hardware Technique for Efficiently Parallelizing Dependent Cache Misses.
IEEE Trans. Computers, 2006

Efficient Runahead Execution: Power-Efficient Memory Latency Tolerance.
IEEE Micro, 2006

Wish Branches: Enabling Adaptive and Aggressive Predicated Execution.
IEEE Micro, 2006

Diverge-Merge Processor (DMP): Dynamic Predicated Execution of Complex Control-Flow Graphs Based on Frequently Executed Paths.
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-39 2006), 2006

2D-Profiling: Detecting Input-Dependent Branches with a Single Input Data Set.
Proceedings of the Fourth IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2006), 2006

2005
An Analysis of the Performance Impact of Wrong-Path Memory References on Out-of-Order and Runahead Execution Processors.
IEEE Trans. Computers, 2005

Using the First-Level Caches as Filters to Reduce the Pollution Caused by Speculative Memory References.
Int. J. Parallel Program., 2005

On Reusing the Results of Pre-Executed Instructions in a Runahead Execution Processor.
IEEE Comput. Archit. Lett., 2005

Address-Value Delta (AVD) Prediction: Increasing the Effectiveness of Runahead Execution by Exploiting Regular Memory Allocation Patterns.
Proceedings of the 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-38 2005), 2005

Wish Branches: Combining Conditional Branching and Predication for Adaptive Predicated Execution.
Proceedings of the 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-38 2005), 2005

Techniques for Efficient Processing in Runahead Execution Engines.
Proceedings of the 32nd International Symposium on Computer Architecture (ISCA 2005), 2005

2004
Understanding the effects of wrong-path memory references on processor performance.
Proceedings of the 3rd Workshop on Memory Performance Issues, 2004

Cache Filtering Techniques to Reduce the Negative Impact of Useless Speculative Memory References on Processor Performance.
Proceedings of the 16th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2004), 2004

Wrong Path Events: Exploiting Unusual and Illegal Program Behavior for Early Misprediction Detection and Recovery.
Proceedings of the 37th Annual International Symposium on Microarchitecture (MICRO-37 2004), 2004

