2025

TLP Balancer: Predictive Thread Allocation for Multitenant Inference in Embedded GPUs.

Minseong Gil

Jaebeom Jeon

IEEE Embed. Syst. Lett., June, 2025

Beyond VABlock: Improving Transformer workloads through aggressive prefetching.

J. Syst. Archit., 2025

MOST: Memory Oversubscription-Aware Scheduling for Tensor Migration on GPU Unified Storage.

IEEE Comput. Archit. Lett., 2025

SSFFT: Energy-Efficient Selective Scaling for Fast Fourier Transform in Embedded GPUs.

Proceedings of the 26th ACM SIGPLAN/SIGBED International Conference on Languages, 2025

Hierarchical Traversal Stack Design Using Shared Memory for GPU Ray Tracing.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2025

SparsePIM: An Efficient HBM-Based PIM Architecture for Sparse Matrix-Vector Multiplications.

Proceedings of the 39th ACM International Conference on Supercomputing, 2025

HyMM: A Hybrid Sparse-Dense Matrix Multiplication Accelerator for GCNs.

Proceedings of the Design, Automation & Test in Europe Conference, 2025

2024

Adaptive Kernel Merge and Fusion for Multi-Tenant Inference in Embedded GPUs.

IEEE Embed. Syst. Lett., December, 2024

Conflict-aware compiler for hierarchical register file on GPUs.

J. Syst. Archit., 2024

SAVector: Vectored Systolic Arrays.

IEEE Access, 2024

Coldmap: Extending SSD Lifetime Exploiting Multi-Page Mapping Information.

Jaewon Seo

Gunjae Koo

Proceedings of the 13th Non-Volatile Memory Systems and Applications Symposium, 2024

VitBit: Enhancing Embedded GPU Performance for AI Workloads through Register Operand Packing.

Proceedings of the 53rd International Conference on Parallel Processing, 2024

2023

FLIXR: Embedding Index Into Flash Translation Layer in SSDs.

IEEE Trans. Computers, 2023

Vizard: Passing Over Profiling-Based Detection by Manipulating Performance Counters.

Minkyu Song

Taeweon Suh

Gunjae Koo

IEEE Access, 2023

Warped-MC: An Efficient Memory Controller Scheme for Massively Parallel Processors.

Proceedings of the 52nd International Conference on Parallel Processing, 2023

2022

GhostLeg: Selective Memory Coalescing for Secure GPU Architecture.

IEEE Access, 2022

Analyzing GCN Aggregation on GPU.

IEEE Access, 2022

Restore Buffer Overflow Attacks: Breaking Undo-Based Defense Schemes.

Jongmin Lee

Gunjae Koo

Proceedings of the International Conference on Information Networking, 2022

CacheRewinder: Revoking Speculative Cache Updates Exploiting Write-Back Buffer.

Proceedings of the 2022 Design, Automation & Test in Europe Conference & Exhibition, 2022

Stealth ECC: A Data-Width Aware Adaptive ECC Scheme for DRAM Error Resilience.

Proceedings of the 2022 Design, Automation & Test in Europe Conference & Exhibition, 2022

2020

Hi-End: Hierarchical, Endurance-Aware STT-MRAM-Based Register File for Energy-Efficient GPUs.

IEEE Access, 2020

2019

Linebacker: preserving victim cache lines in idle register files of GPUs.

Proceedings of the 46th International Symposium on Computer Architecture, 2019

GraphSSD: graph semantics aware SSD.

Proceedings of the 46th International Symposium on Computer Architecture, 2019

2018

CTA-Aware Prefetching and Scheduling for GPU.

Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

2017

Improving Energy Efficiency of GPUs through Data Compression and Compressed Execution.

IEEE Trans. Computers, 2017

Summarizer: trading communication with computing near storage.

Gunjae Koo

Kiran Kumar Matam

Te I

H. V. Krishna Giri Narra

Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

Access Pattern-Aware Cache Management for Improving Data Utilization in GPU.

Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

2016

Warped-preexecution: A GPU pre-execution approach for improving latency hiding.

Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

2015

Warped-compression: enabling power efficient GPUs through register compression.

Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

Revealing Critical Loads and Hidden Data Locality in GPGPU Applications.

Gunjae Koo

Hyeran Jeon

Murali Annavaram

Proceedings of the 2015 IEEE International Symposium on Workload Characterization, 2015

2006

A robust PRML read channel with digital timing recovery for multi-format optical disc.

Gunjae Koo

Woochul Jung

Heesub Lee

Proceedings of the International Symposium on Circuits and Systems (ISCAS 2006), 2006