Jung Ho Ahn

ORCID: 0000-0003-1733-1394

Affiliations:
  • Seoul National University, Korea
  • Stanford University, USA (PhD, 2007)


According to our database, Jung Ho Ahn authored at least 123 papers between 2003 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
GraNDe: Efficient Near-Data Processing Architecture for Graph Neural Networks.
IEEE Trans. Computers, October, 2024

Duplex: A Device for Large Language Models with Mixture of Experts, Grouped Query Attention, and Continuous Batching.
CoRR, 2024

Cheddar: A Swift Fully Homomorphic Encryption Library for CUDA GPUs.
CoRR, 2024

HyPHEN: A Hybrid Packing Method and Its Optimizations for Homomorphic Encryption-Based Neural Networks.
IEEE Access, 2024

Native DRAM Cache: Re-architecting DRAM as a Large-Scale Cache for Data Centers.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

DRAMScope: Uncovering DRAM Microarchitecture and Characteristics by Issuing Memory Commands.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

CLAY: CXL-based Scalable NDP Architecture Accelerating Embedding Layers.
Proceedings of the 38th ACM International Conference on Supercomputing, 2024

IDT: Intelligent Data Placement for Multi-tiered Main Memory with Reinforcement Learning.
Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing, 2024

An LPDDR-based CXL-PNM Platform for TCO-efficient Inference of Transformer-based Large Language Models.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024

TAROT: A CXL SmartNIC-Based Defense Against Multi-bit Errors by Row-Hammer Attacks.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

AttAcc! Unleashing the Power of PIM for Batched Transformer-based Generative Model Inference.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023
MaPHeA: A Framework for Lightweight Memory Hierarchy-aware Profile-guided Heap Allocation.
ACM Trans. Embed. Comput. Syst., 2023

High-precision RNS-CKKS on fixed but smaller word-size architectures: theory and application.
IACR Cryptol. ePrint Arch., 2023

NeuJeans: Private Neural Network Inference with Joint Optimization of Convolution and Bootstrapping.
CoRR, 2023

Toward Practical Privacy-Preserving Convolutional Neural Networks Exploiting Fully Homomorphic Encryption.
CoRR, 2023

CiFHER: A Chiplet-Based FHE Accelerator with a Resizable Structure.
CoRR, 2023

RETROSPECTIVE: Corona: System Implications of Emerging Nanophotonic Technology.
CoRR, 2023

HyPHEN: A Hybrid Packing Method and Optimizations for Homomorphic Encryption-Based Neural Networks.
CoRR, 2023

X-ray: Discovering DRAM Internal Structure and Error Characteristics by Issuing Memory Commands.
IEEE Comput. Archit. Lett., 2023

ADT: Aggressive Demotion and Promotion for Tiered Memory.
IEEE Comput. Archit. Lett., 2023

A Hardware-Friendly Tiled Singular-Value Decomposition-Based Matrix Multiplication for Transformer-Based Models.
IEEE Comput. Archit. Lett., 2023

Unleashing the Potential of PIM: Accelerating Large Batched Inference of Transformer-Based Generative Models.
IEEE Comput. Archit. Lett., 2023

Demystifying CXL Memory with Genuine CXL-Ready Systems and Devices.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

How to Kill the Second Bird with One ECC: The Pursuit of Row Hammer Resilient DRAM.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

SHARP: A Short-Word Hierarchical Accelerator for Robust and Practical Fully Homomorphic Encryption.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

SHADOW: Preventing Row Hammer in DRAM with Intra-Subarray Row Shuffling.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

2022
MVP: An Efficient CNN Accelerator with Matrix, Vector, and Processing-Near-Memory Units.
ACM Trans. Design Autom. Electr. Syst., 2022

Future Scaling of Memory Hierarchy for Tensor Cores and Eliminating Redundant Shared Memory Traffic Using Inter-Warp Multicasting.
IEEE Trans. Computers, 2022

AESPA: Accuracy Preserving Low-degree Polynomial Activation for Fast Private Inference.
CoRR, 2022

GraNDe: Near-Data Processing Architecture With Adaptive Matrix Mapping for Graph Convolutional Networks.
IEEE Comput. Archit. Lett., 2022

ARK: Fully Homomorphic Encryption Accelerator with Runtime Data Generation and Inter-Operation Key Reuse.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

BTS: an accelerator for bootstrappable fully homomorphic encryption.
Proceedings of the 49th Annual International Symposium on Computer Architecture (ISCA '22), New York, New York, USA, June 18, 2022

A Slice and Dice Approach to Accelerate Compound Sparse Attention on GPU.
Proceedings of the IEEE International Symposium on Workload Characterization, 2022

Accelerating Transformer Networks through Recomposing Softmax Layers.
Proceedings of the IEEE International Symposium on Workload Characterization, 2022

Mithril: Cooperative Row Hammer Protection on Commodity DRAM Leveraging Managed Refresh.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

2021
Over 100x Faster Bootstrapping in Fully Homomorphic Encryption through Memory-centric Optimization with GPUs.
IACR Trans. Cryptogr. Hardw. Embed. Syst., 2021

TRiM: Tensor Reduction in Memory.
IEEE Comput. Archit. Lett., 2021

Row-Streaming Dataflow Using a Chaining Buffer and Systolic Array+ Structure.
IEEE Comput. Archit. Lett., 2021

Accelerating Fully Homomorphic Encryption Through Architecture-Centric Analysis and Optimization.
IEEE Access, 2021

TRiM: Enhancing Processor-Memory Interfaces with Scalable Tensor Reduction in Memory.
Proceedings of the 54th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO '21), 2021

MaPHeA: a lightweight memory hierarchy-aware profile-guided heap allocation framework.
Proceedings of the 22nd ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES '21), 2021

Accelerating Fully Homomorphic Encryption Through Microarchitecture-Aware Analysis and Optimization.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2021

BCD deduplication: effective memory compression using partial cache-line deduplication.
Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '21), 2021

2020
MViD: Sparse Matrix-Vector Multiplication in Mobile DRAM for Accelerating Recurrent Neural Networks.
IEEE Trans. Computers, 2020

HEAAN Demystified: Accelerating Fully Homomorphic Encryption Through Architecture-centric Analysis and Optimization.
CoRR, 2020

CAT-TWO: Counter-Based Adaptive Tree, Time Window Optimized for DRAM Row-Hammer Prevention.
IEEE Access, 2020

Graphene: Strong yet Lightweight Row Hammer Protection.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

Accelerating Number Theoretic Transformations for Bootstrappable Homomorphic Encryption on GPUs.
Proceedings of the IEEE International Symposium on Workload Characterization, 2020

2019
Restructuring Batch Normalization to Accelerate CNN Training.
Proceedings of the Second Conference on Machine Learning and Systems, SysML 2019, 2019

TWiCe: preventing row-hammering by exploiting time window counters.
Proceedings of the 46th International Symposium on Computer Architecture, 2019

Enforcing Last-Level Cache Partitioning through Memory Virtual Channels.
Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques, 2019

2018
TWiCe: Time Window Counter Based Row Refresh to Prevent Row-Hammering.
IEEE Comput. Archit. Lett., 2018

Partitioning Compute Units in CNN Acceleration for Statistical Memory Traffic Shaping.
IEEE Comput. Archit. Lett., 2018

Leveraging Power-Performance Relationship of Energy-Efficient Modern DRAM Devices.
IEEE Access, 2018

Memory Hierarchy for Web Search.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

3D-Xpath: high-density managed DRAM architecture with cost-effective alternative paths for memory transactions.
Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, 2018

2017
Excavating the Hidden Parallelism Inside DRAM Architectures With Buffered Compares.
IEEE Trans. Very Large Scale Integr. Syst., 2017

Selective DRAM cache bypassing for improving bandwidth on DRAM/NVM hybrid main memory systems.
IEICE Electron. Express, 2017

Evaluation of Performance Unfairness in NUMA System Architecture.
IEEE Comput. Archit. Lett., 2017

SALAD: Achieving Symmetric Access Latency with Asymmetric DRAM Architecture.
IEEE Comput. Archit. Lett., 2017

Understanding power-performance relationship of energy-efficient modern DRAM devices.
Proceedings of the 2017 IEEE International Symposium on Workload Characterization, 2017

Work as a team or individual: Characterizing the system-level impacts of main memory partitioning.
Proceedings of the 2017 IEEE International Symposium on Workload Characterization, 2017

SOUP-N-SALAD: Allocation-Oblivious Access Latency Reduction with Asymmetric DRAM Microarchitectures.
Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

Defect Analysis and Cost-Effective Resilience Architecture for Future DRAM Devices.
Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

History-Based Arbitration for Fairness in Processor-Interconnect of NUMA Servers.
Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017

2016
Full-Stack Architecting to Achieve a Billion-Requests-Per-Second Throughput on a Single Key-Value Store Server Platform.
ACM Trans. Comput. Syst., 2016

Near-DRAM Acceleration with Single-ISA Heterogeneous Processing in Standard Memory Modules.
IEEE Micro, 2016

Achieving One Billion Key-Value Requests per Second on a Single Server.
IEEE Micro, 2016

Exploring new features of high-bandwidth memory for GPUs.
IEICE Electron. Express, 2016

Large Pages on Steroids: Small Ideas to Accelerate Big Memory Applications.
IEEE Comput. Archit. Lett., 2016

Chameleon: Versatile and practical near-DRAM acceleration architecture for large memory systems.
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

Adaptive and flexible key-value stores through soft data partitioning.
Proceedings of the 34th IEEE International Conference on Computer Design, 2016

Buffered compares: Excavating the hidden parallelism inside DRAM architectures with lightweight logic.
Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition, 2016

Accelerating Linked-list Traversal Through Near-Data Processing.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015
CIDR: A Cache Inspired Area-Efficient DRAM Resilience Architecture against Permanent Faults.
IEEE Comput. Archit. Lett., 2015

DRAMA: An Architecture for Accelerated Processing Near Memory.
IEEE Comput. Archit. Lett., 2015

Architecting to achieve a billion requests per second throughput on a single key-value store server platform.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

History-Assisted Adaptive-Granularity Caches (HAAG$) for High Performance 3D DRAM Architectures.
Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

Alloy: Parallel-serial memory channel architecture for single-chip heterogeneous processor systems.
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

CiDRA: A cache-inspired DRAM resilience architecture.
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

NDA: Near-DRAM acceleration architecture leveraging commodity DRAM devices and standard memory modules.
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

2014
Microbank: Architecting Through-Silicon Interposer-Based Main Memory Systems.
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2014

Row-buffer decoupling: A case for low-latency DRAM microarchitecture.
Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

2013
Exploiting Replicated Cache Blocks to Reduce L2 Cache Leakage in CMPs.
IEEE Trans. Very Large Scale Integr. Syst., 2013

MAEPER: Matching Access and Error Patterns With Error-Free Resource for Low Vcc L1 Cache.
IEEE Trans. Very Large Scale Integr. Syst., 2013

Mapping and Scheduling of Tasks and Communications on Many-Core SoC Under Local Memory Constraint.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2013

The McPAT Framework for Multicore and Manycore Architectures: Simultaneously Modeling Power, Area, and Timing.
ACM Trans. Archit. Code Optim., 2013

Scalable high-radix router microarchitecture using a network switch organization.
ACM Trans. Archit. Code Optim., 2013

McSimA+: A manycore simulator with application-level+ simulation and detailed microarchitecture modeling.
Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software, 2013

Reducing memory access latency with asymmetric DRAM bank organizations.
Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013

Dynamic bandwidth scaling for embedded DSPs with 3D-stacked DRAM and wide I/Os.
Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 2013

Memory-centric system interconnect design with Hybrid Memory Cubes.
Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 2013

2012
Improving System Energy Efficiency with Memory Rank Subsetting.
ACM Trans. Archit. Code Optim., 2012

Optical High Radix Switch Design.
IEEE Micro, 2012

MAGE: adaptive granularity and ECC for resilient and power efficient memory systems.
Proceedings of the SC Conference on High Performance Computing Networking, Storage and Analysis, 2012

Network within a network approach to create a scalable high-radix router microarchitecture.
Proceedings of the 18th IEEE International Symposium on High Performance Computer Architecture, 2012

CACTI-3DD: Architecture-level modeling for 3D die-stacked DRAM main memory.
Proceedings of the 2012 Design, Automation & Test in Europe Conference & Exhibition, 2012

2011
3D network-on-chip with wireless links through inductive coupling.
Proceedings of the International SoC Design Conference, 2011

The role of optics in future high radix switch design.
Proceedings of the 38th International Symposium on Computer Architecture (ISCA 2011), 2011

CACTI-P: Architecture-level modeling for SRAM-based structures with advanced leakage reduction techniques.
Proceedings of the 2011 IEEE/ACM International Conference on Computer-Aided Design, 2011

A quantitative analysis of performance benefits of 3D die stacking on mobile and embedded SoC.
Proceedings of the Design, Automation and Test in Europe, 2011

Matching cache access behavior and bit error pattern for high performance low Vcc L1 cache.
Proceedings of the 48th Design Automation Conference, 2011

CMOS Nanophotonics: Technology, System Implications, and a CMP Case Study.
Proceedings of the Low Power Networks-on-Chip, 2011

2010
Replication-aware leakage management in chip multiprocessors with private L2 cache.
Proceedings of the 2010 International Symposium on Low Power Electronics and Design, 2010

2009
How to simulate 1000 cores.
SIGARCH Comput. Archit. News, 2009

Multicore DIMM: an Energy Efficient Memory Module with Independently Controlled DRAMs.
IEEE Comput. Archit. Lett., 2009

Future scaling of processor-memory interfaces.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

HyperX: topology, routing, and packaging of efficient large-scale networks.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures.
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

2008
Corona: System Implications of Emerging Nanophotonic Technology.
Proceedings of the 35th International Symposium on Computer Architecture (ISCA 2008), 2008

A Comprehensive Memory Modeling Tool and Its Application to the Design and Analysis of Future Memory Hierarchies.
Proceedings of the 35th International Symposium on Computer Architecture (ISCA 2008), 2008

A Nanophotonic Interconnect for High-Performance Many-Core Computation.
Proceedings of the 16th Annual IEEE Symposium on High Performance Interconnects (HOTI 2008), 2008

2007
Executing irregular scientific applications on stream architectures.
Proceedings of the 21st Annual International Conference on Supercomputing, 2007

Tradeoff between data-, instruction-, and thread-level parallelism in stream processors.
Proceedings of the 21st Annual International Conference on Supercomputing, 2007

2006
Data parallel address architecture.
IEEE Comput. Archit. Lett., 2006

The design space of data-parallel memory systems.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

2005
Scatter-Add in Data Parallel Architectures.
Proceedings of the 11th International Conference on High-Performance Computer Architecture (HPCA-11 2005), 2005

2004
Stream Processors: Programmability and Efficiency.
ACM Queue, 2004

Analysis and Performance Results of a Molecular Modeling Application on Merrimac.
Proceedings of the ACM/IEEE SC2004 Conference on High Performance Networking and Computing, 2004

Evaluating the Imagine Stream Architecture.
Proceedings of the 31st International Symposium on Computer Architecture (ISCA 2004), 2004

Stream Register Files with Indexed Access.
Proceedings of the 10th International Conference on High-Performance Computer Architecture (HPCA-10 2004), 2004

2003
Programmable Stream Processors.
Computer, 2003

Merrimac: Supercomputing with Streams.
Proceedings of the ACM/IEEE SC2003 Conference on High Performance Networking and Computing, 2003

