Joo-Young Kim

ORCID: 0000-0003-1099-1496

Affiliations:
  • Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea
  • Microsoft Research, Redmond, WA, USA (since 2010)


According to our database, Joo-Young Kim authored at least 82 papers between 2006 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
A DVS-Enabled Distributed Digital LDO Providing Rapid Uniform Power Grid and Ripple Reduction Achieving 20.1-ps FOM in 28 nm CMOS.
IEEE Trans. Circuits Syst. I Regul. Pap., November, 2024

SP-PIM: A Super-Pipelined Processing-In-Memory Accelerator With Local Error Prediction for Area/Energy-Efficient On-Device Learning.
IEEE J. Solid State Circuits, August, 2024

EPU: An Energy-Efficient Explainable AI Accelerator With Sparsity-Free Computation and Heat Map Compression/Pruning.
IEEE J. Solid State Circuits, March, 2024

HURRY: Highly Utilized, Reconfigurable ReRAM-based In-situ Accelerator with Multifunctionality.
CoRR, 2024

SAL-PIM: A Subarray-level Processing-in-Memory Architecture with LUT-based Linear Interpolation for Transformer-based Text Generation.
CoRR, 2024

Trinity: In-Database Near-Data Machine Learning Acceleration Platform for Advanced Data Analytics.
IEEE Access, 2024

BLESS: Bandwidth and Locality Enhanced SMEM Seeding Acceleration for DNA Sequencing.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

Morphling: A Throughput-Maximized TFHE-based Accelerator using Transform-domain Reuse.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024

Picasso: An Area/Energy-Efficient End-to-End Diffusion Accelerator with Hyper-Precision Data Type.
Proceedings of the 36th IEEE Hot Chips Symposium, 2024

A 38.5TOPS/W Point Cloud Neural Network Processor with Virtual Pillar and Quadtree-based Workload Management for Real-Time Outdoor BEV Detection.
Proceedings of the IEEE Custom Integrated Circuits Conference, 2024

ACane: An Efficient FPGA-based Embedded Vision Platform with Accumulation-as-Convolution Packing for Autonomous Mobile Robots.
Proceedings of the 29th Asia and South Pacific Design Automation Conference, 2024

2023
Introduction to the Special Section on the 2022 Asian Solid-State Circuits Conference (A-SSCC).
IEEE J. Solid State Circuits, October, 2023

South Korea's Nationwide Effort for AI Semiconductor Industry.
Commun. ACM, July, 2023

T-PIM: An Energy-Efficient Processing-in-Memory Accelerator for End-to-End On-Device Training.
IEEE J. Solid State Circuits, March, 2023

Accelerating Deep Convolutional Neural Networks Using Number Theoretic Transform.
IEEE Trans. Circuits Syst. I Regul. Pap., January, 2023

Agamotto: A Performance Optimization Framework for CNN Accelerator With Row Stationary Dataflow.
IEEE Trans. Circuits Syst. I Regul. Pap., 2023

Accelerating Large-Scale Graph-Based Nearest Neighbor Search on a Computational Storage Platform.
IEEE Trans. Computers, 2023

Darwin: A DRAM-based Multi-level Processing-in-Memory Architecture for Data Analytics.
CoRR, 2023

SP-PIM: A 22.41TFLOPS/W, 8.81Epochs/Sec Super-Pipelined Processing-In-Memory Accelerator with Local Error Prediction for On-Device Learning.
Proceedings of the 2023 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits), 2023

Strix: An End-to-End Streaming Architecture with Two-Level Ciphertext Batching for Fully Homomorphic Encryption with Programmable Bootstrapping.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

PRIMO: A Full-Stack Processing-in-DRAM Emulation Framework for Machine Learning Workloads.
Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 2023

LightTrader: A Standalone High-Frequency Trading System with Deep Learning Inference Accelerators and Proactive Scheduler.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

JNPU: A 1.04TFLOPS Joint-DNN Training Processor with Speculative Cyclic Quantization and Triple Heterogeneity on Microarchitecture / Precision / Dataflow.
Proceedings of the 49th IEEE European Solid State Circuits Conference, 2023

A 26.55TOPS/W Explainable AI Processor with Dynamic Workload Allocation and Heat Map Compression/Pruning.
Proceedings of the IEEE Custom Integrated Circuits Conference, 2023

2022
An Overview of Processing-in-Memory Circuits for Artificial Intelligence and Machine Learning.
IEEE J. Emerg. Sel. Topics Circuits Syst., 2022

Guest Editorial Revolution of AI and Machine Learning With Processing-in-Memory (PIM): From Systems, Architectures, to Circuits.
IEEE J. Emerg. Sel. Topics Circuits Syst., 2022

Design of Processing-in-Memory With Triple Computational Path and Sparsity Handling for Energy-Efficient DNN Training.
IEEE J. Emerg. Sel. Topics Circuits Syst., 2022

Accelerating Large-Scale Graph-based Nearest Neighbor Search on a Computational Storage Platform.
CoRR, 2022

Exploration of Systolic-Vector Architecture with Resource Scheduling for Dynamic ML Workloads.
CoRR, 2022

OpenMDS: An Open-Source Shell Generation Framework for High-Performance Design on Xilinx Multi-Die FPGAs.
IEEE Comput. Archit. Lett., 2022

Federated Onboard-Ground Station Computing With Weakly Supervised Cascading Pyramid Attention Network for Satellite Image Analysis.
IEEE Access, 2022

LightTrader : World's first AI-enabled High-Frequency Trading Solution with 16 TFLOPS / 64 TOPS Deep Learning Inference Accelerators.
Proceedings of the 2022 IEEE Hot Chips 34 Symposium, 2022

Trinity: End-to-End In-Database Near-Data Machine Learning Acceleration Platform for Advanced Data Analytics.
Proceedings of the 2022 IEEE Hot Chips 34 Symposium, 2022

DFX: A Low-latency Multi-FPGA Appliance for Accelerating Transformer-based Text Generation.
Proceedings of the 2022 IEEE Hot Chips 34 Symposium, 2022

LearningGroup: A Real-Time Sparse Training on FPGA via Learnable Weight Grouping for Multi-Agent Reinforcement Learning.
Proceedings of the International Conference on Field-Programmable Technology, 2022

FSHMEM: Supporting Partitioned Global Address Space on FPGAs for Large-Scale Hardware Acceleration Infrastructure.
Proceedings of the 32nd International Conference on Field-Programmable Logic and Applications, 2022

OpenMDS: An Open-Source Shell Generation Framework for High-Performance Design on Multi-Die FPGAs.
Proceedings of the 30th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2022

A Dual-Mode Similarity Search Accelerator based on Embedding Compression for Online Cross-Modal Image-Text Retrieval.
Proceedings of the 30th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2022

T-PIM: A 2.21-to-161.08TOPS/W Processing-In-Memory Accelerator for End-to-End On-Device Training.
Proceedings of the IEEE Custom Integrated Circuits Conference, 2022

2021
Z-PIM: A Sparsity-Aware Processing-in-Memory Architecture With Fully Variable Weight Bit-Precision for Energy-Efficient Deep Neural Networks.
IEEE J. Solid State Circuits, 2021

Chapter Five - FPGA based neural network accelerators.
Adv. Comput., 2021

Accelerating Large-Scale Nearest Neighbor Search with Computational Storage Device.
Proceedings of the 29th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2021

FIXAR: A Fixed-Point Deep Reinforcement Learning Platform with Quantization-Aware Training and Adaptive Parallelism.
Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021

2020
Z-PIM: An Energy-Efficient Sparsity Aware Processing-In-Memory Architecture with Fully-Variable Weight Precision.
Proceedings of the IEEE Symposium on VLSI Circuits, 2020

2017
Configurable Clouds.
IEEE Micro, 2017

2016
A reconfigurable fabric for accelerating large-scale datacenter services.
Commun. ACM, 2016

A cloud-scale acceleration architecture.
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

2015
Toward accelerating deep learning at scale using specialized hardware in the datacenter.
Proceedings of the 2015 IEEE Hot Chips 27 Symposium (HCS), 2015

A Scalable High-Bandwidth Architecture for Lossless Compression on FPGAs.
Proceedings of the 23rd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2015

2014
A Scalable Multi-engine Xpress9 Compressor with Asynchronous Data Transfer.
Proceedings of the 22nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2014

Energy efficient canonical huffman encoding.
Proceedings of the IEEE 25th International Conference on Application-Specific Systems, 2014

2013
A 320 mW 342 GOPS Real-Time Dynamic Object Recognition Processor for HD 720p Video Streams.
IEEE J. Solid State Circuits, 2013

2012
Low-Power, Real-Time Object-Recognition Processors for Mobile Vision Systems.
IEEE Micro, 2012

A 92-mW Real-Time Traffic Sign Recognition System With Robust Illumination Adaptation and Support Vector Machine.
IEEE J. Solid State Circuits, 2012

A simultaneous multithreading heterogeneous object recognition processor with machine learning based dynamic resource management.
Proceedings of the 2012 IEEE Symposium on Low-Power and High-Speed Chips, 2012

2011
24-GOPS 4.5-mm² Digital Cellular Neural Network for Rapid Visual Attention in an Object-Recognition SoC.
IEEE Trans. Neural Networks, 2011

2010
Visual Image Processing RAM: Memory Architecture With 2-D Data Location Search and Data Consistency Management for a Multicore Object Recognition Processor.
IEEE Trans. Circuits Syst. Video Technol., 2010

An attention controlled multi-core architecture for energy efficient object recognition.
Signal Process. Image Commun., 2010

Familiarity based unified visual attention model for fast and robust object recognition.
Pattern Recognit., 2010

A 118.4 GB/s Multi-Casting Network-on-Chip With Hierarchical Star-Ring Combined Topology for Real-Time Object Recognition.
IEEE J. Solid State Circuits, 2010

A 201.4 GOPS 496 mW Real-Time Multi-Object Recognition Processor With Bio-Inspired Neural Perception Engine.
IEEE J. Solid State Circuits, 2010

Intelligent NoC with neuro-fuzzy bandwidth regulation for a 51 IP object recognition processor.
Proceedings of the IEEE Custom Integrated Circuits Conference, 2010

2009
81.6 GOPS Object Recognition Processor Based on a Memory-Centric NoC.
IEEE Trans. Very Large Scale Integr. Syst., 2009

A Configurable Heterogeneous Multicore Architecture With Cellular Neural Network for Real-Time Object Recognition.
IEEE Trans. Circuits Syst. Video Technol., 2009

Real-Time Object Recognition with Neuro-Fuzzy Controlled Workload-Aware Task Pipelining.
IEEE Micro, 2009

A 125 GOPS 583 mW Network-on-Chip Based Parallel Processor With Bio-Inspired Visual Attention Engine.
IEEE J. Solid State Circuits, 2009

Memory-centric network-on-chip for power efficient execution of task-level pipeline on a multi-core processor.
IET Comput. Digit. Tech., 2009

A 201.4GOPS 496mW real-time multi-object recognition processor with bio-inspired neural perception engine.
Proceedings of the IEEE International Solid-State Circuits Conference, 2009

A 60fps 496mW multi-object recognition processor with workload-aware dynamic power management.
Proceedings of the 2009 International Symposium on Low Power Electronics and Design, 2009

A 118.4GB/s multi-casting network-on-chip for real-time object recognition processor.
Proceedings of the 35th European Solid-State Circuits Conference, 2009

A 54GOPS 51.8mW analog-digital mixed mode Neural Perception Engine for fast object detection.
Proceedings of the IEEE Custom Integrated Circuits Conference, 2009

2008
A 125GOPS 583mW Network-on-Chip Based Parallel Processor with Bio-inspired Visual-Attention Engine.
Proceedings of the 2008 IEEE International Solid-State Circuits Conference, 2008

A 0.6pJ/b 3Gb/s/ch transceiver in 0.18 µm CMOS for 10mm on-chip interconnects.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2008), 2008

A 211 GOPS/W dual-mode real-time object recognition processor with Network-on-Chip.
Proceedings of the ESSCIRC 2008, 2008

Vision platform for mobile intelligent robot based on 81.6 GOPS object recognition processor.
Proceedings of the 45th Design Automation Conference, 2008

2007
Solutions for Real Chip Implementation Issues of NoC and Their Application to Memory-Centric NoC.
Proceedings of the First International Symposium on Networks-on-Chips, 2007

Visual image processing RAM for fast 2-D data location search.
Proceedings of the 33rd European Solid-State Circuits Conference, 2007

An 81.6 GOPS Object Recognition Processor Based on NoC and Visual Image Processing Memory.
Proceedings of the IEEE 2007 Custom Integrated Circuits Conference, 2007

2006
A Low-power Star-topology Body Area Network Controller for Periodic Data Monitoring Around and Inside the Human Body.
Proceedings of the Tenth IEEE International Symposium on Wearable Computers (ISWC 2006), 2006

An Ultra Low-Power Body Sensor Network Control Processor with Centralized Node Control.
Proceedings of the International Symposium on System-on-Chip, 2006

A 372 ps 64-bit adder using fast pull-up logic in 0.18µm CMOS.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2006), 2006

A Multi-Nodes Human Body Communication Sensor Network Control Processor.
Proceedings of the IEEE 2006 Custom Integrated Circuits Conference, 2006

