Joo-Young Kim
Orcid: 0000-0003-1099-1496
Affiliations:
- Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea
- Microsoft Research, Redmond, WA, USA (since 2010)
According to our database, Joo-Young Kim authored at least 82 papers between 2006 and 2024.
Online presence:
- on orcid.org
- on dl.acm.org
Bibliography
2024
A DVS-Enabled Distributed Digital LDO Providing Rapid Uniform Power Grid and Ripple Reduction Achieving 20.1-ps FOM in 28 nm CMOS.
IEEE Trans. Circuits Syst. I Regul. Pap., November, 2024
SP-PIM: A Super-Pipelined Processing-In-Memory Accelerator With Local Error Prediction for Area/Energy-Efficient On-Device Learning.
IEEE J. Solid State Circuits, August, 2024
EPU: An Energy-Efficient Explainable AI Accelerator With Sparsity-Free Computation and Heat Map Compression/Pruning.
IEEE J. Solid State Circuits, March, 2024
HURRY: Highly Utilized, Reconfigurable ReRAM-based In-situ Accelerator with Multifunctionality.
CoRR, 2024
SAL-PIM: A Subarray-level Processing-in-Memory Architecture with LUT-based Linear Interpolation for Transformer-based Text Generation.
CoRR, 2024
Trinity: In-Database Near-Data Machine Learning Acceleration Platform for Advanced Data Analytics.
IEEE Access, 2024
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024
Morphling: A Throughput-Maximized TFHE-based Accelerator using Transform-domain Reuse.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024
Picasso: An Area/Energy-Efficient End-to-End Diffusion Accelerator with Hyper-Precision Data Type.
Proceedings of the 36th IEEE Hot Chips Symposium, 2024
A 38.5TOPS/W Point Cloud Neural Network Processor with Virtual Pillar and Quadtree-based Workload Management for Real-Time Outdoor BEV Detection.
Proceedings of the IEEE Custom Integrated Circuits Conference, 2024
ACane: An Efficient FPGA-based Embedded Vision Platform with Accumulation-as-Convolution Packing for Autonomous Mobile Robots.
Proceedings of the 29th Asia and South Pacific Design Automation Conference, 2024
2023
Introduction to the Special Section on the 2022 Asian Solid-State Circuits Conference (A-SSCC).
IEEE J. Solid State Circuits, October, 2023
Commun. ACM, July, 2023
T-PIM: An Energy-Efficient Processing-in-Memory Accelerator for End-to-End On-Device Training.
IEEE J. Solid State Circuits, March, 2023
IEEE Trans. Circuits Syst. I Regul. Pap., January, 2023
Agamotto: A Performance Optimization Framework for CNN Accelerator With Row Stationary Dataflow.
IEEE Trans. Circuits Syst. I Regul. Pap., 2023
Accelerating Large-Scale Graph-Based Nearest Neighbor Search on a Computational Storage Platform.
IEEE Trans. Computers, 2023
Darwin: A DRAM-based Multi-level Processing-in-Memory Architecture for Data Analytics.
CoRR, 2023
SP-PIM: A 22.41TFLOPS/W, 8.81Epochs/Sec Super-Pipelined Processing-In-Memory Accelerator with Local Error Prediction for On-Device Learning.
Proceedings of the 2023 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits), 2023
Strix: An End-to-End Streaming Architecture with Two-Level Ciphertext Batching for Fully Homomorphic Encryption with Programmable Bootstrapping.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023
PRIMO: A Full-Stack Processing-in-DRAM Emulation Framework for Machine Learning Workloads.
Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 2023
LightTrader: A Standalone High-Frequency Trading System with Deep Learning Inference Accelerators and Proactive Scheduler.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023
JNPU: A 1.04TFLOPS Joint-DNN Training Processor with Speculative Cyclic Quantization and Triple Heterogeneity on Microarchitecture / Precision / Dataflow.
Proceedings of the 49th IEEE European Solid State Circuits Conference, 2023
A 26.55TOPS/W Explainable AI Processor with Dynamic Workload Allocation and Heat Map Compression/Pruning.
Proceedings of the IEEE Custom Integrated Circuits Conference, 2023
2022
An Overview of Processing-in-Memory Circuits for Artificial Intelligence and Machine Learning.
IEEE J. Emerg. Sel. Topics Circuits Syst., 2022
Guest Editorial Revolution of AI and Machine Learning With Processing-in-Memory (PIM): From Systems, Architectures, to Circuits.
IEEE J. Emerg. Sel. Topics Circuits Syst., 2022
Design of Processing-in-Memory With Triple Computational Path and Sparsity Handling for Energy-Efficient DNN Training.
IEEE J. Emerg. Sel. Topics Circuits Syst., 2022
Accelerating Large-Scale Graph-based Nearest Neighbor Search on a Computational Storage Platform.
CoRR, 2022
Exploration of Systolic-Vector Architecture with Resource Scheduling for Dynamic ML Workloads.
CoRR, 2022
OpenMDS: An Open-Source Shell Generation Framework for High-Performance Design on Xilinx Multi-Die FPGAs.
IEEE Comput. Archit. Lett., 2022
Federated Onboard-Ground Station Computing With Weakly Supervised Cascading Pyramid Attention Network for Satellite Image Analysis.
IEEE Access, 2022
LightTrader : World's first AI-enabled High-Frequency Trading Solution with 16 TFLOPS / 64 TOPS Deep Learning Inference Accelerators.
Proceedings of the 2022 IEEE Hot Chips 34 Symposium, 2022
Trinity: End-to-End In-Database Near-Data Machine Learning Acceleration Platform for Advanced Data Analytics.
Proceedings of the 2022 IEEE Hot Chips 34 Symposium, 2022
DFX: A Low-latency Multi-FPGA Appliance for Accelerating Transformer-based Text Generation.
Proceedings of the 2022 IEEE Hot Chips 34 Symposium, 2022
LearningGroup: A Real-Time Sparse Training on FPGA via Learnable Weight Grouping for Multi-Agent Reinforcement Learning.
Proceedings of the International Conference on Field-Programmable Technology, 2022
FSHMEM: Supporting Partitioned Global Address Space on FPGAs for Large-Scale Hardware Acceleration Infrastructure.
Proceedings of the 32nd International Conference on Field-Programmable Logic and Applications, 2022
OpenMDS: An Open-Source Shell Generation Framework for High-Performance Design on Multi-Die FPGAs.
Proceedings of the 30th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2022
A Dual-Mode Similarity Search Accelerator based on Embedding Compression for Online Cross-Modal Image-Text Retrieval.
Proceedings of the 30th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2022
T-PIM: A 2.21-to-161.08TOPS/W Processing-In-Memory Accelerator for End-to-End On-Device Training.
Proceedings of the IEEE Custom Integrated Circuits Conference, 2022
2021
Z-PIM: A Sparsity-Aware Processing-in-Memory Architecture With Fully Variable Weight Bit-Precision for Energy-Efficient Deep Neural Networks.
IEEE J. Solid State Circuits, 2021
Proceedings of the 29th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2021
FIXAR: A Fixed-Point Deep Reinforcement Learning Platform with Quantization-Aware Training and Adaptive Parallelism.
Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021
2020
Z-PIM: An Energy-Efficient Sparsity Aware Processing-In-Memory Architecture with Fully-Variable Weight Precision.
Proceedings of the IEEE Symposium on VLSI Circuits, 2020
2017
2016
Commun. ACM, 2016
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016
2015
Toward accelerating deep learning at scale using specialized hardware in the datacenter.
Proceedings of the 2015 IEEE Hot Chips 27 Symposium (HCS), 2015
Proceedings of the 23rd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2015
2014
Proceedings of the 22nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2014
Proceedings of the IEEE 25th International Conference on Application-Specific Systems, 2014
2013
A 320 mW 342 GOPS Real-Time Dynamic Object Recognition Processor for HD 720p Video Streams.
IEEE J. Solid State Circuits, 2013
2012
IEEE Micro, 2012
A 92-mW Real-Time Traffic Sign Recognition System With Robust Illumination Adaptation and Support Vector Machine.
IEEE J. Solid State Circuits, 2012
A simultaneous multithreading heterogeneous object recognition processor with machine learning based dynamic resource management.
Proceedings of the 2012 IEEE Symposium on Low-Power and High-Speed Chips, 2012
2011
24-GOPS 4.5-mm² Digital Cellular Neural Network for Rapid Visual Attention in an Object-Recognition SoC.
IEEE Trans. Neural Networks, 2011
2010
Visual Image Processing RAM: Memory Architecture With 2-D Data Location Search and Data Consistency Management for a Multicore Object Recognition Processor.
IEEE Trans. Circuits Syst. Video Technol., 2010
An attention controlled multi-core architecture for energy efficient object recognition.
Signal Process. Image Commun., 2010
Familiarity based unified visual attention model for fast and robust object recognition.
Pattern Recognit., 2010
A 118.4 GB/s Multi-Casting Network-on-Chip With Hierarchical Star-Ring Combined Topology for Real-Time Object Recognition.
IEEE J. Solid State Circuits, 2010
A 201.4 GOPS 496 mW Real-Time Multi-Object Recognition Processor With Bio-Inspired Neural Perception Engine.
IEEE J. Solid State Circuits, 2010
Intelligent NoC with neuro-fuzzy bandwidth regulation for a 51 IP object recognition processor.
Proceedings of the IEEE Custom Integrated Circuits Conference, 2010
2009
IEEE Trans. Very Large Scale Integr. Syst., 2009
A Configurable Heterogeneous Multicore Architecture With Cellular Neural Network for Real-Time Object Recognition.
IEEE Trans. Circuits Syst. Video Technol., 2009
Real-Time Object Recognition with Neuro-Fuzzy Controlled Workload-Aware Task Pipelining.
IEEE Micro, 2009
A 125 GOPS 583 mW Network-on-Chip Based Parallel Processor With Bio-Inspired Visual Attention Engine.
IEEE J. Solid State Circuits, 2009
Memory-centric network-on-chip for power efficient execution of task-level pipeline on a multi-core processor.
IET Comput. Digit. Tech., 2009
A 201.4GOPS 496mW real-time multi-object recognition processor with bio-inspired neural perception engine.
Proceedings of the IEEE International Solid-State Circuits Conference, 2009
A 60fps 496mW multi-object recognition processor with workload-aware dynamic power management.
Proceedings of the 2009 International Symposium on Low Power Electronics and Design, 2009
A 118.4GB/s multi-casting network-on-chip for real-time object recognition processor.
Proceedings of the 35th European Solid-State Circuits Conference, 2009
A 54GOPS 51.8mW analog-digital mixed mode Neural Perception Engine for fast object detection.
Proceedings of the IEEE Custom Integrated Circuits Conference, 2009
2008
A 125GOPS 583mW Network-on-Chip Based Parallel Processor with Bio-inspired Visual-Attention Engine.
Proceedings of the 2008 IEEE International Solid-State Circuits Conference, 2008
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2008), 2008
Proceedings of the ESSCIRC 2008, 2008
Vision platform for mobile intelligent robot based on 81.6 GOPS object recognition processor.
Proceedings of the 45th Design Automation Conference, 2008
2007
Solutions for Real Chip Implementation Issues of NoC and Their Application to Memory-Centric NoC.
Proceedings of the First International Symposium on Networks-on-Chips, 2007
Proceedings of the 33rd European Solid-State Circuits Conference, 2007
An 81.6 GOPS Object Recognition Processor Based on NoC and Visual Image Processing Memory.
Proceedings of the IEEE 2007 Custom Integrated Circuits Conference, 2007
2006
A Low-power Star-topology Body Area Network Controller for Periodic Data Monitoring Around and Inside the Human Body.
Proceedings of the Tenth IEEE International Symposium on Wearable Computers (ISWC 2006), 2006
An Ultra Low-Power Body Sensor Network Control Processor with Centralized Node Control.
Proceedings of the International Symposium on System-on-Chip, 2006
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2006), 2006
Proceedings of the IEEE 2006 Custom Integrated Circuits Conference, 2006