JNPU: A 1.04TFLOPS Joint-DNN Training Processor with Speculative Cyclic Quantization and Triple Heterogeneity on Microarchitecture / Precision / Dataflow.

Je Yang

Sukbin Lim

Sukjin Lee

Jae-Young Kim

Joo-Young Kim

Proceedings of the 49th IEEE European Solid State Circuits Conference, 2023

A 26.55TOPS/W Explainable AI Processor with Dynamic Workload Allocation and Heat Map Compression/Pruning.

Proceedings of the IEEE Custom Integrated Circuits Conference, 2023

2022

An Overview of Processing-in-Memory Circuits for Artificial Intelligence and Machine Learning.

IEEE J. Emerg. Sel. Topics Circuits Syst., 2022

Guest Editorial Revolution of AI and Machine Learning With Processing-in-Memory (PIM): From Systems, Architectures, to Circuits.

IEEE J. Emerg. Sel. Topics Circuits Syst., 2022

Design of Processing-in-Memory With Triple Computational Path and Sparsity Handling for Energy-Efficient DNN Training.

IEEE J. Emerg. Sel. Topics Circuits Syst., 2022

Accelerating Large-Scale Graph-based Nearest Neighbor Search on a Computational Storage Platform.

CoRR, 2022

Exploration of Systolic-Vector Architecture with Resource Scheduling for Dynamic ML Workloads.

CoRR, 2022

OpenMDS: An Open-Source Shell Generation Framework for High-Performance Design on Xilinx Multi-Die FPGAs.

Gyeongcheol Shin

Junsoo Kim

Joo-Young Kim

IEEE Comput. Archit. Lett., 2022

Federated Onboard-Ground Station Computing With Weakly Supervised Cascading Pyramid Attention Network for Satellite Image Analysis.

IEEE Access, 2022

LightTrader: World's first AI-enabled High-Frequency Trading Solution with 16 TFLOPS / 64 TOPS Deep Learning Inference Accelerators.

Proceedings of the 2022 IEEE Hot Chips 34 Symposium, 2022

Trinity: End-to-End In-Database Near-Data Machine Learning Acceleration Platform for Advanced Data Analytics.

Proceedings of the 2022 IEEE Hot Chips 34 Symposium, 2022

DFX: A Low-latency Multi-FPGA Appliance for Accelerating Transformer-based Text Generation.

Proceedings of the 2022 IEEE Hot Chips 34 Symposium, 2022

LearningGroup: A Real-Time Sparse Training on FPGA via Learnable Weight Grouping for Multi-Agent Reinforcement Learning.

Je Yang

Jaeuk Kim

Joo-Young Kim

Proceedings of the International Conference on Field-Programmable Technology, 2022

FSHMEM: Supporting Partitioned Global Address Space on FPGAs for Large-Scale Hardware Acceleration Infrastructure.

Yashael Faith Arthanto

David Ojika

Joo-Young Kim

Proceedings of the 32nd International Conference on Field-Programmable Logic and Applications, 2022

OpenMDS: An Open-Source Shell Generation Framework for High-Performance Design on Multi-Die FPGAs.

Gyeongcheol Shin

Junsoo Kim

Joo-Young Kim

Proceedings of the 30th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2022

A Dual-Mode Similarity Search Accelerator based on Embedding Compression for Online Cross-Modal Image-Text Retrieval.

Proceedings of the 30th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2022

T-PIM: A 2.21-to-161.08TOPS/W Processing-In-Memory Accelerator for End-to-End On-Device Training.

Proceedings of the IEEE Custom Integrated Circuits Conference, 2022

2021

Z-PIM: A Sparsity-Aware Processing-in-Memory Architecture With Fully Variable Weight Bit-Precision for Energy-Efficient Deep Neural Networks.

IEEE J. Solid State Circuits, 2021

Chapter Five - FPGA based neural network accelerators.

Joo-Young Kim

Adv. Comput., 2021

Accelerating Large-Scale Nearest Neighbor Search with Computational Storage Device.

Proceedings of the 29th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2021

FIXAR: A Fixed-Point Deep Reinforcement Learning Platform with Quantization-Aware Training and Adaptive Parallelism.

Je Yang

Seongmin Hong

Joo-Young Kim

Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021

2020

Z-PIM: An Energy-Efficient Sparsity Aware Processing-In-Memory Architecture with Fully-Variable Weight Precision.

Proceedings of the IEEE Symposium on VLSI Circuits, 2020

2017

Configurable Clouds.

IEEE Micro, 2017

2016

A reconfigurable fabric for accelerating large-scale datacenter services.

Commun. ACM, 2016

A cloud-scale acceleration architecture.

Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

2015

Toward accelerating deep learning at scale using specialized hardware in the datacenter.

Proceedings of the 2015 IEEE Hot Chips 27 Symposium (HCS), 2015

A Scalable High-Bandwidth Architecture for Lossless Compression on FPGAs.

Proceedings of the 23rd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2015

2014

A Scalable Multi-engine Xpress9 Compressor with Asynchronous Data Transfer.

Joo-Young Kim

Scott Hauck

Doug Burger

Proceedings of the 22nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2014

Energy efficient canonical Huffman encoding.

Janarbek Matai

Joo-Young Kim

Ryan Kastner

Proceedings of the IEEE 25th International Conference on Application-Specific Systems, 2014

2013

A 320 mW 342 GOPS Real-Time Dynamic Object Recognition Processor for HD 720p Video Streams.

IEEE J. Solid State Circuits, 2013

2012

Low-Power, Real-Time Object-Recognition Processors for Mobile Vision Systems.

IEEE Micro, 2012

A 92-mW Real-Time Traffic Sign Recognition System With Robust Illumination Adaptation and Support Vector Machine.

IEEE J. Solid State Circuits, 2012

A simultaneous multithreading heterogeneous object recognition processor with machine learning based dynamic resource management.

Proceedings of the 2012 IEEE Symposium on Low-Power and High-Speed Chips, 2012

2011

24-GOPS 4.5-mm² Digital Cellular Neural Network for Rapid Visual Attention in an Object-Recognition SoC.

IEEE Trans. Neural Networks, 2011

2010

Visual Image Processing RAM: Memory Architecture With 2-D Data Location Search and Data Consistency Management for a Multicore Object Recognition Processor.

IEEE Trans. Circuits Syst. Video Technol., 2010

An attention controlled multi-core architecture for energy efficient object recognition.

Signal Process. Image Commun., 2010

Familiarity based unified visual attention model for fast and robust object recognition.

Pattern Recognit., 2010

A 118.4 GB/s Multi-Casting Network-on-Chip With Hierarchical Star-Ring Combined Topology for Real-Time Object Recognition.

IEEE J. Solid State Circuits, 2010

A 201.4 GOPS 496 mW Real-Time Multi-Object Recognition Processor With Bio-Inspired Neural Perception Engine.

IEEE J. Solid State Circuits, 2010

Intelligent NoC with neuro-fuzzy bandwidth regulation for a 51 IP object recognition processor.

Proceedings of the IEEE Custom Integrated Circuits Conference, 2010

2009

81.6 GOPS Object Recognition Processor Based on a Memory-Centric NoC.

IEEE Trans. Very Large Scale Integr. Syst., 2009

A Configurable Heterogeneous Multicore Architecture With Cellular Neural Network for Real-Time Object Recognition.

IEEE Trans. Circuits Syst. Video Technol., 2009

Real-Time Object Recognition with Neuro-Fuzzy Controlled Workload-Aware Task Pipelining.

IEEE Micro, 2009

A 125 GOPS 583 mW Network-on-Chip Based Parallel Processor With Bio-Inspired Visual Attention Engine.

IEEE J. Solid State Circuits, 2009

Memory-centric network-on-chip for power efficient execution of task-level pipeline on a multi-core processor.

IET Comput. Digit. Tech., 2009

A 201.4GOPS 496mW real-time multi-object recognition processor with bio-inspired neural perception engine.

Proceedings of the IEEE International Solid-State Circuits Conference, 2009

A 60fps 496mW multi-object recognition processor with workload-aware dynamic power management.

Proceedings of the 2009 International Symposium on Low Power Electronics and Design, 2009

A 118.4GB/s multi-casting network-on-chip for real-time object recognition processor.

Proceedings of the 35th European Solid-State Circuits Conference, 2009

A 54GOPS 51.8mW analog-digital mixed mode Neural Perception Engine for fast object detection.

Proceedings of the IEEE Custom Integrated Circuits Conference, 2009

2008

A 125GOPS 583mW Network-on-Chip Based Parallel Processor with Bio-inspired Visual-Attention Engine.

Proceedings of the 2008 IEEE International Solid-State Circuits Conference, 2008

A 0.6pJ/b 3Gb/s/ch transceiver in 0.18 µm CMOS for 10mm on-chip interconnects.

Joonsung Bae

Joo-Young Kim

Hoi-Jun Yoo

Proceedings of the International Symposium on Circuits and Systems (ISCAS 2008), 2008

A 211 GOPS/W dual-mode real-time object recognition processor with Network-on-Chip.

Proceedings of the ESSCIRC 2008, 2008

Vision platform for mobile intelligent robot based on 81.6 GOPS object recognition processor.

Proceedings of the 45th Design Automation Conference, 2008

2007

Solutions for Real Chip Implementation Issues of NoC and Their Application to Memory-Centric NoC.

Proceedings of the First International Symposium on Networks-on-Chips, 2007

Visual image processing RAM for fast 2-D data location search.

Proceedings of the 33rd European Solid-State Circuits Conference, 2007

An 81.6 GOPS Object Recognition Processor Based on NoC and Visual Image Processing Memory.

Proceedings of the IEEE 2007 Custom Integrated Circuits Conference, 2007

2006

A Low-power Star-topology Body Area Network Controller for Periodic Data Monitoring Around and Inside the Human Body.

Proceedings of the Tenth IEEE International Symposium on Wearable Computers (ISWC 2006), 2006

An Ultra Low-Power Body Sensor Network Control Processor with Centralized Node Control.

Proceedings of the International Symposium on System-on-Chip, 2006

A 372 ps 64-bit adder using fast pull-up logic in 0.18µm CMOS.

Joo-Young Kim

Kangmin Lee

Hoi-Jun Yoo

Proceedings of the International Symposium on Circuits and Systems (ISCAS 2006), 2006

A Multi-Nodes Human Body Communication Sensor Network Control Processor.

Proceedings of the IEEE 2006 Custom Integrated Circuits Conference, 2006

Joo-Young Kim
