Jungwook Choi

ORCID: 0000-0002-3075-8694

According to our database, Jungwook Choi authored at least 102 papers between 2006 and 2024.

Bibliography

2024
Pruning With Scaled Policy Constraints for Light-Weight Reinforcement Learning.
IEEE Access, 2024

P²URE: Proactive and Probabilistic Uncovered Neighbor-Aware Relay-Selection Method in Multi-Hop FANETs.
IEEE Access, 2024

SPADE: Sparse Pillar-based 3D Object Detection Accelerator for Autonomous Driving.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024

InfiniPot: Infinite Context Processing on Memory-Constrained LLMs.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Lightweight Error Correction for In-Storage Acceleration of Large Language Model Inference.
Proceedings of the International Conference on Electronics, Information, and Communication, 2024

Searching Optimal Floating-Point Format for Sub-8-Bit Large Language Model Inference.
Proceedings of the International Conference on Electronics, Information, and Communication, 2024

Selectively Dilated Convolution for Accuracy-Preserving Sparse Pillar-based Embedded 3D Object Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

ISP2DLA: Automated Deep Learning Accelerator Design for On-Sensor Image Signal Processing.
Proceedings of the 35th IEEE International Conference on Application-specific Systems, Architectures and Processors, 2024

Improving Conversational Abilities of Quantized Large Language Models via Direct Preference Alignment.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

RA-LoRA: Rank-Adaptive Parameter-Efficient Fine-Tuning for Accurate 2-bit Quantized Large Language Models.
Findings of the Association for Computational Linguistics, 2024

2023
A Time Synchronization Protocol for Barrage Relay Networks.
Sensors, March 2023

PillarAcc: Sparse PointPillars Accelerator for Real-Time Point Cloud 3D Object Detection on Edge Devices.
CoRR, 2023

Exploring Attention Map Reuse for Efficient Transformer Neural Networks.
CoRR, 2023

Token-Scaled Logit Distillation for Ternary Weight Generative Language Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

SiT Dataset: Socially Interactive Pedestrian Trajectory Dataset for Social Navigation Robots.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Distributed Space-Time Block Coding for Barrage Relay Networks.
Proceedings of the IEEE Military Communications Conference, 2023

Finding Optimal Numerical Format for Sub-8-Bit Post-Training Quantization of Vision Transformers.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Enhancing Computation Efficiency in Large Language Models through Weight and Activation Quantization.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Teacher Intervention: Improving Convergence of Quantization Aware Training for Ultra-Low Precision Transformers.
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2023

Range-Invariant Approximation of Non-Linear Operations for Efficient BERT Fine-Tuning.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

Architecture-Aware Optimization of Layer Fusion for Latency-Optimal CNN Inference.
Proceedings of the 5th IEEE International Conference on Artificial Intelligence Circuits and Systems, 2023

2022
Optimization of General Matrix Multiply Library for Ternary Weight for Fast DNN Inference.
J. Signal Process. Syst., 2022

Minimizing Global Buffer Access in a Deep Learning Accelerator Using a Local Register File with a Rearranged Computational Sequence.
Sensors, 2022

A 7-nm Four-Core Mixed-Precision AI Chip With 26.2-TFLOPS Hybrid-FP8 Training, 104.9-TOPS INT4 Inference, and Workload-Aware Throttling.
IEEE J. Solid State Circuits, 2022

Achieving low write latency through new stealth program operation supporting early write completion in NAND flash memory.
J. Syst. Archit., 2022

Automatic Network Adaptation for Ultra-Low Uniform-Precision Quantization.
CoRR, 2022

Learning from distinctive candidates to optimize reduced-precision convolution program on tensor cores.
CoRR, 2022

Improving NVM Lifetime Using Task Stack Migration on Low-End MCU-Based Devices.
IEEE Access, 2022

Regularizing Activation Distribution for Ultra Low-bit Quantization-Aware Training of MobileNets.
Proceedings of the IEEE Workshop on Signal Processing Systems, 2022

Understanding and Optimizing INT4 Convolution for Accelerated DNN Inference on Tensor Cores.
Proceedings of the IEEE Workshop on Signal Processing Systems, 2022

Understanding the Role of Self Attention for Efficient Speech Recognition.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Understanding and Improving Knowledge Distillation for Quantization Aware Training of Large Transformer Encoders.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

NN-LUT: neural approximation of non-linear operations for efficient transformer inference.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

Optimizing Exponent Bias for Sub-8bit Floating-Point Inference of Fine-tuned Transformers.
Proceedings of the 4th IEEE International Conference on Artificial Intelligence Circuits and Systems, 2022

2021
Implementation of Embedded Testbeds Using USRP and GNU-Radio for Performance Measurement and Analysis of PPS and PCO-Based Time Synchronizations.
Int. J. Interdiscip. Telecommun. Netw., 2021

Internal Task-Aware Command Scheduling to Improve Read Performance of Embedded Flash Storage Systems.
IEEE Access, 2021

Binarized Encoder-Decoder Network and Binarized Deconvolution Engine for Semantic Segmentation.
IEEE Access, 2021

Buffer Management With Append-Only Data Isolation for Improving SSD Performance.
IEEE Access, 2021

TernGEMM: GEneral Matrix Multiply Library with Ternary Weights for Fast DNN Inference.
Proceedings of the IEEE Workshop on Signal Processing Systems, 2021

Layer-wise Pruning of Transformer Attention Heads for Efficient Language Modeling.
Proceedings of the 18th International SoC Design Conference, 2021

Understanding and Reducing Weight-Load Overhead of Systolic Deep Learning Accelerators.
Proceedings of the 18th International SoC Design Conference, 2021

Thermal Face Detection for High-Speed AI Thermometer.
Proceedings of the 7th IEEE International Conference on Network Intelligence and Digital Content, 2021

Stochastic Precision Ensemble: Self-Knowledge Distillation for Quantized Deep Neural Networks.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Efficient AI System Design With Cross-Layer Approximate Computing.
Proc. IEEE, 2020

Hardware and Software Co-optimization for the Initialization Failure of the ReRAM-based Cross-bar Array.
ACM J. Emerg. Technol. Comput. Syst., 2020

Guest Editorial: Robust Resource-Constrained Systems for Machine Learning.
IEEE Des. Test, 2020

Robust Machine Learning Systems: Challenges, Current Trends, Perspectives, and the Road Ahead.
IEEE Des. Test, 2020

Improving Write Performance Through Reliable Asynchronous Operation in Physically-Addressable SSD.
IEEE Access, 2020

GALRU: A Group-Aware Buffer Management Scheme for Flash Storage Systems.
IEEE Access, 2020

Level Aware Data Placement Technique for Hybrid NAND Flash Storage of Log-Structured Merge-Tree Based Key-Value Store System.
IEEE Access, 2020

OPTIMUS: OPTImized matrix MUltiplication Structure for Transformer neural network accelerator.
Proceedings of the Third Conference on Machine Learning and Systems, 2020

2019
DeepTools: Compiler and Execution Runtime Extensions for RaPiD AI Accelerator.
IEEE Micro, 2019

Hybrid 8-bit Floating Point (HFP8) Training and Inference for Deep Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Accurate and Efficient 2-bit Quantized Neural Networks.
Proceedings of the Second Conference on Machine Learning and Systems, SysML 2019, 2019

CH-MAC: A Cluster-based, Hybrid TDMA MAC Protocol over Wireless Ad-hoc Networks.
Proceedings of the 2019 IEEE Military Communications Conference, 2019

Performance-driven Programming of Multi-TFLOP Deep Learning Accelerators.
Proceedings of the IEEE International Symposium on Workload Characterization, 2019

Accumulation Bit-Width Scaling For Ultra-Low Precision Training Of Deep Networks.
Proceedings of the 7th International Conference on Learning Representations, 2019

Workload-aware Automatic Parallelization for Multi-GPU DNN Training.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Memory and Interconnect Optimizations for Peta-Scale Deep Learning Systems.
Proceedings of the 26th IEEE International Conference on High Performance Computing, 2019

BiScaled-DNN: Quantizing Long-tailed Datastructures with Two Scale Factors for Deep Neural Networks.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

A Compiler for Deep Neural Network Accelerators to Generate Optimized Code for a Wide Range of Data Parameters from a Hand-crafted Computation Kernel.
Proceedings of the IEEE Symposium in Low-Power and High-Speed Chips, 2019

DLFloat: A 16-b Floating Point Format Designed for Deep Learning Training and Inference.
Proceedings of the 26th IEEE Symposium on Computer Arithmetic, 2019

Approximate Computing Techniques for Deep Neural Networks.
Approximate Circuits, Methodologies and CAD, 2019

2018
Bridging the Accuracy Gap for 2-bit Quantized Neural Networks (QNN).
CoRR, 2018

PACT: Parameterized Clipping Activation for Quantized Neural Networks.
CoRR, 2018

Training Deep Neural Networks with 8-bit Floating Point Numbers.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Taming the beast: Programming Peta-FLOP class Deep Learning Systems.
Proceedings of the International Symposium on Low Power Electronics and Design, 2018

PROMISE: An End-to-End Design of a Programmable Mixed-Signal Accelerator for Machine-Learning Algorithms.
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

True Gradient-Based Training of Deep Binary Activated Neural Networks Via Continuous Binarization.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Exploiting approximate computing for deep learning acceleration.
Proceedings of the 2018 Design, Automation & Test in Europe Conference & Exhibition, 2018

Compensated-DNN: energy efficient low-precision deep neural networks by compensating quantization errors.
Proceedings of the 55th Annual Design Automation Conference, 2018

AdaComp: Adaptive Residual Gradient Compression for Data-Parallel Distributed Training.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017
Toward a pixel-parallel architecture for graph cuts inference on FPGA.
Proceedings of the 27th International Conference on Field Programmable Logic and Applications, 2017

Accelerator Design for Deep Learning Training: Extended Abstract: Invited.
Proceedings of the 54th Annual Design Automation Conference, 2017

POSTER: Design Space Exploration for Performance Optimization of Deep Neural Networks on Shared Memory Accelerators.
Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017

2016
Error Resilient and Energy Efficient MRF Message-Passing-Based Stereo Matching.
IEEE Trans. Very Large Scale Integr. Syst., 2016

Video-Rate Stereo Matching Using Markov Random Field TRW-S Inference on a Hybrid CPU+FPGA Computing Platform.
IEEE Trans. Circuits Syst. Video Technol., 2016

Energy-Efficient Simultaneous Localization and Mapping via Compounded Approximate Computing.
Proceedings of the 2016 IEEE International Workshop on Signal Processing Systems, 2016

Approximate computing: Challenges and opportunities.
Proceedings of the IEEE International Conference on Rebooting Computing, 2016

Analysis of error resiliency of belief propagation in computer vision.
Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

Configurable and scalable belief propagation accelerator for computer vision.
Proceedings of the 26th International Conference on Field Programmable Logic and Applications, 2016

2015
High performance and error resilient probabilistic inference system for machine learning.
PhD thesis, 2015

Transmission Power Control with the Guaranteed Communication Reliability in WSN.
Int. J. Distributed Sens. Networks, 2015

Fast hierarchical implementation of sequential tree-reweighted belief propagation for probabilistic inference.
Proceedings of the 25th International Conference on Field Programmable Logic and Applications, 2015

2014
A robust message passing based stereo matching kernel via system-level error resiliency.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

2013
Error resilient MRF message passing architecture for stereo matching.
Proceedings of the IEEE Workshop on Signal Processing Systems, 2013

FPGA acceleration of Markov Random Field TRW-S inference for stereo matching.
Proceedings of the 11th ACM/IEEE International Conference on Formal Methods and Models for Codesign, 2013

EMERALD: Characterization of emerging applications and algorithms for low-power devices.
Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software, 2013

2012
Flexible and Expandable Speech Recognition Hardware with Weighted Finite State Transducers.
J. Signal Process. Syst., 2012

Deformable Carbon Nanotube-Contact Pads for Inertial Microswitch to Extend Contact Time.
IEEE Trans. Ind. Electron., 2012

Hardware implementation of MRF map inference on an FPGA platform.
Proceedings of the 22nd International Conference on Field Programmable Logic and Applications (FPL), 2012

2011
Memory Access Optimized VLSI for 5000-Word Continuous Speech Recognition.
J. Signal Process. Syst., 2011

2010
A Real-Time FPGA-Based 20 000-Word Speech Recognizer With Optimized DRAM Access.
IEEE Trans. Circuits Syst. I Regul. Pap., 2010

Supporting handover in an IEEE 802.11p-based wireless access system.
Proceedings of the Seventh International Workshop on Vehicular Ad Hoc Networks, 2010

An FPGA implementation of speech recognition with weighted finite state transducers.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2010

2009
VLSI for 5000-word continuous speech recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2009

2006
A Study on the Development of Ubiquitous CellPhone Robot.
Proceedings of the 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2006