Sungjoo Yoo

ORCID: 0000-0002-5853-0675

According to our database, Sungjoo Yoo authored at least 163 papers between 1996 and 2024.

Bibliography

2024
NeRF-PIM: PIM Hardware-Software Co-Design of Neural Rendering Networks.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., November, 2024

Phys3DGS: Physically-based 3D Gaussian Splatting for Inverse Rendering.
CoRR, 2024

Baking Relightable NeRF for Real-time Direct/Indirect Illumination Rendering.
CoRR, 2024

Breaking MLPerf Training: A Case Study on Optimizing BERT.
CoRR, 2024


Geometry Transfer for Stylizing Radiance Fields.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

MFOS: Model-Free & One-Shot Object Pose Estimation.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

MetaMix: Meta-State Precision Searcher for Mixed-Precision Activation Quantization.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Retrospective: A Scalable Processing-in-Memory Accelerator for Parallel Graph Processing.
CoRR, 2023

Masked Token Similarity Transfer for Compressing Transformer-Based ASR Models.
Proceedings of the IEEE International Conference on Acoustics, 2023

NIPQ: Noise proxy-based Integrated Pseudo-Quantization.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Multi-scale Local Implicit Keypoint Descriptor for Keypoint Matching.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

AnyFlow: Arbitrary Scale Optical Flow with Implicit Neural Representation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Introduction to the Special Section on Energy-Efficient AI Chips.
ACM Trans. Design Autom. Electr. Syst., 2022

Memory Efficient Patch-based Training for INR-based GANs.
CoRR, 2022

TernaryNeRF: Quantizing Voxel Grid-based NeRF Models.
Proceedings of the IEEE International Workshop on Rapid System Prototyping, 2022

BASQ: Branch-wise Activation-clipping Search Quantization for Sub-4-bit Neural Networks.
Proceedings of the Computer Vision - ECCV 2022, 2022

2021
On the Overlooked Significance of Underutilized Contextual Features in Recent News Recommendation Models.
CoRR, 2021

StyleMapGAN: Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing.
CoRR, 2021

Augmenting Few-Shot Learning With Supervised Contrastive Learning.
IEEE Access, 2021

FPGA Prototyping of Systolic Array-based Accelerator for Low-Precision Inference of Deep Neural Networks.
Proceedings of the IEEE International Workshop on Rapid System Prototyping, 2021

Fine-grained Semantics-aware Representation Enhancement for Self-supervised Monocular Depth Estimation.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Exploiting Spatial Dimensions of Latent in GAN for Real-Time Image Editing.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020
Q-Value Prediction for Reinforcement Learning Assisted Garbage Collection to Reduce Long Tail Latency in SSD.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

Reducing DRAM refresh power consumption by runtime profiling of retention time and dual-row activation.
Microprocess. Microsystems, 2020

McDRAM v2: In-Dynamic Random Access Memory Systolic Array Accelerator to Address the Large Model Problem in Deep Neural Networks on the Edge.
IEEE Access, 2020

MEANTIME: Mixture of Attention Mechanisms with Multi-temporal Embeddings for Sequential Recommendation.
Proceedings of the RecSys 2020: Fourteenth ACM Conference on Recommender Systems, 2020

PROFIT: A Novel Training Method for sub-4-bit MobileNet Models.
Proceedings of the Computer Vision - ECCV 2020, 2020

A Novel In-DRAM Accelerator Architecture for Binary Neural Network.
Proceedings of the 2020 IEEE Symposium in Low-Power and High-Speed Chips, 2020

2019
Low-overhead, one-cycle timing-error detection and correction technique for flip-flop based pipelines.
IEICE Electron. Express, 2019

Machine Learning-Based Automatic Generation of eFuse Configuration in NAND Flash Chip.
Proceedings of the IEEE International Test Conference, 2019

Tag2Pix: Line Art Colorization Using Text Tag With SECat and Changing Loss.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Machine Learning at Facebook: Understanding Inference at the Edge.
Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019

2018
Nonvolatile Write Buffer-Based Journaling Bypass for Storage Write Reduction in Mobile Devices.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2018

McDRAM: Low Latency and Energy-Efficient Matrix Computations in DRAM.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2018

ZeNA: Zero-Aware Neural Network Accelerator.
IEEE Des. Test, 2018

Precision Highway for Ultra Low-Precision Quantization.
CoRR, 2018

Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications.
CoRR, 2018

FPGA Prototyping of Low-Precision Zero-Skipping Accelerator for Neural Networks.
Proceedings of the 2018 International Symposium on Rapid System Prototyping, 2018

Energy-Efficient Neural Network Accelerator Based on Outlier-Aware Low-Precision Computation.
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

Value-Aware Quantization for Training and Inference of Neural Networks.
Proceedings of the Computer Vision - ECCV 2018, 2018

Joint optimization of speed, accuracy, and energy for embedded image recognition systems.
Proceedings of the 2018 Design, Automation & Test in Europe Conference & Exhibition, 2018

Dynamic management of key states for reinforcement learning-assisted garbage collection to reduce long tail latency in SSD.
Proceedings of the 55th Annual Design Automation Conference, 2018

2017
Reinforcement Learning-Assisted Garbage Collection to Mitigate Long-Tail Latency in SSD.
ACM Trans. Embed. Comput. Syst., 2017

ExtraV: Boosting Graph Processing Near Storage with a Coherent Accelerator.
Proc. VLDB Endow., 2017

An FPGA-based platform for non volatile memory emulation.
Proceedings of the IEEE 6th Non-Volatile Memory Systems and Applications Symposium, 2017

A novel zero weight/activation-aware hardware architecture of convolutional neural network.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2017

Making DRAM Stronger Against Row Hammering.
Proceedings of the 54th Annual Design Automation Conference, 2017

Weighted-Entropy-Based Quantization for Deep Neural Networks.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

2016
Low-Power Hybrid Memory Cubes With Link Power Management and Two-Level Prefetching.
IEEE Trans. Very Large Scale Integr. Syst., 2016

Differential Write-Conscious Software Design on Phase-Change Memory: An SQLite Case Study.
ACM Trans. Design Autom. Electr. Syst., 2016

Improving Write Performance by Controlling Target Resistance Distributions in MLC PRAM.
ACM Trans. Design Autom. Electr. Syst., 2016

Memory Access Scheduling for a Smart TV.
IEEE Trans. Circuits Syst. Video Technol., 2016

Array Organization and Data Management Exploration in Racetrack Memory.
IEEE Trans. Computers, 2016

Prediction Hybrid Cache: An Energy-Efficient STT-RAM Cache Architecture.
IEEE Trans. Computers, 2016

AIM: Energy-Efficient Aggregation Inside the Memory Hierarchy.
ACM Trans. Archit. Code Optim., 2016

Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications.
Proceedings of the 4th International Conference on Learning Representations, 2016

Selective refresh to avoid read disturb errors in STT-RAM main memory.
Proceedings of the International SoC Design Conference, 2016

A dual-retention time architecture towards secure and high performance STT-RAM main memory subsystem.
Proceedings of the International SoC Design Conference, 2016

Zero and data reuse-aware fast convolution for deep neural networks on GPU.
Proceedings of the Eleventh IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2016

Area-efficient one-cycle correction scheme for timing errors in flip-flop based pipelines.
Proceedings of the IEEE Asian Solid-State Circuits Conference, 2016

2015
Dynamic Wear Leveling for Phase-Change Memories With Endurance Variations.
IEEE Trans. Very Large Scale Integr. Syst., 2015

Hybrid Main Memory for High Bandwidth Multi-Core System.
IEEE Trans. Multi Scale Comput. Syst., 2015

Extending lifetime of flash memory using strong error correction coding.
IEEE Trans. Consumer Electron., 2015

Time slot assignment for convergecast in wireless sensor networks.
J. Parallel Distributed Comput., 2015

Filtering dirty data in DRAM to reduce PRAM writes.
Proceedings of the 2015 IFIP/IEEE International Conference on Very Large Scale Integration, 2015

Locality-aware vertex scheduling for GPU-based graph computation.
Proceedings of the 2015 IFIP/IEEE International Conference on Very Large Scale Integration, 2015

PIM-enabled instructions: a low-overhead, locality-aware processing-in-memory architecture.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

A scalable processing-in-memory accelerator for parallel graph processing.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

A small non-volatile write buffer to reduce storage writes in smartphones.
Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015

Memory fast-forward: a low cost special function unit to enhance energy efficiency in GPU for big data processing.
Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015

A tiny-capacitor-backed non-volatile buffer to reduce storage writes in smartphones.
Proceedings of the 2015 International Conference on Hardware/Software Codesign and System Synthesis, 2015

Big/little deep neural network for ultra low power inference.
Proceedings of the 2015 International Conference on Hardware/Software Codesign and System Synthesis, 2015

2014
A Memory-Efficient Architecture of Full HD Around View Monitor Systems.
IEEE Trans. Intell. Transp. Syst., 2014

An Adaptive Idle-Time Exploiting Method for Low Latency NAND Flash-Based Storage Devices.
IEEE Trans. Computers, 2014

FPGA-based prototyping systems for emerging memory technologies.
Proceedings of the 25th IEEE International Symposium on Rapid System Prototyping, 2014

DASCA: Dead Write Prediction Assisted STT-RAM Cache Architecture.
Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture, 2014

Accelerating graph computation with racetrack memory and pointer-assisted graph representation.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2014

Coarse-grained Bubble Razor to exploit the potential of two-phase transparent latch designs.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2014

Dynamic Power Management of Off-Chip Links for Hybrid Memory Cubes.
Proceedings of the 51st Annual Design Automation Conference 2014, 2014

2013
MAEPER: Matching Access and Error Patterns With Error-Free Resource for Low Vcc L1 Cache.
IEEE Trans. Very Large Scale Integr. Syst., 2013

A network congestion-aware memory subsystem for manycore.
ACM Trans. Embed. Comput. Syst., 2013

Write intensity prediction for energy-efficient non-volatile caches.
Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 2013

Multi-histogram based scene change detection for frame rate up-conversion.
Proceedings of the IEEE International Conference on Consumer Electronics, 2013

Selectively protecting error-correcting code for area-efficient and reliable STT-RAM caches.
Proceedings of the 18th Asia and South Pacific Design Automation Conference, 2013

2012
Optimizing Video Application Design for Phase-Change RAM-Based Main Memory.
IEEE Trans. Very Large Scale Integr. Syst., 2012

A Multistep Tag Comparison Method for a Low-Power L2 Cache.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2012

Active Memory Processor for Network-on-Chip-Based Architecture.
IEEE Trans. Computers, 2012

Optimal wake-up scheduling of data gathering trees for wireless sensor networks.
J. Parallel Distributed Comput., 2012

Bloom filter-based dynamic wear leveling for phase-change RAM.
Proceedings of the 2012 Design, Automation & Test in Europe Conference & Exhibition, 2012

A case study on the application of real phase-change RAM to main memory subsystem.
Proceedings of the 2012 Design, Automation & Test in Europe Conference & Exhibition, 2012

Write performance improvement by hiding R drift latency in phase-change RAM.
Proceedings of the 49th Annual Design Automation Conference 2012, 2012

Hybrid DRAM/PRAM-based main memory for single-chip CPU/GPU.
Proceedings of the 49th Annual Design Automation Conference 2012, 2012

2011
Program Phase-Aware Dynamic Voltage Scaling Under Variable Computational Workload and Memory Stall Environment.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2011

Runtime Power Management of 3-D Multi-Core Architectures Under Peak Power and Temperature Constraints.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2011

Maximizing throughput of temperature-constrained multi-core systems with 3D-stacked cache memory.
Proceedings of the 12th International Symposium on Quality Electronic Design, 2011

A novel tag access scheme for low power L2 cache.
Proceedings of the Design, Automation and Test in Europe, 2011

A quantitative analysis of performance benefits of 3D die stacking on mobile and embedded SoC.
Proceedings of the Design, Automation and Test in Europe, 2011

Power management of hybrid DRAM/PRAM-based main memory.
Proceedings of the 48th Design Automation Conference, 2011

FlexiBuffer: reducing leakage power in on-chip network routers.
Proceedings of the 48th Design Automation Conference, 2011

Matching cache access behavior and bit error pattern for high performance low Vcc L1 cache.
Proceedings of the 48th Design Automation Conference, 2011

2010
Dual Motion Estimation for Frame Rate Up-Conversion.
IEEE Trans. Circuits Syst. Video Technol., 2010

Temperature-Aware Integrated DVFS and Power Gating for Executing Tasks With Runtime Distribution.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2010

A Network Congestion-Aware Memory Controller.
Proceedings of the NOCS 2010, 2010

Event statistics and criticality-aware bitrate allocation to minimize energy consumption of memory-constrained wireless surveillance system.
Proceedings of the 2010 IEEE International Conference on Multimedia and Expo, 2010

An analytical dynamic scaling of supply voltage and body bias exploiting memory stall time variation.
Proceedings of the 15th Asia South Pacific Design Automation Conference, 2010

2009
Topology/Floorplan/Pipeline Co-Design of Cascaded Crossbar Bus.
IEEE Trans. Very Large Scale Integr. Syst., 2009

An Analytical Dynamic Scaling of Supply Voltage and Body Bias Based on Parallelism-Aware Workload and Runtime Distribution.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2009

Topology Synthesis of Cascaded Crossbar Switches.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2009

Power Modeling of Solid State Disk for Dynamic Power Management Policy Design in Embedded Systems.
Proceedings of the Software Technologies for Embedded and Ubiquitous Systems, 2009

In-network reorder buffer to improve overall NoC performance while resolving the in-order requirement problem.
Proceedings of the Design, Automation and Test in Europe, 2009

Program phase and runtime distribution-aware online DVFS for combined Vdd/Vbb scaling.
Proceedings of the Design, Automation and Test in Europe, 2009

Multiprocessor System-on-Chip designs with active memory processors for higher memory efficiency.
Proceedings of the 46th Design Automation Conference, 2009

2008
Entry control in network-on-chip for memory power reduction.
Proceedings of the 2008 International Symposium on Low Power Electronics and Design, 2008

An Open-Loop Flow Control Scheme Based on the Accurate Global Information of On-Chip Communication.
Proceedings of the Design, Automation and Test in Europe, 2008

Dynamic Voltage Scaling of Supply and Body Bias Exploiting Software Runtime Distribution.
Proceedings of the Design, Automation and Test in Europe, 2008

A practical approach of memory access parallelization to exploit multiple off-chip DDR memories.
Proceedings of the 45th Design Automation Conference, 2008

Mixed integer linear programming-based optimal topology synthesis of cascaded crossbar switches.
Proceedings of the 13th Asia South Pacific Design Automation Conference, 2008

An industrial perspective of power-aware reliable SoC design.
Proceedings of the 13th Asia South Pacific Design Automation Conference, 2008

2007
Fast cycle-approximate MPSoC simulation based on synchronization time-point prediction.
Des. Autom. Embed. Syst., 2007

Scheduling with accurate communication delay model and scheduler implementation for multiprocessor system-on-chip.
Des. Autom. Embed. Syst., 2007

Communication Architecture Synthesis of Cascaded Bus Matrix.
Proceedings of the 12th Conference on Asia South Pacific Design Automation, 2007

2006
Runtime distribution-aware dynamic voltage scaling.
Proceedings of the 2006 International Conference on Computer-Aided Design, 2006

Creation and utilization of a virtual platform for embedded software optimization: an industrial case study.
Proceedings of the 4th International Conference on Hardware/Software Codesign and System Synthesis, 2006

PowerViP: SoC power estimation framework at transaction level.
Proceedings of the 2006 Conference on Asia South Pacific Design Automation: ASP-DAC 2006, 2006

2005
ChronoSym: a new approach for fast and accurate SoC cosimulation.
Int. J. Embed. Syst., 2005

Scheduler implementation in MP SoC design.
Proceedings of the 2005 Conference on Asia South Pacific Design Automation, 2005

2004
Multi-Processor SoC Design Methodology Using a Concept of Two-Layer Hardware-Dependent Software.
Proceedings of the 2004 Design, 2004

Debugging HW/SW interface for MPSoC: video encoder system design case study.
Proceedings of the 41st Design Automation Conference, 2004

Fast and accurate timed execution of high level embedded software using HW/SW interface simulation model.
Proceedings of the 2004 Conference on Asia South Pacific Design Automation: Electronic Design and Solution Fair 2004, 2004

2003
An Efficient Simulation Environment and Simulation Techniques for Bluetooth Device Design.
Des. Autom. Embed. Syst., 2003

Introduction to Hardware Abstraction Layers for SoC.
Proceedings of the 2003 Design, 2003

Building Fast and Accurate SW Simulation Models Based on Hardware Abstraction Layer and Simulation Environment Abstraction Layer.
Proceedings of the 2003 Design, 2003

Scheduling and Timing Analysis of HW/SW On-Chip Communication in MP SoC Design.
Proceedings of the 2003 Design, 2003

Multi-Level Software Validation for NoC.
Proceedings of the Networks on Chip, 2003

Introduction to Hardware Abstraction Layers for SoC.
Proceedings of the Embedded Software for SoC, 2003

Scheduling and Timing Analysis of HW/SW On-Chip Communication in MP SoC Design.
Proceedings of the Embedded Software for SoC, 2003

2002
Desiderata pour la spécification et la conception des systèmes électroniques.
Tech. Sci. Informatiques, 2002

Multiprocessor SoC Platforms: A Component-Based Design Approach.
IEEE Des. Test Comput., 2002

Application of Multi-Domain and Multi-Language Cosimulation to an Optical MEM Switch Design.
Proceedings of the 7th Asia and South Pacific Design Automation Conference (ASP-DAC 2002), 2002

Validation in a Component-Based Design Flow for Multicore SoCs.
Proceedings of the 15th International Symposium on System Synthesis (ISSS 2002), 2002

An intra-task dynamic voltage scaling method for SoC design with hierarchical FSM and synchronous dataflow model.
Proceedings of the 2002 International Symposium on Low Power Electronics and Design, 2002

Timed HW-SW cosimulation using native execution of OS and application SW.
Proceedings of the Seventh IEEE International High-Level Design Validation and Test Workshop 2002, 2002

Automatic Generation of Fast Timed Simulation Models for Operating Systems in SoC Design.
Proceedings of the 2002 Design, 2002

Component-based design approach for multicore SoCs.
Proceedings of the 39th Design Automation Conference, 2002

Reconfigurable SoC design with hierarchical FSM and synchronous dataflow model.
Proceedings of the Tenth International Symposium on Hardware/Software Codesign, 2002

2001
Automatic generation and targeting of application-specific operating systems and embedded systems software.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2001

Fast timed cosimulation of HW/SW implementation of embedded multiprocessor SoC communication.
Proceedings of the Sixth IEEE International High-Level Design Validation and Test Workshop 2001, 2001

Mixed-level cosimulation for fine gradual refinement of communication in SoC design.
Proceedings of the Conference on Design, Automation and Test in Europe, 2001

Performance improvement of multi-processor systems cosimulation based on SW analysis.
Proceedings of the Conference on Design, Automation and Test in Europe, 2001

Automatic generation and targeting of application specific operating systems and embedded systems software.
Proceedings of the Conference on Design, Automation and Test in Europe, 2001

Automatic Generation of Application-Specific Architectures for Heterogeneous Multiprocessor System-on-Chip.
Proceedings of the 38th Design Automation Conference, 2001

A generic wrapper architecture for multi-processor SoC cosimulation and design.
Proceedings of the Ninth International Symposium on Hardware/Software Codesign, 2001

Scalable and flexible cosimulation of SoC designs with heterogeneous multi-processor target architectures.
Proceedings of ASP-DAC 2001, 2001

2000
Performance improvement of geographically distributed cosimulation by hierarchically grouped messages.
IEEE Trans. Very Large Scale Integr. Syst., 2000

Optimizing Timed Cosimulation by Hybrid Synchronization.
Des. Autom. Embed. Syst., 2000

Fast Hardware-Software Coverification by Optimistic Execution of Real Processor.
Proceedings of the 2000 Design, 2000

Performance estimation of multiple-cache IP-based systems: case study of an interdependency problem and application of an extended shared memory model.
Proceedings of the Eighth International Workshop on Hardware/Software Codesign, 2000

Hardware-software cosynthesis for run-time incrementally reconfigurable FPGAs.
Proceedings of ASP-DAC 2000, 2000

1999
Exploiting Early Partial Reconfiguration of Run-Time Reconfigurable FPGAs in Embedded Systems Design.
Proceedings of the 1999 ACM/SIGDA Seventh International Symposium on Field Programmable Gate Arrays, 1999

Optimizing geographically distributed timed cosimulation by hierarchically grouped messages.
Proceedings of the Seventh International Workshop on Hardware/Software Codesign, 1999

1998
Optimistic distributed timed cosimulation based on thread simulation model.
Proceedings of the Sixth International Workshop on Hardware/Software Codesign, 1998

1996
Hardware-Software Codesign of Resource-Constrained Real-Time Systems.
Proceedings of the Third International Workshop on Real-Time Computing Systems Application (RTCSA '96), October 30, 1996

