Won Woo Ro

ORCID: 0000-0001-5390-6445

According to our database, Won Woo Ro authored at least 132 papers between 2003 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
SHREG: Mitigating register redundancy in GPUs.
J. Syst. Archit., 2024

M3XU: Achieving High-Precision and Complex Matrix Multiplication with Low-Precision MXUs.
Proceedings of the International Conference for High Performance Computing, 2024

Generalizing Ray Tracing Accelerators for Tree Traversals on GPUs.
Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024

GUMSO: Gating Unnecessary On-Chip Memory Slices for Power Optimization on GPUs.
Proceedings of the 29th ACM/IEEE International Symposium on Low Power Electronics and Design, 2024

Geneva: A Dynamic Confluence of Speculative Execution and In-Order Commitment Windows.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

REPrune: Channel Pruning via Kernel Representative Selection.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Recompiling QAOA Circuits on Various Rotational Directions.
Proceedings of the 2024 International Conference on Parallel Architectures and Compilation Techniques, 2024

2023
A convertible neural processor supporting adaptive quantization for real-time neural networks.
J. Syst. Archit., December, 2023

FLIXR: Embedding Index Into Flash Translation Layer in SSDs.
IEEE Trans. Computers, 2023

MAD MAcce: Supporting Multiply-Add Operations for Democratizing Matrix-Multiplication Accelerators.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

Exploiting Inherent Properties of Complex Numbers for Accelerating Complex Valued Neural Networks.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

McCore: A Holistic Management of High-Performance Heterogeneous Multicores.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

AESPA: Asynchronous Execution Scheme to Exploit Bank-Level Parallelism of Processing-in-Memory.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

Early-Adaptor: An Adaptive Framework for Proactive UVM Memory Management.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2023

TensorCV: Accelerating Inference-Adjacent Computation Using Tensor Processors.
Proceedings of the IEEE/ACM International Symposium on Low Power Electronics and Design, 2023

R2D2: Removing ReDunDancy Utilizing Linearity of Address Generation in GPUs.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

SnakeByte: A TLB Design with Adaptive and Recursive Page Merging in GPUs.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

Lightning Talk: Efficiency and Programmability of DNN Accelerators and GPUs.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

Quixote: Improving Fidelity of Quantum Program by Independent Execution of Controlled Gates.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

Context Swap: Multi-PIM System Preventing Remote Memory Access for Large Embedding Model Acceleration.
Proceedings of the 5th IEEE International Conference on Artificial Intelligence Circuits and Systems, 2023

Balanced Column-Wise Block Pruning for Maximizing GPU Parallelism.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

INTERPRET: Inter-Warp Register Reuse for GPU Tensor Core.
Proceedings of the 32nd International Conference on Parallel Architectures and Compilation Techniques, 2023

2022
CASH-RF: A Compiler-Assisted Hierarchical Register File in GPUs.
IEEE Embed. Syst. Lett., 2022

TEA-RC: Thread Context-Aware Register Cache for GPUs.
IEEE Access, 2022

Reconstructing Out-of-Order Issue Queue.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

2021
Two-Stage In-Storage Processing and Scheduling for Pattern Matching Applications.
IEEE Access, 2021

PIMCaffe: Functional Evaluation of a Machine Learning Framework for In-Memory Neural Processing Unit.
IEEE Access, 2021

Chapter Six - Deep learning with GPUs.
Adv. Comput., 2021

QoS-Aware Scheduling for Cellular Networks Using Deep Reinforcement Learning.
Proceedings of the Network and Parallel Computing, 2021

SPACE: Locality-Aware Processing in Heterogeneous Memory for Personalized Recommendations.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

2020
REACT: Scalable and High-Performance Regular Expression Pattern Matching Accelerator for In-Storage Processing.
IEEE Trans. Parallel Distributed Syst., 2020

Hi-End: Hierarchical, Endurance-Aware STT-MRAM-Based Register File for Energy-Efficient GPUs.
IEEE Access, 2020

Duplo: Lifting Redundant Memory Accesses of Deep Neural Networks for GPU Tensor Cores.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

Check-In: In-Storage Checkpointing for Key-Value Store System Leveraging Flash-Based SSDs.
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

CASINO Core Microarchitecture: Generating Out-of-Order Schedules Using Cascaded In-Order Scheduling Windows.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020

2019
Fast CU Depth Decision for HEVC Using Neural Networks.
IEEE Trans. Circuits Syst. Video Technol., 2019

Adaptive Cooperation of Prefetching and Warp Scheduling on GPUs.
IEEE Trans. Computers, 2019

OverCome: Coarse-Grained Instruction Commit with Handover Register Renaming.
IEEE Trans. Computers, 2019

Contents-aware partitioning algorithm for parallel high efficiency video coding.
Multim. Tools Appl., 2019

Linebacker: preserving victim cache lines in idle register files of GPUs.
Proceedings of the 46th International Symposium on Computer Architecture, 2019

Efficient Dilated-Winograd Convolutional Neural Networks.
Proceedings of the 2019 IEEE International Conference on Image Processing, 2019

Access Characteristic-based Cache Replacement Policy in an SSD.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshops, 2019

2018
Exploiting Pseudo-Quadtree Structure for Accelerating HEVC Spatial Resolution Downscaling Transcoder.
IEEE Trans. Multim., 2018

Architectural Protection of Application Privacy against Software and Physical Attacks in Untrusted Cloud Environment.
IEEE Trans. Cloud Comput., 2018

WASP: Selective Data Prefetching with Monitoring Runtime Warp Progress on GPUs.
IEEE Trans. Computers, 2018

Simultaneous and Speculative Thread Migration for Improving Energy Efficiency of Heterogeneous Core Architectures.
IEEE Trans. Computers, 2018

A semantic sensor mashup platform for Internet of Things.
Proceedings of the 4th IEEE World Forum on Internet of Things, 2018

FineReg: Fine-Grained Register File Management for Augmenting GPU Throughput.
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

WIR: Warp Instruction Reuse to Minimize Repeated Computations in GPUs.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

2017
Dynamic Resizing on Active Warps Scheduler to Hide Operation Stalls on GPUs.
IEEE Trans. Parallel Distributed Syst., 2017

Improving Energy Efficiency of GPUs through Data Compression and Compressed Execution.
IEEE Trans. Computers, 2017

Dynamic Load Balancing of Dispatch Scheduling for Solid State Disks.
IEEE Trans. Computers, 2017

An adaptive plan-based approach to integrating semantic streams with remote RDF data.
J. Inf. Sci., 2017

Parallel in-order execution architecture for low-power processor.
Proceedings of the International SoC Design Conference, 2017

Characterizing convolutional neural network workloads on a detailed GPU simulator.
Proceedings of the International SoC Design Conference, 2017

Access Pattern-Aware Cache Management for Improving Data Utilization in GPU.
Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

2016
Exploiting Thread-Level Parallelism on HEVC by Employing a Reference Dependency Graph.
IEEE Trans. Circuits Syst. Video Technol., 2016

Parallel GPU Architecture Simulation Framework Exploiting Architectural-Level Parallelism with Timing Error Prediction.
IEEE Trans. Computers, 2016

Server side, play buffer based quality control for adaptive media streaming.
Multim. Tools Appl., 2016

Virtual Thread: Maximizing Thread-Level Parallelism beyond GPU Scheduling Limit.
Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

Warped-Slicer: Efficient Intra-SM Slicing through Dynamic Resource Partitioning for GPU Multiprogramming.
Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

APRES: Improving Cache Efficiency by Exploiting Load Characteristics on GPUs.
Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

Warped-preexecution: A GPU pre-execution approach for improving latency hiding.
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

2015
Dynamic Load Balancing of Parallel SURF with Vertical Partitioning.
IEEE Trans. Parallel Distributed Syst., 2015

Network Variation and Fault Tolerant Performance Acceleration in Mobile Devices with Simultaneous Remote Execution.
IEEE Trans. Computers, 2015

A Performance-Energy Model to Evaluate Single Thread Execution Acceleration.
IEEE Comput. Archit. Lett., 2015

Proactive Plan-Based Continuous Query Processing over Diverse SPARQL Endpoints.
Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, 2015

DRAW: investigating benefits of adaptive fetch group size on GPU.
Proceedings of the 2015 IEEE International Symposium on Performance Analysis of Systems and Software, 2015

A frequency scaling model for energy efficient DVFS designs based on circuit delay optimization.
Proceedings of the International Symposium on Consumer Electronics, 2015

Warped-compression: enabling power efficient GPUs through register compression.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

Complex Sensor Mashups for Linking Sensors and Formula-Based Knowledge Bases.
Proceedings of the 2015 IEEE International Conference on Information Reuse and Integration, 2015

An accelerated separable median filter with sorting networks.
Proceedings of the 2015 IEEE International Conference on Image Processing, 2015

True motion compensation with feature detection for frame rate up-conversion.
Proceedings of the 2015 IEEE International Conference on Image Processing, 2015

Integrity Protection for Big Data Processing with Dynamic Redundancy Computation.
Proceedings of the 2015 IEEE International Conference on Autonomic Computing, 2015

Contention-Free Fair Queuing for High-Speed Storage with RAID-0 Architecture.
Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, 2015

Enhancing Software Dependability and Security with Hardware Supported Instruction Address Space Randomization.
Proceedings of the 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2015

Another Look at Secure Big Data Processing: Formal Framework and a Potential Approach.
Proceedings of the 8th IEEE International Conference on Cloud Computing, 2015

2014
C-Lock: Energy Efficient Synchronization for Embedded Multicore Systems.
IEEE Trans. Computers, 2014

Complexity-Effective Contention Management with Dynamic Backoff for Transactional Memory Systems.
IEEE Trans. Computers, 2014

Exploiting Implementation Diversity and Partial Connection of Routers in Application-Specific Network-on-Chip Topology Synthesis.
IEEE Trans. Computers, 2014

A Malicious Pattern Detection Engine for Embedded Security Systems in the Internet of Things.
Sensors, 2014

Boosting CUDA Applications with CPU-GPU Hybrid Computing.
Int. J. Parallel Program., 2014

Swarm Processor System: hardware process scheduler based energy efficient multi-core system.
IEICE Electron. Express, 2014

Architectural investigation of matrix data layout on multicore processors.
Future Gener. Comput. Syst., 2014

Accelerating MapReduce framework on multi-GPU systems.
Clust. Comput., 2014

LUT based secure cloud computing - An implementation using FPGAs.
Proceedings of the 2014 International Conference on ReConFigurable Computing and FPGAs, 2014

Workload synthesis: Generating benchmark workloads from statistical execution profile.
Proceedings of the 2014 IEEE International Symposium on Workload Characterization, 2014

Accelerating gesture recognition algorithm using coarse grained reconfigurable architectures.
Proceedings of the International Conference on Audio, 2014

Hyper threading-aware Virtual Machine migration.
Proceedings of the International Conference on Electronics, Information and Communications, 2014

Development of efficient VCPU pinning mechanism in Xen.
Proceedings of the International Conference on Electronics, Information and Communications, 2014

Multicore speedup models using frequency scaling with fixed power budget.
Proceedings of the International Conference on Electronics, Information and Communications, 2014

2013
Design and evaluation of random linear network coding Accelerators on FPGAs.
ACM Trans. Embed. Comput. Syst., 2013

Importance of Coherence Protocols with Network Applications on Multicore Processors.
IEEE Trans. Computers, 2013

A Distributed Signature Detection Method for Detecting Intrusions in Sensor Systems.
Sensors, 2013

Parallelized sub-resource loading for web rendering engine.
J. Syst. Archit., 2013

Benefits of using parallelized non-progressive network coding.
J. Netw. Comput. Appl., 2013

GPU-Friendly Parallel Genome Matching with Tiled Access and Reduced State Transition Table.
Int. J. Parallel Program., 2013

Exploiting SIMD parallelism on dynamically partitioned parallel network coding for P2P systems.
Comput. Electr. Eng., 2013

Parallel GPU architecture simulation framework exploiting work allocation unit parallelism.
Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software, 2013

Mark-Sharing: A Parallel Garbage Collection Algorithm for Low Synchronization Overhead.
Proceedings of the 19th IEEE International Conference on Parallel and Distributed Systems, 2013

MGMR: Multi-GPU Based MapReduce.
Proceedings of the Grid and Pervasive Computing - 8th International Conference, 2013

2012
Offloading of media transcoding for high-quality multimedia services.
IEEE Trans. Consumer Electron., 2012

Reconfigurable and parallelized network coding decoder for VANETs.
Mob. Inf. Syst., 2012

Introducing the Extremely Heterogeneous Architecture.
J. Interconnect. Networks, 2012

An Efficient Block Cipher Implementation on Many-Core Graphics Processing Units.
J. Inf. Process. Syst., 2012

Multi-Threading and Suffix Grouping on Massive Multiple Pattern Matching Algorithm.
Comput. J., 2012

Accelerated Network Coding with Dynamic Stream Decomposition on Graphics Processing Unit.
Comput. J., 2012

Conflict Avoidance Scheduling Using Grouping List for Transactional Memory.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

Cooperative heterogeneous computing for parallel processing on CPU/GPU hybrids.
Proceedings of the 16th Workshop on Interaction between Compilers and Computer Architectures, 2012

2011
Network Coding on Heterogeneous Multi-Core Processors for Wireless Sensor Networks.
Sensors, 2011

A Novel Sequential Tree Algorithm Based on Scoreboard for MPI Broadcast Communication.
IEICE Trans. Inf. Syst., 2011

A Low-Cost Standard Mode MPI Hardware Unit for Embedded MPSoC.
IEICE Trans. Inf. Syst., 2011

2010
On Improving Parallelized Network Coding with Dynamic Partitioning.
IEEE Trans. Parallel Distributed Syst., 2010

Multithreaded pattern matching algorithm with data rearrangement.
IEICE Electron. Express, 2010

Hardware implementation of a tessellation accelerator for the OpenVG standard.
IEICE Electron. Express, 2010

Implementing FFT using SPMD style of OpenMP.
Proceedings of the International Conference on Networked Computing and Advanced Information Management, 2010

FPGA implementation of highly parallelized decoder logic for network coding (abstract only).
Proceedings of the ACM/SIGDA 18th International Symposium on Field Programmable Gate Arrays, 2010

2009
A complexity-effective microprocessor design with decoupled dispatch queues and prefetching.
Parallel Comput., 2009

Efficient Parallelized Network Coding for P2P File Sharing Applications.
Proceedings of the Advances in Grid and Pervasive Computing, 4th International Conference, 2009

Fully Pipelined Hardware Implementation of 128-Bit SEED Block Cipher Algorithm.
Proceedings of the Reconfigurable Computing: Architectures, 2009

2008
A low-complexity microprocessor design with speculative pre-execution.
J. Syst. Archit., 2008

Efficient peer-to-peer file sharing using network coding in MANET.
J. Commun. Networks, 2008

Delay Analysis of Car-to-Car Reliable Data Delivery Strategies Based on Data Mulling with Network Coding.
IEICE Trans. Inf. Syst., 2008

Simultaneous thin-thread processors for low-power embedded systems.
IEICE Electron. Express, 2008

2006
Design and evaluation of a hierarchical decoupled architecture.
J. Supercomput., 2006

Speculative pre-execution assisted by compiler (SPEAR).
J. Parallel Distributed Comput., 2006

Design and Effectiveness of Small-Sized Decoupled Dispatch Queues.
Proceedings of the Euro-Par 2006, Parallel Processing, 12th International Euro-Par Conference, Dresden, Germany, August 28, 2006

2005
Techniques to Improve Performance Beyond Pipelining: Superpipelining, Superscalar, and VLIW.
Adv. Comput., 2005

A Low-Complexity Issue Queue Design with Speculative Pre-execution.
Proceedings of the High Performance Computing, 2005

2004
SPEAR: A Hybrid Model for Speculative Pre-Execution.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

2003
HiDISC: A Decoupled Architecture for Data-Intensive Application.
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

Compiler Support for Dynamic Speculative Pre-Execution.
Proceedings of the 7th Annual Workshop on Interaction between Compilers and Computer Architecture (INTERACT-7 2003), 2003

