Yongjun Park

ORCID: 0000-0003-3725-0380

Affiliations:
  • Yonsei University, Seoul, South Korea
  • Hanyang University, Seoul, South Korea (former)
  • Hongik University, Seoul, South Korea (former)


According to our database, Yongjun Park authored at least 56 papers between 2009 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
ISP Agent: A Generalized In-storage-processing Workload Offloading Framework by Providing Multiple Optimization Opportunities.
ACM Trans. Archit. Code Optim., March, 2024

Orchestrating Multiple Mixed Precision Models on a Shared Precision-Scalable NPU.
Proceedings of the 25th ACM SIGPLAN/SIGBED International Conference on Languages, 2024

Discovering Efficient Fused Layer Configurations for Executing Multi-Workloads on Multi-Core NPUs.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2024

2023
MaPHeA: A Framework for Lightweight Memory Hierarchy-aware Profile-guided Heap Allocation.
ACM Trans. Embed. Comput. Syst., 2023

Synchronization-Aware NAS for an Efficient Collaborative Inference on Mobile Platforms.
Proceedings of the 24th ACM SIGPLAN/SIGBED International Conference on Languages, 2023

Orchestrating Large-Scale SpGEMMs using Dynamic Block Distribution and Data Transfer Minimization on Heterogeneous Systems.
Proceedings of the 39th IEEE International Conference on Data Engineering, 2023

Tailoring CUTLASS GEMM using Supervised Learning.
Proceedings of the 41st IEEE International Conference on Computer Design, 2023

Block Group Scheduling: A General Precision-scalable NPU Scheduling Technique with Capacity-aware Memory Allocation.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2023

SAGE: A Storage-Based Approach for Scalable and Efficient Sparse Generalized Matrix-Matrix Multiplication.
Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 2023

Virtual PIM: Resource-Aware Dynamic DPU Allocation and Workload Scheduling Framework for Multi-DPU PIM Architecture.
Proceedings of the 32nd International Conference on Parallel Architectures and Compilation Techniques, 2023

2022
Dynamic Rate Neural Acceleration Using Multiprocessing Mode Support.
IEEE Trans. Very Large Scale Integr. Syst., 2022

Networked SSD: Flash Memory Interconnection Network for High-Bandwidth SSD.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

SRTuner: Effective Compiler Optimization Customization by Exposing Synergistic Relations.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2022

2021
MaPHeA: a lightweight memory hierarchy-aware profile-guided heap allocation framework.
Proceedings of the LCTES '21: 22nd ACM SIGPLAN/SIGBED International Conference on Languages, 2021

MASCOT: A Quantization Framework for Efficient Matrix Factorization in Recommender Systems.
Proceedings of the IEEE International Conference on Data Mining, 2021

Legion: Tailoring Grouped Neural Execution Considering Heterogeneity on Multiple Edge Devices.
Proceedings of the 39th IEEE International Conference on Computer Design, 2021

2020
Two-tier garbage collection for persistent object.
Proceedings of the SAC '20: The 35th ACM/SIGAPP Symposium on Applied Computing, online event, [Brno, Czech Republic], March 30, 2020

LOCKED-Free Journaling: Improving the Coalescing Degree in EXT4 Journaling.
Proceedings of the 9th Non-Volatile Memory Systems and Applications Symposium, 2020

Optimization of GPU-based Sparse Matrix Multiplication for Large Sparse Networks.
Proceedings of the 36th IEEE International Conference on Data Engineering, 2020

Convergence-Aware Neural Network Training.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

Navigator: Dynamic Multi-kernel Scheduling to Improve GPU Performance.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

PreScaler: an efficient system-aware precision scaling framework on heterogeneous systems.
Proceedings of the CGO '20: 18th ACM/IEEE International Symposium on Code Generation and Optimization, 2020

2019
Adaptive Cooperation of Prefetching and Warp Scheduling on GPUs.
IEEE Trans. Computers, 2019

Improving GPU Multitasking Efficiency Using Dynamic Resource Sharing.
IEEE Comput. Archit. Lett., 2019

Microarchitecture-Aware Code Generation for Deep Learning on Single-ISA Heterogeneous Multi-Core Mobile Processors.
IEEE Access, 2019

A compiler-based approach for GPGPU performance calibration using TLP modulation (WIP paper).
Proceedings of the 20th ACM SIGPLAN/SIGBED International Conference on Languages, 2019

GATE: A Generalized Dataflow-level Approximation Tuning Engine For Data Parallel Architectures.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

2018
WASP: Selective Data Prefetching with Monitoring Runtime Warp Progress on GPUs.
IEEE Trans. Computers, 2018

Runtime Profiling of OpenCL Workloads Using LLVM-based Code Instrumentation.
Proceedings of the TENCON 2018, 2018

Automated Neural Network Accelerator Generation Framework for Multiple Neural Network Applications.
Proceedings of the TENCON 2018, 2018

Core-level DVFS for Spatial Multitasking GPUs.
Proceedings of the TENCON 2018, 2018

Automatic code conversion for non-volatile memory.
Proceedings of the 33rd Annual ACM Symposium on Applied Computing, 2018

NN compactor: Minimizing memory and logic resources for small neural networks.
Proceedings of the 2018 Design, Automation & Test in Europe Conference & Exhibition, 2018

2017
Selective DRAM cache bypassing for improving bandwidth on DRAM/NVM hybrid main memory systems.
IEICE Electron. Express, 2017

Efficient GPU multitasking with latency minimization and cache boosting.
IEICE Electron. Express, 2017

A Comparative Study of Programming Environments Exploiting Heterogeneous Systems.
IEEE Access, 2017

A FPGA-based neural accelerator for small IoT devices.
Proceedings of the International SoC Design Conference, 2017

Dynamic Resource Management for Efficient Utilization of Multitasking GPUs.
Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017

2016
An eDRAM-Based Approximate Register File for GPUs.
IEEE Des. Test, 2016

A bypass first policy for energy-efficient last level caches.
Proceedings of the International Conference on Embedded Computer Systems: Architectures, 2016

APRES: Improving Cache Efficiency by Exploiting Load Characteristics on GPUs.
Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

2015
SKMD: Single Kernel on Multiple Devices for Transparent CPU-GPU Collaboration.
ACM Trans. Comput. Syst., 2015

ELF: maximizing memory-level parallelism for GPUs with coordinated warp and fetch scheduling.
Proceedings of the International Conference for High Performance Computing, 2015

Enabling Efficient Alias Speculation.
Proceedings of the 16th ACM SIGPLAN/SIGBED Conference on Languages, 2015

Chimera: Collaborative Preemption for Multitasking on a Shared GPU.
Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, 2015

Fine Grain Cache Partitioning Using Per-Instruction Working Blocks.
Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

2013
Efficient execution of augmented reality applications on mobile programmable accelerators.
Proceedings of the 2013 International Conference on Field-Programmable Technology, 2013

Transparent CPU-GPU collaboration for data-parallel kernels on heterogeneous systems.
Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 2013

2012
Libra: Tailoring SIMD Execution Using Heterogeneous Hardware and Dynamic Configurability.
Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012

Efficient performance scaling of future CGRAs for mobile applications.
Proceedings of the 2012 International Conference on Field-Programmable Technology, 2012

Process variation in near-threshold wide SIMD architectures.
Proceedings of the 49th Annual Design Automation Conference 2012, 2012

SIMD defragmenter: efficient ILP realization on data-parallel architectures.
Proceedings of the 17th International Conference on Architectural Support for Programming Languages and Operating Systems, 2012

2010
Resource recycling: putting idle resources to work on a composable accelerator.
Proceedings of the 2010 International Conference on Compilers, 2010

2009
A dataflow-centric approach to design low power control paths in CGRAs.
Proceedings of the IEEE 7th Symposium on Application Specific Processors, 2009

Polymorphic pipeline array: a flexible multicore accelerator with virtualized execution for mobile multimedia applications.
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

CGRA express: accelerating execution using dynamic operation fusion.
Proceedings of the 2009 International Conference on Compilers, 2009

