Jenq Kuen Lee

Orcid: 0000-0001-9919-6258

  • National Tsing-Hua University, Department of Computer Science, Taiwan

According to our database, Jenq Kuen Lee authored at least 148 papers between 1991 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.






Optimizing computer vision algorithms with TVM on VLIW architecture based on RVV.
J. Supercomput., January, 2025

Case Study: Optimization Methods With TVM Hybrid-OP on RISC-V Packed SIMD.
IEEE Access, 2024

Low DRAM Memory Access and Flexible Dataflow Convolutional Neural Network Accelerator based on RISC-V Custom Instruction.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2024

Rewriting and Optimizing Vector Length Agnostic Intrinsics from Arm SVE to RVV.
Proceedings of the Workshop Proceedings of the 53rd International Conference on Parallel Processing, 2024

The Rewriting of DataRaceBench Benchmark for OpenCL Program Validations.
Proceedings of the Workshop Proceedings of the 53rd International Conference on Parallel Processing, 2024

Accelerating AI performance with the incorporation of TVM and MediaTek NeuroPilot.
Connect. Sci., December, 2023

Accelerating AI Applications with Sparse Matrix Compression in Halide.
J. Signal Process. Syst., May, 2023

Guest Editorial: Special Issue on Systems Optimizations for DSP and AI Applications.
J. Signal Process. Syst., May, 2023

Auto-tuning Fixed-point Precision with TVM on RISC-V Packed SIMD Extension.
ACM Trans. Design Autom. Electr. Syst., 2023

SIMD Everywhere Optimization from ARM NEON to RISC-V Vector Extensions.
CoRR, 2023

Simulation Environment with Customized RISC-V Instructions for Logic-in-Memory Architectures.
CoRR, 2023

Support of Sparse Tensor Computing for MLIR HLS.
Proceedings of the 52nd International Conference on Parallel Processing Workshops, 2023

Enhancing LLVM Optimizations for Linear Recurrence Programs on RVV.
Proceedings of the 52nd International Conference on Parallel Processing Workshops, 2023

Efficient Realization of Decision Trees for Real-Time Inference.
ACM Trans. Embed. Comput. Syst., November, 2022

Case Study: Design Strategies for Enabling Visual Application Blocks of Bluetooth Library.
IEEE Access, 2022

C++OpenCL4TVM: Support C++OpenCL Kernel for TVM NN Operators.
Proceedings of the IWOCL'22: International Workshop on OpenCL, Bristol, United Kingdom, May 10, 2022

Register-Pressure Aware Predicator for Length Multiplier of RVV.
Proceedings of the Workshop Proceedings of the 51st International Conference on Parallel Processing, 2022

The Support of MLIR HLS Adaptor for LLVM IR.
Proceedings of the Workshop Proceedings of the 51st International Conference on Parallel Processing, 2022

Efficient Support of the Scan Vector Model for RISC-V Vector Extension.
Proceedings of the Workshop Proceedings of the 51st International Conference on Parallel Processing, 2022

Application Showcases for TVM with NeuroPilot on Mobile Devices.
Proceedings of the Workshop Proceedings of the 51st International Conference on Parallel Processing, 2022

Pointer-Based Divergence Analysis for OpenCL 2.0 Programs.
ACM Trans. Parallel Comput., 2021

NNBlocks: a Blockly framework for AI computing.
J. Supercomput., 2021

Support NNEF execution model for NNAPI.
J. Supercomput., 2021

Enabling the Use of C++20 Unseq Execution Policy for OpenCL.
Proceedings of the IWOCL'21: International Workshop on OpenCL, Munich, Germany, April 2021

Accelerate Binarized Neural Networks with Processing-in-Memory Enabled by RISC-V Custom Instructions.
Proceedings of the ICPP Workshops 2021: 50th International Conference on Parallel Processing, 2021

Support Convolution of CNN with Compression Sparse Matrix Multiplication Flow in TVM.
Proceedings of the ICPP Workshops 2021: 50th International Conference on Parallel Processing, 2021

Experiment and enabled flow for GPGPU-Sim simulators with fixed-point instructions.
J. Syst. Archit., 2020

Experiments and optimizations for TVM on RISC-V Architectures with P Extension.
Proceedings of the 2020 International Symposium on VLSI Design, Automation and Test, 2020

Accelerating NNEF Framework on OpenCL Devices Using clDNN.
Proceedings of the IWOCL '20: International Workshop on OpenCL, 2020

Enabling Android NNAPI Flow for TVM Runtime.
Proceedings of the ICPP Workshops '20: Workshops, Edmonton, AB, Canada, August 17-20, 2020, 2020

Devise Sparse Compression Schedulers to Enhance FastText Methods.
Proceedings of the ICPP Workshops '20: Workshops, Edmonton, AB, Canada, August 17-20, 2020, 2020

Support OpenCL 2.0 Compiler on LLVM for PTX Simulators.
J. Signal Process. Syst., 2019

Guest Editorial: Special Issue on Embedded Multicore Applications and Optimization.
J. Signal Process. Syst., 2019

Sparse-Matrix Compression Primitives with OpenCL Framework to Support Halide.
Proceedings of the International Workshop on OpenCL, 2019

Case Study: Support OpenCL Complex Class for Baseband Computing.
Proceedings of the International Workshop on OpenCL, 2019

Devise Rust Compiler Optimizations on RISC-V Architectures with SIMD Instructions.
Proceedings of the 48th International Conference on Parallel Processing, 2019

Accelerate DNN Performance with Sparse Matrix Compression in Halide.
Proceedings of the 48th International Conference on Parallel Processing, 2019

Architecture and Compiler Support for GPUs Using Energy-Efficient Affine Register Files.
ACM Trans. Design Autom. Electr. Syst., 2018

ViennaCL++: Enable TensorFlow/Eigen via ViennaCL with OpenCL C++ Flow.
Proceedings of the International Workshop on OpenCL, 2018

Scheduling Methods to Optimize Dependent Programs for GPU Architecture.
Proceedings of the 47th International Conference on Parallel Processing, 2018

Enable the Flow for GPGPU-Sim Simulators with Fixed-Point Instructions.
Proceedings of the 47th International Conference on Parallel Processing, 2018

Graph Support and Scheduling for OpenCL on Heterogeneous Multi-core Systems.
Proceedings of the 47th International Conference on Parallel Processing, 2018

Enabling PoCL-based runtime frameworks on the HSA for OpenCL 2.0 support.
J. Syst. Archit., 2017

Analyzing OpenCL 2.0 workloads using a heterogeneous CPU-GPU simulator.
Proceedings of the 2017 IEEE International Symposium on Performance Analysis of Systems and Software, 2017

Hierarchical Read/Write Analysis for Pointer-Based OpenCL Programs on RRAM.
Proceedings of the 46th International Conference on Parallel Processing Workshops, 2017

OpenCL 2.0 Compiler Adaptation on LLVM for PTX Simulators.
Proceedings of the 46th International Conference on Parallel Processing Workshops, 2017

Translating the ARM Neon and VFP instructions in a binary translator.
Softw. Pract. Exp., 2016

Vector data flow analysis for SIMD optimizations on OpenCL programs.
Concurr. Comput. Pract. Exp., 2016

Energy Efficient Affine Register File for GPU Microarchitecture.
Proceedings of the 45th International Conference on Parallel Processing Workshops, 2016

OpenCV Optimization on Heterogeneous Multi-core Systems for Gesture Recognition Applications.
Proceedings of the 45th International Conference on Parallel Processing Workshops, 2016

A Probabilistic Framework for Compiler Optimization with Multithread Power-Gating Controls.
Proceedings of the 45th International Conference on Parallel Processing Workshops, 2016

Compilers for Low Power with Design Patterns on Embedded Multicore Systems.
J. Signal Process. Syst., 2015

Guest Editorial: Embedded Multicore Systems and Applications.
J. Signal Process. Syst., 2015

The Design and Experiments of A SID-Based Power-Aware Simulator for Embedded Multicore Systems.
ACM Trans. Design Autom. Electr. Syst., 2015

Guest Editorial: Multi-Core Embedded Computing for Signal Processing.
J. Signal Process. Syst., 2014

C++ Support and Applications for Embedded Multicore DSP Systems.
J. Signal Process. Syst., 2014

Compiler Optimization for Reducing Leakage Power in Multithread BSP Programs.
ACM Trans. Design Autom. Electr. Syst., 2014

Achieving spilling-friendly register file assignment for highly distributed register files.
J. Supercomput., 2014

Register spilling via transformed interference equations for PAC DSP architecture.
Concurr. Comput. Pract. Exp., 2014

On the and-or-scheduling problems.
Proceedings of the 20th IEEE International Conference on Parallel and Distributed Systems, 2014

The design of LLVM-based shader compiler for embedded architecture.
Proceedings of the 20th IEEE International Conference on Parallel and Distributed Systems, 2014

Optimized memory access support for data layout conversion on heterogeneous multi-core systems.
Proceedings of the 12th IEEE Symposium on Embedded Systems for Real-time Multimedia, 2014

Compilers for Low Power with Design Patterns on Embedded Multicore Systems.
Proceedings of the 42nd International Conference on Parallel Processing, 2013

Design of vehicle detection methods with OpenCL programming on multi-core systems.
Proceedings of the 11th IEEE Symposium on Embedded Systems for Real-time Multimedia, 2013

Support of Probabilistic Pointer Analysis in the SSA Form.
IEEE Trans. Parallel Distributed Syst., 2012

Instruction scheduling methods and phase ordering framework for VLIW DSP processors with distributed register files.
J. Supercomput., 2012

Case study: stereo vision experiments with multi-core software API on embedded MPSoC environments.
J. Supercomput., 2012

Parallelization of Belief Propagation on Cell Processors for Stereo Vision.
ACM Trans. Embed. Comput. Syst., 2012

Compiler supports for VLIW DSP processors with SIMD intrinsics.
Concurr. Comput. Pract. Exp., 2012

Enabling an OpenCL Compiler for Embedded Multicore DSP Systems.
Proceedings of the 41st International Conference on Parallel Processing Workshops, 2012

Array Languages, Compiler Techniques for.
Proceedings of the Encyclopedia of Parallel Computing, 2011

Parallel Architecture Core (PAC) - the First Multicore Application Processor SoC in Taiwan Part I: Hardware Architecture & Software Development Tools.
J. Signal Process. Syst., 2011

C++ Compiler Supports for Embedded Multicore DSP Systems.
Proceedings of the 2011 International Conference on Parallel Processing Workshops, 2011

Enable OpenCL Compiler with Open64 Infrastructures.
Proceedings of the 13th IEEE International Conference on High Performance Computing & Communication, 2011

Innovative system and application curriculum on multicore systems.
Proceedings of the 6th Workshop on Embedded Systems Education, 2011

Parallelization of a Bokeh application on embedded multicore DSP systems.
Proceedings of the 9th IEEE Symposium on Embedded Systems for Real-Time Multimedia, 2011

Support of software framework for embedded multi-core systems with Android environments.
Proceedings of the 9th IEEE Symposium on Embedded Systems for Real-Time Multimedia, 2011

Programming model and tools for embedded multicore systems.
Int. J. Embed. Syst., 2010

A Multi-core Software API for Embedded MPSoC Environments.
Proceedings of the Methods and Tools of Parallel Programming Multicomputers, 2010

Support of Android lab modules for embedded system curriculum.
Proceedings of the 2010 Workshop on Embedded Systems Education, 2010

Power aware SID-based simulator for embedded multicore DSP subsystems.
Proceedings of the 8th International Conference on Hardware/Software Codesign and System Synthesis, 2010

LC-GRFA: global register file assignment with local consciousness for VLIW DSP processors with non-uniform register files.
Concurr. Comput. Pract. Exp., 2009

Efficient multiple virtual view generation based on reduced depth stereo image for advanced autostereoscopic displays.
Proceedings of the 2009 IEEE International Conference on Multimedia and Expo, 2009

Configurable SID-based multi-core simulators for embedded system education.
Proceedings of the 2009 Workshop on Embedded Systems Education, 2009

pTest: An adaptive testing tool for concurrent software on embedded multicore processors.
Proceedings of the Design, Automation and Test in Europe, 2009

Support of Paged Register Files for Improving Context Switching on Embedded Processors.
Proceedings of the 12th IEEE International Conference on Computational Science and Engineering, 2009

Effective Code Generation for Distributed and Ping-Pong Register Files: A Case Study on PAC VLIW DSP Cores.
J. Signal Process. Syst., 2008

Enhancing Microkernel Performance on VLIW DSP Processors via Multiset Context Switch.
J. Signal Process. Syst., 2008

Software architecture design for streaming Java RMI.
Sci. Comput. Program., 2008

Mobile Java RMI support over heterogeneous wireless networks: A case study.
J. Parallel Distributed Comput., 2008

The support of software design patterns for streaming RPC on embedded multicore processors.
Proceedings of the IEEE Workshop on Signal Processing Systems, 2008

Enabling Streaming Remoting on Embedded Dual-Core Processors.
Proceedings of the 2008 International Conference on Parallel Processing, 2008

Parallelization of belief propagation method on embedded multicore processors for stereo vision.
Proceedings of the 6th IEEE/ACM/IFIP Workshop on Embedded Systems for Real-Time Multimedia, 2008

Compilation for compact power-gating controls.
ACM Trans. Design Autom. Electr. Syst., 2007

Energy-aware scheduling and simulation methodologies for parallel security processors with multiple voltage domains.
J. Supercomput., 2007

Switching supports for stateful object remoting on network processors.
J. Supercomput., 2007

PALF: compiler supports for irregular register files in clustered VLIW DSP processors.
Concurr. Comput. Pract. Exp., 2007

Enabling compiler flow for embedded VLIW DSP processors with distributed register files.
Proceedings of the 2007 ACM SIGPLAN/SIGBED Conference on Languages, 2007

Compilers for leakage power reduction.
ACM Trans. Design Autom. Electr. Syst., 2006

Integrating Compiler and System Toolkit Flow for Embedded VLIW DSP Processors.
Proceedings of the 12th IEEE Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA 2006), 2006

Streaming support for Java RMI in distributed environments.
Proceedings of the 4th International Symposium on Principles and Practice of Programming in Java, 2006

Copy Propagation Optimizations for VLIW DSP Processors with Distributed Register Files.
Proceedings of the Languages and Compilers for Parallel Computing, 2006

PAC DSP Core and Application Processors.
Proceedings of the 2006 IEEE International Conference on Multimedia and Expo, 2006

Power Aware H.264/AVC Video Player on PAC Dual-Core SoC Platform.
Proceedings of the Embedded and Ubiquitous Computing, International Conference, 2006

Support and optimization of Java RMI over a Bluetooth environment.
Concurr. Pract. Exp., 2005

Compiler Supports and Optimizations for PAC VLIW DSP Processors.
Proceedings of the Languages and Compilers for Parallel Computing, 2005

Efficient Switching Supports of Distributed .NET Remoting with Network Processors.
Proceedings of the 34th International Conference on Parallel Processing (ICPP 2005), 2005

A sink-n-hoist framework for leakage power reduction.
Proceedings of the EMSOFT 2005, 2005

System-level design space exploration for security processor prototyping in analytical approaches.
Proceedings of the 2005 Conference on Asia South Pacific Design Automation, 2005

Interprocedural Probabilistic Pointer Analysis.
IEEE Trans. Parallel Distributed Syst., 2004

Support and optimization for parallel sparse programs with array intrinsics of Fortran 90.
Parallel Comput., 2004

Case study: an infrastructure for C/ATLAS environments with object-oriented design and XML representation.
J. Syst. Softw., 2004

Power-Aware Scheduling for Parallel Security Processors with Analytical Models.
Proceedings of the Languages and Compilers for High Performance Computing, 2004

Specification and Architecture Supports for Component Adaptations on Distributed Environments.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

Efficient support of Java RMI over heterogeneous wireless networks.
Proceedings of IEEE International Conference on Communications, 2004

Compiler optimization on VLIW instruction scheduling for low power.
ACM Trans. Design Autom. Electr. Syst., 2003

Segmented Alignment: An Enhanced Model to Align Data Parallel Programs of HPF.
J. Supercomput., 2003

Compiler support for speculative multithreading architecture with probabilistic points-to analysis.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2003

Compiler Analysis and Supports for Leakage Power Reduction on Microprocessors.
Proceedings of the Languages and Compilers for Parallel Computing, 15th Workshop, 2002

Compiler Optimizations with DSP-Specific Semantic Descriptions.
Proceedings of the Languages and Compilers for Parallel Computing, 15th Workshop, 2002

Support and optimization of Java RMI over Bluetooth environments.
Proceedings of the 2002 Joint ACM-ISCOPE Conference on Java Grande 2002, 2002

Building Ontology for Optimization and Composition of Parallel JavaBean Programs.
Proceedings of the International Symposium on Parallel Architectures, 2002

Parallel Sparse Supports for Array Intrinsic Functions of Fortran 90.
J. Supercomput., 2001

Array Operation Synthesis to Optimize HPF Programs on Distributed Memory Machines.
J. Parallel Distributed Comput., 2001

Probabilistic Points-to Analysis.
Proceedings of the Languages and Compilers for Parallel Computing, 2001

Probabilistic Inference Schemes for Sparsity Structures of Fortran 90 Array Intrinsics.
Proceedings of the 2001 International Conference on Parallel Processing, 2001

Real-Time Gang Schedulings With Workload Models for Parallel Computers.
J. Inf. Sci. Eng., 2000

Runtime Compositions and Optimizations of Parallel JavaBean Programs on Clustering Environments.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 2000

A Bytecode Optimizer to Engineer Bytecodes for Performance.
Proceedings of the Languages and Compilers for Parallel Computing, 2000

Compiler Optimization on Instruction Scheduling for Low Power.
Proceedings of the 13th International Symposium on System Synthesis, 2000

Communication set generations with CSD calculus and expression-rewriting framework.
Parallel Comput., 1999

Compiler Optimizations for Parallel Sparse Programs with Array Intrinsics of Fortran 90.
Proceedings of the International Conference on Parallel Processing 1999, 1999

A Function-Composition Approach to Synthesize Fortran 90 Array Operations.
J. Parallel Distributed Comput., 1998

An Expression-Rewriting Framework to Generic Communication Sets for HPF Programs with Block-Cyclic Distribution.
Proceedings of the 12th International Parallel Processing Symposium / 9th Symposium on Parallel and Distributed Processing (IPPS/SPDP '98), March 30, 1998

Efficient Support of Parallel Sparse Computation for Array Intrinsic Functions of Fortran 90.
Proceedings of the 12th international conference on Supercomputing, 1998

Parallel Array Object I/O Support on Distributed Environments.
J. Parallel Distributed Comput., 1997

Towards Automatic Support of Parallel Sparse Computation in Java with Continuous Compilation.
Concurr. Pract. Exp., 1997

Sampling and Analytical Techniques for Data Distribution of Parallel Sparse Computation.
Proceedings of the Eighth SIAM Conference on Parallel Processing for Scientific Computing, 1997

Towards the Parallelisation of Pressure Correction Method on Unstructured Grids.
Proceedings of the Conference on Parallel Computational Fluid Dynamics 1997, 1997

Integrating Automatic Data Alignment and Array Operation Synthesis to Optimize Data Parallel Programs.
Proceedings of the Languages and Compilers for Parallel Computing, 1997

Data Distribution Analysis and Optimization for Pointer-Based Distributed Programs.
Proceedings of the 1997 International Conference on Parallel Processing (ICPP '97), 1997

Array Operation Synthesis to Optimize HPF Programs.
Proceedings of the 1996 International Conference on Parallel Processing, 1996

Language and Environment Support for Parallel Array Object I/O on Distributed Environments.
Proceedings of the Seventh SIAM Conference on Parallel Processing for Scientific Computing, 1995

An Array Operation Synthesis Scheme to Optimize Fortran 90 Programs.
Proceedings of the Fifth ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming (PPOPP), 1995

The Xthreads library: Design, implementation, and applications.
Proceedings of the Seventeenth Annual International Computer Software and Applications Conference, 1993

Sigma II: A Tool Kit for Building Parallelizing Compilers and Performance Analysis Systems.
Proceedings of the Programming Environments for Parallel Computing, 1992

On Using Object-Oriented Parallel Programming to Build Distributed Algebraic Abstractions.
Proceedings of the Parallel Processing: CONPAR 92, 1992

Object oriented parallel programming: experiments and results.
Proceedings of the Proceedings Supercomputing '91, 1991
