2024
Adaptive Blockwise Task-interleaved Pipeline Parallelism.
CoRR, 2024

A Holistic Functionalization Approach to Optimizing Imperative Tensor Programs in Deep Learning.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

2022
EasyView: Enabling and Scheduling Tensor Views in Deep Learning Compilers.
Proceedings of the 51st International Conference on Parallel Processing, 2022

2014
Fast and Accurate Stereo Vision System on FPGA.
ACM Trans. Reconfigurable Technol. Syst., 2014

2012
A fast and high quality stereo matching algorithm on FPGA.
Proceedings of the 22nd International Conference on Field Programmable Logic and Applications (FPL), 2012

A real-time stereo vision system using a tree-structured dynamic programming on FPGA.
Proceedings of the ACM/SIGDA 20th International Symposium on Field Programmable Gate Arrays, 2012