2020
SkyNet: a Hardware-Efficient Method for Object Detection and Tracking on Embedded Systems.
,
,
,
,
,
,
,
,
,
,
,
Proceedings of the Third Conference on Machine Learning and Systems, 2020
2019
Hybrid Quick Error Detection: Validation and Debug of SoCs Through High-Level Synthesis.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2019
SkyNet: A Champion Model for DAC-SDC on Low Power Object Detection.
,
,
,
,
,
,
,
,
,
,
,
CoRR, 2019
FPGA/DNN Co-Design: An Efficient Design Methodology for IoT Intelligence on the Edge.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019
2017
Machine learning on FPGAs to face the IoT revolution.
Proceedings of the 2017 IEEE/ACM International Conference on Computer-Aided Design, 2017
High-performance video content recognition with long-term recurrent convolutional network for FPGA.
Proceedings of the 27th International Conference on Field Programmable Logic and Applications, 2017
Hardware Acceleration of the Pair-HMM Algorithm for DNA Variant Calling.
Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017
2016
FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow.
IEEE Trans. Very Large Scale Integr. Syst., 2016
An Accurate GPU Performance Model for Effective Control Flow Divergence Optimization.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2016
FCUDA-HB: Hierarchical and Scalable Bus Architecture Generation on FPGAs With the FCUDA Flow.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2016
Platform choices and design demands for IoT platforms: cost, power, and performance tradeoffs.
IET Cyper-Phys. Syst.: Theory & Appl., 2016
SoC, NoC and Hierarchical Bus Implementations of Applications on FPGAs Using the FCUDA Flow.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2016
Automated Verification Code Generation in HLS Using Software Execution Traces (Abstract Only).
Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2016
FCUDA-SoC: Platform Integration for Field-Programmable SoC with the CUDA-to-FPGA Compiler.
Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2016
High Level Synthesis of Complex Applications: An H.264 Video Decoder.
Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2016
AutoSLIDE: Automatic Source-Level Instrumentation and Debugging for HLS.
Proceedings of the 24th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2016
Acceleration of the Pair-HMM Algorithm for DNA Variant Calling.
Proceedings of the 24th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2016
Real-time system-level implementation of a telepresence robot using an embedded GPU platform.
Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition, 2016
Debugging and verifying SoC designs through effective cross-layer hardware-software co-simulation.
Proceedings of the 53rd Annual Design Automation Conference, 2016
Designing high-quality hardware on a development effort budget: A study of the current state of high-level synthesis.
Proceedings of the 21st Asia and South Pacific Design Automation Conference, 2016
2015
Efficient GPU Spatial-Temporal Multitasking.
IEEE Trans. Parallel Distributed Syst., 2015
JIT trace-based verification for high-level synthesis.
Proceedings of the 2015 International Conference on Field Programmable Technology, 2015
Behavioral-level IP integration in high-level synthesis.
Proceedings of the 2015 International Conference on Field Programmable Technology, 2015
System-level design solutions: Enabling the IoT explosion.
Proceedings of the 2015 IEEE 11th International Conference on ASIC, 2015
2014
High-Level Synthesis With Behavioral-Level Multicycle Path Analysis.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2014
New solutions for system-level and high-level synthesis (Invited paper).
Proceedings of the 2014 International Symposium on Integrated Circuits (ISIC), 2014
Fast and effective placement and routing directed high-level synthesis for FPGAs.
Proceedings of the 2014 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2014
Integrated CUDA-to-FPGA Synthesis with Network-on-Chip.
Proceedings of the 22nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2014
2013
High-level synthesis with behavioral level multi-cycle path analysis.
Proceedings of the 23rd International Conference on Field programmable Logic and Applications, 2013
Improving high level synthesis optimization opportunity through polyhedral transformations.
Proceedings of the 2013 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2013
Register and thread structure optimization for GPUs.
Proceedings of the 18th Asia and South Pacific Design Automation Conference, 2013
High-level synthesis of multiple dependent CUDA kernels on FPGA.
Proceedings of the 18th Asia and South Pacific Design Automation Conference, 2013
2012
High-Level Synthesis: Productivity, Performance, and Software Constraints.
J. Electr. Comput. Eng., 2012
An Accurate GPU Performance Model for Effective Control Flow Divergence Optimization.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012
Real-time implementation and performance optimization of 3D sound localization on GPUs.
Proceedings of the 2012 Design, Automation & Test in Europe Conference & Exhibition, 2012
2011
Scientific Application Demands on a Reconfigurable Functional Unit Interface.
ACM Trans. Reconfigurable Technol. Syst., 2011
High level synthesis of stereo matching: Productivity, performance, and software constraints.
Proceedings of the 2011 International Conference on Field-Programmable Technology, 2011
A study of high-level synthesis: Promises and challenges.
Proceedings of the 2011 IEEE 9th International Conference on ASIC, 2011
2010
Dynamic Binding and Scheduling of Firm-Deadline Tasks on Heterogeneous Compute Resources.
Proceedings of the 16th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, 2010
A Reconfigurable Computing Scheduler Optimized for Multicore Systems.
Proceedings of the International Conference on Field Programmable Logic and Applications, 2010
Accurately evaluating application performance in simulated hybrid multi-tasking systems.
Proceedings of the ACM/SIGDA 18th International Symposium on Field Programmable Gate Arrays, 2010
2009
Performance metrics for hybrid multi-tasking systems.
Proceedings of the 19th International Conference on Field Programmable Logic and Applications, 2009
Block, Drop or Roll(back): Alternative Preemption Methods for RH Multi-Tasking.
Proceedings of the FCCM 2009, 2009
2007
Reconfigurable Functional Units for Scientific Superscalar Processors.
Proceedings of the 2007 International Conference on Field-Programmable Technology, 2007
Scientific Application Acceleration with Reconfigurable Functional Units.
Proceedings of the IEEE Symposium on Field-Programmable Custom Computing Machines, 2007
2006
Scientific applications vs. SPEC-FP: a comparison of program behavior.
Proceedings of the 20th Annual International Conference on Supercomputing, 2006