Yun Liang

Proceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2024

Cement: Streamlining FPGA Hardware Design with Cycle-Deterministic eHDL and Synthesis.

Proceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2024

POPA: Expressing High and Portable Performance across Spatial and Vector Architectures for Tensor Computations.

Proceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2024

A Holistic Functionalization Approach to Optimizing Imperative Tensor Programs in Deep Learning.

Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

MoteNN: Memory Optimization via Fine-grained Scheduling for Deep Neural Networks on Tiny Devices.

Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

QuFEM: Fast and Accurate Quantum Readout Calibration Using the Finite Element Method.

Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

SpecPIM: Accelerating Speculative Inference on PIM-Enabled System via Architecture-Dataflow Co-Exploration.

Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

PIM-DL: Expanding the Applicability of Commodity DRAM-PIMs for Deep Learning via Algorithm-System Co-Optimization.

Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

Amanda: Unified Instrumentation Framework for Deep Neural Networks.

Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN.

Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

Trend-Aware Supervision: On Learning Invariance for Semi-supervised Facial Action Unit Intensity Estimation.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

Special Issue: "AI Acceleration on FPGAs".

ACM Trans. Embed. Comput. Syst., November, 2023

Automatic Generation of Spatial Accelerator for Tensor Algebra.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., June, 2023

ALEGO: Towards Cost-Aware Architecture and Integration Co-Design for Chiplet-based Spatial Accelerators.

CoRR, 2023

Khronos: Fusing Memory Access for Improved Hardware RTL Simulation.

Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

TileFlow: A Framework for Modeling Fusion Dataflow via Tree-based Analysis.

Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

Memory-aware Scheduling for Complex Wired Networks with Iterative Graph Optimization.

Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 2023

Monad: Towards Cost-Effective Specialization for Chiplet-Based Spatial Accelerators.

Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 2023

ARES: A Mapping Framework of DNNs Towards Diverse PIMs with General Abstractions.

Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 2023

Stronger Mixed-Size Placement Backbone Considering Second-Order Information.

Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 2023

Chimera: An Analytical Optimizing Framework for Effective Compute-intensive Operators Fusion.

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

Calabash: Accelerating Attention Using a Systolic Array Chain on FPGAs.

Proceedings of the 33rd International Conference on Field-Programmable Logic and Applications, 2023

Lasa: Abstraction and Specialization for Productive and Performant Linear Algebra on FPGAs.

Proceedings of the 31st IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2023

Memory and Computation Coordinated Mapping of DNNs onto Complex Heterogeneous SoC.

Size Zheng

Siyuan Chen

Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

Rubick: A Synthesis Framework for Spatial Architectures via Dataflow Decomposition.

Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

A Model-Specific End-to-End Design Methodology for Resource-Constrained TinyML Hardware.

Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

EventFormer: AU Event Transformer for Facial Action Unit Event Detection.

Proceedings of the 34th British Machine Vision Conference 2023, 2023

OpenPARF: An Open-Source Placement and Routing Framework for Large-Scale Heterogeneous FPGAs with Deep Learning Toolkit.

Proceedings of the 15th IEEE International Conference on ASIC, 2023

2022

Optimized separable convolution: Yet another efficient convolution operator.

AI Open, January, 2022

NeoFlow: A Flexible Framework for Enabling Efficient Compilation for High Performance DNN Training.

IEEE Trans. Parallel Distributed Syst., 2022

Critique of "MemXCT: Memory-Centric X-Ray CT Reconstruction With Massive Parallelization" by SCC Team From Peking University.

IEEE Trans. Parallel Distributed Syst., 2022

Morphling: A Reconfigurable Architecture for Tensor Computation.

Liqiang Lu

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022

FCNNLib: A Flexible Convolution Algorithm Library for Deep Learning on FPGAs.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022

An Efficient Hardware Design for Accelerating Sparse CNNs With NAS-Based Models.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022

EventFormer: AU Event Transformer for Facial Action Unit Event Detection.

CoRR, 2022

Pursuing Knowledge Consistency: Supervised Hierarchical Contrastive Learning for Facial Action Unit Recognition.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

AMOS: enabling automatic mapping for tensor computations on spatial accelerators with hardware abstraction.

Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

Downscaling and Overflow-aware Model Compression for Efficient Vision Processors.

Proceedings of the 42nd IEEE International Conference on Distributed Computing Systems, 2022

HECTOR: A Multi-Level Intermediate Representation for Hardware Synthesis Methodologies.

Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design, 2022

Message from the General Chair and Program Co-Chairs.

Proceedings of the International Conference on Field-Programmable Technology, 2022

Preface.

Proceedings of the International Conference on Field-Programmable Technology, 2022

Towards Agile DNN Accelerator Design Using Incremental Synthesis on FPGAs.

Proceedings of the FPGA '22: The 2022 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Virtual Event, USA, 27 February 2022, 2022

On Mitigating Hard Clusters for Face Clustering.

Proceedings of the Computer Vision - ECCV 2022, 2022

EMS: efficient memory subsystem synthesis for spatial accelerators.

Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

A Mapping Model of SNNs to Neuromorphic Hardware.

Proceedings of the 4th IEEE International Conference on Artificial Intelligence Circuits and Systems, 2022

Causal Intervention for Subject-Deconfounded Facial Action Unit Recognition.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021

BISWSRBS: A Winograd-based CNN Accelerator with a Fine-grained Regular Sparsity Pattern and Mixed Precision Quantization.

ACM Trans. Reconfigurable Technol. Syst., 2021

Critique of "Planetary Normal Mode Computation: Parallel Algorithms, Performance, and Reproducibility" by SCC Team From Peking University.

IEEE Trans. Parallel Distributed Syst., 2021

OMNI: A Framework for Integrating Hardware and Software Optimizations for Sparse CNNs.

Liqiang Lu

Jiaming Xie

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2021

Cross-Modal Representation Learning for Lightweight and Accurate Facial Action Unit Detection.

IEEE Robotics Autom. Lett., 2021

Preface.

Chao Li

J. Comput. Sci. Technol., 2021

CaFGraph: Context-aware Facial Multi-graph Representation for Facial Action Unit Recognition.

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Sanger: A Co-Design Framework for Enabling Sparse Attention using Reconfigurable Architecture.

Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

Analyzing the Design Space of Spatial Tensor Accelerators on FPGAs.

Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2021

HASCO: Towards Agile HArdware and Software CO-design for Tensor Computation.

Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

TENET: A Framework for Modeling Tensor Dataflow Based on Relation-centric Notation.

Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

Characterizing and Demystifying the Implicit Convolution Algorithm on Commercial Matrix-Multiplication Accelerators.

Proceedings of the IEEE International Symposium on Workload Characterization, 2021

AUPro: Multi-label Facial Action Unit Proposal Generation for Sequence-Level Analysis.

Proceedings of the Neural Information Processing - 28th International Conference, 2021

TensorLib: A Spatial Accelerator Generation Framework for Tensor Algebra.

Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021

2020

Fork Path: Batching ORAM Requests to Remove Redundant Memory Accesses.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

Performance Modeling and Directives Optimization for High-Level Synthesis on FPGA.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

Evaluating Fast Algorithms for Convolutional Neural Networks on FPGAs.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

Enabling Efficient Fast Convolution Algorithms on GPUs via MegaKernels.

IEEE Trans. Computers, 2020

Generating Systolic Array Accelerators With Reusable Blocks.

IEEE Micro, 2020

Fune: An FPGA Tuning Framework for CNN Acceleration.

IEEE Des. Test, 2020

Systolic Computing on GPUs for Productive Performance.

CoRR, 2020

SuSy: A Programming Model for Productive Construction of High-Performance Systolic Arrays on FPGAs.

Christopher J. Hughes

Pradeep Dubey

Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2020

A Winograd-Based CNN Accelerator with a Fine-Grained Regular Sparsity Pattern.

Proceedings of the 30th International Conference on Field-Programmable Logic and Applications, 2020

FCNNLib: An Efficient and Flexible Convolution Algorithm Library on FPGAs.

Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

FlexTensor: An Automatic Schedule Exploration and Optimization Framework for Tensor Computation on Heterogeneous System.

Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020

2019

CuLDA_CGS: solving large-scale LDA problems on GPUs.

Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2019

A coordinated tiling and batching framework for efficient GEMM on GPUs.

Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2019

Frequency Improvement of Systolic Array-Based CNNs on FPGAs.

Proceedings of the IEEE International Symposium on Circuits and Systems, 2019

A Survey on 5G Network Slicing Enabling the Smart Grid.

Proceedings of the 25th IEEE International Conference on Parallel and Distributed Systems, 2019

Zac: Towards Automatic Optimization and Deployment of Quantized Deep Neural Networks on Embedded Devices.

Proceedings of the International Conference on Computer-Aided Design, 2019

CuLDA: Solving Large-scale LDA Problems on GPUs.

Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing, 2019

Poly: Efficient Heterogeneous System and Application Management for Interactive Applications.

Wei Zhang

Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019

Overcoming Data Transfer Bottlenecks in DNN Accelerators via Layer-Conscious Memory Management.

Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

Speedy: An Accelerator for Sparse Convolutional Neural Networks on FPGAs.

Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

REQ-YOLO: A Resource-Aware, Efficient Quantization Framework for Object Detection on FPGAs.

Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

An Efficient Hardware Accelerator for Sparse Convolutional Neural Networks on FPGAs.

Proceedings of the 27th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2019

LexicalAT: Lexical-Based Adversarial Reinforcement Training for Robust Sentiment Classification.

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

Overcoming Data Transfer Bottlenecks in FPGA-based DNN Accelerators via Layer Conscious Memory Management.

Xuechao Wei

Jason Cong

Proceedings of the 56th Annual Design Automation Conference 2019, 2019

E-LSTM: Efficient Inference of Sparse LSTM on Embedded Heterogeneous System.

Proceedings of the 56th Annual Design Automation Conference 2019, 2019

SPART: Optimizing CNNs by Utilizing Both Sparsity of Weights and Feature Maps.

Jiaming Xie

Proceedings of the Advanced Parallel Processing Technologies, 2019

2018

Exploiting Sparsity to Accelerate Fully Connected Layers of CNN-Based Applications on Mobile SoCs.

ACM Trans. Embed. Comput. Syst., 2018

Optimizing Cache Bypassing and Warp Scheduling for GPUs.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2018

CRAT: Enabling Coordinated Register Allocation and Thread-Level Parallelism Optimization for GPUs.

IEEE Trans. Computers, 2018

FlexCL: A Model of Performance and Power for OpenCL Workloads on FPGAs.

Wei Zhang

IEEE Trans. Computers, 2018

Student Cluster Competition 2017, Team Peking University: Reproducing vectorization of the Tersoff multi-body potential on the Intel Broadwell architecture.

Parallel Comput., 2018

A Unified Framework of DNN Weight Pruning and Weight Clustering/Quantization Using ADMM.

CoRR, 2018

cuMBIR: An Efficient Framework for Low-dose X-ray CT Image Reconstruction on GPUs.

Proceedings of the 32nd International Conference on Supercomputing, 2018

Efficient Recurrent Neural Networks using Structured Matrices in FPGAs.

Proceedings of the 6th International Conference on Learning Representations, 2018

TGPA: tile-grained pipeline architecture for low latency CNN inference.

Proceedings of the International Conference on Computer-Aided Design, 2018

C-LSTM: Enabling Efficient LSTM using Structured Compression Techniques on FPGAs.

Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2018

CAMAS: Static and Dynamic Hybrid Cache Management for CPU-FPGA Platforms.

Proceedings of the 26th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2018

SpWA: an efficient sparse winograd convolutional neural networks accelerator on FPGAs.

Liqiang Lu

Proceedings of the 55th Annual Design Automation Conference, 2018

Analytical Two-Level Near Threshold Cache Exploration for Low Power Biomedical Applications.

Proceedings of the Advanced Computer Architecture - 12th Conference, 2018

2017

Efficient Kernel Management on GPUs.

Xiuhong Li

ACM Trans. Embed. Comput. Syst., 2017

Scale-Free Sparse Matrix-Vector Multiplication on Many-Core Architectures.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2017

ParConnect reproducibility report.

Parallel Comput., 2017

Enabling high performance deep learning networks on embedded systems.

Qian Li

Proceedings of the IECON 2017 - 43rd Annual Conference of the IEEE Industrial Electronics Society, Beijing, China, October 29, 2017

COMBA: A comprehensive model-based analysis framework for high level synthesis of real applications.

Proceedings of the 2017 IEEE/ACM International Conference on Computer-Aided Design, 2017

A hybrid approach to cache management in heterogeneous CPU-FPGA platforms.

Proceedings of the 2017 IEEE/ACM International Conference on Computer-Aided Design, 2017

Exploring cache bypassing and partitioning for multi-tasking on GPUs.

Xiuhong Li

Xiaolong Xie

Proceedings of the 2017 IEEE/ACM International Conference on Computer-Aided Design, 2017

CuMF_SGD: Parallelized Stochastic Gradient Descent for Matrix Factorization on GPUs.

Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing, 2017

A Framework for Iterative Stencil Algorithm Synthesis on FPGAs from OpenCL Programming Model (Abstract Only).

Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017

Design Space exploration of FPGA-based accelerators with multi-level parallelism.

Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2017

Exploring Heterogeneous Algorithms for Accelerating Deep Convolutional Neural Networks on FPGAs.

Proceedings of the 54th Annual Design Automation Conference, 2017

Automated Systolic Array Architecture Synthesis for High Throughput CNN Inference on FPGAs.

Proceedings of the 54th Annual Design Automation Conference, 2017

FlexCL: An Analytical Performance Model for OpenCL Workloads on Flexible FPGAs.

Wei Zhang

Proceedings of the 54th Annual Design Automation Conference, 2017

A Comprehensive Framework for Synthesizing Stencil Algorithms on FPGAs using OpenCL Model.

Proceedings of the 54th Annual Design Automation Conference, 2017

Programming FPGAs Using OpenCL from Performance Model to Application Study.

Proceedings of the first Workshop on Emerging Technologies for software-defined and reconfigurable hardware-accelerated Cloud Datacenters, 2017

Throughput optimization for streaming applications on CPU-FPGA heterogeneous systems.

Proceedings of the 22nd Asia and South Pacific Design Automation Conference, 2017

2016

FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow.

IEEE Trans. Very Large Scale Integr. Syst., 2016

An Accurate GPU Performance Model for Effective Control Flow Divergence Optimization.

Muhammad Teguh Satria

Kyle Rupnow

Deming Chen

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2016

FCUDA-HB: Hierarchical and Scalable Bus Architecture Generation on FPGAs With the FCUDA Flow.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2016

Performance-Centric Optimization for Racetrack Memory Based Register File on GPUs.

J. Comput. Sci. Technol., 2016

CuMF_SGD: Fast and Scalable Matrix Factorization.

CoRR, 2016

Lin-analyzer: a high-level performance analysis tool for FPGA-based accelerators.

Proceedings of the 53rd Annual Design Automation Conference, 2016

Performance-centric register file design for GPUs using racetrack memory.

Proceedings of the 21st Asia and South Pacific Design Automation Conference, 2016

2015

MrPhi: An Optimized MapReduce Framework on Intel Xeon Phi Coprocessors.

IEEE Trans. Parallel Distributed Syst., 2015

Efficient GPU Spatial-Temporal Multitasking.

IEEE Trans. Parallel Distributed Syst., 2015

An Efficient Compiler Framework for Cache Bypassing on GPUs.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2015

Instruction Cache Locking Using Temporal Reuse Profile.

Lei Ju

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2015

Fork path: improving efficiency of ORAM by removing redundant memory accesses.

Proceedings of the 48th International Symposium on Microarchitecture, 2015

Enabling coordinated register allocation and thread-level parallelism optimization for GPUs.

Proceedings of the 48th International Symposium on Microarchitecture, 2015

Hi-fi playback: tolerating position errors in shift operations of racetrack memory.

Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

Coordinated static and dynamic cache bypassing for GPUs.

Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

Optimizing and auto-tuning scale-free sparse matrix-vector multiplication on Intel Xeon Phi.

Proceedings of the 13th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2015

Quantitative performance and power analysis of LTE using high level synthesis.

Vanchinathan Venkataramani

Proceedings of the 2015 IEEE 11th International Conference on ASIC, 2015

2014

Rapid design space exploration of two-level unified caches.

Proceedings of the IEEE International Symposium on Circuits and Systems, 2014

GPU Accelerated Counterexample Generation in LTL Model Checking.

Proceedings of the Formal Methods and Software Engineering, 2014

Design space exploration of multiple loops on FPGAs using high level synthesis.

Guanwen Zhong

Smaïl Niar

Proceedings of the 32nd IEEE International Conference on Computer Design, 2014

Integrated CUDA-to-FPGA Synthesis with Network-on-Chip.

Proceedings of the 22nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2014

WCET-Centric dynamic instruction cache locking.

Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2014

Run-Time Technique for Simultaneous Aging and Power Optimization in GPGPUs.

Proceedings of the 51st Annual Design Automation Conference 2014, 2014

2013

An analytical approach for fast and accurate design space exploration of instruction caches.

Alexandros Papakonstantinou

ACM Trans. Embed. Comput. Syst., 2013

Improving high level synthesis optimization opportunity through polyhedral transformations.

Proceedings of the 2013 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2013

Throughput-oriented kernel porting onto FPGAs.

Proceedings of the 50th Annual Design Automation Conference 2013, 2013

Integrated instruction cache analysis and locking in multitasking real-time systems.

Proceedings of the 50th Annual Design Automation Conference 2013, 2013

Optimizing the MapReduce framework on Intel Xeon Phi coprocessor.

Proceedings of the 2013 IEEE International Conference on Big Data (IEEE BigData 2013), 2013

Proceedings of the 18th Asia and South Pacific Design Automation Conference, 2013

High-level synthesis of multiple dependent CUDA kernels on FPGA.

Proceedings of the 18th Asia and South Pacific Design Automation Conference, 2013

Shared cache aware task mapping for WCRT minimization.

Proceedings of the 18th Asia and South Pacific Design Automation Conference, 2013

2012

Timing analysis of concurrent programs running on shared cache multi-cores.

Real Time Syst., 2012

High-Level Synthesis: Productivity, Performance, and Software Constraints.

J. Electr. Comput. Eng., 2012

An Accurate GPU Performance Model for Effective Control Flow Divergence Optimization.

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Real-time implementation and performance optimization of 3D sound localization on GPUs.

Proceedings of the 2012 Design, Automation & Test in Europe Conference & Exhibition, 2012

WCET-centric partial instruction cache locking.

Alexandros Papakonstantinou

Proceedings of the 49th Annual Design Automation Conference 2012, 2012

2011

High level synthesis of stereo matching: Productivity, performance, and software constraints.

Proceedings of the 2011 International Conference on Field-Programmable Technology, 2011

Multilevel Granularity Parallelism Synthesis on FPGAs.

Proceedings of the IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines, 2011

A study of high-level synthesis: Promises and challenges.

Proceedings of the 2011 IEEE 9th International Conference on ASIC, 2011

2010

An Efficient Algorithm to Estimate Real-time Traffic Information based on Multiple Data Sources.

Proceedings of the ICAART 2010 - Proceedings of the International Conference on Agents and Artificial Intelligence, Volume 1, 2010

Efficient custom instructions generation for system-level design.

Huynh Phung Huynh

Proceedings of the International Conference on Field-Programmable Technology, 2010

Instruction cache locking using temporal reuse profile.

Proceedings of the 47th Design Automation Conference, 2010

Improved procedure placement for set associative caches.

Proceedings of the 2010 International Conference on Compilers, 2010

2009

Cache-aware optimization of BAN applications.

Des. Autom. Embed. Syst., 2009

Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores.

Proceedings of the 30th IEEE Real-Time Systems Symposium, 2009

2008

Cache modeling in probabilistic execution time analysis.

Proceedings of the 45th Design Automation Conference, 2008

Static analysis for fast and accurate design space exploration of caches.
