Yun Liang

Orcid: 0000-0002-9076-7998

Affiliations:
  • Peking University, Center for Energy-Efficient Computing and Applications, Beijing, China
  • University of Illinois Urbana-Champaign, Urbana, IL, USA (2010 - 2012)
  • National University of Singapore, Singapore (PhD 2010)


According to our database, Yun Liang authored at least 169 papers between 2008 and 2024.

Bibliography

2024
Introduction to the Special Issue on FPGA-based Embedded Systems for Industrial and IoT Applications.
ACM Trans. Reconfigurable Technol. Syst., December, 2024

Proteus: Simulating the Performance of Distributed DNN Training.
IEEE Trans. Parallel Distributed Syst., October, 2024

Rubick: A Unified Infrastructure for Analyzing, Exploring, and Implementing Spatial Architectures via Dataflow Decomposition.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., April, 2024

MCUBERT: Memory-Efficient BERT Inference on Commodity Microcontrollers.
CoRR, 2024

OriGen: Enhancing RTL Code Generation with Code-to-Code Augmentation and Self-Reflection.
CoRR, 2024

The Dawn of AI-Native EDA: Promises and Challenges of Large Circuit Models.
CoRR, 2024

vMCU: Coordinated Memory Management and Kernel Optimization for DNN Inference on MCUs.
Proceedings of the Seventh Annual Conference on Machine Learning and Systems, 2024

Hestia: An Efficient Cross-Level Debugger for High-Level Synthesis.
Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024

Hermes: Enhancing Extensibility in High-Level Synthesis through Multi-Level IRs.
Proceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2024

Cement: Streamlining FPGA Hardware Design with Cycle-Deterministic eHDL and Synthesis.
Proceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2024

POPA: Expressing High and Portable Performance across Spatial and Vector Architectures for Tensor Computations.
Proceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2024

A Holistic Functionalization Approach to Optimizing Imperative Tensor Programs in Deep Learning.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

MoteNN: Memory Optimization via Fine-grained Scheduling for Deep Neural Networks on Tiny Devices.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

QuFEM: Fast and Accurate Quantum Readout Calibration Using the Finite Element Method.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

SpecPIM: Accelerating Speculative Inference on PIM-Enabled System via Architecture-Dataflow Co-Exploration.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

PIM-DL: Expanding the Applicability of Commodity DRAM-PIMs for Deep Learning via Algorithm-System Co-Optimization.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

Amanda: Unified Instrumentation Framework for Deep Neural Networks.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

Trend-Aware Supervision: On Learning Invariance for Semi-supervised Facial Action Unit Intensity Estimation.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Special Issue: "AI Acceleration on FPGAs".
ACM Trans. Embed. Comput. Syst., November, 2023

Automatic Generation of Spatial Accelerator for Tensor Algebra.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., June, 2023

ALEGO: Towards Cost-Aware Architecture and Integration Co-Design for Chiplet-based Spatial Accelerators.
CoRR, 2023

Khronos: Fusing Memory Access for Improved Hardware RTL Simulation.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

TileFlow: A Framework for Modeling Fusion Dataflow via Tree-based Analysis.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

Memory-aware Scheduling for Complex Wired Networks with Iterative Graph Optimization.
Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 2023

Monad: Towards Cost-Effective Specialization for Chiplet-Based Spatial Accelerators.
Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 2023

ARES: A Mapping Framework of DNNs Towards Diverse PIMs with General Abstractions.
Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 2023

Stronger Mixed-Size Placement Backbone Considering Second-Order Information.
Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 2023

Chimera: An Analytical Optimizing Framework for Effective Compute-intensive Operators Fusion.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

Calabash: Accelerating Attention Using a Systolic Array Chain on FPGAs.
Proceedings of the 33rd International Conference on Field-Programmable Logic and Applications, 2023

Lasa: Abstraction and Specialization for Productive and Performant Linear Algebra on FPGAs.
Proceedings of the 31st IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2023

Memory and Computation Coordinated Mapping of DNNs onto Complex Heterogeneous SoC.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

Rubick: A Synthesis Framework for Spatial Architectures via Dataflow Decomposition.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

A Model-Specific End-to-End Design Methodology for Resource-Constrained TinyML Hardware.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

EventFormer: AU Event Transformer for Facial Action Unit Event Detection.
Proceedings of the 34th British Machine Vision Conference 2023, 2023

OpenPARF: An Open-Source Placement and Routing Framework for Large-Scale Heterogeneous FPGAs with Deep Learning Toolkit.
Proceedings of the 15th IEEE International Conference on ASIC, 2023

2022
Optimized separable convolution: Yet another efficient convolution operator.
AI Open, January, 2022

NeoFlow: A Flexible Framework for Enabling Efficient Compilation for High Performance DNN Training.
IEEE Trans. Parallel Distributed Syst., 2022

Critique of "MemXCT: Memory-Centric X-Ray CT Reconstruction With Massive Parallelization" by SCC Team From Peking University.
IEEE Trans. Parallel Distributed Syst., 2022

Morphling: A Reconfigurable Architecture for Tensor Computation.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022

FCNNLib: A Flexible Convolution Algorithm Library for Deep Learning on FPGAs.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022

An Efficient Hardware Design for Accelerating Sparse CNNs With NAS-Based Models.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022

EventFormer: AU Event Transformer for Facial Action Unit Event Detection.
CoRR, 2022

Pursuing Knowledge Consistency: Supervised Hierarchical Contrastive Learning for Facial Action Unit Recognition.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

AMOS: enabling automatic mapping for tensor computations on spatial accelerators with hardware abstraction.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

Downscaling and Overflow-aware Model Compression for Efficient Vision Processors.
Proceedings of the 42nd IEEE International Conference on Distributed Computing Systems, 2022

HECTOR: A Multi-Level Intermediate Representation for Hardware Synthesis Methodologies.
Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design, 2022

Message from the General Chair and Program Co-Chairs.
Proceedings of the International Conference on Field-Programmable Technology, 2022

Preface.
Proceedings of the International Conference on Field-Programmable Technology, 2022

Towards Agile DNN Accelerator Design Using Incremental Synthesis on FPGAs.
Proceedings of the FPGA '22: The 2022 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Virtual Event, USA, 27 February 2022, 2022

On Mitigating Hard Clusters for Face Clustering.
Proceedings of the Computer Vision - ECCV 2022, 2022

EMS: efficient memory subsystem synthesis for spatial accelerators.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

A Mapping Model of SNNs to Neuromorphic Hardware.
Proceedings of the 4th IEEE International Conference on Artificial Intelligence Circuits and Systems, 2022

Causal Intervention for Subject-Deconfounded Facial Action Unit Recognition.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
BISWSRBS: A Winograd-based CNN Accelerator with a Fine-grained Regular Sparsity Pattern and Mixed Precision Quantization.
ACM Trans. Reconfigurable Technol. Syst., 2021

Critique of "Planetary Normal Mode Computation: Parallel Algorithms, Performance, and Reproducibility" by SCC Team From Peking University.
IEEE Trans. Parallel Distributed Syst., 2021

OMNI: A Framework for Integrating Hardware and Software Optimizations for Sparse CNNs.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2021

Cross-Modal Representation Learning for Lightweight and Accurate Facial Action Unit Detection.
IEEE Robotics Autom. Lett., 2021

Preface.
J. Comput. Sci. Technol., 2021

CaFGraph: Context-aware Facial Multi-graph Representation for Facial Action Unit Recognition.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Sanger: A Co-Design Framework for Enabling Sparse Attention using Reconfigurable Architecture.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

Analyzing the Design Space of Spatial Tensor Accelerators on FPGAs.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2021

HASCO: Towards Agile HArdware and Software CO-design for Tensor Computation.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

TENET: A Framework for Modeling Tensor Dataflow Based on Relation-centric Notation.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

Characterizing and Demystifying the Implicit Convolution Algorithm on Commercial Matrix-Multiplication Accelerators.
Proceedings of the IEEE International Symposium on Workload Characterization, 2021

AUPro: Multi-label Facial Action Unit Proposal Generation for Sequence-Level Analysis.
Proceedings of the Neural Information Processing - 28th International Conference, 2021

TensorLib: A Spatial Accelerator Generation Framework for Tensor Algebra.
Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021

2020
Fork Path: Batching ORAM Requests to Remove Redundant Memory Accesses.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

Performance Modeling and Directives Optimization for High-Level Synthesis on FPGA.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

Evaluating Fast Algorithms for Convolutional Neural Networks on FPGAs.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

Enabling Efficient Fast Convolution Algorithms on GPUs via MegaKernels.
IEEE Trans. Computers, 2020

Generating Systolic Array Accelerators With Reusable Blocks.
IEEE Micro, 2020

Fune: An FPGA Tuning Framework for CNN Acceleration.
IEEE Des. Test, 2020

Systolic Computing on GPUs for Productive Performance.
CoRR, 2020

SuSy: A Programming Model for Productive Construction of High-Performance Systolic Arrays on FPGAs.
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2020

A Winograd-Based CNN Accelerator with a Fine-Grained Regular Sparsity Pattern.
Proceedings of the 30th International Conference on Field-Programmable Logic and Applications, 2020

FCNNLib: An Efficient and Flexible Convolution Algorithm Library on FPGAs.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

FlexTensor: An Automatic Schedule Exploration and Optimization Framework for Tensor Computation on Heterogeneous System.
Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020

2019
CuLDA_CGS: solving large-scale LDA problems on GPUs.
Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2019

A coordinated tiling and batching framework for efficient GEMM on GPUs.
Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2019

Frequency Improvement of Systolic Array-Based CNNs on FPGAs.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2019

A Survey on 5G Network Slicing Enabling the Smart Grid.
Proceedings of the 25th IEEE International Conference on Parallel and Distributed Systems, 2019

Zac: Towards Automatic Optimization and Deployment of Quantized Deep Neural Networks on Embedded Devices.
Proceedings of the International Conference on Computer-Aided Design, 2019

CuLDA: Solving Large-scale LDA Problems on GPUs.
Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing, 2019

Poly: Efficient Heterogeneous System and Application Management for Interactive Applications.
Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019

Overcoming Data Transfer Bottlenecks in DNN Accelerators via Layer-Conscious Memory Management.
Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

Speedy: An Accelerator for Sparse Convolutional Neural Networks on FPGAs.
Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

REQ-YOLO: A Resource-Aware, Efficient Quantization Framework for Object Detection on FPGAs.
Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

An Efficient Hardware Accelerator for Sparse Convolutional Neural Networks on FPGAs.
Proceedings of the 27th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2019

LexicalAT: Lexical-Based Adversarial Reinforcement Training for Robust Sentiment Classification.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

Overcoming Data Transfer Bottlenecks in FPGA-based DNN Accelerators via Layer Conscious Memory Management.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

E-LSTM: Efficient Inference of Sparse LSTM on Embedded Heterogeneous System.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

SPART: Optimizing CNNs by Utilizing Both Sparsity of Weights and Feature Maps.
Proceedings of the Advanced Parallel Processing Technologies, 2019

2018
Exploiting Sparsity to Accelerate Fully Connected Layers of CNN-Based Applications on Mobile SoCs.
ACM Trans. Embed. Comput. Syst., 2018

Optimizing Cache Bypassing and Warp Scheduling for GPUs.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2018

CRAT: Enabling Coordinated Register Allocation and Thread-Level Parallelism Optimization for GPUs.
IEEE Trans. Computers, 2018

FlexCL: A Model of Performance and Power for OpenCL Workloads on FPGAs.
IEEE Trans. Computers, 2018

Student Cluster Competition 2017, Team Peking University: Reproducing vectorization of the Tersoff multi-body potential on the Intel Broadwell architecture.
Parallel Comput., 2018

A Unified Framework of DNN Weight Pruning and Weight Clustering/Quantization Using ADMM.
CoRR, 2018

cuMBIR: An Efficient Framework for Low-dose X-ray CT Image Reconstruction on GPUs.
Proceedings of the 32nd International Conference on Supercomputing, 2018

Efficient Recurrent Neural Networks using Structured Matrices in FPGAs.
Proceedings of the 6th International Conference on Learning Representations, 2018

TGPA: tile-grained pipeline architecture for low latency CNN inference.
Proceedings of the International Conference on Computer-Aided Design, 2018

C-LSTM: Enabling Efficient LSTM using Structured Compression Techniques on FPGAs.
Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2018

CAMAS: Static and Dynamic Hybrid Cache Management for CPU-FPGA Platforms.
Proceedings of the 26th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2018

SpWA: an efficient sparse winograd convolutional neural networks accelerator on FPGAs.
Proceedings of the 55th Annual Design Automation Conference, 2018

Analytical Two-Level Near Threshold Cache Exploration for Low Power Biomedical Applications.
Proceedings of the Advanced Computer Architecture - 12th Conference, 2018

2017
Efficient Kernel Management on GPUs.
ACM Trans. Embed. Comput. Syst., 2017

Scale-Free Sparse Matrix-Vector Multiplication on Many-Core Architectures.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2017

ParConnect reproducibility report.
Parallel Comput., 2017

Enabling high performance deep learning networks on embedded systems.
Proceedings of the IECON 2017 - 43rd Annual Conference of the IEEE Industrial Electronics Society, Beijing, China, October 29, 2017

COMBA: A comprehensive model-based analysis framework for high level synthesis of real applications.
Proceedings of the 2017 IEEE/ACM International Conference on Computer-Aided Design, 2017

A hybrid approach to cache management in heterogeneous CPU-FPGA platforms.
Proceedings of the 2017 IEEE/ACM International Conference on Computer-Aided Design, 2017

Exploring cache bypassing and partitioning for multi-tasking on GPUs.
Proceedings of the 2017 IEEE/ACM International Conference on Computer-Aided Design, 2017

CuMF_SGD: Parallelized Stochastic Gradient Descent for Matrix Factorization on GPUs.
Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing, 2017

A Framework for Iterative Stencil Algorithm Synthesis on FPGAs from OpenCL Programming Model (Abstract Only).
Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017

Design Space exploration of FPGA-based accelerators with multi-level parallelism.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2017

Exploring Heterogeneous Algorithms for Accelerating Deep Convolutional Neural Networks on FPGAs.
Proceedings of the 54th Annual Design Automation Conference, 2017

Automated Systolic Array Architecture Synthesis for High Throughput CNN Inference on FPGAs.
Proceedings of the 54th Annual Design Automation Conference, 2017

FlexCL: An Analytical Performance Model for OpenCL Workloads on Flexible FPGAs.
Proceedings of the 54th Annual Design Automation Conference, 2017

A Comprehensive Framework for Synthesizing Stencil Algorithms on FPGAs using OpenCL Model.
Proceedings of the 54th Annual Design Automation Conference, 2017

Programming FPGAs Using OpenCL from Performance Model to Application Study.
Proceedings of the first Workshop on Emerging Technologies for software-defined and reconfigurable hardware-accelerated Cloud Datacenters, 2017

Throughput optimization for streaming applications on CPU-FPGA heterogeneous systems.
Proceedings of the 22nd Asia and South Pacific Design Automation Conference, 2017

2016
FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow.
IEEE Trans. Very Large Scale Integr. Syst., 2016

An Accurate GPU Performance Model for Effective Control Flow Divergence Optimization.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2016

FCUDA-HB: Hierarchical and Scalable Bus Architecture Generation on FPGAs With the FCUDA Flow.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2016

Performance-Centric Optimization for Racetrack Memory Based Register File on GPUs.
J. Comput. Sci. Technol., 2016

CuMF_SGD: Fast and Scalable Matrix Factorization.
CoRR, 2016

Lin-analyzer: a high-level performance analysis tool for FPGA-based accelerators.
Proceedings of the 53rd Annual Design Automation Conference, 2016

Performance-centric register file design for GPUs using racetrack memory.
Proceedings of the 21st Asia and South Pacific Design Automation Conference, 2016

2015
MrPhi: An Optimized MapReduce Framework on Intel Xeon Phi Coprocessors.
IEEE Trans. Parallel Distributed Syst., 2015

Efficient GPU Spatial-Temporal Multitasking.
IEEE Trans. Parallel Distributed Syst., 2015

An Efficient Compiler Framework for Cache Bypassing on GPUs.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2015

Instruction Cache Locking Using Temporal Reuse Profile.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2015

Fork path: improving efficiency of ORAM by removing redundant memory accesses.
Proceedings of the 48th International Symposium on Microarchitecture, 2015

Enabling coordinated register allocation and thread-level parallelism optimization for GPUs.
Proceedings of the 48th International Symposium on Microarchitecture, 2015

Hi-fi playback: tolerating position errors in shift operations of racetrack memory.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

Coordinated static and dynamic cache bypassing for GPUs.
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

Optimizing and auto-tuning scale-free sparse matrix-vector multiplication on Intel Xeon Phi.
Proceedings of the 13th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2015

Quantitative performance and power analysis of LTE using high level synthesis.
Proceedings of the 2015 IEEE 11th International Conference on ASIC, 2015

2014
Rapid design space exploration of two-level unified caches.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2014

GPU Accelerated Counterexample Generation in LTL Model Checking.
Proceedings of the Formal Methods and Software Engineering, 2014

Design space exploration of multiple loops on FPGAs using high level synthesis.
Proceedings of the 32nd IEEE International Conference on Computer Design, 2014

Integrated CUDA-to-FPGA Synthesis with Network-on-Chip.
Proceedings of the 22nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2014

WCET-Centric dynamic instruction cache locking.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2014

Run-Time Technique for Simultaneous Aging and Power Optimization in GPGPUs.
Proceedings of the 51st Annual Design Automation Conference 2014, 2014

2013
An analytical approach for fast and accurate design space exploration of instruction caches.
ACM Trans. Embed. Comput. Syst., 2013

Improving high level synthesis optimization opportunity through polyhedral transformations.
Proceedings of the 2013 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2013

Throughput-oriented kernel porting onto FPGAs.
Proceedings of the 50th Annual Design Automation Conference 2013, 2013

Integrated instruction cache analysis and locking in multitasking real-time systems.
Proceedings of the 50th Annual Design Automation Conference 2013, 2013

Optimizing the MapReduce framework on Intel Xeon Phi coprocessor.
Proceedings of the 2013 IEEE International Conference on Big Data (IEEE BigData 2013), 2013

Register and thread structure optimization for GPUs.
Proceedings of the 18th Asia and South Pacific Design Automation Conference, 2013

High-level synthesis of multiple dependent CUDA kernels on FPGA.
Proceedings of the 18th Asia and South Pacific Design Automation Conference, 2013

Shared cache aware task mapping for WCRT minimization.
Proceedings of the 18th Asia and South Pacific Design Automation Conference, 2013

2012
Timing analysis of concurrent programs running on shared cache multi-cores.
Real Time Syst., 2012

High-Level Synthesis: Productivity, Performance, and Software Constraints.
J. Electr. Comput. Eng., 2012

An Accurate GPU Performance Model for Effective Control Flow Divergence Optimization.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Real-time implementation and performance optimization of 3D sound localization on GPUs.
Proceedings of the 2012 Design, Automation & Test in Europe Conference & Exhibition, 2012

WCET-centric partial instruction cache locking.
Proceedings of the 49th Annual Design Automation Conference 2012, 2012

2011
High level synthesis of stereo matching: Productivity, performance, and software constraints.
Proceedings of the 2011 International Conference on Field-Programmable Technology, 2011

Multilevel Granularity Parallelism Synthesis on FPGAs.
Proceedings of the IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines, 2011

A study of high-level synthesis: Promises and challenges.
Proceedings of the 2011 IEEE 9th International Conference on ASIC, 2011

2010
An Efficient Algorithm to Estimate Real-time Traffic Information based on Multiple Data Sources.
Proceedings of the International Conference on Agents and Artificial Intelligence (ICAART 2010), Volume 1, 2010

Efficient custom instructions generation for system-level design.
Proceedings of the International Conference on Field-Programmable Technology, 2010

Instruction cache locking using temporal reuse profile.
Proceedings of the 47th Design Automation Conference, 2010

Improved procedure placement for set associative caches.
Proceedings of the 2010 International Conference on Compilers, 2010

2009
Cache-aware optimization of BAN applications.
Des. Autom. Embed. Syst., 2009

Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores.
Proceedings of the 30th IEEE Real-Time Systems Symposium, 2009

2008
Cache modeling in probabilistic execution time analysis.
Proceedings of the 45th Design Automation Conference, 2008

Static analysis for fast and accurate design space exploration of caches.
Proceedings of the 6th International Conference on Hardware/Software Codesign and System Synthesis, 2008
