Deming Chen

ORCID: 0000-0002-3016-0270

According to our database, Deming Chen authored at least 282 papers between 2003 and 2024.

Bibliography

2024
AutoAI2C: An Automated Hardware Generator for DNN Acceleration on Both FPGA and ASIC.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., October, 2024

CHARM 2.0: Composing Heterogeneous Accelerators for Deep Learning on Versal ACAP Architecture.
ACM Trans. Reconfigurable Technol. Syst., September, 2024

PandoGen: Generating complete instances of future SARS-CoV-2 sequences using Deep Learning.
PLoS Comput. Biol., January, 2024

New Solutions on LLM Acceleration, Optimization, and Application.
CoRR, 2024

Decision-Making Behavior Evaluation Framework for LLMs under Uncertain Context.
CoRR, 2024

SnapKV: LLM Knows What You are Looking for Before Generation.
CoRR, 2024

On the Surprising Efficacy of Distillation as an Alternative to Pre-Training Small Models.
CoRR, 2024

RAW 2024 Monday Keynote.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

FedCore: Straggler-Free Federated Learning with Distributed Coresets.
Proceedings of the IEEE International Conference on Communications, 2024

Subgraph Extraction-Based Feedback-Guided Iterative Scheduling for HLS.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2024

Invited: New Solutions on LLM Acceleration, Optimization, and Application.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

HIDA: A Hierarchical Dataflow Compiler for High-Level Synthesis.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

HomeSGN: A Smarter Home with Novel Rule Mining Enabled by a Scorer-Generator GAN.
Proceedings of the 29th Asia and South Pacific Design Automation Conference, 2024

Invited Paper: Software/Hardware Co-design for LLM and Its Application for Design Verification.
Proceedings of the 29th Asia and South Pacific Design Automation Conference, 2024

OS4C: An Open-Source SR-IOV System for SmartNIC-Based Cloud Platforms.
Proceedings of the 17th IEEE International Conference on Cloud Computing, 2024

S2TAR: Shared Secure Trusted Accelerators with Reconfiguration for Machine Learning in the Cloud.
Proceedings of the 17th IEEE International Conference on Cloud Computing, 2024

UniNet: Accelerating the Container Network Data Plane in IaaS Clouds.
Proceedings of the 17th IEEE International Conference on Cloud Computing, 2024

2023
AutoScaleDSE: A Scalable Design Space Exploration Engine for High-Level Synthesis.
ACM Trans. Reconfigurable Technol. Syst., September, 2023

Cybersecurity for Modern Smart Grid Against Emerging Threats.
Found. Trends Priv. Secur., 2023

RackBlox: A Software-Defined Rack-Scale Storage System with Network-Storage Co-Design.
Proceedings of the 29th Symposium on Operating Systems Principles, 2023

High-level Synthesis for Domain Specific Computing.
Proceedings of the 2023 International Symposium on Physical Design, 2023

Nimblock: Scheduling for Fine-grained FPGA Sharing through Virtualization.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

What Makes Convolutional Models Great on Long Sequence Modeling?
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Extensible and Efficient Proxy for Neural Architecture Search.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

SSDe: FPGA-Based SSD Express Emulation Framework.
Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 2023

FSSD: FPGA-Based Emulator for SSDs.
Proceedings of the 33rd International Conference on Field-Programmable Logic and Applications, 2023

CHARM: Composing Heterogeneous AcceleRators for Matrix Multiply on Versal ACAP Architecture.
Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2023

Nimblock: Scheduling for Fine-grained FPGA Sharing through Virtualization.
Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2023

AccShield: a New Trusted Execution Environment with Machine-Learning Accelerators.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

Lightning Talk: The Next Wave of High-level Synthesis.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

2022
ThunderGP: Resource-Efficient Graph Processing Framework on FPGAs with HLS.
ACM Trans. Reconfigurable Technol. Syst., 2022

Note from the TRETS EiC about the new Journal-first track in FPT'21.
ACM Trans. Reconfigurable Technol. Syst., 2022

Algorithm/Accelerator Co-Design and Co-Search for Edge AI.
IEEE Trans. Circuits Syst. II Express Briefs, 2022

Exploring HW/SW Co-Design for Video Analysis on CPU-FPGA Heterogeneous Systems.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022

ReAAP: A Reconfigurable and Algorithm-Oriented Array Processor With Compiler-Architecture Co-Design.
IEEE Trans. Computers, 2022

DML: Dynamic Partial Reconfiguration With Scalable Task Scheduling for Multi-Applications on FPGAs.
IEEE Trans. Computers, 2022

Extensible Proxy for Efficient NAS.
CoRR, 2022

HiKonv: Maximizing the Throughput of Quantized Convolution With Novel Bit-wise Management and Computation.
CoRR, 2022

Efficient Machine Learning, Compilers, and Optimizations for Embedded Systems.
CoRR, 2022

Physics Community Needs, Tools, and Resources for Machine Learning.
CoRR, 2022

AutoDistill: an End-to-End Framework to Explore and Distill Hardware-Efficient Language Models.
CoRR, 2022

YOLO-ReT: Towards High Accuracy Real-time Object Detection on Edge GPUs.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022

BoFL: bayesian optimized local training pace control for energy efficient federated learning.
Proceedings of the Middleware '22: 23rd International Middleware Conference, Quebec, QC, Canada, November 7, 2022

YouHome System and Dataset: Making Your Home Know You Better.
Proceedings of the IEEE International Symposium on Smart Electronic Systems, 2022

EDAML 2022 Invited Speaker 2: AI Algorithm and Accelerator Co-design for Computing on the Edge.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022

Qilin: Enabling Performance Analysis and Optimization of Shared-Virtual Memory Systems with FPGA Accelerators.
Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design, 2022

ScaleHLS: A New Scalable High-Level Synthesis Framework on Multi-Level Intermediate Representation.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

FSLAM: an Efficient and Accurate SLAM Accelerator on SoC FPGAs.
Proceedings of the International Conference on Field-Programmable Technology, 2022

ScaleHLS: a scalable high-level synthesis framework with multi-level transformations and optimizations: invited.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

HiKonv: High Throughput Quantized Convolution With Novel Bit-wise Management and Computation.
Proceedings of the 27th Asia and South Pacific Design Automation Conference, 2022

High-Level Synthesis for Minimizing Power Side-Channel Information Leakage.
Behavioral Synthesis for Hardware Security, 2022

2021
Learning-Based Simultaneous Detection and Characterization of Time Delay Attack in Cyber-Physical Systems.
IEEE Trans. Smart Grid, 2021

Efficient Methods for Mapping Neural Machine Translator on FPGAs.
IEEE Trans. Parallel Distributed Syst., 2021

PyLog: An Algorithm-Centric Python-Based FPGA Programming and Synthesis Flow.
IEEE Trans. Computers, 2021

VecQ: Minimal Loss DNN Model Compression With Vectorized Weight Quantization.
IEEE Trans. Computers, 2021

Improving the Generalization Ability of Deep Neural Networks for Cross-Domain Visual Recognition.
IEEE Trans. Cogn. Dev. Syst., 2021

Compressing Large-Scale Transformer-Based Models: A Case Study on BERT.
Trans. Assoc. Comput. Linguistics, 2021

Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture.
Proc. VLDB Endow., 2021

Enabling Design Methodologies and Future Trends for Edge AI: Specialization and Codesign.
IEEE Des. Test, 2021

Guest Editors' Introduction: Machine Intelligence at the Edge.
IEEE Des. Test, 2021

EH-DNAS: End-to-End Hardware-aware Differentiable Neural Architecture Search.
CoRR, 2021

ScaleHLS: Scalable High-Level Synthesis through MLIR.
CoRR, 2021

Being-ahead: Benchmarking and Exploring Accelerators for Hardware-Efficient AI Deployment.
CoRR, 2021

Enabling Design Methodologies and Future Trends for Edge AI: Specialization and Co-design.
CoRR, 2021

PyTorch-Direct: Enabling GPU Centric Data Access for Very Large Graph Neural Network Training with Irregular Accesses.
CoRR, 2021

HELLO: improved neural network architectures and methodologies for small variant calling.
BMC Bioinform., 2021

A Python-based High-Level Programming Flow for CPU-FPGA Heterogeneous Systems: (Invited Paper).
Proceedings of the IEEE/ACM Programming Environments for Heterogeneous Computing, 2021

Generic Neural Architecture Search via Regression.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

AccGuard: Secure and Trusted Computation on Remote FPGA Accelerators.
Proceedings of the IEEE International Symposium on Smart Electronic Systems, 2021

Chimera: A Hybrid Machine Learning-Driven Multi-Objective Design Space Exploration Tool for FPGA High-Level Synthesis.
Proceedings of the Intelligent Data Engineering and Automated Learning - IDEAL 2021, 2021

Improved GPU Implementations of the Pair-HMM Forward Algorithm for DNA Sequence Alignment.
Proceedings of the 39th IEEE International Conference on Computer Design, 2021

FracBNN: Accurate and FPGA-Efficient Binary Neural Networks with Fractional Activations.
Proceedings of the FPGA '21: The 2021 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Virtual Event, USA, February 28, 2021

ThunderGP: HLS-based Graph Processing Framework on FPGAs.
Proceedings of the FPGA '21: The 2021 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Virtual Event, USA, February 28, 2021

Scaling Up Hardware Accelerator Verification using A-QED with Functional Decomposition.
Proceedings of the Formal Methods in Computer Aided Design, 2021

Extending HLS with High-Level Descriptive Language for Configurable Algorithm-Level Spatial Structure Design.
Proceedings of the 29th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2021

F-CAD: A Framework to Explore Hardware Accelerators for Codec Avatar Decoding.
Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021

Skew-Oblivious Data Routing for Data Intensive Applications on FPGAs with HLS.
Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021

Accelerate Non-unit Stride Convolutions with Winograd Algorithms.
Proceedings of the ASPDAC '21: 26th Asia and South Pacific Design Automation Conference, 2021

Mixed Precision Quantization for ReRAM-based DNN Inference Accelerators.
Proceedings of the ASPDAC '21: 26th Asia and South Pacific Design Automation Conference, 2021

WinoCNN: Kernel Sharing Winograd Systolic Array for Efficient Convolutional Neural Network Acceleration on FPGAs.
Proceedings of the 32nd IEEE International Conference on Application-specific Systems, 2021

TwinDNN: A Tale of Two Deep Neural Networks.
Proceedings of the 32nd IEEE International Conference on Application-specific Systems, 2021

Graviton: A Reconfigurable Memory-Compute Fabric for Data Intensive Applications.
Proceedings of the Applied Reconfigurable Computing. Architectures, Tools, and Applications, 2021

Software/Hardware Co-design for Multi-modal Multi-task Learning in Autonomous Systems.
Proceedings of the 3rd IEEE International Conference on Artificial Intelligence Circuits and Systems, 2021

2020
Leveraging Dynamic Partial Reconfiguration with Scalable ILP Based Task Scheduling.
Proceedings of the 33rd International Conference on VLSI Design and 19th International Conference on Embedded Systems, 2020

SkyNet: a Hardware-Efficient Method for Object Detection and Tracking on Embedded Systems.
Proceedings of the Third Conference on Machine Learning and Systems, 2020

FReaC Cache: Folded-logic Reconfigurable Computing in the Last Level Cache.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

HaoCL: Harnessing Large-scale Heterogeneous Processors Made Easy.
Proceedings of the 40th IEEE International Conference on Distributed Computing Systems, 2020

DNNExplorer: A Framework for Modeling and Exploring a Novel Paradigm of FPGA-based DNN Accelerator.
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2020

Effective Algorithm-Accelerator Co-design for AI Solutions on Edge Devices.
Proceedings of the GLSVLSI '20: Great Lakes Symposium on VLSI 2020, 2020

AutoDNNchip: An Automated DNN Chip Predictor and Builder for Both FPGAs and ASICs.
Proceedings of the FPGA '20: The 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2020

HybridDNN: A Framework for High-Performance Hybrid DNN Accelerator Design and Implementation.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

A-QED Verification of Hardware Accelerators.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

EDD: Efficient Differentiable DNN Architecture and Implementation Co-search for Embedded AI Solutions.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

Is FPGA Useful for Hash Joins?
Proceedings of the 10th Conference on Innovative Data Systems Research, 2020

Thanos: High-Performance CPU-GPU Based Balanced Graph Partitioning Using Cross-Decomposition.
Proceedings of the 25th Asia and South Pacific Design Automation Conference, 2020

2019
Editorial: A Message from the New Editor-in-Chief.
ACM Trans. Reconfigurable Technol. Syst., 2019

A Hardware-Efficient Block Matching Algorithm and Its Hardware Design for Variable Block Size Motion Estimation in Ultra-High-Definition Video Encoding.
ACM Trans. Design Autom. Electr. Syst., 2019

Hybrid Quick Error Detection: Validation and Debug of SoCs Through High-Level Synthesis.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2019

Cost-Effective Error Detection Through Mersenne Modulo Shadow Datapaths.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2019

ASAP: Accelerated Short-Read Alignment on Programmable Hardware.
IEEE Trans. Computers, 2019

SkyNet: A Champion Model for DAC-SDC on Low Power Object Detection.
CoRR, 2019

A Bi-Directional Co-Design Approach to Enable Deep Learning on IoT Devices.
CoRR, 2019

Analysis and Modeling of Collaborative Execution Strategies for Heterogeneous CPU-FPGA Architectures.
Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering, 2019

A Hybrid GPU + FPGA System Design for Autonomous Driving Cars.
Proceedings of the 2019 IEEE International Workshop on Signal Processing Systems, 2019

Automated Communication and Floorplan-Aware Hardware/Software Co-Design for SoC.
Proceedings of the 2019 IEEE Computer Society Annual Symposium on VLSI, 2019

Near-Memory and In-Storage FPGA Acceleration for Emerging Cognitive Computing Workloads.
Proceedings of the 2019 IEEE Computer Society Annual Symposium on VLSI, 2019

T-DLA: An Open-source Deep Learning Accelerator for Ternarized DNN Models on Embedded FPGA.
Proceedings of the 2019 IEEE Computer Society Annual Symposium on VLSI, 2019

Accelerating distributed reinforcement learning with in-switch computing.
Proceedings of the 46th International Symposium on Computer Architecture, 2019

µL2Q: An Ultra-Low Loss Quantization Method for DNN Compression.
Proceedings of the International Joint Conference on Neural Networks, 2019

NAIS: Neural Architecture and Implementation Search and its Applications in Autonomous Driving.
Proceedings of the International Conference on Computer-Aided Design, 2019

When CTC Training Meets Acoustic Landmarks.
Proceedings of the IEEE International Conference on Acoustics, 2019

Accelerating Sparse Deep Neural Networks on FPGAs.
Proceedings of the 2019 IEEE High Performance Extreme Computing Conference, 2019

Analysis and Optimization of I/O Cache Coherency Strategies for SoC-FPGA Device.
Proceedings of the 29th International Conference on Field Programmable Logic and Applications, 2019

On-The-Fly Parallel Data Shuffling for Graph Processing on OpenCL-Based FPGAs.
Proceedings of the 29th International Conference on Field Programmable Logic and Applications, 2019

Cloud-DNN: An Open Framework for Mapping DNN Models to Cloud FPGAs.
Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

FPGAs in Supercomputers: Opportunity or Folly?
Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

FPGA/DNN Co-Design: An Efficient Design Methodology for IoT Intelligence on the Edge.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

Cross-Layer Resilience: Challenges, Insights, and the Road Ahead.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

Implementing neural machine translation with bi-directional GRU and attention mechanism on FPGAs using HLS.
Proceedings of the 24th Asia and South Pacific Design Automation Conference, 2019

A recurrent Markov state-space generative model for sequences.
Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, 2019

2018
Introduction to the Special Section on Deep Learning in FPGAs.
ACM Trans. Reconfigurable Technol. Syst., 2018

C-Mine: Data Mining of Logic Common Cases for Improved Timing Error Resilience with Energy Efficiency.
ACM Trans. Design Autom. Electr. Syst., 2018

Compact Modeling to Device- and Circuit-Level Evaluation of Flexible TMD Field-Effect Transistors.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2018

Application-Transparent Near-Memory Processing Architecture with Memory Channel Network.
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

Improved ASR for Under-resourced Languages through Multi-task Learning with Acoustic Landmarks.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

DNNBuilder: an automated tool for building high-performance DNN hardware accelerators for FPGAs.
Proceedings of the International Conference on Computer-Aided Design, 2018

Triangle Counting and Truss Decomposition using FPGA.
Proceedings of the 2018 IEEE High Performance Extreme Computing Conference, 2018

Face Recognition with Hybrid Efficient Convolution Algorithms on FPGAs.
Proceedings of the 2018 on Great Lakes Symposium on VLSI, 2018

Design Flow of Accelerating Hybrid Extremely Low Bit-Width Neural Network in Embedded FPGA.
Proceedings of the 28th International Conference on Field Programmable Logic and Applications, 2018

AccDNN: An IP-Based DNN Generator for FPGAs.
Proceedings of the 26th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2018

The Sixth Visual Object Tracking VOT2018 Challenge Results.
Proceedings of the Computer Vision - ECCV 2018 Workshops, 2018

Resource and data optimization for hardware implementation of deep neural networks targeting FPGA-based edge devices.
Proceedings of the 20th System Level Interconnect Prediction Workshop, 2018

CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Deep Learning for Better Variant Calling for Cancer Diagnosis and Treatment.
Proceedings of the 23rd Asia and South Pacific Design Automation Conference, 2018

Low-cost hardware architectures for mersenne modulo functional units.
Proceedings of the 23rd Asia and South Pacific Design Automation Conference, 2018

GPU Acceleration of Advanced k-mer Counting for Computational Genomics.
Proceedings of the 29th IEEE International Conference on Application-specific Systems, 2018

2017
Heterogeneous Computing Meets Near-Memory Acceleration and High-Level Synthesis in the Post-Moore Era.
IEEE Micro, 2017

New advances of high-level synthesis for efficient and reliable hardware design.
Integr., 2017

Acoustic Landmarks Contain More Information About the Phone String than Other Frames.
CoRR, 2017

Collaborative Computing for Heterogeneous Integrated Systems.
Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering, 2017

Using Approximated Auditory Roughness as a Pre-Filtering Feature for Human Screaming and Affective Speech AED.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Rebooting the Data Access Hierarchy of Computing Systems.
Proceedings of the IEEE International Conference on Rebooting Computing, 2017

Cross-Layer Resilience in Low-Voltage Digital Systems: Key Insights.
Proceedings of the 2017 IEEE International Conference on Computer Design, 2017

Machine learning on FPGAs to face the IoT revolution.
Proceedings of the 2017 IEEE/ACM International Conference on Computer-Aided Design, 2017

High-performance video content recognition with long-term recurrent convolutional network for FPGA.
Proceedings of the 27th International Conference on Field Programmable Logic and Applications, 2017

Hardware Acceleration of the Pair-HMM Algorithm for DNA Variant Calling.
Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017

ASAP: Accelerated Short Read Alignment on Programmable Hardware (Abstract Only).
Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017

Efficient GPGPU Computing with Cross-Core Resource Sharing and Core Reconfiguration.
Proceedings of the 25th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2017

Accurate High-level Modeling and Automated Hardware/Software Co-design for Effective SoC Design Space Exploration.
Proceedings of the 54th Annual Design Automation Conference, 2017

ASP-DAC 2017 keynote speech I: In memory of Edward J. McCluskey: The next wave of pioneering innovations.
Proceedings of the 22nd Asia and South Pacific Design Automation Conference, 2017

High-Level Synthesis for side-channel defense.
Proceedings of the 28th IEEE International Conference on Application-specific Systems, 2017

2016
Analytical SPICE-Compatible Model of Schottky-Barrier-Type GNRFETs With Performance Analysis.
IEEE Trans. Very Large Scale Integr. Syst., 2016

FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow.
IEEE Trans. Very Large Scale Integr. Syst., 2016

Introduction.
ACM Trans. Reconfigurable Technol. Syst., 2016

An Accurate GPU Performance Model for Effective Control Flow Divergence Optimization.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2016

PolyPUF: Physically Secure Self-Divergence.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2016

FCUDA-HB: Hierarchical and Scalable Bus Architecture Generation on FPGAs With the FCUDA Flow.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2016

Modeling of Gaussian Network-Based Reconfigurable Network-on-Chip Designs.
IEEE Trans. Computers, 2016

Platform choices and design demands for IoT platforms: cost, power, and performance tradeoffs.
IET Cyber-Phys. Syst.: Theory & Appl., 2016

BLESS 2: accurate, memory-efficient and fast error correction method.
Bioinform., 2016

SoC, NoC and Hierarchical Bus Implementations of Applications on FPGAs Using the FCUDA Flow.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2016

Parallel code-specific CPU simulation with dynamic phase convergence modeling for HW/SW co-design.
Proceedings of the 35th International Conference on Computer-Aided Design, 2016

Automated Verification Code Generation in HLS Using Software Execution Traces (Abstract Only).
Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2016

FCUDA-SoC: Platform Integration for Field-Programmable SoC with the CUDA-to-FPGA Compiler.
Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2016

High Level Synthesis of Complex Applications: An H.264 Video Decoder.
Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2016

AutoSLIDE: Automatic Source-Level Instrumentation and Debugging for HLS.
Proceedings of the 24th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2016

Acceleration of the Pair-HMM Algorithm for DNA Variant Calling.
Proceedings of the 24th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2016

Real-time system-level implementation of a telepresence robot using an embedded GPU platform.
Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition, 2016

Information dispersion for trojan defense through high-level synthesis.
Proceedings of the 53rd Annual Design Automation Conference, 2016

Debugging and verifying SoC designs through effective cross-layer hardware-software co-simulation.
Proceedings of the 53rd Annual Design Automation Conference, 2016

Designing high-quality hardware on a development effort budget: A study of the current state of high-level synthesis.
Proceedings of the 21st Asia and South Pacific Design Automation Conference, 2016

Flexible transition metal dichalcogenide field-effect transistors: A circuit-level simulation study of delay and power under bending, process variation, and scaling.
Proceedings of the 21st Asia and South Pacific Design Automation Conference, 2016

2015
Efficient GPU Spatial-Temporal Multitasking.
IEEE Trans. Parallel Distributed Syst., 2015

An Efficient Compiler Framework for Cache Bypassing on GPUs.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2015

High-level Synthesis for Low-power Design.
IPSJ Trans. Syst. LSI Des. Methodol., 2015

CSL: Coordinated and scalable logic synthesis techniques for effective NBTI reduction.
Proceedings of the 33rd IEEE International Conference on Computer Design, 2015

A Polyhedral-based SystemC Modeling and Generation Framework for Effective Low-power Design Space Exploration.
Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 2015

DA Vision 2015: From Here to Eternity.
Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 2015

JIT trace-based verification for high-level synthesis.
Proceedings of the 2015 International Conference on Field Programmable Technology, 2015

Behavioral-level IP integration in high-level synthesis.
Proceedings of the 2015 International Conference on Field Programmable Technology, 2015

A scalable and high-density FPGA architecture with multi-level phase change memory.
Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015

FPGA accelerated DNA error correction.
Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015

A SPICE model of flexible transition metal dichalcogenide field-effect transistors.
Proceedings of the 52nd Annual Design Automation Conference, 2015

High-level synthesis of error detecting cores through low-cost modulo-3 shadow datapaths.
Proceedings of the 52nd Annual Design Automation Conference, 2015

Hybrid quick error detection (H-QED): accelerator validation and debug using high-level synthesis principles.
Proceedings of the 52nd Annual Design Automation Conference, 2015

System-level design solutions: Enabling the IoT explosion.
Proceedings of the 2015 IEEE 11th International Conference on ASIC, 2015

2014
High-Level Synthesis With Behavioral-Level Multicycle Path Analysis.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2014

Hybrid circuit-switched network for on-chip communication in large-scale chip-multiprocessors.
J. Parallel Distributed Comput., 2014

BLESS: Bloom filter-based error correction solution for high-throughput sequencing reads.
Bioinform., 2014

Analysis of System Reliability for Cache Coherence Scheme in Multi-processor.
Proceedings of the IEEE Eighth International Conference on Software Security and Reliability, 2014

A hardware architecture to deploy complex multiprocessor scheduling algorithms.
Proceedings of the 2014 IEEE 20th International Conference on Embedded and Real-Time Computing Systems and Applications, 2014

New solutions for system-level and high-level synthesis (Invited paper).
Proceedings of the 2014 International Symposium on Integrated Circuits (ISIC), 2014

Fast and effective placement and routing directed high-level synthesis for FPGAs.
Proceedings of the 2014 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2014

Transformations for throughput optimization in high-level synthesis (abstract only).
Proceedings of the 2014 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2014

Integrated CUDA-to-FPGA Synthesis with Network-on-Chip.
Proceedings of the 22nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2014

Highly accurate SPICE-compatible modeling for single- and double-gate GNRFETs with studies on technology scaling.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2014

C-Mine: Data Mining of Logic Common Cases for Low Power Synthesis of Better-Than-Worst-Case Designs.
Proceedings of the 51st Annual Design Automation Conference 2014, 2014

ClusRed: Clustering and Network Reduction Based Probabilistic Optimal Power Flow Analysis for Large-Scale Smart Grids.
Proceedings of the 51st Annual Design Automation Conference 2014, 2014

System-of-PUFs: Multilevel security for embedded systems.
Proceedings of the 2014 International Conference on Hardware/Software Codesign and System Synthesis, 2014

Fast large-scale optimal power flow analysis for smart grid through network reduction.
Proceedings of the 19th Asia and South Pacific Design Automation Conference, 2014

CNPUF: A Carbon Nanotube-based Physically Unclonable Function for secure low-energy hardware design.
Proceedings of the 19th Asia and South Pacific Design Automation Conference, 2014

2013
A routing algorithm for graphene nanoribbon circuit.
ACM Trans. Design Autom. Electr. Syst., 2013

Efficient compilation of CUDA kernels for high-performance computing on FPGAs.
ACM Trans. Embed. Comput. Syst., 2013

Optimizations in GPU: Smart compilers and core-level reconfiguration.
Proceedings of the ACM/IEEE International Workshop on System Level Interconnect Prediction, 2013

Schottky-barrier-type Graphene Nano-Ribbon Field-Effect Transistors: A study on compact modeling, process variation, and circuit performance.
Proceedings of the IEEE/ACM International Symposium on Nanoscale Architectures, 2013

Graphene nano-ribbon field-effect transistors as future low-power devices.
Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 2013

High-level synthesis with behavioral level multi-cycle path analysis.
Proceedings of the 23rd International Conference on Field programmable Logic and Applications, 2013

Improving high level synthesis optimization opportunity through polyhedral transformations.
Proceedings of the 2013 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2013

A SPICE-compatible model of graphene nano-ribbon field-effect transistors enabling circuit-level delay and power analysis under process variation.
Proceedings of the Design, Automation and Test in Europe, 2013

Throughput-oriented kernel porting onto FPGAs.
Proceedings of the 50th Annual Design Automation Conference 2013, 2013

Improving polyhedral code generation for high-level synthesis.
Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis, 2013

Register and thread structure optimization for GPUs.
Proceedings of the 18th Asia and South Pacific Design Automation Conference, 2013

High-level synthesis of multiple dependent CUDA kernels on FPGA.
Proceedings of the 18th Asia and South Pacific Design Automation Conference, 2013

2012
Analysis of Digital Circuit Dynamic Behavior With Timed Ternary Decision Diagrams for Better-Than-Worst-Case Design.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2012

High-Level Synthesis: Productivity, Performance, and Software Constraints.
J. Electr. Comput. Eng., 2012

ESL Design Methodology.
J. Electr. Comput. Eng., 2012

A Coarse-Grained Reconfigurable Architecture with Compilation for High Performance.
Int. J. Reconfigurable Comput., 2012

TIGER: tiled iterative genome assembler.
BMC Bioinform., 2012

Improving broadcast efficiency in wireless sensor network time synchronization protocols.
Proceedings of the International Workshop on System Level Interconnect Prediction, 2012

CCP: common case promotion for improved timing error resilience with energy efficiency.
Proceedings of the International Symposium on Low Power Electronics and Design, 2012

An Accurate GPU Performance Model for Effective Control Flow Divergence Optimization.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Real-time implementation and performance optimization of 3D sound localization on GPUs.
Proceedings of the 2012 Design, Automation & Test in Europe Conference & Exhibition, 2012

2011
Architecture and performance evaluation of 3D CMOS-NEM FPGA.
Proceedings of the 2011 International Workshop on System Level Interconnect Prediction, 2011

Temperature aware statistical static timing analysis.
Proceedings of the 2011 IEEE/ACM International Conference on Computer-Aided Design, 2011

High level synthesis of stereo matching: Productivity, performance, and software constraints.
Proceedings of the 2011 International Conference on Field-Programmable Technology, 2011

Multilevel Granularity Parallelism Synthesis on FPGAs.
Proceedings of the IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines, 2011

Routing with graphene nanoribbons.
Proceedings of the 16th Asia South Pacific Design Automation Conference, 2011

SETmap: A soft error tolerant mapping algorithm for FPGA designs with low power.
Proceedings of the 16th Asia South Pacific Design Automation Conference, 2011

A study of high-level synthesis: Promises and challenges.
Proceedings of the 2011 IEEE 9th International Conference on ASIC, 2011

2010
LOPASS: A Low-Power Architectural Synthesis System for FPGAs With Interconnect Estimation and Optimization.
IEEE Trans. Very Large Scale Integr. Syst., 2010

Variation-Aware Placement With Multi-Cycle Statistical Timing Analysis for FPGAs.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2010

A Routing Approach to Reduce Glitches in Low Power FPGAs.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2010

Technology Mapping and Clustering for FPGA Architectures With Dual Supply Voltages.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2010

A Workload-Adaptive and Reconfigurable Bus Architecture for Multicore Processors.
Int. J. Reconfigurable Comput., 2010

BDD-based circuit restructuring for reducing dynamic power.
Proceedings of the 28th International Conference on Computer Design, 2010

Analysis of circuit dynamic behavior with timed ternary decision diagram.
Proceedings of the 2010 International Conference on Computer-Aided Design, 2010

Variation-aware layout-driven scheduling for performance yield optimization.
Proceedings of the 2010 International Conference on Computer-Aided Design, 2010

Variation-aware placement for FPGAs with multi-cycle statistical timing analysis.
Proceedings of the ACM/SIGDA 18th International Symposium on Field Programmable Gate Arrays, 2010

Clock tree synthesis under aggressive buffer insertion.
Proceedings of the 47th Design Automation Conference, 2010

Dynamic power estimation for deep submicron circuits with process variation.
Proceedings of the 15th Asia South Pacific Design Automation Conference, 2010

2009
Design Automation for Microelectronics.
Proceedings of the Springer Handbook of Automation, 2009

A Fast Digital Predistortion Algorithm for Radio-Frequency Power Amplifier Linearization With Loop Delay Compensation.
IEEE J. Sel. Top. Signal Process., 2009

An Optimal Resource Binding Algorithm with Inter-Transition Switching Activities for Low Power.
J. Low Power Electron., 2009

Design and Evaluation of a Carbon Nanotube-Based Programmable Architecture.
Int. J. Parallel Program., 2009

FCUDA: Enabling efficient compilation of CUDA kernels onto FPGAs.
Proceedings of the IEEE 7th Symposium on Application Specific Processors, 2009

Workload adaptive shared memory multicore processors with reconfigurable interconnects.
Proceedings of the IEEE 7th Symposium on Application Specific Processors, 2009

Variation Aware Routing for Three-Dimensional FPGAs.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2009

High-performance CUDA kernel execution on FPGAs.
Proceedings of the 23rd international conference on Supercomputing, 2009

A novel SoC architecture on FPGA for ultra fast face detection.
Proceedings of the 27th International Conference on Computer Design, 2009

DynaTune: Circuit-level optimization for timing speculation considering dynamic path behavior.
Proceedings of the 2009 International Conference on Computer-Aided Design, 2009

Blueshift: Designing processors for timing speculation from the ground up.
Proceedings of the 15th International Conference on High-Performance Computer Architecture (HPCA-15 2009), 2009

FPCNA: a field programmable carbon nanotube array.
Proceedings of the ACM/SIGDA 17th International Symposium on Field Programmable Gate Arrays, 2009

CMOS vs Nano: comrades or rivals?
Proceedings of the ACM/SIGDA 17th International Symposium on Field Programmable Gate Arrays, 2009

Reconfigurable circuit design with nanomaterials.
Proceedings of the Design, Automation and Test in Europe, 2009

FPGA-targeted high-level binding algorithm for power and area reduction with glitch-estimation.
Proceedings of the 46th Design Automation Conference, 2009

FastYield: variation-aware, layout-driven simultaneous binding and module selection for performance yield optimization.
Proceedings of the 14th Asia South Pacific Design Automation Conference, 2009

2008
A fast simultaneous input vector generation and gate replacement algorithm for leakage power reduction.
ACM Trans. Design Autom. Electr. Syst., 2008

DDBDD: Delay-Driven BDD Synthesis for FPGAs.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2008

Application Acceleration with the Explicitly Parallel Operations System - the EPOS Processor.
Proceedings of the IEEE Symposium on Application Specific Processors, 2008

Efficient ASIP design for configurable processors with fine-grained resource sharing.
Proceedings of the ACM/SIGDA 16th International Symposium on Field Programmable Gate Arrays, 2008

VEBoC: Variation and error-aware design for billions of devices on a chip.
Proceedings of the 13th Asia South Pacific Design Automation Conference, 2008

2007
3-D nFPGA: A Reconfigurable Architecture for 3-D CMOS/Nanomaterial Hybrid Digital Circuits.
IEEE Trans. Circuits Syst. I Regul. Pap., 2007

Performance and power evaluation of a 3D CMOS/nanomaterial reconfigurable architecture.
Proceedings of the 2007 International Conference on Computer-Aided Design, 2007

Timing constraint-driven technology mapping for FPGAs considering false paths and multi-clock domains.
Proceedings of the 2007 International Conference on Computer-Aided Design, 2007

GlitchMap: An FPGA Technology Mapper for Low Power Considering Glitches.
Proceedings of the 44th Design Automation Conference, 2007

High-Level Power Estimation and Low-Power Design Space Exploration for FPGAs.
Proceedings of the 12th Conference on Asia South Pacific Design Automation, 2007

2006
Optimal simultaneous module and multivoltage assignment for low power.
ACM Trans. Design Autom. Electr. Syst., 2006

FPGA Design Automation: A Survey.
Found. Trends Electron. Des. Autom., 2006

Optimal simultaneous mapping and clustering for FPGA delay optimization.
Proceedings of the 43rd Design Automation Conference, 2006

A fast simultaneous input vector generation and gate replacement algorithm for leakage power reduction.
Proceedings of the 43rd Design Automation Conference, 2006

Optimality study of resource binding with multi-Vdds.
Proceedings of the 43rd Design Automation Conference, 2006

2005
Power modeling and characteristics of field programmable gate arrays.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2005

Research of double loop optical fiber self-cicatrized network based on Neuron3150.
Proceedings of the 2005 International Symposium on Autonomous Decentralized Systems, 2005

Optimal module and voltage assignment for low-power.
Proceedings of the 2005 Conference on Asia South Pacific Design Automation, 2005

2004
Delay optimal low-power circuit clustering for FPGAs with dual supply voltages.
Proceedings of the 2004 International Symposium on Low Power Electronics and Design, 2004

DAOmap: a depth-optimal area optimization mapping algorithm for FPGA designs.
Proceedings of the 2004 International Conference on Computer-Aided Design, 2004

Low-power technology mapping for FPGA architectures with dual supply voltages.
Proceedings of the ACM/SIGDA 12th International Symposium on Field Programmable Gate Arrays, 2004

Register binding and port assignment for multiplexer optimization.
Proceedings of the 2004 Conference on Asia South Pacific Design Automation: Electronic Design and Solution Fair 2004, 2004

2003
Performance-driven mapping for CPLD architectures.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2003

Low-power high-level synthesis for FPGA architectures.
Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003

Architecture evaluation for power-efficient FPGAs.
Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2003

