Jason Cong

Orcid: 0000-0003-2887-6963

Affiliations:
  • University of California, Los Angeles, USA


According to our database, Jason Cong authored at least 574 papers between 1988 and 2024.

Bibliography

2024
CHARM 2.0: Composing Heterogeneous Accelerators for Deep Learning on Versal ACAP Architecture.
ACM Trans. Reconfigurable Technol. Syst., September, 2024

PASTA: Programming and Automation Support for Scalable Task-Parallel HLS Programs on Modern Multi-Die FPGAs.
ACM Trans. Reconfigurable Technol. Syst., September, 2024

Rubick: A Unified Infrastructure for Analyzing, Exploring, and Implementing Spatial Architectures via Dataflow Decomposition.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., April, 2024

Compiling Quantum Circuits for Dynamically Field-Programmable Neutral Atoms Array Processors.
Quantum, March, 2024

Dynamic-Width Speculative Beam Decoding for Efficient LLM Inference.
CoRR, 2024

Accelerating Large Language Model Pretraining via LFR Pedagogy: Learn, Focus, and Review.
CoRR, 2024

Efficient Task Transfer for HLS DSE.
CoRR, 2024

Multi-Token Joint Speculative Decoding for Accelerating Large Language Model Inference.
CoRR, 2024

ML-QLS: Multilevel Quantum Layout Synthesis.
CoRR, 2024

Compilation for Dynamically Field-Programmable Qubit Arrays with Efficient and Provably Near-Optimal Scheduling.
CoRR, 2024

HMT: Hierarchical Memory Transformer for Long Context Language Processing.
CoRR, 2024

Enhancing High-Level Synthesis with Automated Pragma Insertion and Code Transformation Framework.
CoRR, 2024

Quantum State Preparation Using an Exact CNOT Synthesis Formulation.
CoRR, 2024

Cross-Modality Program Representation Learning for Electronic Design Automation with High-Level Synthesis.
Proceedings of the 2024 ACM/IEEE International Symposium on Machine Learning for CAD, 2024

Learning to Compare Hardware Designs for High-Level Synthesis.
Proceedings of the 2024 ACM/IEEE International Symposium on Machine Learning for CAD, 2024

Scheduling and Physical Design.
Proceedings of the 2024 International Symposium on Physical Design, 2024

Atomique: A Quantum Compiler for Reconfigurable Neutral Atom Arrays.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

Automatic Hardware Pragma Insertion in High-Level Synthesis: A Non-Linear Programming Approach.
Proceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2024

LevelST: Stream-based Accelerator for Sparse Triangular Solver.
Proceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2024

Quantum State Preparation Using an Exact CNOT Synthesis Formulation.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2024

Depth-Optimal Addressing of 2D Qubit Array with 1D Controls Based on Exact Binary Matrix Factorization.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2024

SpectraFlux: Harnessing the Flow of Multi-FPGA in Mass Spectrometry Clustering.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

Q-Pilot: Field Programmable Qubit Array Compilation with Flying Ancillas.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

TAPA-CS: Enabling Scalable Accelerator Design on Distributed HBM-FPGAs.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023
RapidStream 2.0: Automated Parallel Implementation of Latency-Insensitive FPGA Designs Through Partial Reconfiguration.
ACM Trans. Reconfigurable Technol. Syst., December, 2023

TAPA: A Scalable Task-parallel Dataflow Programming Framework for Modern FPGAs with Co-optimization of HLS and Physical Design.
ACM Trans. Reconfigurable Technol. Syst., December, 2023

TARO: Automatic Optimization for Free-Running Kernels in FPGA High-Level Synthesis.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., July, 2023

FPGA Acceleration of Probabilistic Sentential Decision Diagrams with High-level Synthesis.
ACM Trans. Reconfigurable Technol. Syst., June, 2023

FlexCNN: An End-to-end Framework for Composing CNN Accelerators on FPGA.
ACM Trans. Reconfigurable Technol. Syst., June, 2023

Micro/Nano Circuits and Systems Design and Design Automation: Challenges and Opportunities.
Proc. IEEE, June, 2023

FPGA-Based In-Vivo Calcium Image Decoding for Closed-Loop Feedback Applications.
IEEE Trans. Biomed. Circuits Syst., April, 2023

TopSort: A High-Performance Two-Phase Sorting Accelerator Optimized on HBM-Based FPGAs.
IEEE Trans. Emerg. Top. Comput., 2023

Q-Pilot: Field Programmable Quantum Array Compilation with Flying Ancillas.
CoRR, 2023

FPQA-C: A Compilation Framework for Field Programmable Qubit Array.
CoRR, 2023

A Survey on Graph Neural Network Acceleration: Algorithms, Systems, and Customized Hardware.
CoRR, 2023

Locality and Utilization in Placement Suboptimality.
CoRR, 2023

Democratizing Domain-Specific Computing.
Commun. ACM, 2023

Towards a Comprehensive Benchmark for High-Level Synthesis Targeted to FPGAs.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Robust GNN-Based Representation Learning for HLS.
Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 2023

NeSSA: Near-Storage Data Selection for Accelerated Machine Learning Training.
Proceedings of the 15th ACM/USENIX Workshop on Hot Topics in Storage and File Systems, 2023

CHARM: Composing Heterogeneous AcceleRators for Matrix Multiply on Versal ACAP Architecture.
Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2023

Callipepla: Stream Centric Instruction Set and Mixed Precision for Accelerating Conjugate Gradient Solver.
Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2023

HMLib: Efficient Data Transfer for HLS Using Host Memory.
Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2023

PASTA: Programming and Automation Support for Scalable Task-Parallel HLS Programs on Modern Multi-Die FPGAs.
Proceedings of the 31st IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2023

Rubick: A Synthesis Framework for Spatial Architectures via Dataflow Decomposition.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

Scalable Optimal Layout Synthesis for NISQ Quantum Processors.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

Lightning Talk: Scaling Up Quantum Compilation - Challenges and Opportunities.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

A Comprehensive Automated Exploration Framework for Systolic Array Designs.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

Caffeine: Towards Uniformed Representation and Acceleration for Deep Convolutional Neural Networks.
Proceedings of the ACM Turing Award Celebration Conference - China 2023, 2023

2022
FPGA HLS Today: Successes, Challenges, and Opportunities.
ACM Trans. Reconfigurable Technol. Syst., 2022

AutoDSE: Enabling Software Programmers to Design Efficient FPGA Accelerators.
ACM Trans. Design Autom. Electr. Syst., 2022

Energy-Efficient LSTM Inference Accelerator for Real-Time Causal Prediction.
ACM Trans. Design Autom. Electr. Syst., 2022

Domain-Specific Quantum Architecture Optimization.
IEEE J. Emerg. Sel. Topics Circuits Syst., 2022

OverGen: Improving FPGA Usability through Domain-specific Overlay Generation.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

Efficient Kernels for Real-Time Position Decoding from In Vivo Calcium Images.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2022

Qubit Mapping for Reconfigurable Atom Arrays.
Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design, 2022

PYXIS: An Open-Source Performance Dataset Of Sparse Accelerators.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Sextans: A Streaming Accelerator for General-Purpose Sparse-Matrix Dense-Matrix Multiplication.
Proceedings of the FPGA '22: The 2022 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Virtual Event, USA, 27 February 2022, 2022

SPA-GCN: Efficient and Flexible GCN Accelerator with Application for Graph Similarity Computation.
Proceedings of the FPGA '22: The 2022 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Virtual Event, USA, 27 February 2022, 2022

RapidStream: Parallel Physical Implementation of FPGA HLS Designs.
Proceedings of the FPGA '22: The 2022 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Virtual Event, USA, 27 February 2022, 2022

Accelerating SSSP for Power-Law Graphs.
Proceedings of the FPGA '22: The 2022 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Virtual Event, USA, 27 February 2022, 2022

A Versatile Systolic Array for Transposed and Dilated Convolution on FPGA.
Proceedings of the 30th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2022

Serpens: a high bandwidth memory based accelerator for general-purpose sparse matrix-vector multiplication.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

Automated accelerator optimization aided by graph neural networks.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

Improving GNN-based accelerator design automation with meta learning.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

StreamGCN: Accelerating Graph Convolutional Networks with Streaming Processing.
Proceedings of the IEEE Custom Integrated Circuits Conference, 2022

N-DISE: NDN-based data distribution for large-scale data-intensive science.
Proceedings of the 9th ACM Conference on Information-Centric Networking, 2022

2021
Optimality Study of Existing Quantum Computing Layout Synthesis Tools.
IEEE Trans. Computers, 2021

Search for Optimal Systolic Arrays: A Comprehensive Automated Exploration Framework and Lessons Learned.
CoRR, 2021

Enabling Automated FPGA Accelerator Optimization Using Graph Neural Networks.
CoRR, 2021

SPA-GCN: Efficient and Flexible GCN Accelerator with an Application for Graph Similarity Computation.
CoRR, 2021

Optimal Qubit Mapping with Simultaneous Gate Absorption.
CoRR, 2021

TENET: A Framework for Modeling Tensor Dataflow Based on Relation-centric Notation.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

From Parallelization to Customization - Challenges and Opportunities.
Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

Optimal Qubit Mapping with Simultaneous Gate Absorption (ICCAD Special Session Paper).
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2021

AutoDSE: Enabling Software Programmers Design Efficient FPGA Accelerators.
Proceedings of the FPGA '21: The 2021 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Virtual Event, USA, February 28, 2021

AutoBridge: Coupling Coarse-Grained Floorplanning and Pipelining for High-Frequency HLS Design on Multi-Die FPGAs.
Proceedings of the FPGA '21: The 2021 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Virtual Event, USA, February 28, 2021

HBM Connect: High-Performance HLS Interconnect for FPGA HBM.
Proceedings of the FPGA '21: The 2021 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Virtual Event, USA, February 28, 2021

Extending High-Level Synthesis for Task-Parallel Programs.
Proceedings of the FPGA '21: The 2021 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Virtual Event, USA, February 28, 2021

AutoSA: A Polyhedral Compiler for High-Performance Systolic Arrays on FPGA.
Proceedings of the FPGA '21: The 2021 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Virtual Event, USA, February 28, 2021

MOCHA: Multinode Cost Optimization in Heterogeneous Clouds with Accelerators.
Proceedings of the FPGA '21: The 2021 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Virtual Event, USA, February 28, 2021

Scaling Up Hardware Accelerator Verification using A-QED with Functional Decomposition.
Proceedings of the Formal Methods in Computer Aided Design, 2021

FANS: FPGA-Accelerated Near-Storage Sorting.
Proceedings of the 29th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2021

Extending High-Level Synthesis for Task-Parallel Programs.
Proceedings of the 29th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2021

Live Demonstration: Real-Time Calcium Trace Extraction from Large-Field-of-View Miniscope.
Proceedings of the IEEE Biomedical Circuits and Systems Conference, BioCAS 2021, 2021

Fast Calcium Trace Extraction for Large-Field-of-View Miniscope.
Proceedings of the IEEE Biomedical Circuits and Systems Conference, BioCAS 2021, 2021

2020
SACNN: Self-Attention Convolutional Neural Network for Low-Dose CT Denoising With Self-Supervised Perceptual Loss Network.
IEEE Trans. Medical Imaging, 2020

FLASH: Fast, Parallel, and Accurate Simulator for HLS.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

Crane: Mitigating Accelerator Under-utilization Caused by Sparsity Irregularities in CNNs.
IEEE Trans. Computers, 2020

2019 DAC Roundtable.
IEEE Des. Test, 2020

When HLS Meets FPGA HBM: Benchmarking and Bandwidth Optimization.
CoRR, 2020

BLINK: bit-sparse LSTM inference kernel enabling efficient calcium trace extraction for neurofeedback devices.
Proceedings of the ISLPED '20: ACM/IEEE International Symposium on Low Power Electronics and Design, 2020

Bonsai: High-Performance Adaptive Merge Tree Sorting.
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

HeteroRefactor: refactoring for heterogeneous computing with FPGA.
Proceedings of the ICSE '20: 42nd International Conference on Software Engineering, Seoul, South Korea, 27 June, 2020

Optimal Layout Synthesis for Quantum Computing.
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2020

SuSy: A Programming Model for Productive Construction of High-Performance Systolic Arrays on FPGAs.
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2020

End-to-End Optimization of Deep Learning Applications.
Proceedings of the FPGA '20: The 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2020

HeteroHalide: From Image Processing DSL to Efficient FPGA Acceleration.
Proceedings of the FPGA '20: The 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2020

CANSEE: Customized Accelerator for Neural Signal Enhancement and Extraction from the Calcium Image in Real Time.
Proceedings of the FPGA '20: The 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2020

Algorithm-Hardware Co-design for BQSR Acceleration in Genome Analysis ToolKit.
Proceedings of the 28th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2020

A-QED Verification of Hardware Accelerators.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

Analysis and Optimization of the Implicit Broadcasts in FPGA HLS to Improve Maximum Frequency.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

Exploiting Computation Reuse for Stencil Accelerators.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

2019
In-Depth Analysis on Microarchitectures of Modern Heterogeneous CPU-FPGA Platforms.
ACM Trans. Reconfigurable Technol. Syst., 2019

Caffeine: Toward Uniformed Representation and Acceleration for Deep Convolutional Neural Networks.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2019

RC-NVM: Dual-Addressing Non-Volatile Memory Architecture Supporting Both Row and Column Memory Accesses.
IEEE Trans. Computers, 2019

Customizable Computing - From Single Chip to Datacenters.
Proc. IEEE, 2019

A Millimeter-Wave CMOS Transceiver With Digitally Pre-Distorted PAM-4 Modulation for Contactless Communications.
IEEE J. Solid State Circuits, 2019

INSIDER: Designing In-Storage Computing System for Emerging High-Performance Drive.
Proceedings of the 2019 USENIX Annual Technical Conference, 2019

Frequency Improvement of Systolic Array-Based CNNs on FPGAs.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2019

Analyzing and Modeling In-Storage Computing Workloads On EISC - An FPGA-Based System-Level Emulation Platform.
Proceedings of the International Conference on Computer-Aided Design, 2019

Overcoming Data Transfer Bottlenecks in DNN Accelerators via Layer-Conscious Memory Management.
Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

Dataflow Systolic Array Implementations of Matrix Decomposition Using High Level Synthesis.
Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

HeteroCL: A Multi-Paradigm Programming Infrastructure for Software-Defined Reconfigurable Computing.
Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

Rapid Cycle-Accurate Simulator for High-Level Synthesis.
Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

LANMC: LSTM-Assisted Non-Rigid Motion Correction on FPGA for Calcium Image Stabilization.
Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

An FPGA-Based BWT Accelerator for Bzip2 Data Compression.
Proceedings of the 27th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2019

Hardware Acceleration of Long Read Pairwise Overlapping in Genome Sequencing: A Race Between FPGA and GPU.
Proceedings of the 27th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2019

Overcoming Data Transfer Bottlenecks in FPGA-based DNN Accelerators via Layer Conscious Memory Management.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

Understanding Performance Gains of Accelerator-Rich Architectures.
Proceedings of the 30th IEEE International Conference on Application-specific Systems, Architectures and Processors, 2019

2018
Smartphone-Based Indoor Map Construction - Principles and Applications
Springer Briefs in Computer Science, Springer, ISBN: 978-981-10-8377-8, 2018

CPU-FPGA Coscheduling for Big Data Applications.
IEEE Des. Test, 2018

AutoAccel: Automated Accelerator Generation and Optimization with Composable, Parallel and Pipeline Architecture.
CoRR, 2018

Best-Effort FPGA Programming: A Few Steps Can Go a Long Way.
CoRR, 2018

Computed Tomography Image Enhancement Using 3D Convolutional Neural Network.
Proceedings of the Deep Learning in Medical Image Analysis - and - Multimodal Learning for Clinical Decision Support, 2018

A 20Gb/s 79.5mW 127GHz CMOS transceiver with digitally pre-distorted PAM-4 modulation for contactless communications.
Proceedings of the 2018 IEEE International Solid-State Circuits Conference, 2018

Doppio: I/O-Aware Performance Analysis, Modeling and Optimization for In-memory Computing Framework.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2018

CLINK: Compact LSTM Inference Kernel for Energy Efficient Neurofeedback Devices.
Proceedings of the International Symposium on Low Power Electronics and Design, 2018

TGPA: tile-grained pipeline architecture for low latency CNN inference.
Proceedings of the International Conference on Computer-Aided Design, 2018

PolySA: polyhedral-based systolic array auto-compilation.
Proceedings of the International Conference on Computer-Aided Design, 2018

HLS-based optimization and design space exploration for applications with variable loop bounds.
Proceedings of the International Conference on Computer-Aided Design, 2018

SODA: stencil with optimized dataflow architecture.
Proceedings of the International Conference on Computer-Aided Design, 2018

RC-NVM: Enabling Symmetric Row and Column Memory Accesses for In-memory Databases.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

From JVM to FPGA: Bridging Abstraction Hierarchy via Optimized Deep Pipelining.
Proceedings of the 10th USENIX Workshop on Hot Topics in Cloud Computing, 2018

SMEM++: A Pipelined and Time-Multiplexed SMEM Seeding Accelerator for Genome Sequencing.
Proceedings of the 28th International Conference on Field Programmable Logic and Applications, 2018

High-Throughput Lossless Compression on Tightly Coupled CPU-FPGA Platforms: (Abstract Only).
Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2018

Understanding Performance Differences of FPGAs and GPUs: (Abstract Only).
Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2018

K-Flow: A Programming and Scheduling Framework to Optimize Dataflow Execution on CPU-FPGA Platforms: (Abstract Only).
Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2018

An Optimal Microarchitecture for Stencil Computation with Data Reuse and Fine-Grained Parallelism: (Abstract Only).
Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2018

FPGA-based LSTM Acceleration for Real-Time EEG Signal Processing: (Abstract Only).
Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2018

ST-Accel: A High-Level Programming Platform for Streaming Applications on FPGA.
Proceedings of the 26th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2018

High-Throughput Lossless Compression on Tightly Coupled CPU-FPGA Platforms.
Proceedings of the 26th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2018

Latte: Locality Aware Transformation for High-Level Synthesis.
Proceedings of the 26th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2018

Automatic Interior I/O Elimination in Systolic Array Architecture.
Proceedings of the 26th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2018

SMEM++: A Pipelined and Time-Multiplexed SMEM Seeding Accelerator for DNA Sequencing.
Proceedings of the 26th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2018

Understanding Performance Differences of FPGAs and GPUs.
Proceedings of the 26th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2018

S2FA: an accelerator automation framework for heterogeneous computing in datacenters.
Proceedings of the 55th Annual Design Automation Conference, 2018

Automated accelerator generation and optimization with composable, parallel and pipeline architecture.
Proceedings of the 55th Annual Design Automation Conference, 2018

Large-Scale Global Placement.
Proceedings of the Handbook of Approximation Algorithms and Metaheuristics, 2018

2017
AIM: accelerating computational genomics through scalable and noninvasive accelerator-interposed memory.
Proceedings of the International Symposium on Memory Systems, 2017

Communication Optimization on GPU: A Case Study of Sequence Alignment Algorithms.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Characterization and acceleration for genomic sequencing and analysis.
Proceedings of the 2017 IEEE International Symposium on Workload Characterization, 2017

HLScope+: Fast and accurate performance estimation for FPGA HLS.
Proceedings of the 2017 IEEE/ACM International Conference on Computer-Aided Design, 2017

A cache-based bandwidth optimized motion compensation architecture for video decoder.
Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

Supporting Address Translation for Accelerator-Centric Architectures.
Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

CPU-FPGA Co-Optimization for Big Data Applications: A Case Study of In-Memory Samtool Sorting (Abstract Only).
Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017

FP-DNN: An Automated Framework for Mapping Deep Neural Networks onto FPGAs with RTL-HLS Hybrid Templates.
Proceedings of the 25th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2017

HLScope: High-Level Performance Debugging for FPGA Designs.
Proceedings of the 25th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2017

Automated Systolic Array Architecture Synthesis for High Throughput CNN Inference on FPGAs.
Proceedings of the 54th Annual Design Automation Conference, 2017

Bandwidth Optimization Through On-Chip Memory Restructuring for HLS.
Proceedings of the 54th Annual Design Automation Conference, 2017

Throughput optimization for streaming applications on CPU-FPGA heterogeneous systems.
Proceedings of the 22nd Asia and South Pacific Design Automation Conference, 2017

FPGA-based accelerator for long short-term memory recurrent neural networks.
Proceedings of the 22nd Asia and South Pacific Design Automation Conference, 2017

Using Data Compression for Optimizing FPGA-Based Convolutional Neural Network Accelerators.
Proceedings of the Advanced Parallel Processing Technologies, 2017

2016
FPGA Technology Mapping.
Encyclopedia of Algorithms, 2016

An Optimal Microarchitecture for Stencil Computation Acceleration Based on Nonuniform Partitioning of Data Reuse Buffers.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2016

FCUDA-HB: Hierarchical and Scalable Bus Architecture Generation on FPGAs With the FCUDA Flow.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2016

A Distributed Clustered Architecture to Tackle Delay Variations in Datapath Synthesis.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2016

State of the Journal.
IEEE Trans. Computers, 2016

Acceleration of EM-Based 3D CT Reconstruction Using FPGA.
IEEE Trans. Biomed. Circuits Syst., 2016

Platform choices and design demands for IoT platforms: cost, power, and performance tradeoffs.
IET Cyber-Phys. Syst.: Theory & Appl., 2016

Revisiting FPGA Acceleration of Molecular Dynamics Simulation with Dynamic Data Flow Behavior in High-Level Synthesis.
CoRR, 2016

ARAPrototyper: Enabling Rapid Prototyping and Evaluation for Accelerator-Rich Architectures.
CoRR, 2016

Scaling Up Physical Design: Challenges and Opportunities.
Proceedings of the 2016 on International Symposium on Physical Design, 2016

Energy-Efficient CNN Implementation on a Deeply Pipelined FPGA Cluster.
Proceedings of the 2016 International Symposium on Low Power Electronics and Design, 2016

Software Infrastructure for Enabling FPGA-Based Accelerations in Data Centers: Invited Paper.
Proceedings of the 2016 International Symposium on Low Power Electronics and Design, 2016

Caffeine: towards uniformed representation and acceleration for deep convolutional neural networks.
Proceedings of the 35th International Conference on Computer-Aided Design, 2016

A High-throughput Architecture for Lossless Decompression on FPGA Designed Using HLS (Abstract Only).
Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2016

ARAPrototyper: Enabling Rapid Prototyping and Evaluation for Accelerator-Rich Architecture (Abstract Only).
Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2016

Energy Efficiency of Full Pipelining: A Case Study for Matrix Multiplication.
Proceedings of the 24th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2016

When Spark Meets FPGAs: A Case Study for Next-Generation DNA Sequencing Acceleration.
Proceedings of the 24th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2016

The SMEM Seeding Acceleration for DNA Sequence Alignment.
Proceedings of the 24th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2016

Invited - Heterogeneous datacenters: options and opportunities.
Proceedings of the 53rd Annual Design Automation Conference, 2016

A quantitative analysis on microarchitectures of modern CPU-FPGA platforms.
Proceedings of the 53rd Annual Design Automation Conference, 2016

Programming and Runtime Support to Blaze FPGA Accelerator Deployment at Datacenter Scale.
Proceedings of the Seventh ACM Symposium on Cloud Computing, 2016

A scalable communication-aware compilation flow for programmable accelerators.
Proceedings of the 21st Asia and South Pacific Design Automation Conference, 2016

Source-to-Source Optimization for HLS.
Proceedings of the FPGAs for Software Programmers, 2016

2015
Customizable Computing
Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, ISBN: 978-3-031-01748-3, 2015

An automated lung segmentation approach using bidirectional chain codes to improve nodule detection accuracy.
Comput. Biol. Medicine, 2015

"High-level synthesis and beyond - From datacenters to IoTs".
Proceedings of the 28th IEEE International System-on-Chip Conference, 2015

Atlas: Baidu's key-value storage system for cloud data.
Proceedings of the IEEE 31st Symposium on Mass Storage Systems and Technologies, 2015

ARACompiler: a prototyping flow and evaluation framework for accelerator-rich architectures.
Proceedings of the 2015 IEEE International Symposium on Performance Analysis of Systems and Software, 2015

Interconnect synthesis of heterogeneous accelerators in a shared memory architecture.
Proceedings of the IEEE/ACM International Symposium on Low Power Electronics and Design, 2015

High efficiency VLSI implementation of an edge-directed video up-scaler using high level synthesis.
Proceedings of the IEEE International Conference on Consumer Electronics, 2015

Impact of Loop Transformations on Software Reliability.
Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 2015

PARADE: A Cycle-Accurate Full-System Simulation Platform for Accelerator-Rich Architectural Design and Exploration.
Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 2015

FPGA Acceleration for Simultaneous Image Reconstruction and Segmentation based on the Mumford-Shah Regularization (Abstract Only).
Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2015

Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks.
Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2015

Customizable and High Performance Matrix Multiplication Kernel on FPGA (Abstract Only).
Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2015

Growing a Healthy FPGA Ecosystem.
Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2015

Resource-Aware Throughput Optimization for High-Level Synthesis.
Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2015

A Novel High-Throughput Acceleration Engine for Read Alignment.
Proceedings of the 23rd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2015

CMOST: a system-level FPGA compilation framework.
Proceedings of the 52nd Annual Design Automation Conference, 2015

On-chip interconnection network for accelerator-rich architectures.
Proceedings of the 52nd Annual Design Automation Conference, 2015

InterFS: An Interplanted Distributed File System to Improve Storage Utilization.
Proceedings of the 6th Asia-Pacific Workshop on Systems, 2015

2014
FPGA-RPI: A Novel FPGA Architecture With RRAM-Based Programmable Interconnects.
IEEE Trans. Very Large Scale Integr. Syst., 2014

System Light-Loading Technology for mHealth: Manifold-Learning-Based Medical Data Cleansing and Clinical Trials in WE-CARE Project.
IEEE J. Biomed. Health Informatics, 2014

Architecture Support for Domain-Specific Accelerator-Rich CMPs.
ACM Trans. Embed. Comput. Syst., 2014

GRT: A Reconfigurable SDR Platform with High Performance and Usability.
SIGARCH Comput. Archit. News, 2014

Better-Than-Worst-Case Design: Progress and Opportunities.
J. Comput. Sci. Technol., 2014

From design to design automation.
Proceedings of the International Symposium on Physical Design, 2014

Accelerator-rich architectures: from single-chip to datacenters.
Proceedings of the International Symposium on Low Power Electronics and Design, 2014

Minimizing Computation in Convolutional Neural Networks.
Proceedings of the Artificial Neural Networks and Machine Learning - ICANN 2014, 2014

Automating customized computing.
Proceedings of the 2014 International Conference on Field-Programmable Technology, 2014

A scalable, high-performance customized priority queue.
Proceedings of the 24th International Conference on Field Programmable Logic and Applications, 2014

An efficient and flexible host-FPGA PCIe communication library.
Proceedings of the 24th International Conference on Field Programmable Logic and Applications, 2014

Theory and algorithm for generalized memory partitioning in high-level synthesis.
Proceedings of the 2014 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2014

Transformations for throughput optimization in high-level synthesis (abstract only).
Proceedings of the 2014 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2014

EPEE: an efficient PCIe communication library with easy-host-integration property for FPGA accelerators (abstract only).
Proceedings of the 2014 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2014

Combining computation and communication optimizations in system synthesis for streaming applications.
Proceedings of the 2014 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2014

FPGA Acceleration for Simultaneous Medical Image Reconstruction and Segmentation.
Proceedings of the 22nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2014

A Fully Pipelined and Dynamically Composable Architecture of CGRA.
Proceedings of the 22nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2014

FPGA Implementation of EM Algorithm for 3D CT Reconstruction.
Proceedings of the 22nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2014

An efficient design and implementation of LSM-tree based key-value store on open-channel SSD.
Proceedings of the Ninth Eurosys Conference 2014, 2014

An Optimal Microarchitecture for Stencil Computation Acceleration Based on Non-Uniform Partitioning of Data Reuse Buffers.
Proceedings of the 51st Annual Design Automation Conference 2014, 2014

Accelerator-Rich Architectures: Opportunities and Progresses.
Proceedings of the 51st Annual Design Automation Conference 2014, 2014

2013
The survivability of design-specific spare placement in FPGA architectures with high defect rates.
ACM Trans. Design Autom. Electr. Syst., 2013

Efficient compilation of CUDA kernels for high-performance computing on FPGAs.
ACM Trans. Embed. Comput. Syst., 2013

An Analytical Placement Framework for 3-D ICs and Its Extension on Thermal Awareness.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2013

Stream arbitration: Towards efficient bandwidth utilization for emerging on-chip interconnects.
ACM Trans. Archit. Code Optim., 2013

Composable accelerator-rich microprocessor enhanced for adaptivity and longevity.
Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 2013

Energy-efficient computing using adaptive table lookup based on nonvolatile memories.
Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 2013

Designing scratchpad memory architecture with emerging STT-RAM memory technologies.
Proceedings of the 2013 IEEE International Symposium on Circuits and Systems (ISCAS2013), 2013

Accelerator-rich CMPs: From concept to real hardware.
Proceedings of the 2013 IEEE 31st International Conference on Computer Design, 2013

Optimization of interconnects between accelerators and shared memories in dark silicon.
Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 2013

FPGA simulation engine for customized construction of neural microcircuits.
Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 2013

Improving high level synthesis optimization opportunity through polyhedral transformations.
Proceedings of the 2013 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2013

Automatic multidimensional memory partitioning for FPGA-based accelerators (abstract only).
Proceedings of the 2013 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2013

Polyhedral-based data reuse optimization for configurable computing.
Proceedings of the 2013 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2013

Defect recovery in nanodevice-based programmable interconnects (abstract only).
Proceedings of the 2013 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2013

Efficient system-level mapping from streaming applications to FPGAs (abstract only).
Proceedings of the 2013 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2013

Architecture support for custom instructions with memory operations.
Proceedings of the 2013 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2013

Are FPGAs suffering from the innovator's dilemma?
Proceedings of the 2013 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2013

FPGA Simulation Engine for Customized Construction of Neural Microcircuit.
Proceedings of the 21st IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2013

Memory partitioning for multidimensional arrays in high-level synthesis.
Proceedings of the 50th Annual Design Automation Conference 2013, 2013

Throughput-oriented kernel porting onto FPGAs.
Proceedings of the 50th Annual Design Automation Conference 2013, 2013

Defect tolerance in nanodevice-based programmable interconnects: utilization beyond avoidance.
Proceedings of the 50th Annual Design Automation Conference 2013, 2013

Improving polyhedral code generation for high-level synthesis.
Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis, 2013

Optimizing routability in large-scale mixed-size placement.
Proceedings of the 18th Asia and South Pacific Design Automation Conference, 2013

2012
Utilizing RF-I and intelligent scheduling for better throughput/watt in a mobile GPU memory system.
ACM Trans. Archit. Code Optim., 2012

Task-Level Data Model for Hardware Synthesis Based on Concurrent Collections.
J. Electr. Comput. Eng., 2012

Utilizing Radio-Frequency Interconnect for a Many-DIMM DRAM System.
IEEE J. Emerg. Sel. Topics Circuits Syst., 2012

Analysis of Noncoherent ASK Modulation-Based RF-Interconnect for Memory Interface.
IEEE J. Emerg. Sel. Topics Circuits Syst., 2012

A Hybrid Architecture for Compressive Sensing 3-D CT Reconstruction.
IEEE J. Emerg. Sel. Topics Circuits Syst., 2012

Mapping a data-flow programming model onto heterogeneous platforms.
Proceedings of the SIGPLAN/SIGBED Conference on Languages, 2012

A Study on the Impact of Compiler Optimizations on High-Level Synthesis.
Proceedings of the Languages and Compilers for Parallel Computing, 2012

An 8Gb/s/pin 4pJ/b/pin Single-T-Line dual (base+RF) band simultaneous bidirectional mobile memory I/O interface with inter-channel interference suppression.
Proceedings of the 2012 IEEE International Solid-State Circuits Conference, 2012

Leakage-aware performance-driven TSV-planning based on network flow algorithm in 3D ICs.
Proceedings of the Thirteenth International Symposium on Quality Electronic Design, 2012

Towards layout-friendly high-level synthesis.
Proceedings of the International Symposium on Physical Design, 2012

Transformation from ad hoc EDA to algorithmic EDA.
Proceedings of the International Symposium on Physical Design, 2012

Energy-efficient scheduling on heterogeneous multi-core architectures.
Proceedings of the International Symposium on Low Power Electronics and Design, 2012

BiN: a buffer-in-NUCA scheme for accelerator-rich CMPs.
Proceedings of the International Symposium on Low Power Electronics and Design, 2012

CHARM: a composable heterogeneous accelerator-rich microprocessor.
Proceedings of the International Symposium on Low Power Electronics and Design, 2012

Static and dynamic co-optimizations for blocks mapping in hybrid caches.
Proceedings of the International Symposium on Low Power Electronics and Design, 2012

Memory partitioning and scheduling co-optimization in behavioral synthesis.
Proceedings of the 2012 IEEE/ACM International Conference on Computer-Aided Design, 2012

FPGA-RR: an enhanced FPGA architecture with RRAM-based reconfigurable interconnects (abstract only).
Proceedings of the ACM/SIGDA 20th International Symposium on Field Programmable Gate Arrays, 2012

FPGA-accelerated 3D reconstruction using compressive sensing.
Proceedings of the ACM/SIGDA 20th International Symposium on Field Programmable Gate Arrays, 2012

Combining module selection and replication for throughput-driven streaming programs.
Proceedings of the 2012 Design, Automation & Test in Europe Conference & Exhibition, 2012

Dynamically reconfigurable hybrid cache: An energy-efficient last-level cache design.
Proceedings of the 2012 Design, Automation & Test in Europe Conference & Exhibition, 2012

Optimizing memory hierarchy allocation with loop transformations for high-level synthesis.
Proceedings of the 49th Annual Design Automation Conference 2012, 2012

A metric for layout-friendly microarchitecture optimization in high-level synthesis.
Proceedings of the 49th Annual Design Automation Conference 2012, 2012

Architecture support for accelerator-rich CMPs.
Proceedings of the 49th Annual Design Automation Conference 2012, 2012

A 60GHz on-chip RF-Interconnect with λ/4 coupler for 5Gbps bi-directional communication and multi-drop arbitration.
Proceedings of the IEEE 2012 Custom Integrated Circuits Conference, 2012

An integrated and automated memory optimization flow for FPGA behavioral synthesis.
Proceedings of the 17th Asia and South Pacific Design Automation Conference, 2012

Compilation and architecture support for customized vector instruction extension.
Proceedings of the 17th Asia and South Pacific Design Automation Conference, 2012

Platform characterization for Domain-Specific Computing.
Proceedings of the 17th Asia and South Pacific Design Automation Conference, 2012

2011
An 8M Polygons/s 3-D Graphics SoC With Full Hardware Geometric and Rendering Engine for Mobile Applications.
IEEE Trans. Very Large Scale Integr. Syst., 2011

Automatic memory partitioning and scheduling for throughput and power optimization.
ACM Trans. Design Autom. Electr. Syst., 2011

High-Level Synthesis for FPGAs: From Prototyping to Deployment.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2011

Pattern-Mining for Behavioral Synthesis.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2011

Overview of Center for Domain-Specific Computing.
J. Comput. Sci. Technol., 2011

Leakage-Aware TSV-Planning with Power-Temperature-Delay Dependence in 3D ICs.
IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2011

Customizable Domain-Specific Computing.
IEEE Des. Test Comput., 2011

3D recursive Gaussian IIR on GPU and FPGAs - A case for accelerating bandwidth-bounded applications.
Proceedings of the IEEE 9th Symposium on Application Specific Processors, 2011

mrFPGA: A novel FPGA architecture with memristor-based reconfiguration.
Proceedings of the 2011 IEEE/ACM International Symposium on Nanoscale Architectures, 2011

EM+TV Based Reconstruction for Cone-Beam CT with Reduced Radiation.
Proceedings of the Advances in Visual Computing - 7th International Symposium, 2011

An 8.4Gb/s 2.5pJ/b mobile memory I/O interface using simultaneous bidirectional Dual (Base+RF) band signaling.
Proceedings of the IEEE International Solid-State Circuits Conference, 2011

An energy-efficient adaptive hybrid cache.
Proceedings of the 2011 International Symposium on Low Power Electronics and Design, 2011

A unified optimization framework for simultaneous gate sizing and placement under density constraints.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2011), 2011

The DIMM tree architecture: A high bandwidth and scalable memory system.
Proceedings of the IEEE 29th International Conference on Computer Design, 2011

Combined loop transformation and hierarchy allocation for data reuse optimization.
Proceedings of the 2011 IEEE/ACM International Conference on Computer-Aided Design, 2011

ATree-based topology synthesis for on-chip network.
Proceedings of the 2011 IEEE/ACM International Conference on Computer-Aided Design, 2011

Assuring application-level correctness against soft errors.
Proceedings of the 2011 IEEE/ACM International Conference on Computer-Aided Design, 2011

Accelerating Fluid Registration Algorithm on Multi-FPGA Platforms.
Proceedings of the International Conference on Field Programmable Logic and Applications, 2011

Resolving implicit barrier synchronizations in FPGA HLS (abstract only).
Proceedings of the ACM/SIGDA 19th International Symposium on Field Programmable Gate Arrays, 2011

Multilevel Granularity Parallelism Synthesis on FPGAs.
Proceedings of the IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines, 2011

Thermal-aware cell and through-silicon-via co-placement for 3D ICs.
Proceedings of the 48th Design Automation Conference, 2011

A reuse-aware prefetching scheme for scratchpad memory.
Proceedings of the 48th Design Automation Conference, 2011

HC-Sim: a fast and exact L1 cache simulator with scratchpad memory co-simulation support.
Proceedings of the 9th International Conference on Hardware/Software Codesign and System Synthesis, 2011

Rethinking thermal via planning with timing-power-temperature dependence for 3D ICs.
Proceedings of the 16th Asia South Pacific Design Automation Conference, 2011

Accelerating vision and navigation applications on a customizable platform.
Proceedings of the 22nd IEEE International Conference on Application-specific Systems, 2011

Domain-specific processor with 3D integration for medical image processing.
Proceedings of the 22nd IEEE International Conference on Application-specific Systems, 2011

Era of customization and specialization.
Proceedings of the 22nd IEEE International Conference on Application-specific Systems, 2011

RF-Interconnect for Future Network-On-Chip.
Proceedings of the Low Power Networks-on-Chip, 2011

2010
LOPASS: A Low-Power Architectural Synthesis System for FPGAs With Interconnect Estimation and Optimization.
IEEE Trans. Very Large Scale Integr. Syst., 2010

Behavior-Level Observability Analysis for Operation Gating in Low-Power Behavioral Synthesis.
ACM Trans. Design Autom. Electr. Syst., 2010

Evaluating Statistical Power Optimization.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2010

Technology Mapping and Clustering for FPGA Architectures With Dual Supply Voltages.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2010

Advances and Challenges in 3D Physical Design.
IPSJ Trans. Syst. LSI Des. Methodol., 2010

NSF Workshop on EDA: Past, Present, and Future (Part 2).
IEEE Des. Test Comput., 2010

NSF Workshop on EDA: Past, Present, and Future (Part 1).
IEEE Des. Test Comput., 2010

An analytical placer for mixed-size 3D placement.
Proceedings of the 2010 International Symposium on Physical Design, 2010

Bit-level optimization for high-level synthesis and FPGA-based acceleration.
Proceedings of the ACM/SIGDA 18th International Symposium on Field Programmable Gate Arrays, 2010

LUT-based FPGA technology mapping for reliability (abstract only).
Proceedings of the ACM/SIGDA 18th International Symposium on Field Programmable Gate Arrays, 2010

Accelerating Monte Carlo based SSTA using FPGA.
Proceedings of the ACM/SIGDA 18th International Symposium on Field Programmable Gate Arrays, 2010

A Comparative Study on the Architecture Templates for Dynamic Nested Loops.
Proceedings of the 18th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2010

Coordinated resource optimization in behavioral synthesis.
Proceedings of the Design, Automation and Test in Europe, 2010

A generalized control-flow-aware pattern recognition algorithm for behavioral synthesis.
Proceedings of the Design, Automation and Test in Europe, 2010

LUT-based FPGA technology mapping for reliability.
Proceedings of the 47th Design Automation Conference, 2010

ACES: application-specific cycle elimination and splitting for deadlock-free routing on irregular network-on-chip.
Proceedings of the 47th Design Automation Conference, 2010

Logic-on-logic 3D integration and placement.
Proceedings of the IEEE International Conference on 3D System Integration, 2010

2009
Synthesis Algorithm for Application-Specific Homogeneous Processor Networks.
IEEE Trans. Very Large Scale Integr. Syst., 2009

FPGA-Based Hardware Acceleration of Lithographic Aerial Image Simulation.
ACM Trans. Reconfigurable Technol. Syst., 2009

Simultaneous resource binding and interconnection optimization based on a distributed register-file microarchitecture.
ACM Trans. Design Autom. Electr. Syst., 2009

The Last Byte: The HLS tipping point.
IEEE Des. Test Comput., 2009

Multiband RF-interconnect for reconfigurable network-on-chip communications.
Proceedings of the 11th International Workshop on System-Level Interconnect Prediction (SLIP 2009), 2009

FCUDA: Enabling efficient compilation of CUDA kernels onto FPGAs.
Proceedings of the IEEE 7th Symposium on Application Specific Processors, 2009

A scalable micro wireless interconnect structure for CMPs.
Proceedings of the 15th Annual International Conference on Mobile Computing and Networking, 2009

Behavior-level observability don't-cares and application to low-power behavioral synthesis.
Proceedings of the 2009 International Symposium on Low Power Electronics and Design, 2009

High-performance CUDA kernel execution on FPGAs.
Proceedings of the 23rd international conference on Supercomputing, 2009

Parallel multi-level analytical global placement on graphics processing units.
Proceedings of the 2009 International Conference on Computer-Aided Design, 2009

Scheduling with soft constraints.
Proceedings of the 2009 International Conference on Computer-Aided Design, 2009

A rigorous framework for convergent net weighting schemes in timing-driven placement.
Proceedings of the 2009 International Conference on Computer-Aided Design, 2009

Customizable domain-specific computing.
Proceedings of the 19th International Conference on Field Programmable Logic and Applications, 2009

Revisiting bitwidth optimizations.
Proceedings of the ACM/SIGDA 17th International Symposium on Field Programmable Gate Arrays, 2009

Synthesis of reconfigurable high-performance multicore systems.
Proceedings of the ACM/SIGDA 17th International Symposium on Field Programmable Gate Arrays, 2009

Evaluation of Static Analysis Techniques for Fixed-Point Precision Optimization.
Proceedings of the FCCM 2009, 2009

Energy efficient multiprocessor task scheduling under input-dependent variation.
Proceedings of the Design, Automation and Test in Europe, 2009

From milliwatts to megawatts: system level power challenge.
Proceedings of the 46th Design Automation Conference, 2009

Moore's Law: another casualty of the financial meltdown?
Proceedings of the 46th Design Automation Conference, 2009

A variation-tolerant scheduler for better than worst-case behavioral synthesis.
Proceedings of the 7th International Conference on Hardware/Software Codesign and System Synthesis, 2009

A multilevel analytical placement for 3D ICs.
Proceedings of the 14th Asia South Pacific Design Automation Conference, 2009

On the futility of statistical power optimization.
Proceedings of the 14th Asia South Pacific Design Automation Conference, 2009

2008
Enhancing Placement with Multilevel Techniques.
Proceedings of the Handbook of Algorithms for Physical Design Automation, 2008

FPGA Technology Mapping.
Proceedings of the Encyclopedia of Algorithms - 2008 Edition, 2008

A Robust Mixed-Size Legalization and Detailed Placement Algorithm.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2008

Highly Efficient Gradient Computation for Density-Constrained Analytical Placement.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2008

Editorial: Special issue on 3D integrated circuits and microarchitectures.
ACM J. Emerg. Technol. Comput. Syst., 2008

Investigating the effects of fine-grain three-dimensional integration on microarchitecture design.
ACM J. Emerg. Technol. Comput. Syst., 2008

A new generation of C-based synthesis tool and domain-specific computing.
Proceedings of the 21st Annual IEEE International SoC Conference, SoCC 2008, 2008

Power reduction of CMP communication networks via RF-interconnects.
Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-41 2008), 2008

Robust gate sizing via mean excess delay minimization.
Proceedings of the 2008 International Symposium on Physical Design, 2008

Highly efficient gradient computation for density-constrained analytical placement methods.
Proceedings of the 2008 International Symposium on Physical Design, 2008

RF interconnects for communications on-chip.
Proceedings of the 2008 International Symposium on Physical Design, 2008

MC-Sim: an efficient simulation tool for MPSoC designs.
Proceedings of the 2008 International Conference on Computer-Aided Design, 2008

Fault tolerant placement and defect reconfiguration for nano-FPGAs.
Proceedings of the 2008 International Conference on Computer-Aided Design, 2008

CMP network-on-chip overlaid with multi-band RF-interconnect.
Proceedings of the 14th International Conference on High-Performance Computer Architecture (HPCA-14 2008), 2008

Mapping for better than worst-case delays in LUT-based FPGA designs.
Proceedings of the ACM/SIGDA 16th International Symposium on Field Programmable Gate Arrays, 2008

Lithographic aerial image simulation with FPGA-based hardware acceleration.
Proceedings of the ACM/SIGDA 16th International Symposium on Field Programmable Gate Arrays, 2008

Pattern-based behavior synthesis for FPGA resource reduction.
Proceedings of the ACM/SIGDA 16th International Symposium on Field Programmable Gate Arrays, 2008

Simultaneous FU and Register Binding Based on Network Flow Method.
Proceedings of the Design, Automation and Test in Europe, 2008

LP based white space redistribution for thermal via planning and performance optimization in 3D ICs.
Proceedings of the 13th Asia South Pacific Design Automation Conference, 2008

Scheduling with integer time budgeting for low-power optimization.
Proceedings of the 13th Asia South Pacific Design Automation Conference, 2008

Behavioral synthesis with activating unused flip-flops for reducing glitch power in FPGA.
Proceedings of the 13th Asia South Pacific Design Automation Conference, 2008

2007
Large-Scale Global Placement.
Proceedings of the Handbook of Approximation Algorithms and Metaheuristics, 2007

Accelerating Sequential Applications on CMPs Using Core Spilling.
IEEE Trans. Parallel Distributed Syst., 2007

Routability-Driven Placement and White Space Allocation.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2007

Optimality Study of Logic Synthesis for LUT-Based FPGAs.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2007

Fine grain 3D integration for microarchitecture design through cube packing exploration.
Proceedings of the 25th International Conference on Computer Design, 2007

Improved SAT-based Boolean matching using implicants for LUT-based FPGAs.
Proceedings of the ACM/SIGDA 15th International Symposium on Field Programmable Gate Arrays, 2007

Synthesis of an application-specific soft multiprocessor system.
Proceedings of the ACM/SIGDA 15th International Symposium on Field Programmable Gate Arrays, 2007

Micro-architecture Pipelining Optimization with Throughput-Aware Floorplanning.
Proceedings of the 12th Conference on Asia South Pacific Design Automation, 2007

Thermal-Aware 3D IC Placement Via Transformation.
Proceedings of the 12th Conference on Asia South Pacific Design Automation, 2007

High-Level Power Estimation and Low-Power Design Space Exploration for FPGAs.
Proceedings of the 12th Conference on Asia South Pacific Design Automation, 2007

Locality and Utilization in Placement Suboptimality.
Proceedings of the Modern Circuit Placement, Best Practices and Results, 2007

2006
Architecture and Compiler Optimizations for Data Bandwidth Improvement in Configurable Processors.
IEEE Trans. Very Large Scale Integr. Syst., 2006

Optimal simultaneous module and multivoltage assignment for low power.
ACM Trans. Design Autom. Electr. Syst., 2006

Simultaneous placement with clustering and duplication.
ACM Trans. Design Autom. Electr. Syst., 2006

Protecting Combinational Logic Synthesis Solutions.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2006

Fast floorplanning by look-ahead enabled recursive bipartitioning.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2006

FPGA Design Automation: A Survey.
Found. Trends Electron. Des. Autom., 2006

Platform-Based Behavior-Level and System-Level Synthesis.
Proceedings of the 2006 IEEE International SOC Conference, Austin, Texas, USA, 2006

mPL6: enhanced multilevel mixed-size placement.
Proceedings of the 2006 International Symposium on Physical Design, 2006

Platform-based resource binding using a distributed register-file microarchitecture.
Proceedings of the 2006 International Conference on Computer-Aided Design, 2006

Optimal simultaneous mapping and clustering for FPGA delay optimization.
Proceedings of the 43rd Design Automation Conference, 2006

An efficient and versatile scheduling algorithm based on SDC formulation.
Proceedings of the 43rd Design Automation Conference, 2006

Behavior and communication co-optimization for systems with sequential communication media.
Proceedings of the 43rd Design Automation Conference, 2006

Optimality study of resource binding with multi-Vdds.
Proceedings of the 43rd Design Automation Conference, 2006

A robust detailed placement for mixed-size IC designs.
Proceedings of the 2006 Conference on Asia South Pacific Design Automation: ASP-DAC 2006, 2006

An automated design flow for 3D microarchitecture evaluation.
Proceedings of the 2006 Conference on Asia South Pacific Design Automation: ASP-DAC 2006, 2006

2005
Large-scale circuit placement.
ACM Trans. Design Autom. Electr. Syst., 2005

Technology mapping and architecture evaluation for k/m-macrocell-based FPGAs.
ACM Trans. Design Autom. Electr. Syst., 2005

Power modeling and characteristics of field programmable gate arrays.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2005

MARS: a multilevel full-chip gridless routing system.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2005

Multilevel generalized force-directed method for circuit placement.
Proceedings of the 2005 International Symposium on Physical Design, 2005

mPL6: a robust multilevel mixed-size placement engine.
Proceedings of the 2005 International Symposium on Physical Design, 2005

Understanding the energy efficiency of SMT and CMP with multiclustering.
Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005

Lower-bound estimation for multi-bitwidth scheduling.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2005), 2005

Thermal via planning for 3-D ICs.
Proceedings of the 2005 International Conference on Computer-Aided Design, 2005

Robust mixed-size placement under tight white-space constraints.
Proceedings of the 2005 International Conference on Computer-Aided Design, 2005

Architecture and compilation for data bandwidth improvement in configurable embedded processors.
Proceedings of the 2005 International Conference on Computer-Aided Design, 2005

Instruction set extension with shadow registers for configurable processors.
Proceedings of the ACM/SIGDA 13th International Symposium on Field Programmable Gate Arrays, 2005

Simultaneous timing-driven placement and duplication.
Proceedings of the ACM/SIGDA 13th International Symposium on Field Programmable Gate Arrays, 2005

Microarchitecture evaluation with floorplanning and interconnect pipelining.
Proceedings of the 2005 Conference on Asia South Pacific Design Automation, 2005

Thermal-driven multilevel routing for 3-D ICs.
Proceedings of the 2005 Conference on Asia South Pacific Design Automation, 2005

Are we ready for system-level synthesis?
Proceedings of the 2005 Conference on Asia South Pacific Design Automation, 2005

Bitwidth-aware scheduling and binding in high-level synthesis.
Proceedings of the 2005 Conference on Asia South Pacific Design Automation, 2005

Optimal module and voltage assignment for low-power.
Proceedings of the 2005 Conference on Asia South Pacific Design Automation, 2005

2004
Retiming-based timing analysis with an application to mincut-based global placement.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2004

Edge separability-based circuit clustering with application to multilevel circuit partitioning.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2004

Architecture and synthesis for on-chip multicycle communication.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2004

Optimality and scalability study of existing placement algorithms.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2004

An area-optimality study of floorplanning.
Proceedings of the 2004 International Symposium on Physical Design, 2004

Delay optimal low-power circuit clustering for FPGAs with dual supply voltages.
Proceedings of the 2004 International Symposium on Low Power Electronics and Design, 2004

A thermal-driven floorplanning algorithm for 3D ICs.
Proceedings of the 2004 International Conference on Computer-Aided Design, 2004

DAOmap: a depth-optimal area optimization mapping algorithm for FPGA designs.
Proceedings of the 2004 International Conference on Computer-Aided Design, 2004

Simultaneous Timing Driven Clustering and Placement for FPGAs.
Proceedings of the Field Programmable Logic and Application, 2004

Low-power FPGA using pre-defined dual-Vdd/dual-Vt fabrics.
Proceedings of the ACM/SIGDA 12th International Symposium on Field Programmable Gate Arrays, 2004

Application-specific instruction generation for configurable processor architectures.
Proceedings of the ACM/SIGDA 12th International Symposium on Field Programmable Gate Arrays, 2004

Low-power technology mapping for FPGA architectures with dual supply voltages.
Proceedings of the ACM/SIGDA 12th International Symposium on Field Programmable Gate Arrays, 2004

What happened to ASIC?: Go (recon)figure?
Proceedings of the 41st Design Automation Conference, 2004

Architecture-level synthesis for automatic interconnect pipelining.
Proceedings of the 41st Design Automation Conference, 2004

Register binding and port assignment for multiplexer optimization.
Proceedings of the 2004 Conference on Asia South Pacific Design Automation: Electronic Design and Solution Fair 2004, 2004

2003
Performance-driven mapping for CPLD architectures.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2003

Multilevel global placement with congestion control.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2003

Optimality, scalability and stability study of partitioning and placement algorithms.
Proceedings of the 2003 International Symposium on Physical Design, 2003

Architecture and synthesis for multi-cycle communication.
Proceedings of the 2003 International Symposium on Physical Design, 2003

Low-power high-level synthesis for FPGA architectures.
Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003

Gradual Relaxation Techniques with Applications to Behavioral Synthesis.
Proceedings of the 2003 International Conference on Computer-Aided Design, 2003

Optimality and Stability Study of Timing-Driven Placement Algorithms.
Proceedings of the 2003 International Conference on Computer-Aided Design, 2003

Large-Scale Circuit Placement: Gap and Promise.
Proceedings of the 2003 International Conference on Computer-Aided Design, 2003

Architectural Synthesis Integrated with Global Placement for Multi-Cycle Communication.
Proceedings of the 2003 International Conference on Computer-Aided Design, 2003

An Enhanced Multilevel Algorithm for Circuit Placement.
Proceedings of the 2003 International Conference on Computer-Aided Design, 2003

Architecture evaluation for power-efficient FPGAs.
Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2003

Multilevel global placement with retiming.
Proceedings of the 40th Design Automation Conference, 2003

Microarchitecture evaluation with physical planning.
Proceedings of the 40th Design Automation Conference, 2003

Architecture and synthesis for multi-cycle on-chip communication.
Proceedings of the 1st IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2003

Multi-level placement for large-scale mixed-size IC designs.
Proceedings of the 2003 Asia and South Pacific Design Automation Conference, 2003

Optimality and scalability study of existing placement algorithms.
Proceedings of the 2003 Asia and South Pacific Design Automation Conference, 2003

2002
An interconnect energy model considering coupling effects.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2002

Wire width planning for interconnect performance optimization.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2002

Enhanced SPFD Rewiring on Improving Rewiring Ability.
Proceedings of the 11th IEEE/ACM International Workshop on Logic & Synthesis, 2002

Global clustering-based performance-driven circuit partitioning.
Proceedings of 2002 International Symposium on Physical Design, 2002

Timing closure based on physical hierarchy.
Proceedings of 2002 International Symposium on Physical Design, 2002

Physical hierarchy generation with routing congestion control.
Proceedings of 2002 International Symposium on Physical Design, 2002

An enhanced multilevel routing system.
Proceedings of the 2002 IEEE/ACM International Conference on Computer-aided Design, 2002

A new enhanced SPFD rewiring algorithm.
Proceedings of the 2002 IEEE/ACM International Conference on Computer-aided Design, 2002

SPFD-based global rewiring.
Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2002

2001
Buffer block planning for interconnect planning and prediction.
IEEE Trans. Very Large Scale Integr. Syst., 2001

Interconnect performance estimation models for design planning.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2001

Interconnect layout optimization under higher order RLC model for MCM designs.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2001

Interconnect sizing and spacing with consideration of coupling capacitance.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2001

Boolean matching for LUT-based logic blocks with applications to architecture evaluation and technology mapping.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2001

DUNE-a multilayer gridless routing system.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2001

Pseudopin assignment with crosstalk noise control.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2001

An interconnect-centric design flow for nanometer technologies.
Proc. IEEE, 2001

Multilevel Approach to Full-Chip Gridless Routing.
Proceedings of the 2001 IEEE/ACM International Conference on Computer-Aided Design, 2001

Simultaneous logic decomposition with technology mapping in FPGA designs.
Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2001

Performance-Driven Multi-Level Clustering with Application to Hierarchical FPGA Mapping.
Proceedings of the 38th Design Automation Conference, 2001

Improved crosstalk modeling for noise constrained interconnect optimization.
Proceedings of ASP-DAC 2001, 2001

2000
Structural gate decomposition for depth-optimal technology mapping in LUT-based FPGA designs.
ACM Trans. Design Autom. Electr. Syst., 2000

Performance-driven technology mapping for heterogeneous FPGAs.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2000

Via design rule consideration in multilayer maze routing algorithms.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2000

Incremental physical design.
Proceedings of the 2000 International Symposium on Physical Design, 2000

DUNE: a multi-layer gridless routing system with wire planning.
Proceedings of the 2000 International Symposium on Physical Design, 2000

Pseudo pin assignment with crosstalk noise control.
Proceedings of the 2000 International Symposium on Physical Design, 2000

Incremental CAD.
Proceedings of the 2000 IEEE/ACM International Conference on Computer-Aided Design, 2000

Physical Planning with Retiming.
Proceedings of the 2000 IEEE/ACM International Conference on Computer-Aided Design, 2000

Multilevel Optimization for Large-Scale Circuit Placement.
Proceedings of the 2000 IEEE/ACM International Conference on Computer-Aided Design, 2000

Synthesis for FPGAs with embedded memory blocks.
Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2000

Technology mapping for k/m-macrocell based FPGAs.
Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2000

Routing tree construction under fixed buffer locations.
Proceedings of the 37th Conference on Design Automation, 2000

Performance driven multi-level and multiway partitioning with retiming.
Proceedings of the 37th Conference on Design Automation, 2000

Depth optimal incremental mapping for field programmable gate arrays.
Proceedings of the 37th Conference on Design Automation, 2000

Multi-way partitioning using bi-partition heuristics.
Proceedings of ASP-DAC 2000, 2000

Invited talk: synthesis challenges for next-generation high-performance and high-density PLDs.
Proceedings of ASP-DAC 2000, 2000

Performance driven multiway partitioning.
Proceedings of ASP-DAC 2000, 2000

Edge separability based circuit clustering with application to circuit partitioning.
Proceedings of ASP-DAC 2000, 2000

Dynamic weighting Monte Carlo for constrained floorplan designs in mixed signal application.
Proceedings of ASP-DAC 2000, 2000

1999
Optimal FPGA mapping and retiming with efficient initial state computation.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1999

Theory and algorithm of local-refinement-based optimization with application to device and interconnect sizing.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1999

An efficient approach to multilayer layer assignment with an application to via minimization.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1999

VIA design rule consideration in multi-layer maze routing algorithms.
Proceedings of the 1999 International Symposium on Physical Design, 1999

Buffer block planning for interconnect-driven floorplanning.
Proceedings of the 1999 IEEE/ACM International Conference on Computer-Aided Design, 1999

An implicit connection graph maze routing algorithm for ECO routing.
Proceedings of the 1999 IEEE/ACM International Conference on Computer-Aided Design, 1999

Cut Ranking and Pruning: Enabling a General and Efficient FPGA Mapping Solution.
Proceedings of the 1999 ACM/SIGDA Seventh International Symposium on Field Programmable Gate Arrays, 1999

Interconnect Estimation and Planning for Deep Submicron Designs.
Proceedings of the 36th Conference on Design Automation, 1999

Simultaneous Circuit Partitioning/Clustering with Retiming for Performance Optimization.
Proceedings of the 36th Conference on Design Automation, 1999

Technology Mapping for FPGAs with Nonuniform Pin Delays and Fast Interconnections.
Proceedings of the 36th Conference on Design Automation, 1999

Interconnect Delay Estimation Models for Synthesis and Design Planning.
Proceedings of the 1999 Conference on Asia South Pacific Design Automation, 1999

Relaxed Simulated Tempering for VLSI Floorplan Designs.
Proceedings of the 1999 Conference on Asia South Pacific Design Automation, 1999

1998
Bounded-skew clock and Steiner routing.
ACM Trans. Design Autom. Electr. Syst., 1998

An efficient algorithm for performance-optimal FPGA technology mapping with retiming.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1998

Efficient algorithms for the minimum shortest path Steiner arborescence problem with applications to VLSI physical design.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1998

An efficient technique for device and interconnect optimization in deep submicron designs.
Proceedings of the 1998 International Symposium on Physical Design, 1998

Intellectual property protection by watermarking combinational logic synthesis solutions.
Proceedings of the 1998 IEEE/ACM International Conference on Computer-Aided Design, 1998

Delay-oriented technology mapping for heterogeneous FPGAs with bounded resources.
Proceedings of the 1998 IEEE/ACM International Conference on Computer-Aided Design, 1998

Multiway partitioning with pairwise movement.
Proceedings of the 1998 IEEE/ACM International Conference on Computer-Aided Design, 1998

How will CAD handle billion-transistor systems? (panel).
Proceedings of the 1998 IEEE/ACM International Conference on Computer-Aided Design, 1998

Technology Mapping for FPGAs with Embedded Memory Blocks.
Proceedings of the 1998 ACM/SIGDA Sixth International Symposium on Field Programmable Gate Arrays, 1998

Boolean Matching for Complex PLBs in LUT-based FPGAs with Application to Architecture Evaluation.
Proceedings of the 1998 ACM/SIGDA Sixth International Symposium on Field Programmable Gate Arrays, 1998

Delay-Optimal Technology Mapping for FPGAs with Heterogeneous LUTs.
Proceedings of the 35th Conference on Design Automation, 1998

Performance Driven Multi-Layer General Area Routing for PCB/MCM Designs.
Proceedings of the 35th Conference on Design Automation, 1998

1997
Performance-driven routing with multiple sources.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1997

Performance driven global routing for standard cell design.
Proceedings of the 1997 International Symposium on Physical Design, 1997

Efficient heuristics for the minimum shortest path Steiner arborescence problem with applications to VLSI physical design.
Proceedings of the 1997 International Symposium on Physical Design, 1997

Interconnect design for deep submicron ICs.
Proceedings of the 1997 IEEE/ACM International Conference on Computer-Aided Design, 1997

Large scale circuit partitioning with loose/stable net removal and signal flow based clustering.
Proceedings of the 1997 IEEE/ACM International Conference on Computer-Aided Design, 1997

Interconnect layout optimization under higher-order RLC model.
Proceedings of the 1997 IEEE/ACM International Conference on Computer-Aided Design, 1997

Global interconnect sizing and spacing with consideration of coupling capacitance.
Proceedings of the 1997 IEEE/ACM International Conference on Computer-Aided Design, 1997

Partially-Dependent Functional Decomposition with Applications in FPGA Synthesis and Mapping.
Proceedings of the 1997 ACM/SIGDA Fifth International Symposium on Field Programmable Gate Arrays, 1997

On acceleration of the check tautology logic synthesis algorithm using an FPGA-based reconfigurable coprocessor.
Proceedings of the 5th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM '97), 1997

FPGA Synthesis with Retiming and Pipelining for Clock Period Minimization of Sequential Circuits.
Proceedings of the 34th Conference on Design Automation, 1997

Analysis and Justification of a Simple, Practical 2 1/2-D Capacitance Extraction Methodology.
Proceedings of the 34th Conference on Design Automation, 1997

An Efficient Approach to Multi-Layer Layer Assignment with Application to Via Minimization.
Proceedings of the 34th Conference on Design Automation, 1997

1996
Optimal wiresizing for interconnects with multiple sources.
ACM Trans. Design Autom. Electr. Syst., 1996

Combinational logic synthesis for LUT based field programmable gate arrays.
ACM Trans. Design Autom. Electr. Syst., 1996

Multiway VLSI circuit partitioning based on dual net representation.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1996

Performance optimization of VLSI interconnect layout.
Integr., 1996

Simultaneous buffer and wire sizing for performance and power optimization.
Proceedings of the 1996 International Symposium on Low Power Electronics and Design, 1996

An Improved Algorithm for Performance Optimal Technology Mapping with Retiming in LUT-Based FPGA Design.
Proceedings of the 1996 International Conference on Computer Design (ICCD '96), 1996

Buffered Steiner tree construction with wire sizing for interconnect layout optimization.
Proceedings of the 1996 IEEE/ACM International Conference on Computer-Aided Design, 1996

An efficient approach to simultaneous transistor and interconnect sizing.
Proceedings of the 1996 IEEE/ACM International Conference on Computer-Aided Design, 1996

RASP: A General Logic Synthesis System for SRAM-Based FPGAs.
Proceedings of the 1996 Fourth International Symposium on Field Programmable Gate Arrays, 1996

Structural Gate Decomposition for Depth-Optimal Technology Mapping in LUT-based FPGA Design.
Proceedings of the 33rd Conference on Design Automation, 1996

1995
An efficient multilayer MCM router based on four-via routing.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1995

Optimal wiresizing under Elmore delay model.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1995

The Supercomputer Supernet: A Scalable Distributed Terabit Network.
J. High Speed Networks, 1995

Performance Driven Routing with Multiple Sources.
Proceedings of the 1995 IEEE International Symposium on Circuits and Systems, ISCAS 1995, Seattle, Washington, USA, April 30, 1995

Minimum-Cost Bounded-Skew Clock Routing.
Proceedings of the 1995 IEEE International Symposium on Circuits and Systems, ISCAS 1995, Seattle, Washington, USA, April 30, 1995

Bounded-skew clock and Steiner routing under Elmore delay.
Proceedings of the 1995 IEEE/ACM International Conference on Computer-Aided Design, 1995

Simultaneous Depth and Area Minimization in LUT-based FPGA Mapping.
Proceedings of the Third International ACM Symposium on Field-Programmable Gate Arrays, 1995

Exploiting signal flow and logic dependency in standard cell placement.
Proceedings of the 1995 Conference on Asia Pacific Design Automation, Makuhari Messe, Chiba, Japan, August 29, 1995

1994
On the Minimum Density Interconnection Tree Problem.
VLSI Design, 1994

Channel Density Minimization by Pin Permutation.
VLSI Design, 1994

Simultaneous driver and wire sizing for performance and power optimization.
IEEE Trans. Very Large Scale Integr. Syst., 1994

On area/depth trade-off in LUT-based FPGA technology mapping.
IEEE Trans. Very Large Scale Integr. Syst., 1994

FlowMap: an optimal technology mapping algorithm for delay optimization in lookup-table based FPGA designs.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1994

On nominal delay minimization in LUT-based FPGA technology mapping.
Integr., 1994

LUT-based FPGA technology mapping under arbitrary net-delay models.
Comput. Graph., 1994

Parallel logic level simulation of VLSI circuits.
Proceedings of the 26th conference on Winter simulation, 1994

Multi-way VLSI circuit partitioning based on dual net representation.
Proceedings of the 1994 IEEE/ACM International Conference on Computer-Aided Design, 1994

Acyclic Multi-Way Partitioning of Boolean Networks.
Proceedings of the 31st Conference on Design Automation, 1994

1993
Physical models and efficient algorithms for over-the-cell routing in standard cell design.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1993

Matching-based methods for high-performance clock routing.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1993

A provably good multilayer topological planar routing algorithm in IC layout designs.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1993

A Provably Good Algorithm for k-Layer Topological Planar Routing Problems.
Proceedings of the Sixth International Conference on VLSI Design, 1993

A Two-pole Circuit Model for VLSI High-speed Interconnection.
Proceedings of the 1993 IEEE International Symposium on Circuits and Systems, 1993

Minimum Density Interconnection Trees.
Proceedings of the 1993 IEEE International Symposium on Circuits and Systems, 1993

Optimal wiresizing under the distributed Elmore delay model.
Proceedings of the 1993 IEEE/ACM International Conference on Computer-Aided Design, 1993

Beyond the combinatorial limit in depth minimization for LUT-based FPGA designs.
Proceedings of the 1993 IEEE/ACM International Conference on Computer-Aided Design, 1993

A Parallel Bottom-Up Clustering Algorithm with Applications to Circuit Partitioning in VLSI Design.
Proceedings of the 30th Design Automation Conference. Dallas, 1993

Performance-Driven Interconnect Design Based on Distributed RC Delay Model.
Proceedings of the 30th Design Automation Conference. Dallas, 1993

1992
Provably good performance-driven global routing.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1992

A new algorithm for standard cell global routing.
Integr., 1992

DAG-Map: Graph-Based FPGA Technology Mapping for Delay Optimization.
IEEE Des. Test Comput., 1992

An Improved Graph-Based FPGA Technology Mapping Algorithm For Delay Optimization.
Proceedings of the 1992 IEEE International Conference on Computer Design: VLSI in Computer & Processors, 1992

An optimal technology mapping algorithm for delay optimization in lookup-table based FPGA designs.
Proceedings of the 1992 IEEE/ACM International Conference on Computer-Aided Design, 1992

A fast multilayer general area router for MCM designs.
Proceedings of the conference on European design automation, 1992

Maximal reduction of lookup-table based FPGAs.
Proceedings of the conference on European design automation, 1992

Net Partitions Yield Better Module Partitions.
Proceedings of the 29th Design Automation Conference, 1992

1991
A layout modification approach to via minimization.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1991

On the k-layer planar subset and topological via minimization problems.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1991

Pin assignment with global routing for general cell designs.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1991

Performance-Driven Global Routing for Cell Based ICs.
Proceedings of the 1991 IEEE International Conference on Computer Design: VLSI in Computer & Processors, 1991

A Provable Near-Optimal Algorithm for the Channel Pin Assignment Problem.
Proceedings of the 1991 IEEE International Conference on Computer Design: VLSI in Computer & Processors, 1991

High-Performance Clock Routing Based on Recursive Geometric Matching.
Proceedings of the 28th Design Automation Conference, 1991

1990
Over-the-cell channel routing.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1990

On the k-layer planar subset and via minimization problems.
Proceedings of the European Design Automation Conference, 1990

General Models and Algorithms for Over-the-Cell Routing in Standard Cell Design.
Proceedings of the 27th ACM/IEEE Design Automation Conference. Orlando, 1990

1989
Constrained floorplan design for flexible blocks.
Proceedings of the 1989 IEEE International Conference on Computer-Aided Design, 1989

Pin assignment with global routing.
Proceedings of the 1989 IEEE International Conference on Computer-Aided Design, 1989

VIA Minimization by Layout Modification.
Proceedings of the 26th ACM/IEEE Design Automation Conference, 1989

1988
A new approach to three- or four-layer channel routing.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1988

A new formulation of yield enhancement problems for reconfigurable chips.
Proceedings of the 1988 IEEE International Conference on Computer-Aided Design, 1988

