Zhiru Zhang

Orcid: 0000-0002-0778-0308

According to our database, Zhiru Zhang authored at least 156 papers between 2003 and 2024.

Bibliography

2024
UniSparse: An Intermediate Language for General Sparse Format Customization.
Proc. ACM Program. Lang., 2024

Allo: A Programming Model for Composable Accelerator Design.
Proc. ACM Program. Lang., 2024

RapidStream IR: Infrastructure for FPGA High-Level Physical Synthesis.
CoRR, 2024

Rapid GPU-Based Pangenome Graph Layout.
CoRR, 2024

Radial Networks: Dynamic Layer Routing for High-Performance Large Language Models.
CoRR, 2024

SAGMAN: Stability Analysis of Graph Neural Networks on the Manifolds.
CoRR, 2024

Trainable Fixed-Point Quantization for Deep Learning Acceleration on FPGAs.
CoRR, 2024

Supporting a Virtual Vector Instruction Set on a Commercial Compute-in-SRAM Accelerator.
IEEE Comput. Archit. Lett., 2024

Sabre: Hardware-Accelerated Snapshot Compression for Serverless MicroVMs.
Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation, 2024

Scalable, Programmable and Dense: The HammerBlade Open-Source RISC-V Manycore.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

Differentiable Combinatorial Scheduling at Scale.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Learning from Students: Applying t-Distributions to Explore Accurate and Efficient Formats for LLMs.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Exploring the Limits of Semantic Image Compression at Micro-bits per Pixel.
Proceedings of the Second Tiny Papers Track at ICLR 2024, 2024

Polynormer: Polynomial-Expressive Graph Transformer in Linear Time.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

LibPreemptible: Enabling Fast, Adaptive, and Hardware-Assisted User-Space Scheduling.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024

Formal Verification of Source-to-Source Transformations for HLS.
Proceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2024

A Comprehensive Evaluation of FPGA-Based Spatial Acceleration of LLMs.
Proceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2024

ShadowLLM: Predictor-based Contextual Sparsity for Large Language Models.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Less is More: Hop-Wise Graph Attention for Scalable and Generalizable Learning on Circuits.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

Slapo: A Schedule Language for Progressive Optimization of Large Deep Learning Model Training.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023
RapidStream 2.0: Automated Parallel Implementation of Latency-Insensitive FPGA Designs Through Partial Reconfiguration.
ACM Trans. Reconfigurable Technol. Syst., December, 2023

TAPA: A Scalable Task-parallel Dataflow Programming Framework for Modern FPGAs with Co-optimization of HLS and Physical Design.
ACM Trans. Reconfigurable Technol. Syst., December, 2023

A 28-nm 8-bit Floating-Point Tensor Core-Based Programmable CNN Training Processor With Dynamic Structured Sparsity.
IEEE J. Solid State Circuits, 2023

Understanding the Potential of FPGA-Based Spatial Acceleration for Large Language Model Inference.
CoRR, 2023

Comprehensive Benchmarking of Binary Neural Networks on NVM Crossbar Architectures.
CoRR, 2023

FLIQS: One-Shot Mixed-Precision Floating-Point and Integer Quantization Search.
CoRR, 2023

Towards Fast, Adaptive, and Hardware-Assisted User-Space Scheduling.
CoRR, 2023

Decoupled Model Schedule for Deep Learning Training.
CoRR, 2023

Resilient Baseband Processing in Virtualized RANs with Slingshot.
Proceedings of the ACM SIGCOMM 2023 Conference, 2023

Binarized Neural Machine Translation.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

A Case for Open EDA Verticals.
Proceedings of the 2023 International Symposium on Physical Design, 2023

Equality Saturation for Datapath Synthesis: A Pathway to Pareto Optimality.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

Special Session: Machine Learning for Embedded System Design.
Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis, 2023

2022
FPGA HLS Today: Successes, Challenges, and Opportunities.
ACM Trans. Reconfigurable Technol. Syst., 2022

A Tensor Processing Framework for CPU-Manycore Heterogeneous Systems.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022

Reverse-Engineering CNN Models Using Side-Channel Attacks.
IEEE Des. Test, 2022

Benchmarking GNN-Based Recommender Systems on Intel Optane Persistent Memory.
CoRR, 2022

Structured Pruning is All You Need for Pruning CNNs at Initialization.
CoRR, 2022

Understanding Hyperdimensional Computing for Parallel Single-Pass Learning.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

GARNET: Reduced-Rank Topology Learning for Robust and Scalable Graph Neural Networks.
Proceedings of the Learning on Graphs Conference, 2022

SoftVN: efficient memory protection via software-provided version numbers.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

MGX: near-zero overhead memory protection for data-intensive accelerators.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

Exact Memory- and Communication-aware Scheduling of DNNs on Pipelined Edge TPUs.
Proceedings of the 7th IEEE/ACM Symposium on Edge Computing, 2022

HeteroFlow: An Accelerator Programming Model with Decoupled Data Placement for Software-Defined FPGAs.
Proceedings of the FPGA '22: The 2022 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Virtual Event, USA, 27 February 2022, 2022

RapidStream: Parallel Physical Implementation of FPGA HLS Designs.
Proceedings of the FPGA '22: The 2022 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Virtual Event, USA, 27 February 2022, 2022

High-Performance Sparse Linear Algebra on HBM-Equipped FPGAs Using HLS: A Case Study on SpMV.
Proceedings of the FPGA '22: The 2022 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Virtual Event, USA, 27 February 2022, 2022

IMpress: Large Integer Multiplication Expression Rewriting for FPGA HLS.
Proceedings of the 30th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2022

A 28nm 8-bit Floating-Point Tensor Core based CNN Training Processor with Dynamic Activation/Weight Sparsification.
Proceedings of the 48th IEEE European Solid State Circuits Conference, 2022

Accelerator design with decoupled hardware customizations: benefits and challenges: invited.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

GuardNN: secure accelerator architecture for privacy-preserving deep learning.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

PokeBNN: A Binary Pursuit of Lightweight Accuracy.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
Programming and Synthesis for Software-defined FPGA Acceleration: Status and Future Prospects.
ACM Trans. Reconfigurable Technol. Syst., 2021

Enabling Design Methodologies and Future Trends for Edge AI: Specialization and Codesign.
IEEE Des. Test, 2021

Guest Editors' Introduction: Machine Intelligence at the Edge.
IEEE Des. Test, 2021

A Roadmap for Enabling a Future-Proof In-Network Computing Data Plane Ecosystem.
CoRR, 2021

Dense Pruning of Pointwise Convolutions in the Frequency Domain.
CoRR, 2021

Dagger: Accelerating RPCs in Cloud Microservices Through Tightly-Coupled Reconfigurable NICs.
CoRR, 2021

Enabling Design Methodologies and Future Trends for Edge AI: Specialization and Co-design.
CoRR, 2021

BulletTrain: Accelerating Robust Neural Network Training via Boundary Example Mining.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

SPADE: A Spectral Method for Black-Box Adversarial Robustness Evaluation.
Proceedings of the 38th International Conference on Machine Learning, 2021

GraphLily: Accelerating Graph Linear Algebra on HBM-Equipped FPGAs.
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2021

FracBNN: Accurate and FPGA-Efficient Binary Neural Networks with Fractional Activations.
Proceedings of the FPGA '21: The 2021 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Virtual Event, USA, February 28, 2021

AutoBridge: Coupling Coarse-Grained Floorplanning and Pipelining for High-Frequency HLS Design on Multi-Die FPGAs.
Proceedings of the FPGA '21: The 2021 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Virtual Event, USA, February 28, 2021

Scaling Up Hardware Accelerator Verification using A-QED with Functional Decomposition.
Proceedings of the Formal Methods in Computer Aided Design, 2021


GLAIVE: Graph Learning Assisted Instruction Vulnerability Estimation.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2021

Distilling Arbitration Logic from Traces using Machine Learning: A Case Study on NoC.
Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021

Dagger: efficient and fast RPCs in cloud microservices with near-memory reconfigurable NICs.
Proceedings of the ASPLOS '21: 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2021

Layout Symmetry Annotation for Analog Circuits with Graph Neural Networks.
Proceedings of the ASPDAC '21: 26th Asia and South Pacific Design Automation Conference, 2021

2020
GuardNN: Secure DNN Accelerator for Privacy-Preserving Deep Learning.
CoRR, 2020

MgX: Near-Zero Overhead Memory Protection with an Application to Secure DNN Acceleration.
CoRR, 2020

Dagger: Towards Efficient RPCs in Cloud Microservices With Near-Memory Reconfigurable NICs.
IEEE Comput. Archit. Lett., 2020

FeatGraph: a flexible and efficient backend for graph neural network systems.
Proceedings of the International Conference for High Performance Computing, 2020

Predictable accelerator design with time-sensitive affine types.
Proceedings of the 41st ACM SIGPLAN International Conference on Programming Language Design and Implementation, 2020

MatRaptor: A Sparse-Sparse Matrix Multiplication Accelerator Based on Row-Wise Product.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

Precision Gating: Improving Neural Network Efficiency with Dynamic Dual-Precision Activations.
Proceedings of the 8th International Conference on Learning Representations, 2020

GraphZoom: A Multi-level Spectral Approach for Accurate and Scalable Graph Embedding.
Proceedings of the 8th International Conference on Learning Representations, 2020

Accurate Operation Delay Prediction for FPGA HLS Using Graph Neural Networks.
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2020

SuSy: A Programming Model for Productive Construction of High-Performance Systolic Arrays on FPGAs.
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2020

Tensaurus: A Versatile Accelerator for Mixed Sparse-Dense Tensor Computations.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020

A-QED Verification of Hardware Accelerators.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

Analysis and Optimization of the Implicit Broadcasts in FPGA HLS to Improve Maximum Frequency.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

2019
PIMap: A Flexible Framework for Improving LUT-Based Technology Mapping via Parallelized Iterative Optimization.
ACM Trans. Reconfigurable Technol. Syst., 2019

Overwrite Quantization: Opportunistic Outlier Handling for Neural Network Accelerators.
CoRR, 2019

A 1.4 GHz 695 Giga Risc-V Inst/s 496-Core Manycore Processor With Mesh On-Chip Network and an All-Digital Synthesized PLL in 16nm CMOS.
Proceedings of the 2019 Symposium on VLSI Circuits, Kyoto, Japan, June 9-14, 2019, 2019

Channel Gating Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Boosting the Performance of CNN Accelerators with Dynamic Fine-Grained Channel Gating.
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

Improving Neural Network Quantization without Retraining using Outlier Channel Splitting.
Proceedings of the 36th International Conference on Machine Learning, 2019

HeteroCL: A Multi-Paradigm Programming Infrastructure for Software-Defined Reconfigurable Computing.
Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

LAMDA: Learning-Assisted Multi-stage Autotuning for FPGA Design Closure.
Proceedings of the 27th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2019

T2S-Tensor: Productively Generating High-Performance Spatial Hardware for Dense Tensor Computations.
Proceedings of the 27th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2019

PRIMAL: Power Inference using Machine Learning.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

Painting on Placement: Forecasting Routing Congestion using Conditional Generative Adversarial Nets.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

Rapid Generation of High-Quality RISC-V Processors from Functional Instruction Set Specifications.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

Designing Secure Cryptographic Accelerators with Information Flow Enforcement: A Case Study on AES.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

Improving Scalability of Exact Modulo Scheduling with Specialized Conflict-Driven Learning.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

Building Efficient Deep Neural Networks With Unitary Group Convolutions.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018
The Celerity Open-Source 511-Core RISC-V Tiered Accelerator Fabric: Fast Architectures and Design Methodologies for Fast Chips.
IEEE Micro, 2018

Channel Gating Neural Networks.
CoRR, 2018

High-level synthesis with timing-sensitive information flow enforcement.
Proceedings of the International Conference on Computer-Aided Design, 2018

Rosetta: A Realistic High-Level Synthesis Benchmark Suite for Software Programmable FPGAs.
Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2018

DATuner: An Extensible Distributed Autotuning Framework for FPGA Design and Design Automation: (Abstract Only).
Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2018

A Scalable Approach to Exact Resource-Constrained Scheduling Based on a Joint SDC and SAT Formulation.
Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2018

Fast and Accurate Estimation of Quality of Results in High-Level Synthesis with Machine Learning.
Proceedings of the 26th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2018

Reverse engineering convolutional neural networks through side-channel information leaks.
Proceedings of the 55th Annual Design Automation Conference, 2018

2017
Architecture and Synthesis for Area-Efficient Pipelining of Irregular Loop Nests.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2017

Statistically certified approximate logic synthesis.
Proceedings of the 2017 IEEE/ACM International Conference on Computer-Aided Design, 2017

A New Approach to Automatic Memory Banking using Trace-Based Address Mining.
Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017

Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs.
Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017

A Parallel Bandit-Based Approach for Autotuning FPGA Compilation.
Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017

Accelerating Face Detection on Programmable SoC Using C-Based Synthesis.
Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017

A Parallelized Iterative Improvement Approach to Area Optimization for LUT-Based Technology Mapping.
Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017

Dynamic Hazard Resolution for Pipelining Irregular Loops in High-Level Synthesis.
Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017

FPGA-Based Real-Time Charged Particle Trajectory Reconstruction at the Large Hadron Collider.
Proceedings of the 25th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2017

Binarized Convolutional Neural Networks with Separable Filters for Efficient Hardware Acceleration.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017

Enabling adaptive loop pipelining in high-level synthesis.
Proceedings of the 51st Asilomar Conference on Signals, Systems, and Computers, 2017

2016
Platform choices and design demands for IoT platforms: cost, power, and performance tradeoffs.
IET Cyber-Phys. Syst.: Theory & Appl., 2016

Characterizing the Benefits and Limitations of Smart Building Meeting Room Scheduling.
Proceedings of the 7th ACM/IEEE International Conference on Cyber-Physical Systems, 2016

Improving high-level synthesis with decoupled data structure optimization.
Proceedings of the 53rd Annual Design Automation Conference, 2016

2015
High-level Synthesis for Low-power Design.
IPSJ Trans. Syst. LSI Des. Methodol., 2015

ElasticFlow: A Complexity-Effective Approach for Pipelining Irregular Loop Nests.
Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 2015

DA Systemization of Knowledge: A Catalog of Prior Forward-Looking Initiatives.
Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 2015

Mapping-Aware Constrained Scheduling for LUT-Based FPGAs.
Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2015

Area-efficient pipelining for FPGA-targeted high-level synthesis.
Proceedings of the 52nd Annual Design Automation Conference, 2015

A reconfigurable analog substrate for highly efficient maximum flow computation.
Proceedings of the 52nd Annual Design Automation Conference, 2015

2014
Architectural Specialization for Inter-Iteration Loop Dependence Patterns.
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

CASA: correlation-aware speculative adders.
Proceedings of the International Symposium on Low Power Electronics and Design, 2014

Multithreaded pipeline synthesis for data-parallel kernels.
Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 2014

Flushing-Enabled Loop Pipelining for High-Level Synthesis.
Proceedings of the 51st Annual Design Automation Conference 2014, 2014

2013
SDC-based modulo scheduling for pipeline synthesis.
Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 2013

2012
ESL Design Methodology.
J. Electr. Comput. Eng., 2012

2011
High-Level Synthesis for FPGAs: From Prototyping to Deployment.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2011

2010
Behavior-Level Observability Analysis for Operation Gating in Low-Power Behavioral Synthesis.
ACM Trans. Design Autom. Electr. Syst., 2010

Image classification with spectral and texture features based on SVM.
Proceedings of the 18th International Conference on Geoinformatics: GIScience in Change, 2010

Bit-level optimization for high-level synthesis and FPGA-based acceleration.
Proceedings of the ACM/SIGDA 18th International Symposium on Field Programmable Gate Arrays, 2010

2009
Behavior-level observability don't-cares and application to low-power behavioral synthesis.
Proceedings of the 2009 International Symposium on Low Power Electronics and Design, 2009

Scheduling with soft constraints.
Proceedings of the 2009 International Conference on Computer-Aided Design, 2009

Revisiting bitwidth optimizations.
Proceedings of the ACM/SIGDA 17th International Symposium on Field Programmable Gate Arrays, 2009

Evaluation of Static Analysis Techniques for Fixed-Point Precision Optimization.
Proceedings of the FCCM 2009, 2009

2008
Scheduling with integer time budgeting for low-power optimization.
Proceedings of the 13th Asia South Pacific Design Automation Conference, 2008

Behavioral synthesis with activating unused flip-flops for reducing glitch power in FPGA.
Proceedings of the 13th Asia South Pacific Design Automation Conference, 2008

2007
High-Level Power Estimation and Low-Power Design Space Exploration for FPGAs.
Proceedings of the 12th Conference on Asia South Pacific Design Automation, 2007

2006
Architecture and Compiler Optimizations for Data Bandwidth Improvement in Configurable Processors.
IEEE Trans. Very Large Scale Integr. Syst., 2006

Platform-Based Behavior-Level and System-Level Synthesis.
Proceedings of the 2006 IEEE International SOC Conference, Austin, Texas, USA, 2006

An efficient and versatile scheduling algorithm based on SDC formulation.
Proceedings of the 43rd Design Automation Conference, 2006

Behavior and communication co-optimization for systems with sequential communication media.
Proceedings of the 43rd Design Automation Conference, 2006

2005
Architecture and compilation for data bandwidth improvement in configurable embedded processors.
Proceedings of the 2005 International Conference on Computer-Aided Design, 2005

Instruction set extension with shadow registers for configurable processors.
Proceedings of the ACM/SIGDA 13th International Symposium on Field Programmable Gate Arrays, 2005

Bitwidth-aware scheduling and binding in high-level synthesis.
Proceedings of the 2005 Conference on Asia South Pacific Design Automation, 2005

2004
Architecture and synthesis for on-chip multicycle communication.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2004

Application-specific instruction generation for configurable processor architectures.
Proceedings of the ACM/SIGDA 12th International Symposium on Field Programmable Gate Arrays, 2004

Architecture-level synthesis for automatic interconnect pipelining.
Proceedings of the 41st Design Automation Conference, 2004

2003
Architecture and synthesis for multi-cycle communication.
Proceedings of the 2003 International Symposium on Physical Design, 2003

Gradual Relaxation Techniques with Applications to Behavioral Synthesis.
Proceedings of the 2003 International Conference on Computer-Aided Design, 2003

Architectural Synthesis Integrated with Global Placement for Multi-Cycle Communication.
Proceedings of the 2003 International Conference on Computer-Aided Design, 2003

Architecture and synthesis for multi-cycle on-chip communication.
Proceedings of the 1st IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2003

