2025
SmartQCache: Fast and Precise Pulse Control With Near-Quantum Cache Design on FPGA.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., May, 2025
QuST: Optimizing Quantum Neural Network Against Spatial and Temporal Noise Biases.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., April, 2025
Triton-distributed: Programming Overlapping Kernels on Distributed AI Systems with the Triton Compiler.
CoRR, April, 2025
ARTERY: Fast Quantum Feedback using Branch Prediction.
Proceedings of the 52nd Annual International Symposium on Computer Architecture, 2025
Qtenon: Towards Low-Latency Architecture Integration for Accelerating Hybrid Quantum-Classical Computing.
Proceedings of the 52nd Annual International Symposium on Computer Architecture, 2025
Choco-Q: Commute Hamiltonian-based QAOA for Constrained Binary Optimization.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2025
Empowering Quantum Error Traceability with MoE for Automatic Calibration.
Proceedings of the Design, Automation & Test in Europe Conference, 2025
2024
Rubick: A Unified Infrastructure for Analyzing, Exploring, and Implementing Spatial Architectures via Dataflow Decomposition.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., April, 2024
SpREM: Exploiting Hamming Sparsity for Fast Quantum Readout Error Mitigation.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024
MorphQPV: Exploiting Isomorphism in Quantum Programs to Facilitate Confident Verification.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024
QuFEM: Fast and Accurate Quantum Readout Calibration Using the Finite Element Method.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024
2023
Automatic Generation of Spatial Accelerator for Tensor Algebra.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., June, 2023
QuCT: A Framework for Analyzing Quantum Circuit by Extracting Contextual and Topological Features.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023
QPulseLib: Accelerating the Pulse Generation of Quantum Circuit with Reusable Patterns.
Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 2023
HyQSAT: A Hybrid Approach for 3-SAT Problems by Integrating Quantum Annealer with CDCL.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023
SSiMD: Supporting Six Signed Multiplications in a DSP Block for Low-Precision CNN on FPGAs.
Proceedings of the International Conference on Field Programmable Technology, 2023
Calabash: Accelerating Attention Using a Systolic Array Chain on FPGAs.
Proceedings of the 33rd International Conference on Field-Programmable Logic and Applications, 2023
Rubick: A Synthesis Framework for Spatial Architectures via Dataflow Decomposition.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023
2022
Morphling: A Reconfigurable Architecture for Tensor Computation.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022
FCNNLib: A Flexible Convolution Algorithm Library for Deep Learning on FPGAs.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022
An Efficient Hardware Design for Accelerating Sparse CNNs With NAS-Based Models.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022
AMOS: Enabling Automatic Mapping for Tensor Computations on Spatial Accelerators with Hardware Abstraction.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022
2021
OMNI: A Framework for Integrating Hardware and Software Optimizations for Sparse CNNs.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2021
Sanger: A Co-Design Framework for Enabling Sparse Attention using Reconfigurable Architecture.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021
Analyzing the Design Space of Spatial Tensor Accelerators on FPGAs.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2021
TENET: A Framework for Modeling Tensor Dataflow Based on Relation-centric Notation.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021
TensorLib: A Spatial Accelerator Generation Framework for Tensor Algebra.
Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021
2020
Evaluating Fast Algorithms for Convolutional Neural Networks on FPGAs.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020
Enabling Efficient Fast Convolution Algorithms on GPUs via MegaKernels.
IEEE Trans. Computers, 2020
Generating Systolic Array Accelerators With Reusable Blocks.
IEEE Micro, 2020
FCNNLib: An Efficient and Flexible Convolution Algorithm Library on FPGAs.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020
2019
Speedy: An Accelerator for Sparse Convolutional Neural Networks on FPGAs.
Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019
An Efficient Hardware Accelerator for Sparse Convolutional Neural Networks on FPGAs.
Proceedings of the 27th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2019
2018
SpWA: An Efficient Sparse Winograd Convolutional Neural Networks Accelerator on FPGAs.
Proceedings of the 55th Annual Design Automation Conference, 2018
2017
Exploring Heterogeneous Algorithms for Accelerating Deep Convolutional Neural Networks on FPGAs.
Proceedings of the 54th Annual Design Automation Conference, 2017