Ang Li

Orcid: 0000-0003-3734-9137

Affiliations:
  • Pacific Northwest National Laboratory, Richland, WA, USA


According to our database, Ang Li authored at least 147 papers between 2014 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
FPGA-Accelerated Range-Limited Molecular Dynamics.
IEEE Trans. Computers, June, 2024

Acceleration of Graph Neural Network-Based Prediction Models in Chemistry via Co-Design Optimization on Intelligence Processing Units.
J. Chem. Inf. Model., March, 2024

Quantum-centric supercomputing for materials science: A perspective on challenges and future directions.
Future Gener. Comput. Syst., 2024

Light-Weight Fault Tolerant Attention for Large Language Model Training.
CoRR, 2024

A GPU accelerated mixed-precision Finite Difference informed Random Walker (FDiRW) solver for strongly inhomogeneous diffusion problems.
CoRR, 2024

Diff-PIC: Revolutionizing Particle-In-Cell Simulation for Advancing Nuclear Fusion with Diffusion Models.
CoRR, 2024

Inertial Confinement Fusion Forecasting via LLMs.
CoRR, 2024

On Scaling Up 3D Gaussian Splatting Training.
CoRR, 2024

Scalable Circuit Cutting and Scheduling in a Resource-constrained and Distributed Quantum System.
CoRR, 2024

Accurate and Data-Efficient Micro-XRD Phase Identification Using Multi-Task Learning: Application to Hydrothermal Fluids.
CoRR, 2024

A Quantum-Classical Collaborative Training Architecture Based on Quantum State Fidelity.
CoRR, 2024

A GPU accelerated mixed-precision Smoothed Particle Hydrodynamics framework with cell-based relative coordinates.
CoRR, 2024

How Much Can We Gain From Tensor Kernel Fusion on GPUs?
IEEE Access, 2024

Evaluating Emerging AI/ML Accelerators: IPU, RDU, and NVIDIA/AMD GPUs.
Proceedings of the Companion of the 15th ACM/SPEC International Conference on Performance Engineering, 2024

OPER: Optimality-Guided Embedding Table Parallelization for Large-scale Recommendation Model.
Proceedings of the 2024 USENIX Annual Technical Conference, 2024

Surf-Deformer: Mitigating Dynamic Defects on Surface Code via Adaptive Deformation.
Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024

Bridging the Gap Between LLMs and LNS with Dynamic Data Format and Architecture Codesign.
Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024

DS-GL: Advancing Graph Learning via Harnessing Nature's Power within Scalable Dynamical Systems.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

SmartFuse: Reconfigurable Smart Switches to Accelerate Fused Collectives in HPC Applications.
Proceedings of the 38th ACM International Conference on Supercomputing, 2024

Extending Power of Nature from Binary to Real-Valued Graph Learning in Real World.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

QUAPPROX: A Framework for Benchmarking the Approximability of Variational Quantum Circuit.
Proceedings of the IEEE International Conference on Acoustics, 2024

Understanding Mixed Precision GEMM with MPGemmFI: Insights into Fault Resilience.
Proceedings of the IEEE International Conference on Cluster Computing, 2024

Discovery of Floating-Point Differences Between NVIDIA and AMD GPUs.
Proceedings of the 24th IEEE International Symposium on Cluster, 2024

FTTN: Feature-Targeted Testing for Numerical Properties of NVIDIA & AMD Matrix Accelerators.
Proceedings of the 24th IEEE International Symposium on Cluster, 2024

RAP: Resource-aware Automated GPU Sharing for Multi-GPU Recommendation Model Training and Input Preprocessing.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

Red-QAOA: Efficient Variational Optimization through Circuit Reduction.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023
Accelerating matrix-centric graph processing on GPUs through bit-level optimizations.
J. Parallel Distributed Comput., July, 2023

Dissecting Tensor Cores via Microbenchmarks: Latency, Throughput and Numeric Behaviors.
IEEE Trans. Parallel Distributed Syst., 2023

Elastic Resource Management for Deep Learning Applications in a Container Cluster.
IEEE Trans. Cloud Comput., 2023

MPGemmFI: A Fault Injection Technique for Mixed Precision GEMM in ML Applications.
CoRR, 2023

Microarchitectures for Heterogeneous Superconducting Quantum Computers.
CoRR, 2023

Machine Learning Automated Approach for Enormous Synchrotron X-Ray Diffraction Data Interpretation.
CoRR, 2023

MEMQSim: Highly Memory-Efficient and Modularized Quantum State-Vector Simulation.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

FASDA: An FPGA-Aided, Scalable and Distributed Accelerator for Range-Limited Molecular Dynamics.
Proceedings of the International Conference for High Performance Computing, 2023

Enabling Scalable VQE Simulation on Leading HPC Systems.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

A Reference Implementation for a Quantum Message Passing Interface.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

QASMTrans: A QASM Quantum Transpiler Framework for NISQ Devices.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

A Novel Spatial-Temporal Variational Quantum Circuit to Enable Deep Learning on NISQ Devices.
Proceedings of the IEEE International Conference on Quantum Computing and Engineering, 2023

MGG: Accelerating Graph Neural Networks with Fine-Grained Intra-Kernel Communication-Computation Pipelining on Multi-GPU Platforms.
Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, 2023

QuComm: Optimizing Collective Communication for Distributed Quantum Computing.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

HetArch: Heterogeneous Microarchitectures for Superconducting Quantum Systems.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

Q-BEEP: Quantum Bayesian Error Mitigation Employing Poisson Modeling over the Hamming Spectrum.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

FLASH: FPGA-Accelerated Smart Switches with GCN Case Study.
Proceedings of the 37th International Conference on Supercomputing, 2023

Software-Hardware Co-design of Heterogeneous SmartNIC System for Recommendation Models Inference and Training.
Proceedings of the 37th International Conference on Supercomputing, 2023

BitGNN: Unleashing the Performance Potential of Binary Graph Neural Networks on GPUs.
Proceedings of the 37th International Conference on Supercomputing, 2023

AutoReP: Automatic ReLU Replacement for Fast Private Network Inference.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Design and Evaluation of GPU-FPX: A Low-Overhead tool for Floating-Point Exception Detection in NVIDIA GPUs.
Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing, 2023

A Pulse Generation Framework with Augmented Program-aware Basis Gates and Criticality Analysis.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

ML-CGRA: An Integrated Compilation Framework to Enable Efficient Machine Learning Acceleration on CGRAs.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

Ising-CF: A Pathbreaking Collaborative Filtering Method Through Efficient Ising Machine Learning.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

Distributed Quantum Learning with co-Management in a Multi-tenant Quantum System.
Proceedings of the IEEE International Conference on Big Data, 2023

Ising-Traffic: Using Ising Machine Learning to Predict Traffic Congestion under Uncertainty.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Extreme Acceleration of Graph Neural Network-based Prediction Models for Quantum Chemistry.
CoRR, 2022

MSREP: A Fast yet Light Sparse Matrix Framework for Multi-GPU Systems.
CoRR, 2022

Empowering GNNs with Fine-grained Communication-Computation Pipelining on Multi-GPU Platforms.
CoRR, 2022

CollComm: Enabling Efficient Collective Quantum Communication Based on EPR buffering.
CoRR, 2022

A Synergistic Compilation Workflow for Tackling Crosstalk in Quantum Machines.
CoRR, 2022

GMI-DRL: Empowering Multi-GPU Deep Reinforcement Learning with GPU Spatial Multiplexing.
CoRR, 2022

GAAF: Searching Activation Functions for Binary Neural Networks through Genetic Algorithm.
CoRR, 2022

Dissecting Tensor Cores via Microbenchmarks: Latency, Throughput and Numerical Behaviors.
CoRR, 2022

BNS-GCN: Efficient Full-Graph Training of Graph Convolutional Networks with Partition-Parallelism and Random Boundary Node Sampling.
Proceedings of the Fifth Conference on Machine Learning and Systems, 2022

QuClassi: A Hybrid Deep Neural Network Architecture based on Quantum State Fidelity.
Proceedings of the Fifth Conference on Machine Learning and Systems, 2022

EQC: ensembled quantum computing for variational quantum algorithms.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

Benchmarking Quantum Processor Performance through Quantum Distance Metrics Over An Algorithm Suite.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022

Improving Variational Quantum Algorithms performance through Weighted Quantum Ensembles.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022

Bit-GraphBLAS: Bit-Level Optimizations of Matrix-Centric Graph Processing on GPU.
Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

MARS: Malleable Actor-Critic Reinforcement Learning Scheduler.
Proceedings of the IEEE International Performance, 2022

Quantum Noise in the Flow of Time: A Temporal Study of the Noise in Quantum Computers.
Proceedings of the 28th IEEE International Symposium on On-Line Testing and Robust System Design, 2022

QuCNN: A Quantum Convolutional Neural Network with Entanglement Based Backpropagation.
Proceedings of the 7th IEEE/ACM Symposium on Edge Computing, 2022

ASAP: automatic synthesis of area-efficient and precision-aware CGRAs.
Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022

CEAZ: accelerating parallel I/O via hardware-algorithm co-designed adaptive lossy compression.
Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022

GCoD: Graph Convolutional Network Acceleration via Dedicated Algorithm and Accelerator Co-Design.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

DRIPS: Dynamic Rebalancing of Pipelined Streaming Applications on CGRAs.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

SO(DA)²: End-to-end Generation of Specialized Reconfigurable Architectures (Invited Talk).
Proceedings of the 13th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 11th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms, 2022

Towards Precision-Aware Fault Tolerance Approaches for Mixed-Precision Applications.
Proceedings of the 12th IEEE/ACM Workshop on Fault Tolerance for HPC at eXtreme Scale, 2022

A Framework for Neural Network Inference on FPGA-Centric SmartNICs.
Proceedings of the 32nd International Conference on Field-Programmable Logic and Applications, 2022

H-GCN: A Graph Convolutional Network Accelerator on Versal ACAP Architecture.
Proceedings of the 32nd International Conference on Field-Programmable Logic and Applications, 2022

FCsN: A FPGA-Centric SmartNIC Framework for Neural Networks.
Proceedings of the 30th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2022

A length adaptive algorithm-hardware co-design of transformer on FPGA through sparse attention and dynamic pipelining.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

Efficient Hierarchical State Vector Simulation of Quantum Circuits via Acyclic Graph Partitioning.
Proceedings of the IEEE International Conference on Cluster Computing, 2022

2021
BCNN: Binary complex neural network.
Microprocess. Microsystems, November, 2021

ARENA: Asynchronous Reconfigurable Accelerator Ring to Enable Data-Centric Parallel Computing.
IEEE Trans. Parallel Distributed Syst., 2021

Accelerating Binarized Neural Networks via Bit-Tensor-Cores in Turing GPUs.
IEEE Trans. Parallel Distributed Syst., 2021

O3BNN-R: An Out-of-Order Architecture for High-Performance and Regularized BNN Inference.
IEEE Trans. Parallel Distributed Syst., 2021

Optimizing FPGA-based Accelerator Design for Large-Scale Molecular Similarity Search.
CoRR, 2021

Binary Complex Neural Network Acceleration on FPGA.
CoRR, 2021

CEAZ: Accelerating Parallel I/O via Hardware-Algorithm Co-Design of Efficient and Adaptive Lossy Compression.
CoRR, 2021

SV-sim: scalable PGAS-based state vector simulation of quantum circuits.
Proceedings of the International Conference for High Performance Computing, 2021

APNN-TC: accelerating arbitrary precision neural networks on ampere GPU tensor cores.
Proceedings of the International Conference for High Performance Computing, 2021

QuGAN: A Quantum State Fidelity based Generative Adversarial Network.
Proceedings of the IEEE International Conference on Quantum Computing and Engineering, 2021

I-GCN: A Graph Convolutional Network Accelerator with Runtime Locality Enhancement through Islandization.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

Accelerating Transformer-based Deep Learning Models on FPGAs using Column Balanced Block Pruning.
Proceedings of the 22nd International Symposium on Quality Electronic Design, 2021

A Hybrid System for Learning Classical Data in Quantum States.
Proceedings of the IEEE International Performance, 2021

Fast and Scalable Sparse Triangular Solver for Multi-GPU Based HPC Architectures.
Proceedings of the ICPP 2021: 50th International Conference on Parallel Processing, Lemont, IL, USA, August 9, 2021

DynPaC: Coarse-Grained, Dynamic, and Partially Reconfigurable Array for Streaming Applications.
Proceedings of the 39th IEEE International Conference on Computer Design, 2021

G-CoS: GNN-Accelerator Co-Search Towards Both Better Accuracy and Efficiency.
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2021

Optimizing FPGA-based Accelerator Design for Large-Scale Molecular Similarity Search (Special Session Paper).
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2021

FL-DISCO: Federated Generative Adversarial Network for Graph-based Molecule Drug Discovery: Special Session Paper.
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2021

A Survey: Handling Irregularities in Neural Network Acceleration with FPGAs.
Proceedings of the 2021 IEEE High Performance Extreme Computing Conference, 2021

TQEA: Temporal Quantum Error Analysis.
Proceedings of the 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2021

AURORA: Automated Refinement of Coarse-Grained Reconfigurable Accelerators.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2021

Guarding Numerics Amidst Rising Heterogeneity.
Proceedings of the 5th IEEE/ACM International Workshop on Software Correctness for HPC Applications, 2021

Binary Complex Neural Network Acceleration on FPGA: (Invited Paper).
Proceedings of the 32nd IEEE International Conference on Application-specific Systems, 2021

OpenCGRA: Democratizing Coarse-Grained Reconfigurable Arrays.
Proceedings of the 32nd IEEE International Conference on Application-specific Systems, 2021

2020
Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect.
IEEE Trans. Parallel Distributed Syst., 2020

FPDeep: Scalable Acceleration of CNN Training on Deeply-Pipelined FPGA Clusters.
IEEE Trans. Computers, 2020

ARENA: Asynchronous Reconfigurable Accelerator Ring to Enable Data-Centric Parallel Computing.
CoRR, 2020

Density matrix quantum circuit simulation via the BSP machine on modern GPU clusters.
Proceedings of the International Conference for High Performance Computing, 2020

A parallel sparse tensor benchmark suite on CPUs and GPUs.
Proceedings of the PPoPP '20: 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2020

AWB-GCN: A Graph Convolutional Network Accelerator with Runtime Workload Rebalancing.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

A Sparse Tensor Benchmark Suite for CPUs and GPUs.
Proceedings of the IEEE International Symposium on Workload Characterization, 2020

CSB-RNN: a faster-than-realtime RNN acceleration framework with compressed structured blocks.
Proceedings of the ICS '20: 2020 International Conference on Supercomputing, 2020

Detecting Anomalous Computation with RNNs on GPU-Accelerated HPC Machines.
Proceedings of the ICPP 2020: 49th International Conference on Parallel Processing, 2020

OpenCGRA: An Open-Source Unified Framework for Modeling, Testing, and Evaluating CGRAs.
Proceedings of the 38th IEEE International Conference on Computer Design, 2020

CQNN: a CGRA-based QNN Framework.
Proceedings of the 2020 IEEE High Performance Extreme Computing Conference, 2020

On the Feasibility of Using Reduced-Precision Tensor Core Operations for Graph Analytics.
Proceedings of the 2020 IEEE High Performance Extreme Computing Conference, 2020

Indicator-Directed Dynamic Power Management for Iterative Workloads on GPU-Accelerated Systems.
Proceedings of the 20th IEEE/ACM International Symposium on Cluster, 2020

2019
UWB-GCN: Hardware Acceleration of Graph-Convolution-Network through Runtime Workload Rebalancing.
CoRR, 2019

A Scalable Framework for Acceleration of CNN Training on Deeply-Pipelined FPGA Clusters with Weight and Workload Balancing.
CoRR, 2019

PASTA: a parallel sparse tensor algorithm benchmark suite.
CCF Trans. High Perform. Comput., 2019

BSTC: a novel binarized-soft-tensor-core design for accelerating bit-based approximated neural nets.
Proceedings of the International Conference for High Performance Computing, 2019

Fingerprinting Anomalous Computation with RNN for GPU-accelerated HPC Machines.
Proceedings of the IEEE International Symposium on Workload Characterization, 2019

O3BNN: an out-of-order architecture for high-performance binarized neural network inference with fine-grained pruning.
Proceedings of the ACM International Conference on Supercomputing, 2019

PIM-VR: Erasing Motion Anomalies In Highly-Interactive Virtual Reality World with Customized Memory Cube.
Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019

LP-BNN: Ultra-low-Latency BNN Inference with Layer Parallelism.
Proceedings of the 30th IEEE International Conference on Application-specific Systems, 2019

2018
Superneurons: dynamic GPU memory management for training deep neural networks.
Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

Introduction to HPPAC 2018.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018

Tartan: Evaluating Modern GPU Interconnect via a Multi-GPU Benchmark Suite.
Proceedings of the 2018 IEEE International Symposium on Workload Characterization, 2018

Warp-Consolidation: A Novel Execution Model for GPUs.
Proceedings of the 32nd International Conference on Supercomputing, 2018

CUDAAdvisor: LLVM-based runtime profiling for modern GPUs.
Proceedings of the 2018 International Symposium on Code Generation and Optimization, 2018

2017
Fast synchronization-free algorithms for parallel sparse triangular solves with multiple right-hand sides.
Concurr. Comput. Pract. Exp., 2017

Exploring and analyzing the real impact of modern on-package memory on HPC scientific kernels.
Proceedings of the International Conference for High Performance Computing, 2017

BVF: enabling significant on-chip power savings via bit-value-favor for throughput processors.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

Locality-Aware CTA Clustering for Modern GPUs.
Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017

Analysis and design of energy-efficient data-dependent SRAM.
Proceedings of the 12th IEEE International Conference on ASIC, 2017

2016
X: A Comprehensive Analytic Model for Parallel Machines.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

SFU-Driven Transparent Approximation Acceleration on GPUs.
Proceedings of the 2016 International Conference on Supercomputing, 2016

A Synchronization-Free Algorithm for Parallel Sparse Triangular Solves.
Proceedings of the Euro-Par 2016: Parallel Processing, 2016

Critical points based register-concurrency autotuning for GPUs.
Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition, 2016

2015
Correlation ratio based volume image registration on GPUs.
Microprocess. Microsystems, 2015

Adaptive and transparent cache bypassing for GPUs.
Proceedings of the International Conference for High Performance Computing, 2015

Fine-Grained Synchronizations and Dataflow Programming on GPUs.
Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

Transit: A Visual Analytical Model for Multithreaded Machines.
Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing, 2015

A Locality Aware Convolutional Neural Networks Accelerator.
Proceedings of the 2015 Euromicro Conference on Digital System Design, 2015

Accelerating non-volatile/hybrid processor cache design space exploration for application specific embedded systems.
Proceedings of the 20th Asia and South Pacific Design Automation Conference, 2015

2014
A heterogeneous platform with GPU and FPGA for power efficient high performance computing.
Proceedings of the 2014 International Symposium on Integrated Circuits (ISIC), 2014

Accelerating Volume Image Registration through Correlation Ratio Based Methods on GPUs.
Proceedings of the 17th Euromicro Conference on Digital System Design, 2014

