Ang Li
ORCID: 0000-0003-3734-9137
Affiliations:
- Pacific Northwest National Laboratory, Richland, WA, USA
According to our database, Ang Li authored at least 142 papers between 2014 and 2024.
Links
Online presence:
- pnnl.gov
- linkedin.com
- orcid.org
- angliphd.com
Bibliography
2024
Acceleration of Graph Neural Network-Based Prediction Models in Chemistry via Co-Design Optimization on Intelligence Processing Units.
J. Chem. Inf. Model., March, 2024
Quantum-centric supercomputing for materials science: A perspective on challenges and future directions.
Future Gener. Comput. Syst., 2024
A GPU accelerated mixed-precision Finite Difference informed Random Walker (FDiRW) solver for strongly inhomogeneous diffusion problems.
CoRR, 2024
Diff-PIC: Revolutionizing Particle-In-Cell Simulation for Advancing Nuclear Fusion with Diffusion Models.
CoRR, 2024
Scalable Circuit Cutting and Scheduling in a Resource-constrained and Distributed Quantum System.
CoRR, 2024
A Quantum-Classical Collaborative Training Architecture Based on Quantum State Fidelity.
CoRR, 2024
A GPU accelerated mixed-precision Smoothed Particle Hydrodynamics framework with cell-based relative coordinates.
CoRR, 2024
Proceedings of the Companion of the 15th ACM/SPEC International Conference on Performance Engineering, 2024
OPER: Optimality-Guided Embedding Table Parallelization for Large-scale Recommendation Model.
Proceedings of the 2024 USENIX Annual Technical Conference, 2024
DS-GL: Advancing Graph Learning via Harnessing Nature's Power within Scalable Dynamical Systems.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024
SmartFuse: Reconfigurable Smart Switches to Accelerate Fused Collectives in HPC Applications.
Proceedings of the 38th ACM International Conference on Supercomputing, 2024
Proceedings of the Twelfth International Conference on Learning Representations, 2024
QUAPPROX: A Framework for Benchmarking the Approximability of Variational Quantum Circuit.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024
Proceedings of the 24th IEEE International Symposium on Cluster, Cloud and Internet Computing, 2024
FTTN: Feature-Targeted Testing for Numerical Properties of NVIDIA & AMD Matrix Accelerators.
Proceedings of the 24th IEEE International Symposium on Cluster, Cloud and Internet Computing, 2024
RAP: Resource-aware Automated GPU Sharing for Multi-GPU Recommendation Model Training and Input Preprocessing.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024
2023
Accelerating matrix-centric graph processing on GPUs through bit-level optimizations.
J. Parallel Distributed Comput., July, 2023
Dissecting Tensor Cores via Microbenchmarks: Latency, Throughput and Numeric Behaviors.
IEEE Trans. Parallel Distributed Syst., 2023
IEEE Trans. Cloud Comput., 2023
CoRR, 2023
Machine Learning Automated Approach for Enormous Synchrotron X-Ray Diffraction Data Interpretation.
CoRR, 2023
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, 2023
FASDA: An FPGA-Aided, Scalable and Distributed Accelerator for Range-Limited Molecular Dynamics.
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2023
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, 2023
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, 2023
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, 2023
A Novel Spatial-Temporal Variational Quantum Circuit to Enable Deep Learning on NISQ Devices.
Proceedings of the IEEE International Conference on Quantum Computing and Engineering, 2023
MGG: Accelerating Graph Neural Networks with Fine-Grained Intra-Kernel Communication-Computation Pipelining on Multi-GPU Platforms.
Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, 2023
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023
Q-BEEP: Quantum Bayesian Error Mitigation Employing Poisson Modeling over the Hamming Spectrum.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023
Proceedings of the 37th International Conference on Supercomputing, 2023
Software-Hardware Co-design of Heterogeneous SmartNIC System for Recommendation Models Inference and Training.
Proceedings of the 37th International Conference on Supercomputing, 2023
BitGNN: Unleashing the Performance Potential of Binary Graph Neural Networks on GPUs.
Proceedings of the 37th International Conference on Supercomputing, 2023
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
Design and Evaluation of GPU-FPX: A Low-Overhead tool for Floating-Point Exception Detection in NVIDIA GPUs.
Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing, 2023
A Pulse Generation Framework with Augmented Program-aware Basis Gates and Criticality Analysis.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023
ML-CGRA: An Integrated Compilation Framework to Enable Efficient Machine Learning Acceleration on CGRAs.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023
Ising-CF: A Pathbreaking Collaborative Filtering Method Through Efficient Ising Machine Learning.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023
Proceedings of the IEEE International Conference on Big Data, 2023
Ising-Traffic: Using Ising Machine Learning to Predict Traffic Congestion under Uncertainty.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023
2022
Extreme Acceleration of Graph Neural Network-based Prediction Models for Quantum Chemistry.
CoRR, 2022
Empowering GNNs with Fine-grained Communication-Computation Pipelining on Multi-GPU Platforms.
CoRR, 2022
CollComm: Enabling Efficient Collective Quantum Communication Based on EPR buffering.
CoRR, 2022
CoRR, 2022
GMI-DRL: Empowering Multi-GPU Deep Reinforcement Learning with GPU Spatial Multiplexing.
CoRR, 2022
GAAF: Searching Activation Functions for Binary Neural Networks through Genetic Algorithm.
CoRR, 2022
Dissecting Tensor Cores via Microbenchmarks: Latency, Throughput and Numerical Behaviors.
CoRR, 2022
BNS-GCN: Efficient Full-Graph Training of Graph Convolutional Networks with Partition-Parallelism and Random Boundary Node Sampling.
Proceedings of the Fifth Conference on Machine Learning and Systems, 2022
Proceedings of the Fifth Conference on Machine Learning and Systems, 2022
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022
Benchmarking Quantum Processor Performance through Quantum Distance Metrics Over An Algorithm Suite.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022
Improving Variational Quantum Algorithms performance through Weighted Quantum Ensembles.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022
Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022
Proceedings of the IEEE International Performance, Computing, and Communications Conference, 2022
Quantum Noise in the Flow of Time: A Temporal Study of the Noise in Quantum Computers.
Proceedings of the 28th IEEE International Symposium on On-Line Testing and Robust System Design, 2022
QuCNN: A Quantum Convolutional Neural Network with Entanglement Based Backpropagation.
Proceedings of the 7th IEEE/ACM Symposium on Edge Computing, 2022
Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022
CEAZ: accelerating parallel I/O via hardware-algorithm co-designed adaptive lossy compression.
Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022
GCoD: Graph Convolutional Network Acceleration via Dedicated Algorithm and Accelerator Co-Design.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022
SO(DA)²: End-to-end Generation of Specialized Reconfigurable Architectures (Invited Talk).
Proceedings of the 13th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 11th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms, 2022
Proceedings of the 12th IEEE/ACM Workshop on Fault Tolerance for HPC at eXtreme Scale, 2022
Proceedings of the 32nd International Conference on Field-Programmable Logic and Applications, 2022
Proceedings of the 32nd International Conference on Field-Programmable Logic and Applications, 2022
Proceedings of the 30th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2022
A length adaptive algorithm-hardware co-design of transformer on FPGA through sparse attention and dynamic pipelining.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022
Efficient Hierarchical State Vector Simulation of Quantum Circuits via Acyclic Graph Partitioning.
Proceedings of the IEEE International Conference on Cluster Computing, 2022
2021
ARENA: Asynchronous Reconfigurable Accelerator Ring to Enable Data-Centric Parallel Computing.
IEEE Trans. Parallel Distributed Syst., 2021
IEEE Trans. Parallel Distributed Syst., 2021
O3BNN-R: An Out-of-Order Architecture for High-Performance and Regularized BNN Inference.
IEEE Trans. Parallel Distributed Syst., 2021
Optimizing FPGA-based Accelerator Design for Large-Scale Molecular Similarity Search.
CoRR, 2021
CEAZ: Accelerating Parallel I/O via Hardware-Algorithm Co-Design of Efficient and Adaptive Lossy Compression.
CoRR, 2021
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2021
APNN-TC: accelerating arbitrary precision neural networks on ampere GPU tensor cores.
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2021
Proceedings of the IEEE International Conference on Quantum Computing and Engineering, 2021
I-GCN: A Graph Convolutional Network Accelerator with Runtime Locality Enhancement through Islandization.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021
Accelerating Transformer-based Deep Learning Models on FPGAs using Column Balanced Block Pruning.
Proceedings of the 22nd International Symposium on Quality Electronic Design, 2021
Proceedings of the IEEE International Performance, Computing, and Communications Conference, 2021
Proceedings of the ICPP 2021: 50th International Conference on Parallel Processing, Lemont, IL, USA, August 9, 2021
DynPaC: Coarse-Grained, Dynamic, and Partially Reconfigurable Array for Streaming Applications.
Proceedings of the 39th IEEE International Conference on Computer Design, 2021
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2021
Optimizing FPGA-based Accelerator Design for Large-Scale Molecular Similarity Search (Special Session Paper).
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2021
FL-DISCO: Federated Generative Adversarial Network for Graph-based Molecule Drug Discovery: Special Session Paper.
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2021
Proceedings of the 2021 IEEE High Performance Extreme Computing Conference, 2021
Proceedings of the 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2021
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2021
Proceedings of the 5th IEEE/ACM International Workshop on Software Correctness for HPC Applications, 2021
Proceedings of the 32nd IEEE International Conference on Application-specific Systems, Architectures and Processors, 2021
Proceedings of the 32nd IEEE International Conference on Application-specific Systems, Architectures and Processors, 2021
2020
IEEE Trans. Parallel Distributed Syst., 2020
IEEE Trans. Computers, 2020
ARENA: Asynchronous Reconfigurable Accelerator Ring to Enable Data-Centric Parallel Computing.
CoRR, 2020
Density matrix quantum circuit simulation via the BSP machine on modern GPU clusters.
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2020
Proceedings of the PPoPP '20: 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2020
AWB-GCN: A Graph Convolutional Network Accelerator with Runtime Workload Rebalancing.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020
Proceedings of the IEEE International Symposium on Workload Characterization, 2020
CSB-RNN: a faster-than-realtime RNN acceleration framework with compressed structured blocks.
Proceedings of the ICS '20: 2020 International Conference on Supercomputing, 2020
Proceedings of the ICPP 2020: 49th International Conference on Parallel Processing, 2020
OpenCGRA: An Open-Source Unified Framework for Modeling, Testing, and Evaluating CGRAs.
Proceedings of the 38th IEEE International Conference on Computer Design, 2020
Proceedings of the 2020 IEEE High Performance Extreme Computing Conference, 2020
On the Feasibility of Using Reduced-Precision Tensor Core Operations for Graph Analytics.
Proceedings of the 2020 IEEE High Performance Extreme Computing Conference, 2020
Indicator-Directed Dynamic Power Management for Iterative Workloads on GPU-Accelerated Systems.
Proceedings of the 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, 2020
2019
UWB-GCN: Hardware Acceleration of Graph-Convolution-Network through Runtime Workload Rebalancing.
CoRR, 2019
A Scalable Framework for Acceleration of CNN Training on Deeply-Pipelined FPGA Clusters with Weight and Workload Balancing.
CoRR, 2019
CCF Trans. High Perform. Comput., 2019
BSTC: a novel binarized-soft-tensor-core design for accelerating bit-based approximated neural nets.
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2019
Proceedings of the IEEE International Symposium on Workload Characterization, 2019
O3BNN: an out-of-order architecture for high-performance binarized neural network inference with fine-grained pruning.
Proceedings of the ACM International Conference on Supercomputing, 2019
PIM-VR: Erasing Motion Anomalies In Highly-Interactive Virtual Reality World with Customized Memory Cube.
Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019
Proceedings of the 30th IEEE International Conference on Application-specific Systems, Architectures and Processors, 2019
2018
Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018
Proceedings of the 2018 IEEE International Symposium on Workload Characterization, 2018
Proceedings of the 32nd International Conference on Supercomputing, 2018
Proceedings of the 2018 International Symposium on Code Generation and Optimization, 2018
2017
Fast synchronization-free algorithms for parallel sparse triangular solves with multiple right-hand sides.
Concurr. Comput. Pract. Exp., 2017
Exploring and analyzing the real impact of modern on-package memory on HPC scientific kernels.
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2017
BVF: enabling significant on-chip power savings via bit-value-favor for throughput processors.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017
Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017
Proceedings of the 12th IEEE International Conference on ASIC, 2017
2016
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016
Proceedings of the 2016 International Conference on Supercomputing, 2016
Proceedings of the Euro-Par 2016: Parallel Processing, 2016
Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition, 2016
2015
Microprocess. Microsystems, 2015
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2015
Proceedings of the 29th ACM on International Conference on Supercomputing, 2015
Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing, 2015
Proceedings of the 2015 Euromicro Conference on Digital System Design, 2015
Accelerating non-volatile/hybrid processor cache design space exploration for application specific embedded systems.
Proceedings of the 20th Asia and South Pacific Design Automation Conference, 2015
2014
A heterogeneous platform with GPU and FPGA for power efficient high performance computing.
Proceedings of the 2014 International Symposium on Integrated Circuits (ISIC), 2014
Accelerating Volume Image Registration through Correlation Ratio Based Methods on GPUs.
Proceedings of the 17th Euromicro Conference on Digital System Design, 2014