Tong Geng
Orcid: 0000-0002-3644-2922
According to our database1,
Tong Geng
authored at least 100 papers
between 2016 and 2024.
Collaborative distances:
Collaborative distances:
Timeline
Legend:
Book In proceedings Article PhD thesis Dataset OtherLinks
On csauthors.net:
Bibliography
2024
Diff-PIC: Revolutionizing Particle-In-Cell Simulation for Advancing Nuclear Fusion with Diffusion Models.
CoRR, 2024
Accelerating Communication in Deep Learning Recommendation Model Training with Dual-Level Adaptive Lossy Compression.
CoRR, 2024
Accurate and Data-Efficient Micro-XRD Phase Identification Using Multi-Task Learning: Application to Hydrothermal Fluids.
CoRR, 2024
Briefings Bioinform., 2024
Proceedings of the Companion of the 15th ACM/SPEC International Conference on Performance Engineering, 2024
DS-GL: Advancing Graph Learning via Harnessing Nature's Power within Scalable Dynamical Systems.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024
Proceedings of the International Joint Conference on Neural Networks, 2024
SmartFuse: Reconfigurable Smart Switches to Accelerate Fused Collectives in HPC Applications.
Proceedings of the 38th ACM International Conference on Supercomputing, 2024
Proceedings of the Forty-first International Conference on Machine Learning, 2024
Proceedings of the Twelfth International Conference on Learning Representations, 2024
2023
Dissecting Tensor Cores via Microbenchmarks: Latency, Throughput and Numeric Behaviors.
IEEE Trans. Parallel Distributed Syst., 2023
Machine Learning Automated Approach for Enormous Synchrotron X-Ray Diffraction Data Interpretation.
CoRR, 2023
RRNet: Towards ReLU-Reduced Neural Network for Two-party Computation Based Private Inference.
CoRR, 2023
FASDA: An FPGA-Aided, Scalable and Distributed Accelerator for Range-Limited Molecular Dynamics.
Proceedings of the International Conference for High Performance Computing, 2023
MGG: Accelerating Graph Neural Networks with Fine-Grained Intra-Kernel Communication-Computation Pipelining on Multi-GPU Platforms.
Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, 2023
LinGCN: Structural Linearized Graph Convolutional Network for Homomorphically Encrypted Inference.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
Supporting Energy-based Learning with an Ising Machine substrate: a Case Study on RBM.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023
Proceedings of the 37th International Conference on Supercomputing, 2023
Software-Hardware Co-design of Heterogeneous SmartNIC System for Recommendation Models Inference and Training.
Proceedings of the 37th International Conference on Supercomputing, 2023
Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 2023
PASNet: Polynomial Architecture Search Framework for Two-party Computation-based Secure Neural Network Deployment.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023
ML-CGRA: An Integrated Compilation Framework to Enable Efficient Machine Learning Acceleration on CGRAs.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023
Ising-CF: A Pathbreaking Collaborative Filtering Method Through Efficient Ising Machine Learning.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Ising-Traffic: Using Ising Machine Learning to Predict Traffic Congestion under Uncertainty.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023
2022
IEEE Trans. Geosci. Remote. Sens., 2022
Comput. Geosci., 2022
Empowering GNNs with Fine-grained Communication-Computation Pipelining on Multi-GPU Platforms.
CoRR, 2022
GMI-DRL: Empowering Multi-GPU Deep Reinforcement Learning with GPU Spatial Multiplexing.
CoRR, 2022
GAAF: Searching Activation Functions for Binary Neural Networks through Genetic Algorithm.
CoRR, 2022
Dissecting Tensor Cores via Microbenchmarks: Latency, Throughput and Numerical Behaviors.
CoRR, 2022
Concurr. Comput. Pract. Exp., 2022
Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022
CEAZ: accelerating parallel I/O via hardware-algorithm co-designed adaptive lossy compression.
Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022
Proceedings of the IEEE 40th International Conference on Computer Design, 2022
CoDG-ReRAM: An Algorithm-Hardware Co-design to Accelerate Semi-Structured GNNs on ReRAM.
Proceedings of the IEEE 40th International Conference on Computer Design, 2022
On the Design of Quantum Graph Convolutional Neural Network in the NISQ-Era and Beyond.
Proceedings of the IEEE 40th International Conference on Computer Design, 2022
Proceedings of the IEEE 40th International Conference on Computer Design, 2022
The Viability of Using Online Prediction to Perform Extra Work while Executing BSP Applications.
Proceedings of the IEEE High Performance Extreme Computing Conference, 2022
GCoD: Graph Convolutional Network Acceleration via Dedicated Algorithm and Accelerator Co-Design.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022
Optimized Mappings for Symmetric Range-Limited Molecular Force Calculations on FPGAs.
Proceedings of the 32nd International Conference on Field-Programmable Logic and Applications, 2022
Proceedings of the 32nd International Conference on Field-Programmable Logic and Applications, 2022
Proceedings of the 32nd International Conference on Field-Programmable Logic and Applications, 2022
Proceedings of the 30th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2022
A length adaptive algorithm-hardware co-design of transformer on FPGA through sparse attention and dynamic pipelining.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022
2021
ARENA: Asynchronous Reconfigurable Accelerator Ring to Enable Data-Centric Parallel Computing.
IEEE Trans. Parallel Distributed Syst., 2021
O3BNN-R: An Out-of-Order Architecture for High-Performance and Regularized BNN Inference.
IEEE Trans. Parallel Distributed Syst., 2021
DEM Generation with ICESat-2 Altimetry Data for the Three Antarctic Ice Shelves: Ross, Filchner-Ronne and Amery.
Remote. Sens., 2021
Optimizing FPGA-based Accelerator Design for Large-Scale Molecular Similarity Search.
CoRR, 2021
CEAZ: Accelerating Parallel I/O via Hardware-Algorithm Co-Design of Efficient and Adaptive Lossy Compression.
CoRR, 2021
APNN-TC: accelerating arbitrary precision neural networks on ampere GPU tensor cores.
Proceedings of the International Conference for High Performance Computing, 2021
I-GCN: A Graph Convolutional Network Accelerator with Runtime Locality Enhancement through Islandization.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021
Accelerating Transformer-based Deep Learning Models on FPGAs using Column Balanced Block Pruning.
Proceedings of the 22nd International Symposium on Quality Electronic Design, 2021
DynPaC: Coarse-Grained, Dynamic, and Partially Reconfigurable Array for Streaming Applications.
Proceedings of the 39th IEEE International Conference on Computer Design, 2021
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2021
Optimizing FPGA-based Accelerator Design for Large-Scale Molecular Similarity Search (Special Session Paper).
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2021
FL-DISCO: Federated Generative Adversarial Network for Graph-based Molecule Drug Discovery: Special Session Paper.
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2021
Proceedings of the 2021 IEEE High Performance Extreme Computing Conference, 2021
Workload Imbalance in HPC Applications: Effect on Performance of In-Network Processing.
Proceedings of the 2021 IEEE High Performance Extreme Computing Conference, 2021
Proceedings of the 2021 IEEE High Performance Extreme Computing Conference, 2021
Proceedings of the 29th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2021
Proceedings of the 32nd IEEE International Conference on Application-specific Systems, 2021
Proceedings of the 32nd IEEE International Conference on Application-specific Systems, 2021
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021
2020
IEEE Trans. Computers, 2020
Estimating Arctic Sea Ice Thickness with CryoSat-2 Altimetry Data Using the Least Squares Adjustment Method.
Sensors, 2020
AWB-GCN: A Graph Convolutional Network Accelerator with Runtime Workload Rebalancing.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020
CSB-RNN: a faster-than-realtime RNN acceleration framework with compressed structured blocks.
Proceedings of the ICS '20: 2020 International Conference on Supercomputing, 2020
A Reconfigurable Compute-in-the-Network FPGA Assistant for High-Level Collective Support with Distributed Matrix Multiply Case Study.
Proceedings of the International Conference on Field-Programmable Technology, 2020
Proceedings of the 2020 IEEE High Performance Extreme Computing Conference, 2020
Proceedings of the 2020 IEEE High Performance Extreme Computing Conference, 2020
Proceedings of the 2020 IEEE High Performance Extreme Computing Conference, 2020
Proceedings of the 28th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2020
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020
2019
UWB-GCN: Hardware Acceleration of Graph-Convolution-Network through Runtime Workload Rebalancing.
CoRR, 2019
A Scalable Framework for Acceleration of CNN Training on Deeply-Pipelined FPGA Clusters with Weight and Workload Balancing.
CoRR, 2019
Proceedings of the International Conference for High Performance Computing, 2019
BSTC: a novel binarized-soft-tensor-core design for accelerating bit-based approximated neural nets.
Proceedings of the International Conference for High Performance Computing, 2019
O3BNN: an out-of-order architecture for high-performance binarized neural network inference with fine-grained pruning.
Proceedings of the ACM International Conference on Supercomputing, 2019
Proceedings of the 27th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2019
Proceedings of the 27th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2019
Proceedings of the 30th IEEE International Conference on Application-specific Systems, 2019
Accelerating AP3M-Based Computational Astrophysics Simulations with Reconfigurable Clusters.
Proceedings of the 30th IEEE International Conference on Application-specific Systems, 2019
Proceedings of the 30th IEEE International Conference on Application-specific Systems, 2019
2018
Proceedings of the 2018 IEEE High Performance Extreme Computing Conference, 2018
An Access-Pattern-Aware On-Chip Vector Memory System with Automatic Loading for SIMD Architectures.
Proceedings of the 2018 IEEE High Performance Extreme Computing Conference, 2018
A Framework for Acceleration of CNN Training on Deeply-Pipelined FPGA Clusters with Work and Weight Load Balancing.
Proceedings of the 28th International Conference on Field Programmable Logic and Applications, 2018
Proceedings of the 26th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2018
2016
Proceedings of the International Conference on Embedded Computer Systems: Architectures, 2016
Proceedings of the 2016 Euromicro Conference on Digital System Design, 2016