Tong Geng

Orcid: 0000-0002-3644-2922

According to our database, Tong Geng authored at least 103 papers between 2016 and 2024.

Bibliography

2024
FPGA-Accelerated Range-Limited Molecular Dynamics.
IEEE Trans. Computers, June, 2024

Visual Fourier Prompt Tuning.
CoRR, 2024

Diff-PIC: Revolutionizing Particle-In-Cell Simulation for Advancing Nuclear Fusion with Diffusion Models.
CoRR, 2024

Inertial Confinement Fusion Forecasting via LLMs.
CoRR, 2024

Accelerating Communication in Deep Learning Recommendation Model Training with Dual-Level Adaptive Lossy Compression.
CoRR, 2024

Accurate and Data-Efficient Micro-XRD Phase Identification Using Multi-Task Learning: Application to Hydrothermal Fluids.
CoRR, 2024

A systematic evaluation of computational methods for cell segmentation.
Briefings Bioinform., 2024

Evaluating Emerging AI/ML Accelerators: IPU, RDU, and NVIDIA/AMD GPUs.
Proceedings of the Companion of the 15th ACM/SPEC International Conference on Performance Engineering, 2024

Accelerating Communication in Deep Learning Recommendation Model Training with Dual-Level Adaptive Lossy Compression.
Proceedings of the International Conference for High Performance Computing, 2024

Bridging the Gap Between LLMs and LNS with Dynamic Data Format and Architecture Codesign.
Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024

DS-GL: Advancing Graph Learning via Harnessing Nature's Power within Scalable Dynamical Systems.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

Semi-supervised Crowd Counting Based on Hard Pseudo-labels.
Proceedings of the International Joint Conference on Neural Networks, 2024

SmartFuse: Reconfigurable Smart Switches to Accelerate Fused Collectives in HPC Applications.
Proceedings of the 38th ACM International Conference on Supercomputing, 2024

Prototypical Transformer As Unified Motion Learners.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Extending Power of Nature from Binary to Real-Valued Graph Learning in Real World.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023
Dissecting Tensor Cores via Microbenchmarks: Latency, Throughput and Numeric Behaviors.
IEEE Trans. Parallel Distributed Syst., 2023

ClusterFormer: Clustering As A Universal Visual Learner.
CoRR, 2023

Machine Learning Automated Approach for Enormous Synchrotron X-Ray Diffraction Data Interpretation.
CoRR, 2023

RRNet: Towards ReLU-Reduced Neural Network for Two-party Computation Based Private Inference.
CoRR, 2023

FASDA: An FPGA-Aided, Scalable and Distributed Accelerator for Range-Limited Molecular Dynamics.
Proceedings of the International Conference for High Performance Computing, 2023

MGG: Accelerating Graph Neural Networks with Fine-Grained Intra-Kernel Communication-Computation Pipelining on Multi-GPU Platforms.
Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, 2023

LinGCN: Structural Linearized Graph Convolutional Network for Homomorphically Encrypted Inference.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

ClusterFormer: Clustering As A Universal Visual Learner.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Supporting Energy-based Learning with an Ising Machine substrate: a Case Study on RBM.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

FLASH: FPGA-Accelerated Smart Switches with GCN Case Study.
Proceedings of the 37th International Conference on Supercomputing, 2023

Software-Hardware Co-design of Heterogeneous SmartNIC System for Recommendation Models Inference and Training.
Proceedings of the 37th International Conference on Supercomputing, 2023

Accel-GCN: High-Performance GPU Accelerator Design for Graph Convolution Networks.
Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 2023

PASNet: Polynomial Architecture Search Framework for Two-party Computation-based Secure Neural Network Deployment.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

ML-CGRA: An Integrated Compilation Framework to Enable Efficient Machine Learning Acceleration on CGRAs.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

Ising-CF: A Pathbreaking Collaborative Filtering Method Through Efficient Ising Machine Learning.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

TransFlow: Transformer as Flow Learner.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Ising-Traffic: Using Ising Machine Learning to Predict Traffic Congestion under Uncertainty.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Arctic Sea Ice Freeboard Estimation and Variations From Operation IceBridge.
IEEE Trans. Geosci. Remote. Sens., 2022

An improved algorithm for extracting crossovers of satellite ground tracks.
Comput. Geosci., 2022

Empowering GNNs with Fine-grained Communication-Computation Pipelining on Multi-GPU Platforms.
CoRR, 2022

GMI-DRL: Empowering Multi-GPU Deep Reinforcement Learning with GPU Spatial Multiplexing.
CoRR, 2022

GAAF: Searching Activation Functions for Binary Neural Networks through Genetic Algorithm.
CoRR, 2022

Dissecting Tensor Cores via Microbenchmarks: Latency, Throughput and Numerical Behaviors.
CoRR, 2022

Reconfigurable switches for high performance and flexible MPI collectives.
Concurr. Comput. Pract. Exp., 2022

ASAP: automatic synthesis of area-efficient and precision-aware CGRAs.
Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022

CEAZ: accelerating parallel I/O via hardware-algorithm co-designed adaptive lossy compression.
Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022

Towards Sparsification of Graph Neural Networks.
Proceedings of the IEEE 40th International Conference on Computer Design, 2022

CoDG-ReRAM: An Algorithm-Hardware Co-design to Accelerate Semi-Structured GNNs on ReRAM.
Proceedings of the IEEE 40th International Conference on Computer Design, 2022

On the Design of Quantum Graph Convolutional Neural Network in the NISQ-Era and Beyond.
Proceedings of the IEEE 40th International Conference on Computer Design, 2022

Towards Real-Time Temporal Graph Learning.
Proceedings of the IEEE 40th International Conference on Computer Design, 2022

The Viability of Using Online Prediction to Perform Extra Work while Executing BSP Applications.
Proceedings of the IEEE High Performance Extreme Computing Conference, 2022

GCoD: Graph Convolutional Network Acceleration via Dedicated Algorithm and Accelerator Co-Design.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

DRIPS: Dynamic Rebalancing of Pipelined Streaming Applications on CGRAs.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

Optimized Mappings for Symmetric Range-Limited Molecular Force Calculations on FPGAs.
Proceedings of the 32nd International Conference on Field-Programmable Logic and Applications, 2022

A Framework for Neural Network Inference on FPGA-Centric SmartNICs.
Proceedings of the 32nd International Conference on Field-Programmable Logic and Applications, 2022

H-GCN: A Graph Convolutional Network Accelerator on Versal ACAP Architecture.
Proceedings of the 32nd International Conference on Field-Programmable Logic and Applications, 2022

FCsN: A FPGA-Centric SmartNIC Framework for Neural Networks.
Proceedings of the 30th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2022

A length adaptive algorithm-hardware co-design of transformer on FPGA through sparse attention and dynamic pipelining.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

2021
BCNN: Binary complex neural network.
Microprocess. Microsystems, November, 2021

FPGA-based high-performance neural network acceleration.
PhD thesis, 2021

ARENA: Asynchronous Reconfigurable Accelerator Ring to Enable Data-Centric Parallel Computing.
IEEE Trans. Parallel Distributed Syst., 2021

O3BNN-R: An Out-of-Order Architecture for High-Performance and Regularized BNN Inference.
IEEE Trans. Parallel Distributed Syst., 2021

Arctic Sea Ice Freeboard Retrieval from Envisat Altimetry Data.
Remote. Sens., 2021

DEM Generation with ICESat-2 Altimetry Data for the Three Antarctic Ice Shelves: Ross, Filchner-Ronne and Amery.
Remote. Sens., 2021

Optimizing FPGA-based Accelerator Design for Large-Scale Molecular Similarity Search.
CoRR, 2021

Binary Complex Neural Network Acceleration on FPGA.
CoRR, 2021

CEAZ: Accelerating Parallel I/O via Hardware-Algorithm Co-Design of Efficient and Adaptive Lossy Compression.
CoRR, 2021

APNN-TC: accelerating arbitrary precision neural networks on ampere GPU tensor cores.
Proceedings of the International Conference for High Performance Computing, 2021

I-GCN: A Graph Convolutional Network Accelerator with Runtime Locality Enhancement through Islandization.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

Accelerating Transformer-based Deep Learning Models on FPGAs using Column Balanced Block Pruning.
Proceedings of the 22nd International Symposium on Quality Electronic Design, 2021

DynPaC: Coarse-Grained, Dynamic, and Partially Reconfigurable Array for Streaming Applications.
Proceedings of the 39th IEEE International Conference on Computer Design, 2021

G-CoS: GNN-Accelerator Co-Search Towards Both Better Accuracy and Efficiency.
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2021

Optimizing FPGA-based Accelerator Design for Large-Scale Molecular Similarity Search (Special Session Paper).
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2021

FL-DISCO: Federated Generative Adversarial Network for Graph-based Molecule Drug Discovery: Special Session Paper.
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2021

System-Level Modeling of GPU/FPGA Clusters for Molecular Dynamics Simulations.
Proceedings of the 2021 IEEE High Performance Extreme Computing Conference, 2021

Workload Imbalance in HPC Applications: Effect on Performance of In-Network Processing.
Proceedings of the 2021 IEEE High Performance Extreme Computing Conference, 2021

A Survey: Handling Irregularities in Neural Network Acceleration with FPGAs.
Proceedings of the 2021 IEEE High Performance Extreme Computing Conference, 2021

Upgrade of FPGA Range-Limited Molecular Dynamics to Handle Hundreds of Processors.
Proceedings of the 29th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2021

Binary Complex Neural Network Acceleration on FPGA: (Invited Paper).
Proceedings of the 32nd IEEE International Conference on Application-specific Systems, 2021

OpenCGRA: Democratizing Coarse-Grained Reconfigurable Arrays.
Proceedings of the 32nd IEEE International Conference on Application-specific Systems, 2021

Comparison Lift: Bandit-based Experimentation System for Online Advertising.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
FPDeep: Scalable Acceleration of CNN Training on Deeply-Pipelined FPGA Clusters.
IEEE Trans. Computers, 2020

Estimating Arctic Sea Ice Thickness with CryoSat-2 Altimetry Data Using the Least Squares Adjustment Method.
Sensors, 2020

AWB-GCN: A Graph Convolutional Network Accelerator with Runtime Workload Rebalancing.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

CSB-RNN: a faster-than-realtime RNN acceleration framework with compressed structured blocks.
Proceedings of the ICS '20: 2020 International Conference on Supercomputing, 2020

A Reconfigurable Compute-in-the-Network FPGA Assistant for High-Level Collective Support with Distributed Matrix Multiply Case Study.
Proceedings of the International Conference on Field-Programmable Technology, 2020

A Communication-Efficient Multi-Chip Design for Range-Limited Molecular Dynamics.
Proceedings of the 2020 IEEE High Performance Extreme Computing Conference, 2020

FPGAs in the Network and Novel Communicator Support Accelerate MPI Collectives.
Proceedings of the 2020 IEEE High Performance Extreme Computing Conference, 2020

CQNN: a CGRA-based QNN Framework.
Proceedings of the 2020 IEEE High Performance Extreme Computing Conference, 2020

FP-AMG: FPGA-Based Acceleration Framework for Algebraic Multigrid Solvers.
Proceedings of the 28th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2020

Online Evaluation of Audiences for Targeted Advertising via Bandit Experiments.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
UWB-GCN: Hardware Acceleration of Graph-Convolution-Network through Runtime Workload Rebalancing.
CoRR, 2019

Fully Integrated On-FPGA Molecular Dynamics Simulations.
CoRR, 2019

A Scalable Framework for Acceleration of CNN Training on Deeply-Pipelined FPGA Clusters with Weight and Workload Balancing.
CoRR, 2019

Fully integrated FPGA molecular dynamics simulations.
Proceedings of the International Conference for High Performance Computing, 2019

BSTC: a novel binarized-soft-tensor-core design for accelerating bit-based approximated neural nets.
Proceedings of the International Conference for High Performance Computing, 2019

O3BNN: an out-of-order architecture for high-performance binarized neural network inference with fine-grained pruning.
Proceedings of the ACM International Conference on Supercomputing, 2019

GhostSZ: A Transparent FPGA-Accelerated Lossy Compression Framework.
Proceedings of the 27th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2019

FP-AMR: A Reconfigurable Fabric Framework for Adaptive Mesh Refinement Applications.
Proceedings of the 27th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2019

Molecular Dynamics Range-Limited Force Evaluation Optimized for FPGAs.
Proceedings of the 30th IEEE International Conference on Application-specific Systems, 2019

Accelerating AP3M-Based Computational Astrophysics Simulations with Reconfigurable Clusters.
Proceedings of the 30th IEEE International Conference on Application-specific Systems, 2019

LP-BNN: Ultra-low-Latency BNN Inference with Layer Parallelism.
Proceedings of the 30th IEEE International Conference on Application-specific Systems, 2019

2018
Soft-Core, Multiple-Lane, FPGA-based ADCs for a Liquid Helium Environment.
Proceedings of the 2018 IEEE High Performance Extreme Computing Conference, 2018

An Access-Pattern-Aware On-Chip Vector Memory System with Automatic Loading for SIMD Architectures.
Proceedings of the 2018 IEEE High Performance Extreme Computing Conference, 2018

A Framework for Acceleration of CNN Training on Deeply-Pipelined FPGA Clusters with Work and Weight Load Balancing.
Proceedings of the 28th International Conference on Field Programmable Logic and Applications, 2018

FPDeep: Acceleration and Load Balancing of CNN Training on FPGA Clusters.
Proceedings of the 26th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2018

2016
A configurable SIMD architecture with explicit datapath for intelligent learning.
Proceedings of the International Conference on Embedded Computer Systems: Architectures, 2016

MacSim: A MAC-Enabled High-Performance Low-Power SIMD Architecture.
Proceedings of the 2016 Euromicro Conference on Digital System Design, 2016

