Zeke Wang

ORCID: 0000-0001-8550-9241

According to our database, Zeke Wang authored at least 59 papers between 1984 and 2024.

Bibliography

2024
Staleness-Reduction Mini-Batch K-Means.
IEEE Trans. Neural Networks Learn. Syst., October, 2024

AIbench: a tool for benchmarking Huawei Ascend AI processors.
CCF Trans. High Perform. Comput., April, 2024

SparseACC: A Generalized Linear Model Accelerator for Sparse Datasets.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., March, 2024

LuWu: An End-to-End In-Network Out-of-Core Optimizer for 100B-Scale Model-in-Network Data-Parallel Training on Distributed GPUs.
CoRR, 2024

TorchGT: A Holistic System for Large-scale Graph Transformer Training.
CoRR, 2024

DeFT: Flash Tree-attention with IO-Awareness for Efficient Tree-search-based LLM Inference.
CoRR, 2024

Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU.
CoRR, 2024

Demystifying Datapath Accelerator Enhanced Off-path SmartNIC.
CoRR, 2024

Understanding Routable PCIe Performance for Composable Infrastructures.
Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, 2024

DmRPC: Disaggregated Memory-aware Datacenter RPC for Data-intensive Applications.
Proceedings of the 40th IEEE International Conference on Data Engineering, 2024

2023
P4SGD: Programmable Switch Enhanced Model-Parallel Training on Generalized Linear Models on Distributed FPGAs.
IEEE Trans. Parallel Distributed Syst., August, 2023

cuZK: Accelerating Zero-Knowledge Proof with A Faster Parallel Multi-Scalar Multiplication Algorithm on GPUs.
IACR Trans. Cryptogr. Hardw. Embed. Syst., 2023

Helios: An Efficient Out-of-core GNN Training System on Terabyte-scale Graphs with In-memory Performance.
CoRR, 2023

PyHGL: A Python-based Hardware Generation Language Framework.
CoRR, 2023

Legion: Automatically Pushing the Envelope of Multi-GPU System for Billion-Scale GNN Training.
Proceedings of the 2023 USENIX Annual Technical Conference, 2023

Achelous: Enabling Programmability, Elasticity, and Reliability in Hyperscale Cloud Networks.
Proceedings of the ACM SIGCOMM 2023 Conference, 2023

SmartDS: Middle-Tier-centric SmartNIC Enabling Application-aware Message Split for Disaggregated Block Storage.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

BM-Store: A Transparent and High-performance Local Storage Architecture for Bare-metal Clouds Enabling Large-scale Deployment.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

SSiMD: Supporting Six Signed Multiplications in a DSP Block for Low-Precision CNN on FPGAs.
Proceedings of the International Conference on Field Programmable Technology, 2023

MARS: Exploiting Multi-Level Parallelism for DNN Workloads on Adaptive Multi-Accelerator Systems.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

2022
Parallel and Distributed Structured SVM Training.
IEEE Trans. Parallel Distributed Syst., 2022

Shuhai: A Tool for Benchmarking High Bandwidth Memory on FPGAs.
IEEE Trans. Computers, 2022

cuZK: Accelerating Zero-Knowledge Proof with A Faster Parallel Multi-Scalar Multiplication Algorithm on GPUs.
IACR Cryptol. ePrint Arch., 2022

FpgaNIC: An FPGA-based Versatile 100Gb SmartNIC for GPUs.
Proceedings of the 2022 USENIX Annual Technical Conference, 2022

Terminator on SkyNet: a practical DVFS attack on DNN hardware IP for UAV object detection.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

Multi-objective Meta-return Reinforcement Learning for Sequential Recommendation.
Proceedings of the Artificial Intelligence - Second CAAI International Conference, 2022

2021
Understanding and Optimizing Conjunctive Predicates Under Memory-Efficient Storage Layouts.
IEEE Trans. Knowl. Data Eng., 2021

ScalaBFS: A Scalable BFS Accelerator on HBM-Enhanced FPGAs.
CoRR, 2021

Graph Sampling with Fast Random Walker on HBM-enabled FPGA Accelerators.
Proceedings of the 31st International Conference on Field-Programmable Logic and Applications, 2021

2020
Optimizing Memory Performance of Xilinx FPGAs under Vitis.
CoRR, 2020

Benchmarking High Bandwidth Memory on FPGAs.
CoRR, 2020

Boyi: A Systematic Framework for Automatically Deciding the Right Execution Model of OpenCL Applications on FPGAs.
Proceedings of the FPGA '20: The 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2020

BiS-KM: Enabling Any-Precision K-Means on FPGAs.
Proceedings of the FPGA '20: The 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2020

Shuhai: Benchmarking High Bandwidth Memory On FPGAs.
Proceedings of the 28th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2020

StRoM: smart remote memory.
Proceedings of the EuroSys '20: Fifteenth EuroSys Conference 2020, 2020

Tackling Hardware/Software co-design from a database perspective.
Proceedings of the 10th Conference on Innovative Data Systems Research, 2020

2019
Accelerating Generalized Linear Models with MLWeaving: A One-Size-Fits-All System for Any-precision Learning.
Proc. VLDB Endow., 2019

doppioDB 2.0: Hardware Techniques for Improved Integration of Machine Learning into Databases.
Proc. VLDB Endow., 2019

Accelerating Generalized Linear Models with MLWeaving: A One-Size-Fits-All System for Any-precision Learning (Technical Report).
CoRR, 2019

DPI: The Data Processing Interface for Modern Networks.
Proceedings of the 9th Biennial Conference on Innovative Data Systems Research, 2019

2018
G-NET: Effective GPU Sharing in NFV Systems.
Proceedings of the 15th USENIX Symposium on Networked Systems Design and Implementation, 2018

Hebe: An Order-Oblivious and High-Performance Execution Scheme for Conjunctive Predicates.
Proceedings of the 34th IEEE International Conference on Data Engineering, 2018

2017
Multikernel Data Partitioning With Channel on OpenCL-Based FPGAs.
IEEE Trans. Very Large Scale Integr. Syst., 2017

Design and FPGA Implementation of a Reconfigurable Digital Down Converter for Wideband Applications.
IEEE Trans. Very Large Scale Integr. Syst., 2017

FPGA implementation of a reconfigurable channelization for simultaneous multichannel DRM30/FM receiver.
IEEE Trans. Consumer Electron., 2017

2016
Design and FPGA Implementation of a Reconfigurable 1024-Channel Channelization Architecture for SDR Application.
IEEE Trans. Very Large Scale Integr. Syst., 2016

Melia: A MapReduce Framework on OpenCL-Based FPGAs.
IEEE Trans. Parallel Distributed Syst., 2016

A performance analysis framework for optimizing OpenCL applications on FPGAs.
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

Relational query processing on OpenCL-based FPGAs.
Proceedings of the 26th International Conference on Field Programmable Logic and Applications, 2016

Accelerating Database Query Processing on OpenCL-based FPGAs (Abstract Only).
Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2016

2015
A Combined SDC-SDF Architecture for Normal I/O Pipelined Radix-2 FFT.
IEEE Trans. Very Large Scale Integr. Syst., 2015

A study of data partitioning on OpenCL-based FPGAs.
Proceedings of the 25th International Conference on Field Programmable Logic and Applications, 2015

Improving Data Partitioning Performance on OpenCL-Based FPGAs.
Proceedings of the 23rd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2015

2014
High-speed, fixed-latency serial links with Xilinx FPGAs.
J. Zhejiang Univ. Sci. C, 2014

2013
Efficient Utilization of Vector Registers to Improve FFT Performance on SIMD Microprocessors.
IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2013

Block Processor: A resource-distributed architecture.
Proceedings of the IEEE High Performance Extreme Computing Conference, 2013

2011
An efficient radix-2 fast Fourier transform processor with ganged butterfly engines on field programmable gate arrays.
J. Zhejiang Univ. Sci. C, 2011

A pipelined architecture for normal I/O order FFT.
J. Zhejiang Univ. Sci. C, 2011

1984
On the cost of computing roots of polynomials.
Math. Program., 1984

