Yao Chen

Orcid: 0000-0002-5798-2282

Affiliations:
  • Advanced Digital Sciences Center, Illinois at Singapore, Singapore


According to our database, Yao Chen authored at least 46 papers between 2014 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Winols: A Large-Tiling Sparse Winograd CNN Accelerator on FPGAs.
ACM Trans. Archit. Code Optim., June, 2024

Aggressive Post-Training Compression on Extremely Large Language Models.
CoRR, 2024

Deep Feature Surgery: Towards Accurate and Efficient Multi-exit Networks.
Proceedings of the Computer Vision - ECCV 2024, 2024

2023
HongTu: Scalable Full-Graph GNN Training on Multiple GPUs.
Proc. ACM Manag. Data, December, 2023

NIOT: A Novel Inference Optimization of Transformers on Modern CPUs.
IEEE Trans. Parallel Distributed Syst., June, 2023

LightRW: FPGA Accelerated Graph Dynamic Random Walks.
Proc. ACM Manag. Data, 2023

Cybersecurity for Modern Smart Grid Against Emerging Threats.
Found. Trends Priv. Secur., 2023

HongTu: Scalable Full-Graph GNN Training on Multiple GPUs (via communication-optimized CPU data offloading).
CoRR, 2023

2022
ThunderGP: Resource-Efficient Graph Processing Framework on FPGAs with HLS.
ACM Trans. Reconfigurable Technol. Syst., 2022

HiKonv: Maximizing the Throughput of Quantized Convolution With Novel Bit-wise Management and Computation.
CoRR, 2022

Efficient Machine Learning, Compilers, and Optimizations for Embedded Systems.
CoRR, 2022

YOLO-ReT: Towards High Accuracy Real-time Object Detection on Edge GPUs.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022

ReGraph: Scaling Graph Processing on HBM-enabled FPGAs with Heterogeneous Pipelines.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

HiKonv: High Throughput Quantized Convolution With Novel Bit-wise Management and Computation.
Proceedings of the 27th Asia and South Pacific Design Automation Conference, 2022

2021
Learning-Based Simultaneous Detection and Characterization of Time Delay Attack in Cyber-Physical Systems.
IEEE Trans. Smart Grid, 2021

VecQ: Minimal Loss DNN Model Compression With Vectorized Weight Quantization.
IEEE Trans. Computers, 2021

Compressing Large-Scale Transformer-Based Models: A Case Study on BERT.
Trans. Assoc. Comput. Linguistics, 2021

Free Lunch for Co-Saliency Detection: Context Adjustment.
CoRR, 2021

3U-EdgeAI: Ultra-Low Memory Training, Ultra-Low Bitwidth Quantization, and Ultra-Low Latency Acceleration.
CoRR, 2021

ThundeRiNG: generating multiple independent random number sequences on FPGAs.
Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021

3U-EdgeAI: Ultra-Low Memory Training, Ultra-Low Bitwidth Quantization, and Ultra-Low Latency Acceleration.
Proceedings of the GLSVLSI '21: Great Lakes Symposium on VLSI 2021, 2021

ThunderGP: HLS-based Graph Processing Framework on FPGAs.
Proceedings of the FPGA '21: The 2021 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Virtual Event, USA, February 28, 2021

Skew-Oblivious Data Routing for Data Intensive Applications on FPGAs with HLS.
Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021

MELOPPR: Software/Hardware Co-design for Memory-efficient Low-latency Personalized PageRank.
Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021

WinoCNN: Kernel Sharing Winograd Systolic Array for Efficient Convolutional Neural Network Acceleration on FPGAs.
Proceedings of the 32nd IEEE International Conference on Application-specific Systems, 2021

2020
HaoCL: Harnessing Large-scale Heterogeneous Processors Made Easy.
Proceedings of the 40th IEEE International Conference on Distributed Computing Systems, 2020

Effective Algorithm-Accelerator Co-design for AI Solutions on Edge Devices.
Proceedings of the GLSVLSI '20: Great Lakes Symposium on VLSI 2020, 2020

EDD: Efficient Differentiable DNN Architecture and Implementation Co-search for Embedded AI Solutions.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

Is FPGA Useful for Hash Joins?
Proceedings of the 10th Conference on Innovative Data Systems Research, 2020

TAG : Type Auxiliary Guiding for Code Comment Generation.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

2019
A Bi-Directional Co-Design Approach to Enable Deep Learning on IoT Devices.
CoRR, 2019

T-DLA: An Open-source Deep Learning Accelerator for Ternarized DNN Models on Embedded FPGA.
Proceedings of the 2019 IEEE Computer Society Annual Symposium on VLSI, 2019

Pico-Ampere Voltage References for IoT Systems.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2019

µL2Q: An Ultra-Low Loss Quantization Method for DNN Compression.
Proceedings of the International Joint Conference on Neural Networks, 2019

NAIS: Neural Architecture and Implementation Search and its Applications in Autonomous Driving.
Proceedings of the International Conference on Computer-Aided Design, 2019

On-The-Fly Parallel Data Shuffling for Graph Processing on OpenCL-Based FPGAs.
Proceedings of the 29th International Conference on Field Programmable Logic and Applications, 2019

Cloud-DNN: An Open Framework for Mapping DNN Models to Cloud FPGAs.
Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

2018
A locality-aware shuffle optimization on fat-tree data centers.
Future Gener. Comput. Syst., 2018

HaaS: Cloud-Based Real-Time Data Analytics with Heterogeneity-Aware Scheduling.
Proceedings of the 38th IEEE International Conference on Distributed Computing Systems, 2018

HASS: High Accuracy Spike Sorting with Wavelet Package Decomposition and Mutual Information.
Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine, 2018

2016
FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow.
IEEE Trans. Very Large Scale Integr. Syst., 2016

FCUDA-HB: Hierarchical and Scalable Bus Architecture Generation on FPGAs With the FCUDA Flow.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2016

SoC, NoC and Hierarchical Bus Implementations of Applications on FPGAs Using the FCUDA Flow.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2016

High Level Synthesis of Complex Applications: An H.264 Video Decoder.
Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2016

2015
System-level design solutions: Enabling the IoT explosion.
Proceedings of the 2015 IEEE 11th International Conference on ASIC, 2015

2014
Integrated CUDA-to-FPGA Synthesis with Network-on-Chip.
Proceedings of the 22nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2014

