Chao Wang
ORCID: 0000-0002-9403-5575
Affiliations:
- University of Science and Technology of China, Department of Computer Science, Hefei, China (PhD 2011)
According to our database, Chao Wang authored at least 194 papers between 2011 and 2024.
Collaborative distances:
Links
Online presence:
- on zbmath.org
- on orcid.org
Bibliography
2024
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., November, 2024
Unleashing Network/Accelerator Co-Exploration Potential on FPGAs: A Deeper Joint Search.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., October, 2024
Enhancing Graph Random Walk Acceleration via Efficient Dataflow and Hybrid Memory Architecture.
IEEE Trans. Computers, March, 2024
IEEE Trans. Parallel Distributed Syst., January, 2024
PriorNet: A Novel Lightweight Network with Multidimensional Interactive Attention for Efficient Image Dehazing.
CoRR, 2024
Two Methods With Bidirectional Similarity for Optimal Selections of Supplier Portfolio and Supplier Substitute Based on TOPSIS and IFS.
IEEE Access, 2024
R4D-planes: Remapping Planes For Novel View Synthesis and Self-Supervised Decoupling of Monocular Videos.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024
Proceedings of the 2024 International Conference on Multimedia Retrieval, 2024
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024
Enhancing Long Sequence Input Processing in FPGA-Based Transformer Accelerators through Attention Fusion.
Proceedings of the Great Lakes Symposium on VLSI 2024, 2024
Proceedings of the 32nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2024
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
2023
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., December, 2023
Enabling Fast and Memory-Efficient Acceleration for Pattern Matching Workloads: The Lightweight Automata Processing Engine.
IEEE Trans. Computers, April, 2023
Proceedings of the Pattern Recognition and Computer Vision - 6th Chinese Conference, 2023
hAP: A Spatial-von Neumann Heterogeneous Automata Processor with Optimized Resource and IO Overhead on FPGA.
Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2023
Proceedings of the 31st IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2023
Enabling Elastic Resource Management in Cloud FPGAs via A Multi-layer Collaborative Approach.
Proceedings of the 31st IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2023
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2023
FastRW: A Dataflow-Efficient and Memory-Aware Accelerator for Graph Random Walk on FPGAs.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2023
Work-in-Progress: NAPMAE: Generalized Data-Efficient Neural Architecture Predictor with Masked Autoencoder.
Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis, 2023
Sparse-HeteroCL: From Sparse Tensor Algebra to Highly Customized Accelerators on FPGAs.
Proceedings of the 23rd IEEE/ACM International Symposium on Cluster, 2023
Proceedings of the 23rd IEEE/ACM International Symposium on Cluster, 2023
2022
KDnet-RUL: A Knowledge Distillation Framework to Compress Deep Neural Networks for Machine Remaining Useful Life Prediction.
IEEE Trans. Ind. Electron., 2022
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022
Enabling One-Size-Fits-All Compilation Optimization for Inference Across Machine Learning Computers.
IEEE Trans. Computers, 2022
OctCNN: A High Throughput FPGA Accelerator for CNNs Using Octave Convolution Algorithm.
IEEE Trans. Computers, 2022
Conv-inheritance: A hardware-efficient method to compress convolutional neural networks for edge applications.
Neurocomputing, 2022
Contrastive adversarial knowledge distillation for deep model compression in time-series regression tasks.
Neurocomputing, 2022
Multi-clusters: An Efficient Design Paradigm of NN Accelerator Architecture Based on FPGA.
Proceedings of the Network and Parallel Computing, 2022
Proceedings of the Machine Learning for Cyber Security - 4th International Conference, 2022
Proceedings of the IEEE Intl Conf on Parallel & Distributed Processing with Applications, 2022
Proceedings of the International Conference on Machine Learning, 2022
SDMA: An Efficient and Flexible Sparse-Dense Matrix-Multiplication Architecture for GNNs.
Proceedings of the 32nd International Conference on Field-Programmable Logic and Applications, 2022
Work-in-Progress: BloCirNN: An Efficient Software/hardware Codesign Approach for Neural Network Accelerators with Block-Circulant Matrix.
Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis, 2022
Work-in-Progress: Scheduler for Collaborated FPGA-GPU-CPU Based on Intermediate Language.
Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis, 2022
Work-in-Progress: HeteroRW: A Generalized and Efficient Framework for Random Walks in Graph Analysis.
Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis, 2022
2021
IEEE Trans. Serv. Comput., 2021
LKSM: Light Weight Key-Value Store for Efficient Application Services on Local Distributed Mobile Devices.
IEEE Trans. Serv. Comput., 2021
Improving HW/SW Adaptability for Accelerating CNNs on FPGAs Through A Dynamic/Static Co-Reconfiguration Approach.
IEEE Trans. Parallel Distributed Syst., 2021
IEEE ACM Trans. Comput. Biol. Bioinform., 2021
Tinker: A Middleware for Deploying Multiple NN-Based Applications on a Single Machine.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2021
IEEE Trans. Computers, 2021
Int. J. Softw. Informatics, 2021
Deployment and verification of machine learning tool-chain based on kubernetes distributed clusters.
CCF Trans. High Perform. Comput., 2021
FEAS: A Faster Event-driven Accelerator Supporting Inhibitory Spiking Neural Network.
Proceedings of the 12th International Symposium on Parallel Architectures, 2021
UH-JLS: A Parallel Ultra-High Throughput JPEG-LS Encoding Architecture for Lossless Image Compression.
Proceedings of the 39th IEEE International Conference on Computer Design, 2021
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2021
2020
IEEE Trans. Parallel Distributed Syst., 2020
WinoNN: Optimizing FPGA-Based Convolutional Neural Network Accelerators Using Sparse Winograd Algorithm.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020
Addressing Irregularity in Sparse Neural Networks Through a Cooperative Software/Hardware Approach.
IEEE Trans. Computers, 2020
WooKong: A Ubiquitous Accelerator for Recommendation Algorithms With Custom Instruction Sets on FPGA.
IEEE Trans. Computers, 2020
Proceedings of the 22nd IEEE International Conference on High Performance Computing and Communications; 18th IEEE International Conference on Smart City; 6th IEEE International Conference on Data Science and Systems, 2020
Proceedings of the FPGA '20: The 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2020
WiderFrame: An Automatic Customization Framework for Building CNN Accelerators on FPGAs: Work-in-Progress.
Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis, 2020
OctCNN: An Energy-Efficient FPGA Accelerator for CNNs using Octave Convolution Algorithm.
Proceedings of the IEEE International Conference on Cluster Computing, 2020
2019
DCW: A Reactive and Predictable Programming Framework for LET-Based Distributed Real-Time Systems.
ACM Trans. Design Autom. Electr. Syst., 2019
WGAN-Based Synthetic Minority Over-Sampling Technique: Improving Semantic Fine-Grained Classification for Lung Nodules in CT Images.
IEEE Access, 2019
Proceedings of the International Conference on Field-Programmable Technology, 2019
Proceedings of the 21st IEEE International Conference on High Performance Computing and Communications; 17th IEEE International Conference on Smart City; 5th IEEE International Conference on Data Science and Systems, 2019
Proceedings of the 21st IEEE International Conference on High Performance Computing and Communications; 17th IEEE International Conference on Smart City; 5th IEEE International Conference on Data Science and Systems, 2019
Drama: A high efficient neural network accelerator on FPGA using dynamic reconfiguration: work-in-progress.
Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis Companion, 2019
Proceedings of the 2019 IEEE International Conference on Cluster Computing, 2019
Higher-order Transfer Learning for Pulmonary Nodule Attribute Prediction in Chest CT Images.
Proceedings of the 2019 IEEE International Conference on Bioinformatics and Biomedicine, 2019
Proceedings of the Advanced Parallel Processing Technologies, 2019
2018
Performance Evaluation and Optimization of HBM-Enabled GPU for Data-Intensive Applications.
IEEE Trans. Very Large Scale Integr. Syst., 2018
MALOC: A Fully Pipelined FPGA Accelerator for Convolutional Neural Networks With All Layers Mapped on Chip.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2018
Int. J. Parallel Program., 2018
SparseNN: A Performance-Efficient Accelerator for Large-Scale Sparse Neural Networks.
Int. J. Parallel Program., 2018
Chinese Language Processing Based on Stroke Representation and Multidimensional Representation.
IEEE Access, 2018
Cambricon-S: Addressing Irregularity in Sparse Neural Networks through A Cooperative Software/Hardware Approach.
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018
Proceedings of the 2018 IEEE International Conference on Web Services, 2018
Proceedings of the Neural Information Processing - 25th International Conference, 2018
MuDBN: An Energy-Efficient and High-Performance Multi-FPGA Accelerator for Deep Belief Networks.
Proceedings of the 2018 on Great Lakes Symposium on VLSI, 2018
Domino: An Asynchronous and Energy-efficient Accelerator for Graph Processing: (Abstract Only).
Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2018
RTMUS^RT: a real-time testbed for empirically comparing real-time multicore schedulers: work-in-progress.
Proceedings of the International Conference on Embedded Software, 2018
WinoNN: optimising FPGA-based neural network accelerators using fast winograd algorithm (work-in-progress).
Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis, 2018
Furion: alleviating overheads for deep learning framework on single machine (work-in-progress).
Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis, 2018
Multi-order Transfer Learning for Pathologic Diagnosis of Pulmonary Nodule Malignancy.
Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine, 2018
2017
IEEE Trans. Parallel Distributed Syst., 2017
SuperMIC: Analyzing Large Biological Datasets in Bioinformatics with Maximal Information Coefficient.
IEEE ACM Trans. Comput. Biol. Bioinform., 2017
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2017
Hot spots profiling and dataflow analysis in custom dataflow computing SoftProcessors.
J. Syst. Softw., 2017
CoRR, 2017
Proceedings of the 2017 IEEE Real-Time Systems Symposium, 2017
Implementation and Optimization of the Accelerator Based on FPGA Hardware for LSTM Network.
Proceedings of the 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), 2017
Rethinking Energy-Efficiency of Heterogeneous Computing for CNN-Based Mobile Applications.
Proceedings of the 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), 2017
Proceedings of the 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), 2017
Proceedings of the 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), 2017
Exploiting Aperiodic Server to Improve Aperiodic Responsiveness for LET-Based Real-Time Systems.
Proceedings of the 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), 2017
Proceedings of the 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), 2017
Proceedings of the 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), 2017
Natural Language Processing Service Based on Stroke-Level Convolutional Networks for Chinese Text Classification.
Proceedings of the 2017 IEEE International Conference on Web Services, 2017
Proceedings of the 2017 IEEE International Conference on Web Services, 2017
Proceedings of the 2017 IEEE International Conference on Web Services, 2017
Proceedings of the 2017 IEEE International Conference on Web Services, 2017
Light Weight Key-Value Store for Efficient Services on Local Distributed Mobile Devices.
Proceedings of the 2017 IEEE International Conference on Web Services, 2017
Proceedings of the 19th IEEE International Conference on High Performance Computing and Communications; 15th IEEE International Conference on Smart City; 3rd IEEE International Conference on Data Science and Systems, 2017
FPGA Based Big Data Accelerator Design in Teaching Computer Architecture and Organization.
Proceedings of the Cyber Physical Systems. Design, Modeling, and Evaluation, 2017
A power-efficient and high performance FPGA accelerator for convolutional neural networks: work-in-progress.
Proceedings of the Twelfth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis Companion, 2017
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017
Mermaid: Integrating Vertex-Centric with Edge-Centric for Real-World Graph Processing.
Proceedings of the 17th IEEE/ACM International Symposium on Cluster, 2017
TuNao: A High-Performance and Energy-Efficient Reconfigurable Accelerator for Graph Processing.
Proceedings of the 17th IEEE/ACM International Symposium on Cluster, 2017
Proceedings of the 2017 International Conference on Compilers, 2017
Proceedings of the 2017 IEEE International Conference on Bioinformatics and Biomedicine, 2017
Proceedings of the 24th Asia-Pacific Software Engineering Conference, 2017
2016
Evaluation and Tradeoffs for Out-of-Order Execution on Reconfigurable Heterogeneous MPSoC.
IEEE Trans. Very Large Scale Integr. Syst., 2016
IEEE Trans. Parallel Distributed Syst., 2016
Guest Editorial for Special Section on Big Data Computing and Processing in Computational Biology and Bioinformatics.
IEEE ACM Trans. Comput. Biol. Bioinform., 2016
Int. J. Parallel Program., 2016
A Parallel Yet Pipelined Architecture for Efficient Implementation of the Advanced Encryption Standard Algorithm on Reconfigurable Hardware.
Int. J. Parallel Program., 2016
Parallel Implementations of the Cooperative Particle Swarm Optimization on Many-core and Multi-core Architectures.
Int. J. Parallel Program., 2016
Int. J. High Perform. Syst. Archit., 2016
CNNLab: a Novel Parallel Framework for Neural Networks using GPU and FPGA-a Practical Study with Trade-off Analysis.
CoRR, 2016
Appl. Soft Comput., 2016
Proceedings of the 2016 IEEE Trustcom/BigDataSE/ISPA, 2016
Brief Announcement: MIC++: Accelerating Maximal Information Coefficient Calculation with GPUs and FPGAs.
Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures, 2016
Proceedings of the 24th IEEE International Symposium on Modeling, 2016
FairPlay: Services Migration with Lock-Free Mechanisms for Load Balancing in Cloud Architectures.
Proceedings of the IEEE International Conference on Web Services, 2016
Proceedings of the IEEE International Conference on Web Services, 2016
PIE: A Pipeline Energy-Efficient Accelerator for Inference Process in Deep Neural Networks.
Proceedings of the 22nd IEEE International Conference on Parallel and Distributed Systems, 2016
Proceedings of the 26th edition on Great Lakes Symposium on VLSI, 2016
Proceedings of the 27th IEEE International Conference on Application-specific Systems, 2016
2015
FreeRider: Non-Local Adaptive Network-on-Chip Routing with Packet-Carried Propagation of Congestion Information.
IEEE Trans. Parallel Distributed Syst., 2015
IEEE ACM Trans. Comput. Biol. Bioinform., 2015
IEEE Trans. Computers, 2015
J. Parallel Distributed Comput., 2015
J. Comput. Sci. Technol., 2015
Int. J. High Perform. Syst. Archit., 2015
Int. J. Comput. Sci. Eng., 2015
SAKMA: Specialized FPGA-Based Accelerator Architecture for Data-Intensive K-Means Algorithms.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2015
RapidPath: Accelerating Constrained Shortest Path Finding in Graphs on FPGA (Abstract Only).
Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2015
Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015
An FPGA-Based Accelerator for Neighborhood-Based Collaborative Filtering Recommendation Algorithms.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015
Proceedings of the 15th IEEE/ACM International Symposium on Cluster, 2015
2014
IEEE ACM Trans. Comput. Biol. Bioinform., 2014
Colored Petri Net model with automatic parallelization on real-time multicore architectures.
J. Syst. Archit., 2014
Amdahl's and Hill-Marty laws revisited for FPGA-based MPSoCs: from theory to practice.
Int. J. High Perform. Syst. Archit., 2014
Memory power optimisation on low-bit multi-access cross memory address mapping schema.
Int. J. Embed. Syst., 2014
Proceedings of the 2014 IEEE 20th International Conference on Embedded and Real-Time Computing Systems and Applications, 2014
Proceedings of the 2014 IEEE 20th International Conference on Embedded and Real-Time Computing Systems and Applications, 2014
Proceedings of the IEEE International Symposium on Parallel and Distributed Processing with Applications, 2014
Proceedings of the 2014 International Symposium on Integrated Circuits (ISIC), 2014
Proceedings of the 2014 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2014
Co-processing with dynamic reconfiguration on heterogeneous MPSoC: practices and design tradeoffs (abstract only).
Proceedings of the 2014 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2014
Proceedings of the Reconfigurable Computing: Architectures, Tools, and Applications, 2014
2013
MP-Tomasulo: A Dependency-Aware Automatic Parallel Execution Engine for Sequential Programs.
ACM Trans. Archit. Code Optim., 2013
Heterothread: hybrid thread level parallelism on heterogeneous multicore architectures.
SIGBED Rev., 2013
Int. J. High Perform. Syst. Archit., 2013
Proceedings of the 12th IEEE International Conference on Trust, 2013
Proceedings of the 12th IEEE International Conference on Trust, 2013
Proceedings of the 12th IEEE International Conference on Trust, 2013
Coordinate page allocation and thread group for improving main memory power efficiency.
Proceedings of the Workshop on Power-Aware Computing and Systems, 2013
SOBA: A Services-Oriented Browser Architecture with Distributed URL-Filtering Mechanisms for Teenagers.
Proceedings of the IEEE Ninth World Congress on Services, 2013
Proceedings of the 2013 IEEE International Symposium on Circuits and Systems (ISCAS2013), 2013
Proceedings of the Algorithms and Architectures for Parallel Processing, 2013
Proceedings of the 10th IEEE International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing, 2013
Proceedings of the 10th IEEE International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing, 2013
Proceedings of the 10th IEEE International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing, 2013
A FPGA-Based High Performance Acceleration Platform for the Next Generation Long Read Mapping.
Proceedings of the 10th IEEE International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing, 2013
Hardware acceleration for the banded Smith-Waterman algorithm with the cycled systolic array.
Proceedings of the 2013 International Conference on Field-Programmable Technology, 2013
Genome sequencing using mapreduce on FPGA with multiple hardware accelerators (abstract only).
Proceedings of the 2013 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2013
Custom instruction generation and mapping for reconfigurable instruction set processors (abstract only).
Proceedings of the 2013 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2013
Proceedings of the 2013 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2013
Proceedings of the Reconfigurable Computing: Architectures, Tools and Applications, 2013
Proceedings of the 2013 IEEE International Conference on Services Computing, Santa Clara, CA, USA, June 28, 2013
2012
J. Supercomput., 2012
Proceedings of the 20th IEEE International Symposium on Modeling, 2012
Proceedings of the 20th IEEE International Symposium on Modeling, 2012
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012
Proceedings of the 18th IEEE International Conference on Parallel and Distributed Systems, 2012
A Dependency Aware Task Partitioning and Scheduling Algorithm for Hardware-Software Codesign on MPSoCs.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2012
Proceedings of the 2012 International Conference on Field-Programmable Technology, 2012
Proceedings of the 2012 International Conference on Field-Programmable Technology, 2012
Proceedings of the 22nd International Conference on Field Programmable Logic and Applications (FPL), 2012
Proceedings of the 2012 IEEE International Conference on Cluster Computing, 2012
Memory Affinity: Balancing Performance, Power, Thermal and Fairness for Multi-core Systems.
Proceedings of the 2012 IEEE International Conference on Cluster Computing, 2012
Proceedings of the 2012 IEEE International Conference on Cluster Computing, 2012
Proceedings of the 2012 IEEE International Conference on Cluster Computing Workshops, 2012
Proceedings of the 2012 IEEE International Conference on Cluster Computing, 2012
Proceedings of the 2012 IEEE International Conference on Cluster Computing Workshops, 2012
Proceedings of the Reconfigurable Computing: Architectures, Tools and Applications, 2012
Proceedings of the 2012 IEEE Ninth International Conference on Services Computing, 2012
2011
Proceedings of the IEEE International Symposium on Parallel and Distributed Processing with Applications, 2011
Proceedings of the IEEE International Symposium on Parallel and Distributed Processing with Applications, 2011
Proceedings of the IEEE International Conference on Services Computing, 2011