Chao Wang

ORCID: 0000-0002-9403-5575

Affiliations:
  • University of Science and Technology of China, Department of Computer Science, Hefei, China (PhD 2011)


According to our database, Chao Wang authored at least 194 papers between 2011 and 2024.

Bibliography

2024
FlexBCM: Hybrid Block-Circulant Neural Network and Accelerator Co-Search on FPGAs.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., November, 2024

Unleashing Network/Accelerator Co-Exploration Potential on FPGAs: A Deeper Joint Search.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., October, 2024

Enhancing Graph Random Walk Acceleration via Efficient Dataflow and Hybrid Memory Architecture.
IEEE Trans. Computers, March, 2024

Flexible and Efficient Memory Swapping Across Mobile Devices With LegoSwap.
IEEE Trans. Parallel Distributed Syst., January, 2024

PriorNet: A Novel Lightweight Network with Multidimensional Interactive Attention for Efficient Image Dehazing.
CoRR, 2024

Two Methods With Bidirectional Similarity for Optimal Selections of Supplier Portfolio and Supplier Substitute Based on TOPSIS and IFS.
IEEE Access, 2024

R4D-planes: Remapping Planes For Novel View Synthesis and Self-Supervised Decoupling of Monocular Videos.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Local Deep Learning Quantization for Approximate Nearest Neighbor Search.
Proceedings of the 2024 International Conference on Multimedia Retrieval, 2024

Prompt Learning with Extended Kalman Filter for Pre-trained Language Models.
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

Enhancing Long Sequence Input Processing in FPGA-Based Transformer Accelerators through Attention Fusion.
Proceedings of the Great Lakes Symposium on VLSI 2024, 2024

Ph.D. Project: Achieving Low-Latency Acceleration on Multi-FPGA for GPT Application.
Proceedings of the 32nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2024

Emergent Communication for Numerical Concepts Generalization.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Algorithm/Hardware Co-Optimization for Sparsity-Aware SpMM Acceleration of GNNs.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., December, 2023

Enabling Fast and Memory-Efficient Acceleration for Pattern Matching Workloads: The Lightweight Automata Processing Engine.
IEEE Trans. Computers, April, 2023

NeuralMAE: Data-Efficient Neural Architecture Predictor with Masked Autoencoder.
Proceedings of the Pattern Recognition and Computer Vision - 6th Chinese Conference, 2023

hAP: A Spatial-von Neumann Heterogeneous Automata Processor with Optimized Resource and IO Overhead on FPGA.
Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2023

DataMaster: A GNN-based Data Type Optimizer for Dataflow Design in FPGA.
Proceedings of the 31st IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2023

Enabling Elastic Resource Management in Cloud FPGAs via A Multi-layer Collaborative Approach.
Proceedings of the 31st IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2023

NAF: Deeper Network/Accelerator Co-Exploration for Customizing CNNs on FPGA.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2023

FastRW: A Dataflow-Efficient and Memory-Aware Accelerator for Graph Random Walk on FPGAs.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2023

Work-in-Progress: NAPMAE: Generalized Data-Efficient Neural Architecture Predictor with Masked Autoencoder.
Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis, 2023

Sparse-HeteroCL: From Sparse Tensor Algebra to Highly Customized Accelerators on FPGAs.
Proceedings of the 23rd IEEE/ACM International Symposium on Cluster, 2023

A flexible dataflow CNN accelerator on FPGA.
Proceedings of the 23rd IEEE/ACM International Symposium on Cluster, 2023

2022
KDnet-RUL: A Knowledge Distillation Framework to Compress Deep Neural Networks for Machine Remaining Useful Life Prediction.
IEEE Trans. Ind. Electron., 2022

ViA: A Novel Vision-Transformer Accelerator Based on FPGA.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022

Enabling One-Size-Fits-All Compilation Optimization for Inference Across Machine Learning Computers.
IEEE Trans. Computers, 2022

OctCNN: A High Throughput FPGA Accelerator for CNNs Using Octave Convolution Algorithm.
IEEE Trans. Computers, 2022

Conv-inheritance: A hardware-efficient method to compress convolutional neural networks for edge applications.
Neurocomputing, 2022

Contrastive adversarial knowledge distillation for deep model compression in time-series regression tasks.
Neurocomputing, 2022

Multi-clusters: An Efficient Design Paradigm of NN Accelerator Architecture Based on FPGA.
Proceedings of the Network and Parallel Computing, 2022

A Event Extraction Method of Document-Level Based on the Self-attention Mechanism.
Proceedings of the Machine Learning for Cyber Security - 4th International Conference, 2022

WGeod: A General and Efficient FPGA Accelerator for Object Detection.
Proceedings of the IEEE Intl Conf on Parallel & Distributed Processing with Applications, 2022

BabelTower: Learning to Auto-parallelized Program Translation.
Proceedings of the International Conference on Machine Learning, 2022

SDMA: An Efficient and Flexible Sparse-Dense Matrix-Multiplication Architecture for GNNs.
Proceedings of the 32nd International Conference on Field-Programmable Logic and Applications, 2022

Work-in-Progress: BloCirNN: An Efficient Software/hardware Codesign Approach for Neural Network Accelerators with Block-Circulant Matrix.
Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis, 2022

Work-in-Progress: Scheduler for Collaborated FPGA-GPU-CPU Based on Intermediate Language.
Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis, 2022

Work-in-Progress: HeteroRW: A Generalized and Efficient Framework for Random Walks in Graph Analysis.
Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis, 2022

2021
SOLAR: Services-Oriented Deep Learning Architectures-Deep Learning as a Service.
IEEE Trans. Serv. Comput., 2021

LKSM: Light Weight Key-Value Store for Efficient Application Services on Local Distributed Mobile Devices.
IEEE Trans. Serv. Comput., 2021

Improving HW/SW Adaptability for Accelerating CNNs on FPGAs Through A Dynamic/Static Co-Reconfiguration Approach.
IEEE Trans. Parallel Distributed Syst., 2021

GenSeq+: A Scalable High-Performance Accelerator for Genome Sequencing.
IEEE ACM Trans. Comput. Biol. Bioinform., 2021

Tinker: A Middleware for Deploying Multiple NN-Based Applications on a Single Machine.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2021

An FPGA Based Accelerator for Clustering Algorithms With Custom Instructions.
IEEE Trans. Computers, 2021

Neural Network Instruction Set Extension and Code Mapping Mechanism.
Int. J. Softw. Informatics, 2021

Deployment and verification of machine learning tool-chain based on kubernetes distributed clusters.
CCF Trans. High Perform. Comput., 2021

FEAS: A Faster Event-driven Accelerator Supporting Inhibitory Spiking Neural Network.
Proceedings of the 12th International Symposium on Parallel Architectures, 2021

UH-JLS: A Parallel Ultra-High Throughput JPEG-LS Encoding Architecture for Lossless Image Compression.
Proceedings of the 39th IEEE International Conference on Computer Design, 2021

LAP: A Lightweight Automata Processor for Pattern Matching Tasks.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2021

2020
The Max-Re-LOGMPA for SCMA Demodulation.
Wirel. Pers. Commun., 2020

A Ubiquitous Machine Learning Accelerator With Automatic Parallelization on FPGA.
IEEE Trans. Parallel Distributed Syst., 2020

WinoNN: Optimizing FPGA-Based Convolutional Neural Network Accelerators Using Sparse Winograd Algorithm.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

Addressing Irregularity in Sparse Neural Networks Through a Cooperative Software/Hardware Approach.
IEEE Trans. Computers, 2020

WooKong: A Ubiquitous Accelerator for Recommendation Algorithms With Custom Instruction Sets on FPGA.
IEEE Trans. Computers, 2020

Chameleon: Image Style Transfer Based on Image Classification Networks.
Proceedings of the 22nd IEEE International Conference on High Performance Computing and Communications; 18th IEEE International Conference on Smart City; 6th IEEE International Conference on Data Science and Systems, 2020

ConvCloud: An Adaptive Convolutional Neural Network Accelerator on Cloud FPGAs.
Proceedings of the FPGA '20: The 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2020

WiderFrame: An Automatic Customization Framework for Building CNN Accelerators on FPGAs: Work-in-Progress.
Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis, 2020

OctCNN: An Energy-Efficient FPGA Accelerator for CNNs using Octave Convolution Algorithm.
Proceedings of the IEEE International Conference on Cluster Computing, 2020

2019
DCW: A Reactive and Predictable Programming Framework for LET-Based Distributed Real-Time Systems.
ACM Trans. Design Autom. Electr. Syst., 2019

WGAN-Based Synthetic Minority Over-Sampling Technique: Improving Semantic Fine-Grained Classification for Lung Nodules in CT Images.
IEEE Access, 2019

FPNet: Customized Convolutional Neural Network for FPGA Platforms.
Proceedings of the International Conference on Field-Programmable Technology, 2019

An Overview of FPGA Based Deep Learning Accelerators: Challenges and Opportunities.
Proceedings of the 21st IEEE International Conference on High Performance Computing and Communications; 17th IEEE International Conference on Smart City; 5th IEEE International Conference on Data Science and Systems, 2019

GRU-ES: Resource Usage Prediction of Cloud Workloads Using a Novel Hybrid Method.
Proceedings of the 21st IEEE International Conference on High Performance Computing and Communications; 17th IEEE International Conference on Smart City; 5th IEEE International Conference on Data Science and Systems, 2019

Drama: A high efficient neural network accelerator on FPGA using dynamic reconfiguration: work-in-progress.
Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis Companion, 2019

Design Exploration of Multi-FPGAs for Accelerating Deep Learning.
Proceedings of the 2019 IEEE International Conference on Cluster Computing, 2019

Higher-order Transfer Learning for Pulmonary Nodule Attribute Prediction in Chest CT Images.
Proceedings of the 2019 IEEE International Conference on Bioinformatics and Biomedicine, 2019

RV-CNN: Flexible and Efficient Instruction Set for CNNs Based on RISC-V Processors.
Proceedings of the Advanced Parallel Processing Technologies, 2019

2018
Performance Evaluation and Optimization of HBM-Enabled GPU for Data-Intensive Applications.
IEEE Trans. Very Large Scale Integr. Syst., 2018

MALOC: A Fully Pipelined FPGA Accelerator for Convolutional Neural Networks With All Layers Mapped on Chip.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2018

UniCNN: A Pipelined Accelerator Towards Uniformed Computing for CNNs.
Int. J. Parallel Program., 2018

SparseNN: A Performance-Efficient Accelerator for Large-Scale Sparse Neural Networks.
Int. J. Parallel Program., 2018

Chinese Language Processing Based on Stroke Representation and Multidimensional Representation.
IEEE Access, 2018

Cambricon-S: Addressing Irregularity in Sparse Neural Networks through A Cooperative Software/Hardware Approach.
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

Domino: Graph Processing Services on Energy-Efficient Hardware Accelerator.
Proceedings of the 2018 IEEE International Conference on Web Services, 2018

Low-Shot Multi-label Incremental Learning for Thoracic Diseases Diagnosis.
Proceedings of the Neural Information Processing - 25th International Conference, 2018

MuDBN: An Energy-Efficient and High-Performance Multi-FPGA Accelerator for Deep Belief Networks.
Proceedings of the 2018 on Great Lakes Symposium on VLSI, 2018

Domino: An Asynchronous and Energy-efficient Accelerator for Graph Processing: (Abstract Only).
Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2018

RTMUS^RT: a real-time testbed for empirically comparing real-time multicore schedulers: work-in-progress.
Proceedings of the International Conference on Embedded Software, 2018

WinoNN: optimising FPGA-based neural network accelerators using fast winograd algorithm (work-in-progress).
Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis, 2018

Furion: alleviating overheads for deep learning framework on single machine (work-in-progress).
Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis, 2018

Multi-order Transfer Learning for Pathologic Diagnosis of Pulmonary Nodule Malignancy.
Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine, 2018

2017
A Classroom Scheduling Service for Smart Classes.
IEEE Trans. Serv. Comput., 2017

Service-Oriented Architecture on FPGA-Based MPSoC.
IEEE Trans. Parallel Distributed Syst., 2017

SuperMIC: Analyzing Large Biological Datasets in Bioinformatics with Maximal Information Coefficient.
IEEE ACM Trans. Comput. Biol. Bioinform., 2017

DLAU: A Scalable Deep Learning Accelerator Unit on FPGA.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2017

Hot spots profiling and dataflow analysis in custom dataflow computing SoftProcessors.
J. Syst. Softw., 2017

New trends for pattern recognition: Theory and applications.
Neurocomputing, 2017

Reconfigurable Hardware Accelerators: Opportunities, Trends, and Challenges.
CoRR, 2017

Editorial Soft Computing Applied to Swarm Robotics.
Appl. Soft Comput., 2017

Work-in-Progress: TTI: A Timing ISA for LET Model in Safety-Critical Systems.
Proceedings of the 2017 IEEE Real-Time Systems Symposium, 2017

Implementation and Optimization of the Accelerator Based on FPGA Hardware for LSTM Network.
Proceedings of the 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), 2017

Rethinking Energy-Efficiency of Heterogeneous Computing for CNN-Based Mobile Applications.
Proceedings of the 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), 2017

Building a Game Benchmark for Cooperative CPU-GPU with Pseudo User-Interaction.
Proceedings of the 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), 2017

A Predictable Servant-Based Execution Model for Safety-Critical Systems.
Proceedings of the 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), 2017

Exploiting Aperiodic Server to Improve Aperiodic Responsiveness for LET-Based Real-Time Systems.
Proceedings of the 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), 2017

A High-Performance Accelerator for Large-Scale Convolutional Neural Networks.
Proceedings of the 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), 2017

Tickwerk: Design of a LET-Based SoC for Temporal Programming.
Proceedings of the 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), 2017

Natural Language Processing Service Based on Stroke-Level Convolutional Networks for Chinese Text Classification.
Proceedings of the 2017 IEEE International Conference on Web Services, 2017

Evaluation and Trade-offs of Graph Processing for Cloud Services.
Proceedings of the 2017 IEEE International Conference on Web Services, 2017

xFilter: A Temporal Locality Accelerator for Intrusion Detection System Services.
Proceedings of the 2017 IEEE International Conference on Web Services, 2017

GenServ: Genome Sequencing Services on Scalable Energy Efficient Accelerators.
Proceedings of the 2017 IEEE International Conference on Web Services, 2017

Light Weight Key-Value Store for Efficient Services on Local Distributed Mobile Devices.
Proceedings of the 2017 IEEE International Conference on Web Services, 2017

A Time-Aware Programming Framework for Constructing Predictable Real-Time Systems.
Proceedings of the 19th IEEE International Conference on High Performance Computing and Communications; 15th IEEE International Conference on Smart City; 3rd IEEE International Conference on Data Science and Systems, 2017

FPGA Based Big Data Accelerator Design in Teaching Computer Architecture and Organization.
Proceedings of the Cyber Physical Systems. Design, Modeling, and Evaluation, 2017

A power-efficient and high performance FPGA accelerator for convolutional neural networks: work-in-progress.
Proceedings of the Twelfth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis Companion, 2017

A Power-Efficient Accelerator Based on FPGAs for LSTM Network.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

OmniGraph: A Scalable Hardware Accelerator for Graph Processing.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

A Power-Efficient Accelerator for Convolutional Neural Networks.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

Mermaid: Integrating Vertex-Centric with Edge-Centric for Real-World Graph Processing.
Proceedings of the 17th IEEE/ACM International Symposium on Cluster, 2017

TuNao: A High-Performance and Energy-Efficient Reconfigurable Accelerator for Graph Processing.
Proceedings of the 17th IEEE/ACM International Symposium on Cluster, 2017

A high-performance FPGA accelerator for sparse neural networks: work-in-progress.
Proceedings of the 2017 International Conference on Compilers, 2017

Distributed gene clinical decision support system based on cloud computing.
Proceedings of the 2017 IEEE International Conference on Bioinformatics and Biomedicine, 2017

Clockwerk: A Predictable and Efficient Extension of Logical Execution Time Model.
Proceedings of the 24th Asia-Pacific Software Engineering Conference, 2017

2016
Evaluation and Tradeoffs for Out-of-Order Execution on Reconfigurable Heterogeneous MPSoC.
IEEE Trans. Very Large Scale Integr. Syst., 2016

Hardware Implementation on FPGA for Task-Level Parallel Dataflow Execution Engine.
IEEE Trans. Parallel Distributed Syst., 2016

Guest Editorial for Special Section on Big Data Computing and Processing in Computational Biology and Bioinformatics.
IEEE ACM Trans. Comput. Biol. Bioinform., 2016

Definitions of predictability for Cyber Physical Systems.
J. Syst. Archit., 2016

Preface to the Special Issue on Sequential Code Parallelization.
Int. J. Parallel Program., 2016

A Parallel Yet Pipelined Architecture for Efficient Implementation of the Advanced Encryption Standard Algorithm on Reconfigurable Hardware.
Int. J. Parallel Program., 2016

Parallel Implementations of the Cooperative Particle Swarm Optimization on Many-core and Multi-core Architectures.
Int. J. Parallel Program., 2016

KUMMS: optimising DRAM locality with Kernel-user behaviours.
Int. J. High Perform. Syst. Archit., 2016

CNNLab: a Novel Parallel Framework for Neural Networks using GPU and FPGA-a Practical Study with Trade-off Analysis.
CoRR, 2016

Soft computing in big data intelligent transportation systems.
Appl. Soft Comput., 2016

SCADIS: A Scalable Accelerator for Data-Intensive String Set Matching on FPGAs.
Proceedings of the 2016 IEEE Trustcom/BigDataSE/ISPA, 2016

Brief Announcement: MIC++: Accelerating Maximal Information Coefficient Calculation with GPUs and FPGAs.
Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures, 2016

Behavior-Aware Integrated CPU-GPU Power Management for Mobile Games.
Proceedings of the 24th IEEE International Symposium on Modeling, 2016

FairPlay: Services Migration with Lock-Free Mechanisms for Load Balancing in Cloud Architectures.
Proceedings of the IEEE International Conference on Web Services, 2016

SOLAR: Services-Oriented Learning Architectures.
Proceedings of the IEEE International Conference on Web Services, 2016

PIE: A Pipeline Energy-Efficient Accelerator for Inference Process in Deep Neural Networks.
Proceedings of the 22nd IEEE International Conference on Parallel and Distributed Systems, 2016

FCM: Towards Fine-Grained GPU Power Management for Closed Source Mobile Games.
Proceedings of the 26th edition on Great Lakes Symposium on VLSI, 2016

Display power reduction for mobile closed-source games.
Proceedings of the 27th IEEE International Conference on Application-specific Systems, 2016

2015
FreeRider: Non-Local Adaptive Network-on-Chip Routing with Packet-Carried Propagation of Congestion Information.
IEEE Trans. Parallel Distributed Syst., 2015

Heterogeneous Cloud Framework for Big Data Genome Sequencing.
IEEE ACM Trans. Comput. Biol. Bioinform., 2015

Architecture Support for Task Out-of-Order Execution in MPSoCs.
IEEE Trans. Computers, 2015

A case study of parallel JPEG encoding on an FPGA.
J. Parallel Distributed Comput., 2015

CRAIS: A Crossbar-Based Interconnection Scheme on FPGA for Big Data.
J. Comput. Sci. Technol., 2015

XEMU: a cross-ISA full-system emulator on multiple processor architectures.
Int. J. High Perform. Syst. Archit., 2015

Fast approximate hash table using extended counting Bloom filter.
Int. J. Comput. Sci. Eng., 2015

SAKMA: Specialized FPGA-Based Accelerator Architecture for Data-Intensive K-Means Algorithms.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2015

RapidPath: Accelerating Constrained Shortest Path Finding in Graphs on FPGA (Abstract Only).
Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2015

SODA: software defined FPGA based accelerators for big data.
Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015

An FPGA-Based Accelerator for Neighborhood-Based Collaborative Filtering Recommendation Algorithms.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

A Deep Learning Prediction Process Accelerator Based FPGA.
Proceedings of the 15th IEEE/ACM International Symposium on Cluster, 2015

2014
Accelerating the Next Generation Long Read Mapping with the FPGA-Based System.
IEEE ACM Trans. Comput. Biol. Bioinform., 2014

Colored Petri Net model with automatic parallelization on real-time multicore architectures.
J. Syst. Archit., 2014

Amdahl's and Hill-Marty laws revisited for FPGA-based MPSoCs: from theory to practice.
Int. J. High Perform. Syst. Archit., 2014

Memory power optimisation on low-bit multi-access cross memory address mapping schema.
Int. J. Embed. Syst., 2014

Memory power optimization on different memory address mapping schemas.
Proceedings of the 2014 IEEE 20th International Conference on Embedded and Real-Time Computing Systems and Applications, 2014

Multi-objective aware design flow for coarse-grained systems on chip.
Proceedings of the 2014 IEEE 20th International Conference on Embedded and Real-Time Computing Systems and Applications, 2014

Kernel-User Space Separation in DRAM Memory.
Proceedings of the IEEE International Symposium on Parallel and Distributed Processing with Applications, 2014

Trade-offs between the sensitivity and the speed of the FPGA-based sequence aligner.
Proceedings of the 2014 International Symposium on Integrated Circuits (ISIC), 2014

Big data genome sequencing on Zynq based clusters (abstract only).
Proceedings of the 2014 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2014

Co-processing with dynamic reconfiguration on heterogeneous MPSoC: practices and design tradeoffs (abstract only).
Proceedings of the 2014 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2014

Instruction Extension and Generation for Adaptive Processors.
Proceedings of the Reconfigurable Computing: Architectures, Tools, and Applications, 2014

2013
MP-Tomasulo: A Dependency-Aware Automatic Parallel Execution Engine for Sequential Programs.
ACM Trans. Archit. Code Optim., 2013

Heterothread: hybrid thread level parallelism on heterogeneous multicore architectures.
SIGBED Rev., 2013

Services-oriented URL filtering and verification.
Int. J. High Perform. Syst. Archit., 2013

Static or Dynamic: Trade-Offs for Task Dependency Analysis for Heterogeneous MPSoC.
Proceedings of the 12th IEEE International Conference on Trust, 2013

Automatic Loop-Based Pipeline Optimization on Reconfigurable Platform.
Proceedings of the 12th IEEE International Conference on Trust, 2013

Detecting Associations in Large Dataset on MapReduce.
Proceedings of the 12th IEEE International Conference on Trust, 2013

Coordinate page allocation and thread group for improving main memory power efficiency.
Proceedings of the Workshop on Power-Aware Computing and Systems, 2013

SOBA: A Services-Oriented Browser Architecture with Distributed URL-Filtering Mechanisms for Teenagers.
Proceedings of the IEEE Ninth World Congress on Services, 2013

FPGA implementation of a scheduler supporting parallel dataflow execution.
Proceedings of the 2013 IEEE International Symposium on Circuits and Systems (ISCAS2013), 2013

Coordinate Task and Memory Management for Improving Power Efficiency.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2013

SmartMal: A Service-Oriented Behavioral Malware Detection Framework for Smartphones.
Proceedings of the 10th IEEE International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing, 2013

An Intelligent Transportation System Using RFID Based Sensors.
Proceedings of the 10th IEEE International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing, 2013

Group Scheduling for Improving Both CPU and Memory Power Efficiency Simultaneously.
Proceedings of the 10th IEEE International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing, 2013

A FPGA-Based High Performance Acceleration Platform for the Next Generation Long Read Mapping.
Proceedings of the 10th IEEE International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing, 2013

Hardware acceleration for the banded Smith-Waterman algorithm with the cycled systolic array.
Proceedings of the 2013 International Conference on Field-Programmable Technology, 2013

Genome sequencing using mapreduce on FPGA with multiple hardware accelerators (abstract only).
Proceedings of the 2013 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2013

Custom instruction generation and mapping for reconfigurable instruction set processors (abstract only).
Proceedings of the 2013 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2013

Acceleration of the long read mapping on a PC-FPGA architecture (abstract only).
Proceedings of the 2013 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2013

Pipeline Optimization for Loops on Reconfigurable Platform.
Proceedings of the Reconfigurable Computing: Architectures, Tools and Applications, 2013

SmartClass: A Services-Oriented Approach for University Resource Scheduling.
Proceedings of the 2013 IEEE International Conference on Services Computing, Santa Clara, CA, USA, June 28, 2013

2012
A star network approach in heterogeneous multiprocessors system on chip.
J. Supercomput., 2012

Analyzing Parallelization and Program Performance in Heterogeneous MPSoCs.
Proceedings of the 20th IEEE International Symposium on Modeling, 2012

Frequency Affinity: Analyzing and Maximizing Power Efficiency in Multi-core Systems.
Proceedings of the 20th IEEE International Symposium on Modeling, 2012

FPM: A Flexible Programming Model for MPSoC on FPGA.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

Detecting Data Hazards in Multi-Processor System-on-Chips on FPGA.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

Behavior Aware Data Locality for Caches.
Proceedings of the 18th IEEE International Conference on Parallel and Distributed Systems, 2012

A Dependency Aware Task Partitioning and Scheduling Algorithm for Hardware-Software Codesign on MPSoCs.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2012

A task-level OoO framework for heterogeneous systems.
Proceedings of the 2012 International Conference on Field-Programmable Technology, 2012

Parallel dataflow execution for sequential programs on reconfigurable hybrid MPSoCs.
Proceedings of the 2012 International Conference on Field-Programmable Technology, 2012

CaaS: Core as a service realizing hardware services on reconfigurable MPSoCs.
Proceedings of the 22nd International Conference on Field Programmable Logic and Applications (FPL), 2012

Phase Detection for Loop-Based Programs on Multicore Architectures.
Proceedings of the 2012 IEEE International Conference on Cluster Computing, 2012

Memory Affinity: Balancing Performance, Power, Thermal and Fairness for Multi-core Systems.
Proceedings of the 2012 IEEE International Conference on Cluster Computing, 2012

Cache Promotion Policy Using Re-reference Interval Prediction.
Proceedings of the 2012 IEEE International Conference on Cluster Computing, 2012

DTS: Using Dynamic Time-Slice Scaling to Address the OS Problem Incurred by DVFS.
Proceedings of the 2012 IEEE International Conference on Cluster Computing Workshops, 2012

Cloud Based Short Read Mapping Service.
Proceedings of the 2012 IEEE International Conference on Cluster Computing, 2012

Sedna: A Memory Based Key-Value Storage System for Realtime Processing in Cloud.
Proceedings of the 2012 IEEE International Conference on Cluster Computing Workshops, 2012

CRAIS: A Crossbar Based Adaptive Interconnection Scheme.
Proceedings of the Reconfigurable Computing: Architectures, Tools and Applications, 2012

Regarding Processors and Reconfigurable IP Cores as Services.
Proceedings of the 2012 IEEE Ninth International Conference on Services Computing, 2012

2011
Tool Chain Support with Dynamic Profiling for RISP.
Proceedings of the IEEE International Symposium on Parallel and Distributed Processing with Applications, 2011

A Flexible High Speed Star Network Based on Peer to Peer Links on FPGA.
Proceedings of the IEEE International Symposium on Parallel and Distributed Processing with Applications, 2011

SOMP: Service-Oriented Multi Processors.
Proceedings of the IEEE International Conference on Services Computing, 2011

