Xuehai Zhou

ORCID: 0000-0002-8360-3143

According to our database, Xuehai Zhou authored at least 248 papers between 1999 and 2025.

Bibliography

2025
Graph-Transformer with spatial-spectral features fusion for hyperspectral image classification.
Expert Syst. Appl., 2025

2024
FlexBCM: Hybrid Block-Circulant Neural Network and Accelerator Co-Search on FPGAs.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., November, 2024

Arch2End: Two-Stage Unified System-Level Modeling for Heterogeneous Intelligent Devices.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., November, 2024

NebulaFL: Self-Organizing Efficient Multilayer Federated Learning Framework With Adaptive Load Tuning in Heterogeneous Edge Systems.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., November, 2024

Unleashing Network/Accelerator Co-Exploration Potential on FPGAs: A Deeper Joint Search.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., October, 2024

FedStar: Efficient Federated Learning on Heterogeneous Communication Networks.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., June, 2024

GOFL: An Accurate and Efficient Federated Learning Framework Based on Gradient Optimization in Heterogeneous IoT Systems.
IEEE Internet Things J., April, 2024

Enhancing Graph Random Walk Acceleration via Efficient Dataflow and Hybrid Memory Architecture.
IEEE Trans. Computers, March, 2024

Ace-Sniper: Cloud-Edge Collaborative Scheduling Framework With DNN Inference Latency Modeling on Heterogeneous Devices.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., February, 2024

Flexible and Efficient Memory Swapping Across Mobile Devices With LegoSwap.
IEEE Trans. Parallel Distributed Syst., January, 2024

Heter-Train: A Distributed Training Framework Based on Semi-Asynchronous Parallel Mechanism for Heterogeneous Intelligent Transportation Systems.
IEEE Trans. Intell. Transp. Syst., January, 2024

Advancing tracking-by-detection with MultiMap: Towards occlusion-resilient online multiclass strawberry counting.
Expert Syst. Appl., 2024

MFNAS: Multi-fidelity Exploration in Neural Architecture Search with Stable Zero-Shot Proxy.
Proceedings of the PRICAI 2024: Trends in Artificial Intelligence, 2024

Enhancing Long Sequence Input Processing in FPGA-Based Transformer Accelerators through Attention Fusion.
Proceedings of the Great Lakes Symposium on VLSI 2024, 2024

FlexWalker: An Efficient Multi-Objective Design Space Exploration Framework for HLS Design.
Proceedings of the 34th International Conference on Field-Programmable Logic and Applications, 2024

LORA: A Latency-Oriented Recurrent Architecture for GPT Model on Multi-FPGA Platform with Communication Optimization.
Proceedings of the 34th International Conference on Field-Programmable Logic and Applications, 2024

SoGraph: A State-Aware Architecture for Out-of-Memory Graph Processing on HBM-Equipped FPGAs.
Proceedings of the 34th International Conference on Field-Programmable Logic and Applications, 2024

PowerLens: An Adaptive DVFS Framework for Optimizing Energy Efficiency in Deep Neural Networks.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

Beyond Training: A Zero-Shot Framework to Neural Architecture and Accelerator Co-Exploration.
Proceedings of the IEEE International Conference on Cluster Computing, 2024

Emergent Communication for Numerical Concepts Generalization.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Algorithm/Hardware Co-Optimization for Sparsity-Aware SpMM Acceleration of GNNs.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., December, 2023

Enabling Fast and Memory-Efficient Acceleration for Pattern Matching Workloads: The Lightweight Automata Processing Engine.
IEEE Trans. Computers, April, 2023

Distributed and deep vertical federated learning with big data.
Concurr. Comput. Pract. Exp., 2023

OrthoDETR: A Streamlined Transformer-Based Approach for Precision Detection of Orthopedic Medical Devices.
Algorithms, 2023

NeuralMAE: Data-Efficient Neural Architecture Predictor with Masked Autoencoder.
Proceedings of the Pattern Recognition and Computer Vision - 6th Chinese Conference, 2023

Emergent Communication for Rules Reasoning.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

FlashDAM: Flexible I/O Throttling for the User Experience of Mobile Systems.
Proceedings of the 41st IEEE International Conference on Computer Design, 2023

hAP: A Spatial-von Neumann Heterogeneous Automata Processor with Optimized Resource and IO Overhead on FPGA.
Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2023

DataMaster: A GNN-based Data Type Optimizer for Dataflow Design in FPGA.
Proceedings of the 31st IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2023

Enabling Elastic Resource Management in Cloud FPGAs via A Multi-layer Collaborative Approach.
Proceedings of the 31st IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2023

NAF: Deeper Network/Accelerator Co-Exploration for Customizing CNNs on FPGA.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2023

FastRW: A Dataflow-Efficient and Memory-Aware Accelerator for Graph Random Walk on FPGAs.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2023

Work-in-Progress: NAPMAE: Generalized Data-Efficient Neural Architecture Predictor with Masked Autoencoder.
Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis, 2023

Sparse-HeteroCL: From Sparse Tensor Algebra to Highly Customized Accelerators on FPGAs.
Proceedings of the 23rd IEEE/ACM International Symposium on Cluster, 2023

A flexible dataflow CNN accelerator on FPGA.
Proceedings of the 23rd IEEE/ACM International Symposium on Cluster, 2023

2022
ViA: A Novel Vision-Transformer Accelerator Based on FPGA.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022

Enabling One-Size-Fits-All Compilation Optimization for Inference Across Machine Learning Computers.
IEEE Trans. Computers, 2022

OctCNN: A High Throughput FPGA Accelerator for CNNs Using Octave Convolution Algorithm.
IEEE Trans. Computers, 2022

Conv-inheritance: A hardware-efficient method to compress convolutional neural networks for edge applications.
Neurocomputing, 2022

Heterogeneous computing on mobile GPU-FPGA cooperation platform.
Int. J. High Perform. Syst. Archit., 2022

BabelTower: Learning to Auto-parallelized Program Translation.
Proceedings of the International Conference on Machine Learning, 2022

FedNorm: An Efficient Federated Learning Framework with Dual Heterogeneity Coexistence on Edge Intelligence Systems.
Proceedings of the IEEE 40th International Conference on Computer Design, 2022

SDMA: An Efficient and Flexible Sparse-Dense Matrix-Multiplication Architecture for GNNs.
Proceedings of the 32nd International Conference on Field-Programmable Logic and Applications, 2022

Work-in-Progress: Scheduler for Collaborated FPGA-GPU-CPU Based on Intermediate Language.
Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis, 2022

Work-in-Progress: HeteroRW: A Generalized and Efficient Framework for Random Walks in Graph Analysis.
Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis, 2022

2021
SOLAR: Services-Oriented Deep Learning Architectures-Deep Learning as a Service.
IEEE Trans. Serv. Comput., 2021

LKSM: Light Weight Key-Value Store for Efficient Application Services on Local Distributed Mobile Devices.
IEEE Trans. Serv. Comput., 2021

Improving HW/SW Adaptability for Accelerating CNNs on FPGAs Through A Dynamic/Static Co-Reconfiguration Approach.
IEEE Trans. Parallel Distributed Syst., 2021

GenSeq+: A Scalable High-Performance Accelerator for Genome Sequencing.
IEEE ACM Trans. Comput. Biol. Bioinform., 2021

Tinker: A Middleware for Deploying Multiple NN-Based Applications on a Single Machine.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2021

An FPGA Based Accelerator for Clustering Algorithms With Custom Instructions.
IEEE Trans. Computers, 2021

Neural Network Instruction Set Extension and Code Mapping Mechanism.
Int. J. Softw. Informatics, 2021

Deployment and verification of machine learning tool-chain based on kubernetes distributed clusters.
CCF Trans. High Perform. Comput., 2021

FEAS: A Faster Event-driven Accelerator Supporting Inhibitory Spiking Neural Network.
Proceedings of the 12th International Symposium on Parallel Architectures, 2021

Vapor: A GPU Sharing Scheduler with Communication and Computation Pipeline for Distributed Deep Learning.
Proceedings of the 2021 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom), New York City, NY, USA, September 30, 2021

UH-JLS: A Parallel Ultra-High Throughput JPEG-LS Encoding Architecture for Lossless Image Compression.
Proceedings of the 39th IEEE International Conference on Computer Design, 2021

LAP: A Lightweight Automata Processor for Pattern Matching Tasks.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2021

CVFCC: CV-Based Framework for Container Consolidation in Cloud Data Centers.
Proceedings of the IEEE International Conference on Cluster Computing, 2021

SAP-SGD: Accelerating Distributed Parallel Training with High Communication Efficiency on Heterogeneous Clusters.
Proceedings of the IEEE International Conference on Cluster Computing, 2021

2020
A Ubiquitous Machine Learning Accelerator With Automatic Parallelization on FPGA.
IEEE Trans. Parallel Distributed Syst., 2020

ParaML: A Polyvalent Multicore Accelerator for Machine Learning.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

WinoNN: Optimizing FPGA-Based Convolutional Neural Network Accelerators Using Sparse Winograd Algorithm.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

Addressing Irregularity in Sparse Neural Networks Through a Cooperative Software/Hardware Approach.
IEEE Trans. Computers, 2020

WooKong: A Ubiquitous Accelerator for Recommendation Algorithms With Custom Instruction Sets on FPGA.
IEEE Trans. Computers, 2020

ConvCloud: An Adaptive Convolutional Neural Network Accelerator on Cloud FPGAs.
Proceedings of the FPGA '20: The 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2020

WiderFrame: An Automatic Customization Framework for Building CNN Accelerators on FPGAs: Work-in-Progress.
Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis, 2020

OctCNN: An Energy-Efficient FPGA Accelerator for CNNs using Octave Convolution Algorithm.
Proceedings of the IEEE International Conference on Cluster Computing, 2020

2019
DCW: A Reactive and Predictable Programming Framework for LET-Based Distributed Real-Time Systems.
ACM Trans. Design Autom. Electr. Syst., 2019

Power Optimization of WiFi Networks based on RSSI-awareness.
EAI Endorsed Trans. Mob. Commun. Appl., 2019

WGAN-Based Synthetic Minority Over-Sampling Technique: Improving Semantic Fine-Grained Classification for Lung Nodules in CT Images.
IEEE Access, 2019

FPNet: Customized Convolutional Neural Network for FPGA Platforms.
Proceedings of the International Conference on Field-Programmable Technology, 2019

An Overview of FPGA Based Deep Learning Accelerators: Challenges and Opportunities.
Proceedings of the 21st IEEE International Conference on High Performance Computing and Communications; 17th IEEE International Conference on Smart City; 5th IEEE International Conference on Data Science and Systems, 2019

GRU-ES: Resource Usage Prediction of Cloud Workloads Using a Novel Hybrid Method.
Proceedings of the 21st IEEE International Conference on High Performance Computing and Communications; 17th IEEE International Conference on Smart City; 5th IEEE International Conference on Data Science and Systems, 2019

Drama: A high efficient neural network accelerator on FPGA using dynamic reconfiguration: work-in-progress.
Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis Companion, 2019

Design Exploration of Multi-FPGAs for Accelerating Deep Learning.
Proceedings of the 2019 IEEE International Conference on Cluster Computing, 2019

Higher-order Transfer Learning for Pulmonary Nodule Attribute Prediction in Chest CT Images.
Proceedings of the 2019 IEEE International Conference on Bioinformatics and Biomedicine, 2019

RV-CNN: Flexible and Efficient Instruction Set for CNNs Based on RISC-V Processors.
Proceedings of the Advanced Parallel Processing Technologies, 2019

2018
MALOC: A Fully Pipelined FPGA Accelerator for Convolutional Neural Networks With All Layers Mapped on Chip.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2018

Model checking of MARTE/CCSL time behaviors using timed I/O automata.
J. Syst. Archit., 2018

UniCNN: A Pipelined Accelerator Towards Uniformed Computing for CNNs.
Int. J. Parallel Program., 2018

SparseNN: A Performance-Efficient Accelerator for Large-Scale Sparse Neural Networks.
Int. J. Parallel Program., 2018

Chinese Language Processing Based on Stroke Representation and Multidimensional Representation.
IEEE Access, 2018

Cambricon-S: Addressing Irregularity in Sparse Neural Networks through A Cooperative Software/Hardware Approach.
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

CCRS: Web Service for Chinese Character Recognition.
Proceedings of the 2018 IEEE International Conference on Web Services, 2018

Domino: Graph Processing Services on Energy-Efficient Hardware Accelerator.
Proceedings of the 2018 IEEE International Conference on Web Services, 2018

Low-Shot Multi-label Incremental Learning for Thoracic Diseases Diagnosis.
Proceedings of the Neural Information Processing - 25th International Conference, 2018

MuDBN: An Energy-Efficient and High-Performance Multi-FPGA Accelerator for Deep Belief Networks.
Proceedings of the 2018 on Great Lakes Symposium on VLSI, 2018

Domino: An Asynchronous and Energy-efficient Accelerator for Graph Processing: (Abstract Only).
Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2018

RTMUS^RT: a real-time testbed for empirically comparing real-time multicore schedulers: work-in-progress.
Proceedings of the International Conference on Embedded Software, 2018

Delayed Wake-Up Mechanism Under Suspend Mode of Smartphone.
Proceedings of the Collaborative Computing: Networking, Applications and Worksharing, 2018

WinoNN: optimising FPGA-based neural network accelerators using fast winograd algorithm (work-in-progress).
Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis, 2018

Furion: alleviating overheads for deep learning framework on single machine (work-in-progress).
Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis, 2018

Multi-order Transfer Learning for Pathologic Diagnosis of Pulmonary Nodule Malignancy.
Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine, 2018

2017
A Classroom Scheduling Service for Smart Classes.
IEEE Trans. Serv. Comput., 2017

Service-Oriented Architecture on FPGA-Based MPSoC.
IEEE Trans. Parallel Distributed Syst., 2017

SuperMIC: Analyzing Large Biological Datasets in Bioinformatics with Maximal Information Coefficient.
IEEE ACM Trans. Comput. Biol. Bioinform., 2017

DLAU: A Scalable Deep Learning Accelerator Unit on FPGA.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2017

Hot spots profiling and dataflow analysis in custom dataflow computing SoftProcessors.
J. Syst. Softw., 2017

Temporal Predictability in Safety Critical Cyber Physical System.
计算机科学 (Computer Science), 2017

Reconfigurable Hardware Accelerators: Opportunities, Trends, and Challenges.
CoRR, 2017

PowerSensor: A method for power optimization of smartphone through sensing wakelock application.
Proceedings of the 9th International Conference on Wireless Communications and Signal Processing, 2017

Work-in-Progress: TTI: A Timing ISA for LET Model in Safety-Critical Systems.
Proceedings of the 2017 IEEE Real-Time Systems Symposium, 2017

Implementation and Optimization of the Accelerator Based on FPGA Hardware for LSTM Network.
Proceedings of the 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), 2017

Rethinking Energy-Efficiency of Heterogeneous Computing for CNN-Based Mobile Applications.
Proceedings of the 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), 2017

Building a Game Benchmark for Cooperative CPU-GPU with Pseudo User-Interaction.
Proceedings of the 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), 2017

A Predictable Servant-Based Execution Model for Safety-Critical Systems.
Proceedings of the 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), 2017

Exploiting Aperiodic Server to Improve Aperiodic Responsiveness for LET-Based Real-Time Systems.
Proceedings of the 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), 2017

A High-Performance Accelerator for Large-Scale Convolutional Neural Networks.
Proceedings of the 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), 2017

Natural Language Processing Service Based on Stroke-Level Convolutional Networks for Chinese Text Classification.
Proceedings of the 2017 IEEE International Conference on Web Services, 2017

Evaluation and Trade-offs of Graph Processing for Cloud Services.
Proceedings of the 2017 IEEE International Conference on Web Services, 2017

xFilter: A Temporal Locality Accelerator for Intrusion Detection System Services.
Proceedings of the 2017 IEEE International Conference on Web Services, 2017

GenServ: Genome Sequencing Services on Scalable Energy Efficient Accelerators.
Proceedings of the 2017 IEEE International Conference on Web Services, 2017

Light Weight Key-Value Store for Efficient Services on Local Distributed Mobile Devices.
Proceedings of the 2017 IEEE International Conference on Web Services, 2017

A Time-Aware Programming Framework for Constructing Predictable Real-Time Systems.
Proceedings of the 19th IEEE International Conference on High Performance Computing and Communications; 15th IEEE International Conference on Smart City; 3rd IEEE International Conference on Data Science and Systems, 2017

FPGA Based Big Data Accelerator Design in Teaching Computer Architecture and Organization.
Proceedings of the Cyber Physical Systems. Design, Modeling, and Evaluation, 2017

A power-efficient and high performance FPGA accelerator for convolutional neural networks: work-in-progress.
Proceedings of the Twelfth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis Companion, 2017

A Power-Efficient Accelerator Based on FPGAs for LSTM Network.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

OmniGraph: A Scalable Hardware Accelerator for Graph Processing.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

A Power-Efficient Accelerator for Convolutional Neural Networks.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

Mermaid: Integrating Vertex-Centric with Edge-Centric for Real-World Graph Processing.
Proceedings of the 17th IEEE/ACM International Symposium on Cluster, 2017

TuNao: A High-Performance and Energy-Efficient Reconfigurable Accelerator for Graph Processing.
Proceedings of the 17th IEEE/ACM International Symposium on Cluster, 2017

DSA: Scalable Distributed Sequence Alignment System Using SIMD Instructions.
Proceedings of the 17th IEEE/ACM International Symposium on Cluster, 2017

A high-performance FPGA accelerator for sparse neural networks: work-in-progress.
Proceedings of the 2017 International Conference on Compilers, 2017

Distributed gene clinical decision support system based on cloud computing.
Proceedings of the 2017 IEEE International Conference on Bioinformatics and Biomedicine, 2017

Clockwerk: A Predictable and Efficient Extension of Logical Execution Time Model.
Proceedings of the 24th Asia-Pacific Software Engineering Conference, 2017

Building step-by-step practical curriculum system for computer systemic ability training.
Proceedings of the ACM Turing 50th Celebration Conference, 2017

Efficient Distributed Smith-Waterman Algorithm Based on Apache Spark.
Proceedings of the 2017 IEEE 10th International Conference on Cloud Computing (CLOUD), 2017

2016
Evaluation and Tradeoffs for Out-of-Order Execution on Reconfigurable Heterogeneous MPSoC.
IEEE Trans. Very Large Scale Integr. Syst., 2016

Hardware Implementation on FPGA for Task-Level Parallel Dataflow Execution Engine.
IEEE Trans. Parallel Distributed Syst., 2016

Definitions of predictability for Cyber Physical Systems.
J. Syst. Archit., 2016

Memory Power Optimization on Different Memory Address Mapping Schemas.
J. Inf. Sci. Eng., 2016

KUMMS: optimising DRAM locality with Kernel-user behaviours.
Int. J. High Perform. Syst. Archit., 2016

Soft computing in big data intelligent transportation systems.
Appl. Soft Comput., 2016

Scheduling algorithm based on prefetching in MapReduce clusters.
Appl. Soft Comput., 2016

SCADIS: A Scalable Accelerator for Data-Intensive String Set Matching on FPGAs.
Proceedings of the 2016 IEEE Trustcom/BigDataSE/ISPA, 2016

Brief Announcement: MIC++: Accelerating Maximal Information Coefficient Calculation with GPUs and FPGAs.
Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures, 2016

A Fast and Better Hybrid Recommender System Based on Spark.
Proceedings of the Network and Parallel Computing, 2016

Behavior-Aware Integrated CPU-GPU Power Management for Mobile Games.
Proceedings of the 24th IEEE International Symposium on Modeling, 2016

FairPlay: Services Migration with Lock-Free Mechanisms for Load Balancing in Cloud Architectures.
Proceedings of the IEEE International Conference on Web Services, 2016

SOLAR: Services-Oriented Learning Architectures.
Proceedings of the IEEE International Conference on Web Services, 2016

PIE: A Pipeline Energy-Efficient Accelerator for Inference Process in Deep Neural Networks.
Proceedings of the 22nd IEEE International Conference on Parallel and Distributed Systems, 2016

FCM: Towards Fine-Grained GPU Power Management for Closed Source Mobile Games.
Proceedings of the 26th edition on Great Lakes Symposium on VLSI, 2016

Run-time phase prediction for a reconfigurable VLIW processor.
Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition, 2016

Display power reduction for mobile closed-source games.
Proceedings of the 27th IEEE International Conference on Application-specific Systems, 2016

2015
FreeRider: Non-Local Adaptive Network-on-Chip Routing with Packet-Carried Propagation of Congestion Information.
IEEE Trans. Parallel Distributed Syst., 2015

Heterogeneous Cloud Framework for Big Data Genome Sequencing.
IEEE ACM Trans. Comput. Biol. Bioinform., 2015

Architecture Support for Task Out-of-Order Execution in MPSoCs.
IEEE Trans. Computers, 2015

A case study of parallel JPEG encoding on an FPGA.
J. Parallel Distributed Comput., 2015

CRAIS: A Crossbar-Based Interconnection Scheme on FPGA for Big Data.
J. Comput. Sci. Technol., 2015

Fast approximate hash table using extended counting Bloom filter.
Int. J. Comput. Sci. Eng., 2015

SAKMA: Specialized FPGA-Based Accelerator Architecture for Data-Intensive K-Means Algorithms.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2015

RapidPath: Accelerating Constrained Shortest Path Finding in Graphs on FPGA (Abstract Only).
Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2015

SODA: software defined FPGA based accelerators for big data.
Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015

An FPGA-Based Accelerator for Neighborhood-Based Collaborative Filtering Recommendation Algorithms.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

Design of a More Scalable Database System.
Proceedings of the 15th IEEE/ACM International Symposium on Cluster, 2015

A Deep Learning Prediction Process Accelerator Based FPGA.
Proceedings of the 15th IEEE/ACM International Symposium on Cluster, 2015

Enumeration System on HBase for Low-Latency.
Proceedings of the 15th IEEE/ACM International Symposium on Cluster, 2015

PuDianNao: A Polyvalent Machine Learning Accelerator.
Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, 2015

2014
A Unified Write Buffer Cache Management Scheme for Flash Memory.
IEEE Trans. Very Large Scale Integr. Syst., 2014

Accelerating the Next Generation Long Read Mapping with the FPGA-Based System.
IEEE ACM Trans. Comput. Biol. Bioinform., 2014

Colored Petri Net model with automatic parallelization on real-time multicore architectures.
J. Syst. Archit., 2014

Amdahl's and Hill-Marty laws revisited for FPGA-based MPSoCs: from theory to practice.
Int. J. High Perform. Syst. Archit., 2014

Memory power optimisation on low-bit multi-access cross memory address mapping schema.
Int. J. Embed. Syst., 2014

Towards Energy Optimization Based on Delay-Sensitive Traffic for WiFi Network.
Proceedings of the 2014 IEEE 11th Intl Conf on Ubiquitous Intelligence and Computing and 2014 IEEE 11th Intl Conf on Autonomic and Trusted Computing and 2014 IEEE 14th Intl Conf on Scalable Computing and Communications and Its Associated Workshops, 2014

Unbinds data and tasks to improving the Hadoop performance.
Proceedings of the 15th IEEE/ACIS International Conference on Software Engineering, 2014

Memory power optimization on different memory address mapping schemas.
Proceedings of the 2014 IEEE 20th International Conference on Embedded and Real-Time Computing Systems and Applications, 2014

Multi-objective aware design flow for coarse-grained systems on chip.
Proceedings of the 2014 IEEE 20th International Conference on Embedded and Real-Time Computing Systems and Applications, 2014

DLBer: A Dynamic Load Balancing Algorithm for the Event-Driven Clusters.
Proceedings of the Network and Parallel Computing, 2014

Kernel-User Space Separation in DRAM Memory.
Proceedings of the IEEE International Symposium on Parallel and Distributed Processing with Applications, 2014

Trade-offs between the sensitivity and the speed of the FPGA-based sequence aligner.
Proceedings of the 2014 International Symposium on Integrated Circuits (ISIC), 2014

DLBS: Decentralized load balancing scheme for event-driven cloud frameworks.
Proceedings of the 20th IEEE International Conference on Parallel and Distributed Systems, 2014

A Thread Behavior-Based Memory Management Framework on Multi-core Smartphone.
Proceedings of the 2014 19th International Conference on Engineering of Complex Computer Systems, 2014

Behavior Gaps and Relations between Operating System and Applications on Accessing DRAM.
Proceedings of the 2014 19th International Conference on Engineering of Complex Computer Systems, 2014

An Adaptive Auto-configuration Tool for Hadoop.
Proceedings of the 2014 19th International Conference on Engineering of Complex Computer Systems, 2014

Application-aware group scheduler for Android.
Proceedings of the Fifth International Conference on the Applications of Digital Information and Web Technologies, 2014

Bwasw-Cloud: Efficient sequence alignment algorithm for two big data with MapReduce.
Proceedings of the Fifth International Conference on the Applications of Digital Information and Web Technologies, 2014

HPSO: Prefetching Based Scheduling to Improve Data Locality for MapReduce Clusters.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2014

Domino: an incremental computing framework in cloud with eventual synchronization.
Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing, 2014

Texture-Directed Mobile GPU Power Management for Closed-Source Games.
Proceedings of the 2014 IEEE International Conference on High Performance Computing and Communications, 2014

Big data genome sequencing on Zynq based clusters (abstract only).
Proceedings of the 2014 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2014

Co-processing with dynamic reconfiguration on heterogeneous MPSoC: practices and design tradeoffs (abstract only).
Proceedings of the 2014 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2014

Wave: Trigger Based Synchronous Data Process System.
Proceedings of the 14th IEEE/ACM International Symposium on Cluster, 2014

Instruction Extension and Generation for Adaptive Processors.
Proceedings of the Reconfigurable Computing: Architectures, Tools, and Applications, 2014

2013
Cooperating Virtual Memory and Write Buffer Management for Flash-Based Storage Systems.
IEEE Trans. Very Large Scale Integr. Syst., 2013

MP-Tomasulo: A Dependency-Aware Automatic Parallel Execution Engine for Sequential Programs.
ACM Trans. Archit. Code Optim., 2013

Heterothread: hybrid thread level parallelism on heterogeneous multicore architectures.
SIGBED Rev., 2013

A Result Fusion based Distributed Anomaly Detection System for Android Smartphones.
J. Networks, 2013

Services-oriented URL filtering and verification.
Int. J. High Perform. Syst. Archit., 2013

Static or Dynamic: Trade-Offs for Task Dependency Analysis for Heterogeneous MPSoC.
Proceedings of the 12th IEEE International Conference on Trust, 2013

Automatic Loop-Based Pipeline Optimization on Reconfigurable Platform.
Proceedings of the 12th IEEE International Conference on Trust, 2013

Detecting Associations in Large Dataset on MapReduce.
Proceedings of the 12th IEEE International Conference on Trust, 2013

SOBA: A Services-Oriented Browser Architecture with Distributed URL-Filtering Mechanisms for Teenagers.
Proceedings of the IEEE Ninth World Congress on Services, 2013

FPGA implementation of a scheduler supporting parallel dataflow execution.
Proceedings of the 2013 IEEE International Symposium on Circuits and Systems (ISCAS2013), 2013

Power-aware buddy system and task group scheduler.
Proceedings of the 2013 IEEE International Symposium on Circuits and Systems (ISCAS2013), 2013

SmartMal: A Service-Oriented Behavioral Malware Detection Framework for Smartphones.
Proceedings of the 10th IEEE International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing, 2013

An Intelligent Transportation System Using RFID Based Sensors.
Proceedings of the 10th IEEE International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing, 2013

A FPGA-Based High Performance Acceleration Platform for the Next Generation Long Read Mapping.
Proceedings of the 10th IEEE International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing, 2013

Hardware acceleration for the banded Smith-Waterman algorithm with the cycled systolic array.
Proceedings of the 2013 International Conference on Field-Programmable Technology, 2013

Genome sequencing using mapreduce on FPGA with multiple hardware accelerators (abstract only).
Proceedings of the 2013 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2013

Custom instruction generation and mapping for reconfigurable instruction set processors (abstract only).
Proceedings of the 2013 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2013

Acceleration of the long read mapping on a PC-FPGA architecture (abstract only).
Proceedings of the 2013 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2013

Pipeline Optimization for Loops on Reconfigurable Platform.
Proceedings of the Reconfigurable Computing: Architectures, Tools and Applications, 2013

SmartClass: A Services-Oriented Approach for University Resource Scheduling.
Proceedings of the 2013 IEEE International Conference on Services Computing, Santa Clara, CA, USA, June 28, 2013

2012
Hybrid nonvolatile disk cache for energy-efficient and high-performance systems.
ACM Trans. Design Autom. Electr. Syst., 2012

A star network approach in heterogeneous multiprocessors system on chip.
J. Supercomput., 2012

Smart Grid communication using next generation heterogeneous wireless networks.
Proceedings of the IEEE Third International Conference on Smart Grid Communications, 2012

Analyzing Parallelization and Program Performance in Heterogeneous MPSoCs.
Proceedings of the 20th IEEE International Symposium on Modeling, 2012

Frequency Affinity: Analyzing and Maximizing Power Efficiency in Multi-core Systems.
Proceedings of the 20th IEEE International Symposium on Modeling, 2012

FPM: A Flexible Programming Model for MPSoC on FPGA.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

Detecting Data Hazards in Multi-Processor System-on-Chips on FPGA.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

Behavior Aware Data Locality for Caches.
Proceedings of the 18th IEEE International Conference on Parallel and Distributed Systems, 2012

A Dependency Aware Task Partitioning and Scheduling Algorithm for Hardware-Software Codesign on MPSoCs.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2012

Share memory aware scheduler: balancing performance and fairness.
Proceedings of the Great Lakes Symposium on VLSI 2012, 2012

A task-level OoO framework for heterogeneous systems.
Proceedings of the 2012 International Conference on Field-Programmable Technology, 2012

Parallel dataflow execution for sequential programs on reconfigurable hybrid MPSoCs.
Proceedings of the 2012 International Conference on Field-Programmable Technology, 2012

CaaS: Core as a service realizing hardware services on reconfigurable MPSoCs.
Proceedings of the 22nd International Conference on Field Programmable Logic and Applications (FPL), 2012

Phase Detection for Loop-Based Programs on Multicore Architectures.
Proceedings of the 2012 IEEE International Conference on Cluster Computing, 2012

Memory Affinity: Balancing Performance, Power, Thermal and Fairness for Multi-core Systems.
Proceedings of the 2012 IEEE International Conference on Cluster Computing, 2012

Cache Promotion Policy Using Re-reference Interval Prediction.
Proceedings of the 2012 IEEE International Conference on Cluster Computing, 2012

DTS: Using Dynamic Time-Slice Scaling to Address the OS Problem Incurred by DVFS.
Proceedings of the 2012 IEEE International Conference on Cluster Computing Workshops, 2012

Cloud Based Short Read Mapping Service.
Proceedings of the 2012 IEEE International Conference on Cluster Computing, 2012

Sedna: A Memory Based Key-Value Storage System for Realtime Processing in Cloud.
Proceedings of the 2012 IEEE International Conference on Cluster Computing Workshops, 2012

CRAIS: A Crossbar Based Adaptive Interconnection Scheme.
Proceedings of the Reconfigurable Computing: Architectures, Tools and Applications, 2012

Regarding Processors and Reconfigurable IP Cores as Services.
Proceedings of the 2012 IEEE Ninth International Conference on Services Computing, 2012

2011
Traceback in wireless sensor networks with packet marking and logging.
Frontiers Comput. Sci. China, 2011

Cooperating Write Buffer Cache and Virtual Memory Management for Flash Memory Based Systems.
Proceedings of the 17th IEEE Real-Time and Embedded Technology and Applications Symposium, 2011

Tool Chain Support with Dynamic Profiling for RISP.
Proceedings of the IEEE International Symposium on Parallel and Distributed Processing with Applications, 2011

A Flexible High Speed Star Network Based on Peer to Peer Links on FPGA.
Proceedings of the IEEE International Symposium on Parallel and Distributed Processing with Applications, 2011

ExLRU: a unified write buffer cache management for flash memory.
Proceedings of the 11th International Conference on Embedded Software, 2011

SOMP: Service-Oriented Multi Processors.
Proceedings of the IEEE International Conference on Services Computing, 2011

2010
Efficient distributed location verification in wireless sensor networks.
Frontiers Comput. Sci. China, 2010

Reputation-based trust model in Vehicular Ad Hoc Networks.
Proceedings of the International Conference on Wireless Communications and Signal Processing, 2010

Schedulability Analysis for MultiCore Global Scheduling with Model Checking.
Proceedings of the 11th International Workshop on Microprocessor Test and Verification, 2010

Human 3D Motion Recognition Based on Spatial-Temporal Context of Joints.
Proceedings of the 20th International Conference on Pattern Recognition, 2010

Multi-Dimensional Resilient Statistical En-Route Filtering in Wireless Sensor Networks.
Proceedings of the Advances in Grid and Pervasive Computing, 5th International Conference, 2010

Write activity reduction on flash main memory via smart victim cache.
Proceedings of the 20th ACM Great Lakes Symposium on VLSI 2009, 2010

A Non-cooperative Game Approach for Intrusion Detection in Smartphone Systems.
Proceedings of the 8th Annual Conference on Communication Networks and Services Research, 2010

2009
Trust Management in the P2P Grid.
J. Digit. Content Technol. its Appl., 2009

RePro: A Reputation-based Proactive Routing Protocol for the Wireless Mesh Backbone.
Proceedings of the International Conference on Networked Computing and Advanced Information Management, 2009

2008
Reputation Based Service Selection in Grid Environment.
Proceedings of the International Conference on Computer Science and Software Engineering, 2008

On the Performance of Probabilistic Packet Marking for Traceback in Sensor Networks.
Proceedings of the 5th IEEE Consumer Communications and Networking Conference, 2008

A new reliability evaluation method for embedded system.
Proceedings of 8th IEEE International Conference on Computer and Information Technology, 2008

2005
Compile-Time Energy Reduction Techniques based on Voltage Scaling Characteristic.
Proceedings of the IASTED International Conference on Software Engineering, 2005

2004
Compiling adaptive programs for real-time dynamic scheduling.
Proceedings of the IASTED Conference on Software Engineering and Applications, 2004

2003
OOEM: Object-Oriented Energy Model for Embedded Software IP Reuse.
Proceedings of the 2003 IEEE International Conference on Information Reuse and Integration, 2003

2001
KDAEHS: From Architecture to System.
Proceedings of the 2001 International Symposium on Information Technology (ITCC 2001), 2001

2000
Adaptability in KDAEHS: an adaptive educational hypermedia system based on structural computing.
Proceedings of the HYPERTEXT 2000, Proceedings of the 11th ACM Conference on Hypertext and Hypermedia, May 30, 2000

1999
KDAHS: An Adaptive Hypermedia System based on Structural Computing.
Proceedings of the 1999 ACM Digital Library Workshop on Organizing Web Space (WOWS), 1999

