Dongrui Fan

ORCID: 0000-0001-5219-0908

According to our database, Dongrui Fan authored at least 176 papers between 2003 and 2024.

Bibliography

2024
HiHGNN: Accelerating HGNNs Through Parallelism and Data Reusability Exploitation.
IEEE Trans. Parallel Distributed Syst., July, 2024

Skyway: Accelerate Graph Applications with a Dual-Path Architecture and Fine-Grained Data Management.
J. Comput. Sci. Technol., July, 2024

MoDSE: A High-Accurate Multiobjective Design Space Exploration Framework for CPU Microarchitectures.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., May, 2024

Improving Utilization of Dataflow Unit for Multi-Batch Processing.
ACM Trans. Archit. Code Optim., March, 2024

Multilayer Dataflow: Orchestrate Butterfly Sparsity to Accelerate Attention Computation.
CoRR, 2024

Multi-objective Optimization in CPU Design Space Exploration: Attention is All You Need.
CoRR, 2024

SiHGNN: Leveraging Properties of Semantic Graphs for Efficient HGNN Acceleration.
CoRR, 2024

Accelerating Mini-batch HGNN Training by Reducing CUDA Kernels.
CoRR, 2024

A Comprehensive Survey on GNN Characterization.
CoRR, 2024

Characterizing and Understanding HGNN Training on GPUs.
CoRR, 2024

Revisiting Edge Perturbation for Graph Neural Network in Graph Data Augmentation and Attack.
CoRR, 2024

Disttack: Graph Adversarial Attacks Toward Distributed GNN Training.
Proceedings of the Euro-Par 2024: Parallel Processing, 2024

ADE-HGNN: Accelerating HGNNs Through Attention Disparity Exploitation.
Proceedings of the Euro-Par 2024: Parallel Processing, 2024

GDL-GNN: Applying GPU Dataloading of Large Datasets for Graph Neural Network Inference.
Proceedings of the Euro-Par 2024: Parallel Processing, 2024

JPlace: A Clock-Aware Length-Matching Placement for Rapid Single-Flux-Quantum Circuits.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2024

GDR-HGNN: A Heterogeneous Graph Neural Networks Accelerator Frontend with Graph Decoupling and Recoupling.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

2023
Accelerating Convolutional Neural Networks by Exploiting the Sparsity of Output Activation.
IEEE Trans. Parallel Distributed Syst., December, 2023

A Comprehensive Survey on Distributed Training of Graph Neural Networks.
Proc. IEEE, December, 2023

Domain adaptive person re-identification with memory-based circular ranking.
Appl. Intell., March, 2023

A Survey of Graph Pre-processing Methods: From Algorithmic to Hardware Perspectives.
CoRR, 2023

Characterizing and Understanding Defense Methods for GNNs on GPUs.
IEEE Comput. Archit. Lett., 2023

Alleviating Transfer Latency in DataFlow Accelerator for DSP Applications.
Proceedings of the 41st IEEE International Conference on Computer Design, 2023

A Transfer Learning Framework for High-Accurate Cross-Workload Design Space Exploration of CPU.
Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 2023

ROMA: A Reconfigurable On-chip Memory Architecture for Multi-core Accelerators.
Proceedings of the IEEE International Conference on High Performance Computing & Communications, 2023

A High-accurate Multi-objective Ensemble Exploration Framework for Design Space of CPU Microarchitecture.
Proceedings of the Great Lakes Symposium on VLSI 2023, 2023

JRouter: A Multi-Terminal Hierarchical Length-Matching Router under Planar Manhattan Routing Model for RSFQ Circuits.
Proceedings of the Great Lakes Symposium on VLSI 2023, 2023

Improving Utilization of Dataflow Architectures Through Software and Hardware Co-Design.
Proceedings of the Euro-Par 2023: Parallel Processing - 29th International Conference on Parallel and Distributed Computing, Limassol, Cyprus, August 28, 2023

A High-accurate Multi-objective Exploration Framework for Design Space of CPU.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

Simple and Efficient Heterogeneous Graph Neural Network.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
General spiking neural network framework for the learning trajectory from a noisy mmWave radar.
Neuromorph. Comput. Eng., June, 2022

Multi-Node Acceleration for Large-Scale GCNs.
IEEE Trans. Computers, 2022

JBNN: A Hardware Design for Binarized Neural Networks Using Single-Flux-Quantum Circuits.
IEEE Trans. Computers, 2022

Accelerating Data Transfer in Dataflow Architectures Through a Look-Ahead Acknowledgment Mechanism.
J. Comput. Sci. Technol., 2022

Sampling Methods for Efficient Training of Graph Convolutional Networks: A Survey.
IEEE CAA J. Autom. Sinica, 2022

Rethinking Efficiency and Redundancy in Training Large-scale Graphs.
CoRR, 2022

A synergistic reinforcement learning-based framework design in driving automation.
Comput. Electr. Eng., 2022

A survey on superconducting computing technology: circuits, architectures and design tools.
CCF Trans. High Perform. Comput., 2022

Accelerating Graph Processing With Lightweight Learning-Based Data Reordering.
IEEE Comput. Archit. Lett., 2022

Characterizing and Understanding HGNNs on GPUs.
IEEE Comput. Archit. Lett., 2022

Characterization and Implementation of Radar System Applications on a Reconfigurable Dataflow Architecture.
IEEE Comput. Archit. Lett., 2022

Characterizing and Understanding Distributed GNN Training on GPUs.
IEEE Comput. Archit. Lett., 2022

GNNSampler: Bridging the Gap Between Sampling Algorithms of GNN and Hardware.
Proceedings of the Machine Learning and Knowledge Discovery in Databases, 2022

A Routing-Aware Mapping Method for Dataflow Architectures.
Proceedings of the Network and Parallel Computing, 2022

Survey on Graph Neural Network Acceleration: An Algorithmic Perspective.
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

Heterogeneous Collaborative Refining for Real-Time End-to-End Image-Text Retrieval System.
Proceedings of the ICIAI 2022: The 6th International Conference on Innovation in Artificial Intelligence, Guangzhou China, March 4, 2022

GEM: Execution-Aware Cache Management for Graph Analytics.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2022

MatGraph: An Energy-Efficient and Flexible CGRA Engine for Matrix-Based Graph Analytics.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2022

A Loop Optimization Method for Dataflow Architecture.
Proceedings of the 24th IEEE Int Conf on High Performance Computing & Communications; 8th Int Conf on Data Science & Systems; 20th Int Conf on Smart City; 8th Int Conf on Dependability in Sensor, 2022

HetGraph: A High Performance CPU-CGRA Architecture for Matrix-based Graph Analytics.
Proceedings of the GLSVLSI '22: Great Lakes Symposium on VLSI 2022, Irvine CA USA, June 6, 2022

LRP: Predictive output activation based on SVD approach for CNNs acceleration.
Proceedings of the 2022 Design, Automation & Test in Europe Conference & Exhibition, 2022

Alleviating datapath conflicts and design centralization in graph analytics acceleration.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

2021
An efficient scheduling algorithm for dataflow architecture using loop-pipelining.
Inf. Sci., 2021

BSR-TC: Adaptively Sampling for Accurate Triangle Counting over Evolving Graph Streams.
Int. J. Softw. Eng. Knowl. Eng., 2021

Tackling Variabilities in Autonomous Driving.
CoRR, 2021

RISC-NN: Use RISC, NOT CISC as Neural Network Hardware Infrastructure.
CoRR, 2021

Scalable and efficient graph traversal on high-throughput cluster.
CCF Trans. High Perform. Comput., 2021

Hardware Acceleration for GCNs via Bidirectional Fusion.
IEEE Comput. Archit. Lett., 2021

Triangle Counting by Adaptively Resampling over Evolving Graph Streams.
Proceedings of the 33rd International Conference on Software Engineering and Knowledge Engineering, 2021

Alleviating Imbalance in Synchronous Distributed Training of Deep Neural Networks.
Proceedings of the 2021 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom), New York City, NY, USA, September 30, 2021

Streamline Ring ORAM Accesses through Spatial and Temporal Optimization.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

2020
An efficient dataflow accelerator for scientific applications.
Future Gener. Comput. Syst., 2020

Video Face Recognition System: RetinaFace-mnet-faster and Secondary Search.
CoRR, 2020

Top-Related Meta-Learning Method for Few-Shot Detection.
CoRR, 2020

Pixel-Semantic Revise of Position Learning A One-Stage Object Detector with A Shared Encoder-Decoder.
CoRR, 2020

Characterizing and Understanding GCNs on GPU.
IEEE Comput. Archit. Lett., 2020

An Efficient Multicast Router using Shared-Buffer with Packet Merging for Dataflow Architecture.
Proceedings of the 14th IEEE/ACM International Symposium on Networks-on-Chip, 2020

Highly Efficient and GPU-Friendly Implementation of BFS on Single-node System.
Proceedings of the IEEE International Conference on Parallel & Distributed Processing with Applications, 2020

Pixel-Semantic Revising of Position: One-Stage Object Detector with Shared Encoder-Decoder.
Proceedings of the Neural Information Processing - 27th International Conference, 2020

Accelerating Sparse Convolutional Neural Networks Based on Dataflow Architecture.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2020

CTA: A Critical Task Aware Scheduling Mechanism for Dataflow Architecture.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2020

HyGCN: A GCN Accelerator with Hybrid Architecture.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020

Design Automation Methodology from RTL to Gate-level Netlist and Schematic for RSFQ Logic Circuits.
Proceedings of the GLSVLSI '20: Great Lakes Symposium on VLSI 2020, 2020

2019
PIM-WEAVER: A High Energy-efficient, General-purpose Acceleration Architecture for String Operations in Big Data Processing.
Sustain. Comput. Informatics Syst., 2019

Applying CNN on a scientific application accelerator based on dataflow architecture.
CCF Trans. High Perform. Comput., 2019

Alleviating Irregularity in Graph Analytics Acceleration: a Hardware/Software Co-Design Approach.
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

Instruction Vulnerability Test and Code Optimization Against DVFS Attack.
Proceedings of the IEEE International Test Conference in Asia, 2019

Balancing Memory Accesses for Energy-Efficient Graph Analytics Accelerators.
Proceedings of the 2019 IEEE/ACM International Symposium on Low Power Electronics and Design, 2019

iATPG: Instruction-level Automatic Test Program Generation for Vulnerabilities under DVFS attack.
Proceedings of the 25th IEEE International Symposium on On-Line Testing and Robust System Design, 2019

C-MIDN: Coupled Multiple Instance Detection Network With Segmentation Guidance for Weakly Supervised Object Detection.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Highly Efficient Breadth-First Search on CPU-Based Single-Node System.
Proceedings of the 21st IEEE International Conference on High Performance Computing and Communications; 17th IEEE International Conference on Smart City; 5th IEEE International Conference on Data Science and Systems, 2019

A Sharing Path Awareness Scheduling Algorithm for Dataflow Architecture.
Proceedings of the 21st IEEE International Conference on High Performance Computing and Communications; 17th IEEE International Conference on Smart City; 5th IEEE International Conference on Data Science and Systems, 2019

C-MAP: Improving the Effectiveness of Mapping Method for CGRA by Reducing NoC Congestion.
Proceedings of the 21st IEEE International Conference on High Performance Computing and Communications; 17th IEEE International Conference on Smart City; 5th IEEE International Conference on Data Science and Systems, 2019

Magma: A Monolithic 3D Vertical Heterogeneous ReRAM-based Main Memory Architecture.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

Utilizing the Instability in Weakly Supervised Object Detection.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

2018
CRAT: Enabling Coordinated Register Allocation and Thread-Level Parallelism Optimization for GPUs.
IEEE Trans. Computers, 2018

The rise of high-throughput computing.
Frontiers Inf. Technol. Electron. Eng., 2018

A Pipelining Loop Optimization Method for Dataflow Architecture.
J. Comput. Sci. Technol., 2018

A Non-Stop Double Buffering Mechanism for Dataflow Architecture.
J. Comput. Sci. Technol., 2018

High-Performance and Energy-Efficient Fault Tolerance Scheduling Algorithm Based on Improved TMR for Heterogeneous System.
Proceedings of the IEEE International Conference on Parallel & Distributed Processing with Applications, 2018

WEAVER: An Energy Efficient, General-Purpose Acceleration Architecture for String Operations in Big Data Applications.
Proceedings of the IEEE International Conference on Parallel & Distributed Processing with Applications, 2018

Accelerating CNN Algorithm with Fine-Grained Dataflow Architectures.
Proceedings of the 20th IEEE International Conference on High Performance Computing and Communications; 16th IEEE International Conference on Smart City; 4th IEEE International Conference on Data Science and Systems, 2018

Optimizing the Efficiency of Data Transfer in Dataflow Architectures.
Proceedings of the 20th IEEE International Conference on High Performance Computing and Communications; 16th IEEE International Conference on Smart City; 4th IEEE International Conference on Data Science and Systems, 2018

SmarCo: An Efficient Many-Core Processor for High-Throughput Applications in Datacenters.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

Optimizing network efficiency of dataflow architectures through dynamic packet merging.
Proceedings of the Ninth International Green and Sustainable Computing Conference, 2018

2017
An Efficient Network-on-Chip Router for Dataflow Architecture.
J. Comput. Sci. Technol., 2017

An Adaptive Tuning Sparse Fast Fourier Transform.
Proceedings of the Advances in Multimedia Information Processing - PCM 2017, 2017

Hard Neighboring Variables Based Configuration Checking in Stochastic Local Search for Weighted Partial Maximum Satisfiability.
Proceedings of the 29th IEEE International Conference on Tools with Artificial Intelligence, 2017

2016
An Evolutionary Technique for Performance-Energy-Temperature Optimized Scheduling of Parallel Tasks on Multi-Core Processors.
IEEE Trans. Parallel Distributed Syst., 2016

ACCC: An Acceleration Mechanism for Character Operation Based on Cache Computing in Big Data Applications.
Proceedings of the 18th IEEE International Conference on High Performance Computing and Communications; 14th IEEE International Conference on Smart City; 2nd IEEE International Conference on Data Science and Systems, 2016

An energy-efficient bandwidth allocation method for single-chip heterogeneous processor.
Proceedings of the Seventh International Green and Sustainable Computing Conference, 2016

A framework for energy-efficient optimization on multi-cores.
Proceedings of the Seventh International Green and Sustainable Computing Conference, 2016

Memory partition for SIMD in streaming dataflow architectures.
Proceedings of the Seventh International Green and Sustainable Computing Conference, 2016

On the properties of data migration based on topology pattern keeping on cache hierarchy.
Proceedings of the Seventh International Green and Sustainable Computing Conference, 2016

A Percolation Data Migration Schema in a hybrid Cache Hierarchy.
Proceedings of the Seventh International Green and Sustainable Computing Conference, 2016

Message from the chairs.
Proceedings of the Seventh International Green and Sustainable Computing Conference, 2016

POSTER: An Optimization of Dataflow Architectures for Scientific Applications.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015
Introduction to special issue on Selected Papers from 2013 International Green Computing Conference.
Sustain. Comput. Informatics Syst., 2015

Corrigendum to "Fast and scalable lock methods for video coding on many-core architecture" [J. Visual Communication and Image Representation 25(7) (2014) 1758-1762].
J. Vis. Commun. Image Represent., 2015

Enabling coordinated register allocation and thread-level parallelism optimization for GPUs.
Proceedings of the 48th International Symposium on Microarchitecture, 2015

Decoupling Contention with Victim Row-Buffer on Multicore Memory Systems.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015

A high-density data path implementation fitting for HTC applications.
Proceedings of the Sixth International Green and Sustainable Computing Conference, 2015

Thread ID based power reduction mechanism for multi-thread shared set-associative caches.
Proceedings of the Sixth International Green and Sustainable Computing Conference, 2015

ParaVerifier: An Automatic Framework for Proving Parameterized Cache Coherence Protocols.
Proceedings of the Automated Technology for Verification and Analysis, 2015

2014
QBNoC: QoS-aware bufferless NoC architecture.
Microelectron. J., 2014

CRANarch: A feasible processor micro-architecture for Cloud Radio Access Network.
Microprocess. Microsystems, 2014

Fast and scalable lock methods for video coding on many-core architecture.
J. Vis. Commun. Image Represent., 2014

Optimizing mapreduce with low memory requirements for shared-memory systems.
Proceedings of the 15th IEEE/ACIS International Conference on Software Engineering, 2014

Efficiently and Completely Verifying Synchronized Consistency Models.
Proceedings of the Automated Technology for Verification and Analysis, 2014

SpongeDirectory: flexible sparse directories utilizing multi-level memristors.
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2013
StreamTMC: Stream compilation for tiled multi-core architectures.
J. Parallel Distributed Comput., 2013

Scalability study of molecular dynamics simulation on Godson-T many-core architecture.
J. Parallel Distributed Comput., 2013

3D Networks-on-Chip mapping targeting minimum signal TSVs.
IEICE Electron. Express, 2013

A Path-Adaptive Opto-electronic Hybrid NoC for Chip Multi-processor.
Proceedings of the 12th IEEE International Conference on Trust, 2013

Energy-Performance Modeling and Optimization of Parallel Computing in On-Chip Networks.
Proceedings of the 12th IEEE International Conference on Trust, 2013

Low power cache architectures with hybrid approach of filtering unnecessary way accesses.
Proceedings of the 2013 PPOPP International Workshop on Programming Models and Applications for Multicores and Manycores, 2013

HRUL: A Hardware Assisted Recorder for User-Level Application.
Proceedings of the International Conference on Parallel and Distributed Computing, 2013

SimICT: A fast and flexible framework for performance and power evaluation of large-scale architecture.
Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 2013

Low Execution Efficiency: When General Multi-core Processor Meets Wireless Communication Protocol.
Proceedings of the 10th IEEE International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing, 2013

An Efficient Parallel Mechanism for Highly-Debuggable Multicore Simulator.
Proceedings of the Advanced Parallel Processing Technologies, 2013

2012
Extendable pattern-oriented optimization directives.
ACM Trans. Archit. Code Optim., 2012

Godson-T: An Efficient Many-Core Processor Exploring Thread-Level Parallelism.
IEEE Micro, 2012

A SAT-based diagnosis pattern generation method for timing faults in scan chains.
Proceedings of the 2012 IEEE International Symposium on Circuits and Systems, 2012

Self-Correction Trace Model: A Full-System Simulator for Optical Network-on-Chip.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

Auto-Tuning GEMV on Many-Core GPU.
Proceedings of the 18th IEEE International Conference on Parallel and Distributed Systems, 2012

CRAW/P: A Workload Partition Method for the Efficient Parallel Simulation of Manycores.
Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012

2011
An Efficient and Flexible Task Management for Many Cores.
Trans. High Perform. Embed. Archit. Compil., 2011

New Methodologies for Parallel Architecture.
J. Comput. Sci. Technol., 2011

Optimizing Web Browser on Many-Core Architectures.
Proceedings of the 12th International Conference on Parallel and Distributed Computing, 2011

High-efficient architecture of Godson-T many-core processor.
Proceedings of the 2011 IEEE Hot Chips 23 Symposium (HCS), 2011

Performance analysis and optimization of molecular dynamics simulation on Godson-T many-core processor.
Proceedings of the 8th Conference on Computing Frontiers, 2011

Design Space Exploration of Parallel Architectures.
Proceedings of the Multi-objective Design Space Exploration of Multiprocessor SoC Architectures, 2011

2010
Landing Stencil Code on Godson-T.
J. Comput. Sci. Technol., 2010

P-GAS: Parallelizing a Cycle-Accurate Event-Driven Many-Core Processor Simulator Using Parallel Discrete Event Simulation.
Proceedings of the 24th ACM/IEEE/SCS Workshop on Principles of Advanced and Distributed Simulation, 2010

Minimal Multi-threading: Finding and Removing Redundant Instructions in Multi-threaded Processors.
Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

High performance comparison-based sorting algorithm on many-core GPUs.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

GVE: Godson-T Verification Engine for many-core architecture rapid prototyping and debugging.
Proceedings of the International Conference on Field-Programmable Technology, 2010

Thread Owned Block Cache: Managing Latency in Many-Core Architecture.
Proceedings of the Euro-Par 2010 - Parallel Processing, 16th International Euro-Par Conference, Ischia, Italy, August 31, 2010

Efficient Address Mapping of Shared Cache for On-Chip Many-Core Architecture.
Proceedings of the Euro-Par 2010 - Parallel Processing, 16th International Euro-Par Conference, Ischia, Italy, August 31, 2010

Preliminary Investigation of Accelerating Molecular Dynamics Simulation on Godson-T Many-Core Processor.
Proceedings of the Euro-Par 2010 Parallel Processing Workshops, 2010

2009
Godson-T: An Efficient Many-Core Architecture for Parallel Program Executions.
J. Comput. Sci. Technol., 2009

Study on Fine-Grained Synchronization in Many-Core Architecture.
Proceedings of the 10th ACIS International Conference on Software Engineering, 2009

Architectural support for cilk computations on many-core architectures.
Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009

GFFC: The Global Feedback Based Flow Control in the NoC Design for Many-core Processor.
Proceedings of the NPC 2009, 2009

Data Management: The Spirit to Pursuit Peak Performance on Many-Core Processor.
Proceedings of the IEEE International Symposium on Parallel and Distributed Processing with Applications, 2009

Evaluation Method of Synchronization for Shared-Memory On-Chip Many-Core Processor.
Proceedings of the IEEE International Symposium on Parallel and Distributed Processing with Applications, 2009

A Synchronization-Based Alternative to Directory Protocol.
Proceedings of the IEEE International Symposium on Parallel and Distributed Processing with Applications, 2009

High Performance Matrix Multiplication on Many Cores.
Proceedings of the Euro-Par 2009 Parallel Processing, 2009

Characterizing and Understanding the Bandwidth Behavior of Workloads on Multi-core Processors.
Proceedings of the Euro-Par 2009 Parallel Processing, 2009

Software and Hardware Cooperate for 1-D FFT Algorithm Optimization on Multicore Processors.
Proceedings of the Ninth IEEE International Conference on Computer and Information Technology, 2009

A Fast Linear-Space Sequence Alignment Algorithm with Dynamic Parallelization Framework.
Proceedings of the Ninth IEEE International Conference on Computer and Information Technology, 2009

Design of New Hash Mapping Functions.
Proceedings of the Ninth IEEE International Conference on Computer and Information Technology, 2009

A Low-Complexity Synchronization Based Cache Coherence Solution for Many Cores.
Proceedings of the Ninth IEEE International Conference on Computer and Information Technology, 2009

2008
Experience on optimizing irregular computation for memory hierarchy in manycore architecture.
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008

Efficient Parallelization of a Protein Sequence Comparison Algorithm on Manycore Architecture.
Proceedings of the Ninth International Conference on Parallel and Distributed Computing, 2008

Location Consistency Model Revisited: Problem, Solution and Prospects.
Proceedings of the Ninth International Conference on Parallel and Distributed Computing, 2008

A Quantitative Study of the On-Chip Network and Memory Hierarchy Design for Many-Core Processor.
Proceedings of the 14th International Conference on Parallel and Distributed Systems, 2008

A Performance Model of Dense Matrix Operations on Many-Core Architectures.
Proceedings of the Euro-Par 2008, 2008

A Study and Implementation of the Huffman Algorithm Based on Condensed Huffman Table.
Proceedings of the International Conference on Computer Science and Software Engineering, 2008

2007
Design and Implementation of Floating Point Stack on General RISC Architecture.
Proceedings of the 15th Euromicro International Conference on Parallel, 2007

Simplified Multi-Ported Cache in High Performance Processor.
Proceedings of the International Conference on Networking, 2007

Circuit implementation of floating point range reduction for trigonometric functions.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2007), 2007

Optimized Register Renaming Scheme for Stack-Based x86 Operations.
Proceedings of the Architecture of Computing Systems, 2007

2005
SoC Leakage Power Reduction Algorithm by Input Vector Control.
Proceedings of the 2005 International Symposium on System-on-Chip, 2005

An energy efficient TLB design methodology.
Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005

2003
Evaluation and Choice of Various Branch Predictors for Low-Power Embedded Processor.
J. Comput. Sci. Technol., 2003

