Wei Zhang

ORCID: 0000-0003-1343-2817

Affiliations:
  • Virginia Commonwealth University, Compiler, Architecture, and Realtime Systems Lab, Richmond, VA, USA
  • Southern Illinois University
  • Pennsylvania State University (PhD)


According to our database, Wei Zhang authored at least 155 papers between 2001 and 2021.

Bibliography

2021
Cache Leakage Reduction Techniques for Hybrid SPM-Cache Architectures.
J. Circuits Syst. Comput., 2021

2020
Reducing CPU-GPU Interferences to Improve CPU Performance in Heterogeneous Architectures.
J. Comput. Sci. Eng., 2020

pacSCA: A Profiling-Assisted Correlation-based Side-Channel Attack on GPUs.
Proceedings of the 38th IEEE International Conference on Computer Design, 2020

Denial of Service in CPU-GPU Heterogeneous Architectures.
Proceedings of the 2020 IEEE High Performance Extreme Computing Conference, 2020

Packing Narrow-Width Operands to Improve Energy Efficiency of General-Purpose GPU Computing.
Proceedings of the 2020 IEEE High Performance Extreme Computing Conference, 2020

2019
An Efficient Profiling-Based Side-Channel Attack on Graphics Processing Units.
Proceedings of the National Cyber Summit, 2019

Execution Units Power-Gating to Improve Energy Efficiency of GPGPUs.
Proceedings of the 2019 International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, 2019

A Collaborative Neurodynamic Approach to Sparse Coding.
Proceedings of the Advances in Neural Networks - ISNN 2019, 2019

Cracking Randomized Coalescing Techniques with An Efficient Profiling-Based Side-Channel Attack to GPU.
Proceedings of the 8th International Workshop on Hardware and Architectural Support for Security and Privacy, 2019

Improving Parallelism of Breadth First Search (BFS) Algorithm for Accelerated Performance on GPUs.
Proceedings of the 2019 IEEE High Performance Extreme Computing Conference, 2019

Heterogeneous Cache Hierarchy Management for Integrated CPU-GPU Architecture.
Proceedings of the 2019 IEEE High Performance Extreme Computing Conference, 2019

2018
Cache-Aware SPM Allocation Algorithms for Performance and Energy Optimization on Hybrid SPM-Cache Architecture.
J. Comput. Sci. Eng., 2018

Exploring GPU Data Cache Leakage Management Techniques.
J. Comput. Sci. Eng., 2018

Estimating the Worst-Case Execution Time of the Shared Data Cache in Integrated CPU-GPU Architectures.
J. Comput. Sci. Eng., 2018

Improving CPU and GPU Performance through Sample-Based Dynamic LLC Bypassing.
J. Comput. Sci. Eng., 2018

Packing Narrow-Width Operands to Improve GPU Performance.
J. Comput. Sci. Eng., 2018

Cache-Aware SPM Allocation to Reduce Worst-Case Execution Time for Hybrid SPM-Caches.
J. Circuits Syst. Comput., 2018

Reducing Inter-Application Interferences in Integrated CPU-GPU Heterogeneous Architecture.
Proceedings of the 36th IEEE International Conference on Computer Design, 2018

Exploiting GPU with 3D Stacked Memory to Boost Performance for Data-Intensive Applications.
Proceedings of the 2018 IEEE High Performance Extreme Computing Conference, 2018

Regression Based WCET Analysis For Sampling Based Motion Planning.
Proceedings of the 2018 IEEE High Performance Extreme Computing Conference, 2018

Energy-Efficient DNN Computing on GPUs Through Register File Management.
Proceedings of the 2018 IEEE High Performance Extreme Computing Conference, 2018

WCET Analysis of GPU L1 Data Caches.
Proceedings of the 2018 IEEE High Performance Extreme Computing Conference, 2018

2017
Enhancing GPU Performance by Efficient Hardware-Based and Hybrid L1 Data Cache Bypassing.
J. Comput. Sci. Eng., 2017

Warp-Based Load/Store Reordering to Improve GPU Time Predictability.
J. Comput. Sci. Eng., 2017

A Sample-Based Dynamic CPU and GPU LLC Bypassing Method for Heterogeneous CPU-GPU Architectures.
Proceedings of the 2017 IEEE Trustcom/BigDataSE/ICESS, Sydney, Australia, August 1-4, 2017, 2017

GPU Register Packing: Dynamically Exploiting Narrow-Width Operands to Improve Performance.
Proceedings of the 2017 IEEE Trustcom/BigDataSE/ICESS, Sydney, Australia, August 1-4, 2017, 2017

Static WCET Analysis of GPUs with Predictable Warp Scheduling.
Proceedings of the 20th IEEE International Symposium on Real-Time Distributed Computing, 2017

Drowsy Register Files for Reducing GPU Leakage Energy.
Proceedings of the 23rd IEEE International Conference on Parallel and Distributed Systems, 2017

Leakage energy reduction for hard real-time caches.
Proceedings of the 2017 IEEE High Performance Extreme Computing Conference, 2017

WCET analysis of the shared data cache in integrated CPU-GPU architectures.
Proceedings of the 2017 IEEE High Performance Extreme Computing Conference, 2017

2016
Priority L2 cache design for time predictability.
Int. J. Embed. Syst., 2016

Warp-Based Load/Store Reordering to Improve GPU Data Cache Time Predictability and Performance.
Proceedings of the 19th IEEE International Symposium on Real-Time Distributed Computing, 2016

Cache locking vs. partitioning for real-time computing on integrated CPU-GPU processors.
Proceedings of the 35th IEEE International Performance Computing and Communications Conference, 2016

2015
Profiling-based L1 data cache bypassing to improve GPU performance and energy efficiency.
SIGBED Rev., 2015

Scratchpad Memory Architectures and Allocation Algorithms for Hard Real-Time Multicore Processors.
J. Comput. Sci. Eng., 2015

Exploiting Static Non-Uniform Cache Architectures for Hard Real-Time Computing.
J. Comput. Sci. Eng., 2015

Cache-aware SPM allocation algorithms for hybrid SPM-cache architectures.
Proceedings of the Sixteenth International Symposium on Quality Electronic Design, 2015

Exploring shared memory and cache to improve GPU performance and energy efficiency.
Proceedings of the Sixteenth International Symposium on Quality Electronic Design, 2015

Real-Time GPU Computing: Cache or No Cache?
Proceedings of the IEEE 18th International Symposium on Real-Time Distributed Computing, 2015

Hardware-Based Performance Enhancement Guaranteed Caches.
Proceedings of the IEEE 18th International Symposium on Real-Time Distributed Computing, 2015

Hardware-Based and Hybrid L1 Data Cache Bypassing to Improve GPU Performance.
Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, 2015

Boosting GPU Performance by Profiling-Based L1 Data Cache Bypassing.
Proceedings of the 15th IEEE/ACM International Symposium on Cluster, 2015

2014
Exploiting Standard Deviation of CPI to Evaluate Architectural Time-Predictability.
J. Comput. Sci. Eng., 2014

Comparing Separate and Statically-Partitioned Caches for Time-Predictable Multicore Processors.
J. Comput. Sci. Eng., 2014

Two-Level Scratchpad Memory Architectures to Achieve Time Predictability and High Performance.
J. Comput. Sci. Eng., 2014

PEG-C: Performance Enhancement Guaranteed Cache for Hard Real-Time Systems.
IEEE Embed. Syst. Lett., 2014

Compiler-directed leakage energy reduction for instruction scratch-pad memories.
Proceedings of the Fifteenth International Symposium on Quality Electronic Design, 2014

A Real-Time Instruction Cache with High Average-Case Performance.
Proceedings of the 17th IEEE International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing, 2014

Worst-case performance guaranteed data cache.
Proceedings of the IEEE 33rd International Performance Computing and Communications Conference, 2014

WCET analysis of static NUCA caches.
Proceedings of the IEEE 33rd International Performance Computing and Communications Conference, 2014

Exploiting Hybrid SPM-Cache Architectures to Reduce Energy Consumption for Embedded Computing.
Proceedings of the 2014 IEEE International Conference on High Performance Computing and Communications, 2014

Characterizing Energy Consumption of Real-Time and Media Benchmarks on Hybrid SPM-Caches.
Proceedings of the 2014 IEEE International Conference on High Performance Computing and Communications, 2014

Performance Implication of Multicore Cache Locking on General-Purpose Processors.
Proceedings of the 2014 IEEE International Conference on High Performance Computing and Communications, 2014

Bounding the Worst-Case Execution Time of Static NUCA Caches.
Proceedings of the 2014 IEEE International Conference on High Performance Computing and Communications, 2014

Improving Energy Efficiency with Dynamic Compiler-Directed Function Unit Power Control.
Proceedings of the 12th IEEE International Conference on Embedded and Ubiquitous Computing, 2014

Hop-Based Priority Scheduling to Improve Worst-Case Inter-core Communication Latency.
Proceedings of the 12th IEEE International Conference on Embedded and Ubiquitous Computing, 2014

Reducing cache leakage energy for hybrid SPM-cache architectures.
Proceedings of the 2014 International Conference on Compilers, 2014

2013
Static worst-case lifetime estimation of wireless sensor networks: A case study on VigilNet.
J. Syst. Archit., 2013

Overview of Real-Time Java Computing.
J. Comput. Sci. Eng., 2013

Bounding Worst-Case DRAM Performance on Multicore Processors.
J. Comput. Sci. Eng., 2013

Counter-Based Approaches for Efficient WCET Analysis of Multicore Processors with Shared Caches.
J. Comput. Sci. Eng., 2013

Multicore Real-Time Scheduling to Reduce Inter-Thread Cache Interferences.
J. Comput. Sci. Eng., 2013

On the interactions between real-time scheduling and inter-thread cached interferences for multicore processors.
Proceedings of the International Symposium on Quality Electronic Design, 2013

Defend GPUs against DoS attacks.
Proceedings of the IEEE 32nd International Performance Computing and Communications Conference, 2013

Reducing worst-case execution time of hybrid SPM-caches.
Proceedings of the IEEE 32nd International Performance Computing and Communications Conference, 2013

Compiler-based approach to reducing leakage energy of instruction scratch-pad memories.
Proceedings of the 2013 IEEE 31st International Conference on Computer Design, 2013

Hybrid SPM-cache architectures to achieve high time predictability and performance.
Proceedings of the 24th International Conference on Application-Specific Systems, 2013

Standard deviation of CPI: A new metric to evaluate architectural time predictability.
Proceedings of the 24th International Conference on Application-Specific Systems, 2013

2012
A Model Checking Based Approach to Bounding Worst-Case Execution Time for Multicore Processors.
ACM Trans. Embed. Comput. Syst., 2012

Architectural time-predictability factor (ATF): a metric to evaluate time predictability of processors.
SIGBED Rev., 2012

On-line Trace Based Automatic Parallelization of Java Programs on Multicore Platforms.
J. Comput. Sci. Eng., 2012

Time-Predictable Java Dynamic Compilation on Multicore Processors.
J. Comput. Sci. Eng., 2012

Multicore-Aware Code Co-Positioning to Reduce WCET on Dual-Core Processors with Shared Instruction Caches.
J. Comput. Sci. Eng., 2012

Static Timing Analysis of Shared Caches for Multicore Processors.
J. Comput. Sci. Eng., 2012

Exploiting multi-level scratchpad memories for time-predictable multicore computing.
Proceedings of the 30th International IEEE Conference on Computer Design, 2012

Exploiting SPM-aware Scheduling on EPIC architectures for high-performance real-time systems.
Proceedings of the IEEE Conference on High Performance Extreme Computing, 2012

2011
An Interference Matrix Based Approach to Bounding Worst-Case Inter-Thread Cache Interferences and WCET for Multi-Core Processors.
J. Comput. Sci. Eng., 2011

Computing and Reducing Transient Error Propagation in Registers.
J. Comput. Sci. Eng., 2011

Bounding Worst-Case Performance for Multi-Core Processors with Shared L2 Instruction Caches.
J. Comput. Sci. Eng., 2011

Exploiting Instruction Reuse to Improve the Performance of Dual Instruction Execution.
J. Circuits Syst. Comput., 2011

Stack distance based worst-case instruction cache performance analysis.
Proceedings of the 2011 ACM Symposium on Applied Computing (SAC), TaiChung, Taiwan, March 21, 2011

Exploiting time predictable two-level scratchpad memory for real-time systems.
Proceedings of the 2011 ACM Symposium on Applied Computing (SAC), TaiChung, Taiwan, March 21, 2011

Multicore-Aware Code Positioning to Improve Worst-Case Performance.
Proceedings of the 14th IEEE International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing, 2011

Static Worst-Case Lifetime Estimation of VigilNet.
Proceedings of the 2011 IEEE/ACM International Conference on Green Computing and Communications (GreenCom), 2011

Work in progress - Course development of programming for general-purpose multicore processors.
Proceedings of the 2011 Frontiers in Education Conference, 2011

2010
Loop-Based Instruction Prefetching to Reduce the Worst-Case Execution Time.
IEEE Trans. Computers, 2010

Static Worst-Case Energy and Lifetime Estimation of Wireless Sensor Networks.
J. Comput. Sci. Eng., 2010

Time-dependent density functional theory study on the hydrogen bonding-induced twisted intramolecular charge-transfer excited states of 2-(4'-N,N-dimethylaminophenyl)imidazo[4,5-b]pyridine.
J. Comput. Chem., 2010

Replica victim caching to improve cache reliability against transient errors.
Int. J. High Perform. Syst. Archit., 2010

Design and implementation of hybrid multicore simulators.
Int. J. Embed. Syst., 2010

Time-Predictable L2 Cache Design for High-Performance Real-Time Systems.
Proceedings of the 16th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, 2010

A time-predictable dual-core prototype on FPGA.
Proceedings of the 48th Annual Southeast Regional Conference, 2010

Improving the static real-time scheduling on multicore processors by reducing worst-case inter-thread cache interferences.
Proceedings of the 48th Annual Southeast Regional Conference, 2010

2009
Bounding Worst-Case Data Cache Performance by Using Stack Distance.
J. Comput. Sci. Eng., 2009

Optimizing Instruction Prefetching to Improve Worst-Case Performance for Real-Time Applications.
J. Comput. Sci. Eng., 2009

Boosting the Performance of Software-Based Transient Errors Tolerant Techniques through Compiler Optimizations.
J. Circuits Syst. Comput., 2009

Studying Energy-Oriented Dynamic Optimizations in Java Virtual Machines.
J. Circuits Syst. Comput., 2009

Computing and Minimizing Cache Vulnerability to Transient Errors.
IEEE Des. Test Comput., 2009

Improving Java performance and energy dissipation through efficient code caching.
Des. Autom. Embed. Syst., 2009

Exploiting stack distance to estimate worst-case data cache performance.
Proceedings of the 2009 ACM Symposium on Applied Computing (SAC), 2009

Accurately Estimating Worst-Case Execution Time for Multi-core Processors with Shared Direct-Mapped Instruction Caches.
Proceedings of the 15th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, 2009

Exploiting Multi-core Processors to Improve Time Predictability for Real-Time Java Computing.
Proceedings of the 15th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, 2009

2008
Analyzing the worst-case execution time for instruction caches with prefetching.
ACM Trans. Embed. Comput. Syst., 2008

Exploiting virtual registers to reduce pressure on real registers.
ACM Trans. Archit. Code Optim., 2008

A time-predictable VLIW processor and its compiler support.
Real Time Syst., 2008

WCET Analysis for Multi-Core Processors with Shared L2 Instruction Caches.
Proceedings of the 14th IEEE Real-Time and Embedded Technology and Applications Symposium, 2008

Improving code caching performance for Java applications.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Adaptive Drowsy Cache Control for Java Applications.
Proceedings of the 2008 IEEE/IFIP International Conference on Embedded and Ubiquitous Computing (EUC 2008), 2008

On the Energy Efficiency of Java Virtual Machine.
Proceedings of the 2008 International Conference on Embedded Systems & Applications, 2008

Efficient code caching to improve performance and energy consumption for java applications.
Proceedings of the 2008 International Conference on Compilers, 2008

2007
Reducing branch predictor leakage energy by exploiting loops.
ACM Trans. Embed. Comput. Syst., 2007

Evaluating instruction cache vulnerability to transient errors.
SIGARCH Comput. Archit. News, 2007

Hybrid multi-core architecture for boosting single-threaded performance.
SIGARCH Comput. Archit. News, 2007

Compiler-Assisted Leakage Energy Reduction for Cache Memories.
Adv. Comput., 2007

WCET analysis of instruction caches with prefetching.
Proceedings of the 2007 ACM SIGPLAN/SIGBED Conference on Languages, 2007

Real-time Accurate Object Detection using Multiple Resolutions.
Proceedings of the IEEE 11th International Conference on Computer Vision, 2007

Virtual Registers: Reducing Register Pressure Without Enlarging the Register File.
Proceedings of the High Performance Embedded Architectures and Compilers, 2007

Exploring Functional Unit Design Space of VLIW Processors for Optimizing Both Performance and Energy Consumption.
Proceedings of the 21st International Conference on Advanced Information Networking and Applications (AINA 2007), 2007

Detecting VLIW Hard Errors Cost-Effectively through a Software-Based Approach.
Proceedings of the 21st International Conference on Advanced Information Networking and Applications (AINA 2007), 2007

An Area-Efficient Approach to Improving Register File Reliability against Transient Errors.
Proceedings of the 21st International Conference on Advanced Information Networking and Applications (AINA 2007), 2007

2006
Reducing dynamic and leakage energy in VLIW architectures.
ACM Trans. Embed. Comput. Syst., 2006

Reducing Instruction Translation Look-Aside Buffer Energy Through Compiler-Directed Resizing.
J. Low Power Electron., 2006

Compiler-guided next sub-bank prediction for reducing instruction cache leakage energy.
J. Embed. Comput., 2006

The Impact of Cache Organization in Optimizing Microprocessor Power Consumption.
Proceedings of the 2006 International Conference on Computer Design & Conference on Computing in Nanotechnology, 2006

2005
Reducing data cache leakage energy using a compiler-based approach.
ACM Trans. Embed. Comput. Syst., 2005

Replication Cache: A Small Fully Associative Cache to Improve Data Cache Reliability.
IEEE Trans. Computers, 2005

Exploiting the replication cache to improve performance for multiple-issue microprocessors.
SIGARCH Comput. Archit. News, 2005

Exploiting loop behavior for data cache leakage reduction.
J. Embed. Comput., 2005

A Computational Model of Eye Movements during Object Class Detection.
Proceedings of the Advances in Neural Information Processing Systems 18 [Neural Information Processing Systems, 2005

The Role of Top-down and Bottom-up Processes in Guiding Eye Movements during Visual Search.
Proceedings of the Advances in Neural Information Processing Systems 18 [Neural Information Processing Systems, 2005

Compiler-guided register reliability improvement against soft errors.
Proceedings of the EMSOFT 2005, 2005

Computing Cache Vulnerability to Transient Errors and Its Implication.
Proceedings of the 20th IEEE International Symposium on Defect and Fault-Tolerance in VLSI Systems (DFT 2005), 2005

Object Class Recognition Using Multiple Layer Boosting with Heterogeneous Features.
Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), 2005

2004
Reducing instruction cache energy consumption using a compiler-based strategy.
ACM Trans. Archit. Code Optim., 2004

Compiler-Directed Data Cache Leakage Reduction.
Proceedings of the 2004 IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2004), 2004

Enhancing data cache reliability by the addition of a small fully-associative replication cache.
Proceedings of the 18th Annual International Conference on Supercomputing, 2004

Loop-based leakage control for branch predictors.
Proceedings of the 2004 International Conference on Compilers, 2004

Static next sub-bank prediction for drowsy instruction cache.
Proceedings of the 2004 International Conference on Compilers, 2004

Replica Victim Caching to Improve Reliability of In-Cache Replication.
Proceedings of the Advances in Computer Systems Architecture, 9th Asia-Pacific Conference, 2004

2003
A compiler approach for reducing data cache energy.
Proceedings of the 17th Annual International Conference on Supercomputing, 2003

ICR: In-Cache Replication for Enhancing Data Cache Reliability.
Proceedings of the 2003 International Conference on Dependable Systems and Networks (DSN 2003), 2003

Compiler-Directed Management of Instruction Accesses.
Proceedings of the 2003 Euromicro Symposium on Digital Systems Design (DSD 2003), 2003

Compiler Support for Reducing Leakage Energy Consumption.
Proceedings of the 2003 Design, 2003

Masking the Energy Behavior of DES Encryption.
Proceedings of the 2003 Design, 2003

Runtime Code Parallelization for On-Chip Multiprocessors.
Proceedings of the 2003 Design, 2003

Implementation and Evaluation of an On-Demand Parameter-Passing Strategy for Reducing Energy.
Proceedings of the 2003 Design, 2003

Data Space Oriented Scheduling in Embedded Systems.
Proceedings of the 2003 Design, 2003

Interprocedural optimizations for improving data cache performance of array-intensive embedded applications.
Proceedings of the 40th Design Automation Conference, 2003

Performance, energy, and reliability tradeoffs in replicating hot cache lines.
Proceedings of the International Conference on Compilers, 2003

Energy-Aware Parameter Passing.
Proceedings of the Embedded Software for SoC, 2003

Data Space Oriented Scheduling.
Proceedings of the Embedded Software for SoC, 2003

Dynamic Parallelization of Array Based On-Chip Multiprocessor Applications.
Proceedings of the Embedded Software for SoC, 2003

2002
Compiler-directed instruction cache leakage optimization.
Proceedings of the 35th Annual International Symposium on Microarchitecture, 2002

Compiler-directed cache polymorphism.
Proceedings of the 2002 Joint Conference on Languages, 2002

2001
Exploiting VLIW schedule slacks for dynamic and leakage energy reduction.
Proceedings of the 34th Annual International Symposium on Microarchitecture, 2001

