Lieven Eeckhout

Orcid: 0000-0001-8792-4473

  • Ghent University, Gent, Belgium

According to our database, Lieven Eeckhout authored at least 284 papers between 1999 and 2024.


ACM Fellow

ACM Fellow 2021, "For contributions in computer architecture performance analysis and modeling".

IEEE Fellow

IEEE Fellow 2018, "For contributions in computer architecture performance analysis and modeling".





Toward Sustainable Computer Systems.
Computer, February, 2024

Decoupled Vector Runahead for Prefetching Nested Memory-Access Chains.
IEEE Micro, 2024

Per-Instruction Cycle Stacks Through Time-Proportional Event Analysis.
IEEE Micro, 2024

Sustainable Hardware Specialization.
CoRR, 2024

Improving Multi-Instance GPU Efficiency via Sub-Entry Sharing TLB Design.
CoRR, 2024

R.I.P. Geomean Speedup Use Equal-Work (Or Equal-Time) Harmonic Mean Speedup Instead.
IEEE Comput. Archit. Lett., 2024

Scalar Vector Runahead.
Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024

STAR: Sub-Entry Sharing-Aware TLB for Multi-Instance GPU.
Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024

GPU Scale-Model Simulation.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024

FOCAL: A First-Order Carbon Model to Assess Processor Sustainability.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

Characterizing Multi-Chip GPU Data Sharing.
ACM Trans. Archit. Code Optim., December, 2023

Photonic Network-on-Wafer for Multichiplet GPUs.
IEEE Micro, 2023

Kaya for Computer Architects: Toward Sustainable Computer Systems.
IEEE Micro, 2023

Decoupled Vector Runahead.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

Sieve: Stratified GPU-Compute Workload Sampling.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2023

SAC: Sharing-Aware Caching in Multi-Chip GPUs.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

TEA: Time-Proportional Event Analysis.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

NUBA: Non-Uniform Bandwidth GPUs.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

VMT: Virtualized Multi-Threading for Accelerating Graph Workloads on Commodity Processors.
IEEE Trans. Computers, 2022

The Forward Slice Core: A High-Performance, Yet Low-Complexity Microarchitecture.
ACM Trans. Archit. Code Optim., 2022

Vector Runahead for Indirect Memory Accesses.
IEEE Micro, 2022

A First-Order Model to Assess Computer Architecture Sustainability.
IEEE Comput. Archit. Lett., 2022

Scale-Model Architectural Simulation.
Proceedings of the International IEEE Symposium on Performance Analysis of Systems and Software, 2022

Delegated Replies: Alleviating Network Clogging in Heterogeneous Architectures.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

Reliability-Aware Runahead.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

Reliability-aware Garbage Collection for Hybrid HBM-DRAM Memories.
ACM Trans. Archit. Code Optim., 2021

Scale-Model Simulation.
IEEE Comput. Archit. Lett., 2021

TIP: Time-Proportional Instruction Profiling.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

Vector Runahead.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

Cactus: Top-Down GPU-Compute Benchmarking using Real-Life Applications.
Proceedings of the IEEE International Symposium on Workload Characterization, 2021

COPA: Highly Cost-Effective Power Back-Up for Green Datacenters.
IEEE Trans. Parallel Distributed Syst., 2020

Thread Isolation to Improve Symbiotic Scheduling on SMT Multicore Processors.
IEEE Trans. Parallel Distributed Syst., 2020

MDM: The GPU Memory Divergence Model.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

Selective Replication in Memory-Side GPU Caches.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

A Rigorous Benchmarking and Performance Analysis Methodology for Python Workloads.
Proceedings of the IEEE International Symposium on Workload Characterization, 2020

HSM: A Hybrid Slowdown Model for Multitasking GPUs.
Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020

The Forward Slice Core Microarchitecture.
Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

HeteroCore GPU to Exploit TLP-Resource Diversity.
IEEE Trans. Parallel Distributed Syst., 2019

CD-Xbar: A Converge-Diverge Crossbar Network for High-Performance GPUs.
IEEE Trans. Computers, 2019

Intra-Cluster Coalescing and Distributed-Block Scheduling to Reduce GPU NoC Pressure.
IEEE Trans. Computers, 2019

Crystal Gazer: Profile-Driven Write-Rationing Garbage Collection for Hybrid Memories.
Proc. ACM Meas. Anal. Comput. Syst., 2019

Modeling Emerging Memory-Divergent GPU Applications.
IEEE Comput. Archit. Lett., 2019

Precise Runahead Execution.
IEEE Comput. Archit. Lett., 2019

Directed Statistical Warming through Time Traveling.
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

RPPM: Rapid Performance Prediction of Multithreaded Workloads on Multicore Processors.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2019

Emulating and Evaluating Hybrid Memory for Managed Languages on NUMA Hardware.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2019

Racing to Hardware-Validated Simulation.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2019

Adaptive memory-side last-level GPU caching.
Proceedings of the 46th International Symposium on Computer Architecture, 2019

MIA: Metric Importance Analysis for Big Data Workload Characterization.
IEEE Trans. Parallel Distributed Syst., 2018

QIG: Quantifying the Importance and Interaction of GPGPU Architecture Parameters.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2018

Optimizing Soft Error Reliability Through Scheduling on Heterogeneous Multicore Processors.
IEEE Trans. Computers, 2018

Hardware Acceleration and a Grateful Goodbye.
IEEE Micro, 2018

Memristors and More.
IEEE Micro, 2018

Approximate Computing, Intelligent Computing.
IEEE Micro, 2018

Top Picks.
IEEE Micro, 2018

Hot Chips 29.
IEEE Micro, 2018

Automotive Computing, Neuromorphic Computing, and Beyond.
IEEE Micro, 2018

Emulating Hybrid Memory on NUMA Hardware.
CoRR, 2018

Modeling Superscalar Processor Memory-Level Parallelism.
IEEE Comput. Archit. Lett., 2018

RPPM: Rapid Performance Prediction of Multithreaded Applications on Multicore Hardware.
IEEE Comput. Archit. Lett., 2018

Managing hybrid memories by predicting object write intensity.
Proceedings of the Conference Companion of the 2nd International Conference on Art, 2018

Write-rationing garbage collection for hybrid memories.
Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2018

Architectural Support for Probabilistic Branches.
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

Get Out of the Valley: Power-Efficient Address Mapping for GPUs.
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

Intra-Cluster Coalescing to Reduce GPU NoC Pressure.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

Classification-Driven Search for Effective SM Partitioning in Multitasking GPUs.
Proceedings of the 32nd International Conference on Supercomputing, 2018

GDP: Using Dataflow Properties to Accurately Estimate Interference-Free Performance at Runtime.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

Improving IBM POWER8 Performance Through Symbiotic Job Scheduling.
IEEE Trans. Parallel Distributed Syst., 2017

Linear Branch Entropy: Characterizing and Optimizing Branch Behavior in a Micro-Architecture Independent Way.
IEEE Trans. Computers, 2017

DEP+BURST: Online DVFS Performance Prediction for Energy-Efficient Managed Language Execution.
IEEE Trans. Computers, 2017

Moore's Law and Ultra-Low-Power Processors.
IEEE Micro, 2017

From Cool Chips to Hot Interconnects.
IEEE Micro, 2017

Is Moore's Law Slowing Down? What's Next?
IEEE Micro, 2017

Thoughts on the Top Picks Selections.
IEEE Micro, 2017

Hot Chips: Industry and Academia Cutting-Edge Microprocessors.
IEEE Micro, 2017

Looking Forward to Upcoming Themes.
IEEE Micro, 2017

Shared resource aware scheduling on power-constrained tiled many-core processors.
J. Parallel Distributed Comput., 2017

LA-LLC: Inter-Core Locality-Aware Last-Level Cache to Exploit Many-to-Many Traffic in GPGPUs.
IEEE Comput. Archit. Lett., 2017

Mind The Power Holes: Sifting Operating Points in Power-Limited Heterogeneous Multicores.
IEEE Comput. Archit. Lett., 2017

Analyzing the scalability of managed language applications with speedup stacks.
Proceedings of the 2017 IEEE International Symposium on Performance Analysis of Systems and Software, 2017

BACM: Barrier-Aware Cache Management for Irregular Memory-Intensive GPGPU Workloads.
Proceedings of the 2017 IEEE International Conference on Computer Design, 2017

Reliability-Aware Scheduling on Heterogeneous Multicore Processors.
Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

Application Clustering Policies to Address System Fairness with Intel's Cache Allocation Technology.
Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017

POSTER: BACM: Barrier-Aware Cache Management for Irregular Memory-Intensive GPGPU Workloads.
Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017

RFHOC: A Random-Forest Approach to Auto-Tuning Hadoop's Configuration.
IEEE Trans. Parallel Distributed Syst., 2016

The Truth, The Whole Truth, and Nothing But the Truth: A Pragmatic Guide to Assessing Empirical Evaluations.
ACM Trans. Program. Lang. Syst., 2016

ShenZhen transportation system (SZTS): a novel big data benchmark suite.
J. Supercomput., 2016

Analytical Processor Performance and Power Modeling Using Micro-Architecture Independent Characteristics.
IEEE Trans. Computers, 2016

State of the Journal.
IEEE Trans. Computers, 2016

Two-Level Hybrid Sampled Simulation of Multithreaded Applications.
ACM Trans. Archit. Code Optim., 2016

MInGLE: An Efficient Framework for Domain Acceleration Using Low-Power Specialized Functional Units.
ACM Trans. Archit. Code Optim., 2016

Boosting the Priority of Garbage: Scheduling Collection on Heterogeneous Multicore Processors.
ACM Trans. Archit. Code Optim., 2016

Maximizing Heterogeneous Processor Performance Under Power Constraints.
ACM Trans. Archit. Code Optim., 2016

The Internet of Things Revolution.
IEEE Micro, 2016

Security and Our Reader Survey.
IEEE Micro, 2016

Hot Interconnects and Debates on Computer Architecture Research Directions.
IEEE Micro, 2016

Top Picks and Welcoming New Editorial Board Members.
IEEE Micro, 2016

Hot Chips: The Annual Feast of Riches.
IEEE Micro, 2016

Looking Forward to the 2016 Theme Issues.
IEEE Micro, 2016

DVFS performance prediction for managed multithreaded applications.
Proceedings of the 2016 IEEE International Symposium on Performance Analysis of Systems and Software, 2016

Barrier-Aware Warp Scheduling for Throughput Processors.
Proceedings of the 2016 International Conference on Supercomputing, 2016

Thread Similarity Matrix: Visualizing Branch Divergence in GPGPU Programs.
Proceedings of the 45th International Conference on Parallel Processing, 2016

A heterogeneous low-cost and low-latency Ring-Chain network for GPGPUs.
Proceedings of the 34th IEEE International Conference on Computer Design, 2016

A low-cost conflict-free NoC for GPGPUs.
Proceedings of the 53rd Annual Design Automation Conference, 2016

Mechanistic Modeling of Architectural Vulnerability Factor.
ACM Trans. Comput. Syst., 2015

GPGPU-MiniBench: Accelerating GPGPU Micro-Architecture Simulation.
IEEE Trans. Computers, 2015

Practical Iterative Optimization for the Data Center.
ACM Trans. Archit. Code Optim., 2015

Performance Evaluation and Its Impact on Design.
IEEE Micro, 2015

The Structure of Computer Architecture (R)evolution.
IEEE Micro, 2015

Heterogeneity in Response to the Power Wall.
IEEE Micro, 2015

The State of the Computer Architecture Field and Its Top Picks.
IEEE Micro, 2015

Hot Chips in an Increasingly Diverse Microprocessor Landscape.
IEEE Micro, 2015

Building on 35 Years toward a Vibrant Future.
IEEE Micro, 2015

Micro-architecture independent analytical processor performance and power modeling.
Proceedings of the 2015 IEEE International Symposium on Performance Analysis of Systems and Software, 2015

Micro-architecture independent branch behavior characterization.
Proceedings of the 2015 IEEE International Symposium on Performance Analysis of Systems and Software, 2015

The load slice core microarchitecture.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

SZTS: A Novel Big Data Transportation System Benchmark Suite.
Proceedings of the 44th International Conference on Parallel Processing, 2015

Shorter On-Line Warmup for Sampled Simulation of Multi-threaded Applications.
Proceedings of the 44th International Conference on Parallel Processing, 2015

Chrysso: an integrated power manager for constrained many-core processors.
Proceedings of the 12th ACM International Conference on Computing Frontiers, 2015

Automatic design of domain-specific instructions for low-power processors.
Proceedings of the 26th IEEE International Conference on Application-specific Systems, 2015

Performance Portability Across Heterogeneous SoCs Using a Generalized Library-Based Approach.
ACM Trans. Archit. Code Optim., 2014

An Evaluation of High-Level Mechanistic Core Models.
ACM Trans. Archit. Code Optim., 2014

Mechanistic Analytical Modeling of Superscalar In-Order Processor Performance.
ACM Trans. Archit. Code Optim., 2014

Restating the Case for Weighted-IPC Metrics to Evaluate Multiprogram Workload Performance.
IEEE Comput. Archit. Lett., 2014

BarrierPoint: Sampled simulation of multi-threaded applications.
Proceedings of the 2014 IEEE International Symposium on Performance Analysis of Systems and Software, 2014

Automatic SMT threading for OpenMP applications on the Intel Xeon Phi co-processor.
Proceedings of the 4th International Workshop on Runtime and Operating Systems for Supercomputers, 2014

Undersubscribed threading on clustered cache architectures.
Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture, 2014

The benefit of SMT in the multi-core era: flexibility towards degrees of thread-level parallelism.
Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2014

Cooperative cache scrubbing.
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

PCantorSim: Accelerating parallel architecture simulation through fractal-based sampling.
ACM Trans. Archit. Code Optim., 2013

Accelerating an application domain with specialized functional units.
ACM Trans. Archit. Code Optim., 2013

Understanding fundamental design choices in single-ISA heterogeneous multicore architectures.
ACM Trans. Archit. Code Optim., 2013

Selecting representative benchmark inputs for exploring microprocessor design spaces.
ACM Trans. Archit. Code Optim., 2013

Per-thread cycle accounting in multicore processors.
ACM Trans. Archit. Code Optim., 2013

Accelerating GPGPU architecture simulation.
Proceedings of the ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems, 2013

Node Performance and Energy Analysis with the Sniper Multi-core Simulator.
Proceedings of the Tools for High Performance Computing 2013, 2013

Bottle graphs: visualizing scalability bottlenecks in multi-threaded applications.
Proceedings of the 2013 ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages & Applications, 2013

Sampled simulation of multi-threaded applications.
Proceedings of the 2013 IEEE International Symposium on Performance Analysis of Systems & Software, 2013

Criticality stacks: identifying critical threads in parallel programs using synchronization behavior.
Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013

Fairness-aware scheduling on single-ISA heterogeneous multi-cores.
Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 2013

Exploiting media stream similarity for energy-efficient decoding and resource prediction.
ACM Trans. Embed. Comput. Syst., 2012

VSim: Simulating multi-server setups at near native hardware speed.
ACM Trans. Archit. Code Optim., 2012

Probabilistic modeling for job symbiosis scheduling on SMT processors.
ACM Trans. Archit. Code Optim., 2012

Deconstructing iterative optimization.
ACM Trans. Archit. Code Optim., 2012

SWAP: Parallelization through Algorithm Substitution.
IEEE Micro, 2012

Studying hardware and software trade-offs for a real-life web 2.0 workload.
Proceedings of the Third Joint WOSP/SIPEW International Conference on Performance Engineering, 2012

Workload generation for microprocessor performance evaluation: SPEC PhD award (invited abstract).
Proceedings of the Third Joint WOSP/SIPEW International Conference on Performance Engineering, 2012

Exploring multi-threaded Java application performance on multicore hardware.
Proceedings of the 27th Annual ACM SIGPLAN Conference on Object-Oriented Programming, 2012

Speedup stacks: Identifying scaling bottlenecks in multi-threaded applications.
Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software, 2012

A mechanistic performance model for superscalar in-order processors.
Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software, 2012

A first-order mechanistic model for architectural vulnerability factor.
Proceedings of the 39th International Symposium on Computer Architecture (ISCA 2012), 2012

Scheduling heterogeneous multi-cores through performance impact estimation (PIE).
Proceedings of the 39th International Symposium on Computer Architecture (ISCA 2012), 2012

An efficient CPI stack counter architecture for superscalar processors.
Proceedings of the Great Lakes Symposium on VLSI 2012, 2012

Iterative optimization for the data center.
Proceedings of the 17th International Conference on Architectural Support for Programming Languages and Operating Systems, 2012

Power-aware multi-core simulation for early design stage hardware/software co-optimization.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

Finding Extreme Behaviors in Microprocessor Workloads.
Trans. High Perform. Embed. Archit. Compil., 2011

Characterizing Time-Varying Program Behavior Using Phase Complexity Surfaces.
Trans. High Perform. Embed. Archit. Compil., 2011

Fine-grained DVFS using on-chip regulators.
ACM Trans. Archit. Code Optim., 2011

Automated Full-System Power Characterization.
IEEE Micro, 2011

Trends in Server Energy Proportionality.
Computer, 2011

Sniper: exploring the level of abstraction for scalable and accurate parallel multi-core simulation.
Proceedings of the Conference on High Performance Computing Networking, 2011

How sensitive is processor customization to the workload's input datasets?
Proceedings of the IEEE 9th Symposium on Application Specific Processors, 2011

Using Fast and Accurate Simulation to Explore Hardware/Software Trade-offs in the Multi-Core Era.
Proceedings of the Applications, Tools and Techniques on the Road to Exascale Computing, Proceedings of the conference ParCo 2011, 31 August, 2011

Mechanistic-empirical processor performance modeling for constructing CPI stacks on real hardware.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2011

Predictive Learning in Two-Way Datasets.
Proceedings of the Latest Advances in Inductive Logic Programming, 2011

Ranking commercial machines through data transposition.
Proceedings of the 2011 IEEE International Symposium on Workload Characterization, 2011

Using cycle stacks to understand scaling bottlenecks in multi-threaded workloads.
Proceedings of the 2011 IEEE International Symposium on Workload Characterization, 2011

The Multi-Program Performance Model: Debunking current practice in multi-core simulation.
Proceedings of the 2011 IEEE International Symposium on Workload Characterization, 2011

Optimizing the datacenter for data-centric workloads.
Proceedings of the 25th International Conference on Supercomputing, 2011, Tucson, AZ, USA, May 31, 2011

SWEEP: evaluating computer system energy efficiency using synthetic workloads.
Proceedings of the High Performance Embedded Architectures and Compilers, 2011

Evaluating Application Vulnerability to Soft Errors in Multi-level Cache Hierarchy.
Proceedings of the Euro-Par 2011: Parallel Processing Workshops - CCPI, CGWS, HeteroPar, HiBB, HPCVirt, HPPC, HPSS, MDGS, ProPer, Resilience, UCHPC, VHPC, Bordeaux, France, August 29, 2011

Virtual Manycore platforms: Moving towards 100+ processor cores.
Proceedings of the Design, Automation and Test in Europe, 2011

Computer Architecture Performance Evaluation Methods
Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, ISBN: 978-3-031-01727-8, 2010

A Counter Architecture for Online DVFS Profitability Estimation.
IEEE Trans. Computers, 2010

Fast, Accurate, and Validated Full-System Software Simulation of x86 Hardware.
IEEE Micro, 2010

Per-Thread Cycle Accounting.
IEEE Micro, 2010

Workload Reduction and Generation Techniques.
IEEE Micro, 2010

Scenario-Based Resource Prediction for QoS-Aware Media Processing.
Computer, 2010

Evaluating iterative optimization across 1000 datasets.
Proceedings of the 2010 ACM SIGPLAN Conference on Programming Language Design and Implementation, 2010

AVF Stressmark: Towards an Automated Methodology for Bounding the Worst-Case Vulnerability to Soft Errors.
Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

Modeling critical sections in Amdahl's law and its implications for multicore design.
Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010

Benchmark synthesis for architecture and compiler exploration.
Proceedings of the 2010 IEEE International Symposium on Workload Characterization, 2010

Interval simulation: Raising the level of abstraction in architectural simulation.
Proceedings of the 16th International Conference on High-Performance Computer Architecture (HPCA-16 2010), 2010

Automated just-in-time compiler tuning.
Proceedings of the CGO 2010, 2010

Probabilistic job symbiosis modeling for SMT processor scheduling.
Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems, 2010

System-scenario-based design of dynamic embedded systems.
ACM Trans. Design Autom. Electr. Syst., 2009

A mechanistic performance model for superscalar out-of-order processors.
ACM Trans. Comput. Syst., 2009

Branch Predictor Warmup for Sampled Simulation through Branch History Matching.
Trans. High Perform. Embed. Archit. Compil., 2009

Chip Multiprocessor Design Space Exploration through Statistical Simulation.
IEEE Trans. Computers, 2009

Memory-level parallelism aware fetch policies for simultaneous multithreading processors.
ACM Trans. Archit. Code Optim., 2009

A Methodology for Analyzing Commercial Processor Performance Numbers.
Computer, 2009

Finding Stress Patterns in Microprocessor Workloads.
Proceedings of the High Performance Embedded Architectures and Compilers, 2009

MLP-Aware Runahead Threads in a Simultaneous Multithreading Processor.
Proceedings of the High Performance Embedded Architectures and Compilers, 2009

Per-thread cycle accounting in SMT processors.
Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, 2009

Memory Data Flow Modeling in Statistical Simulation for the Efficient Exploration of Microprocessor Design Spaces.
IEEE Trans. Computers, 2008

Distilling the essence of proprietary workloads into miniature benchmarks.
ACM Trans. Archit. Code Optim., 2008

System-Level Performance Metrics for Multiprogram Workloads.
IEEE Micro, 2008

Accurate and Efficient Cache Warmup for Sampled Processor Simulation Through NSL-BLRL.
Comput. J., 2008

Sampled Processor Simulation: A Survey.
Adv. Comput., 2008

Java performance evaluation through rigorous replay compilation.
Proceedings of the 23rd Annual ACM SIGPLAN Conference on Object-Oriented Programming, 2008

Characterizing the Unique and Diverse Behaviors in Existing and Emerging General-Purpose and Domain-Specific Benchmark Suites.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2008

Automated microprocessor stressmark generation.
Proceedings of the 14th International Conference on High-Performance Computer Architecture (HPCA-14 2008), 2008

Phase Complexity Surfaces: Characterizing Time-Varying Program Behavior.
Proceedings of the High Performance Embedded Architectures and Compilers, 2008

Studying Compiler Optimizations on Superscalar Processors Through Interval Analysis.
Proceedings of the High Performance Embedded Architectures and Compilers, 2008

Automated hardware-independent scenario identification.
Proceedings of the 45th Design Automation Conference, 2008

Cole: compiler optimization level exploration.
Proceedings of the Sixth International Symposium on Code Generation and Optimization (CGO 2008), 2008

Dispersing proprietary applications as benchmarks through code mutation.
Proceedings of the 13th International Conference on Architectural Support for Programming Languages and Operating Systems, 2008

GCH: Hints for Triggering Garbage Collections.
Trans. High Perform. Embed. Archit. Compil., 2007

Java object header elimination for reduced memory consumption in 64-bit virtual machines.
ACM Trans. Archit. Code Optim., 2007

Microarchitecture-Independent Workload Characterization.
IEEE Micro, 2007

A Top-Down Approach to Architecting CPI Component Performance Counters.
IEEE Micro, 2007

Exploiting program phase behavior for energy reduction on multi-configuration processors.
J. Syst. Archit., 2007

Analyzing commercial processor performance numbers for predicting performance of applications of interest.
Proceedings of the 2007 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, 2007

Adding rigorous statistics to the Java benchmarker's toolbox.
Proceedings of the Companion to the 22nd Annual ACM SIGPLAN Conference on Object-Oriented Programming, 2007

Statistically rigorous java performance evaluation.
Proceedings of the 22nd Annual ACM SIGPLAN Conference on Object-Oriented Programming, 2007

Using hpm-sampling to drive dynamic compilation.
Proceedings of the 22nd Annual ACM SIGPLAN Conference on Object-Oriented Programming, 2007

Exploiting Video Stream Similarity for Energy-Efficient Decoding.
Proceedings of the Advances in Multimedia Modeling, 2007

Representative Multiprogram Workloads for Multithreaded Processor Simulation.
Proceedings of the IEEE 10th International Symposium on Workload Characterization, 2007

Statistical simulation of chip multiprocessors running multi-program workloads.
Proceedings of the 25th International Conference on Computer Design, 2007

A Memory-Level Parallelism Aware Fetch Policy for SMT Processors.
Proceedings of the 13th International Conference on High-Performance Computer Architecture (HPCA-13 2007), 2007

Branch History Matching: Branch Predictor Warmup for Sampled Simulation.
Proceedings of the High Performance Embedded Architectures and Compilers, 2007

Object-Relative Addressing: Compressed Pointers in 64-Bit Java Virtual Machines.
Proceedings of the ECOOP 2007 - Object-Oriented Programming, 21st European Conference, Berlin, Germany, July 30, 2007

Resource prediction for media stream decoding.
Proceedings of the 2007 Design, Automation and Test in Europe Conference and Exposition, 2007

Exploring the Application Behavior Space Using Parameterized Synthetic Benchmarks.
Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007

Studying Compiler-Microarchitecture Interactions through Interval Analysis.
Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007

Measuring Benchmark Similarity Using Inherent Program Characteristics.
IEEE Trans. Computers, 2006

64-bit versus 32-bit Virtual Machines for Java.
Softw. Pract. Exp., 2006

Efficient Sampling Startup for SimPoint.
IEEE Micro, 2006

Yet shorter warmup by combining no-state-loss and MRRL for sampled LRU cache simulation.
J. Syst. Softw., 2006

Pattern-driven prefetching for multimedia applications on embedded processors.
J. Syst. Archit., 2006

Improved composite confidence mechanisms for a perceptron branch predictor.
J. Syst. Archit., 2006

The Future of Simulation: A Field of Dreams.
Computer, 2006

Javana: a system for building customized Java program analysis tools.
Proceedings of the 21st Annual ACM SIGPLAN Conference on Object-Oriented Programming, 2006

Building Java program analysis tools using Javana.
Proceedings of the Companion to the 21st Annual ACM SIGPLAN Conference on Object-Oriented Programming, 2006

Evaluating the efficacy of statistical simulation for design space exploration.
Proceedings of the 2006 IEEE International Symposium on Performance Analysis of Systems and Software, 2006

Characterizing the branch misprediction penalty.
Proceedings of the 2006 IEEE International Symposium on Performance Analysis of Systems and Software, 2006

Considering all starting points for simultaneous multithreading simulation.
Proceedings of the 2006 IEEE International Symposium on Performance Analysis of Systems and Software, 2006

Evaluating Benchmark Subsetting Approaches.
Proceedings of the 2006 IEEE International Symposium on Workload Characterization, 2006

Performance Cloning: A Technique for Disseminating Proprietary Applications as Benchmarks.
Proceedings of the 2006 IEEE International Symposium on Workload Characterization, 2006

Comparing Benchmarks Using Key Microarchitecture-Independent Characteristics.
Proceedings of the 2006 IEEE International Symposium on Workload Characterization, 2006

The exigency of benchmark and compiler drift: designing tomorrow's processors with yesterday's tools.
Proceedings of the 20th Annual International Conference on Supercomputing, 2006

Accurate memory data flow modeling in statistical simulation.
Proceedings of the 20th Annual International Conference on Supercomputing, 2006

Efficient design space exploration of high performance embedded out-of-order processors.
Proceedings of the Conference on Design, Automation and Test in Europe, 2006

Space-Efficient 64-bit Java Objects through Selective Typed Virtual Addressing.
Proceedings of the Fourth IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2006), 2006

A performance counter architecture for computing accurate CPI components.
Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, 2006

NSL-BLRL: Efficient Cache Warmup for Sampled Processor Simulation.
Proceedings of the 39th Annual Simulation Symposium (ANSS-39 2006), 2006

Performance prediction based on inherent program similarity.
Proceedings of the 15th International Conference on Parallel Architectures and Compilation Techniques (PACT 2006), 2006

Optimal sample length for efficient cache simulation.
J. Syst. Archit., 2005

SMA: A Self-Monitored Adaptive Cache Warm-Up Scheme for Microprocessor Simulation.
Int. J. Parallel Program., 2005

Middleware benchmarking: approaches, results, experiences.
Concurr. Comput. Pract. Exp., 2005

BLRL: Accurate and Efficient Warmup for Sampled Processor Simulation.
Comput. J., 2005

Offline Phase Analysis and Optimization for Multi-configuration Processors.
Proceedings of the Embedded Computer Systems: Architectures, 2005

Measuring Program Similarity: Experiments with SPEC CPU Benchmark Suites.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005

Garbage Collection Hints.
Proceedings of the High Performance Embedded Architectures and Compilers, 2005

Efficient Sampling Startup for Sampled Processor Simulation.
Proceedings of the High Performance Embedded Architectures and Compilers, 2005

A Detailed Study on Phase Predictors.
Proceedings of the Euro-Par 2005, Parallel Processing, 11th International Euro-Par Conference, Lisbon, Portugal, August 30, 2005

Comparing Low-Level Behavior of SPEC CPU and Java Workloads.
Proceedings of the Advances in Computer Systems Architecture, 10th Asia-Pacific Conference, 2005

Using Decision Trees to Improve Program-Based and Profile-Based Static Branch Prediction.
Proceedings of the Advances in Computer Systems Architecture, 10th Asia-Pacific Conference, 2005

Speeding Up Architectural Simulations for High-Performance Processors.
Simul., 2004

Efficient simulation of trace samples on parallel machines.
Parallel Comput., 2004

How accurate should early design stage power/performance tools be? A case study with statistical simulation.
J. Syst. Softw., 2004

Efficient architectural design of high performance microprocessors.
Adv. Comput., 2004

Low-level behavioral analysis of the JVT/AVC decoder.
Proceedings of the Visual Communications and Image Processing 2004, 2004

Self-Monitored Adaptive Cache Warm-Up for Microprocessor Simulation.
Proceedings of the 16th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2004), 2004

Method-level phase behavior in Java workloads.
Proceedings of the 19th Annual ACM SIGPLAN Conference on Object-Oriented Programming, 2004

Bottleneck analysis in Java applications using hardware performance monitors.
Proceedings of the Companion to the 19th Annual ACM SIGPLAN Conference on Object-Oriented Programming, 2004

Efficient architectural design of high performance microprocessors.
Proceedings of the 2004 IEEE International Symposium on Performance Analysis of Systems and Software, 2004

Control Flow Modeling in Statistical Simulation for Accurate and Efficient Processor Design Studies.
Proceedings of the 31st International Symposium on Computer Architecture (ISCA 2004), 2004

Adaptive Prefetching for Multimedia Applications in Embedded Systems.
Proceedings of the 2004 Design, 2004

Statistical Simulation: Adding Efficiency to the Computer Designer's Toolbox.
IEEE Micro, 2003

Quantifying behavioral differences between multimedia and general-purpose workloads.
J. Syst. Archit., 2003

Quantifying the Impact of Input Data Sets on Program Behavior and its Applications.
J. Instr. Level Parallelism, 2003

Designing Computer Architecture Research Workloads.
Computer, 2003

Comparing Multiported Cache Schemes.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 2003

How Java programs interact with virtual machines at the microarchitectural level.
Proceedings of the 2003 ACM SIGPLAN Conference on Object-Oriented Programming Systems, 2003

Efficient Microprocessor Design Space Exploration through Statistical Simulation.
Proceedings of the 36th Annual Simulation Symposium (ANSS-36 2003), Orlando, Florida, USA, March 30, 2003

Workload Design: Selecting Representative Program-Input Pairs.
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques (PACT 2002), 2002

Early design phase power/performance modeling through statistical simulation.
Proceedings of the 2001 IEEE International Symposium on Performance Analysis of Systems and Software, 2001

Hybrid Analytical-Statistical Modeling for Efficiently Exploring Architecture and Workload Design Spaces.
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques (PACT 2001), 2001

Early design stage exploration of fixed-length block structured architectures.
J. Syst. Archit., 2000

Performance analysis through synthetic trace generation.
Proceedings of the 2000 IEEE International Symposium on Performance Analysis of Systems and Software, 2000

On the Feasibility of Fixed-Length Block Structured Architectures.
Proceedings of the 5th Australasian Computer Architecture Conference (ACAC 2000), 31 January, 2000

Estimating IPC of a block structured instruction set architecture in an early design stage.
Proceedings of the Parallel Computing: Fundamentals & Applications, 1999

Investigating the Implementation of a Block Structured Architecture in an Early Design Stage.
Proceedings of the 25th EUROMICRO '99 Conference, 1999
