2023
Deployment Archetypes for Cloud Applications.
ACM Comput. Surv., 2023
2022
Architectures for Protecting Cloud Data Planes.
CoRR, 2022
2014
Inside Windows Azure: the challenges and opportunities of a cloud operating system.
Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2014
2012
Erasure Coding in Windows Azure Storage.
Proceedings of the 2012 USENIX Annual Technical Conference, 2012
2011
Windows Azure Storage: a highly available cloud storage service with strong consistency.
Proceedings of the 23rd ACM Symposium on Operating Systems Principles 2011, 2011
2008
ACM Trans. Archit. Code Optim., 2008
Reproducible simulation of multi-threaded workloads for architecture design exploration.
Proceedings of the 4th International Symposium on Workload Characterization (IISWC 2008), 2008
Phase-based cache reconfiguration for a highly-configurable two-level cache hierarchy.
Proceedings of the 18th ACM Great Lakes Symposium on VLSI 2008, 2008
2007
Patching Processor Design Errors with Programmable Hardware.
IEEE Micro, 2007
Accelerating Meta Data Checks for Software Correctness and Security.
J. Instr. Level Parallelism, 2007
Automatically classifying benign and harmful data races using replay analysis.
Proceedings of the ACM SIGPLAN 2007 Conference on Programming Language Design and Implementation, 2007
Cross Binary Simulation Points.
Proceedings of the 2007 IEEE International Symposium on Performance Analysis of Systems and Software, 2007
Representative Multiprogram Workloads for Multithreaded Processor Simulation.
Proceedings of the IEEE 10th International Symposium on Workload Characterization, 2007
Accelerating and Adapting Precomputation Threads for Efficient Prefetching.
Proceedings of the 13th International Conference on High-Performance Computer Architecture (HPCA-13 2007), 2007
Bounds Checking with Taint-Based Analysis.
Proceedings of the High Performance Embedded Architectures and Compilers, 2007
Transient fault prediction based on anomalies in processor events.
Proceedings of the 2007 Design, Automation and Test in Europe Conference and Exposition, 2007
A Loop Correlation Technique to Improve Performance Auditing.
Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007
2006
ACM Trans. Archit. Code Optim., 2006
BugNet: Recording Application-Level Execution for Deterministic Replay Debugging.
IEEE Micro, 2006
Efficient Sampling Startup for SimPoint.
IEEE Micro, 2006
Using Machine Learning to Guide Architecture Simulation.
J. Mach. Learn. Res., 2006
The Future of Simulation: A Field of Dreams.
Computer, 2006
Software Profiling for Deterministic Replay Debugging of User Code.
Proceedings of the New Trends in Software Methodologies, Tools and Techniques, 2006
Automatic logging of operating system effects to guide application-level architecture simulation.
Proceedings of the Joint International Conference on Measurement and Modeling of Computer Systems, 2006
Online performance auditing: using hot optimizations without getting burned.
Proceedings of the ACM SIGPLAN 2006 Conference on Programming Language Design and Implementation, 2006
Exploiting Fine-Grained Data Parallelism with Chip Multiprocessors and Fast Barriers.
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-39 2006), 2006
Comparing multinomial and k-means clustering for SimPoint.
Proceedings of the 2006 IEEE International Symposium on Performance Analysis of Systems and Software, 2006
Considering all starting points for simultaneous multithreading simulation.
Proceedings of the 2006 IEEE International Symposium on Performance Analysis of Systems and Software, 2006
Detecting phases in parallel applications on shared memory architectures.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006
Speculative Code Value Specialization Using the Trace Cache Fill Unit.
Proceedings of the 24th International Conference on Computer Design (ICCD 2006), 2006
Patching Processor Design Errors.
Proceedings of the 24th International Conference on Computer Design (ICCD 2006), 2006
A Self-Repairing Prefetcher in an Event-Driven Dynamic Optimization Framework.
Proceedings of the Fourth IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2006), 2006
Selecting Software Phase Markers with Code Structure Analysis.
Proceedings of the Fourth IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2006), 2006
Recording shared memory dependencies using strata.
Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, 2006
Unbounded page-based transactional memory.
Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, 2006
2005
SimPoint 3.0: Faster and More Flexible Program Phase Analysis.
J. Instr. Level Parallelism, 2005
The Entropia virtual machine for desktop grids.
Proceedings of the 1st International Conference on Virtual Execution Environments, 2005
The Strong Correlation Between Code Signatures and Performance.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005
Motivation for Variable Length Intervals and Hierarchical Phase Behavior.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005
BugNet: Continuously Recording Program Execution for Deterministic Replay Debugging.
Proceedings of the 32nd International Symposium on Computer Architecture (ISCA 2005), 2005
A Dependency Chain Clustered Microarchitecture.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005
Transition Phase Classification and Prediction.
Proceedings of the 11th International Conference on High-Performance Computer Architecture (HPCA-11 2005), 2005
Exploiting a Computation Reuse Cache to Reduce Energy in Network Processors.
Proceedings of the High Performance Embedded Architectures and Compilers, 2005
Efficient Sampling Startup for Sampled Processor Simulation.
Proceedings of the High Performance Embedded Architectures and Compilers, 2005
Dynamic phase analysis for cycle-close trace generation.
Proceedings of the 3rd IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2005
An Event-Driven Multithreaded Dynamic Optimization Framework.
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques (PACT 2005), 2005
Variational Path Profiling.
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques (PACT 2005), 2005
2004
How to use SimPoint to pick simulation points.
SIGMETRICS Perform. Evaluation Rev., 2004
Using a serial cache for energy efficient instruction fetching.
J. Syst. Archit., 2004
Balanced Multithreading: Increasing Throughput via a Low Cost Multithreading Hierarchy.
Proceedings of the 37th Annual International Symposium on Microarchitecture (MICRO-37 2004), 2004
Hardware and Binary Modification Support for Code Pointer Protection From Buffer Overflow.
Proceedings of the 37th Annual International Symposium on Microarchitecture (MICRO-37 2004), 2004
Structures for phase classification.
Proceedings of the 2004 IEEE International Symposium on Performance Analysis of Systems and Software, 2004
The future of simulation: A field of dreams.
Proceedings of the 2004 IEEE International Symposium on Performance Analysis of Systems and Software, 2004
A co-phase matrix to guide simultaneous multithreading simulation.
Proceedings of the 2004 IEEE International Symposium on Performance Analysis of Systems and Software, 2004
Deterministic Memory-Efficient String Matching Algorithms for Intrusion Detection.
Proceedings of IEEE INFOCOM 2004, 2004
Creating Converged Trace Schedules Using String Matching.
Proceedings of the 10th International Conference on High-Performance Computer Architecture (HPCA-10 2004), 2004
Balancing design options with Sherpa.
Proceedings of the 2004 International Conference on Compilers, 2004
2003
A Decoupled Predictor-Directed Stream Prefetching Architecture.
IEEE Trans. Computers, 2003
Discovering and Exploiting Program Phases.
IEEE Micro, 2003
Entropia: architecture and performance of an enterprise desktop grid system.
J. Parallel Distributed Comput., 2003
Using SimPoint for accurate and efficient simulation.
Proceedings of the International Conference on Measurements and Modeling of Computer Systems, 2003
A Pipelined Memory Architecture for High Throughput Network Processors.
Proceedings of the 30th International Symposium on Computer Architecture (ISCA 2003), 2003
Phase Tracking and Prediction.
Proceedings of the 30th International Symposium on Computer Architecture (ISCA 2003), 2003
Predicate prediction for efficient out-of-order execution.
Proceedings of the 17th Annual International Conference on Supercomputing, 2003
Incorporating Predicate Information into Branch Predictors.
Proceedings of the Ninth International Symposium on High-Performance Computer Architecture (HPCA'03), 2003
Catching Accurate Profiles in Hardware.
Proceedings of the Ninth International Symposium on High-Performance Computer Architecture (HPCA'03), 2003
Phi-Predication for Light-Weight If-Conversion.
Proceedings of the 1st IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2003), 2003
Reducing code size with echo instructions.
Proceedings of the International Conference on Compilers, 2003
Picking Statistically Valid and Early Simulation Points.
Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques (PACT 2003), 27 September, 2003
2002
Pointer cache assisted prefetching.
Proceedings of the 35th Annual International Symposium on Microarchitecture, 2002
High Performance and Energy Efficient Serial Prefetch Architecture.
Proceedings of the High Performance Computing, 4th International Symposium, 2002
An EPIC Processor with Pending Functional Units.
Proceedings of the High Performance Computing, 4th International Symposium, 2002
Using predicate path information in hardware to determine true dependences.
Proceedings of the 16th international conference on Supercomputing, 2002
Quantifying Load Stream Behavior.
Proceedings of the Eighth International Symposium on High-Performance Computer Architecture (HPCA'02), 2002
Automatically characterizing large scale program behavior.
Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-X), 2002
Quantifying Instruction Criticality.
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques (PACT 2002), 2002
2001
Optimizations Enabled by a Decoupled Front-End Architecture.
IEEE Trans. Computers, 2001
Reducing the overhead of dynamic compilation.
Softw. Pract. Exp., 2001
Using Annotation to Reduce Dynamic Optimization Time.
Proceedings of the 2001 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2001
Automated design of finite state machine predictors for customized processors.
Proceedings of the 28th Annual International Symposium on Computer Architecture, 2001
Reducing Delay with Dynamic Selection of Compression Formats.
Proceedings of the 10th IEEE International Symposium on High Performance Distributed Computing (HPDC-10 2001), 2001
Dynamic Prediction of Critical Path Instructions.
Proceedings of the Seventh International Symposium on High-Performance Computer Architecture (HPCA'01), 2001
Patchable instruction ROM architecture.
Proceedings of the 2001 International Conference on Compilers, 2001
Basic Block Distribution Analysis to Find Periodic Behavior and Simulation Points in Applications.
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques (PACT 2001), 2001
2000
Limits of task-based parallelism in irregular applications.
SIGARCH Comput. Archit. News, 2000
A Comparative Survey of Load Speculation Architectures.
J. Instr. Level Parallelism, 2000
Path Analysis and Renaming for Predicated Instruction Scheduling.
Int. J. Parallel Program., 2000
Predictor-directed stream buffers.
Proceedings of the 33rd Annual IEEE/ACM International Symposium on Microarchitecture, 2000
Loop Termination Prediction.
Proceedings of the High Performance Computing, Third International Symposium, 2000
ToolBlocks: An Infrastructure for the Construction of Memory Hierarchy Analysis Tools (Research Note).
Proceedings of the Euro-Par 2000, Parallel Processing, 6th International Euro-Par Conference, Munich, Germany, August 29, 2000
1999
A comparison of software code reordering and victim buffers.
SIGARCH Comput. Archit. News, 1999
The Precomputed-Branch architecture: Efficient branches with compiler support.
J. Syst. Archit., 1999
Value Profiling and Optimization.
J. Instr. Level Parallelism, 1999
Reducing Transfer Delay Using Java Class File Splitting and Prefetching.
Proceedings of the 1999 ACM SIGPLAN Conference on Object-Oriented Programming Systems, 1999
Fetch Directed Instruction Prefetching.
Proceedings of the 32nd Annual IEEE/ACM International Symposium on Microarchitecture, 1999
A Scalable Front-End Architecture for Fast Instruction Delivery.
Proceedings of the 26th Annual International Symposium on Computer Architecture, 1999
Selective Value Prediction.
Proceedings of the 26th Annual International Symposium on Computer Architecture, 1999
Reducing cache misses using hardware and software page placement.
Proceedings of the 13th international conference on Supercomputing, 1999
Classifying load and store instructions for memory renaming.
Proceedings of the 13th international conference on Supercomputing, 1999
Instruction Recycling on a Multiple-Path Processor.
Proceedings of the Fifth International Symposium on High-Performance Computer Architecture, 1999
Predicated Static Single Assignment.
Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques, 1999
1998
Predictive Techniques for Aggressive Load Speculation.
Proceedings of the 31st Annual IEEE/ACM International Symposium on Microarchitecture, 1998
Threaded Multiple Path Execution.
Proceedings of the 25th Annual International Symposium on Computer Architecture, 1998
Overlapping Execution with Transfer Using Non-Strict Execution for Mobile Programs.
Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VIII), 1998
Cache-Conscious Data Placement.
Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VIII), 1998
Dynamic Hammock Predication for Non-Predicated Instruction Set Architectures.
Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques, 1998
1997
Evidence-Based Static Branch Prediction Using Machine Learning.
ACM Trans. Program. Lang. Syst., 1997
Efficient Procedure Mapping Using Cache Line Coloring.
Proceedings of the ACM SIGPLAN '97 Conference on Programming Language Design and Implementation (PLDI), 1997
Procedure Placement Using Temporal Ordering Information.
Proceedings of the Thirtieth Annual IEEE/ACM International Symposium on Microarchitecture, 1997
Proceedings of the Thirtieth Annual IEEE/ACM International Symposium on Microarchitecture, 1997
1996
Predictive Sequential Associative Cache.
Proceedings of the Second International Symposium on High-Performance Computer Architecture, 1996
1995
Corpus-Based Static Branch Prediction.
Proceedings of the ACM SIGPLAN'95 Conference on Programming Language Design and Implementation (PLDI), 1995
The predictability of branches in libraries.
Proceedings of the 28th Annual International Symposium on Microarchitecture, Ann Arbor, Michigan, USA, November 29, 1995
A system level perspective on branch architecture performance.
Proceedings of the 28th Annual International Symposium on Microarchitecture, Ann Arbor, Michigan, USA, November 29, 1995
Instruction Cache Fetch Policies for Speculative Execution.
Proceedings of the 22nd Annual International Symposium on Computer Architecture, 1995
Next Cache Line and Set Prediction.
Proceedings of the 22nd Annual International Symposium on Computer Architecture, 1995
1994
Reducing Indirect Function Call Overhead in C++ Programs.
Proceedings of the Conference Record of POPL'94: 21st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 1994
Fast and Accurate Instruction Fetch and Branch Prediction.
Proceedings of the 21st Annual International Symposium on Computer Architecture. Chicago, 1994
Reducing Branch Costs via Branch Alignment.
Proceedings of ASPLOS-VI, 1994
1993
Leapfrogging: A Portable Technique for Implementing Efficient Futures.
Proceedings of the Fourth ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming (PPOPP), 1993