Accelerator Architectures: A Ten-Year Retrospective.
IEEE Micro, 2018

Hybrid latency tolerance for robust energy-efficiency on 1000-core data parallel processors.
Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013

Immersive Visual Communication.
IEEE Signal Process. Mag., 2011

Cohesion: An Adaptive Hybrid Memory Model for Accelerators.
IEEE Micro, 2011

Rigel: A 1,024-Core Single-Chip Accelerator Architecture.
IEEE Micro, 2011

OUTRIDER: efficient memory latency tolerance with decoupled strands.
Proceedings of the 38th International Symposium on Computer Architecture (ISCA 2011), 2011

Accelerating aerial image simulation with GPU.
Proceedings of the 2011 IEEE/ACM International Conference on Computer-Aided Design, 2011

Decoupled Architectures as a Low-Complexity Alternative to Out-of-order Execution.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

A Task-Centric Memory Model for Scalable Accelerator Architectures.
IEEE Micro, 2010

An adaptive performance modeling tool for GPU architectures.
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010

Implementing a GPU Programming Model on a Non-GPU Accelerator Architecture.
Proceedings of the Computer Architecture, 2010

Cohesion: a hybrid memory model for accelerators.
Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010

Energy-performance tradeoffs in processor architecture and circuit design: a marginal cost analysis.
Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010

GoldMine: Automatic assertion generation using data mining and static analysis.
Proceedings of the Design, Automation and Test in Europe, 2010

An integrated framework for joint design space exploration of microarchitecture and circuits.
Proceedings of the Design, Automation and Test in Europe, 2010

An asymmetric distributed shared memory model for heterogeneous parallel systems.
Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems, 2010

WAYPOINT: scaling coherence to thousand-core architectures.
Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

Fool me twice: Exploring and exploiting error tolerance in physics-based animation.
ACM Trans. Graph., 2009

The parallelization of video processing.
IEEE Signal Process. Mag., 2009

Area-efficiency in CMP core design: co-optimization of microarchitecture and physical design.
SIGARCH Comput. Archit. News, 2009

Rigel: an architecture and scalable programming interface for a 1000-core accelerator.
Proceedings of the 36th International Symposium on Computer Architecture (ISCA 2009), 2009

Depth image-based rendering from multiple cameras with 3D propagation algorithm.
Proceedings of the 2nd International ICST Conference on Immersive Telecommunications, 2009

Depth image-based rendering with low resolution depth.
Proceedings of the International Conference on Image Processing, 2009

Optimization of tele-immersion codes.
Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units, 2009

Guest Editors' Introduction: Accelerator Architectures.
IEEE Micro, 2008

Tradeoffs in designing accelerator architectures for visual computing.
Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-41 2008), 2008

Hardware support for software controlled multithreading.
SIGARCH Comput. Archit. News, 2007

The Art of Deception: Adaptive Precision Reduction for Area Efficient Physics Acceleration.
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-40 2007), 2007

ParallAX: an architecture for real-time physics.
Proceedings of the 34th International Symposium on Computer Architecture (ISCA 2007), 2007

Examining ACE analysis reliability estimates using fault-injection.
Proceedings of the 34th International Symposium on Computer Architecture (ISCA 2007), 2007

Implicitly Parallel Programming Models for Thousand-Core Microprocessors.
Proceedings of the 44th Design Automation Conference, 2007

Sequential Element Design With Built-In Soft Error Resilience.
IEEE Trans. Very Large Scale Integr. Syst., 2006

ReStore: Symptom-Based Soft Error Detection in Microprocessors.
IEEE Trans. Dependable Secur. Comput., 2006

Beating In-Order Stalls with "Flea-Flicker" Two-Pass Pipelining.
IEEE Trans. Computers, 2006

An Experimental Study of Soft Errors in Microprocessors.
IEEE Micro, 2005

Continuous Optimization.
Proceedings of the 32nd International Symposium on Computer Architecture (ISCA 2005), 2005

The Future of Computer Architecture Research: An Industrial Perspective.
Proceedings of the 11th International Conference on High-Performance Computer Architecture (HPCA-11 2005), 2005

Reducing the Scheduling Critical Cycle Using Wakeup Prediction.
Proceedings of the 10th International Conference on High-Performance Computer Architecture (HPCA-10 2004), 2004

Characterizing the Effects of Transient Faults on a High-Performance Processor Pipeline.
Proceedings of the 2004 International Conference on Dependable Systems and Networks (DSN 2004), 28 June, 2004

Introduction to computing systems: from bits and gates to C and beyond (2nd ed.).
McGraw-Hill, ISBN: 978-0-07-246750-5, 2004

Characterization of essential dynamic instructions.
Proceedings of the International Conference on Measurements and Modeling of Computer Systems, 2003

Dynamic Optimization of Micro-Operations.
Proceedings of the Ninth International Symposium on High-Performance Computer Architecture (HPCA'03), 2003

Improving Quasi-Dynamic Schedules through Region Slip.
Proceedings of the 1st IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2003), 2003

Y-Branches: When You Come to a Fork in the Road, Take It.
Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques (PACT 2003), 27 September, 2003

Instruction fetch deferral using static slack.
Proceedings of the 35th Annual International Symposium on Microarchitecture, 2002

rePLay: A Hardware Framework for Dynamic Optimization.
IEEE Trans. Computers, 2001

Performance characterization of a hardware mechanism for dynamic optimization.
Proceedings of the 34th Annual International Symposium on Microarchitecture, 2001

Increasing the size of atomic instruction blocks using control flow assertions.
Proceedings of the 33rd Annual IEEE/ACM International Symposium on Microarchitecture, 2000

Trace cache design for wide-issue superscalar processors.
PhD thesis, 1999

Evaluation of Design Options for the Trace Cache Fetch Mechanism.
IEEE Trans. Computers, 1999

Putting the Fill Unit to Work: Dynamic Optimizations for Trace Cache Microprocessors.
Proceedings of the 31st Annual IEEE/ACM International Symposium on Microarchitecture, 1998

Improving Trace Cache Effectiveness with Branch Promotion and Trace Packing.
Proceedings of the 25th Annual International Symposium on Computer Architecture, 1998

An Analysis of Correlation and Predictability: What Makes Two-Level Branch Predictors Work.
Proceedings of the 25th Annual International Symposium on Computer Architecture, 1998

One Billion Transistors, One Uniprocessor, One Chip.
Computer, 1997

Alternative Fetch and Issue Policies for the Trace Cache Fetch Mechanism.
Proceedings of the Thirtieth Annual IEEE/ACM International Symposium on Microarchitecture, 1997

Digital's DECchip 21066: The First Cost-focused Alpha AXP Chip.
Digit. Tech. J., 1994

DECchip 21066: The Alpha AXP Chip for Cost-Focused Systems.
Proceedings of the Spring COMPCON 94, Digest of Papers, San Francisco, California, USA, February 28, 1994

Effectiveness of heuristics measures for automatic test pattern generation.
Proceedings of the 23rd ACM/IEEE Design Automation Conference. Las Vegas, 1986