2018
Accelerator Architectures - A Ten-Year Retrospective.
IEEE Micro, 2018
2013
Hybrid latency tolerance for robust energy-efficiency on 1000-core data parallel processors.
Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013
2011
Immersive Visual Communication.
IEEE Signal Process. Mag., 2011
Cohesion: An Adaptive Hybrid Memory Model for Accelerators.
IEEE Micro, 2011
Rigel: A 1,024-Core Single-Chip Accelerator Architecture.
IEEE Micro, 2011
OUTRIDER: efficient memory latency tolerance with decoupled strands.
Proceedings of the 38th International Symposium on Computer Architecture (ISCA 2011), 2011
Accelerating aerial image simulation with GPU.
Proceedings of the 2011 IEEE/ACM International Conference on Computer-Aided Design, 2011
Decoupled Architectures as a Low-Complexity Alternative to Out-of-order Execution.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011
2010
A Task-Centric Memory Model for Scalable Accelerator Architectures.
IEEE Micro, 2010
An adaptive performance modeling tool for GPU architectures.
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010
Implementing a GPU Programming Model on a Non-GPU Accelerator Architecture.
Proceedings of the Computer Architecture, 2010
Cohesion: a hybrid memory model for accelerators.
Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010
Energy-performance tradeoffs in processor architecture and circuit design: a marginal cost analysis.
Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010
GoldMine: Automatic assertion generation using data mining and static analysis.
Proceedings of the Design, Automation and Test in Europe, 2010
An integrated framework for joint design space exploration of microarchitecture and circuits.
Proceedings of the Design, Automation and Test in Europe, 2010
An asymmetric distributed shared memory model for heterogeneous parallel systems.
Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems, 2010
WAYPOINT: scaling coherence to thousand-core architectures.
Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010
2009
Fool me twice: Exploring and exploiting error tolerance in physics-based animation.
ACM Trans. Graph., 2009
The parallelization of video processing.
IEEE Signal Process. Mag., 2009
Area-efficiency in CMP core design: co-optimization of microarchitecture and physical design.
SIGARCH Comput. Archit. News, 2009
Rigel: an architecture and scalable programming interface for a 1000-core accelerator.
Proceedings of the 36th International Symposium on Computer Architecture (ISCA 2009), 2009
Depth image-based rendering from multiple cameras with 3D propagation algorithm.
Proceedings of the 2nd International ICST Conference on Immersive Telecommunications, 2009
Depth image-based rendering with low resolution depth.
Proceedings of the International Conference on Image Processing, 2009
Optimization of tele-immersion codes.
Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units, 2009
2008
Guest Editors' Introduction: Accelerator Architectures.
IEEE Micro, 2008
Tradeoffs in designing accelerator architectures for visual computing.
Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-41 2008), 2008
2007
Hardware support for software controlled multithreading.
SIGARCH Comput. Archit. News, 2007
The Art of Deception: Adaptive Precision Reduction for Area Efficient Physics Acceleration.
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-40 2007), 2007
ParallAX: an architecture for real-time physics.
Proceedings of the 34th International Symposium on Computer Architecture (ISCA 2007), 2007
Examining ACE analysis reliability estimates using fault-injection.
Proceedings of the 34th International Symposium on Computer Architecture (ISCA 2007), 2007
Implicitly Parallel Programming Models for Thousand-Core Microprocessors.
Proceedings of the 44th Design Automation Conference, 2007
2006
Sequential Element Design With Built-In Soft Error Resilience.
IEEE Trans. Very Large Scale Integr. Syst., 2006
ReStore: Symptom-Based Soft Error Detection in Microprocessors.
IEEE Trans. Dependable Secur. Comput., 2006
Beating In-Order Stalls with "Flea-Flicker" Two-Pass Pipelining.
IEEE Trans. Computers, 2006
2005
An Experimental Study of Soft Errors in Microprocessors.
IEEE Micro, 2005
Proceedings of the 32nd International Symposium on Computer Architecture (ISCA 2005), 2005
The Future of Computer Architecture Research: An Industrial Perspective.
Proceedings of the 11th International Conference on High-Performance Computer Architecture (HPCA-11 2005), 2005
2004
Reducing the Scheduling Critical Cycle Using Wakeup Prediction.
Proceedings of the 10th International Conference on High-Performance Computer Architecture (HPCA-10 2004), 2004
Characterizing the Effects of Transient Faults on a High-Performance Processor Pipeline.
Proceedings of the 2004 International Conference on Dependable Systems and Networks (DSN 2004), 28 June, 2004
Introduction to computing systems - from bits and gates to C and beyond (2. ed.).
McGraw-Hill, ISBN: 978-0-07-246750-5, 2004
2003
Characterization of essential dynamic instructions.
Proceedings of the International Conference on Measurements and Modeling of Computer Systems, 2003
Dynamic Optimization of Micro-Operations.
Proceedings of the Ninth International Symposium on High-Performance Computer Architecture (HPCA'03), 2003
Improving Quasi-Dynamic Schedules through Region Slip.
Proceedings of the 1st IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2003), 2003
Y-Branches: When You Come to a Fork in the Road, Take It.
Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques (PACT 2003), 27 September, 2003
2002
Instruction fetch deferral using static slack.
Proceedings of the 35th Annual International Symposium on Microarchitecture, 2002
2001
rePLay: A Hardware Framework for Dynamic Optimization.
IEEE Trans. Computers, 2001
Performance characterization of a hardware mechanism for dynamic optimization.
Proceedings of the 34th Annual International Symposium on Microarchitecture, 2001
2000
Increasing the size of atomic instruction blocks using control flow assertions.
Proceedings of the 33rd Annual IEEE/ACM International Symposium on Microarchitecture, 2000
1999
Trace cache design for wide-issue superscalar processors.
PhD thesis, 1999
Evaluation of Design Options for the Trace Cache Fetch Mechanism.
IEEE Trans. Computers, 1999
1998
Putting the Fill Unit to Work: Dynamic Optimizations for Trace Cache Microprocessors.
Proceedings of the 31st Annual IEEE/ACM International Symposium on Microarchitecture, 1998
Improving Trace Cache Effectiveness with Branch Promotion and Trace Packing.
Proceedings of the 25th Annual International Symposium on Computer Architecture, 1998
An Analysis of Correlation and Predictability: What Makes Two-Level Branch Predictors Work.
Proceedings of the 25th Annual International Symposium on Computer Architecture, 1998
1997
One Billion Transistors, One Uniprocessor, One Chip.
Computer, 1997
Alternative Fetch and Issue Policies for the Trace Cache Fetch Mechanism.
Proceedings of the Thirtieth Annual IEEE/ACM International Symposium on Microarchitecture, 1997
1994
Digital's DECchip 21066: The First Cost-focused Alpha AXP Chip.
Digit. Tech. J., 1994
DECchip 21066: The Alpha AXP Chip for Cost-Focused Systems.
Proceedings of the Spring COMPCON 94, Digest of Papers, San Francisco, California, USA, February 28, 1994
1986
Effectiveness of heuristics measures for automatic test pattern generation.
Proceedings of the 23rd ACM/IEEE Design Automation Conference. Las Vegas, 1986