Todd C. Mowry

CoRR, 2018

2017

Relaxed Operator Fusion for In-Memory Databases: Making Compilation, Vectorization, and Prefetching Work Together At Last.

Prashanth Menon

Andrew Pavlo

Proc. VLDB Endow., 2017

Ambit: in-memory accelerator for bulk bitwise operations using commodity DRAM technology.

Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

Self-Driving Database Management Systems.

Proceedings of the 8th Biennial Conference on Innovative Data Systems Research, 2017

2016

RFVP: Rollback-Free Value Prediction with Safe-to-Approximate Loads.

ACM Trans. Archit. Code Optim., 2016

Mitigating the Memory Bottleneck With Approximate Load Value Prediction.

IEEE Des. Test, 2016

A Framework for Accelerating Bottlenecks in GPU Execution with Assist Warps.

CoRR, 2016

Buddy-RAM: Improving the Performance and Efficiency of Bulk Bitwise Operations Using DRAM.

CoRR, 2016

A case for toggle-aware compression for GPU systems.

Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

2015

Fast Bulk Bitwise AND and OR in DRAM.

IEEE Comput. Archit. Lett., 2015

Toggle-Aware Compression for GPUs.

IEEE Comput. Archit. Lett., 2015

Gather-scatter DRAM: in-DRAM address translation to improve the spatial locality of non-unit strided accesses.

Proceedings of the 48th International Symposium on Microarchitecture, 2015

A case for core-assisted bottleneck acceleration in GPUs: enabling flexible data compression with assist warps.

Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

Page overlays: an enhanced virtual memory framework to enable fine-grained memory management.

Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

Exploiting compressed block size as an indicator of future reuse.

Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

Tracking and Reducing Uncertainty in Dataflow Analysis-Based Dynamic Parallel Monitoring.

Phillip B. Gibbons

Michael A. Kozuch

Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

2014

Mitigating Prefetcher-Caused Pollution Using Informed Caching Policies for Prefetched Blocks.

ACM Trans. Archit. Code Optim., 2014

The Dirty-Block Index.

Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

Guardrail: a high fidelity approach to protecting hardware devices from buggy drivers.

Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2014

Rollback-free value prediction with approximate loads.

Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2013

Editorial.

ACM Trans. Comput. Syst., 2013

RowClone: fast and energy-efficient in-DRAM bulk data copy and initialization.

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013

Linearly compressed pages: a low-complexity, low-latency main memory compression framework.

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013

2012

Introduction to Special Issue ASPLOS 2011.

ACM Trans. Comput. Syst., 2012

The evicted-address filter: a unified mechanism to address both cache pollution and thrashing.

Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

Base-delta-immediate compression: practical data compression for on-chip caches.

Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

Linearly compressed pages: a main memory compression framework with low complexity and low latency.

Gennady Pekhimenko

Onur Mutlu

Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

Chrysalis analysis: incorporating synchronization arcs in dataflow-analysis-based parallel monitoring.

Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011

Log-based architectures: using multicore to help software behave correctly.

ACM SIGOPS Oper. Syst. Rev., 2011

2010

Decoupled lifeguards: enabling path optimizations for dynamic correctness checking tools.

Proceedings of the 2010 ACM SIGPLAN Conference on Programming Language Design and Implementation, 2010

ParaLog: enabling and accelerating online parallel monitoring of multithreaded applications.

Evangelos Vlachos

Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems, 2010

Decoupling contention management from scheduling.

Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems, 2010

Butterfly analysis: adapting dataflow analysis to dynamic parallel monitoring.

Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems, 2010

2009

Flexible Hardware Acceleration for Instruction-Grain Lifeguards.

IEEE Micro, 2009

Beyond Audio and Video: Using Claytronics to Enable Pario.

Seth Copen Goldstein

Michael P. Ashley-Rollman

Jason Campbell

Michael DeRosa

Stanislav Funiak

James F. Hoburg

Mustafa Emre Karagozler

Michael Philetus Weller

AI Mag., 2009

Holistic Query Transformations for Dynamic Web Applications.

Proceedings of the 25th International Conference on Data Engineering, 2009

2008

Incrementally parallelizing database transactions with thread-level speculation.

ACM Trans. Comput. Syst., 2008

Compiler and hardware support for reducing the synchronization of speculative threads.

Michael P. Ashley-Rollman

ACM Trans. Archit. Code Optim., 2008

Scalable query result caching for web applications.

Proc. VLDB Endow., 2008

Parallelizing dynamic information flow tracking.

SPAA 2008: Proceedings of the 20th Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2008

Cut-and-stitch: efficient parallel learning of linear dynamical systems on SMPs.

Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2008

Flexible Hardware Acceleration for Instruction-Grain Program Monitoring.

Proceedings of the 35th International Symposium on Computer Architecture (ISCA 2008), 2008

Generalizing metamodules to simplify planning in modular robotic systems.

Daniel J. Dewey

Michael DeRosa

Seth Copen Goldstein

Siddhartha S. Srinivasa

Padmanabhan Pillai

Jason Campbell

Proceedings of the 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2008

2007

CMP Support for Large and Dependent Speculative Threads.

Michael P. Ashley-Rollman

IEEE Trans. Parallel Distributed Syst., 2007

Improving hash join performance through prefetching.

ACM Trans. Database Syst., 2007

Scheduling threads for constructive cache sharing on CMPs.

Shimin Chen

Phillip B. Gibbons

Michael Kozuch

Vasileios Liaskovitis

SPAA 2007: Proceedings of the 19th Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2007

A modular robotic system using magnetic force effectors.

Proceedings of the 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems, October 29, 2007

Meld: A declarative approach to programming ensembles.

Proceedings of the 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems, October 29, 2007

Integrated Debugging of Large Modular Robot Ensembles.

Proceedings of the 2007 IEEE International Conference on Robotics and Automation, 2007

Distributed Watchpoints: Debugging Large Multi-Robot Systems.

Proceedings of the 2007 IEEE International Conference on Robotics and Automation, 2007

Invalidation Clues for Database Scalability Services.

Proceedings of the 23rd International Conference on Data Engineering, 2007

2006

Parallel depth first vs. work stealing schedulers on CMP architectures.

Vasileios Liaskovitis

SPAA 2006: Proceedings of the 18th Annual ACM Symposium on Parallelism in Algorithms and Architectures, Cambridge, Massachusetts, USA, July 30, 2006

Simultaneous scalability and security for data-intensive web applications.

Proceedings of the ACM SIGMOD International Conference on Management of Data, 2006

Tolerating Dependences Between Large Speculative Threads Via Sub-Threads.

Proceedings of the 33rd International Symposium on Computer Architecture (ISCA 2006), 2006

Log-based architectures for general-purpose monitoring of deployed code.

Proceedings of the 1st Workshop on Architectural and System Support for Improving Software Dependability, 2006

2005

The STAMPede approach to thread-level speculation.

ACM Trans. Comput. Syst., 2005

Programmable Matter.

Seth Copen Goldstein

Jason Campbell

Computer, 2005

Optimistic Intra-Transaction Parallelism on Chip Multiprocessors.

Proceedings of the 31st International Conference on Very Large Data Bases, Trondheim, Norway, August 30, 2005

Inspector Joins.

Proceedings of the 31st International Conference on Very Large Data Bases, Trondheim, Norway, August 30, 2005

Claytronics: highly scalable communications, sensing, and actuation networks.

Proceedings of the 3rd International Conference on Embedded Networked Sensor Systems, 2005

A Scalability Service for Dynamic Web Applications.

Proceedings of the Second Biennial Conference on Innovative Data Systems Research, 2005

Catoms: Moving Robots Without Moving Parts.

Proceedings, 2005

2004

Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads.

Proceedings of the 2nd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2004), 2004

2002

Fractal prefetching B+-Trees: optimizing both cache and disk performance.

Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, 2002

Improving Value Communication for Thread-Level Speculation.

Proceedings of the Eighth International Symposium on High-Performance Computer Architecture (HPCA'02), 2002

Compiler optimization of scalar value communication between speculative threads.

Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-X), 2002

2001

Architectural and compiler support for effective instruction prefetching: a cooperative approach.

ACM Trans. Comput. Syst., 2001

Compiler-based I/O prefetching for out-of-core applications.

Angela Demke Brown

Orran Krieger

ACM Trans. Comput. Syst., 2001

Improving Index Performance through Prefetching.

Shimin Chen

Phillip B. Gibbons

Proceedings of the 2001 ACM SIGMOD international conference on Management of data, 2001

2000

Understanding Why Correlation Profiling Improves the Predictability of Data Cache Misses in Nonnumeric Applications.

IEEE Trans. Computers, 2000

Taming the Memory Hogs: Using Compiler-Inserted Releases to Manage Physical Memory Intelligently.

Angela Demke Brown

Proceedings of the 4th Symposium on Operating System Design and Implementation (OSDI 2000), 2000

A scalable approach to thread-level speculation.

Proceedings of the 27th International Symposium on Computer Architecture (ISCA 2000), 2000

Software-Controlled Multithreading Using Informing Memory Operations.

Sherwyn R. Ramkissoon

Proceedings of the Sixth International Symposium on High-Performance Computer Architecture, 2000

1999

Automatic Compiler-Inserted Prefetching for Pointer-Based Applications.

IEEE Trans. Computers, 1999

Memory Forwarding: Enabling Aggressive Layout Optimizations by Guaranteeing the Safety of Data Relocation.

Proceedings of the 26th Annual International Symposium on Computer Architecture, 1999

1998

Tolerating Latency in Multiprocessors Through Compiler-Inserted Prefetching.

ACM Trans. Comput. Syst., 1998

Informing Memory Operations: Memory Performance Feedback Mechanisms and Their Applications.

ACM Trans. Comput. Syst., 1998

Cooperative Prefetching: Compiler and Hardware Support for Effective Instruction Prefetching in Modern Processors.

Proceedings of the 31st Annual IEEE/ACM International Symposium on Microarchitecture, 1998

The Potential for Using Thread-Level Data Speculation to Facilitate Automatic Parallelization.

Proceedings of the Fourth International Symposium on High-Performance Computer Architecture, Las Vegas, Nevada, USA, January 31, 1998

Comparative Evaluation of Latency Tolerance Techniques for Software Distributed Shared Memory.

Charles Q. C. Chan

Adley K. W. Lo

Proceedings of the Fourth International Symposium on High-Performance Computer Architecture, Las Vegas, Nevada, USA, January 31, 1998

1997

Predicting Data Cache Misses in Non-Numeric Applications through Correlation Profiling.

Proceedings of the Thirtieth Annual IEEE/ACM International Symposium on Microarchitecture, 1997

1996

Automatic Compiler-Inserted I/O Prefetching for Out-of-Core Applications.

Angela K. Demke

Orran Krieger

Proceedings of the Second USENIX Symposium on Operating Systems Design and Implementation (OSDI), 1996

Informing Memory Operations: Providing Memory Performance Feedback in Modern Processors.

Proceedings of the 23rd Annual International Symposium on Computer Architecture, 1996

Compiler-Based Prefetching for Recursive Data Structures.

Proceedings of ASPLOS-VII, 1996

Compiler-Directed Page Coloring for Multiprocessors.

Edouard Bugnion

Jennifer-Ann M. Anderson

Mendel Rosenblum

Monica S. Lam

Proceedings of ASPLOS-VII, 1996

1992

Design and Evaluation of a Compiler Algorithm for Prefetching.

Monica S. Lam

Anoop Gupta

Proceedings of ASPLOS-V, 1992

1991

Tolerating Latency Through Software-Controlled Prefetching in Shared-Memory Multiprocessors.

Anoop Gupta

J. Parallel Distributed Comput., 1991

Comparative Evaluation of Latency Reducing and Tolerating Techniques.

Proceedings of the 18th Annual International Symposium on Computer Architecture. Toronto, 1991

1990

Reducing Memory and Traffic Requirements for Scalable Directory-Based Cache Coherence Schemes.

Anoop Gupta

Wolf-Dietrich Weber