Todd C. Mowry

Orcid: 0000-0003-4076-5684

Affiliations:
  • Carnegie Mellon University, Pittsburgh, USA


According to our database, Todd C. Mowry authored at least 104 papers between 1990 and 2024.

Awards

ACM Fellow 2016, "For contributions to software prefetching and thread-level speculation".

Bibliography

2024
Address Scaling: Architectural Support for Fine-Grained Thread-Safe Metadata Management.
IEEE Comput. Archit. Lett., 2024

ACROBAT: Optimizing Auto-batching of Dynamic Deep Learning at Compile Time.
Proceedings of the Seventh Annual Conference on Machine Learning and Systems, 2024

Dear User-Defined Functions, Inlining isn't working out so great for us. Let's try batching to make our relationship work. Sincerely, SQL.
Proceedings of the 14th Conference on Innovative Data Systems Research, 2024

2023
Relax: Composable Abstractions for End-to-End Dynamic Machine Learning.
CoRR, 2023

Memento: Architectural Support for Ephemeral Memory Management in Serverless Environments.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

ED-Batch: Efficient Automatic Batching of Dynamic Neural Networks via Learned Finite State Machines.
Proceedings of the International Conference on Machine Learning, 2023

2022
The CoRa Tensor Compiler: Compilation for Ragged Tensors with Minimal Padding.
Proceedings of the Fifth Conference on Machine Learning and Systems, 2022

A programmable, energy-minimal dataflow compiler and architecture.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

2021
Cortex: A Compiler for Recursive Deep Learning Models.
Proceedings of the Fourth Conference on Machine Learning and Systems, 2021

NVOverlay: Enabling Efficient and Scalable High-Frequency Snapshotting to NVM.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

Filter Representation in Vectorized Query Execution.
Proceedings of the 17th International Workshop on Data Management on New Hardware, 2021

2020
Permutable Compiled Queries: Dynamically Adapting Compiled Queries without Recompiling.
Proc. VLDB Endow., 2020

TardisTM: incremental repair for transactional memory.
Proceedings of the PMAM@PPoPP '20: Eleventh International Workshop on Programming Models and Applications for Multicores and Manycores colocated with the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2020

2019
Multiversioned Page Overlays: Enabling Faster Serializable Hardware Transactional Memory.
Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques, 2019

Towards Breaking the Memory Bandwidth Wall Using Approximate Value Prediction.
Proceedings of the Approximate Circuits, Methodologies and CAD., 2019

2018
RowClone: Accelerating Data Movement and Initialization Using DRAM.
CoRR, 2018

2017
Relaxed Operator Fusion for In-Memory Databases: Making Compilation, Vectorization, and Prefetching Work Together At Last.
Proc. VLDB Endow., 2017

Ambit: in-memory accelerator for bulk bitwise operations using commodity DRAM technology.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

Self-Driving Database Management Systems.
Proceedings of the 8th Biennial Conference on Innovative Data Systems Research, 2017

2016
RFVP: Rollback-Free Value Prediction with Safe-to-Approximate Loads.
ACM Trans. Archit. Code Optim., 2016

Mitigating the Memory Bottleneck With Approximate Load Value Prediction.
IEEE Des. Test, 2016

A Framework for Accelerating Bottlenecks in GPU Execution with Assist Warps.
CoRR, 2016

Buddy-RAM: Improving the Performance and Efficiency of Bulk Bitwise Operations Using DRAM.
CoRR, 2016

A case for toggle-aware compression for GPU systems.
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

2015
Fast Bulk Bitwise AND and OR in DRAM.
IEEE Comput. Archit. Lett., 2015

Toggle-Aware Compression for GPUs.
IEEE Comput. Archit. Lett., 2015

Gather-scatter DRAM: in-DRAM address translation to improve the spatial locality of non-unit strided accesses.
Proceedings of the 48th International Symposium on Microarchitecture, 2015

A case for core-assisted bottleneck acceleration in GPUs: enabling flexible data compression with assist warps.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

Page overlays: an enhanced virtual memory framework to enable fine-grained memory management.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

Exploiting compressed block size as an indicator of future reuse.
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

Tracking and Reducing Uncertainty in Dataflow Analysis-Based Dynamic Parallel Monitoring.
Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

2014
Mitigating Prefetcher-Caused Pollution Using Informed Caching Policies for Prefetched Blocks.
ACM Trans. Archit. Code Optim., 2014

The Dirty-Block Index.
Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

Guardrail: a high fidelity approach to protecting hardware devices from buggy drivers.
Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2014

Rollback-free value prediction with approximate loads.
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2013
Editorial.
ACM Trans. Comput. Syst., 2013

RowClone: fast and energy-efficient in-DRAM bulk data copy and initialization.
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013

Linearly compressed pages: a low-complexity, low-latency main memory compression framework.
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013

2012
Introduction to Special Issue ASPLOS 2011.
ACM Trans. Comput. Syst., 2012

The evicted-address filter: a unified mechanism to address both cache pollution and thrashing.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

Base-delta-immediate compression: practical data compression for on-chip caches.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

Linearly compressed pages: a main memory compression framework with low complexity and low latency.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

Chrysalis analysis: incorporating synchronization arcs in dataflow-analysis-based parallel monitoring.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011
Log-based architectures: using multicore to help software behave correctly.
ACM SIGOPS Oper. Syst. Rev., 2011

2010
Decoupled lifeguards: enabling path optimizations for dynamic correctness checking tools.
Proceedings of the 2010 ACM SIGPLAN Conference on Programming Language Design and Implementation, 2010

ParaLog: enabling and accelerating online parallel monitoring of multithreaded applications.
Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems, 2010

Decoupling contention management from scheduling.
Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems, 2010

Butterfly analysis: adapting dataflow analysis to dynamic parallel monitoring.
Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems, 2010

2009
Flexible Hardware Acceleration for Instruction-Grain Lifeguards.
IEEE Micro, 2009

Beyond Audio and Video: Using Claytronics to Enable Pario.
AI Mag., 2009

Holistic Query Transformations for Dynamic Web Applications.
Proceedings of the 25th International Conference on Data Engineering, 2009

2008
Incrementally parallelizing database transactions with thread-level speculation.
ACM Trans. Comput. Syst., 2008

Compiler and hardware support for reducing the synchronization of speculative threads.
ACM Trans. Archit. Code Optim., 2008

Scalable query result caching for web applications.
Proc. VLDB Endow., 2008

Parallelizing dynamic information flow tracking.
SPAA 2008: Proceedings of the 20th Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2008

Cut-and-stitch: efficient parallel learning of linear dynamical systems on smps.
Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2008

Flexible Hardware Acceleration for Instruction-Grain Program Monitoring.
Proceedings of the 35th International Symposium on Computer Architecture (ISCA 2008), 2008

Generalizing metamodules to simplify planning in modular robotic systems.
Proceedings of the 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2008

2007
CMP Support for Large and Dependent Speculative Threads.
IEEE Trans. Parallel Distributed Syst., 2007

Improving hash join performance through prefetching.
ACM Trans. Database Syst., 2007

Scheduling threads for constructive cache sharing on CMPs.
SPAA 2007: Proceedings of the 19th Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2007

A modular robotic system using magnetic force effectors.
Proceedings of the 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems, October 29, 2007

Meld: A declarative approach to programming ensembles.
Proceedings of the 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems, October 29, 2007

Integrated Debugging of Large Modular Robot Ensembles.
Proceedings of the 2007 IEEE International Conference on Robotics and Automation, 2007

Distributed Watchpoints: Debugging Large Multi-Robot Systems.
Proceedings of the 2007 IEEE International Conference on Robotics and Automation, 2007

Invalidation Clues for Database Scalability Services.
Proceedings of the 23rd International Conference on Data Engineering, 2007

2006
Parallel depth first vs. work stealing schedulers on CMP architectures.
Proceedings of the SPAA 2006: Proceedings of the 18th Annual ACM Symposium on Parallelism in Algorithms and Architectures, Cambridge, Massachusetts, USA, July 30, 2006

Simultaneous scalability and security for data-intensive web applications.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2006

Tolerating Dependences Between Large Speculative Threads Via Sub-Threads.
Proceedings of the 33rd International Symposium on Computer Architecture (ISCA 2006), 2006

Log-based architectures for general-purpose monitoring of deployed code.
Proceedings of the 1st Workshop on Architectural and System Support for Improving Software Dependability, 2006

2005
The STAMPede approach to thread-level speculation.
ACM Trans. Comput. Syst., 2005

Programmable Matter.
Computer, 2005

Optimistic Intra-Transaction Parallelism on Chip Multiprocessors.
Proceedings of the 31st International Conference on Very Large Data Bases, Trondheim, Norway, August 30, 2005

Inspector Joins.
Proceedings of the 31st International Conference on Very Large Data Bases, Trondheim, Norway, August 30, 2005


A Scalability Service for Dynamic Web Applications.
Proceedings of the Second Biennial Conference on Innovative Data Systems Research, 2005

Catoms: Moving Robots Without Moving Parts.
Proceedings of the Proceedings, 2005

2004
Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads.
Proceedings of the 2nd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2004), 2004

2002
Fractal prefetching B+-Trees: optimizing both cache and disk performance.
Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, 2002

Improving Value Communication for Thread-Level Speculation.
Proceedings of the Eighth International Symposium on High-Performance Computer Architecture (HPCA'02), 2002

Compiler optimization of scalar value communication between speculative threads.
Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-X), 2002

2001
Architectural and compiler support for effective instruction prefetching: a cooperative approach.
ACM Trans. Comput. Syst., 2001

Compiler-based I/O prefetching for out-of-core applications.
ACM Trans. Comput. Syst., 2001

Improving Index Performance through Prefetching.
Proceedings of the 2001 ACM SIGMOD international conference on Management of data, 2001

2000
Understanding Why Correlation Profiling Improves the Predictability of Data Cache Misses in Nonnumeric Applications.
IEEE Trans. Computers, 2000

Taming the Memory Hogs: Using Compiler-Inserted Releases to Manage Physical Memory Intelligently.
Proceedings of the 4th Symposium on Operating System Design and Implementation (OSDI 2000), 2000

A scalable approach to thread-level speculation.
Proceedings of the 27th International Symposium on Computer Architecture (ISCA 2000), 2000

Software-Controlled Multithreading Using Informing Memory Operations.
Proceedings of the Sixth International Symposium on High-Performance Computer Architecture, 2000

1999
Automatic Compiler-Inserted Prefetching for Pointer-Based Applications.
IEEE Trans. Computers, 1999

Memory Forwarding: Enabling Aggressive Layout Optimizations by Guaranteeing the Safety of Data Relocation.
Proceedings of the 26th Annual International Symposium on Computer Architecture, 1999

1998
Tolerating Latency in Multiprocessors Through Compiler-Inserted Prefetching.
ACM Trans. Comput. Syst., 1998

Informing Memory Operations: Memory Performance Feedback Mechanisms and Their Applications.
ACM Trans. Comput. Syst., 1998

Cooperative Prefetching: Compiler and Hardware Support for Effective Instruction Prefetching in Modern Processors.
Proceedings of the 31st Annual IEEE/ACM International Symposium on Microarchitecture, 1998

The Potential for Using Thread-Level Data Speculation to Facilitate Automatic Parallelization.
Proceedings of the Fourth International Symposium on High-Performance Computer Architecture, Las Vegas, Nevada, USA, January 31, 1998

Comparative Evaluation of Latency Tolerance Techniques for Software Distributed Shared Memory.
Proceedings of the Fourth International Symposium on High-Performance Computer Architecture, Las Vegas, Nevada, USA, January 31, 1998

1997
Predicting Data Cache Misses in Non-Numeric Applications through Correlation Profiling.
Proceedings of the Thirtieth Annual IEEE/ACM International Symposium on Microarchitecture, 1997

1996
Automatic Compiler-Inserted I/O Prefetching for Out-of-Core Applications.
Proceedings of the Second USENIX Symposium on Operating Systems Design and Implementation (OSDI), 1996

Informing Memory Operations: Providing Memory Performance Feedback in Modern Processors.
Proceedings of the 23rd Annual International Symposium on Computer Architecture, 1996

Compiler-Based Prefetching for Recursive Data Structures.
Proceedings of the ASPLOS-VII Proceedings, 1996

Compiler-Directed Page Coloring for Multiprocessors.
Proceedings of the ASPLOS-VII Proceedings, 1996

1992
Design and Evaluation of a Compiler Algorithm for Prefetching.
Proceedings of the ASPLOS-V Proceedings, 1992

1991
Tolerating Latency Through Software-Controlled Prefetching in Shared-Memory Multiprocessors.
J. Parallel Distributed Comput., 1991

Comparative Evaluation of Latency Reducing and Tolerating Techniques.
Proceedings of the 18th Annual International Symposium on Computer Architecture. Toronto, 1991

1990
Reducing Memory and Traffic Requirements for Scalable Directory-Based Cache Coherence Schemes.
Proceedings of the 1990 International Conference on Parallel Processing, 1990

