Tor M. Aamodt

Orcid: 0000-0003-1161-692X

  • University of Toronto, Canada

According to our database, Tor M. Aamodt authored at least 89 papers between 2000 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.






Characterizing and Improving Resilience of Accelerators to Memory Errors in Autonomous Robots.
ACM Trans. Cyber Phys. Syst., July, 2024

Graph-Based Identification of Qubit Network (GidNET) for Qubit Reuse.
Proceedings of the IEEE International Conference on Quantum Computing and Engineering, 2024

Uncovering Real GPU NoC Characteristics: Implications on Interconnect Architecture.
Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024

Generalizing Ray Tracing Accelerators for Tree Traversals on GPUs.
Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024

Zatel: Sample Complexity-Aware Scale-Model Simulation for Ray Tracing.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2024

Collision Prediction for Robotics Accelerators.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

GhOST: a GPU Out-of-Order Scheduling Technique for Stall Reduction.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

Treelet Prefetching For Ray Tracing.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

Energy-Efficient Realtime Motion Planning.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

LumiBench: A Benchmark Suite for Hardware Ray Tracing.
Proceedings of the IEEE International Symposium on Workload Characterization, 2023

Learning Label Encodings for Deep Regression.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Vulkan-Sim: A GPU Architecture Simulator for Ray Tracing.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

Anticipating and eliminating redundant computations in accelerated sparse training.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

Label Encoding for Regression Networks.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Characterizing and Improving the Resilience of Accelerators in Autonomous Robots.
CoRR, 2021

AC-GC: Lossy Activation Compression with Guaranteed Convergence.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Intersection Prediction for Accelerated GPU Ray Tracing.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

AccelWattch: A Power Modeling Framework for Modern GPUs.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

Energy Efficient On-Demand Dynamic Branch Prediction Models.
IEEE Trans. Computers, 2020

Sparse Weight Activation Training.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Deterministic Atomic Buffering.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

Accel-Sim: An Extensible Simulation Framework for Validated GPU Modeling.
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

JPEG-ACT: Accelerating Deep Learning via Transform-based Lossy Compression.
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

ReSprop: Reuse Sparsified Backpropagation.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Bandwidth Bottleneck in Network-on-Chip for High-Throughput Processors.
Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

Hash-Based Ray Path Prediction: Skipping BVH Traversal Computation by Exploiting Ray Locality.
CoRR, 2019

Surface Compression Using Dynamic Color Palettes.
CoRR, 2019

Modeling Deep Learning Accelerator Enabled GPUs.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2019

Analyzing Machine Learning Workloads Using a Detailed GPU Simulator.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2019

A Detailed Model for Contemporary GPU Memory Systems.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2019

Emerald: graphics modeling for SoC systems.
Proceedings of the 46th International Symposium on Computer Architecture, 2019

EDGE: Event-Driven GPU Execution.
Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques, 2019

General-Purpose Graphics Processor Architectures.
Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, ISBN: 978-3-031-01759-9, 2018

Proteus: Exploiting precision variability in deep neural networks.
Parallel Comput., 2018

Value-Based Deep-Learning Acceleration.
IEEE Micro, 2018

Exploring Modern GPU Memory System Design Challenges through Accurate Modeling.
CoRR, 2018

Exploiting Typical Values to Accelerate Deep Learning.
Computer, 2018

Identifying and Exploiting Ineffectual Computations to Enable Hardware Acceleration of Deep Learning.
Proceedings of the 16th IEEE International New Circuits and Systems Conference, 2018

Warp Scheduling for Fine-Grained Synchronization.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

CG-OoO: Energy-Efficient Coarse-Grain Out-of-Order Execution Near In-Order Energy with Near Out-of-Order Performance.
ACM Trans. Archit. Code Optim., 2017

HoLiSwap: Reducing Wire Energy in L1 Caches.
CoRR, 2017

A state machine block for high-level synthesis.
Proceedings of the International Conference on Field Programmable Technology, 2017

Reuse Distance-Based Probabilistic Cache Replacement.
ACM Trans. Archit. Code Optim., 2016

CG-OoO: Energy-Efficient Coarse-Grain Out-of-Order Execution.
CoRR, 2016

Inter-Core Locality Aware Memory Scheduling.
IEEE Comput. Archit. Lett., 2016

Stripes: Bit-serial deep neural network computing.
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

MIMD synchronization on SIMT architectures.
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing.
Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

Proteus: Exploiting Numerical Precision Variability in Deep Neural Networks.
Proceedings of the 2016 International Conference on Supercomputing, 2016

Reduced-Precision Strategies for Bounded Memory in Deep Neural Nets.
CoRR, 2015

On-Demand Dynamic Branch Prediction.
IEEE Comput. Archit. Lett., 2015

SLIP: reducing wire energy in the memory hierarchy.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

MemcachedGPU: scaling-up scale-out key-value stores.
Proceedings of the Sixth ACM Symposium on Cloud Computing, 2015

Cache Coherence for GPU Architectures.
IEEE Micro, 2014

Learning your limit: managing massively multithreaded caches through scheduling.
Commun. ACM, 2014

Scaling usable computing capability.
Proceedings of the XIVth International Conference on Embedded Computer Systems: Architectures, 2014

A scalable multi-path microarchitecture for efficient GPU control flow.
Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture, 2014

Designing on-chip networks for throughput accelerators.
ACM Trans. Archit. Code Optim., 2013

Cache-Conscious Thread Scheduling for Massively Multithreaded Processors.
IEEE Micro, 2013

Divergence-aware warp scheduling.
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013

Energy efficient GPU transactional memory via space-time optimizations.
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013

GPUWattch: enabling energy optimizations in GPGPUs.
Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013

Characterizing the performance benefits of fused CPU/GPU systems using FusionSim.
Proceedings of the Design, Automation and Test in Europe, 2013

GPUDet: a deterministic GPU architecture.
Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2013

Formal-Analysis-Based Trace Computation for Post-Silicon Debug.
IEEE Trans. Very Large Scale Integr. Syst., 2012

Modeling Cache Contention and Throughput of Multiprogrammed Manycore Processors.
IEEE Trans. Computers, 2012

Kilo TM: Hardware Transactional Memory for GPU Architectures.
IEEE Micro, 2012

Progressive-BackSpace: Efficient Predecessor Computation for Post-Silicon Debug.
Proceedings of the 13th International Workshop on Microprocessor Test and Verification, 2012

Cache-Conscious Wavefront Scheduling.
Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012

Characterizing and evaluating a key-value store application on heterogeneous CPU-GPU systems.
Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software, 2012

Small virtual channel routers on FPGAs through block RAM sharing.
Proceedings of the 2012 International Conference on Field-Programmable Technology, 2012

Hybrid analytical modeling of pending cache hits, data prefetching, and MSHRs.
ACM Trans. Archit. Code Optim., 2011

Hardware transactional memory for GPU architectures.
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011

Thread block compaction for efficient SIMT control flow.
Proceedings of the 17th International Conference on High-Performance Computer Architecture (HPCA-17 2011), 2011

Throughput-Effective On-Chip Networks for Manycore Accelerators.
Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

Accelerating trace computation in post-silicon debug.
Proceedings of the 11th International Symposium on Quality of Electronic Design (ISQED 2010), 2010

Visualizing complex dynamics in many-core accelerator architectures.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2010

On-chip network design considerations for compute accelerators.
Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

Dynamic warp formation: Efficient MIMD control flow on SIMD graphics hardware.
ACM Trans. Archit. Code Optim., 2009

Complexity effective memory access scheduling for many-core accelerator architectures.
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

Analyzing CUDA workloads using a detailed GPU simulator.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2009

A first-order fine-grained multithreaded throughput model.
Proceedings of the 15th International Conference on High-Performance Computer Architecture (HPCA-15 2009), 2009

Compile-time and instruction-set methods for improving floating- to fixed-point conversion accuracy.
ACM Trans. Embed. Comput. Syst., 2008

Pangaea: a tightly-coupled IA32 heterogeneous chip multiprocessor.
Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, 2008

Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow.
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-40 2007), 2007

Optimization of data prefetch helper threads with path-expression based statistical modeling.
Proceedings of the 21st Annual International Conference on Supercomputing, 2007

Hardware Support for Prescient Instruction Prefetch.
Proceedings of the 10th International Conference on High-Performance Computer Architecture (HPCA-10 2004), 2004

A framework for modeling and optimization of prescient instruction prefetch.
Proceedings of the International Conference on Measurements and Modeling of Computer Systems, 2003

Embedded ISA support for enhanced floating-point to fixed-point ANSI-C compilation.
Proceedings of the 2000 International Conference on Compilers, 2000
