Kim M. Hazelwood

ORCID: 0000-0002-2713-8507

According to our database, Kim M. Hazelwood authored at least 65 papers between 2000 and 2024.

Bibliography

2024
Beyond Efficiency: Scaling AI Sustainably.
IEEE Micro, 2024

Revealing Compiler Heuristics Through Automated Discovery and Optimization.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2024

2023
Large Language Models for Compiler Optimization.
CoRR, 2023

BenchDirect: A Directed Language Model for Compiler Benchmarks.
CoRR, 2023

2022
Object Intersection Captures on Interactive Apps to Drive a Crowd-sourced Replay-based Compiler Optimization.
ACM Trans. Archit. Code Optim., 2022

F3M: Fast Focused Function Merging.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2022

Caviar: an e-graph based TRS for automatic code optimization.
Proceedings of the CC '22: 31st ACM SIGPLAN International Conference on Compiler Construction, Seoul, South Korea, April 2, 2022

BenchPress: A Deep Active Benchmark Generator.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2022

Q-gym: An Equality Saturation Framework for DNN Inference Exploiting Weight Repetition.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2022

2021
Exploiting Parallelism Opportunities with Deep Learning Frameworks.
ACM Trans. Archit. Code Optim., 2021

Datacenter-Scale Analysis and Optimization of GPU Machine Learning Workloads.
IEEE Micro, 2021

Sustainable AI: Environmental Implications, Challenges and Opportunities.
CoRR, 2021

Using Python for Model Inference in Deep Learning.
CoRR, 2021

Developer and user-transparent compiler optimization for interactive applications.
Proceedings of the PLDI '21: 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, 2021

HyFM: function merging for free.
Proceedings of the LCTES '21: 22nd ACM SIGPLAN/SIGBED International Conference on Languages, 2021

Understanding Training Efficiency of Deep Learning Recommendation Models at Scale.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

2020
RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing.
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

The Architectural Implications of Facebook's DNN-Based Personalized Recommendation.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020

2019
MLPerf Training Benchmark.
CoRR, 2019

The Architectural Implications of Facebook's DNN-based Personalized Recommendation.
CoRR, 2019

SysML: The New Frontier of Machine Learning Systems.
CoRR, 2019

Bandana: Using Non-Volatile Memory for Storing Deep Learning Models.
Proceedings of the Second Conference on Machine Learning and Systems, SysML 2019, 2019

Machine Learning at Facebook: Understanding Inference at the Edge.
Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019

2018
Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications.
CoRR, 2018

Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

Reducing DRAM footprint with NVM in Facebook.
Proceedings of the Thirteenth EuroSys Conference, 2018

2016
Profiling a Warehouse-Scale Computer.
IEEE Micro, 2016

2014
Tradeoffs between power management and tail latency in warehouse-scale applications.
Proceedings of the 2014 IEEE International Symposium on Workload Characterization, 2014

2012
Memory optimization of dynamic binary translators for embedded systems.
ACM Trans. Archit. Code Optim., 2012

EcoSim: a language and experience teaching parallel programming in elementary school.
Proceedings of the 43rd ACM technical symposium on Computer science education, 2012

Fine-Grained Resource Sharing for Concurrent GPGPU Kernels.
Proceedings of the 4th USENIX Workshop on Hot Topics in Parallelism, 2012

2011
Dynamic Binary Modification: Tools, Techniques, and Applications.
Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, ISBN: 978-3-031-01732-2, 2011

Finding cool code: An analysis of source-level causes of temperature effects.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2011

Performance characterization of mobile-class nodes: Why fewer bits is better.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2011

Where is the data? Why you cannot debate CPU vs. GPU performance without the answer.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2011

Process-level virtualization for runtime adaptation of embedded software.
Proceedings of the 48th Design Automation Conference, 2011

Analyzing program flow within a many-kernel OpenCL application.
Proceedings of 4th Workshop on General Purpose Processing on Graphics Processing Units, 2011

2010
Eliminating voltage emergencies via software-guided code transformations.
ACM Trans. Archit. Code Optim., 2010

Analyzing Parallel Programs with Pin.
Computer, 2010

DBT path selection for holistic memory efficiency and performance.
Proceedings of the 6th International Conference on Virtual Execution Environments, 2010

Design of a custom VEE core in a chip multiprocessor.
Proceedings of the IEEE 8th Symposium on Application Specific Processors, 2010

Dynamic program analysis of Microsoft Windows applications.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2010

Balancing memory and performance through selective flushing of software code caches.
Proceedings of the 2010 International Conference on Compilers, 2010

2009
Multicore compilation strategies and challenges.
IEEE Signal Process. Mag., 2009

Challenges and opportunities at all levels: interactions among operating systems, compilers, and multicore processors.
ACM SIGOPS Oper. Syst. Rev., 2009

A cross-layer approach to heterogeneity and reliability.
Proceedings of the 7th ACM/IEEE International Conference on Formal Methods and Models for Codesign (MEMOCODE 2009), 2009

Scalable support for multithreaded applications on dynamic binary instrumentation systems.
Proceedings of the 8th International Symposium on Memory Management, 2009

2008
Trace fragment selection within method-based JVMs.
Proceedings of the 4th International Conference on Virtual Execution Environments, 2008

Evaluating the impact of dynamic binary translation systems on hardware cache performance.
Proceedings of the 4th International Symposium on Workload Characterization (IISWC 2008), 2008

2007
Virtual Execution Environments: Support and Tools.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Reducing Exit Stub Memory Consumption in Code Caches.
Proceedings of the High Performance Embedded Architectures and Compilers, 2007

SuperPin: Parallelizing Dynamic Instrumentation for Real-Time Performance.
Proceedings of the Fifth International Symposium on Code Generation and Optimization (CGO 2007), 2007

2006
Managing bounded code caches in dynamic binary optimization systems.
ACM Trans. Archit. Code Optim., 2006

A Cross-Architectural Interface for Code Cache Manipulation.
Proceedings of the Fourth IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2006), 2006

A dynamic binary instrumentation engine for the ARM architecture.
Proceedings of the 2006 International Conference on Compilers, 2006

2005
Pin: building customized program analysis tools with dynamic instrumentation.
Proceedings of the ACM SIGPLAN 2005 Conference on Programming Language Design and Implementation, 2005

Improving Region Selection in Dynamic Optimization Systems.
Proceedings of the 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-38 2005), 2005

2004
Eliminating voltage emergencies via microarchitectural voltage control feedback and dynamic optimization.
Proceedings of the 2004 International Symposium on Low Power Electronics and Design, 2004

Exploring Code Cache Eviction Granularities in Dynamic Optimization Systems.
Proceedings of the 2nd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2004), 2004

2003
Generational Cache Management of Code Traces in Dynamic Optimization Systems.
Proceedings of the 36th Annual International Symposium on Microarchitecture, 2003

Adaptive Online Context-Sensitive Inlining.
Proceedings of the 1st IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2003), 2003

2002
Code Cache Management Schemes for Dynamic Optimizers.
Proceedings of the 6th Annual Workshop on Interaction between Compilers and Computer Architecture (INTERACT-6 2002), 2002

2000
A Lightweight Algorithm for Dynamic If-Conversion during Dynamic Optimization.
Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques (PACT'00), 2000

