Gennady Pekhimenko
ORCID: 0000-0002-3839-0919
Affiliations:
- University of Toronto
- Microsoft Research
- Carnegie Mellon University (former)
According to our database, Gennady Pekhimenko authored at least 100 papers between 2010 and 2024.
Links
Online presence:
- on orcid.org
- on github.com
- on cs.cmu.edu
- on dl.acm.org
Bibliography
2024
CoRR, 2024
APPL: A Prompt Programming Language for Harmonious Integration of Programs and Large Language Model Prompts.
CoRR, 2024
Proceedings of the Seventh Annual Conference on Machine Learning and Systems, 2024
Sylva: Sparse Embedded Adapters via Hierarchical Approximate Second-Order Information.
Proceedings of the 38th ACM International Conference on Supercomputing, 2024
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Proceedings of the Nineteenth European Conference on Computer Systems, 2024
BOOM: Use your Desktop to Accurately Predict the Performance of Large Deep Neural Networks.
Proceedings of the 2024 International Conference on Parallel Architectures and Compilation Techniques, 2024
2023
Nat. Mac. Intell., July, 2023
CoRR, 2023
Proceedings of the 2023 USENIX Annual Technical Conference, 2023
Hotline Profiler: Automatic Annotation and A Multi-Scale Timeline for Visualizing Time-Use in DNN Training.
Proceedings of the Sixth Conference on Machine Learning and Systems, 2023
Grape: Practical and Efficient Graphed Execution for Dynamic Deep Neural Networks on GPUs.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023
Proceedings of the Programming Languages and Systems - 21st Asian Symposium, 2023
2022
Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation, 2022
Tempo: Accelerating Transformer-Based Model Training through Memory Footprint Reduction.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Proceedings of the Fifth Conference on Machine Learning and Systems, 2022
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022
How to validate Machine Learning Models Prior to Deployment: Silent trial protocol for evaluation of real-time models at ICU.
Proceedings of the Conference on Health, Inference, and Learning, 2022
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2022
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2022
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2022
2021
ACM Trans. Archit. Code Optim., 2021
MedPerf: Open Benchmarking Platform for Medical Artificial Intelligence using Federated Evaluation.
CoRR, 2021
Computational Performance Predictions for Deep Neural Network Training: A Runtime-Based Approach.
CoRR, 2021
Habitat: A Runtime-Based Computational Performance Predictor for Deep Neural Network Training.
Proceedings of the 2021 USENIX Annual Technical Conference, 2021
Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021
Proceedings of the Fourth Conference on Machine Learning and Systems, 2021
Proceedings of the Fourth Conference on Machine Learning and Systems, 2021
Horizontally Fused Training Array: An Effective Hardware Utilization Squeezer for Training Novel Deep Learning Models.
Proceedings of the Fourth Conference on Machine Learning and Systems, 2021
Proceedings of the Fourth Conference on Machine Learning and Systems, 2021
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021
Proceedings of the ASPLOS '21: 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2021
2020
CoRR, 2020
TensorDash: Exploiting Sparsity to Accelerate Deep Neural Network Training and Inference.
CoRR, 2020
Proceedings of the 2020 USENIX Annual Technical Conference, 2020
Skyline: Interactive In-Editor Computational Performance Profiling for Deep Neural Network Training.
Proceedings of the UIST '20: The 33rd Annual ACM Symposium on User Interface Software and Technology, 2020
Proceedings of the Third Conference on Machine Learning and Systems, 2020
Proceedings of the Third Conference on Machine Learning and Systems, 2020
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020
2019
Proceedings of the Second Conference on Machine Learning and Systems, SysML 2019, 2019
Proceedings of the 46th International Symposium on Computer Architecture, 2019
Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019
Proceedings of the Approximate Circuits, Methodologies and CAD, 2019
2018
CoRR, 2018
CoRR, 2018
Flexible-Latency DRAM: Understanding and Exploiting Latency Variation in Modern DRAM Chips.
CoRR, 2018
CoRR, 2018
Decoupling GPU Programming Models from Resource Management for Enhanced Programming Ease, Portability, and Performance.
CoRR, 2018
Zorua: Enhancing Programming Ease, Portability, and Performance in GPUs by Decoupling Programming Models from Resource Management.
CoRR, 2018
Proceedings of the 2018 USENIX Annual Technical Conference, 2018
A Case for Richer Cross-Layer Abstractions: Bridging the Semantic Gap with Expressive Memory.
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018
Proceedings of the 2018 IEEE International Symposium on Workload Characterization, 2018
Proceedings of the 28th Annual International Conference on Computer Science and Software Engineering, 2018
2017
Design-Induced Latency Variation in Modern DRAM Chips: Characterization, Analysis, and Latency Reduction Mechanisms.
Proc. ACM Meas. Anal. Comput. Syst., 2017
Proceedings of the 2017 USENIX Annual Technical Conference, 2017
SoftMC: A Flexible and Practical Open-Source Infrastructure for Enabling Experimental DRAM Studies.
Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017
2016
ACM Trans. Archit. Code Optim., 2016
ACM Trans. Archit. Code Optim., 2016
IEEE Des. Test, 2016
CoRR, 2016
Reducing DRAM Latency by Exploiting Design-Induced Latency Variation in Modern DRAM Chips.
CoRR, 2016
Understanding Latency Variation in Modern DRAM Chips: Experimental Characterization, Analysis, and Optimization.
Proceedings of the 2016 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Science, 2016
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016
2015
Simultaneous Multi Layer Access: A High Bandwidth and Low Cost 3D-Stacked Memory Interface.
CoRR, 2015
Shifted Hamming distance: a fast and accurate SIMD-friendly filter to accelerate alignment verification in read mapping.
Bioinform., 2015
PocketTrend: Timely Identification and Delivery of Trending Search Content to Mobile Users.
Proceedings of the 24th International Conference on World Wide Web, 2015
A case for core-assisted bottleneck acceleration in GPUs: enabling flexible data compression with assist warps.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015
Page overlays: an enhanced virtual memory framework to enable fine-grained memory management.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015
2014
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014
2013
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013
Linearly compressed pages: a low-complexity, low-latency main memory compression framework.
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013
2012
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012
Linearly compressed pages: a main memory compression framework with low complexity and low latency.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012
2010
Proceedings of the Software Automatic Tuning, From Concepts to State-of-the-Art Results, 2010