Saeed Maleki

Orcid: 0000-0002-7998-3681

According to our database¹, Saeed Maleki authored at least 39 papers between 2011 and 2024.

Collaborative distances:

Dijkstra number² of three.
Erdős number³ of three.

Bibliography

2024

Efficient Schedule Construction for Distributed Execution of Large DNN Models.

IEEE Trans. Parallel Distributed Syst., December, 2024

ForestColl: Efficient Collective Communications on Heterogeneous Network Fabrics.

,

,

,

Hossein Pourreza

,

,

,

Arvind Krishnamurthy

CoRR, 2024

nnScaler: Constraint-Guided Parallelization Plan Generation for Deep Learning Training.

Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation, 2024

Splitwise: Efficient Generative LLM Inference Using Phase Splitting.

,

,

,

,

,

,

Ricardo Bianchini

Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

Tessel: Boosting Distributed Execution of Large DNN Models via Flexible Schedule Search.

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024

Aceso: Efficient Parallel DNN Training through Iterative Bottleneck Alleviation.

Proceedings of the Nineteenth European Conference on Computer Systems, 2024

A Framework for Fine-Grained Synchronization of Dependent GPU Kernels.

,

,

Maryam Mehri Dehnavi

,

Madan Musuvathi

,

Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2024

2023

Look-Up mAI GeMM: Increasing AI GeMMs Performance by Nearly 2.5x via msGeMM.

CoRR, 2023

Rethinking Machine Learning Collective Communication as a Multi-Commodity Flow Problem.

,

Siva Kesava Reddy Kakarla

,

,

Srikanth Kandula

,

,

CoRR, 2023

SuperScaler: Supporting Flexible DNN Parallelization via a Unified Abstraction.

CoRR, 2023

TACCL: Guiding Collective Algorithm Synthesis using Communication Sketches.

,

Vijay Chidambaram

,

,

,

Madan Musuvathi

,

,

,

Proceedings of the 20th USENIX Symposium on Networked Systems Design and Implementation, 2023

MSCCLang: Microsoft Collective Communication Language.

,

,

Madanlal Musuvathi

,

,

Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2022

Error-Covariance Analysis of Monocular Pose Estimation Using Total Least Squares.

,

John L. Crassidis

,

,

Matthias Schmid

CoRR, 2022

Optimal Pose Estimation and Covariance Analysis with Simultaneous Localization and Mapping Applications.

,

,

,

John L. Crassidis

,

Matthias Schmid

CoRR, 2022

MSCCL: Microsoft Collective Communication Library.

,

,

Madanlal Musuvathi

,

,

CoRR, 2022

Breaking the computation and communication abstraction barrier in distributed machine learning workloads.

,

,

,

Amir Hossein Nodehi Sabet

,

,

,

Madanlal Musuvathi

,

,

Proceedings of the ASPLOS '22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 28 February 2022, 2022

2021

Synthesizing Collective Communication Algorithms for Heterogeneous Networks with TACCL.

,

Vijay Chidambaram

,

,

,

Madan Musuvathi

,

,

,

,

CoRR, 2021

Total Least Squares for Optimal Pose Estimation.

,

John L. Crassidis

,

,

Matthias Schmid

CoRR, 2021

CoCoNet: Co-Optimizing Computation and Communication for Distributed Machine Learning.

,

,

,

Amir Hossein Nodehi Sabet

,

,

,

Madanlal Musuvathi

,

,

CoRR, 2021

Synthesizing optimal collective algorithms.

,

,

,

Madanlal Musuvathi

,

,

,

Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

Scaling Distributed Training with Adaptive Summation.

,

Madan Musuvathi

,

,

,

,

Vadim Eksarevskiy

,

Jaliya Ekanayake

,

Proceedings of the Fourth Conference on Machine Learning and Systems, 2021

Distributed Training of Embeddings using Graph Analytics.

,

Roshan Dathathri

,

,

Madan Musuvathi

,

,

Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

2019

Distributed Word2Vec using Graph Analytics Frameworks.

,

Roshan Dathathri

,

,

Madan Musuvathi

,

,

CoRR, 2019

CHET: an optimizing compiler for fully-homomorphic neural-network inferencing.

Roshan Dathathri

,

,

,

,

Kristin E. Lauter

,

,

Madanlal Musuvathi

,

Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2019

2018

An empirical study of the effect of source-level loop transformations on compiler stability.

Zhangxiaowen Gong

,

,

Justin Josef Szaday

,

,

,

Neftali Watkinson

,

,

,

Alexander V. Veidenbaum

,

Alexandru Nicolau

,

Josep Torrellas

Proc. ACM Program. Lang., 2018

CHET: Compiler and Runtime for Homomorphic Evaluation of Tensor Programs.

Roshan Dathathri

,

,

,

,

Kristin E. Lauter

,

,

Madanlal Musuvathi

,

CoRR, 2018

Semantics-Preserving Parallelization of Stochastic Gradient Descent.

,

Madanlal Musuvathi

,

Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

2017

Parallel Stochastic Gradient Descent with Sound Combiners.

,

Madanlal Musuvathi

,

CoRR, 2017

LORE: A loop repository for the evaluation of compilers.

,

Zhangxiaowen Gong

,

Justin Josef Szaday

,

,

,

Alexandru Nicolau

,

Alexander V. Veidenbaum

,

Neftali Watkinson

,

,

,

Josep Torrellas

,

Proceedings of the 2017 IEEE International Symposium on Workload Characterization, 2017

2016

Low-Rank Methods for Parallelizing Dynamic Programming Algorithms.

,

Madanlal Musuvathi

,

ACM Trans. Parallel Comput., 2016

Efficient parallelization using rank convergence in dynamic programming algorithms.

,

Madanlal Musuvathi

,

Commun. ACM, 2016

DSMR: a shared and distributed memory algorithm for single-source shortest path problem.

,

,

Andrew Lenharth

,

María Jesús Garzarán

,

,

Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2016

DSMR: A Parallel Algorithm for Single-Source Shortest Path Problem.

,

,

Andrew Lenharth

,

María Jesús Garzarán

,

,

Proceedings of the 2016 International Conference on Supercomputing, 2016

Parallelizing WFST speech decoders.

,

,

,

Madanlal Musuvathi

,

,

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015

Communication avoiding parallel algorithms for amorphous problems.

PhD thesis, 2015

2014

Parallelizing dynamic programming through rank convergence.

,

Madanlal Musuvathi

,

Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2014

Tiled Linear Algebra a System for Parallel Graph Algorithms.

Proceedings of the Languages and Compilers for Parallel Computing, 2014

2012

Performance Portability with the Chapel Language.

Albert Sidelnik

,

,

Bradford L. Chamberlain

,

María Jesús Garzarán

,

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

2011

An Evaluation of Vectorizing Compilers.

,

,

María Jesús Garzarán

,

,

Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011
