Mikhail Smelyanskiy
ORCID: 0000-0002-2433-6110
According to our database, Mikhail Smelyanskiy authored at least 65 papers between 2000 and 2024.
Bibliography
2024
2022
Unity: Accelerating DNN Training Through Joint Optimization of Algebraic Transformations and Parallelization.
Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation, 2022
Check-N-Run: a Checkpointing System for Training Deep Learning Recommendation Models.
Proceedings of the 19th USENIX Symposium on Networked Systems Design and Implementation, 2022
Software-hardware co-design for fast and scalable training of deep learning recommendation models.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022
Proceedings of the 42nd IEEE International Conference on Distributed Computing Systems, 2022
2021
IEEE Micro, 2021
High-performance, Distributed Training of Large-scale Deep Learning Recommendation Models.
CoRR, 2021
2020
Deep Learning Training in Facebook Data Centers: Design of Scale-up and Scale-out Systems.
CoRR, 2020
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020
2019
CoRR, 2019
CoRR, 2019
Proceedings of the Second Conference on Machine Learning and Systems, SysML 2019, 2019
Proceedings of the 2019 IEEE Hot Chips 31 Symposium (HCS), 2019
2018
Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications.
CoRR, 2018
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018
2017
Proceedings of the 5th International Conference on Learning Representations, 2017
Proceedings of the Workshops of the Thirty-First AAAI Conference on Artificial Intelligence, 2017
2016
Optimizations in a high-performance conjugate gradient benchmark for IA-based multi- and many-core processors.
Int. J. High Perform. Comput. Appl., 2016
Int. J. High Perform. Comput. Appl., 2016
CoRR, 2016
Proceedings of the International Conference for High Performance Computing, 2016
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016
2015
Can traditional programming bridge the ninja performance gap for parallel computing applications?
Commun. ACM, 2015
High-performance algebraic multigrid solver optimized for multi-core based distributed parallel systems.
Proceedings of the International Conference for High Performance Computing, 2015
Exploring Shared-Memory Optimizations for an Unstructured Mesh CFD Application on Modern Parallel Systems.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015
2014
Sparsifying Synchronization for High-Performance Shared-Memory Sparse Triangular Solver.
Proceedings of the Supercomputing - 29th International Conference, 2014
Efficient Shared-Memory Implementation of High-Performance Conjugate Gradient Benchmark and its Application to Unstructured Matrices.
Proceedings of the International Conference for High Performance Computing, 2014
Proceedings of the International Conference for High Performance Computing, 2014
Petascale High Order Dynamic Rupture Earthquake Simulations on Heterogeneous Supercomputers.
Proceedings of the International Conference for High Performance Computing, 2014
Improving Communication Performance and Scalability of Native Applications on Intel Xeon Phi Coprocessor Clusters.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014
2013
Efficient backprojection-based synthetic aperture radar computation with many-core processors.
Sci. Program., 2013
Proceedings of the Supercomputing - 28th International Supercomputing Conference, 2013
Exploring SIMD for Molecular Dynamics, Using Intel® Xeon® Processors and Intel® Xeon Phi Coprocessors.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013
Design and Implementation of the Linpack Benchmark for Single and Multi-node Systems Based on Intel® Xeon Phi Coprocessor.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013
Proceedings of the International Conference on Supercomputing, 2013
2012
Proceedings of the SC Conference on High Performance Computing Networking, 2012
Analysis and Optimization of Financial Analytics Benchmark on Modern Multi- and Many-core IA-Based Architectures.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012
2011
High-Performance 3D Compressive Sensing MRI Reconstruction Using Many-Core Architectures.
Int. J. Biomed. Imaging, 2011
Comput. Sci. Res. Dev., 2011
High-performance lattice QCD for multi-core based parallel systems using a cache-friendly hybrid threaded-MPI approach.
Proceedings of the Conference on High Performance Computing Networking, 2011
2010
Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU.
Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010
2009
Mapping High-Fidelity Volume Rendering for Medical Imaging to CPU, GPU and Many-Core Architectures.
IEEE Trans. Vis. Comput. Graph., 2009
2008
Proc. IEEE, 2008
Numerische Mathematik, 2008
Proceedings of the 35th International Symposium on Computer Architecture (ISCA 2008), 2008
2007
Scaling performance of interior-point method on large-scale chip multiprocessor system.
Proceedings of the ACM/IEEE Conference on High Performance Networking and Computing, 2007
2004
Hardware/software mechanisms for increasing resource utilization on VLIW/EPIC processors.
PhD thesis, 2004
Proceedings of the 2nd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2004), 2004
2003
Proceedings of the 1st IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2003), 2003
Proceedings of the 14th IEEE International Conference on Application-Specific Systems, 2003
2001
Proceedings of the Seventh International Symposium on High-Performance Computer Architecture (HPCA'01), 2001
2000
Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques (PACT'00), 2000