Dheevatsa Mudigere

According to our database, Dheevatsa Mudigere authored at least 48 papers between 2010 and 2024.

Bibliography

2024
OPER: Optimality-Guided Embedding Table Parallelization for Large-scale Recommendation Model.
Proceedings of the 2024 USENIX Annual Technical Conference, 2024

Disaggregated Multi-Tower: Topology-aware Modeling Technique for Efficient Large Scale Recommendation.
Proceedings of the Seventh Annual Conference on Machine Learning and Systems, 2024

2023
A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale.
CoRR, 2023

MTrainS: Improving DLRM training efficiency using heterogeneous memories.
CoRR, 2023

TopoOpt: Co-optimizing Network Topology and Parallelization Strategy for Distributed Training Jobs.
Proceedings of the 20th USENIX Symposium on Networked Systems Design and Implementation, 2023

2022
Learning to Collide: Recommendation System Model Compression with Learned Hash Functions.
CoRR, 2022

DHEN: A Deep and Hierarchical Ensemble Network for Large-Scale Click-Through Rate Prediction.
CoRR, 2022

TopoOpt: Optimizing the Network Topology for Distributed DNN Training.
CoRR, 2022

EL-Rec: Efficient Large-Scale Recommendation Model Training via Tensor-Train Embedding Table.
Proceedings of the SC22: International Conference for High Performance Computing, 2022

Unity: Accelerating DNN Training Through Joint Optimization of Algebraic Transformations and Parallelization.
Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation, 2022

Check-N-Run: a Checkpointing System for Training Deep Learning Recommendation Models.
Proceedings of the 19th USENIX Symposium on Networked Systems Design and Implementation, 2022


Supporting Massive DLRM Inference through Software Defined Memory.
Proceedings of the 42nd IEEE International Conference on Distributed Computing Systems, 2022

2021
Differentiable NAS Framework and Application to Ads CTR Prediction.
CoRR, 2021

Supporting Massive DLRM Inference Through Software Defined Memory.
CoRR, 2021

High-performance, Distributed Training of Large-scale Deep Learning Recommendation Models.
CoRR, 2021

Mixed Dimension Embeddings with Application to Memory-Efficient Recommendation Systems.
Proceedings of the IEEE International Symposium on Information Theory, 2021

SEERL: Sample Efficient Ensemble Reinforcement Learning.
Proceedings of the AAMAS '21: 20th International Conference on Autonomous Agents and Multiagent Systems, 2021

2020
Check-N-Run: A Checkpointing System for Training Recommendation Models.
CoRR, 2020

Deep Learning Training in Facebook Data Centers: Design of Scale-up and Scale-out Systems.
CoRR, 2020

Compositional Embeddings Using Complementary Partitions for Memory-Efficient Recommendation Systems.
Proceedings of the KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2020

Building Recommender Systems with PyTorch.
Proceedings of the KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2020

RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing.
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

The Architectural Implications of Facebook's DNN-Based Personalized Recommendation.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020

Scalable Distributed Training of Recommendation Models: An ASTRA-SIM + NS3 case-study with TCP/IP transport.
Proceedings of the IEEE Symposium on High-Performance Interconnects, 2020

Efficient Distributed Hessian Free Algorithm for Large-scale Empirical Risk Minimization via Accumulating Sample Strategy.
Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

ERLP: Ensembles of Reinforcement Learning Policies (Student Abstract).
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
The Architectural Implications of Facebook's DNN-based Personalized Recommendation.
CoRR, 2019

Deep Learning Recommendation Model for Personalization and Recommendation Systems.
CoRR, 2019

A Study of BFLOAT16 for Deep Learning Training.
CoRR, 2019

2018
Hierarchical Block Sparse Neural Networks.
CoRR, 2018

On Scale-out Deep Learning Training for Cloud and HPC.
CoRR, 2018

A Progressive Batching L-BFGS Method for Machine Learning.
Proceedings of the 35th International Conference on Machine Learning, 2018

Mixed Precision Training of Convolutional Neural Networks using Integer Operations.
Proceedings of the 6th International Conference on Learning Representations, 2018

RAIL: Risk-Averse Imitation Learning.
Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, 2018

2017
Ternary Neural Networks with Fine-Grained Quantization.
CoRR, 2017

Mixed Low-precision Deep Learning Inference using Dynamic Fixed Point.
CoRR, 2017

Ternary Residual Networks.
CoRR, 2017

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima.
Proceedings of the 5th International Conference on Learning Representations, 2017

Distributed Hessian-Free Optimization for Deep Neural Network.
Proceedings of the Workshops of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

2016
Large Scale Distributed Hessian-Free Optimization for Deep Neural Network.
CoRR, 2016

Distributed Deep Learning Using Synchronous Stochastic Gradient Descent.
CoRR, 2016

2015
Identification of Helicopter Dynamics based on Flight Data using Nature Inspired Techniques.
Int. J. Appl. Metaheuristic Comput., 2015

High-performance algebraic multigrid solver optimized for multi-core based distributed parallel systems.
Proceedings of the International Conference for High Performance Computing, 2015

Exploring Shared-Memory Optimizations for an Unstructured Mesh CFD Application on Modern Parallel Systems.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

2011
Nature inspired optimization techniques for the design optimization of laminated composite structures using failure criteria.
Expert Syst. Appl., 2011

2010
Fast GPGPU Data Rearrangement Kernels using CUDA.
CoRR, 2010

Fast Histograms using Adaptive CUDA Streams.
CoRR, 2010

