Arun Kejariwal

Orcid: 0009-0006-6172-2973

According to our database, Arun Kejariwal authored at least 66 papers between 2003 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Layer Compression of Deep Networks with Straight Flows.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
HHVM Performance Optimization for Large Scale Web Services.
Proceedings of the 2023 ACM/SPEC International Conference on Performance Engineering, 2023

Pre-train and Search: Efficient Embedding Table Sharding with Pre-trained Neural Cost Models.
Proceedings of the Sixth Conference on Machine Learning and Systems, 2023

Characterization of Data Compression in Datacenters.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2023

2022
Future gradient descent for adapting the temporal shifting data distribution in online recommendation systems.
Proceedings of the Uncertainty in Artificial Intelligence, 2022

DreamShard: Generalizable Embedding Table Placement for Recommender Systems.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Harmless Transfer Learning for Item Embeddings.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2022, 2022

AutoShard: Automated Embedding Table Sharding for Recommender Systems.
Proceedings of the KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 14, 2022

Understanding Data Compression in Warehouse-Scale Datacenter Services.
Proceedings of the International IEEE Symposium on Performance Analysis of Systems and Software, 2022

Building a Performance Model for Deep Learning Recommendation Model Training on GPUs.
Proceedings of the 29th IEEE International Conference on High Performance Computing, 2022

2021
Training Recommender Systems at Scale: Communication-Efficient Model and Data Parallelism.
Proceedings of the KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2021

Alternate Model Growth and Pruning for Efficient Training of Recommendation Systems.
Proceedings of the 20th IEEE International Conference on Machine Learning and Applications, 2021

2020
Fast Distributed Training of Deep Neural Networks: Dynamic Communication Thresholding for Model and Data Parallelism.
CoRR, 2020

Adaptive Dense-to-Sparse Paradigm for Pruning Online Recommendation System with Non-Stationary Data.
CoRR, 2020

Le Taureau: Deconstructing the Serverless Landscape & A Look Forward.
Proceedings of the 2020 International Conference on Management of Data, 2020

2017
On the Runtime-Efficacy Trade-off of Anomaly Detection Techniques for Real-Time Streaming Data.
CoRR, 2017

Automatic Anomaly Detection in the Cloud Via Statistical Learning.
CoRR, 2017

2016
On the Definition of Real-Time: Applications and Systems.
Proceedings of the 2016 IEEE Trustcom/BigDataSE/ISPA, 2016

Leveraging cloud data to mitigate user experience from 'breaking bad'.
Proceedings of the 2016 IEEE International Conference on Big Data (IEEE BigData 2016), 2016

2015
Real Time Analytics: Algorithms and Systems.
Proc. VLDB Endow., 2015

2014
A Novel Technique for Long-Term Anomaly Detection in the Cloud.
Proceedings of the 6th USENIX Workshop on Hot Topics in Cloud Computing, 2014

2013
Chiffchaff: Observability and analytics to achieve high availability.
Proceedings of the IEEE Symposium on Large-Scale Data Analysis and Visualization, 2013

Techniques for Optimizing Cloud Footprint.
Proceedings of the 2013 IEEE International Conference on Cloud Engineering, 2013

A Tool for Practical Garbage Collection Analysis in the Cloud.
Proceedings of the 2013 IEEE International Conference on Cloud Engineering, 2013

Visual Analytics Framework for Cloud Infrastructure Data.
Proceedings of the 16th IEEE International Conference on Computational Science and Engineering, 2013

On the Determination of Inlining Vectors for Program Optimization.
Proceedings of the Compiler Construction - 22nd International Conference, 2013

2012
Trin-Trin: Who's Calling? A Pin-Based Dynamic Call Graph Extraction Framework.
Int. J. Parallel Program., 2012

Big Data Challenges: A Program Optimization Perspective.
Proceedings of the 2012 Second International Conference on Cloud and Green Computing, 2012

Selective search of inlining vectors for program optimization.
Proceedings of the Computing Frontiers Conference, CF'12, 2012

2011
Modulo Scheduling and Loop Pipelining.
Proceedings of the Encyclopedia of Parallel Computing, 2011

Pruning hardware evaluation space via correlation-driven application similarity analysis.
Proceedings of the 8th Conference on Computing Frontiers, 2011

2010
On the efficacy of call graph-level thread-level speculation.
Proceedings of the first joint WOSP/SIPEW International Conference on Performance Engineering, 2010

How Many Threads to Spawn during Program Multithreading?
Proceedings of the Languages and Compilers for Parallel Computing, 2010

Exploitation of nested thread-level speculative parallelism on multi-core systems.
Proceedings of the 7th Conference on Computing Frontiers, 2010

2009
On the exploitation of loop-level parallelism in embedded applications.
ACM Trans. Embed. Comput. Syst., 2009

Cache-aware partitioning of multi-dimensional iteration spaces.
Proceedings of SYSTOR 2009: The Israeli Experimental Systems Conference 2009, 2009

Performance Characterization of Itanium® 2-Based Montecito Processor.
Proceedings of the Computer Performance Evaluation and Benchmarking, 2009

Techniques for efficient placement of synchronization primitives.
Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009

Parallelization spectroscopy: analysis of thread-level parallelism in hpc programs.
Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009

Synchronization optimizations for efficient execution on multi-cores.
Proceedings of the 23rd international conference on Supercomputing, 2009

Efficient Scheduling of Nested Parallel Loops on Multi-Core Systems.
Proceedings of the ICPP 2009, 2009

2008
Improving SDRAM access energy efficiency for low-power embedded systems.
ACM Trans. Embed. Comput. Syst., 2008

Comparative architectural characterization of SPEC CPU2000 and CPU2006 benchmarks on the Intel® Core™ 2 Duo processor.
Proceedings of the 2008 International Conference on Embedded Computer Systems: Architectures, 2008

Cache-aware iteration space partitioning.
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008

Compiler-Driven Dependence Profiling to Guide Program Parallelization.
Proceedings of the Languages and Compilers for Parallel Computing, 2008

2007
A predictive decode filter cache for reducing power consumption in embedded processors.
ACM Trans. Design Autom. Electr. Syst., 2007

Comparative characterization of SPEC CPU2000 and CPU2006 on Itanium architecture.
Proceedings of the 2007 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, 2007

Tight analysis of the performance potential of thread speculation using spec CPU 2006.
Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2007

2006
Energy efficient watermarking on mobile devices using proxy-based partitioning.
IEEE Trans. Very Large Scale Integr. Syst., 2006

A general approach for partitioning N-dimensional parallel nested loops with conditionals.
Proceedings of the SPAA 2006: Proceedings of the 18th Annual ACM Symposium on Parallelism in Algorithms and Architectures, Cambridge, Massachusetts, USA, July 30, 2006

Rapid Resource-Constrained Hardware Performance Estimation.
Proceedings of the 17th IEEE International Workshop on Rapid System Prototyping (RSP 2006), 2006

On the performance potential of different types of speculative thread-level parallelism: The DL version of this paper includes corrections that were not made available in the printed proceedings.
Proceedings of the 20th Annual International Conference on Supercomputing, 2006

Lightweight lock-free synchronization methods for multithreading.
Proceedings of the 20th Annual International Conference on Supercomputing, 2006

History-aware Self-Scheduling.
Proceedings of the 2006 International Conference on Parallel Processing (ICPP 2006), 2006

Probablistic Self-Scheduling.
Proceedings of the Euro-Par 2006, Parallel Processing, 12th International Euro-Par Conference, Dresden, Germany, August 28, 2006

Challenges in exploitation of loop parallelism in embedded applications.
Proceedings of the 4th International Conference on Hardware/Software Codesign and System Synthesis, 2006

2005
A novel approach for partitioning iteration spaces with variable densities.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2005

An Efficient Approach for Self-scheduling Parallel Loops on Multiprogrammed Parallel Computers.
Proceedings of the Languages and Compilers for Parallel Computing, 2005

An Efficient Load Balancing Scheme for Grid-based High Performance Scientific Computing.
Proceedings of the 4th International Symposium on Parallel and Distributed Computing (ISPDC 2005), 2005

Enhanced Loop Coalescing: A Compiler Technique for Transforming Non-uniform Iteration Spaces.
Proceedings of the High-Performance Computing - 6th International Symposium, 2005

Energy Analysis of Multimedia Watermarking on Mobile Handheld Devices.
Proceedings of the 2005 3rd Workshop on Embedded Systems for Real-Time Multimedia, 2005

High performance annotation-aware JVM for Java cards.
Proceedings of the EMSOFT 2005, 2005

2004
Synthesis-driven Exploration of Pipelined Embedded Processors.
Proceedings of the 17th International Conference on VLSI Design (VLSI Design 2004), 2004

A Geometric Approach for Partitioning N-Dimensional Non-rectangular Iteration Spaces.
Proceedings of the Languages and Compilers for High Performance Computing, 2004

Proxy-based task partitioning of watermarking algorithms for reducing energy consumption in mobile devices.
Proceedings of the 41st Design Automation Conference, 2004

2003
Rapid Exploration of Pipelined Processors through Automatic Generation of Synthesizable RTL Models.
Proceedings of the 14th IEEE International Workshop on Rapid System Prototyping (RSP 2003), 2003

