Alexey Tumanov

According to our database, Alexey Tumanov authored at least 54 papers between 2007 and 2024.

Bibliography

2024
Mnemosyne: Parallelization Strategies for Efficiently Serving Multi-Million Context Length LLM Inference Requests Without Approximations.
CoRR, 2024

Metron: Holistic Performance Evaluation Framework for LLM Inference Systems.
CoRR, 2024

DεS: Delayed ε-Shrinking for Faster Once-For-All Training.
CoRR, 2024

Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve.
Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation, 2024

VIDUR: A Large-Scale Simulation Framework for LLM Inference.
Proceedings of the Seventh Annual Conference on Machine Learning and Systems, 2024

Harmonica: Hybrid Accelerator to Overcome Imperfections of Mixed-signal DNN Accelerators.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

SuperFedNAS: Cost-Efficient Federated Neural Architecture Search for On-device Inference.
Proceedings of the Computer Vision - ECCV 2024, 2024

Inshrinkerator: Compressing Deep Learning Training Checkpoints via Dynamic Quantization.
Proceedings of the 2024 ACM Symposium on Cloud Computing, 2024

2023
Hardware-Software Co-Design for Real-Time Latency-Accuracy Navigation in Tiny Machine Learning Applications.
IEEE Micro, 2023

SuperServe: Fine-Grained Inference Serving for Unpredictable Workloads.
CoRR, 2023

Signed Binarization: Unlocking Efficiency Through Repetition-Sparsity Trade-Off.
CoRR, 2023

ABKD: Graph Neural Network Compression with Attention-Based Knowledge Distillation.
CoRR, 2023

Pareto-Secure Machine Learning (PSML): Fingerprinting and Securing Inference Serving Systems.
CoRR, 2023

DynaQuant: Compressing Deep Learning Training Checkpoints via Dynamic Quantization.
CoRR, 2023

SuperFed: Weight Shared Federated Learning.
CoRR, 2023

Subgraph Stationary Hardware-Software Inference Co-Design.
Proceedings of the Sixth Conference on Machine Learning and Systems, 2023

TransEHR: Self-Supervised Transformer for Clinical Time Series Data.
Proceedings of the Machine Learning for Health, 2023

2022
UnfoldML: Cost-Aware and Uncertainty-Based Dynamic 2D Prediction for Multi-Stage Classification.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

CoDG-ReRAM: An Algorithm-Hardware Co-design to Accelerate Semi-Structured GNNs on ReRAM.
Proceedings of the IEEE 40th International Conference on Computer Design, 2022

Automatic Parallelization of Python Programs for Distributed Heterogeneous Computing.
Proceedings of the Euro-Par 2022: Parallel Processing, 2022

ESCHER: expressive scheduling with ephemeral resources.
Proceedings of the 13th Symposium on Cloud Computing, SoCC 2022, 2022

2021
CompOFA - Compound Once-For-All Networks for Faster Multi-Platform Deployment.
Proceedings of the 9th International Conference on Learning Representations, 2021

RubberBand: cloud-based hyperparameter tuning.
Proceedings of the EuroSys '21: Sixteenth European Conference on Computer Systems, 2021

2020
Cloudburst: Stateful Functions-as-a-Service.
Proc. VLDB Endow., 2020

Cloudburst: Stateful Functions-as-a-Service.
CoRR, 2020

HOLMES: Health OnLine Model Ensemble Serving for Deep Learning Models in Intensive Care Units.
Proceedings of the KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2020

InferLine: latency-aware provisioning and scaling for prediction serving pipelines.
Proceedings of the SoCC '20: ACM Symposium on Cloud Computing, 2020

2019
The OoO VLIW JIT Compiler for GPU Inference.
CoRR, 2019

Dynamic Space-Time Scheduling for GPU Inference.
CoRR, 2019

Lineage stash: fault tolerance off the critical path.
Proceedings of the 27th ACM Symposium on Operating Systems Principles, 2019

HyperSched: Dynamic Resource Reallocation for Model Development on a Deadline.
Proceedings of the ACM Symposium on Cloud Computing, SoCC 2019, 2019

Cirrus: a Serverless Framework for End-to-end ML Workflows.
Proceedings of the ACM Symposium on Cloud Computing, SoCC 2019, 2019

Serverless Computing: One Step Forward, Two Steps Back.
Proceedings of the 9th Biennial Conference on Innovative Data Systems Research, 2019

2018
InferLine: ML Inference Pipeline Composition Framework.
CoRR, 2018

Tributary: spot-dancing for elastic services with latency SLOs.
Proceedings of the 2018 USENIX Annual Technical Conference, 2018

IDK Cascades: Fast Deep Learning by Learning not to Overthink.
Proceedings of the Thirty-Fourth Conference on Uncertainty in Artificial Intelligence, 2018

Ray: A Distributed Framework for Emerging AI Applications.
Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation, 2018

3Sigma: distribution-based cluster scheduling for runtime uncertainty.
Proceedings of the Thirteenth EuroSys Conference, 2018

2017
Ray: A Distributed Framework for Emerging AI Applications.
CoRR, 2017

IDK Cascades: Fast Deep Learning by Learning not to Overthink.
CoRR, 2017

Real-Time Machine Learning: The Missing Pieces.
CoRR, 2017

Real-Time Machine Learning: The Missing Pieces.
Proceedings of the 16th Workshop on Hot Topics in Operating Systems, 2017

Proteus: agile ML elasticity through tiered reliability in dynamic resource markets.
Proceedings of the Twelfth European Conference on Computer Systems, 2017

2016
Scheduling with Space-Time Soft Constraints In Heterogeneous Cloud Datacenters.
PhD thesis, 2016

Morpheus: Towards Automated SLOs for Enterprise Clusters.
Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation, 2016

TetriSched: global rescheduling with adaptive plan-ahead in dynamic heterogeneous clusters.
Proceedings of the Eleventh European Conference on Computer Systems, 2016

2014
Agility and Performance in Elastic Distributed Storage.
ACM Trans. Storage, 2014

SpringFS: bridging agility and performance in elastic distributed storage.
Proceedings of the 12th USENIX conference on File and Storage Technologies, 2014

PriorityMeister: Tail Latency QoS for Shared Networked Storage.
Proceedings of the ACM Symposium on Cloud Computing, 2014

Exploiting iterative-ness for parallel ML computations.
Proceedings of the ACM Symposium on Cloud Computing, 2014

2012
alsched: algebraic scheduling of mixed workloads in heterogeneous clouds.
Proceedings of the ACM Symposium on Cloud Computing, SOCC '12, 2012

Heterogeneity and dynamicity of clouds at scale: Google trace analysis.
Proceedings of the ACM Symposium on Cloud Computing, SOCC '12, 2012

2011
Kaleidoscope: cloud micro-elasticity via VM state coloring.
Proceedings of the European Conference on Computer Systems, 2011

2007
Variability-Aware Latency Amelioration in Distributed Environments.
Proceedings of the IEEE Virtual Reality Conference, 2007

