Esha Choukse

ORCID: 0000-0003-0371-5522

According to our database, Esha Choukse authored at least 26 papers between 2016 and 2024.

Timeline

Publications per year, 2016–2024, by type (book, in proceedings, article, PhD thesis, dataset, other).

Bibliography

2024
DroidSpeak: Enhancing Cross-LLM Communication.
CoRR, 2024

Mnemosyne: Parallelization Strategies for Efficiently Serving Multi-Million Context Length LLM Inference Requests Without Approximations.
CoRR, 2024

Intelligent Router for LLM Workloads: Improving Performance Through Workload-Aware Scheduling.
CoRR, 2024

DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency.
CoRR, 2024

Towards Greener LLMs: Bringing Energy-Efficiency to the Forefront of LLM Inference.
CoRR, 2024

Junctiond: Extending FaaS Runtimes with Kernel-Bypass.
CoRR, 2024

Input-Dependent Power Usage in GPUs.
Proceedings of the SC24-W: Workshops of the International Conference for High Performance Computing, 2024

Making Kernel Bypass Practical for the Cloud with Junction.
Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, 2024

Mosaic: Harnessing the Micro-Architectural Resources of Servers in Serverless Environments.
Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024

Memory Allocation Under Hardware Compression.
Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024

Designing Cloud Servers for Lower Carbon.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

SmartOClock: Workload- and Risk-Aware Overclocking in the Cloud.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

Splitwise: Efficient Generative LLM Inference Using Phase Splitting.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

DyLeCT: Achieving Huge-page-like Translation Performance for Hardware-compressed Memory.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

Characterizing Power Management Opportunities for LLMs in the Cloud.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023
POLCA: Power Oversubscription in LLM Cloud Providers.
CoRR, 2023

Towards Improved Power Management in Cloud GPUs.
IEEE Comput. Archit. Lett., 2023

Myths and Misconceptions Around Reducing Carbon Embedded in Cloud Platforms.
Proceedings of the 2nd Workshop on Sustainable Computer Systems, 2023

2022
Overclocking in Immersion-Cooled Datacenters.
IEEE Micro, 2022

Translation-optimized Memory Compression for Capacity.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

2020
Buddy Compression: Enabling Larger Memory for Deep Learning and HPC Workloads on GPUs.
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

2019
PruneTrain: Gradual Structured Pruning from Scratch for Faster Neural Network Training.
CoRR, 2019

PruneTrain: fast neural network training by dynamic sparse model reconfiguration.
Proceedings of the International Conference for High Performance Computing, 2019

2018
CompressPoints: An Evaluation Methodology for Compressed Memory Systems.
IEEE Comput. Archit. Lett., 2018

Compresso: Pragmatic Main Memory Compression.
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

2016
Bit-Plane Compression: Transforming Data for Better Compression in Many-Core Architectures.
Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

