Mithuna Thottethodi

ORCID: 0000-0003-4164-4542

Affiliations:
  • Purdue University, West Lafayette, USA


According to our database, Mithuna Thottethodi authored at least 57 papers between 1998 and 2024.

Bibliography

2024
Efficient Sparse Processing-in-Memory Architecture (ESPIM) for Machine Learning Inference.
CoRR, 2024

QED: Scalable Verification of Hardware Memory Consistency.
CoRR, 2024

NetSmith: An Optimization Framework for Machine-Discovered Network Topologies.
Proceedings of the 53rd International Conference on Parallel Processing, 2024

2023
Occam: Optimal Data Reuse for Convolutional Neural Networks.
ACM Trans. Archit. Code Optim., March, 2023

SafeBet: Secure, Simple, and Fast Speculative Execution.
CoRR, 2023

Eureka: Efficient Tensor Cores for One-sided Unstructured Sparsity in DNN Inference.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

2022
Booster: An Accelerator for Gradient Boosting Decision Trees Training and Inference.
Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

2021
Karma: Cost-Effective Geo-Replicated Cloud Storage with Dynamic Enforcement of Causal Consistency.
IEEE Trans. Cloud Comput., 2021

Barrier-Free Large-Scale Sparse Tensor Accelerator (BARISTA) For Convolutional Neural Networks.
CoRR, 2021

FastZ: accelerating gapped whole genome alignment on GPUs.
Proceedings of the International Conference for High Performance Computing, 2021

2020
Dart: Divide and Specialize for Fast Response to Congestion in RDMA-Based Datacenter Networks.
IEEE/ACM Trans. Netw., 2020

Network Interface Architecture for Remote Indirect Memory Access (RIMA) in Datacenters.
ACM Trans. Archit. Code Optim., 2020

Booster: An Accelerator for Gradient Boosting Decision Trees.
CoRR, 2020

Newton: A DRAM-maker's Accelerator-in-Memory (AiM) Architecture for Machine Learning.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

Secure automatic bounds checking: prevention is simpler than cure.
Proceedings of the CGO '20: 18th ACM/IEEE International Symposium on Code Generation and Optimization, 2020

2019
SparTen: A Sparse Tensor Accelerator for Convolutional Neural Networks.
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

2018
Dart: Divide and Specialize for Fast Response to Congestion in RDMA-based Datacenter Networks.
CoRR, 2018

Fast Congestion Control in RDMA-based Datacenter Networks.
Proceedings of the ACM SIGCOMM 2018 Conference on Posters and Demos, 2018

Millipede: Die-Stacked Memory Optimizations for Big Data Machine Learning Analytics.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

ACCORD: Automated Change Coordination across Independently Administered Cloud Services.
Proceedings of the 11th IEEE International Conference on Cloud Computing, 2018

2017
NutShell: Scalable Whittled Proxy Execution for Low-Latency Web over Cellular Networks.
Proceedings of the 23rd Annual International Conference on Mobile Computing and Networking, 2017

Efficient Collaborative Approximation in MapReduce without Missing Rare Keys.
Proceedings of the 2017 International Conference on Cloud and Autonomic Computing, 2017

2016
Enabling Efficient Dynamic Resizing of Large DRAM Caches via A Hardware Consistent Hashing Mechanism.
CoRR, 2016

Extended task queuing: active messages for heterogeneous systems.
Proceedings of the International Conference for High Performance Computing, 2016

Scalable, Global, Optimal-bandwidth, Application-Specific Routing.
Proceedings of the 24th IEEE Annual Symposium on High-Performance Interconnects, 2016

2014
Top Picks from the 2013 Computer Architecture Conferences.
IEEE Micro, 2014

RAHTM: Routing Algorithm Aware Hierarchical Task Mapping.
Proceedings of the International Conference for High Performance Computing, 2014

MorphStore: A local file system for Big Data with utility-driven replication and load-adaptive access scheduling.
Proceedings of the IEEE 30th Symposium on Mass Storage Systems and Technologies, 2014

2013
MapReduce with communication overlap (MaRCO).
J. Parallel Distributed Comput., 2013

PreTrans: Reducing TLB CAM-search via page number prediction and speculative pre-translation.
Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 2013

Understanding and mitigating the impact of load imbalance in the memory caching tier.
Proceedings of the ACM Symposium on Cloud Computing, SOCC '13, 2013

2012
A Mostly-Clean DRAM Cache for Effective Hit Speculation and Self-Balancing Dispatch.
Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012

Selective commitment and selective margin: Techniques to minimize cost in an IaaS cloud.
Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software, 2012

2011
Dynamic server provisioning to minimize cost in an IaaS cloud.
Proceedings of the SIGMETRICS 2011, 2011

TransCom: transforming stream communication for load balance and efficiency in networks-on-chip.
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011

2010
Trifecta: A Nonspeculative Scheme to Exploit Common, Data-Dependent Subcritical Paths.
IEEE Trans. Very Large Scale Integr. Syst., 2010

Adaptive Flow Control for Robust Performance and Energy.
Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

SieveStore: a highly-selective, ensemble-level disk cache for cost-performance.
Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010

LiteTM: Reducing transactional state overhead.
Proceedings of the 16th International Conference on High-Performance Computer Architecture (HPCA-16 2010), 2010

2009
Undergraduate dual-core prototyping and analysis of factors influencing student success on dual-core designs.
Proceedings of the IEEE International Conference on Microelectronic Systems Education, 2009

Disjoint-path routing: Efficient communication for streaming applications.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

2008
Automatic volume management for programmable microfluidics.
Proceedings of the ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation, 2008

Power-efficient clustering via incomplete bypassing.
Proceedings of the 2008 International Symposium on Low Power Electronics and Design, 2008

2007
Aquacore: a programmable architecture for microfluidics.
Proceedings of the 34th International Symposium on Computer Architecture (ISCA 2007), 2007

Table-lookup based Crossbar Arbitration for Minimal-Routed, 2D Mesh and Torus Networks.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Evaluating ISA Support and Hardware Support for Recursive Data Layouts.
Proceedings of the High Performance Computing, 2007

Effective Management of DRAM Bandwidth in Multicore Processors.
Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007

2006
Architectural support for operating system-driven CMP cache management.
Proceedings of the 15th International Conference on Parallel Architectures and Compilation Techniques (PACT 2006), 2006

2005
Near-Optimal Worst-Case Throughput Routing for Two-Dimensional Mesh Networks.
Proceedings of the 32nd International Symposium on Computer Architecture (ISCA 2005), 2005

2004
Exploiting Global Knowledge to Achieve Self-Tuned Congestion Control for k-Ary n-Cube Networks.
IEEE Trans. Parallel Distributed Syst., 2004

2003
BLAM: A High-Performance Routing Algorithm for Virtual Cut-Through Networks.
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

2002
Recursive Array Layouts and Fast Matrix Multiplication.
IEEE Trans. Parallel Distributed Syst., 2002

2001
Self-Tuned Congestion Control for Multiprocessor Networks.
Proceedings of the Seventh International Symposium on High-Performance Computer Architecture (HPCA'01), 2001

1999
Recursive Array Layouts and Fast Parallel Matrix Multiplication.
Proceedings of the Eleventh Annual ACM Symposium on Parallel Algorithms and Architectures, 1999

Nonlinear array layouts for hierarchical memory systems.
Proceedings of the 13th international conference on Supercomputing, 1999

Annotated Memory References: A Mechanism for Informed Cache Management.
Proceedings of the Euro-Par '99 Parallel Processing, 5th International Euro-Par Conference, Toulouse, France, August 31, 1999

1998
Tuning Strassen's Matrix Multiplication for Memory Efficiency.
Proceedings of the ACM/IEEE Conference on Supercomputing, 1998

