Mithuna Thottethodi

CoRR, 2024

QED: Scalable Verification of Hardware Memory Consistency.

CoRR, 2024

NetSmith: An Optimization Framework for Machine-Discovered Network Topologies.

Conor James Green

Proceedings of the 53rd International Conference on Parallel Processing, 2024

2023

Occam: Optimal Data Reuse for Convolutional Neural Networks.

ACM Trans. Archit. Code Optim., March, 2023

SafeBet: Secure, Simple, and Fast Speculative Execution.

CoRR, 2023

Eureka: Efficient Tensor Cores for One-sided Unstructured Sparsity in DNN Inference.

Ashish Gondimalla

Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

2022

Booster: An Accelerator for Gradient Boosting Decision Trees Training and Inference.

Mingxuan He

Shankaranarayanan Puzhavakath Narayanan

Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

2021

Karma: Cost-Effective Geo-Replicated Cloud Storage with Dynamic Enforcement of Causal Consistency.

Tariq Mahmood

Sanjay G. Rao

IEEE Trans. Cloud Comput., 2021

Barrier-Free Large-Scale Sparse Tensor Accelerator (BARISTA) For Convolutional Neural Networks.

Ashish Gondimalla

Sree Charan Gundabolu

CoRR, 2021

FastZ: accelerating gapped whole genome alignment on GPUs.

Sree Charan Gundabolu

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2021

2020

Dart: Divide and Specialize for Fast Response to Congestion in RDMA-Based Datacenter Networks.

Muhammad Usama Chaudhry

Balajee Vamanan

IEEE/ACM Trans. Netw., 2020

Network Interface Architecture for Remote Indirect Memory Access (RIMA) in Datacenters.

ACM Trans. Archit. Code Optim., 2020

Booster: An Accelerator for Gradient Boosting Decision Trees.

Mingxuan He

CoRR, 2020

Newton: A DRAM-maker's Accelerator-in-Memory (AiM) Architecture for Machine Learning.

Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

Secure automatic bounds checking: prevention is simpler than cure.

Ejebagom John Ojogbo

Proceedings of the CGO '20: 18th ACM/IEEE International Symposium on Code Generation and Optimization, 2020

2019

SparTen: A Sparse Tensor Accelerator for Convolutional Neural Networks.

Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

2018

Dart: Divide and Specialize for Fast Response to Congestion in RDMA-based Datacenter Networks.

Jiachen Xue

Muhammad Usama Chaudhry

Balajee Vamanan

CoRR, 2018

Fast Congestion Control in RDMA-based Datacenter Networks.

Jiachen Xue

Muhammad Usama Chaudhry

Balajee Vamanan

Proceedings of the ACM SIGCOMM 2018 Conference on Posters and Demos, 2018

Millipede: Die-Stacked Memory Optimizations for Big Data Machine Learning Analytics.

Nitin

Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

ACCORD: Automated Change Coordination across Independently Administered Cloud Services.

Tariq Mahmood

Bharath Balasubramanian

Shankaranarayanan Puzhavakath Narayanan

Sanjay G. Rao

Kaustubh Joshi

Proceedings of the 11th IEEE International Conference on Cloud Computing, 2018

2017

NutShell: Scalable Whittled Proxy Execution for Low-Latency Web over Cellular Networks.

Ashiwan Sivakumar

Chuan Jiang

Yun Seong Nam

Proceedings of the 23rd Annual International Conference on Mobile Computing and Networking, 2017

Efficient Collaborative Approximation in MapReduce without Missing Rare Keys.

Proceedings of the 2017 International Conference on Cloud and Autonomic Computing, 2017

2016

Enabling Efficient Dynamic Resizing of Large DRAM Caches via A Hardware Consistent Hashing Mechanism.

CoRR, 2016

Extended task queuing: active messages for heterogeneous systems.

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2016

Scalable, Global, Optimal-bandwidth, Application-Specific Routing.

Ahmed H. Abdel-Gawad

Proceedings of the 24th IEEE Annual Symposium on High-Performance Interconnects, 2016

2014

Top Picks from the 2013 Computer Architecture Conferences.

Shubu Mukherjee

IEEE Micro, 2014

RAHTM: Routing Algorithm Aware Hierarchical Task Mapping.

Ahmed H. Abdel-Gawad

Abhinav Bhatele

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2014

MorphStore: A local file system for Big Data with utility-driven replication and load-adaptive access scheduling.

Proceedings of the IEEE 30th Symposium on Mass Storage Systems and Technologies, 2014

2013

MapReduce with communication overlap (MaRCO).

J. Parallel Distributed Comput., 2013

PreTrans: Reducing TLB CAM-search via page number prediction and speculative pre-translation.

Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 2013

Understanding and mitigating the impact of load imbalance in the memory caching tier.

Yu-Ju Hong

Proceedings of the ACM Symposium on Cloud Computing, SOCC '13, 2013

2012

A Mostly-Clean DRAM Cache for Effective Hit Speculation and Self-Balancing Dispatch.

Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012

Selective commitment and selective margin: Techniques to minimize cost in an IaaS cloud.

Yu-Ju Hong

Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software, 2012

2011

Dynamic server provisioning to minimize cost in an IaaS cloud.

Yu-Ju Hong

Proceedings of SIGMETRICS 2011, 2011

TransCom: transforming stream communication for load balance and efficiency in networks-on-chip.

Ahmed H. Abdel-Gawad

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011

2010

Trifecta: A Nonspeculative Scheme to Exploit Common, Data-Dependent Subcritical Paths.

IEEE Trans. Very Large Scale Integr. Syst., 2010

Adaptive Flow Control for Robust Performance and Energy.

Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

SieveStore: a highly-selective, ensemble-level disk cache for cost-performance.

Timothy Pritchett

Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010

LiteTM: Reducing transactional state overhead.

Syed Ali Raza Jafri

Proceedings of the 16th International Conference on High-Performance Computer Architecture (HPCA-16 2010), 2010

2009

Undergraduate dual-core prototyping and analysis of factors influencing student success on dual-core designs.

Proceedings of the IEEE International Conference on Microelectronic Systems Education, 2009

Disjoint-path routing: Efficient communication for streaming applications.

Daeho Seo

Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

2008

Automatic volume management for programmable microfluidics.

Proceedings of the ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation, 2008

Power-efficient clustering via incomplete bypassing.

Eric P. Villasenor

Daeho Seo

Proceedings of the 2008 International Symposium on Low Power Electronics and Design, 2008

2007

Aquacore: a programmable architecture for microfluidics.

Proceedings of the 34th International Symposium on Computer Architecture (ISCA 2007), 2007

Table-lookup based Crossbar Arbitration for Minimal-Routed, 2D Mesh and Torus Networks.

Daeho Seo

Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Evaluating ISA Support and Hardware Support for Recursive Data Layouts.

Won-Taek Lim

Proceedings of the International Conference on High Performance Computing (HiPC 2007), 2007

Effective Management of DRAM Bandwidth in Multicore Processors.

Nauman Rafique

Won-Taek Lim

Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007

2006

Architectural support for operating system-driven CMP cache management.

Nauman Rafique

Won-Taek Lim

Proceedings of the 15th International Conference on Parallel Architectures and Compilation Techniques (PACT 2006), 2006

2005

Near-Optimal Worst-Case Throughput Routing for Two-Dimensional Mesh Networks.

Proceedings of the 32nd International Symposium on Computer Architecture (ISCA 2005), 2005

2004

Exploiting Global Knowledge to Achieve Self-Tuned Congestion Control for k-Ary n-Cube Networks.

Shubhendu S. Mukherjee

IEEE Trans. Parallel Distributed Syst., 2004

2003

BLAM: A High-Performance Routing Algorithm for Virtual Cut-Through Networks.

Shubhendu S. Mukherjee

Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

2002

Recursive Array Layouts and Fast Matrix Multiplication.

Praveen K. Patnala

IEEE Trans. Parallel Distributed Syst., 2002

2001

Self-Tuned Congestion Control for Multiprocessor Networks.

Shubhendu S. Mukherjee

Proceedings of the Seventh International Symposium on High-Performance Computer Architecture (HPCA'01), 2001

1999

Recursive Array Layouts and Fast Parallel Matrix Multiplication.

Praveen K. Patnala

Proceedings of the Eleventh Annual ACM Symposium on Parallel Algorithms and Architectures, 1999

Nonlinear array layouts for hierarchical memory systems.

Proceedings of the 13th International Conference on Supercomputing, 1999

Annotated Memory References: A Mechanism for Informed Cache Management.

Proceedings of the Euro-Par '99 Parallel Processing, 5th International Euro-Par Conference, Toulouse, France, August 31, 1999

1998

Tuning Strassen's Matrix Multiplication for Memory Efficiency.
