T. N. Vijaykumar

2024

QED: Scalable Verification of Hardware Memory Consistency.

CoRR, 2024

2023

Occam: Optimal Data Reuse for Convolutional Neural Networks.

ACM Trans. Archit. Code Optim., March, 2023

SafeBet: Secure, Simple, and Fast Speculative Execution.

CoRR, 2023

Eureka: Efficient Tensor Cores for One-sided Unstructured Sparsity in DNN Inference.

Ashish Gondimalla

Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

2022

Booster: An Accelerator for Gradient Boosting Decision Trees Training and Inference.

Mingxuan He

Shankaranarayanan Puzhavakath Narayanan

Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

2021

Karma: Cost-Effective Geo-Replicated Cloud Storage with Dynamic Enforcement of Causal Consistency.

Tariq Mahmood

Sanjay G. Rao

IEEE Trans. Cloud Comput., 2021

Barrier-Free Large-Scale Sparse Tensor Accelerator (BARISTA) For Convolutional Neural Networks.

Ashish Gondimalla

Sree Charan Gundabolu

CoRR, 2021

FastZ: accelerating gapped whole genome alignment on GPUs.

Sree Charan Gundabolu

Proceedings of the International Conference for High Performance Computing, 2021

2020

Dart: Divide and Specialize for Fast Response to Congestion in RDMA-Based Datacenter Networks.

Jiachen Xue

Muhammad Usama Chaudhry

IEEE/ACM Trans. Netw., 2020

Network Interface Architecture for Remote Indirect Memory Access (RIMA) in Datacenters.

Jiachen Xue

ACM Trans. Archit. Code Optim., 2020

Booster: An Accelerator for Gradient Boosting Decision Trees.

Mingxuan He

CoRR, 2020

Newton: A DRAM-maker's Accelerator-in-Memory (AiM) Architecture for Machine Learning.

Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

Secure automatic bounds checking: prevention is simpler than cure.

Ejebagom John Ojogbo

Proceedings of the CGO '20: 18th ACM/IEEE International Symposium on Code Generation and Optimization, 2020

2019

SparTen: A Sparse Tensor Accelerator for Convolutional Neural Networks.

Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

2018

Dart: Divide and Specialize for Fast Response to Congestion in RDMA-based Datacenter Networks.

Jiachen Xue

Muhammad Usama Chaudhry

CoRR, 2018

Fast Congestion Control in RDMA-based Datacenter Networks.

Jiachen Xue

Muhammad Usama Chaudhry

Proceedings of the ACM SIGCOMM 2018 Conference on Posters and Demos, 2018

Millipede: Die-Stacked Memory Optimizations for Big Data Machine Learning Analytics.

Nitin

Shankaranarayanan Puzhavakath Narayanan

Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

2017

NutShell: Scalable Whittled Proxy Execution for Low-Latency Web over Cellular Networks.

Ashiwan Sivakumar

Chuan Jiang

Yun Seong Nam

Proceedings of the 23rd Annual International Conference on Mobile Computing and Networking, 2017

Efficient Collaborative Approximation in MapReduce without Missing Rare Keys.

Proceedings of the 2017 International Conference on Cloud and Autonomic Computing, 2017

Exploring Functional Slicing in the Design of Distributed SDN Controllers.

Proceedings of the Communication Systems and Networks - 9th International Conference, 2017

Hydra: Leveraging functional slicing for efficient distributed SDN controllers.

Proceedings of the 9th International Conference on Communication Systems and Networks, 2017

2015

TimeTrader: Exploiting Latency Tail to Save Datacenter Energy for On-line Data-Intensive Applications.

CoRR, 2015

MigrantStore: Leveraging Virtual Memory in DRAM-PCM Memory Architecture.

Hamza Bin Sohail

CoRR, 2015

TimeTrader: exploiting latency tail to save datacenter energy for online search.

Proceedings of the 48th International Symposium on Microarchitecture, 2015

FaultHound: value-locality-based soft-fault tolerance.

Nitin

Irith Pomeranz

Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

2014

ShuffleWatcher: Shuffle-aware Scheduling in Multi-tenant MapReduce Clusters.

Proceedings of the 2014 USENIX Annual Technical Conference, 2014

Fractal++: Closing the performance gap between fractal and conventional coherence.

Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

High-performance fractal coherence.

Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2014

2013

MapReduce with communication overlap (MaRCO).

J. Parallel Distributed Comput., 2013

Wait-n-GoTM: improving HTM performance by serializing cyclic dependencies.

Syed Ali Raza Jafri

Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2013

2012

Top Picks from the 2011 Computer Architecture Conferences.

Paolo Faraboschi

IEEE Micro, 2012

Deadline-aware datacenter TCP (D2TCP).

Jahangir Hasan

Proceedings of the ACM SIGCOMM 2012 Conference, 2012

Tarazu: optimizing MapReduce on heterogeneous clusters.

Proceedings of the 17th International Conference on Architectural Support for Programming Languages and Operating Systems, 2012

2011

TreeCAM: decoupling updates and lookups in packet classification.

Proceedings of the 2011 Conference on Emerging Networking Experiments and Technologies, 2011

2010

EffiCuts: optimizing packet classification for memory and throughput.

Proceedings of the ACM SIGCOMM 2010 Conference on Applications, 2010

Adaptive Flow Control for Robust Performance and Energy.

Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

Timetraveler: exploiting acyclic races for optimizing memory race recording.

Faraz Ahmad

Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010

LiteTM: Reducing transactional state overhead.

Syed Ali Raza Jafri

Proceedings of the 16th International Conference on High-Performance Computer Architecture (HPCA-16 2010), 2010

Joint optimization of idle and cooling power in data centers while maintaining response time.

Faraz Ahmad

Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems, 2010

2009

Speculatively Multithreaded Architectures.

Proceedings of the Multicore Processors and Systems, 2009

2008

Optimal Power/Performance Pipeline Depth for SMT in Scaled Technologies.

IEEE Trans. Computers, 2008

Automatic volume management for programmable microfluidics.

Proceedings of the ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation, 2008

Shapeshifter: Dynamically changing pipeline width and speed to address process variations.

Eric Chun

Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-41 2008), 2008

2007

Speculative thread decomposition through empirical optimization.

Troy A. Johnson

Rudolf Eigenmann

Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2007

Resource area dilation to reduce power density in throughput servers.

Proceedings of the 2007 International Symposium on Low Power Electronics and Design, 2007

Aquacore: a programmable architecture for microfluidics.

Proceedings of the 34th International Symposium on Computer Architecture (ISCA 2007), 2007

BlackJack: Hard Error Detection with Redundant Threads on SMT.

Proceedings of the 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2007

2006

Exploiting reference idempotency to reduce speculative storage overflow.

ACM Trans. Program. Lang. Syst., 2006

SmashGuard: A Hardware Solution to Prevent Security Attacks on the Function Return Address.

IEEE Trans. Computers, 2006

Opportunistic Transient-Fault Detection.

Mohamed A. Gomaa

IEEE Micro, 2006

Dynamic feature selection for hardware prediction.

J. Syst. Archit., 2006

Pesticide: Using SMT Processors to Improve Performance of Pointer Bug Detection.

Proceedings of the 24th International Conference on Computer Design (ICCD 2006), 2006

A program transformation and architecture support for quantum uncomputation.

Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, 2006

Do Trace Cache, Value Prediction and Prefetching Improve SMT Throughput?

Chen-Yong Cher

Proceedings of the Architecture of Computing Systems, 2006

2005

Combined circuit and architectural level variable supply-voltage scaling for low power.

IEEE Trans. Very Large Scale Integr. Syst., 2005

Detection and prevention of stack buffer overflow attacks.

Commun. ACM, 2005

Dynamic pipelining: making IP-lookup truly scalable.

Jahangir Hasan

Proceedings of the ACM SIGCOMM 2005 Conference on Applications, 2005

Balancing Resource Utilization to Mitigate Power Density in Processor Pipelines.

Proceedings of the 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-38 2005), 2005

Rescue: A Microarchitecture for Testability and Defect Tolerance.

Proceedings of the 32nd International Symposium on Computer Architecture (ISCA 2005), 2005

Optimizing Replication, Communication, and Capacity Allocation in CMPs.

Proceedings of the 32nd International Symposium on Computer Architecture (ISCA 2005), 2005

Heat Stroke: Power-Density-Based Denial of Service in SMT.

Proceedings of the 11th International Conference on High-Performance Computer Architecture (HPCA-11 2005), 2005

2004

DCG: deterministic clock-gating for low-power microprocessor design.

IEEE Trans. Very Large Scale Integr. Syst., 2004

Min-cut program decomposition for thread-level speculation.

Troy A. Johnson

Rudolf Eigenmann

Proceedings of the ACM SIGPLAN 2004 Conference on Programming Language Design and Implementation, 2004

Wire Delay is Not a Problem for SMT (In the Near Future).

Proceedings of the 31st International Symposium on Computer Architecture (ISCA 2004), 2004

Exploiting Resonant Behavior to Reduce Inductive Noise.

Proceedings of the 31st International Symposium on Computer Architecture (ISCA 2004), 2004

Heat-and-run: leveraging SMT and CMP to manage power density through the operating system.

Mohamed A. Gomaa

Proceedings of the 11th International Conference on Architectural Support for Programming Languages and Operating Systems, 2004

Software prefetching for mark-sweep garbage collection: hardware analysis and software redesign.

Chen-Yong Cher

Antony L. Hosking

Proceedings of the 11th International Conference on Architectural Support for Programming Languages and Operating Systems, 2004

2003

Transient-Fault Recovery for Chip Multiprocessors.

IEEE Micro, 2003

Reducing Design Complexity of the Load/Store Queue.

Chong-liang Ooi

Proceedings of the 36th Annual International Symposium on Microarchitecture, 2003

VSV: L2-Miss-Driven Variable Supply-Voltage Scaling for Low Power.

Proceedings of the 36th Annual International Symposium on Microarchitecture, 2003

Distance Associativity for High-Performance Energy-Efficient Non-Uniform Cache Architectures.

Proceedings of the 36th Annual International Symposium on Microarchitecture, 2003

Accelerating private-key cryptography via multithreading on symmetric multiprocessors.

Praveen Dongara

Proceedings of the 2003 IEEE International Symposium on Performance Analysis of Systems and Software, 2003

Pipeline muffling and a priori current ramping: architectural techniques to reduce high-frequency inductive noise.

Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003

Pipeline Damping: A Microarchitectural Technique to Reduce Inductive Noise in Supply Voltage.

Proceedings of the 30th International Symposium on Computer Architecture (ISCA 2003), 2003

Implicitly-Multithreaded Processors.

Babak Falsafi

Proceedings of the 30th International Symposium on Computer Architecture (ISCA 2003), 2003

Efficient Use of Memory Bandwidth to Improve Network Processor Throughput.

Jahangir Hasan

Satish Chandra

Proceedings of the 30th International Symposium on Computer Architecture (ISCA 2003), 2003

Deterministic Clock Gating for Microprocessor Power Reduction.

Proceedings of the Ninth International Symposium on High-Performance Computer Architecture (HPCA'03), 2003

Exploring High Bandwidth Pipelined Cache Architecture for Scaled Technology.

Amit Agarwal

Kaushik Roy

Proceedings of the 2003 Design, 2003

Exploring High Bandwidth Pipelined Cache Architecture for Scaled Technology.

Amit Agarwal

Kaushik Roy

Proceedings of the Embedded Software for SoC, 2003

2002

Reducing register ports for higher speed and lower energy.

Proceedings of the 35th Annual International Symposium on Microarchitecture, 2002

Transient-Fault Recovery Using Simultaneous Multithreading.

Irith Pomeranz

Karl Cheng

Proceedings of the 29th International Symposium on Computer Architecture (ISCA 2002), 2002

Exploiting Choice in Resizable Cache Design to Optimize Deep-Submicron Processor Energy-Delay.

Proceedings of the Eighth International Symposium on High-Performance Computer Architecture (HPCA'02), 2002

2001

Reducing leakage in a high-performance deep-submicron instruction cache.

IEEE Trans. Very Large Scale Integr. Syst., 2001

Speculative Versioning Cache.

IEEE Trans. Parallel Distributed Syst., 2001

Reference idempotency analysis: a framework for optimizing speculative execution.

Proceedings of the 2001 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPOPP'01), 2001

Reducing set-associative cache energy via way-prediction and selective direct-mapping.

Proceedings of the 34th Annual International Symposium on Microarchitecture, 2001

Skipper: a microarchitecture for exploiting control-flow independence.

Chen-Yong Cher

Proceedings of the 34th Annual International Symposium on Microarchitecture, 2001

Multiplex: unifying conventional and speculative thread-level parallelism on a chip multiprocessor.

Proceedings of the 15th international conference on Supercomputing, 2001

An Integrated Circuit/Architecture Approach to Reducing Leakage in Deep-Submicron High-Performance I-Caches.

Proceedings of the Seventh International Symposium on High-Performance Computer Architecture (HPCA'01), 2001

Reactive-Associative Caches.

Brannon Batson

Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques (PACT 2001), 2001

2000

Gated-Vdd: a circuit technique to reduce leakage in deep-submicron cache memories.

Proceedings of the 2000 International Symposium on Low Power Electronics and Design, 2000

1999

Task Selection for the Multiscalar Architecture.

J. Parallel Distributed Comput., 1999

Is SC + ILP=RC?

Chris Gniady

Babak Falsafi

Proceedings of the 26th Annual International Symposium on Computer Architecture, 1999

1998

Task Selection for a Multiscalar Processor.

Proceedings of the 31st Annual IEEE/ACM International Symposium on Microarchitecture, 1998

1997

Dynamic Speculation and Synchronization of Data Dependences.

Proceedings of the 24th International Symposium on Computer Architecture, 1997

1995

Multiscalar Processors.

Scott E. Breach

Proceedings of the 22nd Annual International Symposium on Computer Architecture, 1995

1994

The anatomy of the register file in a multiscalar processor.

Scott E. Breach