Russell Ford

Sundeep Rangan

Proceedings of the 48th Asilomar Conference on Signals, Systems and Computers, 2014

2013

Joint Interference and User Association Optimization in Cellular Wireless Networks.

CoRR, 2013

Locality-aware task management for unstructured parallelism: a quantitative limit study.

Richard M. Yoo

Yen-Kuang Chen

Christos Kozyrakis

Proceedings of the 25th ACM Symposium on Parallelism in Algorithms and Architectures, 2013

Opportunistic third-party backhaul for cellular wireless networks.

Russell Ford

Sundeep Rangan

Proceedings of the 2013 Asilomar Conference on Signals, Systems and Computers, 2013

2012

DySER: Unifying Functionality and Parallelism Specialization for Energy-Efficient Computing.

Venkatraman Govindaraju

IEEE Micro, 2012

CloudRAMSort: fast and efficient large-scale distributed RAM sort on shared-nothing cluster.

Proceedings of the ACM SIGMOD International Conference on Management of Data, 2012

Large-scale energy-efficient graph traversal: a path to efficient data-intensive supercomputing.

Proceedings of the SC Conference on High Performance Computing Networking, Storage and Analysis, 2012

Billion-particle SIMD-friendly two-point correlation on large-scale HPC cluster systems.

Proceedings of the SC Conference on High Performance Computing Networking, Storage and Analysis, 2012

GPP-Grep: High-Speed Regular Expression Processing Engine on General Purpose Processors.

Proceedings of the Research in Attacks, Intrusions, and Defenses, 2012

Fast and Efficient Graph Traversal Algorithm for CPUs: Maximizing Single-Node Efficiency.

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

2011

Designing fast architecture-sensitive tree search on modern multicore/many-core processors.

ACM Trans. Database Syst., 2011

PALM: Parallel Architecture-Friendly Latch-Free Modifications to B+ Trees on Many-Core Processors.

Proc. VLDB Endow., 2011

Fast Updates on Read-Optimized Databases Using Multi-Core CPUs.

Proc. VLDB Endow., 2011

Moguls: a model to explore the memory hierarchy for bandwidth improvements.

Guangyu Sun

Proceedings of the 38th International Symposium on Computer Architecture (ISCA 2011), 2011

2010

Performance and Energy Implications of Many-Core Caches for Throughput Computing.

Yen-Kuang Chen

IEEE Micro, 2010

Fast sort on CPUs and GPUs: a case for bandwidth oblivious SIMD sort.

Proceedings of the ACM SIGMOD International Conference on Management of Data, 2010

FAST: fast architecture sensitive tree search on modern CPUs and GPUs.

Proceedings of the ACM SIGMOD International Conference on Management of Data, 2010

3.5-D Blocking Optimization for Stencil Computations on Modern CPUs and GPUs.

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, 2010

Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU.

Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010

2009

Sort vs. Hash Revisited: Fast Join Implementation on Modern Multi-Core CPUs.

Proc. VLDB Endow., 2009

ClearPath: highly parallel collision avoidance for multi-agent simulation.

Proceedings of the 2009 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, 2009

Interactive Modeling, Simulation and Control of Large-Scale Crowds and Traffic.

Proceedings of the Motion in Games, Second International Workshop, 2009

Efficient shared cache management through sharing-aware replacement and streaming-aware insertion policy.

Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

2008

Multitasking workload scheduling on flexible core chip multiprocessors.

SIGARCH Comput. Archit. News, 2008

Second Life and the New Generation of Virtual Worlds.

Computer, 2008

Atomic Vector Operations on Chip Multiprocessors.

Victor W. Lee

Anthony D. Nguyen

Proceedings of the 35th International Symposium on Computer Architecture (ISCA 2008), 2008

2007

A NUCA Substrate for Flexible CMP Cache Sharing.

IEEE Trans. Parallel Distributed Syst., 2007

On-Chip Interconnection Networks of the TRIPS Chip.

Paul Gratz

Heather Hanson

Premkishore Shivakumar

IEEE Micro, 2007

Composable Lightweight Processors.

Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-40 2007), 2007

2006

Distributed Microarchitectural Protocols in the TRIPS Prototype Processor.

Premkishore Shivakumar

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-39 2006), 2006

Implementation and Evaluation of On-Chip Network Architectures.

Proceedings of the 24th International Conference on Computer Design (ICCD 2006), 2006

2004

TRIPS: A polymorphous architecture for exploiting ILP, TLP, and DLP.

ACM Trans. Archit. Code Optim., 2004

2003

Exploiting ILP, TLP, and DLP with the Polymorphous TRIPS Architecture.

IEEE Micro, 2003

Nonuniform Cache Architectures for Wire-Delay Dominated On-Chip Caches.

IEEE Micro, 2003

2002

An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches.
