Nadathur Satish

Proceedings of the International Conference for High Performance Computing, 2017

Galactos: computing the anisotropic 3-point correlation function for 2 billion galaxies.

Brian Friesen

Proceedings of the International Conference for High Performance Computing, 2017

Banshee: bandwidth-efficient DRAM caching via software/hardware cooperation.

Xiangyao Yu

Christopher J. Hughes

Onur Mutlu

Srinivas Devadas

Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

2016

Efficient Approximation Algorithms for Weighted b-Matching.

Arif M. Khan

Alex Pothen

Fredrik Manne

Mahantesh Halappanavar

SIAM J. Sci. Comput., 2016

BlackOut: Speeding up Recurrent Neural Network Language Models With Very Large Vocabularies.

Shihao Ji

S. V. N. Vishwanathan

Proceedings of the 4th International Conference on Learning Representations, 2016

Parallelizing Word2Vec in Multi-Core and Many-Core Architectures.

CoRR, 2016

Designing scalable b-Matching algorithms on distributed memory multiprocessors by approximation.

Arif M. Khan

Alex Pothen

Mahantesh Halappanavar

Proceedings of the International Conference for High Performance Computing, 2016

Graphicionado: A high-performance and energy-efficient accelerator for graph analytics.

Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

High Performance Parallel Stochastic Gradient Descent in Shared Memory.

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

PANDA: Extreme Scale Parallel K-Nearest Neighbor on Distributed Architectures.

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

GraphPad: Optimized Graph Primitives for Parallel and Distributed Platforms.

Theodore L. Willke

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Data tiering in heterogeneous memory systems.

Proceedings of the Eleventh European Conference on Computer Systems, 2016

2015

GraphMat: High performance graph analytics made productive.

Subramanya Dulloor

Satya Gautam Vadlamudi

Dipankar Das

Proc. VLDB Endow., 2015

GraphMat: High performance graph analytics made productive.

Subramanya Dulloor

Satya Gautam Vadlamudi

Dipankar Das

CoRR, 2015

Can traditional programming bridge the ninja performance gap for parallel computing applications?

Commun. ACM, 2015

Parallel Efficient Sparse Matrix-Matrix Multiplication on Multicore Platforms.

Jongsoo Park

Satya Gautam Vadlamudi

Proceedings of the High Performance Computing - 30th International Conference, 2015

Exploiting NVM in large-scale graph analytics.

Proceedings of the 3rd Workshop on Interactions of NVM/FLASH with Operating Systems and Workloads, 2015

Full correlation matrix analysis of fMRI data on Intel® Xeon Phi™ coprocessors.

Nicholas B. Turk-Browne

Theodore L. Willke

Proceedings of the International Conference for High Performance Computing, 2015

BD-CATS: big data clustering at trillion particle scale.

Surendra Byna

Proceedings of the International Conference for High Performance Computing, 2015

Improving graph partitioning for modern graphs and architectures.

Dominique LaSalle

Proceedings of the 5th Workshop on Irregular Applications - Architectures and Algorithms, 2015

IMP: indirect memory prefetcher.

Xiangyao Yu

Christopher J. Hughes

Srinivas Devadas

Proceedings of the 48th International Symposium on Microarchitecture, 2015

Scalable Bayesian Optimization Using Deep Neural Networks.

Prabhat

Ryan P. Adams

Proceedings of the 32nd International Conference on Machine Learning, 2015

2014

GenBase: a complex analytics genomics benchmark.

Rebecca Taft

Manasi Vartak

Samuel Madden

Michael Stonebraker

Proceedings of the International Conference on Management of Data, 2014

Navigating the maze of graph analytics frameworks using massive graph datasets.

Jiwon Seo

Jongsoo Park

Muhammad Amber Hassaan

Shubho Sengupta

Zhaoming Yin

Proceedings of the International Conference on Management of Data, 2014

Pardicle: Parallel Approximate Density-Based Clustering.

Proceedings of the International Conference for High Performance Computing, 2014

2013

Streaming Similarity Search over one Billion Tweets using Parallel Locality-Sensitive Hashing.

Karthikeyan Sankaralingam

Aizana Turmukhametova

Proc. VLDB Endow., 2013

2012

DySER: Unifying Functionality and Parallelism Specialization for Energy-Efficient Computing.

Venkatraman Govindaraju

Changkyu Kim

IEEE Micro, 2012

CloudRAMSort: fast and efficient large-scale distributed RAM sort on shared-nothing cluster.

Proceedings of the ACM SIGMOD International Conference on Management of Data, 2012

Analysis and Optimization of Financial Analytics Benchmark on Modern Multi- and Many-core IA-Based Architectures.

Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Large-scale energy-efficient graph traversal: a path to efficient data-intensive supercomputing.

Proceedings of the SC Conference on High Performance Computing Networking, 2012

GPP-Grep: High-Speed Regular Expression Processing Engine on General Purpose Processors.

Proceedings of the Research in Attacks, Intrusions, and Defenses, 2012

Fast and Efficient Graph Traversal Algorithm for CPUs: Maximizing Single-Node Efficiency.

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

2011

Designing fast architecture-sensitive tree search on modern multicore/many-core processors.

ACM Trans. Database Syst., 2011

PALM: Parallel Architecture-Friendly Latch-Free Modifications to B+ Trees on Many-Core Processors.

Proc. VLDB Endow., 2011

Fast Updates on Read-Optimized Databases Using Multi-Core CPUs.

Proc. VLDB Endow., 2011

2010

Fast sort on CPUs and GPUs: a case for bandwidth oblivious SIMD sort.

Proceedings of the ACM SIGMOD International Conference on Management of Data, 2010

FAST: fast architecture sensitive tree search on modern CPUs and GPUs.

Proceedings of the ACM SIGMOD International Conference on Management of Data, 2010

3.5-D Blocking Optimization for Stencil Computations on Modern CPUs and GPUs.

Proceedings of the Conference on High Performance Computing Networking, 2010

Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU.

Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010

2009

Sort vs. Hash Revisited: Fast Join Implementation on Modern Multi-Core CPUs.

Proc. VLDB Endow., 2009

ClearPath: highly parallel collision avoidance for multi-agent simulation.

Proceedings of the 2009 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, 2009

Interactive Modeling, Simulation and Control of Large-Scale Crowds and Traffic.

Proceedings of the Motion in Games, Second International Workshop, 2009

Designing efficient sorting algorithms for manycore GPUs.

Mark J. Harris

Michael Garland

Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Optimizing the use of GPU memory in applications with large data sets.

Kurt Keutzer

Proceedings of the 16th International Conference on High Performance Computing, 2009

2008

Scheduling task dependence graphs with variable task execution times onto heterogeneous multiprocessors.

Kaushik Ravindran

Kurt Keutzer

Proceedings of the 8th ACM & IEEE International conference on Embedded software, 2008

2007

Efficient Parallelization of H.264 Decoding with Macro Block Level Scheduling.

Proceedings of the 2007 IEEE International Conference on Multimedia and Expo, 2007

A decomposition-based constraint optimization approach for statically scheduling task graphs with communication delays to multiprocessors.
