Pradeep Dubey
According to our database, Pradeep Dubey authored at least 116 papers between 1979 and 2023.
Awards
ACM Fellow
ACM Fellow 2023, "For contributions to emerging compute- and data-intensive applications and parallel processing computer architectures".
IEEE Fellow
IEEE Fellow 2001, "For contributions to computer architecture supporting multimedia processing".
Bibliography
2023
Proceedings of the IEEE International Conference on Quantum Computing and Engineering, 2023
2022
2020
SuSy: A Programming Model for Productive Construction of High-Performance Systolic Arrays on FPGAs.
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2020
2019
IEEE Trans. Parallel Distributed Syst., 2019
T2S-Tensor: Productively Generating High-Performance Spatial Hardware for Dense Tensor Computations.
Proceedings of the 27th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2019
2018
Proceedings of the 6th International Conference on Learning Representations, 2018
2017
Deep learning at 15PF: supervised and semi-supervised classification for scientific data.
Proceedings of the International Conference for High Performance Computing, 2017
Galactos: computing the anisotropic 3-point correlation function for 2 billion galaxies.
Proceedings of the International Conference for High Performance Computing, 2017
Proceedings of the 2017 ACM on International Symposium on Physical Design, 2017
ScaleDeep: A Scalable Compute Architecture for Learning and Evaluating Deep Networks.
Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017
Proceedings of the 5th International Conference on Learning Representations, 2017
2016
Full-Stack Architecting to Achieve a Billion-Requests-Per-Second Throughput on a Single Key-Value Store Server Platform.
ACM Trans. Comput. Syst., 2016
SIAM J. Sci. Comput., 2016
IEEE Micro, 2016
Optimizations in a high-performance conjugate gradient benchmark for IA-based multi- and many-core processors.
Int. J. High Perform. Comput. Appl., 2016
Int. J. High Perform. Comput. Appl., 2016
Int. J. Game Theory, 2016
BlackOut: Speeding up Recurrent Neural Network Language Models With Very Large Vocabularies.
Proceedings of the 4th International Conference on Learning Representations, 2016
Proceedings of the High Performance Computing - 31st International Conference, 2016
Designing scalable b-Matching algorithms on distributed memory multiprocessors by approximation.
Proceedings of the International Conference for High Performance Computing, 2016
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016
2015
Beacon: Deployment and Application of Intel Xeon Phi Coprocessors for Scientific Computing.
Comput. Sci. Eng., 2015
Can traditional programming bridge the ninja performance gap for parallel computing applications?
Commun. ACM, 2015
Proceedings of the High Performance Computing - 30th International Conference, 2015
Proceedings of the International Conference for High Performance Computing, 2015
High-performance algebraic multigrid solver optimized for multi-core based distributed parallel systems.
Proceedings of the International Conference for High Performance Computing, 2015
Proceedings of the 5th Workshop on Irregular Applications - Architectures and Algorithms, 2015
Architecting to achieve a billion requests per second throughput on a single key-value store server platform.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015
Exploring Shared-Memory Optimizations for an Unstructured Mesh CFD Application on Modern Parallel Systems.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015
2014
Sparsifying Synchronization for High-Performance Shared-Memory Sparse Triangular Solver.
Proceedings of the Supercomputing - 29th International Conference, 2014
Proceedings of the International Conference on Management of Data, 2014
Proceedings of the International Conference for High Performance Computing, 2014
Efficient Shared-Memory Implementation of High-Performance Conjugate Gradient Benchmark and its Application to Unstructured Matrices.
Proceedings of the International Conference for High Performance Computing, 2014
Proceedings of the International Conference for High Performance Computing, 2014
Petascale High Order Dynamic Rupture Earthquake Simulations on Heterogeneous Supercomputers.
Proceedings of the International Conference for High Performance Computing, 2014
Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014
Improving Communication Performance and Scalability of Native Applications on Intel Xeon Phi Coprocessor Clusters.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014
2013
SIGMOD Rec., 2013
Streaming Similarity Search over one Billion Tweets using Parallel Locality-Sensitive Hashing.
Proc. VLDB Endow., 2013
Proceedings of the Supercomputing - 28th International Supercomputing Conference, 2013
Tera-scale 1D FFT with low-communication algorithm and Intel® Xeon Phi™ coprocessors.
Proceedings of the International Conference for High Performance Computing, 2013
Design and Implementation of the Linpack Benchmark for Single and Multi-node Systems Based on Intel® Xeon Phi Coprocessor.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013
Proceedings of the International Conference on Supercomputing, 2013
2012
ACM Trans. Graph., 2012
CloudRAMSort: fast and efficient large-scale distributed RAM sort on shared-nothing cluster.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2012
Proceedings of the SC Conference on High Performance Computing Networking, 2012
Analysis and Optimization of Financial Analytics Benchmark on Modern Multi- and Many-core IA-Based Architectures.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012
Large-scale energy-efficient graph traversal: a path to efficient data-intensive supercomputing.
Proceedings of the SC Conference on High Performance Computing Networking, 2012
Billion-particle SIMD-friendly two-point correlation on large-scale HPC cluster systems.
Proceedings of the SC Conference on High Performance Computing Networking, 2012
GPP-Grep: High-Speed Regular Expression Processing Engine on General Purpose Processors.
Proceedings of the Research in Attacks, Intrusions, and Defenses, 2012
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012
Fast and Efficient Graph Traversal Algorithm for CPUs: Maximizing Single-Node Efficiency.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012
2011
Designing fast architecture-sensitive tree search on modern multicore/many-core processors.
ACM Trans. Database Syst., 2011
PALM: Parallel Architecture-Friendly Latch-Free Modifications to B+ Trees on Many-Core Processors.
Proc. VLDB Endow., 2011
Proc. VLDB Endow., 2011
High-Performance 3D Compressive Sensing MRI Reconstruction Using Many-Core Architectures.
Int. J. Biomed. Imaging, 2011
Comput. Sci. Res. Dev., 2011
Proceedings of the International Conference on Computer Graphics and Interactive Techniques, 2011
Proceedings of the International Conference on Computer Graphics and Interactive Techniques, 2011
High-performance lattice QCD for multi-core based parallel systems using a cache-friendly hybrid threaded-MPI approach.
Proceedings of the Conference on High Performance Computing Networking, 2011
2010
Games Econ. Behav., 2010
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2010
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2010
Proceedings of the 2010 Eurographics/ACM SIGGRAPH Symposium on Computer Animation, 2010
Proceedings of the Conference on High Performance Computing Networking, 2010
Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU.
Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010
2009
Mapping High-Fidelity Volume Rendering for Medical Imaging to CPU, GPU and Many-Core Architectures.
IEEE Trans. Vis. Comput. Graph., 2009
Proc. VLDB Endow., 2009
Games Econ. Behav., 2009
Proceedings of the 2009 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, 2009
Proceedings of the Motion in Games, Second International Workshop, 2009
2008
Proc. VLDB Endow., 2008
Proc. IEEE, 2008
2007
VLDB J., 2007
Scaling performance of interior-point method on large-scale chip multiprocessor system.
Proceedings of the ACM/IEEE Conference on High Performance Networking and Computing, 2007
2006
Proceedings of the Internet and Network Economics, Second International Workshop, 2006
Proceedings of the Internet and Network Economics, Second International Workshop, 2006
2005
Proceedings of the 31st International Conference on Very Large Data Bases, Trondheim, Norway, August 30, 2005
Proceedings of the Workshop on Data Management on New Hardware, 2005
2004
2003
2000
Proceedings of the 2000 International Conference on Image Processing, 2000
1997
Proceedings of the Fourth International Conference on High-Performance Computing, 1997
1986
1984
Math. Program., 1984
1981
Math. Oper. Res., 1981
1980
1979