Practical nonvolatile multilevel-cell phase change memory.
Proceedings of the International Conference for High Performance Computing, 2013
Presto: distributed machine learning and graph processing with sparse matrices.
Proceedings of the Eighth Eurosys Conference 2013, 2013
On a Technique for Transparently Empowering Classical Compiler Optimizations on Multithreaded Code.
ACM Trans. Program. Lang. Syst., 2012
Improving System Energy Efficiency with Memory Rank Subsetting.
ACM Trans. Archit. Code Optim., 2012
Using R for Iterative and Incremental Processing.
Proceedings of the 4th USENIX Workshop on Hot Topics in Cloud Computing, 2012
Exploring latency-power tradeoffs in deep nonvolatile memory hierarchies.
Proceedings of the Computing Frontiers Conference, CF'12, 2012
A technique for the effective and automatic reuse of classical compiler optimizations on multithreaded code.
Proceedings of the 38th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2011
The runtime abort graph and its application to software transactional memory optimization.
Proceedings of the CGO 2011, 2011
Multicore DIMM: an Energy Efficient Memory Module with Independently Controlled DRAMs.
IEEE Comput. Archit. Lett., 2009
Future scaling of processor-memory interfaces.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009
HyperX: topology, routing, and packaging of efficient large-scale networks.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009
A Nanophotonic Interconnect for High-Performance Many-Core Computation.
Proceedings of the 16th Annual IEEE Symposium on High Performance Interconnects (HOTI 2008), 2008
A Preliminary Evaluation of HPF.
Proceedings of the Eighth SIAM Conference on Parallel Processing for Scientific Computing, 1997
Stability of block LU factorization.
Numer. Linear Algebra Appl., 1995
Fast Polar Decomposition of an Arbitrary Matrix.
SIAM J. Sci. Comput., 1990