Peng Wu

Affiliations:
  • Meta, USA
  • Huawei Research Lab (former)
  • IBM T.J. Watson Research Center, Yorktown Heights, NY, USA (former)


According to our database, Peng Wu authored at least 42 papers between 1998 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024

2019
CubeGen: Code Generation for Accelerated GEMM-Based Convolution with Tiling.
Proceedings of the Languages and Compilers for Parallel Computing, 2019

Replayable Execution Optimized for Page Sharing for a Managed Runtime Environment.
Proceedings of the Fourteenth EuroSys Conference 2019, Dresden, Germany, March 25-28, 2019, 2019

2018
SIMDization of Small Tensor Multiplication Kernels for Wide SIMD Vector Processors.
Proceedings of the 4th Workshop on Programming Models for SIMD/Vector Processing, 2018

2015
Software Support and Evaluation of Hardware Transactional Memory on Blue Gene/Q.
IEEE Trans. Computers, 2015

Vectorization of apply to reduce interpretation overhead of R.
Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, 2015

The Future of Programming Languages and Programmers.
Companion Proceedings of the 2015 ACM SIGPLAN International Conference on Systems, 2015

2014
Simple, portable and fast SIMD intrinsic programming: generic simd library.
Proceedings of the 2014 Workshop on Programming models for SIMD/Vector processing, 2014

Optimizing R VM: Allocation Removal and Path Length Reduction via Interpreter-level Specialization.
Proceedings of the 12th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2014

2013
Optimizing IBM Algorithmics' mark-to-future aggregation engine for real-time counterparty credit risk scoring.
Proceedings of WHPCF'13: 6th Workshop on High Performance Computational Finance, 2013

2012
Adaptive multi-level compilation in a trace-based Java JIT compiler.
Proceedings of the 27th Annual ACM SIGPLAN Conference on Object-Oriented Programming, 2012

On the benefits and pitfalls of extending a statically typed language JIT compiler for dynamic scripting languages.
Proceedings of the 27th Annual ACM SIGPLAN Conference on Object-Oriented Programming, 2012

Evaluation of Blue Gene/Q hardware support for transactional memories.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011
Reducing trace selection footprint for large-scale Java applications without performance loss.
Proceedings of the 26th Annual ACM SIGPLAN Conference on Object-Oriented Programming, 2011

A trace-based Java JIT compiler retrofitted from a method-based compiler.
Proceedings of the CGO 2011, 2011

Improving the performance of trace-based systems by false loop filtering.
Proceedings of the 16th International Conference on Architectural Support for Programming Languages and Operating Systems, 2011

2010
A Case for Including Transactions in OpenMP.
Proceedings of the Beyond Loop Level Parallelism in OpenMP: Accelerators, 2010

2009
Compiler and runtime techniques for software transactional memory optimization.
Concurr. Comput. Pract. Exp., 2009

Fastpath Speculative Parallelization.
Proceedings of the Languages and Compilers for Parallel Computing, 2009

Reducing Memory Ordering Overheads in Software Transactional Memory.
Proceedings of the CGO 2009, 2009

2008
Software Transactional Memory: Why Is It Only a Research Toy?
ACM Queue, 2008

Compiler-Driven Dependence Profiling to Guide Program Parallelization.
Proceedings of the Languages and Compilers for Parallel Computing, 2008

2006
Using advanced compiler technology to exploit the performance of the Cell Broadband Engine™ architecture.
IBM Syst. J., 2006

Optimizing data permutations for SIMD devices.
Proceedings of the ACM SIGPLAN 2006 Conference on Programming Language Design and Implementation, 2006

2005
Design and exploitation of a high-performance SIMD floating-point unit for Blue Gene/L.
IBM J. Res. Dev., 2005

An Empirical Study On the Vectorization of Multimedia Applications for Multimedia Extensions.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

An integrated simdization framework using virtual vectors.
Proceedings of the 19th Annual International Conference on Supercomputing, 2005

Efficient SIMD Code Generation for Runtime Alignment and Length Conversion.
Proceedings of the 3rd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2005), 2005

Optimizing Compiler for the CELL Processor.
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques (PACT 2005), 2005

2004
Vectorization for SIMD architectures with alignment constraints.
Proceedings of the ACM SIGPLAN 2004 Conference on Programming Language Design and Implementation 2004, 2004

2003
A comparison of empirical and model-driven optimization.
Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation 2003, 2003

A Preliminary Study on the Vectorization of Multimedia Applications for Multimedia Extensions.
Proceedings of the Languages and Compilers for Parallel Computing, 2003

2002
NINJA: Java for high performance numerical computing.
Sci. Program., 2002

Instance-wise points-to analysis for loop-based dependence testing.
Proceedings of the 16th international conference on Supercomputing, 2002

2001
Analyses of Pointers, Induction Variables, and Container Objects for Dependence Testing
PhD thesis, 2001

The NINJA project.
Commun. ACM, 2001

Induction Variable Analysis without Idiom Recognition: Beyond Monotonicity.
Proceedings of the Languages and Compilers for Parallel Computing, 2001

Monotonic evolution: an alternative to induction variable substitution for dependence analysis.
Proceedings of the 15th international conference on Supercomputing, 2001

2000
Containers on the Parallelization of General-Purpose Java Programs.
Int. J. Parallel Program., 2000

1999
Semantic Inlining - the Compiler Support for Java in Technical Computing.
Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing, 1999

Efficient Support for Complex Numbers in Java.
Proceedings of the ACM 1999 Conference on Java Grande, JAVA '99, San Francisco, CA, USA, 1999

1998
Beyond Arrays - A Container-Centric Approach for Parallelization of Real-World Symbolic Applications.
Proceedings of the Languages and Compilers for Parallel Computing, 1998
