An evolutionary framework for automatic and guided discovery of algorithms.
Proceedings of the 17th ACM International Conference on Computing Frontiers, 2020
Large-scale GW calculations on pre-exascale HPC systems.
Comput. Phys. Commun., 2019
A Framework for Auto-Parallelization and Code Generation: An Integrative Case Study with Legacy FORTRAN Codes.
Proceedings of the 47th International Conference on Parallel Processing, 2018
An efficient and portable SIMD algorithm for charge/current deposition in Particle-In-Cell codes.
Comput. Phys. Commun., 2017
Optimizing Excited-State Electronic-Structure Codes for Intel Knights Landing: A Case Study on the BerkeleyGW Software.
Proceedings of the High Performance Computing, 2016
Enhancing application performance using heterogeneous memory architectures on a many-core platform.
Proceedings of the International Conference on High Performance Computing & Simulation, 2016
Bridging the FPGA programmability-portability gap via automatic OpenCL code generation and tuning.
Proceedings of the 27th IEEE International Conference on Application-specific Systems, 2016
GLAF: A Visual Programming and Auto-tuning Framework for Parallel Computing.
Proceedings of the 44th International Conference on Parallel Processing, 2015
ALP: Efficient support for all levels of parallelism for complex media applications.
ACM Trans. Archit. Code Optim., 2007
Energy Efficient Support for All Levels of Parallelism for Complex Media Applications.
PhD thesis, 2005
The energy efficiency of CMP vs. SMT for multimedia workloads.
Proceedings of the 18th Annual International Conference on Supercomputing, 2004
Joint local and global hardware adaptations for energy.
Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-X), 2002