Marc González

ORCID: 0000-0002-3780-1106

  • Oak Ridge National Laboratory, TN, USA
  • Polytechnic University of Catalonia (UPC), Computer Architecture Department, Barcelona, Spain

According to our database, Marc González authored at least 55 papers between 1997 and 2024.

Compute units in OpenMP: Extensions for heterogeneous parallel programming.
Concurr. Comput. Pract. Exp., 2024

eCC++ : A Compiler Construction Framework for Embedded Domain-Specific Languages.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

sKokkos: Enabling Kokkos with Transparent Device Selection on Heterogeneous Systems using OpenACC.
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2024

Heterogeneous programming using OpenMP and CUDA/HIP for hybrid CPU-GPU scientific applications.
Int. J. High Perform. Comput. Appl., September, 2023

Abisko: Deep codesign of an architecture for spiking neural networks using novel neuromorphic materials.
Int. J. High Perform. Comput. Appl., July, 2023

Evaluating performance and portability of high-level programming models: Julia, Python/Numba, and Kokkos on exascale nodes.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

KokkACC: Enhancing Kokkos with OpenACC.
Proceedings of the 9th Workshop on Accelerator Programming Using Directives, 2022

Multi-GPU Parallelization of the NAS Multi-Zone Parallel Benchmarks.
IEEE Trans. Parallel Distributed Syst., 2021

Multi-GPU systems and Unified Virtual Memory for scientific applications: The case of the NAS multi-zone parallel benchmarks.
J. Parallel Distributed Comput., 2021

Coarse grain parallelization of deep neural networks.
Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2016

Hardware-Software Coherence Protocol for the Coexistence of Caches and Local Memories.
IEEE Trans. Computers, 2015

Coherence protocol for transparent management of scratchpad memories in shared memory manycore architectures.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

A Methodology to Build Models and Predict Performance-Power in CMPs.
Proceedings of the 44th International Conference on Parallel Processing Workshops, 2015

User experience on heterogenous Liquid Galaxy cluster display walls.
Proceedings of the IEEE International Symposium on a World of Wireless, 2014

A Systematic Methodology to Generate Decomposable and Responsive Power Models for CMPs.
IEEE Trans. Computers, 2013

Counter-Based Power Modeling Methods: Top-Down vs. Bottom-Up.
Comput. J., 2013

DMA++: On the Fly Data Realignment for On-Chip Memories.
IEEE Trans. Computers, 2012

Energy accounting for shared virtualized environments under DVFS using PMC-based power models.
Future Gener. Comput. Syst., 2012

POTRA: a framework for building power models for next generation multicore architectures.
Proceedings of the ACM SIGMETRICS/PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems, 2012

Systematic Energy Characterization of CMP/SMT Processor Systems via Automated Micro-Benchmarks.
Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012

DMA-circular: an enhanced high level programmable DMA controller for optimized management of on-chip local memories.
Proceedings of the Computing Frontiers Conference, CF'12, 2012

Local Memory Design Space Exploration for High-Performance Computing.
Comput. J., 2011

Design space exploration for aggressive core replication schemes in CMPs.
Proceedings of the 20th ACM International Symposium on High Performance Distributed Computing, 2011

Automatic Prefetch and Modulo Scheduling Transformations for the Cell BE Architecture.
IEEE Trans. Parallel Distributed Syst., 2010

Extending OpenMP to Survive the Heterogeneous Multi-Core Era.
Int. J. Parallel Program., 2010

Optimizing the Exploitation of Multicore Processors and GPUs with OpenMP and OpenCL.
Proceedings of the Languages and Compilers for Parallel Computing, 2010

Decomposable and responsive power models for multicore processors using performance counters.
Proceedings of the 24th International Conference on Supercomputing, 2010

Analysis of Task Offloading for Accelerators.
Proceedings of the High Performance Embedded Architectures and Compilers, 2010

Accurate energy accounting for shared virtualized environments using PMC-based power modeling techniques.
Proceedings of the 2010 11th IEEE/ACM International Conference on Grid Computing, 2010

Achieving high memory performance from heterogeneous architectures with the SARC programming model.
Proceedings of the 10th workshop on MEmory performance, 2009

Adaptive and Speculative Memory Consistency Support for Multi-core Architectures with On-Chip Local Memories.
Proceedings of the Languages and Compilers for Parallel Computing, 2009

A Proposal to Extend the OpenMP Tasking Model for Heterogeneous Architectures.
Proceedings of the Evolving OpenMP in an Age of Extreme Parallelism, 2009

Speeding Up Distributed MapReduce Applications Using Hardware Accelerators.
Proceedings of the ICPP 2009, 2009

Evaluation of memory performance on the cell BE with the SARC programming model.
Proceedings of the 9th workshop on MEmory performance, 2008

Automatic Pre-Fetch and Modulo Scheduling Transformations for the Cell BE Architecture.
Proceedings of the Languages and Compilers for Parallel Computing, 2008

Prefetching irregular references for software cache on cell.
Proceedings of the Sixth International Symposium on Code Generation and Optimization (CGO 2008), 2008

Hybrid access-specific software cache techniques for the cell BE architecture.
Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, 2008

A Proposal for Error Handling in OpenMP.
Int. J. Parallel Program., 2007

A Novel Asynchronous Software Cache Implementation for the Cell-BE Processor.
Proceedings of the Languages and Compilers for Parallel Computing, 2007

Employing nested OpenMP for the parallelization of multi-zone computational fluid dynamics applications.
J. Parallel Distributed Comput., 2006

Runtime Address Space Computation for SDSM Systems.
Proceedings of the Languages and Compilers for Parallel Computing, 2006

Techniques supporting threadprivate in OpenMP.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Experiences Parallelizing a Web Server with OpenMP.
Proceedings of the OpenMP Shared Memory Parallel Programming - International Workshops, 2005

Automatic thread distribution for nested parallelism in OpenMP.
Proceedings of the 19th Annual International Conference on Supercomputing, 2005

Automatic multilevel parallelization using OpenMP.
Sci. Program., 2003

Dual-Level Parallelism Exploitation with OpenMP in Coastal Ocean Circulation Modeling.
Proceedings of the High Performance Computing, 4th International Symposium, 2002

Defining and Supporting Pipelined Executions in OpenMP.
Proceedings of the OpenMP Shared Memory Parallel Programming, 2001

Complex Pipelined Executions in OpenMP Parallel Applications.
Proceedings of the 2001 International Conference on Parallel Processing, 2001

NanosCompiler: supporting flexible multilevel parallelism exploitation in OpenMP.
Concurr. Pract. Exp., 2000

OpenMP Extensions for Thread Groups and Their Run-Time Support.
Proceedings of the Languages and Compilers for Parallel Computing, 2000

Applying Interposition Techniques for Performance Analysis of OpenMP Parallel Applications.
Proceedings of the 14th International Parallel & Distributed Processing Symposium (IPDPS'00), 2000

Thread fork/join techniques for multi-level parallelism exploitation in NUMA multiprocessors.
Proceedings of the 13th international conference on Supercomputing, 1999

Exploiting Multiple Levels of Parallelism in OpenMP: A Case Study.
Proceedings of the International Conference on Parallel Processing 1999, 1999

Exploiting Parallelism Through Directives on the Nano-Threads Programming Model.
Proceedings of the Languages and Compilers for Parallel Computing, 1997
