Joan-Manuel Parcerisa

ORCID: 0000-0001-5771-8118

According to our database, Joan-Manuel Parcerisa authored at least 34 papers between 1997 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.






WaSP: Warp Scheduling to Mimic Prefetching in Graphics Workloads.
CoRR, 2024

LIBRA: Memory Bandwidth- and Locality-Aware Parallel Tile Rendering.
Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024

Boustrophedonic Frames: Quasi-Optimal L2 Caching for Textures in GPUs.
Proceedings of the 32nd International Conference on Parallel Architectures and Compilation Techniques, 2023

Omega-Test: A Predictive Early-Z Culling to Improve the Graphics Pipeline Energy-Efficiency.
IEEE Trans. Vis. Comput. Graph., 2022

Dynamic sampling rate: harnessing frame coherence in graphics applications for energy-efficient GPUs.
J. Supercomput., 2022

Triangle Dropping: An Occluded-geometry Predictor for Energy-efficient Mobile GPUs.
ACM Trans. Archit. Code Optim., 2022

DTM-NUCA: Dynamic Texture Mapping-NUCA for Energy-Efficient Graphics Rendering.
Proceedings of the 30th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, 2022

DTexL: Decoupled Raster Pipeline for Texture Locality.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

TCOR: A Tile Cache with Optimal Replacement.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

Visibility Rendering Order: Improving Energy Efficiency on Mobile GPUs through Frame Coherence.
IEEE Trans. Parallel Distributed Syst., 2019

Rendering Elimination: Early Discard of Redundant Tiles in the Graphics Pipeline.
Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019

Early Visibility Resolution for Removing Ineffectual Computations in the Graphics Pipeline.
Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019

An Energy-Efficient Memory Unit for Clustered Microarchitectures.
IEEE Trans. Computers, 2016

Ultra-low power render-based collision detection for CPU/GPU systems.
Proceedings of the 48th International Symposium on Microarchitecture, 2015

Eliminating redundant fragment shader executions on a mobile GPU via hardware memoization.
Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

TEAPOT: a toolset for evaluating performance, power and image quality on mobile graphics systems.
Proceedings of the International Conference on Supercomputing, 2013

Parallel frame rendering: Trading responsiveness for energy on a mobile GPU.
Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 2013

Boosting mobile GPU performance with a decoupled access/execute fragment processor.
Proceedings of the 39th International Symposium on Computer Architecture (ISCA 2012), 2012

Leveraging Register Windows to Reduce Physical Registers to the Bare Minimum.
IEEE Trans. Computers, 2010

Improving Branch Prediction and Predicated Execution in Out-of-Order Processors.
Proceedings of the 13th International Conference on High-Performance Computer Architecture (HPCA-13 2007), 2007

Early Register Release for Out-of-Order Processors with Register Windows.
Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007

Selective predicate prediction for out-of-order processors.
Proceedings of the 20th Annual International Conference on Supercomputing, 2006

On-Chip Interconnects and Instruction Steering Schemes for Clustered Microarchitectures.
IEEE Trans. Parallel Distributed Syst., 2005

Memory Bank Predictors.
Proceedings of the 23rd International Conference on Computer Design (ICCD 2005), 2005

Design of Clustered Superscalar Microarchitectures.
PhD thesis, 2004

Efficient Interconnects for Clustered Microarchitectures.
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques (PACT 2002), 2002

Improving Latency Tolerance of Multithreading through Decoupling.
IEEE Trans. Computers, 2001

Dynamic Code Partitioning for Clustered Architectures.
Int. J. Parallel Program., 2001

Reducing wire delay penalty through value prediction.
Proceedings of the 33rd Annual IEEE/ACM International Symposium on Microarchitecture, 2000

Dynamic Cluster Assignment Mechanisms.
Proceedings of the Sixth International Symposium on High-Performance Computer Architecture, 2000

The Synergy of Multithreading and Access/Execute Decoupling.
Proceedings of the Fifth International Symposium on High-Performance Computer Architecture, 1999

A Cost-Effective Clustered Architecture.
Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques, 1999

The Latency Hiding Effectiveness of Decoupled Access/Execute Processors.
Proceedings of the 24th EUROMICRO '98 Conference, 1998

Eliminating Cache Conflict Misses through XOR-Based Placement Functions.
Proceedings of the 11th International Conference on Supercomputing, 1997
