Diego Andrade

Orcid: 0000-0001-5670-7425

According to our database, Diego Andrade authored at least 40 papers between 2003 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
STuning-DL: Model-Driven Autotuning of Sparse GPU Kernels for Deep Learning.
IEEE Access, 2024

2023
VENOM: A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores.
Proceedings of the International Conference for High Performance Computing, 2023

2022
The New UPC++ DepSpawn High Performance Library for Data-Flow Computing with Hybrid Parallelism.
Proceedings of the Computational Science - ICCS 2022, 2022

Probing the Efficacy of Hardware-Aware Weight Pruning to Optimize the SpMM Routine on Ampere GPUs.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2022

2021
A software cache autotuning strategy for dataflow computing with UPC++ DepSpawn.
Comput. Math. Methods, November, 2021

High-performance dataflow computing in hybrid memory systems with UPC++ DepSpawn.
J. Supercomput., 2021

ScalaParBiBit: scaling the binary biclustering in distributed-memory systems.
Clust. Comput., 2021

2020
An automatic optimizer for heterogeneous devices.
Future Gener. Comput. Syst., 2020

Reusing Trained Layers of Convolutional Neural Networks to Shorten Hyperparameters Tuning Time.
CoRR, 2020

A Hybrid Approach for Tracking Individual Players in Broadcast Match Videos.
CoRR, 2020

2019
Easy Dataflow Programming in Clusters with UPC++ DepSpawn.
IEEE Trans. Parallel Distributed Syst., 2019

A Fast Solver for Large Tridiagonal Systems on Multi-Core Processors (Lass Library).
IEEE Access, 2019

2018
Heterogeneous distributed computing based on high-level abstractions.
Concurr. Comput. Pract. Exp., 2018

Guiding the Optimization of Parallel Codes on Multicores Using an Analytical Cache Model.
Proceedings of the Computational Science - ICCS 2018, 2018

2017
High productivity multi-device exploitation with the Heterogeneous Programming Library.
J. Parallel Distributed Comput., 2017

Facilitating the development of stencil applications using the Heterogeneous Programming Library.
Concurr. Comput. Pract. Exp., 2017

2016
Writing a performance-portable matrix multiplication.
Parallel Comput., 2016

Towards a High Level Approach for the Programming of Heterogeneous Clusters.
Proceedings of the 45th International Conference on Parallel Processing Workshops, 2016

2015
Developing adaptive multi-device applications with the Heterogeneous Programming Library.
J. Supercomput., 2015

Automatic Generation of Optimized OpenCL Codes Using OCLoptimizer.
Comput. J., 2015

Improving OpenCL Programmability with the Heterogeneous Programming Library.
Proceedings of the International Conference on Computational Science, 2015

2014
Address independent estimation of the boundaries of cache performance.
Microprocess. Microsystems, 2014

A fine-grained thread-aware management policy for shared caches.
Concurr. Comput. Pract. Exp., 2014

Writing Self-adaptive Codes for Heterogeneous Systems.
Proceedings of the Euro-Par 2014 Parallel Processing, 2014

2013
Numerical simulation of pollutant transport in a shallow-water system on the Cell heterogeneous processor.
J. Supercomput., 2013

Accurate prediction of the behavior of multithreaded applications in shared caches.
Parallel Comput., 2013

OCLoptimizer: An Iterative Optimization Tool for OpenCL.
Proceedings of the International Conference on Computational Science, 2013

2012
Static analysis of the worst-case memory performance for irregular codes with indirections.
ACM Trans. Archit. Code Optim., 2012

Using an Analytical Model of Shared Caches for Selecting the Optimal Parallelization Scheme.
Proceedings of the 10th IEEE International Symposium on Parallel and Distributed Processing with Applications, 2012

2011
An efficient parallel set container for multicore architectures.
Proceedings of the Applications, Tools and Techniques on the Road to Exascale Computing, Proceedings of the conference ParCo 2011, 31 August, 2011

2010
Address-Independent Estimation of the Worst-case Memory Performance.
IEEE Trans. Ind. Informatics, 2010

2009
Static Prediction of Worst-Case Data Cache Performance in the Absence of Base Address Information.
Proceedings of the 15th IEEE Real-Time and Embedded Technology and Applications Symposium, 2009

Task-Parallel versus Data-Parallel Library-Based Programming in Multicore Systems.
Proceedings of the 17th Euromicro International Conference on Parallel, 2009

2007
Precise automatable analytical modeling of the cache behavior of codes with indirections.
ACM Trans. Archit. Code Optim., 2007

Automated and accurate cache behavior analysis for codes with irregular access patterns.
Concurr. Comput. Pract. Exp., 2007

2006
Analytical modeling of codes with arbitrary data-dependent conditional structures.
J. Syst. Archit., 2006

Cache Behavior Modelling for Codes Involving Banded Matrices.
Proceedings of the Languages and Compilers for Parallel Computing, 2006

2005
Optimal Tile Size Selection Guided by Analytical Models.
Proceedings of the Parallel Computing: Current & Future Issues of High-End Computing, 2005

2004
Modeling the Cache Behavior of Codes with Arbitrary Data-Dependent Conditional Structures.
Proceedings of the Advances in Computer Systems Architecture, 9th Asia-Pacific Conference, 2004

2003
Cache Behavior Modeling of Codes with Data-Dependent Conditionals.
Proceedings of the Software and Compilers for Embedded Systems, 7th International Workshop, 2003

