Adrián Castelló
ORCID: 0000-0002-8576-8451
Affiliations:
- Universitat Jaume I de Castello, Spain
According to our database, Adrián Castelló authored at least 53 papers between 2014 and 2025.
Bibliography
2025
Experience-guided, mixed-precision matrix multiplication with Apache TVM for ARM processors.
J. Supercomput., January, 2025
2024
Communication-Avoiding Fusion of GEMM-Based Convolutions for Deep Learning in the RISC-V GAP8 MCU.
IEEE Internet Things J., November, 2024
J. Supercomput., July, 2024
J. Supercomput., June, 2024
Algorithm 1039: Automatic Generators for a Family of Matrix Multiplication Routines with Apache TVM.
ACM Trans. Math. Softw., March, 2024
Microprocess. Microsystems, 2024
Parallel GEMM-based convolutions for deep learning on multicore ARM and RISC-V architectures.
J. Syst. Archit., 2024
Experiences with nested parallelism in task-parallel applications using malleable BLAS on multicore processors.
Int. J. High Perform. Comput. Appl., 2024
Proceedings of the Euro-Par 2024: Parallel Processing, 2024
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2024
2023
J. Supercomput., July, 2023
Performance-energy trade-offs of deep learning convolution algorithms on ARM processors.
J. Supercomput., June, 2023
J. Supercomput., May, 2023
Analyzing the impact of the MPI allreduce in distributed training of convolutional neural networks.
Computing, May, 2023
Using machine learning to model the training scalability of convolutional neural networks on clusters of GPUs.
Computing, May, 2023
Reformulating the direct convolution for high-performance deep learning inference on ARM processors.
J. Syst. Archit., February, 2023
CoRR, 2023
Automatic Generation of Micro-kernels for Performance Portability of Matrix Multiplication on RISC-V Vector Processors.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023
2022
A BLIS-like matrix multiplication for machine learning in the RISC-V ISA-based GAP8 processor.
J. Supercomput., 2022
BestOf: an online implementation selector for the training and inference of deep neural networks.
J. Supercomput., 2022
High performance and energy efficient inference for deep learning on multicore ARM processors using general optimization techniques and BLIS.
J. Syst. Archit., 2022
Proceedings of the High Performance Computing. ISC High Performance 2022 International Workshops - Hamburg, Germany, May 29, 2022
Proceedings of the High Performance Computing. ISC High Performance 2022 International Workshops - Hamburg, Germany, May 29, 2022
Towards Portable Realizations of Winograd-based Convolution with Vector Intrinsics and OpenMP.
Proceedings of the 30th Euromicro International Conference on Parallel, 2022
Proceedings of the 30th Euromicro International Conference on Parallel, 2022
Proceedings of the 25th Euromicro Conference on Digital System Design, 2022
2021
J. Supercomput., 2021
CoRR, 2021
Clust. Comput., 2021
Evaluation of MPI Allreduce for Distributed Training of Convolutional Neural Networks.
Proceedings of the 29th Euromicro International Conference on Parallel, 2021
Proceedings of the 29th Euromicro International Conference on Parallel, 2021
A Flexible Research-Oriented Framework for Distributed Training of Deep Neural Networks.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2021
2020
IEEE Trans. Computers, 2020
High Performance and Portable Convolution Operators for ARM-based Multicore Processors.
CoRR, 2020
Clust. Comput., 2020
Proceedings of the 32nd IEEE International Symposium on Computer Architecture and High Performance Computing, 2020
2019
Proceedings of the 26th European MPI Users' Group Meeting, 2019
Proceedings of the 19th IEEE/ACM International Symposium on Cluster, 2019
2018
Unification of Lightweight Thread Solutions and their Application in High Performance Programming.
PhD thesis, 2018
IEEE Trans. Parallel Distributed Syst., 2018
Exploring the interoperability of remote GPGPU virtualization using rCUDA and directive-based programming models.
J. Supercomput., 2018
On the adequacy of lightweight thread approaches for high-level parallel programming models.
Future Gener. Comput. Syst., 2018
2017
Proceedings of the 46th International Conference on Parallel Processing, 2017
Proceedings of the Euro-Par 2017: Parallel Processing - 23rd International Conference on Parallel and Distributed Computing, Santiago de Compostela, Spain, August 28, 2017
2016
Proceedings of the 2016 IEEE International Conference on Cluster Computing, 2016
Proceedings of the CLOSER 2016, 2016
2015
Concurr. Comput. Pract. Exp., 2015
Proceedings of the 2015 IEEE TrustCom/BigDataSE/ISPA, 2015
Exploring the Suitability of Remote GPGPU Virtualization for the OpenACC Programming Model Using rCUDA.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015
2014
Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing, 2014
Boosting the performance of remote GPU virtualization using InfiniBand connect-IB and PCIe 3.0.
Proceedings of the 2014 IEEE International Conference on Cluster Computing, 2014