Manuel F. Dolz

ORCID: 0000-0001-9466-3398

Affiliations:
  • Charles III University of Madrid


According to our database, Manuel F. Dolz authored at least 82 papers between 2010 and 2024.


Bibliography

2024
Automatic generation of ARM NEON micro-kernels for matrix multiplication.
J. Supercomput., July, 2024

Urban sound classification using neural networks on embedded FPGAs.
J. Supercomput., June, 2024

Optimizing Convolutions for Deep Learning Inference on ARM Cortex-M Processors.
IEEE Internet Things J., 2024

2023
Efficient and portable Winograd convolutions for multi-core processors.
J. Supercomput., July, 2023

Performance-energy trade-offs of deep learning convolution algorithms on ARM processors.
J. Supercomput., June, 2023

Analyzing the impact of the MPI allreduce in distributed training of convolutional neural networks.
Computing, May, 2023

Using machine learning to model the training scalability of convolutional neural networks on clusters of GPUs.
Computing, May, 2023

Reformulating the direct convolution for high-performance deep learning inference on ARM processors.
J. Syst. Archit., February, 2023

GreenLightningAI: An Efficient AI System with Decoupled Structural and Quantitative Knowledge.
CoRR, 2023

2022
BestOf: an online implementation selector for the training and inference of deep neural networks.
J. Supercomput., 2022

High performance and energy efficient inference for deep learning on multicore ARM processors using general optimization techniques and BLIS.
J. Syst. Archit., 2022

Efficient and portable GEMM-based convolution operators for deep neural network training on multicore processors.
J. Parallel Distributed Comput., 2022

Convolution Operators for Deep Learning Inference on the Fujitsu A64FX Processor.
Proceedings of the 2022 IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 2022

Towards Portable Realizations of Winograd-based Convolution with Vector Intrinsics and OpenMP.
Proceedings of the 30th Euromicro International Conference on Parallel, 2022

2021
PyDTNN: A user-friendly and extensible framework for distributed deep learning.
J. Supercomput., 2021

Convolutional neural nets for estimating the run time and energy consumption of the sparse matrix-vector product.
Int. J. High Perform. Comput. Appl., 2021

Acoustic Echo Cancellation using Residual U-Nets.
CoRR, 2021

High performance and energy efficient inference for deep learning on ARM processors.
CoRR, 2021

Evaluation of MPI Allreduce for Distributed Training of Convolutional Neural Networks.
Proceedings of the 29th Euromicro International Conference on Parallel, 2021

Performance Modeling for Distributed Training of Convolutional Neural Networks.
Proceedings of the 29th Euromicro International Conference on Parallel, 2021

A Flexible Research-Oriented Framework for Distributed Training of Deep Neural Networks.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2021

2020
Detecting semantic violations of lock-free data structures through C++ contracts.
J. Supercomput., 2020

Performance modeling of the sparse matrix-vector product via convolutional neural networks.
J. Supercomput., 2020

High Performance and Portable Convolution Operators for ARM-based Multicore Processors.
CoRR, 2020

A pipeline for the QR update in digital signal processing.
Comput. Math. Methods, 2020

High Performance and Portable Convolution Operators for Multicore Processors.
Proceedings of the 32nd IEEE International Symposium on Computer Architecture and High Performance Computing, 2020

2019
A similarity study of I/O traces via string kernels.
J. Supercomput., 2019

A pipeline structure for the block QR update in digital signal processing.
J. Supercomput., 2019

Hybrid static-dynamic selection of implementation alternatives in heterogeneous environments.
J. Supercomput., 2019

Exploring stream parallel patterns in distributed MPI environments.
Parallel Comput., 2019

Analysis of model parallelism for distributed neural networks.
Proceedings of the 26th European MPI Users' Group Meeting, 2019

Theoretical Scalability Analysis of Distributed Deep Convolutional Neural Networks.
Proceedings of the 19th IEEE/ACM International Symposium on Cluster, 2019

2018
Understanding hardware and software metrics with respect to power consumption.
Sustain. Comput. Informatics Syst., 2018

Energy monitoring as an essential building block towards sustainable ultrascale systems.
Sustain. Comput. Informatics Syst., 2018

Finding parallel patterns through static analysis in C++ applications.
Int. J. High Perform. Comput. Appl., 2018

An adaptive offline implementation selector for heterogeneous parallel platforms.
Int. J. High Perform. Comput. Appl., 2018

Paving the way towards high-level parallel pattern interfaces for data stream processing.
Future Gener. Comput. Syst., 2018

Towards Automatic Parallelization of Stream Processing Applications.
IEEE Access, 2018

Supporting MPI-distributed stream parallel patterns in GrPPI.
Proceedings of the 25th European MPI Users' Group Meeting, 2018

Parallelizing and Optimizing LHCb-Kalman for Intel Xeon Phi KNL Processors.
Proceedings of the 26th Euromicro International Conference on Parallel, 2018

Comparison of Clang Abstract Syntax Trees using String Kernels.
Proceedings of the 2018 International Conference on High Performance Computing & Simulation, 2018

2017
Adapting concurrency throttling and voltage-frequency scaling for dense eigensolvers.
J. Supercomput., 2017

Enabling semantics to improve detection of data races and misuses of lock-free data structures.
Concurr. Comput. Pract. Exp., 2017

A generic parallel pattern interface for stream and data processing.
Concurr. Comput. Pract. Exp., 2017

A Novel String Representation and Kernel Function for the Comparison of I/O Access Patterns.
Proceedings of the Parallel Computing Technologies, 2017

Probabilistic-Based Selection of Alternate Implementations for Heterogeneous Platforms.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2017

Supporting Advanced Patterns in GrPPI, a Generic Parallel Pattern Interface.
Proceedings of the Euro-Par 2017: Parallel Processing Workshops, 2017

2016
Analyzing the energy consumption of the storage data path.
J. Supercomput., 2016

An analytical methodology to derive power models based on hardware and software metrics.
Comput. Sci. Res. Dev., 2016

CID: A Compile-Time Implementation Decider for Heterogeneous Platforms Based on C++ Attributes.
Proceedings of the 2016 Intl IEEE Conferences on Ubiquitous Intelligence & Computing, 2016

Embedding Semantics of the Single-Producer/Single-Consumer Lock-Free Queue into a Race Detection Tool.
Proceedings of the 7th International Workshop on Programming Models and Applications for Multicores and Manycores, 2016

Discovering Pipeline Parallel Patterns in Sequential Legacy C++ Codes.
Proceedings of the 7th International Workshop on Programming Models and Applications for Multicores and Manycores, 2016

Porting Matlab Applications to High-Performance C++ Codes: CPU/GPU-Accelerated Spherical Deconvolution of Diffusion MRI Data.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2016

A C++ Generic Parallel Pattern Interface for Stream Processing.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2016

2015
Evaluating the performance and energy efficiency of the COSMO-ART model system.
Comput. Sci. Res. Dev., 2015

Are our dense linear algebra libraries energy-friendly?
Comput. Sci. Res. Dev., 2015

Balancing task- and data-level parallelism to improve performance and energy consumption of matrix computations on the Intel Xeon Phi.
Comput. Electr. Eng., 2015

ARDUPOWER: A low-cost wattmeter to improve energy efficiency of HPC applications.
Proceedings of the Sixth International Green and Sustainable Computing Conference, 2015

2014
Energy-aware matrix computation on multithreaded architectures.
PhD thesis, 2014

Assessing Power Monitoring Approaches for Energy and Power Analysis of Computers.
Sustain. Comput. Informatics Syst., 2014

Tools and methods for measuring and tuning the energy efficiency of HPC systems.
Sci. Program., 2014

Block pivoting implementation of a symmetric Toeplitz solver.
J. Parallel Distributed Comput., 2014

Automatic detection of power bottlenecks in parallel scientific applications.
Comput. Sci. Res. Dev., 2014

Modeling power and energy of the task-parallel Cholesky factorization on multicore processors.
Comput. Sci. Res. Dev., 2014

Modeling power and energy consumption of dense matrix factorizations on multicore processors.
Concurr. Comput. Pract. Exp., 2014

Enhancing performance and energy consumption of runtime schedulers for dense linear algebra.
Concurr. Comput. Pract. Exp., 2014

Assessing the impact of the CPU power-saving modes on the task-parallel solution of sparse linear systems.
Clust. Comput., 2014

Evaluating Lustre's metadata server on a multi-socket platform.
Proceedings of the 9th Parallel Data Storage Workshop, 2014

2013
Energy-efficient execution of dense linear algebra algorithms on multi-core processors.
Clust. Comput., 2013

Solving Some Mysteries in Power Monitoring of Servers: Take Care of Your Wattmeters!
Proceedings of the Energy Efficiency in Large Scale Distributed Systems, 2013

Runtime Scheduling of the LU Factorization: Performance and Energy.
Proceedings of the Energy Efficiency in Large Scale Distributed Systems, 2013

2012
A simulator to assess energy saving strategies and policies in HPC workloads.
ACM SIGOPS Oper. Syst. Rev., 2012

DVFS-control techniques for dense linear algebra operations on multi-core processors.
Comput. Sci. Res. Dev., 2012

Saving Energy in the LU Factorization with Partial Pivoting on Multi-core Processors.
Proceedings of the 20th Euromicro International Conference on Parallel, 2012

Binding Performance and Power of Dense Linear Algebra Operations.
Proceedings of the 10th IEEE International Symposium on Parallel and Distributed Processing with Applications, 2012

Reducing Energy Consumption of Dense Linear Algebra Operations on Hybrid CPU-GPU Platforms.
Proceedings of the 10th IEEE International Symposium on Parallel and Distributed Processing with Applications, 2012

Leveraging Task-Parallelism in Energy-Efficient ILU Preconditioners.
Proceedings of the ICT as Key Technology against Global Warming, 2012

Tools for Power-Energy Modelling and Analysis of Parallel Scientific Applications.
Proceedings of the 41st International Conference on Parallel Processing, 2012

2011
Power-aware Dense Linear Algebra Implementations on Multi-core and Many-core Processors.
Proceedings of the 3rd Many-core Applications Research Community (MARC) Symposium, 2011

Evaluation of the Energy Performance of Dense Linear Algebra Kernels on Multi-core and Many-Core Processors.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Improving power efficiency of dense linear algebra algorithms on multi-core processors via slack control.
Proceedings of the 2011 International Conference on High Performance Computing & Simulation, 2011

2010
EnergySaving Cluster Roll: Power Saving System for Clusters.
Proceedings of the Architecture of Computing Systems, 2010

