Gabriel Falcão Paiva Fernandes

Orcid: 0000-0001-9805-6747

  • University of Coimbra, Department of Electrical and Computer Engineering, Portugal

According to our database, Gabriel Falcão Paiva Fernandes authored at least 104 papers between 2006 and 2025.

A Systematic Mapping Study on Quantum and Quantum-inspired Algorithms in Operations Research.
ACM Comput. Surv., March, 2025

Functional Validation of the RISC-V Unlimited Vector Extension.
IEEE Embed. Syst. Lett., February, 2025

Special Issue on The Past, Present, and Future of Warehouse-Scale Computing.
IEEE Micro, 2024

gem5-accel: A Pre-RTL Simulation Toolchain for Accelerator Architecture Validation.
IEEE Comput. Archit. Lett., 2024

NDPmulator: Enabling Full-System Simulation for Near-Data Accelerators From Caches to DRAM.
IEEE Access, 2024

Vehicle-to-Vehicle Charging: Model, Complexity, and Heuristics.
Proceedings of the IEEE International Conference on Communications, 2024

Highly accurate and fast YOLOv4-based polyp detection.
Expert Syst. Appl., December, 2023

To PiM or Not to PiM.
Commun. ACM, June, 2023

Enabling High-Level Design Strategies for High-Throughput and Low-Power NB-LDPC Decoders.
IEEE Des. Test, February, 2023

RedBit: An End-to-End Flexible Framework for Evaluating the Accuracy of Quantized CNNs.
CoRR, 2023

Benchmarking Convolutional Neural Network Inference on Low-Power Edge Devices.
Proceedings of the IEEE International Conference on Acoustics, 2023

pLUTo: Enabling Massively Parallel Computation In DRAM via Lookup Tables.
Dataset, July, 2022

Guest Editorial: Special Issue on Advances in Signal Processing Systems.
J. Signal Process. Syst., 2022

Uncertainty Estimation via Monte Carlo Dropout in CNN-Based mmWave MIMO Localization.
IEEE Signal Process. Lett., 2022

An Empirical Study on the Use of Quantum Computing for Financial Portfolio Optimization.
SN Comput. Sci., 2022

To PiM or Not to PiM: The case for in-memory inferencing of quantized CNNs at the edge.
ACM Queue, 2022

Compiling for Vector Extensions With Stream-Based Specialization.
IEEE Micro, 2022

Special Issue on Artificial Intelligence at the Edge.
IEEE Micro, 2022

A Survey on High-Throughput Non-Binary LDPC Decoders: ASIC, FPGA, and GPU Architectures.
IEEE Commun. Surv. Tutorials, 2022

gem5-ndp: Near-Data Processing Architecture Simulation From Low Level Caches to DRAM.
Proceedings of the 2022 IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 2022

pLUTo: Enabling Massively Parallel Computation in DRAM via Lookup Tables.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

A Compute Cache System for Signal Processing Applications.
J. Signal Process. Syst., 2021

PureMIC: A New Audio Dataset for the Classification of Musical Instruments based on Convolutional Neural Networks.
J. Signal Process. Syst., 2021

Hyperspectral Parallel Image Compression on Edge GPUs.
Remote. Sens., 2021

GPU-accelerated uncapacitated facility location and semi-dense SymStereo pipelines for piecewise-planar-based 3D reconstruction.
J. Real Time Image Process., 2021

Gbit/s Throughput Under 6.3-W Lossless Hyperspectral Image Compression on Parallel Embedded Devices.
IEEE Embed. Syst. Lett., 2021

pLUTo: In-DRAM Lookup Tables to Enable Massively Parallel General-Purpose Computation.
CoRR, 2021

Benchmarking Vulkan vs OpenGL Rendering on Low-Power Edge GPUs.
Proceedings of the International Conference on Graphics and Interaction, 2021

On the Performance of Link Space Communications using NB-LDPC Codes on Embedded Parallel Systems.
Proceedings of the 55th Asilomar Conference on Signals, Systems, and Computers, 2021

Deep Learning Architectures for Accurate Millimeter Wave Positioning in 5G.
Neural Process. Lett., 2020

1.2 Watt Classification of 3D Voxel Based Point-clouds using a CNN on a Neural Compute Stick.
Neurocomputing, 2020

Optimized Voronoi-based algorithms for parallel shortest vector computations.
IACR Cryptol. ePrint Arch., 2020

Dethroning GPS: Low-Power Accurate 5G Positioning Systems Using Machine Learning.
IEEE J. Emerg. Sel. Topics Circuits Syst., 2020

Can 5G and Machine Learning Replace the Global Positioning System?
ERCIM News, 2020

Pushing the Limits of Energy Efficiency for Non-Binary LDPC Decoders on GPUs and FPGAs.
Proceedings of the IEEE Workshop on Signal Processing Systems, 2020

Processing Convolutional Neural Networks on Cache.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

1.5GBIT/S 4.9W Hyperspectral Image Encoders on a Low-Power Parallel Heterogeneous Processing Platform.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Enhancing the Labelling of Audio Samples for Automatic Instrument Classification Based on Neural Networks.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Gbit/s Non-Binary LDPC Decoders: High-Throughput using High-Level Specifications.
Proceedings of the 28th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2020

Parallel refinement of slanted 3D reconstruction using dense stereo induced from symmetry.
J. Real Time Image Process., 2019

Heterogeneous Implementation of a Voronoi Cell-Based SVP Solver.
IEEE Access, 2019

Pragma-Oriented Parallelization of the Direct Sparse Odometry SLAM Algorithm.
Proceedings of the 27th Euromicro International Conference on Parallel, 2019

Enhancing Beamformed Fingerprint Outdoor Positioning with Hierarchical Convolutional Neural Networks.
Proceedings of the IEEE International Conference on Acoustics, 2019

Memory-Optimized Voronoi Cell-based Parallel Kernels for the Shortest Vector Problem on Lattices.
Proceedings of the 27th European Signal Processing Conference, 2019

Simulating electron wave dynamics in graphene superlattices exploiting parallel processing advantages.
Comput. Phys. Commun., 2018

Distributed Learning of CNNs on Heterogeneous CPU/GPU Architectures.
Appl. Artif. Intell., 2018

Beamformed Fingerprint Learning for Accurate Millimeter Wave Positioning.
Proceedings of the 88th IEEE Vehicular Technology Conference, 2018

Exploiting Compute Caches for Memory Bound Vector Operations.
Proceedings of the 30th International Symposium on Computer Architecture and High Performance Computing, 2018

Low-Effort Task Distribution of Stencil Computation on Heterogeneous Multi-GPUs: Simulating Graphene Superlattices.
Proceedings of the 26th Euromicro International Conference on Parallel, 2018

Data-Aided Fast Beamforming Selection for 5G.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Hybrid multi-GPU computing: accelerated kernels for segmentation and object detection with medical image processing applications.
J. Real Time Image Process., 2017

A Practical View of the State-of-the-Art of Lattice-Based Cryptanalysis.
IEEE Access, 2017

Design Space Exploration of LDPC Decoders Using High-Level Synthesis.
IEEE Access, 2017

Unreliable memory operation on a convolutional neural network processor.
Proceedings of the 2017 IEEE International Workshop on Signal Processing Systems, 2017

On the Evaluation of Energy-Efficient Deep Learning Using Stacked Autoencoders on Mobile GPUs.
Proceedings of the 25th Euromicro International Conference on Parallel, 2017

Energy-Efficient and Portable Least Squares Prediction for Image Coding on a Mobile GPU.
Proceedings of the 25th Euromicro International Conference on Parallel, 2017

SCRATCH: an end-to-end application-aware soft-GPGPU architecture and trimming tool.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

Classify 3D voxel based point-cloud using convolutional neural network on a neural compute stick.
Proceedings of the 13th International Conference on Natural Computation, 2017

Convolutional neural network on neural compute stick for voxelized point-clouds classification.
Proceedings of the 10th International Congress on Image and Signal Processing, 2017

Mobile 4K / 2K / HD video streaming supported by real-time FEC raptorQ codes.
IEEE Trans. Consumer Electron., 2016

Stacked Autoencoders Using Low-Power Accelerated Architectures for Object Recognition in Autonomous Systems.
Neural Process. Lett., 2016

Real-time HD image distortion correction in heterogeneous parallel computing systems using efficient memory access patterns.
J. Real Time Image Process., 2016

A Survey on Programmable LDPC Decoders.
IEEE Access, 2016

Optimized fast Walsh-Hadamard transform on OpenCL-GPU and OpenCL-CPU.
Proceedings of the Sixth International Conference on Image Processing Theory, 2016

Optimizing GPU Code for CPU Execution Using OpenCL and Vectorization: A Case Study on Image Coding.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2016

High-Level Designs of Complex FIR Filters on FPGAs for the SKA.
Proceedings of the 18th IEEE International Conference on High Performance Computing and Communications; 14th IEEE International Conference on Smart City; 2nd IEEE International Conference on Data Science and Systems, 2016

Enhancing Design Space Exploration by Extending CPU/GPU Specifications onto FPGAs.
ACM Trans. Embed. Comput. Syst., 2015

Distributed dense stereo matching for 3D reconstruction using parallel-based processing advantages.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Accelerating and decelerating min-sum-based gear-shift LDPC decoders.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

From low-architectural expertise up to high-throughput non-binary LDPC decoders: Optimization guidelines using high-level synthesis.
Proceedings of the 25th International Conference on Field Programmable Logic and Applications, 2015

Fast Design Space Exploration Using Vivado HLS: Non-binary LDPC Decoders.
Proceedings of the 23rd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2015

The impact of faulty memory bit cells on the decoding of spatially-coupled LDPC codes.
Proceedings of the 49th Asilomar Conference on Signals, Systems and Computers, 2015

Optimized Fast Walsh-Hadamard Transform on GPUs for non-binary LDPC decoding.
Parallel Comput., 2014

Using the GPU for fast symmetry-based dense stereo matching in high resolution images.
Proceedings of the IEEE International Conference on Acoustics, 2014

Flexible non-binary LDPC decoding on FPGAs.
Proceedings of the IEEE International Conference on Acoustics, 2014

Cooperative use of parallel processing with time or frequency-domain filtering for shape recognition.
Proceedings of the 22nd European Signal Processing Conference, 2014

Combining flexibility with low power: Dataflow and wide-pipeline LDPC decoding engines in the Gbit/s era.
Proceedings of the IEEE 25th International Conference on Application-Specific Systems, 2014

On the performance of LDPC and turbo decoder architectures with unreliable memories.
Proceedings of the 48th Asilomar Conference on Signals, Systems and Computers, 2014

Stressing the BER simulation of LDPC codes in the error floor region using GPU clusters.
Proceedings of the ISWCS 2013, 2013

Unsupervised Intrinsic Calibration from a Single Frame Using a "Plumb-Line" Approach.
Proceedings of the IEEE International Conference on Computer Vision, 2013

Near-LSPA performance at MSA complexity.
Proceedings of IEEE International Conference on Communications, 2013

Fast aberrant crypt foci segmentation on the GPU.
Proceedings of the IEEE International Conference on Acoustics, 2013

FFT-SPA non-binary LDPC decoding on GPU.
Proceedings of the IEEE International Conference on Acoustics, 2013

Portable parallel kernels for high-speed beamforming in synthetic aperture ultrasound imaging.
Proceedings of the IEEE International Conference on Acoustics, 2013

Open the Gates: Using High-level Synthesis towards programmable LDPC decoders on FPGAs.
Proceedings of the IEEE Global Conference on Signal and Information Processing, 2013

From OpenCL to gates: The FFT.
Proceedings of the IEEE Global Conference on Signal and Information Processing, 2013

Erratum to "A New Solution for Camera Calibration and Real-Time Image Distortion Correction in Medical Endoscopy-Initial Technical Evaluation".
IEEE Trans. Biomed. Eng., 2012

A New Solution for Camera Calibration and Real-Time Image Distortion Correction in Medical Endoscopy-Initial Technical Evaluation.
IEEE Trans. Biomed. Eng., 2012

Portable LDPC Decoding on Multicores Using OpenCL [Applications Corner].
IEEE Signal Process. Mag., 2012

Configurable M-factor VLSI DVB-S2 LDPC decoder architecture with optimized memory tiling design.
EURASIP J. Wirel. Commun. Netw., 2012

LDPC Decoding on the Intel SCC.
Proceedings of the 20th Euromicro International Conference on Parallel, 2012

Shortening Design Time through Multiplatform Simulations with a Portable OpenCL Golden-model: The LDPC Decoder Case.
Proceedings of the 2012 IEEE 20th Annual International Symposium on Field-Programmable Custom Computing Machines, 2012

Massively LDPC Decoding on Multicore Architectures.
IEEE Trans. Parallel Distributed Syst., 2011

Real-time DVB-S2 LDPC decoding on many-core GPU accelerators.
Proceedings of the IEEE International Conference on Acoustics, 2011

Embedded multicore architectures for LDPC decoding.
Proceedings of the 2010 International Conference on Embedded Computer Systems: Architectures, 2010

Parallel LDPC Decoding on GPUs Using a Stream-Based Computing Approach.
J. Comput. Sci. Technol., 2009

How GPUs can outperform ASICs for fast LDPC decoding.
Proceedings of the 23rd international conference on Supercomputing, 2009

Multi-core platforms for signal processing: source and channel coding.
Proceedings of the 2009 IEEE International Conference on Multimedia and Expo, 2009

Parallel LDPC Decoding on the Cell/B.E. Processor.
Proceedings of the High Performance Embedded Architectures and Compilers, 2009

Massive parallel LDPC decoding on GPU.
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008

Edge Stream Oriented LDPC Decoding.
Proceedings of the 16th Euromicro International Conference on Parallel, 2008

Scalable and parallel codec architectures for the DVB-S2 FEC system.
Proceedings of the IEEE Asia Pacific Conference on Circuits and Systems, 2008

Flexible Parallel Architecture for DVB-S2 LDPC Decoders.
Proceedings of the Global Communications Conference, 2007

HDL Library of Processing Units for Generic and DVB-S2 LDPC Decoding.
Proceedings of the SIGMAP 2006, 2006
