Juan Gómez-Luna
Orcid: 0000-0002-6514-1571
According to our database, Juan Gómez-Luna authored at least 139 papers between 2008 and 2024.
Bibliography
2024
PyGim: An Efficient Graph Neural Network Library for Real Processing-In-Memory Architectures.
Proc. ACM Meas. Anal. Comput. Syst., December, 2024
IEEE Trans. Computers, September, 2024
ACM Trans. Archit. Code Optim., September, 2024
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., March, 2024
ApHMM: Accelerating Profile Hidden Markov Models for Fast and Energy-efficient Genome Analysis.
ACM Trans. Archit. Code Optim., March, 2024
Hardware Acceleration for Knowledge Graph Processing: Challenges & Recent Developments.
CoRR, 2024
Analysis of Distributed Optimization Algorithms on a Real Processing-In-Memory System.
CoRR, 2024
PUMA: Efficient and Low-Cost Memory Allocation and Alignment Support for Processing-Using-Memory Architectures.
CoRR, 2024
MIMDRAM: An End-to-End Processing-Using-DRAM System for High-Throughput, Energy-Efficient and Programmer-Transparent Multiple-Instruction Multiple-Data Processing.
CoRR, 2024
IEEE Access, 2024
SwiftRL: Towards Efficient Reinforcement Learning on Real Processing-In-Memory Systems.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2024
Functionally-Complete Boolean Logic in Real DRAM Chips: Experimental Characterization and Analysis.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024
MIMDRAM: An End-to-End Processing-Using-DRAM System for High-Throughput, Energy-Efficient and Programmer-Transparent Multiple-Instruction Multiple-Data Computing.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024
Simultaneous Many-Row Activation in Off-the-Shelf DRAM Chips: Experimental Characterization and Analysis.
Proceedings of the 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2024
Read Disturbance in High Bandwidth Memory: A Detailed Experimental Study on HBM2 DRAM Chips.
Proceedings of the 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2024
PIM-Opt: Demystifying Distributed Optimization Algorithms on a Real-World Processing-In-Memory System.
Proceedings of the 2024 International Conference on Parallel Architectures and Compilation Techniques, 2024
2023
J. Supercomput., May, 2023
Scrooge: a fast and memory-frugal genomic sequence aligner for CPUs, GPUs, and ASICs.
Bioinform., May, 2023
A framework for high-throughput sequence alignment using real processing-in-memory systems.
Bioinform., May, 2023
ACM Trans. Archit. Code Optim., March, 2023
IEEE Trans. Emerg. Top. Comput., 2023
PULSAR: Simultaneous Many-Row Activation for Reliable and High-Performance Computing in Off-the-Shelf DRAM Chips.
CoRR, 2023
Understanding Read Disturbance in High Bandwidth Memory: An Experimental Analysis of Real HBM2 DRAM Chips.
CoRR, 2023
TransPimLib: A Library for Efficient Transcendental Functions on Processing-in-Memory Systems.
CoRR, 2023
Extending Memory Capacity in Modern Consumer Systems With Emerging Non-Volatile Memory: Experimental Analysis and Characterization Using the Intel Optane SSD.
IEEE Access, 2023
IEEE Access, 2023
Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2023
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2023
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2023
Venice: Improving Solid-State Drive Parallelism at Low Cost via Conflict-Free Accesses.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023
Proceedings of the IEEE International Symposium on Workload Characterization, 2023
SPARTA: Spatial Acceleration for Efficient and Scalable Horizontal Diffusion Weather Stencil Computation.
Proceedings of the 37th International Conference on Supercomputing, 2023
Proceedings of the 53rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2023
Proceedings of the 32nd International Conference on Parallel Architectures and Compilation Techniques, 2023
2022
Dataset, July, 2022
ACM Trans. Reconfigurable Technol. Syst., 2022
SparseP: Towards Efficient Sparse Matrix Vector Multiplication on Real Processing-In-Memory Architectures.
Proc. ACM Meas. Anal. Comput. Syst., 2022
Accelerating Neural Network Inference With Processing-in-DRAM: From the Edge to the Cloud.
IEEE Micro, 2022
J. Real Time Image Process., 2022
CoRR, 2022
RevaMp3D: Architecting the Processor Core and Cache Hierarchy for Systems with Monolithically-Integrated Logic and Memory.
CoRR, 2022
An Experimental Evaluation of Machine Learning Training on a Real Processing-in-Memory System.
CoRR, 2022
Going From Molecules to Genomic Variations to Scientific Discovery: Intelligent Algorithms and Architectures for Intelligent Genome Analysis.
CoRR, 2022
Towards Efficient Sparse Matrix Vector Multiplication on Real Processing-In-Memory Systems.
CoRR, 2022
SparseP: Towards Efficient Sparse Matrix Vector Multiplication on Real Processing-In-Memory Systems.
CoRR, 2022
Benchmarking a New Paradigm: Experimental Analysis and Characterization of a Real Processing-in-Memory System.
IEEE Access, 2022
Towards Efficient Sparse Matrix Vector Multiplication on Real Processing-In-Memory Architectures.
Proceedings of the SIGMETRICS/PERFORMANCE '22: ACM SIGMETRICS/IFIP PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems, Mumbai, India, June 6, 2022
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022
Morpheus: Extending the Last Level Cache Capacity in GPU Systems Using Idle GPU Core Resources.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022
Flash-Cosmos: In-Flash Bulk Bitwise Operations Using Inherent Computation Capability of NAND Flash Memory.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022
Methodologies, Workloads, and Tools for Processing-in-Memory: Enabling the Adoption of Data-Centric Architectures.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2022
Heterogeneous Data-Centric Architectures for Modern Data-Intensive Applications: Case Studies in Machine Learning and Databases.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2022
PiDRAM: An FPGA-based Framework for End-to-end Evaluation of Processing-in-DRAM Techniques.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2022
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2022
SparseP: Efficient Sparse Matrix Vector Multiplication on Real Processing-In-Memory Architectures.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2022
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2022
SeGraM: a universal hardware accelerator for genomic sequence-to-graph and sequence-to-sequence mapping.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022
Sibyl: adaptive and extensible data placement in hybrid storage systems using online reinforcement learning.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022
High-throughput Pairwise Alignment with the Wavefront Algorithm using Processing-in-Memory.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022
LEAPER: Fast and Accurate FPGA-based System Performance Prediction via Transfer Learning.
Proceedings of the IEEE 40th International Conference on Computer Design, 2022
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2022
2021
IEEE Micro, 2021
Extending Memory Capacity in Consumer Devices with Emerging Non-Volatile Memory: An Experimental Study.
CoRR, 2021
CoRR, 2021
Benchmarking a New Paradigm: An Experimental Analysis of a Real Processing-in-Memory Architecture.
CoRR, 2021
pLUTo: In-DRAM Lookup Tables to Enable Massively Parallel General-Purpose Computation.
CoRR, 2021
SISA: Set-Centric Instruction Set Architecture for Graph Mining on Processing-in-Memory Systems.
CoRR, 2021
BurstLink: Techniques for Energy-Efficient Conventional and Virtual Reality Video Display.
CoRR, 2021
SneakySnake: a fast and accurate universal genome pre-alignment filter for CPUs, GPUs and FPGAs.
Bioinform., 2021
DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks.
IEEE Access, 2021
BurstLink: Techniques for Energy-Efficient Video Display for Conventional and Virtual Reality Systems.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021
SISA: Set-Centric Instruction Set Architecture for Graph Mining on Processing-in-Memory Systems.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021
IChannels: Exploiting Current Management Mechanisms to Create Covert Channels in Modern Processors.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021
CODIC: A Low-Cost Substrate for Enabling Custom In-DRAM Functionalities and Optimizations.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021
Benchmarking Memory-Centric Computing Systems: Analysis of Real Processing-In-Memory Hardware.
Proceedings of the 12th International Green and Sustainable Computing Workshops, 2021
Proceedings of the FPGA '21: The 2021 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Virtual Event, USA, February 28, 2021
Proceedings of the ASPLOS '21: 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2021
2020
Accelerating B-spline interpolation on GPUs: Application to medical image registration.
Comput. Methods Programs Biomed., 2020
Comput. Methods Programs Biomed., 2020
Comput. Electr. Eng., 2020
Accelerating Chan-Vese model with cross-modality guided contrast enhancement for liver segmentation.
Comput. Biol. Medicine, 2020
FIGARO: Improving System Performance via Fine-Grained In-DRAM Data Relocation and Caching.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020
GenASM: A High-Performance, Low-Power Approximate String Matching Acceleration Framework for Genome Sequence Analysis.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020
Proceedings of the 38th IEEE International Conference on Computer Design, 2020
NERO: A Near High-Bandwidth Memory Stencil Accelerator for Weather Prediction Modeling.
Proceedings of the 30th International Conference on Field-Programmable Logic and Applications, 2020
Boyi: A Systematic Framework for Automatically Deciding the Right Execution Model of OpenCL Applications on FPGAs.
Proceedings of the FPGA '20: The 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2020
2019
Microprocess. Microsystems, 2019
CoRR, 2019
Analysis and Modeling of Collaborative Execution Strategies for Heterogeneous CPU-FPGA Architectures.
Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering, 2019
SMASH: Co-designing Software Compression and Hardware-Accelerated Indexing for Efficient Sparse Matrix Operations.
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019
NAPEL: Near-Memory Computing Application Performance Prediction via Ensemble Learning.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019
Proceedings of the 56th Annual Design Automation Conference 2019, 2019
Automatic Generation of Warp-Level Primitives and Atomic Instructions for Fast and Portable Parallel Reduction on GPUs.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2019
2018
J. Parallel Distributed Comput., 2018
High-Performance Computation of Bézier Surfaces on Parallel and Heterogeneous Platforms.
Int. J. Parallel Program., 2018
Enabling Efficient RDMA-based Synchronous Mirroring of Persistent Memory Transactions.
CoRR, 2018
CoRR, 2018
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018
Proceedings of the 16th USENIX Conference on File and Storage Technologies, 2018
2017
J. Parallel Distributed Comput., 2017
Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering, 2017
Proceedings of the 2017 IEEE International Symposium on Performance Analysis of Systems and Software, 2017
Proceedings of the International Conference on Computational Science, 2017
2016
IEEE Trans. Computers, 2016
Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2016
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016
Evaluating the effect of last-level cache sharing on integrated GPU-CPU systems with heterogeneous applications.
Proceedings of the 2016 IEEE International Symposium on Workload Characterization, 2016
2015
Calculation of dense trajectory descriptors on a heterogeneous embedded architecture.
J. Syst. Archit., 2015
Proceedings of the 44th International Conference on Parallel Processing, 2015
2014
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2014
Proceedings of the International Conference on High Performance Computing & Simulation, 2014
Proceedings of the 2014 Conference on Design and Architectures for Signal and Image Processing, 2014
2013
IEEE Trans. Parallel Distributed Syst., 2013
Proceedings of the 2012 International Conference on Reconfigurable Computing and FPGAs, 2013
Simulation and architecture improvements of atomic operations on GPU scratchpad memory.
Proceedings of the 2013 IEEE 31st International Conference on Computer Design, 2013
2012
Performance models for asynchronous data transfers on consumer Graphics Processing Units.
J. Parallel Distributed Comput., 2012
2011
Load Balancing versus Occupancy Maximization on Graphics Processing Units: The Generalized Hough Transform as a Case Study.
Int. J. High Perform. Comput. Appl., 2011
Proceedings of the IT Revolutions, 2011
Proceedings of the Applications, Tools and Techniques on the Road to Exascale Computing, Proceedings of the conference ParCo 2011, 31 August, 2011
2010
Proceedings of the Trends in Applied Intelligent Systems, 2010
2009
Proceedings of the ReConFig'09: 2009 International Conference on Reconfigurable Computing and FPGAs, 2009
Parallelization of a Video Segmentation Algorithm on CUDA-Enabled Graphics Processing Units.
Proceedings of the Euro-Par 2009 Parallel Processing, 2009
2008
Proceedings of the 8th IEEE International Conference on Advanced Learning Technologies, 2008