José Nelson Amaral
ORCID: 0000-0002-9943-1809
Affiliations:
- University of Alberta, Edmonton, Canada
According to our database, José Nelson Amaral authored at least 131 papers between 1995 and 2024.
Online presence:
- ufmg.br
- orcid.org
- dl.acm.org
Bibliography
2024
Proceedings of the 33rd ACM SIGPLAN International Conference on Compiler Construction, 2024
2023
Advancing Direct Convolution Using Convolution Slicing Optimization and ISA Extensions.
ACM Trans. Archit. Code Optim., December, 2023
Fast matrix multiplication via compiler-only layered data reorganization and intrinsic lowering.
Softw. Pract. Exp., September, 2023
ACM Trans. Archit. Code Optim., March, 2023
Proceedings of the 20th ACM SIGPLAN International Conference on Managed Programming Languages and Runtimes, 2023
Proceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization, 2023
Efficient Auto-Vectorization for Control-flow Dependent Loops through Data Permutation.
Proceedings of the 33rd Annual International Conference on Computer Science and Software Engineering, 2023
Stub Folding: Retaining Type Specialization to Increase the Efficiency of Highly Polymorphic Inline Caches.
Proceedings of the 33rd Annual International Conference on Computer Science and Software Engineering, 2023
2022
Vectorizing divergent control flow with active-lane consolidation on long-vector architectures.
J. Supercomput., 2022
Proceedings of the 32nd Annual International Conference on Computer Science and Software Engineering, 2022
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2022
2021
Methodological Principles for Reproducible Performance Evaluation in Cloud Computing.
IEEE Trans. Software Eng., 2021
ACM Trans. Archit. Code Optim., 2021
Pooling Acceleration in the DaVinci Architecture Using Im2col and Col2im Instructions.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2021
Vulkan Vision: Ray Tracing Workload Characterization using Automatic Graphics Instrumentation.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2021
Proceedings of the 31st Annual International Conference on Computer Science and Software Engineering (CASCON '21), Toronto, Ontario, Canada, November 22, 2021
2020
Flexibility Is Key in Organizing a Global Professional Conference Online: The ICPE 2020 Experience in the COVID-19 Era.
CoRR, 2020
Proceedings of the 32nd IEEE International Symposium on Computer Architecture and High Performance Computing, 2020
2019
Memory-access-aware Safety and Profitability Analysis for Transformation of Accelerator-bound OpenMP Loops.
ACM Trans. Archit. Code Optim., 2019
Proceedings of the XXIII Brazilian Symposium on Programming Languages, 2019
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2019
Proceedings of the 29th Annual International Conference on Computer Science and Software Engineering, 2019
2018
IEEE Trans. Parallel Distributed Syst., 2018
Proceedings of the 25th International Conference on Software Analysis, 2018
OpenMP Code Offloading: Splitting GPU Kernels, Pipelining Communication and Computation, and Selecting Better Grid Geometries.
Proceedings of the Accelerator Programming Using Directives - 5th International Workshop, 2018
Proceedings of the 30th International Symposium on Computer Architecture and High Performance Computing, 2018
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2018
Proceedings of the 47th International Conference on Parallel Processing, 2018
2017
PeerJ Prepr., 2017
Performance Evaluation of Thread-Level Speculation in Off-the-Shelf Hardware Transactional Memories.
Proceedings of the Euro-Par 2017: Parallel Processing - 23rd International Conference on Parallel and Distributed Computing, Santiago de Compostela, Spain, August 28, 2017
2016
IEEE Trans. Parallel Distributed Syst., 2016
The Truth, The Whole Truth, and Nothing But the Truth: A Pragmatic Guide to Assessing Empirical Evaluations.
ACM Trans. Program. Lang. Syst., 2016
Softw. Pract. Exp., 2016
Study of hardware transactional memory characteristics and serialization policies on Haswell.
Parallel Comput., 2016
Using shared-data localization to reduce the cost of inspector-execution in unified-parallel-C programs.
Parallel Comput., 2016
Evaluating and Improving Thread-Level Speculation in Hardware Transactional Memories.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016
2015
IEEE Trans. Computers, 2015
J. Parallel Distributed Comput., 2015
Proceedings of the 2015 International Symposium on Computer Architecture and High Performance Computing Workshops, 2015
Proceedings of the 27th International Symposium on Computer Architecture and High Performance Computing, 2015
Stratified Sampling for Even Workload Partitioning Applied to IDA* and Delaunay Algorithms.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015
Stratified sampling for even workload partitioning applied to single source shortest path algorithm.
Proceedings of the 25th Annual International Conference on Computer Science and Software Engineering, 2015
Proceedings of the 25th Annual International Conference on Computer Science and Software Engineering, 2015
2014
Concurr. Comput. Pract. Exp., 2014
Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing, 2014
Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing, 2014
Proceedings of the 11th Working Conference on Mining Software Repositories, 2014
Proceedings of the 43rd International Conference on Parallel Processing, 2014
Proceedings of the 2014 IEEE International Conference on Data Mining, 2014
Proceedings of the 21st International Conference on High Performance Computing, 2014
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014
2013
Proceedings of the third ACM SIGPLAN X10 Workshop, 2013
Improving performance of all-to-all communication through loop scheduling in PGAS environments.
Proceedings of the International Conference on Supercomputing, 2013
Proceedings of the International Conference on Supercomputing, 2013
Proceedings of the 42nd International Conference on Parallel Processing, 2013
Proceedings of the First International Workshop on Code Optimisation for Multi and Many Cores, 2013
Proceedings of the Center for Advanced Studies on Collaborative Research, 2013
2012
Combined profiling: A methodology to capture varied program behavior across multiple inputs.
Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software, 2012
Proceedings of the Center for Advanced Studies on Collaborative Research, 2012
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012
2011
ACM Trans. Embed. Comput. Syst., 2011
Combined profiling: practical collection of feedback information for code optimization.
Proceedings of the ICPE'11, 2011
Proceedings of the CGO 2011, 2011
Proceedings of the Center for Advanced Studies on Collaborative Research, 2011
2010
IEEE Trans. Computers, 2010
Proceedings of the 22nd International Symposium on Computer Architecture and High Performance Computing, 2010
Proceedings of the Advances in Data Mining. Applications and Theoretical Aspects, 2010
Proceedings of the Compiler Construction, 19th International Conference, 2010
Proceedings of 3rd Workshop on General Purpose Processing on Graphics Processing Units, 2010
2009
Proceedings of the CGO 2009, 2009
2008
Proceedings of the 7th International Symposium on Memory Management, 2008
The MAP3S Static-and-Regular Mesh Simulation and Wavefront Parallel-Programming Patterns.
Proceedings of the 2008 International Conference on Parallel Processing, 2008
Proceedings of the Euro-Par 2008, 2008
2007
ACM Trans. Program. Lang. Syst., 2007
Softw. Pract. Exp., 2007
Using SIMD registers and instructions to enable instruction-level parallelism in sorting algorithms.
Proceedings of the 19th Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2007), 2007
Proceedings of the Languages and Compilers for Parallel Computing, 2007
Proceedings of the Languages and Compilers for Parallel Computing, 2007
Proceedings of the High Performance Embedded Architectures and Compilers, 2007
Proceedings of the Fifth International Symposium on Code Generation and Optimization (CGO 2007), 2007
2006
J. Univers. Comput. Sci., 2006
Proceedings of the ACM SIGPLAN 2006 Conference on Programming Language Design and Implementation, 2006
Proceedings of the Languages and Compilers for Parallel Computing, 2006
Proceedings of the Languages and Compilers for Parallel Computing, 2006
Proceedings of the 2006 IEEE International Symposium on Performance Analysis of Systems and Software, 2006
A Parallel External-Memory Frontier Breadth-First Traversal Algorithm for Clusters of Workstations.
Proceedings of the 2006 International Conference on Parallel Processing (ICPP 2006), 2006
Proceedings of the 2006 conference of the Centre for Advanced Studies on Collaborative Research, 2006
Proceedings, 2006
2005
IEEE Trans. Educ., 2005
Proceedings of the 17th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2005), 2005
Proceedings of the NETWORKING 2005: Networking Technologies, 2005
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2005), 2005
Proceedings of the 34th International Conference on Parallel Processing Workshops (ICPP 2005 Workshops), 2005
Proceedings of the Compiler Construction, 14th International Conference, 2005
2004
Microprocess. Microsystems, 2004
A performance study of data layout techniques for improving data locality in refinement-based pathfinding.
ACM J. Exp. Algorithmics, 2004
Proceedings of the ACM/SIGDA 12th International Symposium on Field Programmable Gate Arrays, 2004
Proceedings of the 2004 conference of the Centre for Advanced Studies on Collaborative research, 2004
2003
Minimum Register Instruction Sequencing to Reduce Register Spills in Out-of-Order Issue Superscalar Architectures.
IEEE Trans. Computers, 2003
Implementation of the EARTH programming model on SMP clusters: a multi-threaded language and runtime system.
Concurr. Comput. Pract. Exp., 2003
Proceedings of the Languages and Compilers for Parallel Computing, 2003
Crafting Data Structures: A Study of Reference Locality in Refinement-Based Pathfinding.
Proceedings of the High Performance Computing - HiPC 2003, 10th International Conference, 2003
Proceedings of the Field Programmable Logic and Application, 13th International Conference, 2003
Proceedings of the 2003 conference of the Centre for Advanced Studies on Collaborative Research, 2003
2002
On the Tamability of the Location Consistency Memory Model.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 2002
Proceedings of the Languages and Compilers for Parallel Computing, 15th Workshop, 2002
Proceedings of the Languages and Compilers for Parallel Computing, 15th Workshop, 2002
2001
Parallel Process. Lett., 2001
An Abstract State Machine Specification and Verification of the Location Consistency Memory Model and Cache Protocol.
J. Univers. Comput. Sci., 2001
Exploiting Locality in Single Assignment Data Structures Updated Through Split-Phase Transactions.
Clust. Comput., 2001
Minimum Register Instruction Sequence Problem: Revisiting Optimal Code Generation for DAGs.
Proceedings of the 15th International Parallel & Distributed Processing Symposium (IPDPS-01), 2001
Proceedings of the Compiler Construction, 10th International Conference, 2001
2000
Proceedings of the High Performance Computing, Third International Symposium, 2000
Caching Single-Assignment Structures to Build a Robust Fine-Grain Multi-Threading System.
Proceedings of the 14th International Parallel & Distributed Processing Symposium (IPDPS'00), 2000
Proceedings of the 14th international conference on Supercomputing, 2000
1999
Proceedings of the High Performance Computing, Second International Symposium, 1999
1997
Invariant pattern recognition of 2D images using neural networks and frequency-domain representation.
Proceedings of the International Conference on Neural Networks (ICNN'97), 1997
1996
IEEE Trans. Parallel Distributed Syst., 1996
1995
IEEE Trans. Syst. Man Cybern., 1995
Performance measurements of a concurrent production system architecture without global synchronization.
Proceedings of IPPS '95, 1995