José Nelson Amaral

Fernando Magno Quintão Pereira

Proceedings of the XXIII Brazilian Symposium on Programming Languages, 2019

Toward an Analytical Performance Model to Select between GPU and CPU Execution.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2019

Compiler-driven performance workshop.

Proceedings of the 29th Annual International Conference on Computer Science and Software Engineering, 2019

2018

Using Hardware-Transactional-Memory Support to Implement Thread-Level Speculation.

IEEE Trans. Parallel Distributed Syst., 2018

Syntax and sensibility: Using language models to detect and correct syntax errors.

Eddie Antonio Santos

Dhvani Patel

Proceedings of the 25th International Conference on Software Analysis, 2018

OpenMP Code Offloading: Splitting GPU Kernels, Pipelining Communication and Computation, and Selecting Better Grid Geometries.

Artem Chikin

Tyler Gobran

Joao Henrique Stange Hoffmam

Proceedings of the Accelerator Programming Using Directives - 5th International Workshop, 2018

Automated GPU Grid Geometry Selection for OpenMP Kernels.

Proceedings of the 30th International Symposium on Computer Architecture and High Performance Computing, 2018

The Alberta Workloads for the SPEC CPU 2017 Benchmark Suite.

Marcus Karpoff

Erick Ochoa

Morgan Redshaw

Raphael Ernani Rodrigues

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2018

Run-Length Base-Delta Encoding for High-Speed Compression.

Proceedings of the 47th International Conference on Parallel Processing, 2018

2017

Finding and correcting syntax errors using recurrent neural networks.

Eddie A. Santos

PeerJ Prepr., 2017

Performance Evaluation of Thread-Level Speculation in Off-the-Shelf Hardware Transactional Memories.

Proceedings of the Euro-Par 2017: Parallel Processing - 23rd International Conference on Parallel and Distributed Computing, Santiago de Compostela, Spain, August 28, 2017

2016

Combining Static and Dynamic Data Coalescing in Unified Parallel C.

IEEE Trans. Parallel Distributed Syst., 2016

The Truth, The Whole Truth, and Nothing But the Truth: A Pragmatic Guide to Assessing Empirical Evaluations.

Sebastian Fischmeister

ACM Trans. Program. Lang. Syst., 2016

SafeType: detecting type violations for type-based alias analysis of C.

Softw. Pract. Exp., 2016

Study of hardware transactional memory characteristics and serialization policies on Haswell.

Márcio Machado Pereira

Parallel Comput., 2016

Using shared-data localization to reduce the cost of inspector-execution in unified-parallel-C programs.

Parallel Comput., 2016

Evaluating and Improving Thread-Level Speculation in Hardware Transactional Memories.

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

2015

Software Support and Evaluation of Hardware Transactional Memory on Blue Gene/Q.

IEEE Trans. Computers, 2015

Error location in Python: where the mutants hide.

PeerJ Prepr., 2015

Hybrid parallel task placement in irregular applications.

J. Parallel Distributed Comput., 2015

Guest Editorial: SBAC-PAD 2013.

Int. J. Parallel Program., 2015

In defense of soundiness: a manifesto.

Dimitrios Vardoulakis

Commun. ACM, 2015

Using Hardware Transactional Memory to Enable Speculative Trace Optimization.

Proceedings of the 2015 International Symposium on Computer Architecture and High Performance Computing Workshops, 2015

Serialization Management for Best-Effort Hardware Transactional Memory.

Proceedings of the 27th International Symposium on Computer Architecture and High Performance Computing, 2015

Stratified Sampling for Even Workload Partitioning Applied to IDA* and Delaunay Algorithms.

Levi H. S. Lelis

Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Stratified sampling for even workload partitioning applied to single source shortest path algorithm.

Levi H. S. Lelis

Proceedings of the 25th Annual International Conference on Computer Science and Software Engineering, 2015

Data-dependence profiling to enable safe thread level speculation.

Arnamoy Bhattacharyya

Hal Finkel

Proceedings of the 25th Annual International Conference on Computer Science and Software Engineering, 2015

2014

A special issue from the international conference on performance engineering 2013.

A. J. Field

Concurr. Comput. Pract. Exp., 2014

Multi-dimensional Evaluation of Haswell's Transactional Memory Performance.

Márcio Machado Pereira

Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing, 2014

Reducing Compiler-Inserted Instrumentation in Unified-Parallel-C Code Generation.

Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing, 2014

Syntax errors just aren't natural: improving error reporting with language models.

Proceedings of the 11th Working Conference on Mining Software Repositories, 2014

Measuring Effective Work to Reward Success in Dynamic Transaction Scheduling.

Márcio Machado Pereira

Proceedings of the 43rd International Conference on Parallel Processing, 2014

Heavyweight Pattern Mining in Attributed Flow Graphs.

Carolina Simoes Gomes

Proceedings of the 2014 IEEE International Conference on Data Mining, 2014

Optimizing shared data accesses in distributed-memory X10 systems.

Olivier Tardieu

Proceedings of the 21st International Conference on High Performance Computing, 2014

Stratified sampling for even workload partitioning.

Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2013

Hybrid parallel task placement in X10.

Olivier Tardieu

Proceedings of the third ACM SIGPLAN X10 Workshop, 2013

Improving performance of all-to-all communication through loop scheduling in PGAS environments.

Proceedings of the International Conference on Supercomputing, 2013

Improving communication in PGAS environments: static and dynamic coalescing in UPC.

Proceedings of the International Conference on Supercomputing, 2013

On the Merits of Distributed Work-Stealing on Selective Locality-Aware Tasks.

Olivier Tardieu

Proceedings of the 42nd International Conference on Parallel Processing, 2013

Automatic speculative parallelization of loops using polyhedral dependence analysis.

Arnamoy Bhattacharyya

Proceedings of the First International Workshop on Code Optimisation for Multi and Many Cores, 2013

12th Compiler-Driven Performance Workshop.

Proceedings of the Center for Advanced Studies on Collaborative Research, 2013

2012

Combined profiling: A methodology to capture varied program behavior across multiple inputs.

Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software, 2012

11th Compiler-Driven Performance Workshop.

Proceedings of the Center for Advanced Studies on Collaborative Research, 2012

Evaluation of Blue Gene/Q hardware support for transactional memories.

Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

Transactional event profiling in a best-effort hardware transactional memory system.

Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011

Evaluating address register assignment and offset assignment algorithms.

ACM Trans. Embed. Comput. Syst., 2011

Combined profiling: practical collection of feedback information for code optimization.

Adam Preuss

Proceedings of the ICPE'11, 2011

Using machines to learn method-specific compilation strategies.

Ricardo Nabinger Sanchez

Proceedings of the CGO 2011, 2011

10th Workshop on Compiler-Driven Performance.

Proceedings of the Center for Advanced Studies on Collaborative Research, 2011

2010

An Optimal Encoding to Represent a Single Set in an ROBDD.

Ondrej Lhoták

IEEE Trans. Computers, 2010

Using Support Vector Machines to Learn How to Compile a Method.

Ricardo Nabinger Sanchez

Proceedings of the 22nd International Symposium on Computer Architecture and High Performance Computing, 2010

Mining for Paths in Flow Graphs.

Adam Jocksch

Marcel Mitran

Proceedings of the Advances in Data Mining. Applications and Theoretical Aspects, 2010

Mining Opportunities for Code Improvement in a Just-In-Time Compiler.

Proceedings of the Compiler Construction, 19th International Conference, 2010

Compiling Python to a hybrid execution environment.

Rahul Garg

Proceedings of the 3rd Workshop on General Purpose Processing on Graphics Processing Units, 2010

2009

Using XBDDs and ZBDDs in points-to analysis.

Ondrej Lhoták

Softw. Pract. Exp., 2009

Workload Reduction for Multi-input Feedback-Directed Optimization.

Proceedings of the CGO 2009, 2009

2008

A cache-based internet protocol address lookup architecture.

Comput. Networks, 2008

MPADS: memory-pooling-assisted data splitting.

Proceedings of the 7th International Symposium on Memory Management, 2008

The MAP3S Static-and-Regular Mesh Simulation and Wavefront Parallel-Programming Patterns.

Duane Szafron

Proceedings of the 2008 International Conference on Parallel Processing, 2008

Topic 9: Parallel and Distributed Programming.

Joaquim Gabarró

Proceedings of the Euro-Par 2008, 2008

2007

Forma: A framework for safe automatic array reshaping.

ACM Trans. Program. Lang. Syst., 2007

Ablego: a function outlining and partial inlining framework.

Softw. Pract. Exp., 2007

Using SIMD registers and instructions to enable instruction-level parallelism in sorting algorithms.

Timothy Furtak

Proceedings of the 19th Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2007), 2007

Using ZBDDs in Points-to Analysis.

Ondrej Lhoták

Proceedings of the Languages and Compilers for Parallel Computing, 2007

Multidimensional Blocking in UPC.

Proceedings of the Languages and Compilers for Parallel Computing, 2007

Evaluation of Offset Assignment Heuristics.

Proceedings of the High Performance Embedded Architectures and Compilers, 2007

A Dimension Abstraction Approach to Vectorization in Matlab.

Neil Birkbeck

Jonathan Levesque

Proceedings of the Fifth International Symposium on Code Generation and Optimization (CGO 2007), 2007

2006

Is MPI suitable for a generative design-pattern system?

Paras Mehta

Duane Szafron

Parallel Comput., 2006

Eliminating Redundant Join-Set Computations in Static Single Assignment.

Angela French

J. Univers. Comput. Sci., 2006

Shared memory programming for large scale machines.

Siddhartha Chatterjee

Proceedings of the ACM SIGPLAN 2006 Conference on Programming Language Design and Implementation, 2006

A Characterization of Shared Data Access Patterns in UPC Programs.

Christopher Barton

Calin Cascaval

Proceedings of the Languages and Compilers for Parallel Computing, 2006

Tree-Traversal Orientation Analysis.

Kevin Andrusky

Proceedings of the Languages and Compilers for Parallel Computing, 2006

Aestimo: a feedback-directed optimization evaluation tool.

Proceedings of the 2006 IEEE International Symposium on Performance Analysis of Systems and Software, 2006

A Parallel External-Memory Frontier Breadth-First Traversal Algorithm for Clusters of Workstations.

Proceedings of the 2006 International Conference on Parallel Processing (ICPP 2006), 2006

Utilizing field usage patterns for Java heap space optimization.

Proceedings of the 2006 conference of the Centre for Advanced Studies on Collaborative Research, 2006

Sequential and Parallel Algorithms for Frontier A* with Delayed Duplicate Detection.

Proceedings of the Proceedings, 2006

2005

Teaching digital design to computing science students in a single academic term.

Paras Mehta

IEEE Trans. Educ., 2005

Function Outlining and Partial Inlining.

Proceedings of the 17th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2005), 2005

A Multizone Pipelined Cache for IP Routing.

Proceedings of the NETWORKING 2005: Networking Technologies, 2005

A hardware-based longest prefix matching scheme for TCAMs.

Proceedings of the International Symposium on Circuits and Systems (ISCAS 2005), 2005

Feedback-Directed Switch-Case Statement Optimization.

Proceedings of the 34th International Conference on Parallel Processing Workshops (ICPP 2005 Workshops), 2005

Generalized Index-Set Splitting.

Proceedings of the Compiler Construction, 14th International Conference, 2005

2004

FPGA implementation and experimental evaluation of a multizone network cache.

Mike H. MacGregor

Microprocess. Microsystems, 2004

A performance study of data layout techniques for improving data locality in refinement-based pathfinding.

ACM J. Exp. Algorithmics, 2004

An FPGA prototype for the experimental evaluation of a multizone network cache.

Mike H. MacGregor

Proceedings of the ACM/SIGDA 12th International Symposium on Field Programmable Gate Arrays, 2004

Identifying opportunities for automatic remote field cloning.

Proceedings of the 2004 conference of the Centre for Advanced Studies on Collaborative Research, 2004

2003

Minimum Register Instruction Sequencing to Reduce Register Spills in Out-of-Order Issue Superscalar Architectures.

Ramaswamy Govindarajan

IEEE Trans. Computers, 2003

Implementation of the EARTH programming model on SMP clusters: a multi-threaded language and runtime system.

Concurr. Comput. Pract. Exp., 2003

To Inline or Not to Inline? Enhanced Inlining Decisions.

Proceedings of the Languages and Compilers for Parallel Computing, 2003

Crafting Data Structures: A Study of Reference Locality in Refinement-Based Pathfinding.

Proceedings of the High Performance Computing - HiPC 2003, 10th International Conference, 2003

The Bank Nth Chance Replacement Policy for FPGA-Based CAMs.

Proceedings of the Field Programmable Logic and Application, 13th International Conference, 2003

Should potential loop optimizations influence inlining decisions?

Christopher Barton

Bob Blainey

Proceedings of the 2003 conference of the Centre for Advanced Studies on Collaborative Research, 2003

2002

On the Tamability of the Location Consistency Memory Model.

Charles Wallace

Guy Tremblay

Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 2002

Fine-Grain Stacked Register Allocation for the Itanium Architecture.

Alban Douillet

Guang R. Gao

Proceedings of the Languages and Compilers for Parallel Computing, 15th Workshop, 2002

Removing Impediments to Loop Fusion Through Code Transformations.

Bob Blainey

Christopher Barton

Proceedings of the Languages and Compilers for Parallel Computing, 15th Workshop, 2002

2001

Dynamic Load Balancers for a Multithreaded Multiprocessor System.

Parallel Process. Lett., 2001

An Abstract State Machine Specification and Verification of the Location Consistency Memory Model and Cache Protocol.

Charles Wallace

Guy Tremblay

Fernando César Comparsi de Castro

J. Univers. Comput. Sci., 2001

Exploiting Locality in Single Assignment Data Structures Updated Through Split-Phase Transactions.

Clust. Comput., 2001

Minimum Register Instruction Sequence Problem: Revisiting Optimal Code Generation for DAGs.

Ramaswamy Govindarajan

Proceedings of the 15th International Parallel & Distributed Processing Symposium (IPDPS-01), 2001

Speculative Prefetching of Induction Pointers.

Proceedings of the Compiler Construction, 10th International Conference, 2001

2000

Design and Implementation of an Efficient Thread Partitioning Algorithm.

Proceedings of the High Performance Computing, Third International Symposium, 2000

Caching Single-Assignment Structures to Build a Robust Fine-Grain Multi-Threading System.

Proceedings of the 14th International Parallel & Distributed Processing Symposium (IPDPS'00), 2000

Automatic compiler techniques for thread coarsening for multithreaded architectures.

Proceedings of the 14th international conference on Supercomputing, 2000

1999

Coping with very High Latencies in Petaflop Computer Systems.

Proceedings of the High Performance Computing, Second International Symposium, 1999

1997

Invariant pattern recognition of 2D images using neural networks and frequency-domain representation.

Paulo Roberto Girardello Franco

Proceedings of the International Conference on Neural Networks (ICNN'97), 1997

1996

A Concurrent Architecture for Serializable Production Systems.

Joydeep Ghosh

IEEE Trans. Parallel Distributed Syst., 1996

1995

Designing genetic algorithms for the state assignment problem.

Kagan Tumer

Joydeep Ghosh

IEEE Trans. Syst. Man Cybern., 1995

Performance measurements of a concurrent production system architecture without global synchronization.
