Martin Burtscher

Orcid: 0000-0001-7717-3354

According to our database, Martin Burtscher authored at least 114 papers between 1999 and 2024.

Bibliography

2024
Indigo3: A Parallel Graph Analytics Benchmark Suite for Exploring Implementation Styles and Common Bugs.
ACM Trans. Parallel Comput., September, 2024

Lessons Learned on the Path to Guaranteeing the Error Bound in Lossy Quantizers.
CoRR, 2024

HiRace: Accurate and Fast Source-Level Race Checking of GPU Programs.
CoRR, 2024

A Deep Dive into Task-Based Parallelism in Python.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

Adaptive Per-File Lossless Compression of Floating-Point Data.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

Using Machine Learning to Predict Effective Compression Algorithms for Heterogeneous Datasets.
Proceedings of the Data Compression Conference, 2024

LICO: An Effective, High-Speed, Lossless Compressor for Images.
Proceedings of the Data Compression Conference, 2024

2023
Choosing the Best Parallelization and Implementation Styles for Graph Analytics Codes: Lessons Learned from 1106 Programs.
Proceedings of the International Conference for High Performance Computing, 2023

A High-Performance MST Implementation for GPUs.
Proceedings of the International Conference for High Performance Computing, 2023

A GPU Algorithm for Detecting Strongly Connected Components.
Proceedings of the International Conference for High Performance Computing, 2023

2022
Improving the Speed and Quality of Parallel Graph Coloring.
ACM Trans. Parallel Comput., 2022

Parla: A Python Orchestration System for Heterogeneous Architectures.
Proceedings of the SC22: International Conference for High Performance Computing, 2022

Compressed In-memory Graphs for Accelerating GPU-based Analytics.
Proceedings of the 12th IEEE/ACM Workshop on Irregular Applications: Architectures and Algorithms, 2022

Reducing Memory-Bus Energy Consumption of GPUs via Software-Based Bit-Flip Minimization.
Proceedings of the IEEE/ACM Workshop on Memory Centric High Performance Computing, 2022

The Indigo Program-Verification Microbenchmark Suite of Irregular Parallel Code Patterns.
Proceedings of the International IEEE Symposium on Performance Analysis of Systems and Software, 2022

A Simple, Fast, and GPU-friendly Steiner-Tree Heuristic.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022

2021
The Use of Pulse Oximetry in the Assessment of Acclimatization to High Altitude.
Sensors, 2021

Discovering and balancing fundamental cycles in large signed graphs.
Proceedings of the International Conference for High Performance Computing, 2021

BiPart: a parallel and deterministic hypergraph partitioner.
Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

2020
BiPart: A Parallel and Deterministic Multilevel Hypergraph Partitioner.
CoRR, 2020

Increasing the parallelism of graph coloring via shortcutting.
Proceedings of the PPoPP '20: 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2020

2019
A Tool for Automatically Suggesting Source-Code Optimizations for Complex GPU Kernels.
CoRR, 2019

SPRoute: A Scalable Parallel Negotiation-based Global Router.
Proceedings of the International Conference on Computer-Aided Design, 2019

DiffTrace: Efficient Whole-Program Trace Analysis and Diffing for Debugging.
Proceedings of the 2019 IEEE International Conference on Cluster Computing, 2019

2018
A High-Quality and Fast Maximal Independent Set Implementation for GPUs.
ACM Trans. Parallel Comput., 2018

ParLoT: Efficient Whole-Program Call Tracing for HPC Applications.
Proceedings of the Programming and Performance Visualization Tools, 2018

Peachy Parallel Assignments (EduHPC 2018).
Proceedings of the 2018 IEEE/ACM Workshop on Education for High-Performance Computing, 2018

A high-performance connected components implementation for GPUs.
Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing, 2018

SPDP: An Automatically Synthesized Lossless Compression Algorithm for Floating-Point Data.
Proceedings of the 2018 Data Compression Conference, 2018

Automatic Hierarchical Parallelization of Linear Recurrences.
Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018

2016
Geometric Representations of the n-anacci Constants and Generalizations Thereof.
J. Integer Seq., 2016

Real-time synthesis of compression algorithms for scientific data.
Proceedings of the International Conference for High Performance Computing, 2016

Higher-order and tuple-based massively-parallel prefix sums.
Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2016

Parallel Graph Partitioning on a CPU-GPU Architecture.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

Energy, Power, and Performance Characterization of GPGPU Benchmark Programs.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

2015
Analytic Representations of the n-anacci Constants and Generalizations Thereof.
J. Integer Seq., 2015

A comparison of 3D shape retrieval methods based on a large-scale benchmark supporting multimodal queries.
Comput. Vis. Image Underst., 2015

A Module-based Approach to Adopting the 2013 ACM Curricular Recommendations on Parallel Computing.
Proceedings of the 46th ACM Technical Symposium on Computer Science Education, 2015

Rethinking the parallelization of random-restart hill climbing: a case study in optimizing a 2-opt TSP solver for GPU execution.
Proceedings of the 8th Workshop on General Purpose Processing using GPUs, 2015

Effects of source-code optimizations on GPU performance and energy consumption.
Proceedings of the 8th Workshop on General Purpose Processing using GPUs, 2015

Quantifying Benefits of Lossless Compression Utilities on Modern Smartphones.
Proceedings of the 24th International Conference on Computer Communication and Networks, 2015

Maximizing Hardware Prefetch Effectiveness with Machine Learning.
Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, 2015

MPC: A Massively Parallel Compression Algorithm for Scientific Data.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

2014
Using Branch Predictors and Variable Encoding for On-the-Fly Program Tracing.
IEEE Trans. Computers, 2014

The future of accelerator programming: abstraction, performance or can we have both?
Proceedings of the Symposium on Applied Computing, 2014

Performance and Energy Modeling for Cooperative Hybrid Computing.
Proceedings of the 9th IEEE International Conference on Networking, 2014

Microarchitectural performance characterization of irregular GPU kernels.
Proceedings of the 2014 IEEE International Symposium on Workload Characterization, 2014

PEACH: a model for performance and energy aware cooperative hybrid computing.
Proceedings of the Computing Frontiers Conference, CF'14, 2014

Measuring GPU Power with the K20 Built-in Sensor.
Proceedings of the Seventh Workshop on General Purpose Processing Using GPUs, 2014

Extended Large Scale Sketch-Based 3D Shape Retrieval.
Proceedings of the 7th Eurographics Workshop on 3D Object Retrieval, 2014

2013
Morph algorithms on GPUs.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2013

Performance and Energy Consumption of Lossless Compression/Decompression Utilities on Mobile Computing Platforms.
Proceedings of the 2013 IEEE 21st International Symposium on Modelling, 2013

Energy efficiency of lossless data compression on a mobile device: An experimental evaluation.
Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software, 2013

Data-Driven Versus Topology-driven Irregular Computations on GPUs.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

A Scalable Heterogeneous Parallelization Framework for Iterative Local Searches.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

Evaluating the performance and energy efficiency of n-body codes on multi-core CPUs and GPUs.
Proceedings of the IEEE 32nd International Performance Computing and Communications Conference, 2013

Effects of Dynamic Voltage and Frequency Scaling on a K20 GPU.
Proceedings of the 42nd International Conference on Parallel Processing, 2013

Atomic-free irregular computations on GPUs.
Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units, 2013

2012
Efficient Runtime Detection and Toleration of Asymmetric Races.
IEEE Trans. Computers, 2012

A GPU implementation of inclusion-based points-to analysis.
Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2012

A quantitative study of irregular programs on GPUs.
Proceedings of the 2012 IEEE International Symposium on Workload Characterization, 2012

Hardware support for enforcing isolation in lock-based parallel programs.
Proceedings of the International Conference on Supercomputing, 2012

2011
Caches and Predictors for Real-Time, Unobtrusive, and Cost-Effective Program Tracing in Embedded Systems.
IEEE Trans. Computers, 2011

Ordered vs. unordered: a comparison of parallelism and work-efficiency in irregular algorithms.
Proceedings of the 16th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2011

The tao of parallelism in algorithms.
Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, 2011

Evaluation and optimization of multicore performance bottlenecks in supercomputing applications.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2011

Floating-point data compression at 75 Gb/s on a GPU.
Proceedings of 4th Workshop on General Purpose Processing on Graphics Processing Units, 2011

2010
JSZap: Compressing JavaScript Code.
Proceedings of the USENIX Conference on Web Application Development, 2010

PerfExpert: An Easy-to-Use Performance Diagnosis Tool for HPC Applications.
Proceedings of the Conference on High Performance Computing Networking, 2010

Structure-driven optimizations for amorphous data-parallel programs.
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010

Parallel Graph Partitioning on Multicore Architectures.
Proceedings of the Languages and Compilers for Parallel Computing, 2010

gFPC: A Self-Tuning Compression Algorithm.
Proceedings of the 2010 Data Compression Conference (DCC 2010), 2010

Real-time unobtrusive program execution trace compression using branch predictor events.
Proceedings of the 2010 International Conference on Compilers, 2010

Ordered and unordered algorithms for parallel breadth first search.
Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

2009
FPC: A High-Speed Compressor for Double-Precision Floating-Point Data.
IEEE Trans. Computers, 2009

Real-Time Message Compression in Software.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2009

Detecting and tolerating asymmetric races.
Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009

How much parallelism is there in irregular applications?
Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009

Lonestar: A suite of parallel irregular programs.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2009

Real-time, unobtrusive, and efficient program execution tracing with stream caches and last stream predictors.
Proceedings of the 27th International Conference on Computer Design, 2009

pFPC: A Parallel Compressor for Floating-Point Data.
Proceedings of the 2009 Data Compression Conference (DCC 2009), 2009

2008
On the Scalability of an Automatically Parallelized Irregular Application.
Proceedings of the Languages and Compilers for Parallel Computing, 2008

Program Phase Detection based on Critical Basic Block Transitions.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2008

On the Role of a Nonlinear Stress-Strain Relation in Brain Trauma.
Proceedings of the International Conference on Bioinformatics & Computational Biology, 2008

2007
Computational Modeling of Brain Dynamics during Repetitive Head Motions.
Proceedings of the 2007 International Conference on Modeling, 2007

Algorithms and Hardware Structures for Unobtrusive Real-Time Compression of Instruction and Data Address Traces.
Proceedings of the 2007 Data Compression Conference (DCC 2007), 2007

High Throughput Compression of Double-Precision Floating-Point Data.
Proceedings of the 2007 Data Compression Conference (DCC 2007), 2007

2006
Future execution: A prefetching mechanism that uses multiple cores to speed up single threads.
ACM Trans. Archit. Code Optim., 2006

TCgen 2.0: a tool to automatically generate lossless trace compressors.
SIGARCH Comput. Archit. News, 2006

Computational Simulation and Visualization of Traumatic Brain Injuries.
Proceedings of the 2006 International Conference on Modeling, 2006

Load Instruction Characterization and Acceleration of the BioPerf Programs.
Proceedings of the 2006 IEEE International Symposium on Workload Characterization, 2006

Fast Lossless Compression of Scientific Floating-Point Data.
Proceedings of the 2006 Data Compression Conference (DCC 2006), 2006

Efficient emulation of hardware prefetchers via event-driven helper threading.
Proceedings of the 15th International Conference on Parallel Architectures and Compilation Techniques (PACT 2006), 2006

2005
The VPC Trace-Compression Algorithms.
IEEE Trans. Computers, 2005

Improving memory system performance with energy-efficient value speculation.
SIGARCH Comput. Archit. News, 2005

Bridging the Processor-Memory Performance Gap with 3D IC Technology.
IEEE Des. Test Comput., 2005

Reducing Communication Time through Message Prefetching.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 2005

Numerical Modeling of Brain Dynamics in Traumatic Situations - Impulsive Translations.
Proceedings of The 2005 International Conference on Mathematics and Engineering Techniques in Medicine and Biological Sciences, 2005

Tolerating Message Latency Through the Early Release of Blocked Receives.
Proceedings of the Euro-Par 2005, Parallel Processing, 11th International Euro-Par Conference, Lisbon, Portugal, August 30, 2005

Automatic Generation of High-Performance Trace Compressors.
Proceedings of the 3rd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2005), 2005

On the energy-efficiency of speculative hardware.
Proceedings of the Second Conference on Computing Frontiers, 2005

Future Execution: A Hardware Prefetching Technique for Chip Multiprocessors.
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques (PACT 2005), 2005

On the importance of optimizing the configuration of stream prefetchers.
Proceedings of the 2005 workshop on Memory System Performance, 2005

2004
VPC3: a fast and effective trace-compression algorithm.
Proceedings of the International Conference on Measurements and Modeling of Computer Systems, 2004

Runtime Compression of MPI Messages to Improve the Performance and Scalability of Parallel Applications.
Proceedings of the ACM/IEEE SC2004 Conference on High Performance Networking and Computing, 2004

Automatic Synthesis of High-Speed Processor Simulators.
Proceedings of the 37th Annual International Symposium on Microarchitecture (MICRO-37 2004), 2004

2003
Compressing Extended Program Traces Using Value Predictors.
Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques (PACT 2003), 27 September, 2003

2002
Hybrid Load-Value Predictors.
IEEE Trans. Computers, 2002

An improved index function for (D)FCM predictors.
SIGARCH Comput. Archit. News, 2002

Static Load Classification for Improving the Value Predictability of Data-Cache Misses.
Proceedings of the 2002 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2002

Delphi: Prediction-based Page Prefetching to Improve the Performance of Shared Virtual Memory Systems.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 2002

2000
Hybridizing and Coalescing Load Value Predictors.
Proceedings of the IEEE International Conference On Computer Design: VLSI In Computers & Processors, 2000

1999
Prediction Outcome History-Based Confidence Estimation for Load Value Prediction.
J. Instr. Level Parallelism, 1999

Exploring Last n Value Prediction.
Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques, 1999

