H. Peter Hofstee

Orcid: 0000-0001-9649-7338

Affiliations:
  • TU Delft, The Netherlands
  • IBM Research Austin, TX, USA


According to our database, H. Peter Hofstee authored at least 88 papers between 1990 and 2024.

Collaborative distances:
  • Dijkstra number of two.
  • Erdős number of three.

Bibliography

2024
Vanishing Variance Problem in Fully Decentralized Neural-Network Systems.
CoRR, 2024

Leveraging Apache Arrow for Zero-copy, Zero-serialization Cluster Shared Memory.
CoRR, 2024

Tywaves: A Typed Waveform Viewer for Chisel.
Proceedings of the 2024 IEEE Nordic Circuits and Systems Conference (NorCAS), 2024

Learning Structured Sparsity for Efficient Nanopore DNA Basecalling Using Delayed Masking.
Proceedings of the 15th ACM International Conference on Bioinformatics, 2024

2023
An Intermediate Representation for Composable Typed Streaming Dataflow Designs.
Joint Proceedings of Workshops at the 49th International Conference on Very Large Data Bases (VLDB 2023), Vancouver, Canada, August 28, 2023

Tydi-lang: A Language for Typed Streaming Hardware.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

OctoRay: Framework for Scalable FPGA Cluster Acceleration of Python Big Data Applications.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

Tydi-Chisel: Collaborative and Interface-Driven Data-Streaming Accelerators.
Proceedings of the IEEE Nordic Circuits and Systems Conference, 2023

2022
The Future of FPGA Acceleration in Datacenters and the Cloud.
ACM Trans. Reconfigurable Technol. Syst., 2022

A Toolchain for Streaming Dataflow Accelerator Designs for Big Data Analytics: Defining an IR for Composable Typed Streaming Dataflow Designs.
CoRR, 2022

Tydi-lang: a language for typed streaming hardware - A manual for future Tydi-lang compiler developers.
CoRR, 2022

Benchmarking Apache Arrow Flight - A wire-speed protocol for data transfer, querying and microservices.
CoRR, 2022

Communication-Efficient Cluster Scalable Genomics Data Processing Using Apache Arrow Flight.
Proceedings of the 21st International Symposium on Parallel and Distributed Computing, 2022

SALoBa: Maximizing Data Locality and Workload Balance for Fast Sequence Alignment on GPUs.
Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

Improving Gradient Paths for Binary Convolutional Neural Networks.
Proceedings of the 33rd British Machine Vision Conference 2022, 2022

2021
Generating High-Performance FPGA Accelerator Designs for Big Data Analytics with Fletcher and Apache Arrow.
J. Signal Process. Syst., 2021

Low Latency and High Throughput Write-Ahead Logging Using CAPI-Flash.
IEEE Trans. Cloud Comput., 2021

AutoReCon: Neural Architecture Search-based Reconstruction for Data-free Compression.
Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021

An Attention Module for Convolutional Neural Networks.
Proceedings of the Artificial Neural Networks and Machine Learning - ICANN 2021, 2021

2020
An Efficient High-Throughput LZ77-Based Decompressor in Reconfigurable Logic.
J. Signal Process. Syst., 2020

In-memory database acceleration on FPGAs: a survey.
VLDB J., 2020

Tydi: An Open Specification for Complex Data Structures Over Hardware Streams.
IEEE Micro, 2020

SoFAr: Shortcut-based Fractal Architectures for Binary Convolutional Neural Networks.
CoRR, 2020

Optimizing performance of GATK workflows using Apache Arrow In-Memory data framework.
BMC Genom., 2020

REAF: Reducing Approximation of Channels by Reducing Feature Reuse Within Convolution.
IEEE Access, 2020

ThymesisFlow: A Software-Defined, HW/SW co-Designed Interconnect Stack for Rack-Scale Memory Disaggregation.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

NASB: Neural Architecture Search for Binary Convolutional Neural Networks.
Proceedings of the 2020 International Joint Conference on Neural Networks, 2020

Battling the CPU Bottleneck in Apache Parquet to Arrow Conversion Using FPGA.
Proceedings of the International Conference on Field-Programmable Technology, 2020

2019
Video-Text Compliance: Activity Verification Based on Natural Language Instructions.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshops, 2019

Fletcher: A Framework to Efficiently Integrate FPGA Accelerators with Apache Arrow.
Proceedings of the 29th International Conference on Field Programmable Logic and Applications, 2019

A Fine-Grained Parallel Snappy Decompressor for FPGAs Using a Relaxed Execution Model.
Proceedings of the 27th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2019

Refine and Recycle: A Method to Increase Decompression Parallelism.
Proceedings of the 30th IEEE International Conference on Application-specific Systems, 2019

Supporting Columnar In-memory Formats on FPGA: The Hardware Design of Fletcher for Apache Arrow.
Proceedings of the Applied Reconfigurable Computing - 15th International Symposium, 2019

2018
A hardware compilation framework for text analytics queries.
J. Parallel Distributed Comput., 2018

A 64-GB Sort at 28 GB/s on a 4-GPU POWER9 Node for Uniformly-Distributed 16-Byte Records with 8-Byte Keys.
Proceedings of the High Performance Computing, 2018

A high-bandwidth snappy decompressor in reconfigurable logic: work-in-progress.
Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis, 2018

CAPI-Flash Accelerated Persistent Read Cache for Apache Cassandra.
Proceedings of the 11th IEEE International Conference on Cloud Computing, 2018

2017
ExtraV: Boosting Graph Processing Near Storage with a Coherent Accelerator.
Proc. VLDB Endow., 2017

Analyzing In-Memory Hash Join: Granularity Matters.
Proceedings of the International Workshop on Accelerating Analytics and Data Management Systems Using Modern Processor and Storage Architectures, 2017

SparkGA: A Spark Framework for Cost Effective, Fast and Accurate DNA Analysis at Scale.
Proceedings of the 8th ACM International Conference on Bioinformatics, 2017

2016
PATer: A Hardware Prefetching Automatic Tuner on IBM POWER8 Processor.
IEEE Comput. Archit. Lett., 2016

RAW 2016 Keynotes.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

Auto-tuning Spark Big Data Workloads on POWER8: Prediction-Based Dynamic SMT Threading.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

Optimized Durable Commitlog for Apache Cassandra Using CAPI-Flash.
Proceedings of the 9th IEEE International Conference on Cloud Computing, 2016

2015
Feature detection for image analytics via FPGA acceleration.
IBM J. Res. Dev., 2015

Second-Generation Big Data Systems.
Computer, 2015

2014
Giving Text Analytics a Boost.
IEEE Micro, 2014

Hardware-accelerated text analytics.
Proceedings of the 2014 IEEE Hot Chips 26 Symposium (HCS), 2014

2013
True hardware random number generation implemented in the 32-nm SOI POWER7+ processor.
IBM J. Res. Dev., 2013

Understanding system design for Big Data workloads.
IBM J. Res. Dev., 2013

Big Data text-oriented benchmark creation for Hadoop.
IBM J. Res. Dev., 2013

2011
Cell Broadband Engine Processor.
Encyclopedia of Parallel Computing, 2011

2009
Heterogeneous Multi-core Processors: The Cell Broadband Engine.
Proceedings of the Multicore Processors and Systems, 2009

The Next 25 Years of Computer Architecture?
Proceedings of the Euro-Par 2009, 2009

HPPC 2009 Panel: Are Many-Core Computer Vendors on Track?
Proceedings of the Euro-Par 2009, 2009

2008
Rome Reborn.
Proceedings of the International Conference on Computer Graphics and Interactive Techniques, 2008

2007
Cell Broadband Engine processor vault security architecture.
IBM J. Res. Dev., 2007

Preface.
IBM J. Res. Dev., 2007

Microarchitecture and implementation of the synergistic processor in 65-nm and 90-nm SOI.
IBM J. Res. Dev., 2007

The future of multi-core technologies.
Proceedings of the 2007 IEEE International Conference on Cluster Computing, 2007

Cell Broadband Engine Processor Design Methodology.
Proceedings of the IEEE 2007 Custom Integrated Circuits Conference, 2007

2006
Synergistic Processing in Cell's Multicore Architecture.
IEEE Micro, 2006

Overview of the architecture, circuit design, and physical implementation of a first-generation cell processor.
IEEE J. Solid State Circuits, 2006

The microarchitecture of the synergistic processor for a cell processor.
IEEE J. Solid State Circuits, 2006

Invited speakers II - Real-time supercomputing and technology for games and entertainment.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Key features of the design methodology enabling a multi-core SoC implementation of a first-generation CELL processor.
Proceedings of the 2006 Conference on Asia South Pacific Design Automation: ASP-DAC 2006, 2006

2005
Introduction to the Cell multiprocessor.
IBM J. Res. Dev., 2005

Communication and Synchronization in the Cell Processor - Invited Talk.
Proceedings of the 28th Communicating Process Architectures Conference, 2005

Power Efficient Processor Architecture and The Cell Processor.
Proceedings of the 11th International Conference on High-Performance Computer Architecture (HPCA-11 2005), 2005

Hardware and software architectures for the CELL processor.
Proceedings of the 3rd IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2005

The design methodology and implementation of a first-generation CELL processor: a multi-core SoC.
Proceedings of the IEEE 2005 Custom Integrated Circuits Conference, 2005

2002
Power-Constrained Microprocessor Design.
Proceedings of the 20th International Conference on Computer Design (ICCD 2002), 2002

2001
Timed circuit verification using TEL structures.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2001

Derivation of a rotator circuit with homogeneous interconnect.
Inf. Process. Lett., 2001

2000
Custom circuit design as a driver of microprocessor performance.
IBM J. Res. Dev., 2000

"Timing closure by design," a high frequency microprocessor design methodology.
Proceedings of the 37th Conference on Design Automation, 2000

1999
Beyond 1 GHz.
Proceedings of the IEEE 1999 Custom Integrated Circuits Conference, 1999

Verification of Delayed-Reset Domino Circuits Using ATACS.
Proceedings of the 5th International Symposium on Advanced Research in Asynchronous Circuits and Systems (ASYNC '99), 1999

1998
Designing for a gigahertz [guTS integer processor].
IEEE Micro, 1998

A 1.0-GHz single-issue 64-bit PowerPC integer processor.
IEEE J. Solid State Circuits, 1998

High-Speed Serializing/De-Serializing Design-For-Test Method for Evaluating a 1 GHz Microprocessor.
Proceedings of the 16th IEEE VLSI Test Symposium (VTS '98), 28 April, 1998

A 690 ps read-access latency register file for a GHz integer microprocessor.
Proceedings of the International Conference on Computer Design: VLSI in Computers and Processors, 1998

Design methodology for a 1.0 GHz microprocessor.
Proceedings of the International Conference on Computer Design: VLSI in Computers and Processors, 1998

1997
Circuits and Microarchitecture for Gigahertz VLSI Designs.
Proceedings of the 17th Conference on Advanced Research in VLSI (ARVLSI '97), 1997

1995
Synchronizing processes.
PhD thesis, 1995

1994
Distributing a Class of Sequential Programs.
Sci. Comput. Program., 1994

1991
A Distributed Implementation of a Task Pool.
Proceedings of the Research Directions in High-Level Parallel Programming Languages, 1991

1990
Distributed Sorting.
Sci. Comput. Program., 1990

