Magnus Jahre

S. Omid Fatemi

Proceedings of the 8th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies, 2017

FINN: A Framework for Fast, Scalable Binarized Neural Network Inference.

Philip Heng Wai Leong

Kees A. Vissers

Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017

Towards Efficient Design Space Exploration of FPGA-based Accelerators for Streaming HPC Applications (Abstract Only).

Mostafa Koraei

S. Omid Fatemi

Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017

Towards efficient quantized neural network inference on mobile devices: work-in-progress.

Proceedings of the 2017 International Conference on Compilers, 2017

2016

Random access schemes for efficient FPGA SpMV acceleration.

Microprocess. Microsystems, 2016

TULIPP: Towards ubiquitous low-power image processing platforms.

Per Gunnar Kjeldsberg

Proceedings of the International Conference on Embedded Computer Systems: Architectures, 2016

Efficient control flow restructuring for GPUs.

Nico Reissmann

Thomas L. Falch

Benjamin A. Bjørnseth

Helge Bahmann

Jan Christian Meyer

Proceedings of the International Conference on High Performance Computing & Simulation, 2016

2015

Tuning the victim selection policy of Intel TBB.

J. Syst. Archit., 2015

ParVec: vectorizing the PARSEC benchmark suite.

Juan M. Cebrian

Computing, 2015

Hybrid breadth-first search on a single-chip FPGA-CPU heterogeneous platform.

Donn Morrison

Proceedings of the 25th International Conference on Field Programmable Logic and Applications, 2015

A Vector Caching Scheme for Streaming FPGA SpMV Accelerators.

Proceedings of the Applied Reconfigurable Computing - 11th International Symposium, 2015

2014

Perfect Reconstructability of Control Flow from Demand Dependence Graphs.

ACM Trans. Archit. Code Optim., 2014

Patterned Heterogeneous CMPs: The Case for Regularity-Driven System-Level Synthesis.

Nikita Nikitin

Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2014

Optimized hardware for suboptimal software: The case for SIMD-aware benchmarks.

Juan M. Cebrian

Proceedings of the 2014 IEEE International Symposium on Performance Analysis of Systems and Software, 2014

A Study of Energy and Locality Effects Using Space-Filling Curves.

Nico Reissmann

Jan Christian Meyer

Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014

An energy efficient column-major backend for FPGA SpMV accelerators.

Proceedings of the 32nd IEEE International Conference on Computer Design, 2014

Victim Selection Policies for Intel TBB: Overheads and Energy Footprint.

Proceedings of the Architecture of Computing Systems - ARCS 2014, 2014

Graph-based performance accounting for chip multiprocessor memory systems.

Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2013

On the energy footprint of task based parallel applications.

Proceedings of the International Conference on High Performance Computing & Simulation, 2013

Challenges of Reducing Cycle-Accurate Simulation Time for TBP Applications.

Proceedings of the International Conference on Computational Science, 2013

2011

A High Performance Adaptive Miss Handling Architecture for Chip Multiprocessors.

Trans. High Perform. Embed. Archit. Compil., 2011

Storage Efficient Hardware Prefetching using Delta-Correlating Prediction Tables.

J. Instr. Level Parallelism, 2011

Exploring the Prefetcher/Memory Controller Design Space: An Opportunistic Prefetch Scheduling Strategy.

Proceedings of the Architecture of Computing Systems - ARCS 2011, 2011

2010

Computational Computer Architecture Research at NTNU.

ERCIM News, 2010

DIEF: An Accurate Interference Feedback Mechanism for Chip Multiprocessor Memory Systems.

Proceedings of the High Performance Embedded Architectures and Compilers, 2010

Multi-level Hardware Prefetching Using Low Complexity Delta Correlating Prediction Tables with Partial Matching.

Proceedings of the High Performance Embedded Architectures and Compilers, 2010

2009

Experimental Validation of the Learning Effect for a Pedagogical Game on Computer Fundamentals.

Guttorm Sindre

IEEE Trans. Educ., 2009

A Quantitative Study of Memory System Interference in Chip Multiprocessor Architectures.

Proceedings of the 11th IEEE International Conference on High Performance Computing and Communications, 2009

A light-weight fairness mechanism for chip multiprocessor memory systems.

Proceedings of the 6th Conference on Computing Frontiers, 2009

2008

Low-cost open-page prefetch scheduling in chip multiprocessors.
