Alan Tendler Leibel Bacellar

Neurocomputing, 2024

LogicNets vs. ULEEN: Comparing two novel high throughput edge ML inference techniques on FPGA.

Proceedings of the 67th IEEE International Midwest Symposium on Circuits and Systems, 2024

Soon Filter: Advancing Tiny Neural Architectures for High Throughput Edge Inference.

Proceedings of the International Joint Conference on Neural Networks, 2024

Differentiable Weightless Neural Networks.

Priscila Machado Vieira Lima

Eugene John

Lizy Kurian John

Rafael Fontella Katopodis

Proceedings of the Forty-first International Conference on Machine Learning, 2024

2023

ULEEN: A Novel Architecture for Ultra-low-energy Edge Neural Networks.

ACM Trans. Archit. Code Optim., December, 2023

A conditional branch predictor based on weightless neural networks.

Neurocomputing, October, 2023

Dendrite-inspired Computing to Improve Resilience of Neural Networks to Faults in Emerging Memory Technologies.

Proceedings of the IEEE International Conference on Rebooting Computing, 2023

An FPGA-Based Weightless Neural Network for Edge Network Intrusion Detection.

Aman Arora

Alan T. L. Bacellar

Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2023

Efficient Knowledge Aggregation Methods for Weightless Neural Networks.

Otávio Oliveira Napoli

Ana Maria de Almeida

José Miguel Sales Dias

Luís Brás Rosário

Proceedings of the 31st European Symposium on Artificial Neural Networks, 2023

COIN: Combinational Intelligent Networks.

Proceedings of the 34th IEEE International Conference on Application-specific Systems, 2023

2022

A WiSARD-based conditional branch predictor.

Lizy K. John

Proceedings of the 30th European Symposium on Artificial Neural Networks, 2022

Pruning Weightless Neural Networks.

Igor D. S. Miranda

Lizy K. John

Proceedings of the 30th European Symposium on Artificial Neural Networks, 2022

Distributive Thermometer: A New Unary Encoding for Weightless Neural Networks.

Lizy K. John

Proceedings of the 30th European Symposium on Artificial Neural Networks, 2022

LogicWiSARD: Memoryless Synthesis of Weightless Neural Networks.

Igor D. S. Miranda

Aman Arora

Luis Armando Quintanilla Villon

Rafael Fontella Katopodis

Proceedings of the 33rd IEEE International Conference on Application-specific Systems, 2022

Weightless Neural Networks for Efficient Edge Inference.

Luis Armando Quintanilla Villon

Aman Arora

Igor D. S. Miranda

Rafael Fontella Katopodis

Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2022

2021

Efficiency and scalability of multi-lane capsule networks (MLCN).

J. Parallel Distributed Comput., 2021

Smart selection of optimizations in dynamic compilers.

Anderson Faustino da Silva

Thais Aparecida Silva Camacho

Otávio Oliveira Napoli

Fabrício Firmino de Faria

Concurr. Comput. Pract. Exp., 2021

2020

Weightless Neural Networks as Memory Segmented Bloom Filters.

Leandro Santiago

Letícia Dias Verona

Fábio Medeiros Rangel

Daniel S. Menasché

Wouter Caarls

Sandip Kundu

Daniel Carlos Guimarães Pedronette

Neurocomputing, 2020

A unified model for accelerating unsupervised iterative re-ranking algorithms.

Flávia Pisani

Lucas Pascotti Valem

Ricardo da Silva Torres

Concurr. Comput. Pract. Exp., 2020

2019

The Multi-Lane Capsule Network.

IEEE Signal Process. Lett., 2019

The Multi-Lane Capsule Network (MLCN).

CoRR, 2019

Memory Efficient Weightless Neural Network using Bloom Filter.

Fabrício Firmino de Faria

Letícia Dias Verona

Fábio Medeiros Rangel

Daniel Sadoc Menasché

Wouter Caarls

Priscila Machado Vieira Lima

Sandip Kundu

Felipe Maia Galvão França

Proceedings of the 27th European Symposium on Artificial Neural Networks, 2019

2018

ComP-net: command processor networking for efficient intra-kernel communications on GPUs.

Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, 2018

2017

GPU triggered networking for intra-kernel communications.

Proceedings of the International Conference for High Performance Computing, 2017

2016

HadoopCL2: Motivating the Design of a Distributed, Heterogeneous Programming System With Machine-Learning Applications.

Max Grossman

Vivek Sarkar

IEEE Trans. Parallel Distributed Syst., 2016

Extended task queuing: active messages for heterogeneous systems.

Proceedings of the International Conference for High Performance Computing, 2016

PY-PITS: A Scalable Python Runtime System for the Computation of Partially Idempotent Tasks.

Proceedings of the 2016 International Symposium on Computer Architecture and High Performance Computing Workshops, 2016

2014

Adaptive global power optimization for Web servers.

Leonardo Piga

Reinaldo A. Bergamaschi

Sandro Rigo

J. Supercomput., 2014

Microcode Compression Using Structured-Constrained Clustering.

Guido Araujo

Daniel Carlos Guimarães Pedronette

Int. J. Parallel Program., 2014

Implementation and evaluation of deep neural networks (DNN) on mainstream heterogeneous systems.

Proceedings of the Asia-Pacific Workshop on Systems, 2014

2013

Image Re-ranking Acceleration on GPUs.

Ricardo da Silva Torres

Proceedings of the 25th International Symposium on Computer Architecture and High Performance Computing, 2013

HadoopCL: MapReduce on Distributed Heterogeneous Platforms through Seamless Integration of Hadoop and OpenCL.

Max Grossman

Daniel Carlos Guimarães Pedronette

Vivek Sarkar

Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

2012

Cloud Workload Analysis with SWAT.

Proceedings of the IEEE 24th International Symposium on Computer Architecture and High Performance Computing, 2012

Efficient Image Re-Ranking Computation on GPUs.

Ricardo da Silva Torres

Proceedings of the 10th IEEE International Symposium on Parallel and Distributed Processing with Applications, 2012

2011

Structure-Constrained Microcode Compression.

Guido Araujo

Proceedings of the 23rd International Symposium on Computer Architecture and High Performance Computing, 2011

LAR-CC: Large atomic regions with conditional commits.

Cheng Wang

Proceedings of the CGO 2011, 2011

2010

TAO: two-level atomicity for dynamic binary optimizations.

Proceedings of the CGO 2010, 2010

2008

A Segmented Bloom Filter Algorithm for Efficient Predictors.

Proceedings of the 20th International Symposium on Computer Architecture and High Performance Computing, 2008

2007

Impacts of Multiprocessor Configurations on Workloads in Bioinformatics.

Victor Ying

Proceedings of the 19th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2007), 2007

StarDBT: An Efficient Multi-platform Dynamic Binary Translation System.

Zhiwei Ying

Proceedings of the Advances in Computer Systems Architecture, 2007

2006

Clustering-Based Microcode Compression.

Guido Araujo

Proceedings of the 24th International Conference on Computer Design (ICCD 2006), 2006

2005

Enhanced code density of embedded CISC processors with echo technology.

Herbert H. J. Hum

Ramesh V. Peri

Jay Pickett

Proceedings of the 3rd IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2005

2004

The Accuracy of Initial Prediction in Two-Phase Dynamic Binary Translators.

Justin Quek

Orna Etzion

Jesse Fang

Proceedings of the 2nd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2004), 2004

Continuous Trip Count Profiling for Loop Optimizations in Two-Phase Dynamic Binary Translators.

Tevi Devor

Proceedings of the 8th Annual Workshop on Interaction between Compilers and Computer Architecture (INTERACT-8 2004), 2004

2003

Compilation, Architectural Support, and Evaluation of SIMD Graphics Pipeline Programs on a General-Purpose CPU.

Herbert H. J. Hum

Sanjeev Kumar

Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques (PACT 2003), 27 September, 2003

1997

Enhanced Compression Techniques to Simplify Program Decompression and Execution.

Roger Smith

Proceedings of the 1997 International Conference on Computer Design: VLSI in Computers & Processors, 1997

1996

Design Tradeoffs and Experience with Motorola PowerPC Migration Tool.

Proceedings of the 1996 International Conference on Computer Design (ICCD '96), 1996

Motorola PowerPC Migration Tools - Emulation and Translation.

Tariq Afzal

Proceedings of the Forty-First IEEE Computer Society International Conference: Technologies for the Information Superhighway, 1996

1995

Solutions and debugging for data consistency in multiprocessors with noncoherent caches.

David Bernstein

Ahmed Gheith

Bilha Mendelson

Int. J. Parallel Program., 1995

1994

An Optimal Asynchronous Scheduling Algorithm for Software Cache Consistence.

Barbara Simons

Vivek Sarkar

Michael Lai

Proceedings of the 27th Annual Hawaii International Conference on System Sciences (HICSS-27), 1994

1991

Implementation Optimization Techniques for Architecture Synthesis of Application-Specific Processors.

Proceedings of the 24th Annual IEEE/ACM International Symposium on Microarchitecture, 1991

1990

Architecture Synthesis of High-Performance Application-Specific Processors.

Proceedings of the 27th ACM/IEEE Design Automation Conference. Orlando, 1990

1988

Organization of array data for concurrent memory access.

Proceedings of the 21st Annual Workshop and Symposium on Microprogramming and Microarchitecture, 1988, San Diego, California, USA, November 28, 1988

The White Dwarf: A High-Performance Application-Specific Processor.

Andrew Wolfe

Chriss Stephens

A. L. Ting

David Blair Kirk

Ronald P. Bianchini Jr.