2024
Memory-efficient DRASiW Models.
Neurocomputing, 2024
LogicNets vs. ULEEN: Comparing two novel high throughput edge ML inference techniques on FPGA.
Proceedings of the 67th IEEE International Midwest Symposium on Circuits and Systems, 2024
Soon Filter: Advancing Tiny Neural Architectures for High Throughput Edge Inference.
Proceedings of the International Joint Conference on Neural Networks, 2024
Differentiable Weightless Neural Networks.
Proceedings of the Forty-first International Conference on Machine Learning, 2024
2023
ULEEN: A Novel Architecture for Ultra-low-energy Edge Neural Networks.
ACM Trans. Archit. Code Optim., December, 2023
A conditional branch predictor based on weightless neural networks.
Neurocomputing, October, 2023
Dendrite-inspired Computing to Improve Resilience of Neural Networks to Faults in Emerging Memory Technologies.
Proceedings of the IEEE International Conference on Rebooting Computing, 2023
An FPGA-Based Weightless Neural Network for Edge Network Intrusion Detection.
Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2023
Efficient Knowledge Aggregation Methods for Weightless Neural Networks.
Proceedings of the 31st European Symposium on Artificial Neural Networks, 2023
COIN: Combinational Intelligent Networks.
Proceedings of the 34th IEEE International Conference on Application-specific Systems, 2023
2022
A WiSARD-based conditional branch predictor.
Proceedings of the 30th European Symposium on Artificial Neural Networks, 2022
Pruning Weightless Neural Networks.
Proceedings of the 30th European Symposium on Artificial Neural Networks, 2022
Distributive Thermometer: A New Unary Encoding for Weightless Neural Networks.
Proceedings of the 30th European Symposium on Artificial Neural Networks, 2022
LogicWiSARD: Memoryless Synthesis of Weightless Neural Networks.
Proceedings of the 33rd IEEE International Conference on Application-specific Systems, 2022
Weightless Neural Networks for Efficient Edge Inference.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2022
2021
Efficiency and scalability of multi-lane capsule networks (MLCN).
J. Parallel Distributed Comput., 2021
Smart selection of optimizations in dynamic compilers.
Concurr. Comput. Pract. Exp., 2021
2020
Weightless Neural Networks as Memory Segmented Bloom Filters.
Neurocomputing, 2020
A unified model for accelerating unsupervised iterative re-ranking algorithms.
Concurr. Comput. Pract. Exp., 2020
2019
The Multi-Lane Capsule Network.
IEEE Signal Process. Lett., 2019
The Multi-Lane Capsule Network (MLCN).
CoRR, 2019
Memory Efficient Weightless Neural Network using Bloom Filter.
Proceedings of the 27th European Symposium on Artificial Neural Networks, 2019
2018
ComP-net: command processor networking for efficient intra-kernel communications on GPUs.
Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, 2018
2017
GPU triggered networking for intra-kernel communications.
Proceedings of the International Conference for High Performance Computing, 2017
2016
HadoopCL2: Motivating the Design of a Distributed, Heterogeneous Programming System With Machine-Learning Applications.
IEEE Trans. Parallel Distributed Syst., 2016
Extended task queuing: active messages for heterogeneous systems.
Proceedings of the International Conference for High Performance Computing, 2016
PY-PITS: A Scalable Python Runtime System for the Computation of Partially Idempotent Tasks.
Proceedings of the 2016 International Symposium on Computer Architecture and High Performance Computing Workshops, 2016
2014
Adaptive global power optimization for Web servers.
J. Supercomput., 2014
Microcode Compression Using Structured-Constrained Clustering.
Int. J. Parallel Program., 2014
Implementation and evaluation of deep neural networks (DNN) on mainstream heterogeneous systems.
Proceedings of the Asia-Pacific Workshop on Systems, 2014
2013
Image Re-ranking Acceleration on GPUs.
Proceedings of the 25th International Symposium on Computer Architecture and High Performance Computing, 2013
HadoopCL: MapReduce on Distributed Heterogeneous Platforms through Seamless Integration of Hadoop and OpenCL.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013
2012
Cloud Workload Analysis with SWAT.
Proceedings of the IEEE 24th International Symposium on Computer Architecture and High Performance Computing, 2012
Efficient Image Re-Ranking Computation on GPUs.
Proceedings of the 10th IEEE International Symposium on Parallel and Distributed Processing with Applications, 2012
2011
Structure-Constrained Microcode Compression.
Proceedings of the 23rd International Symposium on Computer Architecture and High Performance Computing, 2011
LAR-CC: Large atomic regions with conditional commits.
Proceedings of the CGO 2011, 2011
2010
TAO: two-level atomicity for dynamic binary optimizations.
Proceedings of the CGO 2010, 2010
2008
A Segmented Bloom Filter Algorithm for Efficient Predictors.
Proceedings of the 20th International Symposium on Computer Architecture and High Performance Computing, 2008
2007
Impacts of Multiprocessor Configurations on Workloads in Bioinformatics.
Proceedings of the 19th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2007), 2007
StarDBT: An Efficient Multi-platform Dynamic Binary Translation System.
Proceedings of the Advances in Computer Systems Architecture, 2007
2006
Clustering-Based Microcode Compression.
Proceedings of the 24th International Conference on Computer Design (ICCD 2006), 2006
2005
Enhanced code density of embedded CISC processors with echo technology.
Proceedings of the 3rd IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2005
2004
The Accuracy of Initial Prediction in Two-Phase Dynamic Binary Translators.
Proceedings of the 2nd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2004), 2004
Continuous Trip Count Profiling for Loop Optimizations in Two-Phase Dynamic Binary Translators.
Proceedings of the 8th Annual Workshop on Interaction between Compilers and Computer Architecture (INTERACT-8 2004), 2004
2003
Compilation, Architectural Support, and Evaluation of SIMD Graphics Pipeline Programs on a General-Purpose CPU.
Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques (PACT 2003), 27 September, 2003
1997
Enhanced Compression Techniques to Simplify Program Decompression and Execution.
Proceedings of the 1997 International Conference on Computer Design: VLSI in Computers & Processors, 1997
1996
Design Tradeoffs and Experience with Motorola PowerPC Migration Tool.
Proceedings of the 1996 International Conference on Computer Design (ICCD '96), 1996
Motorola PowerPC Migration Tools - Emulation and Translation.
Proceedings of the Forty-First IEEE Computer Society International Conference: Technologies for the Information Superhighway, 1996
1995
Solutions and debugging for data consistency in multiprocessors with noncoherent caches.
Int. J. Parallel Program., 1995
1994
An Optimal Asynchronous Scheduling Algorithm for Software Cache Consistence.
Proceedings of the 27th Annual Hawaii International Conference on System Sciences (HICSS-27), 1994
1991
Implementation Optimization Techniques for Architecture Synthesis of Application-Specific Processors.
Proceedings of the 24th Annual IEEE/ACM International Symposium on Microarchitecture, 1991
1990
Architecture Synthesis of High-Performance Application-Specific Processors.
Proceedings of the 27th ACM/IEEE Design Automation Conference. Orlando, 1990
1988
Organization of array data for concurrent memory access.
Proceedings of the 21st Annual Workshop and Symposium on Microprogramming and Microarchitecture, 1988, San Diego, California, USA, November 28, 1988
The White Dwarf: A High-Performance Application-Specific Processor.
Proceedings of the 15th Annual International Symposium on Computer Architecture, 1988