David R. Kaeli

ORCID: 0000-0002-5692-0151

Affiliations:
  • Northeastern University, Boston, USA


According to our database, David R. Kaeli authored at least 277 papers between 1989 and 2024.

Awards

ACM Fellow

ACM Fellow 2021, "For contributions to computer architecture and compilers".

IEEE Fellow

IEEE Fellow 2010, "For contributions to profile-guided optimization algorithms and dynamic branch prediction designs".

Bibliography

2024
ASPLOS 2024 Artifact for "MaxK-GNN: Extremely Fast GPU Kernel Design for Accelerating Graph Neural Networks Training".
Dataset, February, 2024

Scalability Limitations of Processing-in-Memory using Real System Evaluations.
Proc. ACM Meas. Anal. Comput. Syst., 2024

Data Transfer Optimizations for Host-CPU and Accelerators in AXI4MLIR.
CoRR, 2024

NeuraChip: Accelerating GNN Computations with a Hash-based Decoupled Spatial Accelerator.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

DEFCON: Deformable Convolutions Leveraging Interval Search and GPU Texture Hardware.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

Digital Avatars: Framework Development and Their Evaluation.
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

Energy-Aware Tile Size Selection for Affine Programs on GPUs.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2024

AXI4MLIR: User-Driven Automatic Host Code Generation for Custom AXI-Based Accelerators.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2024

MaxK-GNN: Extremely Fast GPU Kernel Design for Accelerating Graph Neural Networks Training.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023
SECDA-TFLite: A toolkit for efficient development of FPGA-based DNN accelerators for edge inference.
J. Parallel Distributed Comput., March, 2023

Accelerating Finite Field Arithmetic for Homomorphic Encryption on GPUs.
IEEE Micro, 2023

MaxK-GNN: Towards Theoretical Speed Limits for Accelerating Graph Neural Networks Training.
CoRR, 2023

Memory Efficient Multithreaded Incremental Segmented Sieve Algorithm.
CoRR, 2023

GME: GPU-based Microarchitectural Extensions to Accelerate Homomorphic Encryption.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

Thought Bubbles: A Proxy into Players' Mental Model Development.
Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 2023

2022
VCSR: An Efficient GPU Memory-Aware Sparse Format.
IEEE Trans. Parallel Distributed Syst., 2022

Characterizing and Exploiting Soft Error Vulnerability Phase Behavior in GPU Applications.
IEEE Trans. Dependable Secur. Comput., 2022

Accelerating Polynomial Multiplication for Homomorphic Encryption on GPUs.
Proceedings of the 2022 IEEE International Symposium on Secure and Private Execution Environment Design (SEED), 2022

An MLIR-based Compiler Flow for System-Level Design and Hardware Acceleration.
Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design, 2022

To Trust or to Stockpile: Modeling Human-Simulation Interaction in Supply Chain Shortages.
Proceedings of the CHI '22: CHI Conference on Human Factors in Computing Systems, New Orleans, LA, USA, 29 April 2022, 2022

SODA-OPT an MLIR based flow for co-design and high-level synthesis.
Proceedings of the CF '22: 19th ACM International Conference on Computing Frontiers, Turin, Italy, May 17, 2022

NaviSim: A Highly Accurate GPU Simulator for AMD RDNA GPUs.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2022

2021
Spartan: A Sparsity-Adaptive Framework to Accelerate Deep Neural Network Training on GPUs.
IEEE Trans. Parallel Distributed Syst., 2021

Daisen: A Framework for Visualizing Detailed GPU Execution.
Comput. Graph. Forum, 2021

Performance Evaluation and Improvement of Real-Time Computer Vision Applications for Edge Computing Devices.
Proceedings of the ICPE '21: ACM/SPEC International Conference on Performance Engineering, 2021

JAXED: Reverse Engineering DNN Architectures Leveraging JIT GEMM Libraries.
Proceedings of the 2021 International Symposium on Secure and Private Execution Environment Design (SEED), 2021

SECDA: Efficient Hardware/Software Co-Design of FPGA-based DNN Accelerators for Edge Inference.
Proceedings of the 33rd IEEE International Symposium on Computer Architecture and High Performance Computing, 2021

CALC: A Content-Aware Learning Cache for Storage Systems.
Proceedings of the IEEE International Conference on Networking, Architecture and Storage, 2021

GNNMark: A Benchmark Suite to Characterize Graph Neural Network Training on GPUs.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2021

Achieving on-Mobile Real-Time Super-Resolution with Neural Architecture and Pruning Search.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

GPU Overdrive Fault Attacks on Neural Networks.
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2021

Trident: A Hybrid Correlation-Collision GPU Cache Timing Attack for AES Key Recovery.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

Overdrive Fault Attacks on GPUs.
Proceedings of the 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2021

A Secure and Reusable Software Architecture for Supporting Online Data Harmonization.
Proceedings of the 2021 IEEE International Conference on Big Data (Big Data), 2021

2020
Nacre: Durable, Secure and Energy-Efficient Non-Volatile Memory Utilizing Data Versioning.
IEEE Trans. Emerg. Top. Comput., 2020

ArmorAll: Compiler-based Resilience Targeting GPU Applications.
ACM Trans. Archit. Code Optim., 2020

Editorial: A Message from the Editor-in-Chief.
ACM Trans. Archit. Code Optim., 2020

Exploiting Bank Conflict-based Side-channel Timing Leakage of GPUs.
ACM Trans. Archit. Code Optim., 2020

Exploring GPU acceleration of Deep Neural Networks using Block Circulant Matrices.
Parallel Comput., 2020

MGPU-TSM: A Multi-GPU System with Truly Shared Memory.
CoRR, 2020

HALCONE: A Hardware-Level Timestamp-based Cache Coherence Scheme for Multi-GPU systems.
CoRR, 2020

Design Space Exploration of Accelerators and End-to-End DNN Evaluation with TFLITE-SOC.
Proceedings of the 32nd IEEE International Symposium on Computer Architecture and High Performance Computing, 2020

A Smart Background Scheduler for Storage Systems.
Proceedings of the 28th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, 2020

Message from the Program Chairs: IISWC 2020.
Proceedings of the IEEE International Symposium on Workload Characterization, 2020

Using Undersampling with Ensemble Learning to Identify Factors Contributing to Preterm Birth.
Proceedings of the 19th IEEE International Conference on Machine Learning and Applications, 2020

Griffin: Hardware-Software Support for Efficient Page Migration in Multi-GPU Systems.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020

Hardware/Software Obfuscation against Timing Side-channel Attack on a GPU.
Proceedings of the 2020 IEEE International Symposium on Hardware Oriented Security and Trust, 2020

Vega: A Computer Vision Processing Enhancement Framework with Graph-based Acceleration.
Proceedings of the 53rd Hawaii International Conference on System Sciences, 2020

A Novel GPU Overdrive Fault Attack.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

Introducing Gamettes: A Playful Approach for Capturing Decision-Making for Informing Behavioral Models.
Proceedings of the CHI '20: CHI Conference on Human Factors in Computing Systems, 2020

Valkyrie: Leveraging Inter-TLB Locality to Enhance GPU Performance.
Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

2019
Analyzing and Increasing the Reliability of Convolutional Neural Networks on GPUs.
IEEE Trans. Reliab., 2019

Intra-Cluster Coalescing and Distributed-Block Scheduling to Reduce GPU NoC Pressure.
IEEE Trans. Computers, 2019

Side-channel Timing Attack of RSA on a GPU.
ACM Trans. Archit. Code Optim., 2019

HAWS: Accelerating GPU Wavefront Execution through Selective Out-of-order Execution.
ACM Trans. Archit. Code Optim., 2019

Student cluster competition 2018, team Northeastern University: Reproducing performance of a multi-physics simulations of the Tsunamigenic 2004 Sumatra Megathrust earthquake on the AMD EPYC 7551 architecture.
Parallel Comput., 2019

Summarizing CPU and GPU Design Trends with Product Data.
CoRR, 2019

Priority-Based PCIe Scheduling for Multi-Tenant Multi-GPU Systems.
IEEE Comput. Archit. Lett., 2019

MGPUSim: enabling multi-GPU performance modeling and optimization.
Proceedings of the 46th International Symposium on Computer Architecture, 2019

Exploiting Adaptive Data Compression to Improve Performance and Energy-Efficiency of Compute Workloads in Multi-GPU Systems.
Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

Discovering Programmer Intention Behind Written Source Code.
Proceedings of the 18th IEEE International Conference On Machine Learning And Applications, 2019

A Comprehensive Evaluation of the Effects of Input Data on the Resilience of GPU Applications.
Proceedings of the 2019 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems, 2019

PCFI: Program Counter Guided Fault Injection for Accelerating GPU Reliability Assessment.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2019

2018
Lightweight Hardware Transactional Memory for GPU Scratchpad Memory.
IEEE Trans. Computers, 2018

Block Cooperation: Advancing Lifetime of Resistive Memories by Increasing Utilization of Error Correcting Codes.
ACM Trans. Archit. Code Optim., 2018

Student cluster competition 2017, team Northeastern University: Reproducing vectorization of the Tersoff multi-body potential on the NVIDIA V100.
Parallel Comput., 2018

Power Analysis Attack of an AES GPU Implementation.
J. Hardw. Syst. Secur., 2018

MGSim + MGMark: A Framework for Multi-GPU System Research.
CoRR, 2018

An Integrated simulation Framework for examining Resiliency in pharmaceutical supply Chains considering Human Behaviors.
Proceedings of the 2018 Winter Simulation Conference, 2018

Characterizing the Microarchitectural Implications of a Convolutional Neural Network (CNN) Execution on GPUs.
Proceedings of the 2018 ACM/SPEC International Conference on Performance Engineering, 2018

PRISM: predicting resilience of GPU applications using statistical methods.
Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, 2018

Employing Student Retention Strategies for an Introductory GPU Programming Course.
Proceedings of the 2018 IEEE/ACM Workshop on Education for High-Performance Computing, 2018

Peachy Parallel Assignments (EduHPC 2018).
Proceedings of the 2018 IEEE/ACM Workshop on Education for High-Performance Computing, 2018

Evaluating Performance Tradeoffs on the Radeon Open Compute Platform.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2018

Intra-Cluster Coalescing to Reduce GPU NoC Pressure.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

Profiling DNN Workloads on a Volta-based DGX-1 System.
Proceedings of the 2018 IEEE International Symposium on Workload Characterization, 2018

A Timing Side-Channel Attack on a Mobile GPU.
Proceedings of the 36th IEEE International Conference on Computer Design, 2018

Defensive dropout for hardening deep neural networks under adversarial attacks.
Proceedings of the International Conference on Computer-Aided Design, 2018

Effective simple-power analysis attacks of elliptic curve cryptography on embedded systems.
Proceedings of the International Conference on Computer-Aided Design, 2018

GPU acceleration of RSA is vulnerable to side-channel timing attacks.
Proceedings of the International Conference on Computer-Aided Design, 2018

Evaluating the Resilience of Parallel Applications.
Proceedings of the 2018 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems, 2018

Evaluating the impact of execution parameters on program vulnerability in GPU applications.
Proceedings of the 2018 Design, Automation & Test in Europe Conference & Exhibition, 2018

Airavat: Improving energy efficiency of heterogeneous applications.
Proceedings of the 2018 Design, Automation & Test in Europe Conference & Exhibition, 2018

An Efficient Data Management Framework for Puerto Rico Testsite for Exploring Contamination Threats (PROTECT).
Proceedings of the IEEE International Conference on Big Data (IEEE BigData 2018), 2018

A Hybrid Approach to Identifying Key Factors in Environmental Health Studies.
Proceedings of the IEEE International Conference on Big Data (IEEE BigData 2018), 2018

Interactive Kernel Dimension Alternative Clustering on GPUs.
Proceedings of the IEEE/ACM 2018 International Conference on Advances in Social Networks Analysis and Mining, 2018

Iterative Spectral Method for Alternative Clustering.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2018

2017
Scalable and massively parallel Monte Carlo photon transport simulations for heterogeneous computing platforms.
CoRR, 2017

DNNMark: A Deep Neural Network Benchmark Suite for GPUs.
Proceedings of the General Purpose GPUs Workshop (GPGPU@PPoPP), 2017

Combining architectural fault-injection and neutron beam testing approaches toward better understanding of GPU soft-error resilience.
Proceedings of the IEEE 60th International Midwest Symposium on Circuits and Systems, 2017

REMAP: a reliability/endurance mechanism for advancing PCM.
Proceedings of the International Symposium on Memory Systems, 2017

Multi2Sim Kepler: A detailed architectural GPU simulator.
Proceedings of the 2017 IEEE International Symposium on Performance Analysis of Systems and Software, 2017

Moka: Model-based concurrent kernel analysis.
Proceedings of the 2017 IEEE International Symposium on Workload Characterization, 2017

Dual Dictionary Compression for the Last Level Cache.
Proceedings of the 2017 IEEE International Conference on Computer Design, 2017

Quality of Service-Aware Dynamic Voltage and Frequency Scaling for Mobile 3D Graphics Applications.
Proceedings of the 2017 IEEE International Conference on Computer Design, 2017

Cost-effective write disturbance mitigation techniques for advancing PCM density.
Proceedings of the 2017 IEEE/ACM International Conference on Computer-Aided Design, 2017

A Novel Side-Channel Timing Attack on GPUs.
Proceedings of the Great Lakes Symposium on VLSI 2017, 2017

Hardware Support for Scratchpad Memory Transactions on GPU Architectures.
Proceedings of the Euro-Par 2017: Parallel Processing - 23rd International Conference on Parallel and Distributed Computing, Santiago de Compostela, Spain, August 28, 2017

Exploring the Potential for Collaborative Data Compression and Hard-Error Tolerance in PCM Memories.
Proceedings of the 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2017

Live together or Die Alone: Block cooperation to extend lifetime of resistive memories.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2017

TwinKernels: an execution model to improve GPU hardware scheduling at compile time.
Proceedings of the 2017 International Symposium on Code Generation and Optimization, 2017

High-Performance Monte Carlo Simulations for Photon Migration and Applications in Optical Brain Functional Imaging.
Handbook of Large-Scale Distributed Computing in Smart Healthcare, 2017

2016
UMH: A Hardware-Based Unified Memory Hierarchy for Systems with Multiple Discrete GPUs.
ACM Trans. Archit. Code Optim., 2016

21st Century Computer Architecture.
CoRR, 2016

A Fast Level-Set Segmentation Algorithm for Image Processing Designed For Parallel Architectures.
Proceedings of the 6th Workshop on Irregular Applications: Architecture and Algorithms, 2016

A comprehensive performance analysis of HSA and OpenCL 2.0.
Proceedings of the 2016 IEEE International Symposium on Performance Analysis of Systems and Software, 2016

Mystic: Predictive Scheduling for GPU Based Cloud Servers Using Machine Learning.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Balancing Scalar and Vector Execution on GPU Architectures.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Hetero-mark, a benchmark suite for CPU-GPU collaborative computing.
Proceedings of the 2016 IEEE International Symposium on Workload Characterization, 2016

Hardware thread reordering to boost OpenCL throughput on FPGAs.
Proceedings of the 34th IEEE International Conference on Computer Design, 2016

A complete key recovery timing attack on a GPU.
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

Modeling player decisions in a supply chain game.
Proceedings of the IEEE Conference on Computational Intelligence and Games, 2016

2015
Exploring the Efficiency of the OpenCL Pipe Semantic on an FPGA.
SIGARCH Comput. Archit. News, 2015

A reuse-based refresh policy for energy-aware eDRAM caches.
Microprocess. Microsystems, 2015

Side-Channel Analysis of MAC-Keccak Hardware Implementations.
IACR Cryptol. ePrint Arch., 2015

NUPAR: A Benchmark Suite for Modern GPU Architectures.
Proceedings of the 6th ACM/SPEC International Conference on Performance Engineering, Austin, TX, USA, January 31, 2015

Visualization of OpenCL application execution on CPU-GPU systems.
Proceedings of the Workshop on Computer Architecture Education, 2015

Engaging sophomores in embedded design using robotics.
Proceedings of the Workshop on Computer Architecture Education, 2015

Field, experimental, and analytical data on large-scale HPC systems and evaluation of the implications for exascale system design.
Proceedings of the 33rd IEEE VLSI Test Symposium, 2015

High performance computing of fiber scattering simulation.
Proceedings of the 8th Workshop on General Purpose Processing using GPUs, 2015

Asymmetric NoC Architectures for GPU Systems.
Proceedings of the 9th International Symposium on Networks-on-Chip, 2015

Securing virtual execution environments through machine learning-based intrusion detection.
Proceedings of the 25th IEEE International Workshop on Machine Learning for Signal Processing, 2015

A framework for visualization of OpenCL applications execution: a tutorial.
Proceedings of the 3rd International Workshop on OpenCL, 2015

Exploring the features of OpenCL 2.0.
Proceedings of the 3rd International Workshop on OpenCL, 2015

Leveraging Silicon-Photonic NoC for Designing Scalable GPUs.
Proceedings of the 29th ACM International Conference on Supercomputing, 2015

Side-channel power analysis of a GPU AES implementation.
Proceedings of the 33rd IEEE International Conference on Computer Design, 2015

Bridging Architecture and Programming for Throughput-Oriented Vision Processing (Abstract Only).
Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2015

Performance of the NVIDIA Jetson TK1 in HPC.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

2014
Harnessing the Power of GPUs to Speed Up Feature Selection for Outlier Detection.
J. Comput. Sci. Technol., 2014

Aggressive Value Prediction on a GPU.
Int. J. Parallel Program., 2014

Analyzing power efficiency of optimization techniques and algorithm design methods for applications on heterogeneous platforms.
Int. J. High Perform. Comput. Appl., 2014

Power Analysis Attack on Hardware Implementation of MAC-Keccak on FPGAs.
IACR Cryptol. ePrint Arch., 2014

System Call Anomaly Detection Using Multi-HMMs.
Proceedings of the IEEE Eighth International Conference on Software Security and Reliability, 2014

Runtime Support for Adaptive Spatial Partitioning and Inter-Kernel Communication on GPUs.
Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing, 2014

Calculating Architectural Vulnerability Factors for Spatial Multi-Bit Transient Faults.
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

A parallel clustering algorithm for placement.
Proceedings of the Fifteenth International Symposium on Quality Electronic Design, 2014

Scalable and efficient implementation of correlation power analysis using graphics processing units (GPUs).
Proceedings of the Workshop on Hardware and Architectural Support for Security and Privacy (HASP 2014), 2014

Scalar Waving: Improving the Efficiency of SIMD Execution on GPUs.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

GPU-Accelerated HMM for Speech Recognition.
Proceedings of the 43rd International Conference on Parallel Processing Workshops, 2014

Accelerated Connected Component Labeling Using CUDA Framework.
Proceedings of the Computer Vision and Graphics - International Conference, 2014

Exploring the Heterogeneous Design Space for both Performance and Reliability.
Proceedings of the 51st Annual Design Automation Conference 2014, 2014

Performance Evaluation and Optimization Mechanisms for Inter-operable Graphics and Computation on GPUs.
Proceedings of the Seventh Workshop on General Purpose Processing Using GPUs, 2014

Fast Fourier Transform (FFT) on GPUs.
Numerical Computations with GPUs, 2014

2013
Quantifying the energy efficiency of FFT on heterogeneous platforms.
Proceedings of the 2013 IEEE International Symposium on Performance Analysis of Systems & Software, 2013

Characterizing scalar opportunities in GPGPU applications.
Proceedings of the 2013 IEEE International Symposium on Performance Analysis of Systems & Software, 2013

HQL: A Scalable Synchronization Mechanism for GPUs.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

Analyzing Optimization Techniques for Power Efficiency on Heterogeneous Platforms.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

Unstructured Control Flow in GPGPU.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

Datacenters as Controllable Load Resources in the Electricity Market.
Proceedings of the IEEE 33rd International Conference on Distributed Computing Systems, 2013

Architecture-Independent Dynamic Information Flow Tracking.
Proceedings of the Compiler Construction - 22nd International Conference, 2013

Valar: a benchmark suite to study the dynamic behavior of heterogeneous systems.
Proceedings of the 6th Workshop on General Purpose Processing Using Graphics Processing Units, 2013

Heterogeneous Computing with OpenCL - Revised OpenCL 1.2 Edition.
Morgan Kaufmann, ISBN: 978-0-12-405894-1, 2013

2012
A Sequentially Consistent Multiprocessor Architecture for Out-of-Order Retirement of Instructions.
IEEE Trans. Parallel Distributed Syst., 2012

Local Kernel Density Ratio-Based Feature Selection for Outlier Detection.
Proceedings of the 4th Asian Conference on Machine Learning, 2012

Dione: A Flexible Disk Monitoring and Analysis Framework.
Proceedings of the Research in Attacks, Intrusions, and Defenses, 2012

GPU-Accelerated Feature Selection for Outlier Detection Using the Local Kernel Density Ratio.
Proceedings of the 12th IEEE International Conference on Data Mining, 2012

Feature Weighting and Selection Using Hypothesis Margin of Boosting.
Proceedings of the 12th IEEE International Conference on Data Mining, 2012

Topic 16: GPU and Accelerators Computing.
Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012

Enabling task-level scheduling on heterogeneous platforms.
Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units, 2012

Multi2Sim: a simulation framework for CPU-GPU computing.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011
Exploiting Memory Access Patterns to Improve Memory Performance in Data-Parallel Architectures.
IEEE Trans. Parallel Distributed Syst., 2011

Guest Editor's Introduction: Special Issue on High-Performance Computing with Accelerators.
IEEE Trans. Parallel Distributed Syst., 2011

Accelerating an Imaging Spectroscopy Algorithm for Submerged Marine Environments Using Graphics Processing Units.
IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens., 2011

Virtual machine monitor-based lightweight intrusion detection.
ACM SIGOPS Oper. Syst. Rev., 2011

Workload Characterization at the Virtualization Layer.
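Proceedings of the IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS 2011), 2011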

A Novel Feature Selection for Intrusion Detection in Virtual Machine Environments.
Proceedings of the IEEE 23rd International Conference on Tools with Artificial Intelligence, 2011

Feature Selection Metric Using AUC Margin for Small Samples and Imbalanced Data Classification Problems.
Proceedings of the 10th International Conference on Machine Learning and Applications and Workshops, 2011

The convergence of HPC and embedded systems in our heterogeneous computing future.
Proceedings of the IEEE 29th International Conference on Computer Design, 2011

Increasing power/performance resource efficiency on virtualized enterprise servers.
Proceedings of the 8th Conference on Computing Frontiers, 2011

Analyzing program flow within a many-kernel OpenCL application.
Proceedings of the 4th Workshop on General Purpose Processing on Graphics Processing Units, 2011

Caracal: dynamic translation of runtime environments for GPUs.
Proceedings of the 4th Workshop on General Purpose Processing on Graphics Processing Units, 2011

2010
Quantifying load imbalance on virtualized enterprise servers.
Proceedings of the First Joint WOSP/SIPEW International Conference on Performance Engineering, 2010

Data Structures and Transformations for Physically Based Simulation on a GPU.
Proceedings of the High Performance Computing for Computational Science - VECPAR 2010, 2010

Toward Whole-System Dynamic Analysis for ARM-Based Mobile Devices.
Proceedings of the Recent Advances in Intrusion Detection, 13th International Symposium, 2010

Data transformations enabling loop vectorization on multithreaded data parallel architectures.
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010

Using hardware vulnerability factors to enhance AVF analysis.
Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010

Effective Virtual Machine Monitor Intrusion Detection Using Feature Selection on Highly Imbalanced Data.
Proceedings of the Ninth International Conference on Machine Learning and Applications, 2010

Out-of-order retirement of instructions in sequentially consistent multiprocessors.
Proceedings of the 28th International Conference on Computer Design, 2010

Accelerating the local outlier factor algorithm on a GPU for intrusion detection systems.
Proceedings of the 3rd Workshop on General Purpose Processing on Graphics Processing Units, 2010

2009
AGAMOS: A Graph-Based Approach to Modulo Scheduling for Clustered Microarchitectures.
IEEE Trans. Computers, 2009

Obtaining FPGA soft error rate in high performance information systems.
Microelectron. Reliab., 2009

Software transactional memory for multicore embedded systems.
Proceedings of the 2009 ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems, 2009

Profile-Guided Optimization of Critical Medical Imaging Algorithms.
Proceedings of the 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, Boston, MA, USA, June 28, 2009

Multi GPU Implementation of Iterative Tomographic Reconstruction Algorithms.
Proceedings of the 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, Boston, MA, USA, June 28, 2009

Exploring the multiple-GPU design space.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Eliminating microarchitectural dependency from Architectural Vulnerability.
Proceedings of the 15th International Conference on High-Performance Computer Architecture (HPCA-15 2009), 2009

Accelerating phase unwrapping and affine transformations for optical quadrature microscopy using CUDA.
Proceedings of the 2nd Workshop on General Purpose Processing on Graphics Processing Units, 2009

Architecture-aware optimization targeting multithreaded stream computing.
Proceedings of the 2nd Workshop on General Purpose Processing on Graphics Processing Units, 2009

2008
Acknowledgment to special issue reviewers.
J. Parallel Distributed Comput., 2008

Special issue: General-purpose processing using graphics processing units.
J. Parallel Distributed Comput., 2008

Interactive Deformable Registration Visualization and Analysis of 4D Computed Tomography.
Proceedings of the Medical Biometrics, First International Conference, 2008

A Field Analysis of System-level Effects of Soft Errors Occurring in Microprocessors used in Information Systems.
Proceedings of the 2008 IEEE International Test Conference, 2008

Archer: A Community Distributed Computing Infrastructure for Computer Architecture Research and Education.
Proceedings of Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom 2008), 2008

Applying Spectral Analysis to Identify Individual Application Signatures.
Proceedings of the 34th International Computer Measurement Group Conference, 2008

Quantifying software vulnerability.
Proceedings of the 5th Conference on Computing Frontiers, 2008

2007
Power Aware External Bus Arbitration for System-on-a-Chip Embedded Systems.
Trans. High Perform. Embed. Archit. Compil., 2007

Characterization of file I/O activity for SPEC CPU2006.
SIGARCH Comput. Archit. News, 2007

Case Study: Soft Error Rate Analysis in Storage Systems.
Proceedings of the 25th IEEE VLSI Test Symposium (VTS 2007), 2007

Exploring Novel Parallelization Technologies for 3-D Imaging Applications.
Proceedings of the 19th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2007), 2007

Stream Image Processing on a Dual-Core Embedded System.
Proceedings of Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS 2007), 2007

External memory page remapping for embedded multimedia systems.
Proceedings of the 2007 ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems, 2007

Heterogeneous Clustered VLIW Microarchitectures.
Proceedings of the Fifth International Symposium on Code Generation and Optimization (CGO 2007), 2007

2006
Addressing a workload characterization study to the design of consistency protocols.
J. Supercomput., 2006

Reducing Data Cache Susceptibility to Soft Errors.
IEEE Trans. Dependable Secur. Comput., 2006

An adjustable linear time parallel algorithm for maximum weight bipartite matching.
Inf. Process. Lett., 2006

Experiences with the Blackfin architecture in an embedded systems lab.
Proceedings of the 2006 Workshop on Computer Architecture Education, 2006

Performance Characterization of SPEC CPU2006 Integer Benchmarks on x86-64 Architecture.
Proceedings of the 2006 IEEE International Symposium on Workload Characterization, 2006

Acceleration of Maximum Likelihood Estimation for Tomosynthesis Mammography.
Proceedings of the 12th International Conference on Parallel and Distributed Systems, 2006

Vulnerability analysis of L2 cache elements to single event upsets.
Proceedings of the Conference on Design, Automation and Test in Europe, 2006

Hunting Trojan Horses.
Proceedings of the 1st Workshop on Architectural and System Support for Improving Software Dependability, 2006

2005
A reliable return address stack: microarchitectural features to defeat stack smashing.
SIGARCH Comput. Archit. News, 2005

Characterizing antivirus workload execution.
SIGARCH Comput. Archit. News, 2005

ASM: application security monitor.
SIGARCH Comput. Archit. News, 2005

Introduction to the special issue.
SIGARCH Comput. Archit. News, 2005

Subsequence Matching on Structured Time Series Data.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2005

Demystifying on-the-fly spill code.
Proceedings of the ACM SIGPLAN 2005 Conference on Programming Language Design and Implementation, 2005

A multinomial clustering model for fast simulation of computer architecture designs.
Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2005

Balancing Performance and Reliability in the Memory Hierarchy.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005

Load Balancing using Grid-based Peer-to-Peer Parallel I/O.
Proceedings of the 2005 IEEE International Conference on Cluster Computing (CLUSTER 2005), September 26, 2005

Exploiting temporal locality in drowsy cache policies.
Proceedings of the Second Conference on Computing Frontiers, 2005

2004
Removing communications in clustered microarchitectures through instruction replication.
ACM Trans. Archit. Code Optim., 2004

Developing object-oriented parallel iterative methods.
Int. J. High Perform. Comput. Netw., 2004

Characterizing the Dynamic Behavior of Workload Execution in SVM systems.
Proceedings of the 16th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2004), 2004

A Study of Errant Pipeline Flushes Caused by Value Misspeculation.
Proceedings of the 16th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2004), 2004

Bus Power Estimation and Power-Efficient Bus Arbitration for System-on-a-Chip Embedded Systems.
Proceedings of the Power-Aware Computer Systems, 4th International Workshop, 2004

Execution-Driven Simulation of Network Storage Systems.
Proceedings of the 12th International Workshop on Modeling, 2004

Parallel Maximum Weight Bipartite Matching Algorithms for Scheduling in Input-Queued Switches.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

A MATLAB toolbox for Hyperspectral Image Analysis.
Proceedings of the 2004 IEEE International Geoscience and Remote Sensing Symposium, 2004

Bi-Criteria Models for All-Uses Test Suite Reduction.
Proceedings of the 26th International Conference on Software Engineering (ICSE 2004), 2004

2003
Realizing high IPC through a scalable memory-latency tolerant multipath microarchitecture.
SIGARCH Comput. Archit. News, 2003

Levo - A Scalable Processor With High IPC.
J. Instr. Level Parallelism, 2003

The CenSSIS Image Database.
Proceedings of the 15th International Conference on Scientific and Statistical Database Management (SSDBM 2003), 2003

Source level transformations to improve I/O data partitioning.
Proceedings of the International Workshop on Storage Network Architecture and Parallel I/Os, 2003

Dynamic Input Buffer Allocation (DIBA) for Fault Tolerant Ethernet Packet Switching.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 2003

Instruction Replication for Clustered Microarchitectures.
Proceedings of the 36th Annual International Symposium on Microarchitecture, 2003

Profile-guided I/O partitioning.
Proceedings of the 17th Annual International Conference on Supercomputing, 2003

2002
Localized Message Passing Structure for High Speed Ethernet Packet Switching.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 2002

Realizing High IPC Using Time-Tagged Resource-Flow Computing.
Proceedings of the Euro-Par 2002, 2002

Exploiting Pseudo-Schedules to Guide Data Dependence Graph Partitioning.
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques (PACT 2002), 2002

2001
Introduction to the Special Section on High Performance Memory Systems.
IEEE Trans. Computers, 2001

Workshop on binary translation - 2001.
SIGARCH Comput. Archit. News, 2001

WBT-2000: workshop on binary translation - 2000.
SIGARCH Comput. Archit. News, 2001

2000
Using cache line coloring to perform aggressive procedure inlining.
SIGARCH Comput. Archit. News, 2000

Welcome to the Opportunities of Binary Translation.
Computer, 2000

Learning outside of the classroom: the Northeastern University research co-op fellowship program.
Proceedings of the 2000 workshop on Computer architecture education, 2000

Accurate simulation and evaluation of code reordering.
Proceedings of the 2000 IEEE International Symposium on Performance Analysis of Systems and Software, 2000

DSPTune: A Performance Evaluation Toolset for the SHARC Signal Processor.
Proceedings of the 33rd Annual Simulation Symposium (SS 2000), 2000

1999
Analysis of Temporal-Based Program Behavior for Improved Instruction Cache Performance.
IEEE Trans. Computers, 1999

Improving the accuracy of indirect branch prediction via branch classification.
SIGARCH Comput. Archit. News, 1999

Branch-directed and pointer-based data cache prefetching.
J. Syst. Archit., 1999

Indirect Branch Prediction Using Data Compression Techniques.
J. Instr. Level Parallelism, 1999

Fifth Annual Workshop on Computer Education.
Proceedings of the Fifth International Symposium on High-Performance Computer Architecture, 1999

1998
VLSI design in the 3rd dimension.
Integr., 1998

Tracing and Characterization of Windows NT-based System Workloads.
Digit. Tech. J., 1998

Predicting Indirect Branches via Data Compression.
Proceedings of the 31st Annual IEEE/ACM International Symposium on Microarchitecture, 1998

Temporal-Based Procedure Reordering for Improved Instruction Cache Performance.
Proceedings of the Fourth International Symposium on High-Performance Computer Architecture, Las Vegas, Nevada, USA, January 31, 1998

Operating System Impact on Trace-Driven Simulation.
Proceedings of the 31st Annual Simulation Symposium (SS '98), 1998

1997
Improving the Accuracy of History Based Branch Prediction.
IEEE Trans. Computers, 1997

Performance analysis on a CC-NUMA prototype.
IBM J. Res. Dev., 1997

Operating-system level tracing tools for the DEC AXP architecture.
Proceedings of the 1997 workshop on Computer architecture education, 1997

Efficient Procedure Mapping Using Cache Line Coloring.
Proceedings of the ACM SIGPLAN '97 Conference on Programming Language Design and Implementation (PLDI), 1997

Analytic Models of Workload Behavior and Pipeline Performance.
Proceedings of the MASCOTS 1997, 1997

Digital Computer Architecture.
Proceedings of the Computer Science and Engineering Handbook, 1997

1996
A discussion on non-blocking/lockup-free caches.
SIGARCH Comput. Archit. News, 1996

Real-Time Trace Generation.
Int. J. Comput. Simul., 1996

Improving Multiprocessor Scalability Using Lockup Free Caches.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 1996

Branch-Directed and Stride-Based Data Cache Prefetching.
Proceedings of the 1996 International Conference on Computer Design (ICCD '96), 1996

Performance Modeling Using Object-Oriented Execution-Driven Simulation.
Proceedings of the 29th Annual Simulation Symposium (SS '96), 1996

The DLX instruction set architecture handbook.
Morgan Kaufmann, ISBN: 978-1-55860-371-4, 1996

1995
Combining object-oriented design and computer architecture into a single senior-level course.
Proceedings of the 1995 Workshop on Computer Architecture Education, 1995

Scalable Performance on a Distributed Shared-Memory Machine.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 1995

1993
Issues in Trace-Driven Simulation.
Proceedings of the Performance Evaluation of Computer and Communication Systems, 1993

1992
Contrasting instruction-fetch time and instruction-decode time branch prediction mechanisms: Achieving synergy through their cooperative operation.
Microprocess. Microprogramming, 1992

1991
A Study of 80X86/80X87 Floating-Point Execution.
Proceedings of the 1991 ACM SIGSMALL/PC Symposium on Small Systems, 1991

Branch History Table Prediction of Moving Target Branches due to Subroutine Returns.
Proceedings of the 18th Annual International Symposium on Computer Architecture. Toronto, 1991

1989
PC Workload Characterization.
Proceedings of the 1989 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, 1989
