Scalability Limitations of Processing-in-Memory using Real System Evaluations.
Proc. ACM Meas. Anal. Comput. Syst., 2024

Photonics for Sustainable Computing.
CoRR, 2024

SOPHIE: A Scalable Recurrent Ising Machine Using Optically Addressed Phase Change Memory.
Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024

NeuraChip: Accelerating GNN Computations with a Hash-based Decoupled Spatial Accelerator.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

Mirage: An RNS-Based Photonic Accelerator for DNN Training.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

HEAP: A Fully Homomorphic Encryption Accelerator with Parallelized Bootstrapping.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

A Robot-Administered ICU Confusion Assessment with Brain-Computer Interface Control.
Proceedings of the Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction, 2024

IOMMU Deferred Invalidation Vulnerability: Exploit and Defense.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2024

RISE: RISC-V SoC for En/Decryption Acceleration on the Edge for Homomorphic Encryption.
IEEE Trans. Very Large Scale Integr. Syst., October, 2023

An Electro-Photonic System for Accelerating Deep Neural Networks.
ACM J. Emerg. Technol. Comput. Syst., October, 2023

Puppeteer: A Random Forest Based Manager for Hardware Prefetchers Across the Memory Hierarchy.
ACM Trans. Archit. Code Optim., March, 2023

On Architecting Fully Homomorphic Encryption-based Computing Systems.
Synthesis Lectures on Computer Architecture, Springer, ISBN: 978-3-031-31753-8, 2023

Accelerating Finite Field Arithmetic for Homomorphic Encryption on GPUs.
IEEE Micro, 2023

Towards Efficient Hyperdimensional Computing Using Photonics.
CoRR, 2023

Accelerating DNN Training With Photonics: A Residue Number System-Based Design.
CoRR, 2023

A Blueprint for Precise and Fault-Tolerant Analog Neural Networks.
CoRR, 2023

Leveraging Residue Number System for Designing High-Precision Analog Deep Neural Network Accelerators.
CoRR, 2023

GME: GPU-based Microarchitectural Extensions to Accelerate Homomorphic Encryption.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

MAD: Memory-Aware Design Techniques for Accelerating Fully Homomorphic Encryption.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

Processing-in-Memory Using Optically-Addressed Phase Change Memory.
Proceedings of the IEEE/ACM International Symposium on Low Power Electronics and Design, 2023

FAB: An FPGA-based Accelerator for Bootstrappable Fully Homomorphic Encryption.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

ProcessorFuzz: Processor Fuzzing with Control and Status Registers Guidance.
Proceedings of the IEEE International Symposium on Hardware Oriented Security and Trust, 2023

SIGFuzz: A Framework for Discovering Microarchitectural Timing Side Channels.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2023

Architecting Optically Controlled Phase Change Memory.
ACM Trans. Archit. Code Optim., 2022

ProcessorFuzz: Guiding Processor Fuzzing using Control and Status Registers.
CoRR, 2022

Accelerating Polynomial Multiplication for Homomorphic Encryption on GPUs.
Proceedings of the 2022 IEEE International Symposium on Secure and Private Execution Environment Design (SEED), 2022

RACE: RISC-V SoC for En/decryption Acceleration on the Edge for Homomorphic Computation.
Proceedings of the ISLPED '22: ACM/IEEE International Symposium on Low Power Electronics and Design, Boston, MA, USA, August 1, 2022

Hydra: A near hybrid memory accelerator for CNN inference.
Proceedings of the 2022 Design, Automation & Test in Europe Conference & Exhibition, 2022

TargetFuzz: Using DARTs to Guide Directed Greybox Fuzzers.
Proceedings of the ASIA CCS '22: ACM Asia Conference on Computer and Communications Security, Nagasaki, Japan, 30 May 2022

NaviSim: A Highly Accurate GPU Simulator for AMD RDNA GPUs.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2022

Hardware Trojan Detection Using Backside Optical Imaging.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2021

Accelerating Data-Parallel Neural Network Training with Weighted-Averaging Reparameterisation.
Parallel Process. Lett., 2021

Does Fully Homomorphic Encryption Need Compute Acceleration?
IACR Cryptol. ePrint Arch., 2021

A Cautionary Tale About Detecting Malware Using Hardware Performance Counters and Machine Learning.
IEEE Des. Test, 2021

Network-on-Chip Microarchitecture-based Covert Channel in GPUs.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

AI Tax in Mobile SoCs: End-to-end Performance Analysis of Machine Learning in Smartphones.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2021

GNNMark: A Benchmark Suite to Characterize Graph Neural Network Training on GPUs.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2021

Hardware Acceleration for DBMS Machine Learning Scoring: Is It Worth the Overheads?
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2021

TAP-2.5D: A Thermally-Aware Chiplet Placement Methodology for 2.5D Systems.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2021

SealPK: Sealable Protection Keys for RISC-V.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2021

DirectFuzz: Automated Test Generation for RTL Designs using Directed Graybox Fuzzing.
Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021

FlexFilt: Towards Flexible Instruction Filtering for Security.
Proceedings of the ACSAC '21: Annual Computer Security Applications Conference, Virtual Event, USA, December 6, 2021

Cross-Layer Co-Optimization of Network Design and Chiplet Placement in 2.5-D Systems.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

BlackParrot: An Agile Open-Source RISC-V Multicore for Accelerator SoCs.
IEEE Micro, 2020

Efficient Sealable Protection Keys for RISC-V.
CoRR, 2020

MGPU-TSM: A Multi-GPU System with Truly Shared Memory.
CoRR, 2020

Custom Tailored Suite of Random Forests for Prefetcher Adaptation.
CoRR, 2020

HALCONE: A Hardware-Level Timestamp-based Cache Coherence Scheme for Multi-GPU systems.
CoRR, 2020

Gate-Level Validation of Integrated Circuits With Structured-Illumination Read-Out of Embedded Optical Signatures.
IEEE Access, 2020

LEAF-QA: Locate, Encode & Attend for Figure Question Answering.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2020

PHMon: A Programmable Hardware Monitor and Its Security Use Cases.
Proceedings of the 29th USENIX Security Symposium, 2020

Bandwidth Allocation in Silicon-Photonic Networks Using Application Instrumentation.
Proceedings of the 2020 IEEE High Performance Extreme Computing Conference, 2020

Griffin: Hardware-Software Support for Efficient Page Migration in Multi-GPU Systems.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020

Efficient Context-Sensitive CFI Enforcement Through a Hardware Monitor.
Proceedings of the Detection of Intrusions and Malware, and Vulnerability Assessment, 2020

System-level Evaluation of Chip-Scale Silicon Photonic Networks for Emerging Data-Intensive Applications.
Proceedings of the 2020 Design, Automation & Test in Europe Conference & Exhibition, 2020

Valkyrie: Leveraging Inter-TLB Locality to Enhance GPU Performance.
Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

Editorial TVLSI Positioning - Continuing and Accelerating an Upward Trajectory.
IEEE Trans. Very Large Scale Integr. Syst., 2019

The efficacy of various machine learning models for multi-class classification of RNA-seq expression data.
CoRR, 2019

MGPUSim: enabling multi-GPU performance modeling and optimization.
Proceedings of the 46th International Symposium on Computer Architecture, 2019

CUDA Optimized Neural Network Predicts Blood Glucose Control from Quantified Joint Mobility and Anthropometrics.
Proceedings of the 3rd International Conference on Information System and Data Mining, 2019

MGSim + MGMark: A Framework for Multi-GPU System Research.
CoRR, 2018

Nile: A Programmable Monitoring Coprocessor.
IEEE Comput. Archit. Lett., 2018

Profiling DNN Workloads on a Volta-based DGX-1 System.
Proceedings of the 2018 IEEE International Symposium on Workload Characterization, 2018

A cross-layer methodology for design and optimization of networks in 2.5D systems.
Proceedings of the International Conference on Computer-Aided Design, 2018

Leveraging thermally-aware chiplet organization in 2.5D systems to reclaim dark silicon.
Proceedings of the 2018 Design, Automation & Test in Europe Conference & Exhibition, 2018

Hardware Performance Counters Can Detect Malware: Myth or Fact?
Proceedings of the 2018 on Asia Conference on Computer and Communications Security, 2018

Adaptive Tuning of Photonic Devices in a Photonic NoC Through Dynamic Workload Allocation.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2017

Field of Groves: An Energy-Efficient Random Forest.
CoRR, 2017

High-performance low-energy implementation of cryptographic algorithms on a programmable SoC for IoT devices.
Proceedings of the 2017 IEEE High Performance Extreme Computing Conference, 2017

Using Machine Learning techniques for identification of Chronic Traumatic Encephalopathy related Spectroscopic Biomarkers.
Proceedings of the 2017 IEEE Applied Imagery Pattern Recognition Workshop, 2017

Designing Tunable Subthreshold Logic Circuits Using Adaptive Feedback Equalization.
IEEE Trans. Very Large Scale Integr. Syst., 2016

UMH: A Hardware-Based Unified Memory Hierarchy for Systems with Multiple Discrete GPUs.
ACM Trans. Archit. Code Optim., 2016

Electro-Photonic NoC Designs for Kilocore Systems.
ACM J. Emerg. Technol. Comput. Syst., 2016

Energy-Efficient Adaptive Classifier Design for Mobile Systems.
Proceedings of the 2016 International Symposium on Low Power Electronics and Design, 2016

Cross-layer floorplan optimization for silicon photonic NoCs in many-core systems.
Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition, 2016

Managing Laser Power in Silicon-Photonic NoC Through Cache and NoC Reconfiguration.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2015

Using GUI Design Theory to Develop an Open Source Touchscreen Smartphone GUI.
Comput. Inf. Sci., 2015

Asymmetric NoC Architectures for GPU Systems.
Proceedings of the 9th International Symposium on Networks-on-Chip, 2015

Leveraging Silicon-Photonic NoC for Designing Scalable GPUs.
Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

Dictionary-based sparse representation for resolution improvement in laser voltage imaging of CMOS integrated circuits.
Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015

Detecting hardware trojans using backside optical imaging of embedded watermarks.
Proceedings of the 52nd Annual Design Automation Conference, 2015

Towards General-Purpose Neural Network Computing.
Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

Design and Optimization of Nonvolatile Multibit 1T1R Resistive RAM.
IEEE Trans. Very Large Scale Integr. Syst., 2014

Learning to navigate in a virtual world using optic flow and stereo disparity signals.
Artif. Life Robotics, 2014

Sharing and placement of on-chip laser sources in silicon-photonic NoCs.
Proceedings of the Eighth IEEE/ACM International Symposium on Networks-on-Chip, 2014

Neural network-based accelerators for transcendental function approximation.
Proceedings of the Great Lakes Symposium on VLSI 2014, GLSVLSI '14, Houston, TX, USA - May 21, 2014

Thermal management of manycore systems with silicon-photonic networks.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2014

Sub-threshold logic circuit design using feedback equalization.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2014

Energy-efficient pass-transistor-logic using decision feedback equalization.
Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 2013

Nonlinear Multi-Error Correction Codes for Reliable MLC NAND Flash Memories.
IEEE Trans. Very Large Scale Integr. Syst., 2012

Secure Multipliers Resilient to Strong Fault-Injection Attacks Using Multilinear Arithmetic Codes.
IEEE Trans. Very Large Scale Integr. Syst., 2012

Designing Chip-Level Nanophotonic Interconnection Networks.
IEEE J. Emerg. Sel. Topics Circuits Syst., 2012

Tutorial T8A: Designing Silicon-Photonic Communication Networks for Manycore Systems.
Proceedings of the 25th International Conference on VLSI Design, 2012

Error mitigation in digital logic using a feedback equalization with schmitt trigger (FEST) circuit.
Proceedings of the Thirteenth International Symposium on Quality Electronic Design, 2012

A multi-layer approach to green computing: Designing energy-efficient digital circuits and manycore architectures.
Proceedings of the 2012 International Green Computing Conference, 2012

Performance and energy models for memristor-based 1T1R RRAM cell.
Proceedings of the Great Lakes Symposium on VLSI 2012, 2012

Express Virtual Channels with Taps (EVC-T): A Flow Control Technique for Network-on-Chip (NoC) in Manycore Systems.
Proceedings of the IEEE 19th Annual Symposium on High Performance Interconnects, 2011

Influence of metallic tubes on the reliability of CNTFET SRAMs: error mechanisms and countermeasures.
Proceedings of the 21st ACM Great Lakes Symposium on VLSI 2011, 2011

Run-time energy management of manycore systems through reconfigurable interconnects.
Proceedings of the 21st ACM Great Lakes Symposium on VLSI 2011, 2011

A preliminary look at error avoidance in digital logic via feedback equalization.
Proceedings of the 49th Annual Allerton Conference on Communication, 2011

Re-architecting DRAM memory systems with monolithically integrated silicon photonics.
Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010

Reliable MLC NAND flash memories based on nonlinear t-error-correcting codes.
Proceedings of the 2010 IEEE/IFIP International Conference on Dependable Systems and Networks, 2010

Building Many-Core Processor-to-DRAM Networks with Monolithic CMOS Silicon Photonics.
IEEE Micro, 2009

A Modeling and exploration framework for interconnect network design in the nanometer era.
Proceedings of the Third International Symposium on Networks-on-Chips, 2009

Silicon-photonic clos networks for global on-chip communication.
Proceedings of the Third International Symposium on Networks-on-Chips, 2009

Designing multi-socket systems using silicon photonics.
Proceedings of the 23rd international conference on Supercomputing, 2009

Design of Reliable and Secure Multipliers by Multilinear Arithmetic Codes.
Proceedings of the Information and Communications Security, 11th International Conference, 2009

Designing Energy-Efficient Low-Diameter On-Chip Networks with Equalized Interconnects.
Proceedings of the 17th IEEE Symposium on High Performance Interconnects, 2009

Distilling the essence of proprietary workloads into miniature benchmarks.
ACM Trans. Archit. Code Optim., 2008

Analysing and improving clustering based sampling for microprocessor simulation.
Int. J. High Perform. Comput. Netw., 2008

Automatically countering imbalance and its empirical relationship to cost.
Data Min. Knowl. Discov., 2008

Building Manycore Processor-to-DRAM Networks with Monolithic Silicon Photonics.
Proceedings of the 16th Annual IEEE Symposium on High Performance Interconnects (HOTI 2008), 2008

Design and Optimization of On-Chip Interconnects Using Wave-Pipelined Multiplexed Routing.
IEEE Trans. Very Large Scale Integr. Syst., 2007

Applying Statistical Sampling for Fast and Efficient Simulation of Commercial Workloads.
IEEE Trans. Computers, 2007

Subsetting the SPEC CPU2006 benchmark suite.
SIGARCH Comput. Archit. News, 2007

Scaling and evaluation of carbon nanotube interconnects for VLSI applications.
Proceedings of the 2nd International ICST Conference on Nano-Networks, 2007

Analysis of redundancy and application balance in the SPEC CPU2006 benchmark suite.
Proceedings of the 34th International Symposium on Computer Architecture (ISCA 2007), 2007

Measuring Benchmark Similarity Using Inherent Program Characteristics.
IEEE Trans. Computers, 2006

Low Power Multilevel Interconnect Networks Using Wave-Pipelined Multiplexed (WPM) Routing.
Proceedings of the 19th International Conference on VLSI Design (VLSI Design 2006), 2006

Evaluating the efficacy of statistical simulation for design space exploration.
Proceedings of the 2006 IEEE International Symposium on Performance Analysis of Systems and Software, 2006

Evaluating Benchmark Subsetting Approaches.
Proceedings of the 2006 IEEE International Symposium on Workload Characterization, 2006

Performance Cloning: A Technique for Disseminating Proprietary Applications as Benchmarks.
Proceedings of the 2006 IEEE International Symposium on Workload Characterization, 2006

Wave-pipelined multiplexed (WPM) routing for gigascale integration (GSI).
IEEE Trans. Very Large Scale Integr. Syst., 2005

Gigascale ASIC/SoC design using wave-pipelined multiplexed (WPM) routing.
Proceedings of the 2005 IEEE International SOC Conference, 2005

Analyzing and Improving Clustering Based Sampling for Microprocessor Simulation.
Proceedings of the 17th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2005), 2005

Measuring Program Similarity: Experiments with SPEC CPU Benchmark Suites.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005

Wave-pipelined 2-slot time division multiplexed (WP/2-TDM) routing.
Proceedings of the 15th ACM Great Lakes Symposium on VLSI 2005, 2005

A 2-slot time-division multiplexing (TDM) interconnect network for gigascale integration (GSI).
Proceedings of the Sixth International Workshop on System-Level Interconnect Prediction (SLIP 2004), 2004