2024
Scalability Limitations of Processing-in-Memory using Real System Evaluations.
Proc. ACM Meas. Anal. Comput. Syst., 2024
Photonics for Sustainable Computing.
CoRR, 2024
SOPHIE: A Scalable Recurrent Ising Machine Using Optically Addressed Phase Change Memory.
Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024
NeuraChip: Accelerating GNN Computations with a Hash-based Decoupled Spatial Accelerator.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024
Mirage: An RNS-Based Photonic Accelerator for DNN Training.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024
HEAP: A Fully Homomorphic Encryption Accelerator with Parallelized Bootstrapping.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024
A Robot-Administered ICU Confusion Assessment with Brain-Computer Interface Control.
Proceedings of the Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction, 2024
IOMMU Deferred Invalidation Vulnerability: Exploit and Defense.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2024
2023
RISE: RISC-V SoC for En/Decryption Acceleration on the Edge for Homomorphic Encryption.
IEEE Trans. Very Large Scale Integr. Syst., October, 2023
An Electro-Photonic System for Accelerating Deep Neural Networks.
ACM J. Emerg. Technol. Comput. Syst., October, 2023
Puppeteer: A Random Forest Based Manager for Hardware Prefetchers Across the Memory Hierarchy.
ACM Trans. Archit. Code Optim., March, 2023
On Architecting Fully Homomorphic Encryption-based Computing Systems.
Synthesis Lectures on Computer Architecture, Springer, ISBN: 978-3-031-31753-8, 2023
Accelerating Finite Field Arithmetic for Homomorphic Encryption on GPUs.
IEEE Micro, 2023
Towards Efficient Hyperdimensional Computing Using Photonics.
CoRR, 2023
Accelerating DNN Training With Photonics: A Residue Number System-Based Design.
CoRR, 2023
A Blueprint for Precise and Fault-Tolerant Analog Neural Networks.
CoRR, 2023
Leveraging Residue Number System for Designing High-Precision Analog Deep Neural Network Accelerators.
CoRR, 2023
GME: GPU-based Microarchitectural Extensions to Accelerate Homomorphic Encryption.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023
MAD: Memory-Aware Design Techniques for Accelerating Fully Homomorphic Encryption.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023
Processing-in-Memory Using Optically-Addressed Phase Change Memory.
Proceedings of the IEEE/ACM International Symposium on Low Power Electronics and Design, 2023
FAB: An FPGA-based Accelerator for Bootstrappable Fully Homomorphic Encryption.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023
ProcessorFuzz: Processor Fuzzing with Control and Status Registers Guidance.
Proceedings of the IEEE International Symposium on Hardware Oriented Security and Trust, 2023
SIGFuzz: A Framework for Discovering Microarchitectural Timing Side Channels.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2023
2022
Architecting Optically Controlled Phase Change Memory.
ACM Trans. Archit. Code Optim., 2022
ProcessorFuzz: Guiding Processor Fuzzing using Control and Status Registers.
CoRR, 2022
Accelerating Polynomial Multiplication for Homomorphic Encryption on GPUs.
Proceedings of the 2022 IEEE International Symposium on Secure and Private Execution Environment Design (SEED), 2022
RACE: RISC-V SoC for En/decryption Acceleration on the Edge for Homomorphic Computation.
Proceedings of the ISLPED '22: ACM/IEEE International Symposium on Low Power Electronics and Design, Boston, MA, USA, August 1, 2022
Hydra: A near hybrid memory accelerator for CNN inference.
Proceedings of the 2022 Design, Automation & Test in Europe Conference & Exhibition, 2022
TargetFuzz: Using DARTs to Guide Directed Greybox Fuzzers.
Proceedings of the ASIA CCS '22: ACM Asia Conference on Computer and Communications Security, Nagasaki, Japan, 30 May 2022
NaviSim: A Highly Accurate GPU Simulator for AMD RDNA GPUs.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2022
2021
Hardware Trojan Detection Using Backside Optical Imaging.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2021
Accelerating Data-Parallel Neural Network Training with Weighted-Averaging Reparameterisation.
Parallel Process. Lett., 2021
Does Fully Homomorphic Encryption Need Compute Acceleration?
IACR Cryptol. ePrint Arch., 2021
A Cautionary Tale About Detecting Malware Using Hardware Performance Counters and Machine Learning.
IEEE Des. Test, 2021
Network-on-Chip Microarchitecture-based Covert Channel in GPUs.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021
AI Tax in Mobile SoCs: End-to-end Performance Analysis of Machine Learning in Smartphones.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2021
GNNMark: A Benchmark Suite to Characterize Graph Neural Network Training on GPUs.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2021
Hardware Acceleration for DBMS Machine Learning Scoring: Is It Worth the Overheads?
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2021
TAP-2.5D: A Thermally-Aware Chiplet Placement Methodology for 2.5D Systems.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2021
SealPK: Sealable Protection Keys for RISC-V.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2021
DirectFuzz: Automated Test Generation for RTL Designs using Directed Graybox Fuzzing.
Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021
FlexFilt: Towards Flexible Instruction Filtering for Security.
Proceedings of the ACSAC '21: Annual Computer Security Applications Conference, Virtual Event, USA, December 6, 2021
2020
Cross-Layer Co-Optimization of Network Design and Chiplet Placement in 2.5-D Systems.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020
BlackParrot: An Agile Open-Source RISC-V Multicore for Accelerator SoCs.
IEEE Micro, 2020
Efficient Sealable Protection Keys for RISC-V.
CoRR, 2020
MGPU-TSM: A Multi-GPU System with Truly Shared Memory.
CoRR, 2020
Custom Tailored Suite of Random Forests for Prefetcher Adaptation.
CoRR, 2020
HALCONE: A Hardware-Level Timestamp-based Cache Coherence Scheme for Multi-GPU systems.
CoRR, 2020
Gate-Level Validation of Integrated Circuits With Structured-Illumination Read-Out of Embedded Optical Signatures.
IEEE Access, 2020
LEAF-QA: Locate, Encode & Attend for Figure Question Answering.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2020
PHMon: A Programmable Hardware Monitor and Its Security Use Cases.
Proceedings of the 29th USENIX Security Symposium, 2020
Bandwidth Allocation in Silicon-Photonic Networks Using Application Instrumentation.
Proceedings of the 2020 IEEE High Performance Extreme Computing Conference, 2020
Griffin: Hardware-Software Support for Efficient Page Migration in Multi-GPU Systems.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020
Efficient Context-Sensitive CFI Enforcement Through a Hardware Monitor.
Proceedings of the Detection of Intrusions and Malware, and Vulnerability Assessment, 2020
System-level Evaluation of Chip-Scale Silicon Photonic Networks for Emerging Data-Intensive Applications.
Proceedings of the 2020 Design, Automation & Test in Europe Conference & Exhibition, 2020
Valkyrie: Leveraging Inter-TLB Locality to Enhance GPU Performance.
Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020
2019
Editorial TVLSI Positioning - Continuing and Accelerating an Upward Trajectory.
IEEE Trans. Very Large Scale Integr. Syst., 2019
The efficacy of various machine learning models for multi-class classification of RNA-seq expression data.
CoRR, 2019
MGPUSim: enabling multi-GPU performance modeling and optimization.
Proceedings of the 46th International Symposium on Computer Architecture, 2019
CUDA Optimized Neural Network Predicts Blood Glucose Control from Quantified Joint Mobility and Anthropometrics.
Proceedings of the 3rd International Conference on Information System and Data Mining, 2019
2018
MGSim + MGMark: A Framework for Multi-GPU System Research.
CoRR, 2018
Nile: A Programmable Monitoring Coprocessor.
IEEE Comput. Archit. Lett., 2018
Profiling DNN Workloads on a Volta-based DGX-1 System.
Proceedings of the 2018 IEEE International Symposium on Workload Characterization, 2018
A cross-layer methodology for design and optimization of networks in 2.5D systems.
Proceedings of the International Conference on Computer-Aided Design, 2018
Leveraging thermally-aware chiplet organization in 2.5D systems to reclaim dark silicon.
Proceedings of the 2018 Design, Automation & Test in Europe Conference & Exhibition, 2018
Hardware Performance Counters Can Detect Malware: Myth or Fact?
Proceedings of the 2018 on Asia Conference on Computer and Communications Security, 2018
2017
Adaptive Tuning of Photonic Devices in a Photonic NoC Through Dynamic Workload Allocation.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2017
Field of Groves: An Energy-Efficient Random Forest.
CoRR, 2017
High-performance low-energy implementation of cryptographic algorithms on a programmable SoC for IoT devices.
Proceedings of the 2017 IEEE High Performance Extreme Computing Conference, 2017
Using Machine Learning techniques for identification of Chronic Traumatic Encephalopathy related Spectroscopic Biomarkers.
Proceedings of the 2017 IEEE Applied Imagery Pattern Recognition Workshop, 2017
2016
Designing Tunable Subthreshold Logic Circuits Using Adaptive Feedback Equalization.
IEEE Trans. Very Large Scale Integr. Syst., 2016
UMH: A Hardware-Based Unified Memory Hierarchy for Systems with Multiple Discrete GPUs.
ACM Trans. Archit. Code Optim., 2016
Electro-Photonic NoC Designs for Kilocore Systems.
ACM J. Emerg. Technol. Comput. Syst., 2016
Energy-Efficient Adaptive Classifier Design for Mobile Systems.
Proceedings of the 2016 International Symposium on Low Power Electronics and Design, 2016
Cross-layer floorplan optimization for silicon photonic NoCs in many-core systems.
Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition, 2016
2015
Managing Laser Power in Silicon-Photonic NoC Through Cache and NoC Reconfiguration.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2015
Using GUI Design Theory to Develop an Open Source Touchscreen Smartphone GUI.
Comput. Inf. Sci., 2015
Asymmetric NoC Architectures for GPU Systems.
Proceedings of the 9th International Symposium on Networks-on-Chip, 2015
Leveraging Silicon-Photonic NoC for Designing Scalable GPUs.
Proceedings of the 29th ACM on International Conference on Supercomputing, 2015
Dictionary-based sparse representation for resolution improvement in laser voltage imaging of CMOS integrated circuits.
Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015
Detecting hardware trojans using backside optical imaging of embedded watermarks.
Proceedings of the 52nd Annual Design Automation Conference, 2015
Towards General-Purpose Neural Network Computing.
Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015
2014
Design and Optimization of Nonvolatile Multibit 1T1R Resistive RAM.
IEEE Trans. Very Large Scale Integr. Syst., 2014
Learning to navigate in a virtual world using optic flow and stereo disparity signals.
Artif. Life Robotics, 2014
Sharing and placement of on-chip laser sources in silicon-photonic NoCs.
Proceedings of the Eighth IEEE/ACM International Symposium on Networks-on-Chip, 2014
Neural network-based accelerators for transcendental function approximation.
Proceedings of the Great Lakes Symposium on VLSI 2014, GLSVLSI '14, Houston, TX, USA - May 21, 2014
Thermal management of manycore systems with silicon-photonic networks.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2014
Sub-threshold logic circuit design using feedback equalization.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2014
2013
Energy-efficient pass-transistor-logic using decision feedback equalization.
Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 2013
2012
Nonlinear Multi-Error Correction Codes for Reliable MLC NAND Flash Memories.
IEEE Trans. Very Large Scale Integr. Syst., 2012
Secure Multipliers Resilient to Strong Fault-Injection Attacks Using Multilinear Arithmetic Codes.
IEEE Trans. Very Large Scale Integr. Syst., 2012
Designing Chip-Level Nanophotonic Interconnection Networks.
IEEE J. Emerg. Sel. Topics Circuits Syst., 2012
Tutorial T8A: Designing Silicon-Photonic Communication Networks for Manycore Systems.
Proceedings of the 25th International Conference on VLSI Design, 2012
Error mitigation in digital logic using a feedback equalization with schmitt trigger (FEST) circuit.
Proceedings of the Thirteenth International Symposium on Quality Electronic Design, 2012
A multi-layer approach to green computing: Designing energy-efficient digital circuits and manycore architectures.
Proceedings of the 2012 International Green Computing Conference, 2012
Performance and energy models for memristor-based 1T1R RRAM cell.
Proceedings of the Great Lakes Symposium on VLSI 2012, 2012
2011
Express Virtual Channels with Taps (EVC-T): A Flow Control Technique for Network-on-Chip (NoC) in Manycore Systems.
Proceedings of the IEEE 19th Annual Symposium on High Performance Interconnects, 2011
Influence of metallic tubes on the reliability of CNTFET SRAMs: error mechanisms and countermeasures.
Proceedings of the 21st ACM Great Lakes Symposium on VLSI, 2011
Run-time energy management of manycore systems through reconfigurable interconnects.
Proceedings of the 21st ACM Great Lakes Symposium on VLSI, 2011
A preliminary look at error avoidance in digital logic via feedback equalization.
Proceedings of the 49th Annual Allerton Conference on Communication, Control, and Computing, 2011
2010
Re-architecting DRAM memory systems with monolithically integrated silicon photonics.
Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010
Reliable MLC NAND flash memories based on nonlinear t-error-correcting codes.
Proceedings of the 2010 IEEE/IFIP International Conference on Dependable Systems and Networks, 2010
2009
Building Many-Core Processor-to-DRAM Networks with Monolithic CMOS Silicon Photonics.
IEEE Micro, 2009
A Modeling and exploration framework for interconnect network design in the nanometer era.
Proceedings of the Third International Symposium on Networks-on-Chips, 2009
Silicon-photonic clos networks for global on-chip communication.
Proceedings of the Third International Symposium on Networks-on-Chips, 2009
Designing multi-socket systems using silicon photonics.
Proceedings of the 23rd international conference on Supercomputing, 2009
Design of Reliable and Secure Multipliers by Multilinear Arithmetic Codes.
Proceedings of the Information and Communications Security, 11th International Conference, 2009
Designing Energy-Efficient Low-Diameter On-Chip Networks with Equalized Interconnects.
Proceedings of the 17th IEEE Symposium on High Performance Interconnects, 2009
2008
Distilling the essence of proprietary workloads into miniature benchmarks.
ACM Trans. Archit. Code Optim., 2008
Analysing and improving clustering based sampling for microprocessor simulation.
Int. J. High Perform. Comput. Netw., 2008
Automatically countering imbalance and its empirical relationship to cost.
Data Min. Knowl. Discov., 2008
Building Manycore Processor-to-DRAM Networks with Monolithic Silicon Photonics.
Proceedings of the 16th Annual IEEE Symposium on High Performance Interconnects (HOTI 2008), 2008
2007
Design and Optimization of On-Chip Interconnects Using Wave-Pipelined Multiplexed Routing.
IEEE Trans. Very Large Scale Integr. Syst., 2007
Applying Statistical Sampling for Fast and Efficient Simulation of Commercial Workloads.
IEEE Trans. Computers, 2007
Subsetting the SPEC CPU2006 benchmark suite.
SIGARCH Comput. Archit. News, 2007
Scaling and evaluation of carbon nanotube interconnects for VLSI applications.
Proceedings of the 2nd International ICST Conference on Nano-Networks, 2007
Analysis of redundancy and application balance in the SPEC CPU2006 benchmark suite.
Proceedings of the 34th International Symposium on Computer Architecture (ISCA 2007), 2007
2006
Measuring Benchmark Similarity Using Inherent Program Characteristics.
IEEE Trans. Computers, 2006
Low Power Multilevel Interconnect Networks Using Wave-Pipelined Multiplexed (WPM) Routing.
Proceedings of the 19th International Conference on VLSI Design (VLSI Design 2006), 2006
Evaluating the efficacy of statistical simulation for design space exploration.
Proceedings of the 2006 IEEE International Symposium on Performance Analysis of Systems and Software, 2006
Evaluating Benchmark Subsetting Approaches.
Proceedings of the 2006 IEEE International Symposium on Workload Characterization, 2006
Performance Cloning: A Technique for Disseminating Proprietary Applications as Benchmarks.
Proceedings of the 2006 IEEE International Symposium on Workload Characterization, 2006
2005
Wave-pipelined multiplexed (WPM) routing for gigascale integration (GSI).
IEEE Trans. Very Large Scale Integr. Syst., 2005
Gigascale ASIC/SoC design using wave-pipelined multiplexed (WPM) routing.
Proceedings of the 2005 IEEE International SOC Conference, 2005
Analyzing and Improving Clustering Based Sampling for Microprocessor Simulation.
Proceedings of the 17th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2005), 2005
Measuring Program Similarity: Experiments with SPEC CPU Benchmark Suites.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005
Wave-pipelined 2-slot time division multiplexed (WP/2-TDM) routing.
Proceedings of the 15th ACM Great Lakes Symposium on VLSI 2005, 2005
2004
A 2-slot time-division multiplexing (TDM) interconnect network for gigascale integration (GSI).
Proceedings of the Sixth International Workshop on System-Level Interconnect Prediction (SLIP 2004), 2004