2024
PipeOrgan: Efficient Inter-operation Pipelining with Flexible Spatial Organization and Interconnects.
CoRR, 2024
Not All Weights Are Created Equal: Enhancing Energy Efficiency in On-Device Streaming Speech Recognition.
CoRR, 2024
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases.
Proceedings of the Forty-first International Conference on Machine Learning, 2024
Folding Attention: Memory and Power Optimization for On-Device Transformer-Based Streaming Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024
GPU-based Private Information Retrieval for On-Device Machine Learning Inference.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024
2023
Folding Attention: Memory and Power Optimization for On-Device Transformer-based Streaming Speech Recognition.
CoRR, 2023
XRBench: An Extended Reality (XR) Machine Learning Benchmark Suite for the Metaverse.
Proceedings of the Sixth Conference on Machine Learning and Systems, 2023
Efficient Non-Linear Adder for Stochastic Computing with Approximate Spatial-Temporal Sorting Network.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023
DREAM: A Dynamic Scheduler for Dynamic Real-time Multi-model ML Workloads.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023
2022
SDRM3: A Dynamic Scheduler for Dynamic Real-time Multi-model ML Workloads.
CoRR, 2022
Multi-Scale High-Resolution Vision Transformer for Semantic Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
2021
Low-Rank+Sparse Tensor Compression for Neural Networks.
CoRR, 2021
Heterogeneous Dataflow Accelerators for Multi-DNN Workloads.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021
2020
Improving Efficiency in Neural Network Accelerator Using Operands Hamming Distance optimization.
CoRR, 2020
Co-Exploration of Neural Architectures and Heterogeneous ASIC Accelerator Designs Targeting Multiple Tasks.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020
2019
HERALD: Optimizing Heterogeneous DNN Accelerators for Edge Devices.
CoRR, 2019
SenseHAR: a robust virtual activity sensor for smartphones and wearables.
Proceedings of the 17th Conference on Embedded Networked Sensor Systems, 2019
2018
Rethinking Machine Learning Development and Deployment for Edge Devices.
CoRR, 2018
Federated Learning with Non-IID Data.
CoRR, 2018
CMSIS-NN: Efficient Neural Network Kernels for Arm Cortex-M CPUs.
CoRR, 2018
Not All Ops Are Created Equal!
CoRR, 2018
Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Network.
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018
Enabling deep learning at the IoT edge.
Proceedings of the International Conference on Computer-Aided Design, 2018
2017
System-Level Dynamic Variation Margining in Presence of Monitoring and Actuation.
IEEE Embed. Syst. Lett., 2017
Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Networks.
CoRR, 2017
Hello Edge: Keyword Spotting on Microcontrollers.
CoRR, 2017
PrivyNet: A Flexible Framework for Privacy-Preserving Deep Neural Network Training with A Fine-Grained Privacy Control.
CoRR, 2017
Deep Convolutional Neural Network Inference with Floating-point Weights and Fixed-point Activations.
CoRR, 2017
Exploiting data-dependence and Flip-Flop asymmetry for zero-overhead system soft error mitigation.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2017
Cross-level Monte Carlo Framework for System Vulnerability Evaluation against Fault Attack.
Proceedings of the 54th Annual Design Automation Conference, 2017
2016
Resiliency in dynamically power managed designs.
Proceedings of the 35th International Conference on Computer-Aided Design, 2016
Multi-story power distribution networks for GPUs.
Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition, 2016
Hardware Reliability margining for the dark silicon era.
Proceedings of the 21st Asia and South Pacific Design Automation Conference, 2016
2015
NSF expedition on variability-aware software: Recent results and contributions.
it Inf. Technol., 2015
Evaluating and exploiting impacts of dynamic power management schemes on system reliability.
Proceedings of the 2015 International Conference on Compilers, Architecture and Synthesis for Embedded Systems, 2015
2014
Synthesis and Analysis of Design-Dependent Ring Oscillator (DDRO) Performance Monitors.
IEEE Trans. Very Large Scale Integr. Syst., 2014
SlackProbe: A Flexible and Efficient In Situ Timing Slack Monitoring Methodology.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2014
BTI-Gater: An Aging-Resilient Clock Gating Methodology.
IEEE J. Emerg. Sel. Topics Circuits Syst., 2014
Accurate and inexpensive performance monitoring for variability-aware systems.
Proceedings of the 19th Asia and South Pacific Design Automation Conference, 2014
2013
SlackProbe: a low overhead in situ on-line timing slack monitoring methodology.
Proceedings of the Design, Automation and Test in Europe, 2013
VarEMU: An emulation testbed for variability-aware software.
Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis, 2013
2012
DDRO: A novel performance monitoring methodology based on design-dependent ring oscillators.
Proceedings of the Thirteenth International Symposium on Quality Electronic Design, 2012