2024
AWQ: Activation-aware Weight Quantization for On-Device LLM Compression and Acceleration.
GetMobile Mob. Comput. Commun., December, 2024
Efficient Deep Learning Computing: From TinyML to LargeLM
PhD thesis, 2024
Tiny Machine Learning: Progress and Futures.
CoRR, 2024
AWQ: Activation-aware Weight Quantization for On-Device LLM Compression and Acceleration.
Proceedings of the Seventh Annual Conference on Machine Learning and Systems, 2024
VILA: On Pre-training for Visual Language Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
2023
Efficient Spatially Sparse Inference for Conditional GANs and Diffusion Models.
IEEE Trans. Pattern Anal. Mach. Intell., December, 2023
VILA: On Pre-training for Visual Language Models.
CoRR, 2023
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration.
CoRR, 2023
Offsite-Tuning: Transfer Learning without Full Model.
CoRR, 2023
PockEngine: Sparse and Efficient Fine-tuning in a Pocket.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models.
Proceedings of the International Conference on Machine Learning, 2023
2022
Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications.
ACM Trans. Design Autom. Electr. Syst., 2022
TSM: Temporal Shift Module for Efficient and Scalable Video Understanding on Edge Devices.
IEEE Trans. Pattern Anal. Mach. Intell., 2022
GAN Compression: Efficient Architectures for Interactive Conditional GANs.
IEEE Trans. Pattern Anal. Mach. Intell., 2022
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models.
CoRR, 2022
On-Device Training Under 256KB Memory.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Network Augmentation for Tiny Deep Learning.
Proceedings of the Tenth International Conference on Learning Representations, 2022
2021
MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning.
CoRR, 2021
TSM: Temporal Shift Module for Efficient and Scalable Video Understanding on Edge Device.
CoRR, 2021
Memory-efficient Patch-based Inference for Tiny Deep Learning.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021
Anycost GANs for Interactive Image Synthesis and Editing.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021
2020
AutoML for Architecting Efficient and Specialized Neural Networks.
IEEE Micro, 2020
Hardware-Centric AutoML for Mixed-Precision Quantization.
Int. J. Comput. Vis., 2020
APQ: Joint Search for Network Architecture, Pruning and Quantization Policy.
CoRR, 2020
Differentiable Augmentation for Data-Efficient GAN Training.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020
MCUNet: Tiny Deep Learning on IoT Devices.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020
Lite Transformer with Long-Short Range Attention.
Proceedings of the 8th International Conference on Learning Representations, 2020
Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution.
Proceedings of the Computer Vision - ECCV 2020, 2020
APQ: Joint Search for Network Architecture, Pruning and Quantization Policy.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020
2019
Runtime Network Routing for Efficient Image Classification.
IEEE Trans. Pattern Anal. Mach. Intell., 2019
Training Kinetics in 15 Minutes: Large-scale Distributed Training on Videos.
CoRR, 2019
Design Automation for Efficient Deep Learning Computing.
CoRR, 2019
Defensive Quantization: When Efficiency Meets Robustness.
Proceedings of the 7th International Conference on Learning Representations, 2019
On-Device Image Classification with Proxyless Neural Architecture Search and Quantization-Aware Fine-Tuning.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshops, 2019
TSM: Temporal Shift Module for Efficient Video Understanding.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019
Joint Monocular 3D Vehicle Detection and Tracking.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019
HAQ: Hardware-Aware Automated Quantization With Mixed Precision.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019
2018
HAQ: Hardware-Aware Automated Quantization.
CoRR, 2018
Temporal Shift Module for Efficient Video Understanding.
CoRR, 2018
Reinforcement Learning from Imperfect Demonstrations.
Proceedings of the 6th International Conference on Learning Representations, 2018
AMC: AutoML for Model Compression and Acceleration on Mobile Devices.
Proceedings of the Computer Vision - ECCV 2018, 2018
2017
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017
Learning Discriminative Aggregation Network for Video-Based Face Recognition.
Proceedings of the IEEE International Conference on Computer Vision, 2017
Consistent-Aware Deep Learning for Person Re-identification in a Camera Network.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017