Song Han

ORCID: 0000-0002-4186-7618

Affiliations:
  • Massachusetts Institute of Technology, Cambridge, MA, USA
  • Stanford University, Stanford, USA (former)


According to our database, Song Han authored at least 157 papers between 2015 and 2024.

Bibliography

2024
NAPA: Intermediate-Level Variational Native-Pulse Ansatz for Variational Quantum Algorithms.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., June, 2024

VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation.
CoRR, 2024

LongVILA: Scaling Long-Context Visual Language Models for Long Videos.
CoRR, 2024

Sparse Refinement for Efficient High-Resolution Semantic Segmentation.
CoRR, 2024

Wolf: Captioning Everything with a World Summarization Framework.
CoRR, 2024

VILA²: VILA Augmented VILA.
CoRR, 2024

X-VILA: Cross-Modality Alignment for Large Language Model.
CoRR, 2024

QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving.
CoRR, 2024

Condition-Aware Neural Network for Controlled Image Generation.
CoRR, 2024

Tiny Machine Learning: Progress and Futures.
CoRR, 2024

DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models.
CoRR, 2024

BitDelta: Your Fine-Tune May Only Be Worth One Bit.
CoRR, 2024

Qplacer: Frequency-Aware Component Placement for Superconducting Quantum Computers.
CoRR, 2024

AWQ: Activation-aware Weight Quantization for On-Device LLM Compression and Acceleration.
Proceedings of the Seventh Annual Conference on Machine Learning and Systems, 2024

Atomique: A Quantum Compiler for Reconfigurable Neutral Atom Arrays.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Lightening-Transformer: A Dynamically-Operated Optically-Interconnected Photonic Transformer Accelerator.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024

Q-Pilot: Field Programmable Qubit Array Compilation with Flying Ancillas.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

VILA: On Pre-training for Visual Language Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Condition-Aware Neural Network for Controlled Image Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
Efficient Spatially Sparse Inference for Conditional GANs and Diffusion Models.
IEEE Trans. Pattern Anal. Mach. Intell., December, 2023

Algorithm-System-Hardware Co-Design for Efficient 3D Deep Learning.
World Sci. Annu. Rev. Artif. Intell., 2023

VILA: On Pre-training for Visual Language Models.
CoRR, 2023

DGR: Tackling Drifted and Correlated Noise in Quantum Error Correction via Decoding Graph Re-weighting.
CoRR, 2023

Q-Pilot: Field Programmable Quantum Array Compilation with Flying Ancillas.
CoRR, 2023

Transformer-QEC: Quantum Error Correction Code Decoding with Transferable Transformers.
CoRR, 2023

RobustState: Boosting Fidelity of Quantum State Preparation via Noise-Aware Variational Training.
CoRR, 2023

FPQA-C: A Compilation Framework for Field Programmable Qubit Array.
CoRR, 2023

Retrospective: EIE: Efficient Inference Engine on Sparse and Compressed Neural Network.
CoRR, 2023

AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration.
CoRR, 2023

DOTA: A Dynamically-Operated Photonic Tensor Core for Energy-Efficient Transformer Accelerator.
CoRR, 2023

Offsite-Tuning: Transfer Learning without Full Model.
CoRR, 2023

DISQ: Dynamic Iteration Skipping for Variational Quantum Algorithms.
Proceedings of the IEEE International Conference on Quantum Computing and Engineering, 2023

QuantumSEA: In-Time Sparse Exploration for Noise Adaptive Quantum Circuits.
Proceedings of the IEEE International Conference on Quantum Computing and Engineering, 2023

PockEngine: Sparse and Efficient Fine-tuning in a Pocket.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

TorchSparse++: Efficient Training and Inference Framework for Sparse Convolution on GPUs.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

A Fully-Integrated Energy-Scalable Transformer Accelerator Supporting Adaptive Model Configuration and Word Elimination for Language Understanding on Edge Devices.
Proceedings of the IEEE/ACM International Symposium on Low Power Electronics and Design, 2023

BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation.
Proceedings of the IEEE International Conference on Robotics and Automation, 2023

SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models.
Proceedings of the International Conference on Machine Learning, 2023

EfficientViT: Lightweight Multi-Scale Attention for High-Resolution Dense Prediction.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Design of Quantum Computer Antivirus.
Proceedings of the IEEE International Symposium on Hardware Oriented Security and Trust, 2023

Hybrid Gate-Pulse Model for Variational Quantum Algorithms.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

TorchSparse++: Efficient Point Cloud Engine.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Machine Learning for Arterial Blood Pressure Prediction.
Proceedings of the Conference on Health, Inference, and Learning, 2023

2022
Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications.
ACM Trans. Design Autom. Electr. Syst., 2022

PVNAS: 3D Neural Architecture Search With Point-Voxel Convolution.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

TSM: Temporal Shift Module for Efficient and Scalable Video Understanding on Edge Devices.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

GAN Compression: Efficient Architectures for Interactive Conditional GANs.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models.
CoRR, 2022

QuEst: Graph Transformer for Quantum Circuit Reliability Estimation.
CoRR, 2022

TopGen: Topology-Aware Bottom-Up Generator for Variational Quantum Circuits.
CoRR, 2022

PAN: Pulse Ansatz on NISQ Machines.
CoRR, 2022

EfficientViT: Enhanced Linear Attention for High-Resolution Low-Computation Visual Recognition.
CoRR, 2022

On-chip QNN: Towards Efficient On-Chip Training of Quantum Neural Networks.
CoRR, 2022

Variational Quantum Pulse Learning.
Proceedings of the IEEE International Conference on Quantum Computing and Engineering, 2022

On-Device Training Under 256KB Memory.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

TorchSparse: Efficient Point Cloud Inference Engine.
Proceedings of the Fifth Conference on Machine Learning and Systems, 2022

RobustAnalog: Fast Variation-Aware Analog Circuit Design Via Multi-task RL.
Proceedings of the 2022 ACM/IEEE Workshop on Machine Learning for CAD, 2022

VISTA 2.0: An Open, Data-driven Simulator for Multimodal Sensing and Policy Learning for Autonomous Vehicles.
Proceedings of the 2022 International Conference on Robotics and Automation, 2022

Network Augmentation for Tiny Deep Learning.
Proceedings of the Tenth International Conference on Learning Representations, 2022

TorchQuantum Case Study for Robust Quantum Circuits.
Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design, 2022

QuantumNAS: Noise-Adaptive Search for Robust Quantum Circuits.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

QOC: quantum on-chip training with parameter shift and gradient pruning.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

QuantumNAT: quantum noise-aware training with noise injection, quantization and normalization.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

Lite Pose: Efficient Architecture Design for 2D Human Pose Estimation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

DeepVS: a deep learning approach for RF-based vital signs sensing.
Proceedings of the BCB '22: 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, Northbrook, Illinois, USA, August 7, 2022

2021
MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning.
CoRR, 2021

RoQNN: Noise-Aware Training for Robust Quantum Neural Networks.
CoRR, 2021

TSM: Temporal Shift Module for Efficient and Scalable Video Understanding on Edge Device.
CoRR, 2021

PatchNet - Short-range Template Matching for Efficient Video Processing.
CoRR, 2021

Delayed Gradient Averaging: Tolerate the Communication Latency for Federated Learning.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Memory-efficient Patch-based Inference for Tiny Deep Learning.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

IOS: Inter-Operator Scheduler for CNN Acceleration.
Proceedings of the Fourth Conference on Machine Learning and Systems, 2021

PointAcc: Efficient Point Cloud Accelerator.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

SemAlign: Annotation-Free Camera-LiDAR Calibration with Semantic Alignment Loss.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2021

Efficient and Robust LiDAR-Based End-to-End Navigation.
Proceedings of the IEEE International Conference on Robotics and Automation, 2021

LocTex: Learning Data-Efficient Visual Representations from Localized Textual Supervision.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

NAAS: Neural Accelerator Architecture Search.
Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021

Anycost GANs for Interactive Image Synthesis and Editing.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021


2020
Deep Leakage from Gradients.
Proceedings of the Federated Learning - Privacy and Incentive, 2020

Long Live TIME: Improving Lifetime and Security for NVM-Based Training-in-Memory Systems.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

Energy Efficient On-Demand Dynamic Branch Prediction Models.
IEEE Trans. Computers, 2020

Scanning the Issue.
Proc. IEEE, 2020

Model Compression and Hardware Acceleration for Neural Networks: A Comprehensive Survey.
Proc. IEEE, 2020

AutoML for Architecting Efficient and Specialized Neural Networks.
IEEE Micro, 2020

Hardware-Centric AutoML for Mixed-Precision Quantization.
Int. J. Comput. Vis., 2020

Tiny Transfer Learning: Towards Memory-Efficient On-Device Learning.
CoRR, 2020

APQ: Joint Search for Network Architecture, Pruning and Quantization Policy.
CoRR, 2020

Domain-specific hardware accelerators.
Commun. ACM, 2020

Differentiable Augmentation for Data-Efficient GAN Training.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

MCUNet: Tiny Deep Learning on IoT Devices.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

TinyTL: Reduce Memory, Not Parameters for Efficient On-Device Learning.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Lite Transformer with Long-Short Range Attention.
Proceedings of the 8th International Conference on Learning Representations, 2020

Once-for-All: Train One Network and Specialize it for Efficient Deployment.
Proceedings of the 8th International Conference on Learning Representations, 2020

SpArch: Efficient Architecture for Sparse Matrix Multiplication.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020

Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution.
Proceedings of the Computer Vision - ECCV 2020, 2020

DataMix: Efficient Privacy-Preserving Edge-Cloud Inference.
Proceedings of the Computer Vision - ECCV 2020, 2020

GCN-RL Circuit Designer: Transferable Transistor Sizing with Graph Neural Networks and Reinforcement Learning.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

APQ: Joint Search for Network Architecture, Pruning and Quantization Policy.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

HAT: Hardware-Aware Transformers for Efficient Natural Language Processing.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

2019
Modeling and Optimization for Self-powered Non-volatile IoT Edge Devices with Ultra-low Harvesting Power.
ACM Trans. Cyber Phys. Syst., 2019

Training Kinetics in 15 Minutes: Large-scale Distributed Training on Videos.
CoRR, 2019

Once for All: Train One Network and Specialize it for Efficient Deployment.
CoRR, 2019

Design Automation for Efficient Deep Learning Computing.
CoRR, 2019

SysML: The New Frontier of Machine Learning Systems.
CoRR, 2019

Deep Leakage from Gradients.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

MicroNet for Efficient Language Modeling.
Proceedings of the NeurIPS 2019 Competition and Demonstration Track, 2019

Park: An Open Platform for Learning-Augmented Computer Systems.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Point-Voxel CNN for Efficient 3D Deep Learning.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Defensive Quantization: When Efficiency Meets Robustness.
Proceedings of the 7th International Conference on Learning Representations, 2019

ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware.
Proceedings of the 7th International Conference on Learning Representations, 2019

On-Device Image Classification with Proxyless Neural Architecture Search and Quantization-Aware Fine-Tuning.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshops, 2019

TSM: Temporal Shift Module for Efficient Video Understanding.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

A Fine-Grained Sparse Accelerator for Multi-Precision DNN.
Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

Fast Inference of Deep Neural Networks for Real-time Particle Physics Applications.
Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

A Configurable Multi-Precision CNN Computing Framework Based on Single Bit RRAM.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

HAQ: Hardware-Aware Automated Quantization With Mixed Precision.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018
Angel-Eye: A Complete Design Flow for Mapping CNN Onto Embedded FPGA.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2018

Learning to Design Circuits.
CoRR, 2018

HAQ: Hardware-Aware Automated Quantization.
CoRR, 2018

Temporal Shift Module for Efficient Video Understanding.
CoRR, 2018

Fast inference of deep neural networks in FPGAs for particle physics.
CoRR, 2018

Path-Level Network Transformation for Efficient Architecture Search.
Proceedings of the 35th International Conference on Machine Learning, 2018

Efficient Sparse-Winograd Convolutional Neural Networks.
Proceedings of the 6th International Conference on Learning Representations, 2018

Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training.
Proceedings of the 6th International Conference on Learning Representations, 2018

AMC: AutoML for Model Compression and Acceleration on Mobile Devices.
Proceedings of the Computer Vision - ECCV 2018, 2018

Bandwidth-efficient deep learning.
Proceedings of the 55th Annual Design Automation Conference, 2018

Long live TIME: improving lifetime for training-in-memory engines by structured gradient sparsification.
Proceedings of the 55th Annual Design Automation Conference, 2018

2017
Software-Hardware Codesign for Efficient Neural Network Acceleration.
IEEE Micro, 2017

Deep Generative Adversarial Networks for Compressed Sensing Automates MRI.
CoRR, 2017

Exploring the Regularity of Sparse Structure in Convolutional Neural Networks.
CoRR, 2017

Trained Ternary Quantization.
Proceedings of the 5th International Conference on Learning Representations, 2017

Efficient Sparse-Winograd Convolutional Neural Networks.
Proceedings of the 5th International Conference on Learning Representations, 2017

DSD: Dense-Sparse-Dense Training for Deep Neural Networks.
Proceedings of the 5th International Conference on Learning Representations, 2017

ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA.
Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017

An FPGA Design Framework for CNN Sparsification and Acceleration.
Proceedings of the 25th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2017

Exploring the Granularity of Sparsity in Convolutional Neural Networks.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017

2016
Research for Practice: Cryptocurrencies, Blockchains, and Smart Contracts; Hardware for Deep Learning.
ACM Queue, 2016

Generate Image Descriptions based on Deep RNN and Memory Cells for Images Features.
CoRR, 2016

SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size.
CoRR, 2016

DSD: Regularizing Deep Neural Networks with Dense-Sparse-Dense Training Flow.
CoRR, 2016

Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding.
Proceedings of the 4th International Conference on Learning Representations, 2016

ESE: Efficient Speech Recognition Engine with Compressed LSTM on FPGA.
CoRR, 2016

Angel-Eye: A Complete Design Flow for Mapping CNN onto Customized Hardware.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2016

EIE: Efficient Inference Engine on Compressed Deep Neural Network.
Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

Deep compression and EIE: Efficient inference engine on compressed deep neural network.
Proceedings of the 2016 IEEE Hot Chips 28 Symposium (HCS), 2016

From model to FPGA: Software-hardware co-design for efficient neural network acceleration.
Proceedings of the 2016 IEEE Hot Chips 28 Symposium (HCS), 2016

Real-Time Pedestrian Detection and Tracking on Customized Hardware.
Proceedings of the 14th ACM/IEEE Symposium on Embedded Systems for Real-Time Multimedia, 2016

2015
Learning both Weights and Connections for Efficient Neural Networks.
CoRR, 2015

On-Demand Dynamic Branch Prediction.
IEEE Comput. Archit. Lett., 2015

Learning both Weights and Connections for Efficient Neural Network.
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

