Shijie Cao

Orcid: 0009-0000-2001-3763

According to our database, Shijie Cao authored at least 22 papers between 2013 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.


Bibliography

2024
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs.
CoRR, 2024

LUT Tensor Core: Lookup Table Enables Efficient Low-Bit LLM Inference Acceleration.
CoRR, 2024

T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge.
CoRR, 2024

Ladder: Enabling Efficient Low-Precision Deep Learning Computing through Hardware-aware Tensor Transformation.
Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation, 2024

Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

Integer or Floating Point? New Outlooks for Low-Bit Quantization on Large Language Models.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

AFPQ: Asymmetric Floating Point Quantization for LLMs.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
Inverse model and adaptive neighborhood search based cooperative optimizer for energy-efficient distributed flexible job shop scheduling.
Swarm Evol. Comput., December 2023

Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference.
CoRR, 2023

Integer or Floating Point? New Outlooks for Low-Bit Quantization on Large Language Models.
CoRR, 2023

NN-Stretch: Automatic Neural Network Branching for Parallel Inference on Heterogeneous Multi-Processors.
Proceedings of the 21st Annual International Conference on Mobile Systems, 2023

Efficient GPU Kernels for N:M-Sparse Weights in Deep Learning.
Proceedings of the Sixth Conference on Machine Learning and Systems, 2023

Accurate and Structured Pruning for Efficient Automatic Speech Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Adam Accumulation to Reduce Memory Footprints of Both Activations and Gradients for Large-Scale DNN Training.
Proceedings of the 26th European Conference on Artificial Intelligence, 2023

2021
Dense-to-Sparse Gate for Mixture-of-Experts.
CoRR, 2021

Building a COVID-19 Literature Knowledge Graph Based on PubMed.
Proceedings of the 2021 International Conference on Medical Imaging and Computer-Aided Diagnosis, 2021

2019
FlexSaaS: A Reconfigurable Accelerator for Web Search Selection.
ACM Trans. Reconfigurable Technol. Syst., 2019

Efficient and Effective Sparse LSTM on FPGA with Bank-Balanced Sparsity.
Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

SeerNet: Predicting Convolutional Neural Network Feature-Map Sparsity Through Low-Bit Quantization.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Balanced Sparsity for Efficient DNN Inference on GPU.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2013
Information Technology Education Based on Cloud Computing.
Proceedings of the Information Computing and Applications - 4th International Conference, 2013
