Haojun Xia
Orcid: 0000-0002-9384-0935
According to our database, Haojun Xia authored at least 19 papers between 2021 and 2024.
Bibliography
2024
FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design.
CoRR, 2024
Quant-LLM: Accelerating the Serving of Large Language Models via FP6-Centric Algorithm-System Co-Design on Modern GPUs.
Proceedings of the 2024 USENIX Annual Technical Conference, 2024
MonoNN: Enabling a New Monolithic Optimization Space for Neural Network Inference Tasks on Modern GPU-Centric Architectures.
Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation, 2024
Proceedings of the 27th International Conference on Computer Supported Cooperative Work in Design, 2024
Proceedings of the 27th International Conference on Computer Supported Cooperative Work in Design, 2024
Proceedings of the 27th International Conference on Computer Supported Cooperative Work in Design, 2024
Proceedings of the 27th International Conference on Computer Supported Cooperative Work in Design, 2024
2023
Enabling Fast and Memory-Efficient Acceleration for Pattern Matching Workloads: The Lightweight Automata Processing Engine.
IEEE Trans. Computers, April, 2023
Flash-LLM: Enabling Low-Cost and Highly-Efficient Large Generative Model Inference With Unstructured Sparsity.
Proc. VLDB Endow., 2023
ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks.
CoRR, 2023
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity.
CoRR, 2023
2022
A Secure and Efficient USB-based In-band Communication Interface between Host and BMC.
Proceedings of the IEEE Intl Conf on Parallel & Distributed Processing with Applications, 2022
Secure and Efficient BMC-Based Centralized Management Method for Large-Scale Data Centers.
Proceedings of the 24th IEEE Int Conf on High Performance Computing & Communications; 8th Int Conf on Data Science & Systems; 20th Int Conf on Smart City; 8th Int Conf on Dependability in Sensor, 2022
Proceedings of the 25th IEEE International Conference on Computer Supported Cooperative Work in Design, 2022
2021
Shift-BNN: Highly-Efficient Probabilistic Bayesian Neural Network Training via Memory-Friendly Pattern Retrieving.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021
AddrArmor: An Address-based Runtime Code-reuse Attack Mitigation for Shared Objects at the Binary-level.
Proceedings of the 2021 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom), New York City, NY, USA, September 30, 2021
η-LSTM: Co-Designing Highly-Efficient Large LSTM Training via Exploiting Memory-Saving and Architectural Design Opportunities.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021
HyperKRP: A Kernel Runtime Security Architecture with A Tiny Hypervisor on Commodity Hardware.
Proceedings of the IEEE Global Communications Conference, 2021
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2021