Zhihang Yuan

ORCID: 0000-0001-7846-0240

According to our database, Zhihang Yuan authored at least 44 papers between 2017 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.


Bibliography

2024
Latency-Aware Unified Dynamic Networks for Efficient Image Recognition.
IEEE Trans. Pattern Anal. Mach. Intell., December, 2024

Post-training quantization for re-parameterization via coarse & fine weight splitting.
J. Syst. Archit., February, 2024

Stabilized activation scale estimation for precise Post-Training Quantization.
Neurocomputing, February, 2024

CSKV: Training-Efficient Channel Shrinking for KV Cache in Long-Context Scenarios.
CoRR, 2024

DiTFastAttn: Attention Compression for Diffusion Transformer Models.
CoRR, 2024

PillarHist: A Quantization-aware Pillar Feature Encoder based on Height-aware Histogram.
CoRR, 2024

I-LLM: Efficient Integer-Only Inference for Fully-Quantized Low-Bit Large Language Models.
CoRR, 2024

A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training.
CoRR, 2024

SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models.
CoRR, 2024

A Survey on Efficient Inference for Large Language Models.
CoRR, 2024

PillarTrack: Redesigning Pillar-based Transformer Network for Single Object Tracking on Point Clouds.
CoRR, 2024

LLM Inference Unveiled: Survey and Roofline Model Insights.
CoRR, 2024

WKVQuant: Quantizing Weight and Key/Value Cache for Large Language Models Gains More.
CoRR, 2024

QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning.
CoRR, 2024

34.3 A 22nm 64kb Lightning-Like Hybrid Computing-in-Memory Macro with a Compressed Adder Tree and Analog-Storage Quantizers for Transformer and CNNs.
Proceedings of the IEEE International Solid-State Circuits Conference, 2024

Learning High-Frequency Functions Made Easy with Sinusoidal Positional Encoding.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

PB-LLM: Partially Binarized Large Language Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Algorithm-Hardware Co-Design for Energy-Efficient A/D Conversion in ReRAM-Based Accelerators.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2024

2023
FD-CNN: A Frequency-Domain FPGA Acceleration Scheme for CNN-Based Image-Processing Applications.
ACM Trans. Embed. Comput. Syst., November, 2023

ASVD: Activation-aware Singular Value Decomposition for Compressing Large Language Models.
CoRR, 2023

PB-LLM: Partially Binarized Large Language Models.
CoRR, 2023

Improving Post-Training Quantization on Object Detection with Task Loss-Guided Lp Metric.
CoRR, 2023

RPTQ: Reorder-based Post-training Quantization for Large Language Models.
CoRR, 2023

Benchmarking the Reliability of Post-training Quantization: a Particular Focus on Worst-case Performance.
CoRR, 2023

MIM4DD: Mutual Information Maximization for Dataset Distillation.
Proceedings of Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023

Post-Training Quantization on Diffusion Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

PD-Quant: Post-Training Quantization Based on Prediction Difference Metric.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Flatfish: A Reinforcement Learning Approach for Application-Aware Address Mapping.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022

Latency-aware Spatial-wise Dynamic Networks.
Proceedings of Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 2022

Enabling High-Quality Uncertainty Quantification in a PIM Designed for Bayesian Neural Network.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

PTQ4ViT: Post-training Quantization for Vision Transformers with Twin Uniform Quantization.
Proceedings of the Computer Vision - ECCV 2022, 2022

Tailor: removing redundant operations in memristive analog neural network accelerators.
Proceedings of the 59th ACM/IEEE Design Automation Conference (DAC '22), 2022

2021
PTQ4ViT: Post-Training Quantization Framework for Vision Transformers.
CoRR, 2021

PTQ-SL: Exploring the Sub-layerwise Post-training Quantization.
CoRR, 2021

METRO: A Software-Hardware Co-Design of Interconnections for Spatial DNN Accelerators.
CoRR, 2021

NAS4RRAM: neural network architecture search for inference on RRAM-based accelerators.
Sci. China Inf. Sci., 2021

Rapid Configuration of Asynchronous Recurrent Neural Networks for ASIC Implementations.
Proceedings of the 2021 IEEE High Performance Extreme Computing Conference, 2021

Reconfigurable ASIC Implementation of Asynchronous Recurrent Neural Networks.
Proceedings of the 27th IEEE International Symposium on Asynchronous Circuits and Systems, 2021

2020
Crane: Mitigating Accelerator Under-utilization Caused by Sparsity Irregularities in CNNs.
IEEE Trans. Computers, 2020

ENAS4D: Efficient Multi-stage CNN Architecture Search for Dynamic Inference.
CoRR, 2020

S2DNAS: Transforming Static CNN Model for Dynamic Inference via Neural Architecture Search.
Proceedings of the Computer Vision - ECCV 2020, 2020

2017
Reducing Overfitting in Deep Convolutional Neural Networks Using Redundancy Regularizer.
Proceedings of the Artificial Neural Networks and Machine Learning - ICANN 2017, 2017

FPGA-based accelerator for long short-term memory recurrent neural networks.
Proceedings of the 22nd Asia and South Pacific Design Automation Conference, 2017

Using Data Compression for Optimizing FPGA-Based Convolutional Neural Network Accelerators.
Proceedings of the Advanced Parallel Processing Technologies, 2017
