Yang Hu
ORCID: 0000-0001-6942-4395
Affiliations:
- Tsinghua University, School of Integrated Circuits, China
- University of Texas at Dallas, Department of Electrical and Computer Engineering, Richardson, TX, USA (former)
- University of Florida, Department of Electrical and Computer Engineering, Gainesville, FL, USA (former)
According to our database, Yang Hu authored at least 84 papers between 2010 and 2024.
Bibliography
2024
Ayaka: A Versatile Transformer Accelerator With Low-Rank Estimation and Heterogeneous Dataflow.
IEEE J. Solid State Circuits, October, 2024
CIMFormer: A Systolic CIM-Array-Based Transformer Accelerator With Token-Pruning-Aware Attention Reformulating and Principal Possibility Gathering.
IEEE J. Solid State Circuits, October, 2024
MulTCIM: Digital Computing-in-Memory-Based Multimodal Transformer Accelerator With Attention-Token-Bit Hybrid Sparsity.
IEEE J. Solid State Circuits, January, 2024
Optimizing the Micro-Architectural Performance of the Current and Emerging Edge Infrastructure.
IEEE Trans. Cloud Comput., 2024
SOFA: A Compute-Memory Optimized Sparsity Accelerator via Cross-Stage Coordinated Tiling.
CoRR, 2024
PALM: A Efficient Performance Simulator for Tiled Accelerators with Large-scale Model Training.
CoRR, 2024
CoRR, 2024
A 52.01 TFLOPS/W Diffusion Model Processor with Inter-Time-Step Convolution-Attention-Redundancy Elimination and Bipolar Floating-Point Multiplication.
Proceedings of the IEEE Symposium on VLSI Technology and Circuits 2024, 2024
A 28nm 4170-TFLOPS/W/b and 195-TFLOPS/mm²/b Multiply-Free Fully-Digital Floating-Point Compute-In-Memory Macro with Mitchell's Approximation.
Proceedings of the IEEE Symposium on VLSI Technology and Circuits 2024, 2024
A 22nm 54.94TFLOPS/W Transformer Fine-Tuning Processor with Exponent-Stationary Re-Computing, Aggressive Linear Fitting, and Logarithmic Domain Multiplicating.
Proceedings of the IEEE Symposium on VLSI Technology and Circuits 2024, 2024
ETCIM: An Error-Tolerant Digital-CIM Processor with Redundancy-Free Repair and Run-Time MAC and Cell Error Correction.
Proceedings of the IEEE Symposium on VLSI Technology and Circuits 2024, 2024
15.1 A 0.795fJ/bit Physically-Unclonable Function-Protected TCAM for a Software-Defined Networking Switch.
Proceedings of the IEEE International Solid-State Circuits Conference, 2024
34.1 A 28nm 83.23TFLOPS/W POSIT-Based Compute-in-Memory Macro for High-Accuracy AI Applications.
Proceedings of the IEEE International Solid-State Circuits Conference, 2024
20.2 A 28nm 74.34TFLOPS/W BF16 Heterogenous CIM-Based Accelerator Exploiting Denoising-Similarity for Diffusion Models.
Proceedings of the IEEE International Solid-State Circuits Conference, 2024
Exploiting Similarity Opportunities of Emerging Vision AI Models on Hybrid Bonding Architecture.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024
A Tale of Two Domains: Exploring Efficient Architecture Design for Truly Autonomous Things.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024
CAP: A General Purpose Computation-in-memory with Content Addressable Processing Paradigm.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024
FQP: A Fibonacci Quantization Processor with Multiplication-Free Computing and Topological-Order Routing.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024
GSPO: A Graph Substitution and Parallelization Joint Optimization Framework for DNN Inference.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024
Dyn-Bitpool: A Two-sided Sparse CIM Accelerator Featuring a Balanced Workload Scheme and High CIM Macro Utilization.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024
2023
Data Fusion in Infrastructure-Augmented Autonomous Driving System: Why? Where? and How?
IEEE Internet Things J., September, 2023
IEEE Trans. Cloud Comput., 2023
IEEE Trans. Circuits Syst. I Regul. Pap., 2023
CoRR, 2023
Towards Efficient Control Flow Handling in Spatial Architecture via Architecting the Control Flow Plane.
CoRR, 2023
CoRR, 2023
A 28nm 77.35TOPS/W Similar Vectors Traceable Transformer Processor with Principal-Component-Prior Speculating and Dynamic Bit-wise Stationary Computing.
Proceedings of the 2023 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits), 2023
Towards Efficient Control Flow Handling in Spatial Architecture via Architecting the Control Flow Plane.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023
CV-CIM: A 28nm XOR-Derived Similarity-Aware Computation-in-Memory for Cost-Volume Construction.
Proceedings of the IEEE International Solid-State Circuits Conference, 2023
TensorCIM: A 28nm 3.7nJ/Gather and 8.3TFLOPS/W FP32 Digital-CIM Tensor Processor for MCM-CIM-Based Beyond-NN Acceleration.
Proceedings of the IEEE International Solid-State Circuits Conference, 2023
MulTCIM: A 28nm 2.24μJ/Token Attention-Token-Bit Hybrid Sparse Digital CIM-Based Accelerator for Multimodal Transformers.
Proceedings of the IEEE International Solid-State Circuits Conference, 2023
FACT: FFN-Attention Co-optimized Transformer Architecture with Eager Correlation Prediction.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023
RMP-MEM: A HW/SW Reconfigurable Multi-Port Memory Architecture for Multi-PEA Oriented CGRA.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023
A 28nm 49.7TOPS/W Sparse Transformer Processor with Random-Projection-Based Speculation, Multi-Stationary Dataflow, and Redundant Partial Product Elimination.
Proceedings of the IEEE Asian Solid-State Circuits Conference, 2023
CIMFormer: A 38.9TOPS/W-8b Systolic CIM-Array Based Transformer Processor with Token-Slimmed Attention Reformulating and Principal Possibility Gathering.
Proceedings of the IEEE Asian Solid-State Circuits Conference, 2023
A Systolic Computing-in-Memory Array based Accelerator with Predictive Early Activation for Spatiotemporal Convolutions.
Proceedings of the 5th IEEE International Conference on Artificial Intelligence Circuits and Systems, 2023
2022
CoRR, 2022
Towards a High-performance and Secure Memory System and Architecture for Emerging Applications.
CoRR, 2022
Comput. Electr. Eng., 2022
Brief Industry Paper: The Necessity of Adaptive Data Fusion in Infrastructure-Augmented Autonomous Driving System.
Proceedings of the 28th IEEE Real-Time and Embedded Technology and Applications Symposium, 2022
Enabling efficient deep convolutional neural network-based sensor fusion for autonomous driving.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022
2021
Proceedings of the ICPP 2021: 50th International Conference on Parallel Processing, Lemont, IL, USA, August 9, 2021
Proceedings of the 2021 IEEE 23rd Int Conf on High Performance Computing & Communications; 7th Int Conf on Data Science & Systems; 19th Int Conf on Smart City; 7th Int Conf on Dependability in Sensor, 2021
Towards a Secure Integrated Heterogeneous Platform via Cooperative CPU/GPU Encryption.
Proceedings of the 30th IEEE Asian Test Symposium, 2021
Proceedings of the ASPLOS '21: 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2021
2020
Enabling Latency-Aware Data Initialization for Integrated CPU/GPU Heterogeneous Platform.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020
A Hardware-Based Architecture-Neutral Framework for Real-Time IoT Workload Forensics.
IEEE Trans. Computers, 2020
Co-Optimizing Performance and Memory Footprint Via Integrated CPU/GPU Memory Management, an Implementation on Autonomous Driving Platform.
CoRR, 2020
Proceedings of the International Conference for High Performance Computing, 2020
Co-Optimizing Performance and Memory Footprint Via Integrated CPU/GPU Memory Management, an Implementation on Autonomous Driving Platform.
Proceedings of the IEEE Real-Time and Embedded Technology and Applications Symposium, 2020
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2020
Understanding and Tackling the Hidden Memory Latency for Edge-based Heterogeneous Platform.
Proceedings of the 3rd USENIX Workshop on Hot Topics in Edge Computing, 2020
2019
Characterizing and Understanding the Architectural Implications of Cloudnative Edge NFV Workloads.
Proceedings of the IEEE Conference on Network Function Virtualization and Software Defined Networks, 2019
Characterizing and orchestrating NFV-ready servers for efficient edge data processing.
Proceedings of the International Symposium on Quality of Service, 2019
Proceedings of the 37th IEEE International Conference on Computer Design, 2019
Proceedings of the 37th IEEE International Conference on Computer Design, 2019
2018
Exploring Customizable Heterogeneous Power Distribution and Management for Datacenter.
IEEE Trans. Parallel Distributed Syst., 2018
IEEE Trans. Parallel Distributed Syst., 2018
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018
Enabling Efficient Network Service Function Chain Deployment on Heterogeneous Server Platform.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018
2017
IEEE Trans. Parallel Distributed Syst., 2017
Proceedings of the International Conference for High Performance Computing, 2017
Proceedings of the 2017 IEEE International Symposium on Performance Analysis of Systems and Software, 2017
Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017
Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017
2016
RE-UPS: an adaptive distributed energy storage system for dynamically managing solar energy in green datacenters.
J. Supercomput., 2016
Towards efficient server architecture for virtualized network function deployment: Implications and implementations.
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016
Proceedings of the 2016 International Conference on Supercomputing, 2016
Proceedings of the 2016 International Conference on Supercomputing, 2016
Bridging the Semantic Gaps of GPU Acceleration for Scale-out CNN-based Big Data Processing: Think Big, See Small.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016
2015
IEEE Comput. Archit. Lett., 2015
HEB: deploying and managing hybrid energy buffers for improving datacenter efficiency and economy.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015
Proceedings of the 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2015
2014
Towards Automated Provisioning and Emergency Handling in Renewable Energy Powered Datacenters.
J. Comput. Sci. Technol., 2014
Proceedings of the International Green Computing Conference, 2014
2013
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013
2012
IEICE Trans. Electron., 2012
2010
Proceedings of the IEEE Asia Pacific Conference on Circuits and Systems, 2010