Yang Hu

Orcid: 0000-0001-6942-4395

Affiliations:
  • Tsinghua University, School of Integrated Circuits, China
  • University of Texas at Dallas, Department of Electrical and Computer Engineering, Richardson, TX, USA (former)
  • University of Florida, Department of Electrical and Computer Engineering, Gainesville, FL, USA (former)


According to our database, Yang Hu authored at least 84 papers between 2010 and 2024.

Bibliography

2024
Ayaka: A Versatile Transformer Accelerator With Low-Rank Estimation and Heterogeneous Dataflow.
IEEE J. Solid State Circuits, October, 2024

CIMFormer: A Systolic CIM-Array-Based Transformer Accelerator With Token-Pruning-Aware Attention Reformulating and Principal Possibility Gathering.
IEEE J. Solid State Circuits, October, 2024

MulTCIM: Digital Computing-in-Memory-Based Multimodal Transformer Accelerator With Attention-Token-Bit Hybrid Sparsity.
IEEE J. Solid State Circuits, January, 2024

Optimizing the Micro-Architectural Performance of the Current and Emerging Edge Infrastructure.
IEEE Trans. Cloud Comput., 2024

SOFA: A Compute-Memory Optimized Sparsity Accelerator via Cross-Stage Coordinated Tiling.
CoRR, 2024

PALM: A Efficient Performance Simulator for Tiled Accelerators with Large-scale Model Training.
CoRR, 2024

Efficient Orchestrated AI Workflows Execution on Scale-out Spatial Architecture.
CoRR, 2024

A 52.01 TFLOPS/W Diffusion Model Processor with Inter-Time-Step Convolution-Attention-Redundancy Elimination and Bipolar Floating-Point Multiplication.
Proceedings of the IEEE Symposium on VLSI Technology and Circuits 2024, 2024

A 28nm 4170-TFLOPS/W/b and 195-TFLOPS/mm²/b Multiply-Free Fully-Digital Floating-Point Compute-In-Memory Macro with Mitchell's Approximation.
Proceedings of the IEEE Symposium on VLSI Technology and Circuits 2024, 2024

A 22nm 54.94TFLOPS/W Transformer Fine-Tuning Processor with Exponent-Stationary Re-Computing, Aggressive Linear Fitting, and Logarithmic Domain Multiplicating.
Proceedings of the IEEE Symposium on VLSI Technology and Circuits 2024, 2024

ETCIM: An Error-Tolerant Digital-CIM Processor with Redundancy-Free Repair and Run-Time MAC and Cell Error Correction.
Proceedings of the IEEE Symposium on VLSI Technology and Circuits 2024, 2024

15.1 A 0.795fJ/bit Physically-Unclonable Function-Protected TCAM for a Software-Defined Networking Switch.
Proceedings of the IEEE International Solid-State Circuits Conference, 2024

34.1 A 28nm 83.23TFLOPS/W POSIT-Based Compute-in-Memory Macro for High-Accuracy AI Applications.
Proceedings of the IEEE International Solid-State Circuits Conference, 2024

20.2 A 28nm 74.34TFLOPS/W BF16 Heterogenous CIM-Based Accelerator Exploiting Denoising-Similarity for Diffusion Models.
Proceedings of the IEEE International Solid-State Circuits Conference, 2024

Exploiting Similarity Opportunities of Emerging Vision AI Models on Hybrid Bonding Architecture.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

MECLA: Memory-Compute-Efficient LLM Accelerator with Scaling Sub-matrix Partition.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

A Tale of Two Domains: Exploring Efficient Architecture Design for Truly Autonomous Things.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

CAP: A General Purpose Computation-in-memory with Content Addressable Processing Paradigm.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

FQP: A Fibonacci Quantization Processor with Multiplication-Free Computing and Topological-Order Routing.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

GSPO: A Graph Substitution and Parallelization Joint Optimization Framework for DNN Inference.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

Dyn-Bitpool: A Two-sided Sparse CIM Accelerator Featuring a Balanced Workload Scheme and High CIM Macro Utilization.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

2023
Data Fusion in Infrastructure-Augmented Autonomous Driving System: Why? Where? and How?
IEEE Internet Things J., September, 2023

Towards an Efficient SIMD Virtual Radio Access Network (vRAN) and Edge Cloud System.
IEEE Trans. Cloud Comput., 2023

STAR: An STGCN ARchitecture for Skeleton-Based Human Action Recognition.
IEEE Trans. Circuits Syst. I Regul. Pap., 2023

Wafer-scale Computing: Advancements, Challenges, and Future Perspectives.
CoRR, 2023

WindMill: A Parameterized and Pluggable CGRA Implemented by DIAG Design Flow.
CoRR, 2023

Towards Efficient Control Flow Handling in Spatial Architecture via Architecting the Control Flow Plane.
CoRR, 2023

Catch-Up Distillation: You Only Need to Train Once for Accelerating Sampling.
CoRR, 2023

A 28nm 77.35TOPS/W Similar Vectors Traceable Transformer Processor with Principal-Component-Prior Speculating and Dynamic Bit-wise Stationary Computing.
Proceedings of the 2023 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits), 2023

Towards Efficient Control Flow Handling in Spatial Architecture via Architecting the Control Flow Plane.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

CV-CIM: A 28nm XOR-Derived Similarity-Aware Computation-in-Memory for Cost-Volume Construction.
Proceedings of the IEEE International Solid-State Circuits Conference, 2023

TensorCIM: A 28nm 3.7nJ/Gather and 8.3TFLOPS/W FP32 Digital-CIM Tensor Processor for MCM-CIM-Based Beyond-NN Acceleration.
Proceedings of the IEEE International Solid-State Circuits Conference, 2023

MulTCIM: A 28nm 2.24μJ/Token Attention-Token-Bit Hybrid Sparse Digital CIM-Based Accelerator for Multimodal Transformers.
Proceedings of the IEEE International Solid-State Circuits Conference, 2023

FACT: FFN-Attention Co-optimized Transformer Architecture with Eager Correlation Prediction.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

RMP-MEM: A HW/SW Reconfigurable Multi-Port Memory Architecture for Multi-PEA Oriented CGRA.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

A 28nm 49.7TOPS/W Sparse Transformer Processor with Random-Projection-Based Speculation, Multi-Stationary Dataflow, and Redundant Partial Product Elimination.
Proceedings of the IEEE Asian Solid-State Circuits Conference, 2023

CIMFormer: A 38.9TOPS/W-8b Systolic CIM-Array Based Transformer Processor with Token-Slimmed Attention Reformulating and Principal Possibility Gathering.
Proceedings of the IEEE Asian Solid-State Circuits Conference, 2023

A Systolic Computing-in-Memory Array based Accelerator with Predictive Early Activation for Spatiotemporal Convolutions.
Proceedings of the 5th IEEE International Conference on Artificial Intelligence Circuits and Systems, 2023

2022
Towards Efficient Architecture and Algorithms for Sensor Fusion.
CoRR, 2022

Demystifying Arch-hints for Model Extraction: An Attack in Unified Memory System.
CoRR, 2022

Towards a High-performance and Secure Memory System and Architecture for Emerging Applications.
CoRR, 2022

A synergistic reinforcement learning-based framework design in driving automation.
Comput. Electr. Eng., 2022

Brief Industry Paper: The Necessity of Adaptive Data Fusion in Infrastructure-Augmented Autonomous Driving System.
Proceedings of the 28th IEEE Real-Time and Embedded Technology and Applications Symposium, 2022

Enabling efficient deep convolutional neural network-based sensor fusion for autonomous driving.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

2021
Tackling Variabilities in Autonomous Driving.
CoRR, 2021

Enabling Efficient SIMD Acceleration for Virtual Radio Access Network.
Proceedings of the ICPP 2021: 50th International Conference on Parallel Processing, Lemont, IL, USA, August 9, 2021

Characterization and Implication of Edge WebAssembly Runtimes.
Proceedings of the 2021 IEEE 23rd Int Conf on High Performance Computing & Communications; 7th Int Conf on Data Science & Systems; 19th Int Conf on Smart City; 7th Int Conf on Dependability in Sensor, 2021

Towards a Secure Integrated Heterogeneous Platform via Cooperative CPU/GPU Encryption.
Proceedings of the 30th IEEE Asian Test Symposium, 2021

Q-VR: system-level design for future mobile collaborative virtual reality.
Proceedings of the ASPLOS '21: 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2021

2020
Enabling Latency-Aware Data Initialization for Integrated CPU/GPU Heterogeneous Platform.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

A Hardware-Based Architecture-Neutral Framework for Real-Time IoT Workload Forensics.
IEEE Trans. Computers, 2020

Co-Optimizing Performance and Memory Footprint Via Integrated CPU/GPU Memory Management, an Implementation on Autonomous Driving Platform.
CoRR, 2020

ANT-man: towards agile power management in the microservice era.
Proceedings of the International Conference for High Performance Computing, 2020

Co-Optimizing Performance and Memory Footprint Via Integrated CPU/GPU Memory Management, an Implementation on Autonomous Driving Platform.
Proceedings of the IEEE Real-Time and Embedded Technology and Applications Symposium, 2020

Performance Analysis of 5G NR vRAN Platform and its Implications on Edge Computing.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2020

Understanding and Tackling the Hidden Memory Latency for Edge-based Heterogeneous Platform.
Proceedings of the 3rd USENIX Workshop on Hot Topics in Edge Computing, 2020

2019
Characterizing and Understanding the Architectural Implications of Cloudnative Edge NFV Workloads.
Proceedings of the IEEE Conference on Network Function Virtualization and Software Defined Networks, 2019

Characterizing and orchestrating NFV-ready servers for efficient edge data processing.
Proceedings of the International Symposium on Quality of Service, 2019

An FPGA Implementation of Stochastic Computing-Based LSTM.
Proceedings of the 37th IEEE International Conference on Computer Design, 2019

Architectural and Cost Implications of the 5G Edge NFV Systems.
Proceedings of the 37th IEEE International Conference on Computer Design, 2019

2018
Exploring Customizable Heterogeneous Power Distribution and Management for Datacenter.
IEEE Trans. Parallel Distributed Syst., 2018

A Flattened Metadata Service for Distributed File Systems.
IEEE Trans. Parallel Distributed Syst., 2018

Prediction Based Execution on Deep Neural Networks.
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

In-Situ AI: Towards Autonomous and Incremental Deep Learning for IoT Systems.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

Enabling Efficient Network Service Function Chain Deployment on Heterogeneous Server Platform.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

2017
Oasis: Scaling Out Datacenter Sustainably and Economically.
IEEE Trans. Parallel Distributed Syst., 2017

LocoFS: a loosely-coupled metadata service for distributed file systems.
Proceedings of the International Conference for High Performance Computing, 2017

GaaS workload characterization under NUMA architecture for virtualized GPU.
Proceedings of the 2017 IEEE International Symposium on Performance Analysis of Systems and Software, 2017

Towards Pervasive and User Satisfactory CNN across GPU Microarchitectures.
Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

Towards "Full Containerization" in Containerized Network Function Virtualization.
Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017

2016
RE-UPS: an adaptive distributed energy storage system for dynamically managing solar energy in green datacenters.
J. Supercomput., 2016

Towards efficient server architecture for virtualized network function deployment: Implications and implementations.
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

Towards an Adaptive Multi-Power-Source Datacenter.
Proceedings of the 2016 International Conference on Supercomputing, 2016

HOPE: Enabling Efficient Service Orchestration in Software-Defined Data Centers.
Proceedings of the 2016 International Conference on Supercomputing, 2016

Bridging the Semantic Gaps of GPU Acceleration for Scale-out CNN-based Big Data Processing: Think Big, See Small.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015
Leveraging Heterogeneous Power for Improving Datacenter Efficiency and Resiliency.
IEEE Comput. Archit. Lett., 2015

HEB: deploying and managing hybrid energy buffers for improving datacenter efficiency and economy.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

Towards sustainable in-situ server systems in the big data era.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

BAAT: Towards Dynamically Managing Battery Aging in Green Datacenters.
Proceedings of the 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2015

2014
Towards Automated Provisioning and Emergency Handling in Renewable Energy Powered Datacenters.
J. Comput. Sci. Technol., 2014

Leveraging distributed UPS energy for managing solar energy powered data centers.
Proceedings of the International Green Computing Conference, 2014

2013
Enabling datacenter servers to scale out economically and sustainably.
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013

2012
Hybrid Wired/Wireless On-Chip Network Design for Application-Specific SoC.
IEICE Trans. Electron., 2012

2010
Mixed-level modeling for network on chip infrastructure in SoC design.
Proceedings of the IEEE Asia Pacific Conference on Circuits and Systems, 2010

