Guyue Huang

Orcid: 0000-0002-1280-4781

According to our database1, Guyue Huang authored at least 15 papers between 2020 and 2024.

Collaborative distances:
  • Dijkstra number2 of four.
  • Erdős number3 of four.

Timeline

Legend:

Book 
In proceedings 
Article 
PhD thesis 
Dataset
Other 

Links

On csauthors.net:

Bibliography

2024
MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization.
CoRR, 2024

OPER: Optimality-Guided Embedding Table Parallelization for Large-scale Recommendation Model.
Proceedings of the 2024 USENIX Annual Technical Conference, 2024

2023
TC-GNN: Bridging Sparse GNN Computation and Dense Tensor Cores on GPUs.
Proceedings of the 2023 USENIX Annual Technical Conference, 2023

ALCOP: Automatic Load-Compute Pipelining in Deep Learning Compiler for AI-GPUs.
Proceedings of the Sixth Conference on Machine Learning and Systems, 2023

RM-STC: Row-Merge Dataflow Inspired GPU Sparse Tensor Core for Energy-Efficient Sparse Acceleration.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

2022
Enabling Data Movement and Computation Pipelining in Deep Learning Compiler.
CoRR, 2022

Heuristic Adaptability to Input Dynamics for SpMM on GPUs.
CoRR, 2022

LightSeq2: Accelerated Training for Transformer-Based Models on GPUs.
Proceedings of the SC22: International Conference for High Performance Computing, 2022

Understanding GNN Computational Graph: A Coordinated Computation, IO, and Memory Perspective.
Proceedings of the Fifth Conference on Machine Learning and Systems, 2022

Shfl-BW: accelerating deep neural network inference with tensor-core aware weight pruning.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

Heuristic adaptability to input dynamics for SpMM on CPUs.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

2021
Machine Learning for Electronic Design Automation: A Survey.
ACM Trans. Design Autom. Electr. Syst., 2021

Efficient Sparse Matrix Kernels based on Adaptive Workload-Balancing and Parallel-Reduction.
CoRR, 2021

Exploiting Online Locality and Reduction Parallelism for Sampled Dense Matrix Multiplication on GPUs.
Proceedings of the 39th IEEE International Conference on Computer Design, 2021

2020
GE-SpMM: general-purpose sparse matrix-matrix multiplication on GPUs for graph neural networks.
Proceedings of the International Conference for High Performance Computing, 2020


  Loading...