Guohao Dai
ORCID: 0000-0003-0849-3252
According to our database, Guohao Dai authored at least 97 papers between 2013 and 2024.
Bibliography
2024
IEEE Trans. Circuits Syst. Video Technol., September, 2024
GRAPHIC: Gather and Process Harmoniously in the Cache With High Parallelism and Flexibility.
IEEE Trans. Emerg. Top. Comput., 2024
CoRR, 2024
Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding.
CoRR, 2024
CoRR, 2024
Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs.
CoRR, 2024
CoRR, 2024
ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation.
CoRR, 2024
MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization.
CoRR, 2024
CoRR, 2024
Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better.
CoRR, 2024
CoRR, 2024
Proceedings of the 37th IEEE International System-on-Chip Conference, 2024
FlashDecoding++: Faster Large Language Model Inference with Asynchronization, Flat GEMM Optimization, and Heuristics.
Proceedings of the Seventh Annual Conference on Machine Learning and Systems, 2024
Proceedings of the Forty-first International Conference on Machine Learning, 2024
FlightLLM: Efficient Large Language Model Inference with a Complete Mapping Flow on FPGAs.
Proceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2024
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2024
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2024
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024
FlashEval: Towards Fast and Accurate Evaluation of Text-to-Image Diffusion Generative Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
FEASTA: A Flexible and Efficient Accelerator for Sparse Tensor Algebra in Machine Learning.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024
2023
CoGNN: An Algorithm-Hardware Co-Design Approach to Accelerate GNN Inference With Minibatch Sampling.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., December, 2023
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., November, 2023
Gibbon: An Efficient Co-Exploration Framework of NN Model and Processing-In-Memory Architecture.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., November, 2023
Adaptive Multidimensional Parallel Fault Simulation Framework on Heterogeneous System.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., June, 2023
CCF Trans. High Perform. Comput., June, 2023
Serving Multi-DNN Workloads on FPGAs: A Coordinated Architecture, Scheduling, and Mapping Perspective.
IEEE Trans. Computers, May, 2023
Fast and Efficient 2-bit LLM Inference on GPU: 2/4/16-bit in a Weight Matrix with Asynchronous Dequantization.
CoRR, 2023
Proceedings of the ACM Web Conference 2023, 2023
History-Detr: Optimize Query Initialization Strategy by Using Historical Information and Kinematics.
Proceedings of the ACM Multimedia Asia 2023, 2023
HyperGef: A Framework Enabling Efficient Fusion for Hypergraph Neural Network on GPUs.
Proceedings of the Sixth Conference on Machine Learning and Systems, 2023
Exploiting Hardware Utilization and Adaptive Dataflow for Efficient Sparse Convolution in 3D Point Clouds.
Proceedings of the Sixth Conference on Machine Learning and Systems, 2023
DF-GAS: a Distributed FPGA-as-a-Service Architecture towards Billion-Scale Graph-based Approximate Nearest Neighbor Search.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023
TorchSparse++: Efficient Training and Inference Framework for Sparse Convolution on GPUs.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023
Ada3D: Exploiting the Spatial Redundancy with Adaptive Inference for Efficient 3D Object Detection.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
TSTC: Two-Level Sparsity Tensor Core Enabling both Algorithm Flexibility and Hardware Efficiency.
Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 2023
OPT: Optimal Proposal Transfer for Efficient Yield Optimization for Analog and SRAM Circuits.
Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 2023
A Point Transformer Accelerator with Fine-Grained Pipelines and Distribution-Aware Dynamic FPS.
Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 2023
Adam Accumulation to Reduce Memory Footprints of Both Activations and Gradients for Large-Scale DNN Training.
Proceedings of the ECAI 2023 - 26th European Conference on Artificial Intelligence, September 30 - October 4, 2023, Kraków, Poland, 2023
Minimizing Communication Conflicts in Network-On-Chip Based Processing-In-Memory Architecture.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2023
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2023
PIM-HLS: An Automatic Hardware Generation Tool for Heterogeneous Processing-In-Memory-based Neural Network Accelerators.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023
Processing-In-Hierarchical-Memory Architecture for Billion-Scale Approximate Nearest Neighbor Search.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023
An Efficient Accelerator for Point-based and Voxel-based Point Cloud Neural Networks.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023
Seeking the Yield Barrier: High-Dimensional SRAM Evaluation Through Optimal Manifold.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023
Memory-Efficient and Real-Time SPAD-based dToF Depth Sensor with Spatial and Statistical Correlation.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
High-Dimensional Yield Estimation Using Shrinkage Deep Features and Maximization of Integral Entropy Reduction.
Proceedings of the 28th Asia and South Pacific Design Automation Conference, 2023
Proceedings of the 28th Asia and South Pacific Design Automation Conference, 2023
2022
A Unified FPGA Virtualization Framework for General-Purpose Deep Neural Networks in the Cloud.
ACM Trans. Reconfigurable Technol. Syst., 2022
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022
GRAPHIC: GatheR-And-Process in Highly parallel with In-SSD Compression Architecture in Very Large-Scale Graph.
CoRR, 2022
Understanding GNN Computational Graph: A Coordinated Computation, IO, and Memory Perspective.
Proceedings of the Fifth Conference on Machine Learning and Systems, 2022
Proceedings of the 23rd IEEE International Conference on Mobile Data Management, 2022
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022
Exploiting Parallelism with Vertex-Clustering in Processing-In-Memory-based GCN Accelerators.
Proceedings of the 2022 Design, Automation & Test in Europe Conference & Exhibition, 2022
Proceedings of the 2022 Design, Automation & Test in Europe Conference & Exhibition, 2022
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022
A one-for-all and o(v log(v))-cost solution for parallel merge style operations on sorted key-value arrays.
Proceedings of the ASPLOS '22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 28 February 2022, 2022
2021
Efficient Sparse Matrix Kernels based on Adaptive Workload-Balancing and Parallel-Reduction.
CoRR, 2021
Exploiting Online Locality and Reduction Parallelism for Sampled Dense Matrix Multiplication on GPUs.
Proceedings of the 39th IEEE International Conference on Computer Design, 2021
Rerec: In-ReRAM Acceleration with Access-Aware Mapping for Personalized Recommendation.
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2021
3M-AI: A Multi-task and Multi-core Virtualization Framework for Multi-FPGA AI Systems in the Cloud.
Proceedings of the FPGA '21: The 2021 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Virtual Event, USA, February 28, 2021
2020
GE-SpMM: general-purpose sparse matrix-matrix multiplication on GPUs for graph neural networks.
Proceedings of the International Conference for High Performance Computing, 2020
Proceedings of the 2020 IEEE High Performance Extreme Computing Conference, 2020
Proceedings of the 2020 IEEE High Performance Extreme Computing Conference, 2020
MNSIM 2.0: A Behavior-Level Modeling Tool for Memristor-based Neuromorphic Computing Systems.
Proceedings of the GLSVLSI '20: Great Lakes Symposium on VLSI 2020, 2020
An Order Sampling Processing-in-Memory Architecture for Approximate Graph Pattern Mining.
Proceedings of the GLSVLSI '20: Great Lakes Symposium on VLSI 2020, 2020
Proceedings of the FPGA '20: The 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2020
Proceedings of the FPGA '20: The 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2020
Proceedings of the 28th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2020
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020
2019
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2019
IEEE Trans. Computers, 2019
Centrifuge: Evaluating full-system HLS-generated heterogenous-accelerator SoCs using FPGA-Acceleration.
Proceedings of the International Conference on Computer-Aided Design, 2019
Proceedings of the 56th Annual Design Automation Conference 2019, 2019
Proceedings of the 56th Annual Design Automation Conference 2019, 2019
GraphSAR: a sparsity-aware processing-in-memory architecture for large-scale graph processing on ReRAMs.
Proceedings of the 24th Asia and South Pacific Design Automation Conference, 2019
2018
Proceedings of the International Symposium on Memory Systems, 2018
NewGraph: Balanced Large-Scale Graph Processing on FPGAs with Low Preprocessing Overheads.
Proceedings of the 26th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2018
Proceedings of the 2018 Design, Automation & Test in Europe Conference & Exhibition, 2018
2017
Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017
2016
Proceedings of the 32nd IEEE International Conference on Data Engineering, 2016
Proceedings of the 26th International Conference on Field Programmable Logic and Applications, 2016
Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2016
2015
Proceedings of the 2015 International Conference on Field Programmable Technology, 2015
2014
Proceedings of the 2014 International Conference on Field-Programmable Technology, 2014
2013
Proceedings of the 10th IEEE International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing, 2013