Xuehai Qian
According to our database1,
Xuehai Qian
authored at least 106 papers
between 2007 and 2024.
Collaborative distances:
Collaborative distances:
Timeline
Legend:
Book In proceedings Article PhD thesis Dataset OtherLinks
On csauthors.net:
Bibliography
2024
Falic: An FPGA-Based Multi-Scalar Multiplication Accelerator for Zero-Knowledge Proof.
IEEE Trans. Computers, December, 2024
NAPA: Intermediate-Level Variational Native-Pulse Ansatz for Variational Quantum Algorithms.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., June, 2024
Fine-Grained Embedding Dimension Optimization During Training for Recommender Systems.
CoRR, 2024
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024
2023
IEEE Trans. Computers, March, 2023
IEEE Trans. Cloud Comput., 2023
RobustState: Boosting Fidelity of Quantum State Preparation via Noise-Aware Variational Training.
CoRR, 2023
GNNPipe: Accelerating Distributed Full-Graph GNN Training with Pipelined Model Parallelism.
CoRR, 2023
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023
DecoMine: A Compilation-Based Graph Pattern Mining System with Pattern Decomposition.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023
2022
IEEE Trans. Parallel Distributed Syst., 2022
IEEE Trans. Neural Networks Learn. Syst., 2022
GRIM: A General, Real-Time Deep Learning Inference Framework for Mobile Devices Based on Fine-Grained Structured Weight Sparsity.
IEEE Trans. Pattern Anal. Mach. Intell., 2022
J. Comput. Sci. Technol., 2022
CoRR, 2022
Proceedings of the IEEE International Conference on Quantum Computing and Engineering, 2022
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022
Proceedings of the ASPLOS '22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 28 February 2022, 2022
2021
Graph processing and machine learning architectures with emerging memory technologies: a survey.
Sci. China Inf. Sci., 2021
ESCALATE: Boosting the Efficiency of Sparse CNN Accelerator with Kernel Decomposition.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021
FORMS: Fine-grained Polarized ReRAM-based In-situ Computation for Mixed-signal DNN Accelerator.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021
GoSPA: An Energy-efficient High-performance Globally Optimized SParse Convolutional Neural Network Accelerator.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021
Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021
2020
IEEE Trans. Parallel Distributed Syst., 2020
Efficient Performance Estimation and Work-Group Size Pruning for OpenCL Kernels on GPUs.
IEEE Trans. Parallel Distributed Syst., 2020
Guest Editors' Introduction to the Special Issue on Machine Learning Architectures and Accelerators.
IEEE Trans. Computers, 2020
CoRR, 2020
CoRR, 2020
SympleGraph: distributed graph processing with precise loop-carried dependency guarantee.
Proceedings of the 41st ACM SIGPLAN International Conference on Programming Language Design and Implementation, 2020
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020
TUPIM: A Transparent and Universal Processing-in-Memory Architecture for Unmodified Binaries.
Proceedings of the GLSVLSI '20: Great Lakes Symposium on VLSI 2020, 2020
DNNGuard: An Elastic Heterogeneous DNN Accelerator Architecture against Adversarial Attacks.
Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020
Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020
PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning.
Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020
AsymNVM: An Efficient Framework for Implementing Persistent Data Structures on Asymmetric NVM Architecture.
Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020
Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020
2019
IEEE Trans. Parallel Distributed Syst., 2019
Distributed Graph Processing System and Processing-in-memory Architecture with Precise Loop-carried Dependency Guarantee.
ACM Trans. Comput. Syst., 2019
HEIF: Highly Efficient Stochastic Computing-Based Inference Framework for Deep Neural Networks.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2019
A Stochastic-Computing based Deep Learning Framework using Adiabatic Quantum-Flux-Parametron SuperconductingTechnology.
CoRR, 2019
ReBNN: in-situ acceleration of binarized neural networks in ReRAM using complementary resistive cell.
CCF Trans. High Perform. Comput., 2019
IEEE Comput. Archit. Lett., 2019
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019
TPShare: a time-space sharing scheduling abstraction for shared cloud via vertical labels.
Proceedings of the 46th International Symposium on Computer Architecture, 2019
Proceedings of the 46th International Symposium on Computer Architecture, 2019
A stochastic-computing based deep learning framework using adiabatic quantum-flux-parametron superconducting technology.
Proceedings of the 46th International Symposium on Computer Architecture, 2019
Proceedings of the 39th IEEE International Conference on Distributed Computing Systems, 2019
A Hybrid Framework for Fast and Accurate GPU Performance Estimation through Source-Level Analysis and Trace-Based Simulation.
Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019
Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019
Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019
Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019
ADMM-NN: An Algorithm-Hardware Co-Design Framework of DNNs Using Alternating Direction Methods of Multipliers.
Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019
Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019
2018
ADMM-NN: An Algorithm-Hardware Co-Design Framework of DNNs Using Alternating Direction Method of Multipliers.
CoRR, 2018
CoRR, 2018
vSensor: leveraging fixed-workload snippets of programs for performance variance detection.
Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018
GraphP: Reducing Communication for PIM-Based Graph Processing with Efficient Data Partition.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018
Proceedings of the 2018 Design, Automation & Test in Europe Conference & Exhibition, 2018
Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018
Datasize-Aware High Dimensional Configurations Auto-Tuning of In-Memory Cluster Computing.
Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018
Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018
Neu-NoC: A high-efficient interconnection network for accelerated neuromorphic systems.
Proceedings of the 23rd Asia and South Pacific Design Automation Conference, 2018
Towards Ultra-High Performance and Energy Efficiency of Deep Learning Systems: An Algorithm-Hardware Co-Optimization Framework.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018
2017
CirCNN: Accelerating and Compressing Deep Neural Networks Using Block-CirculantWeight Matrices.
CoRR, 2017
Squeezing out All the Value of Loaded Data: An Out-of-core Graph Processing System with Reduced Disk I/O.
Proceedings of the 2017 USENIX Annual Technical Conference, 2017
CirCNN: accelerating and compressing deep neural networks using block-circulant weight matrices.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017
Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017
SC-DCNN: Highly-Scalable Deep Convolutional Neural Network using Stochastic Computing.
Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017
Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017
2016
Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2016
Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation, 2016
Proceedings of the 2016 International Conference on Supercomputing, 2016
2015
Sci. China Inf. Sci., 2015
2014
Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014
Pacifier: Record and replay for relaxed-consistency multiprocessors with distributed directory protocol.
Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014
2013
BulkCommit: scalable and fast commit of atomic blocks in a lazy multiprocessor environment.
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013
Rainbow: Efficient memory dependence recording with high replay parallelism for relaxed memory model.
Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013
Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2013
2012
Proceedings of the 18th IEEE International Symposium on High Performance Computer Architecture, 2012
2010
Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010
2007
Proceedings of the 15th Euromicro International Conference on Parallel, 2007
Circuit implementation of floating point range reduction for trigonometric functions.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2007), 2007
Proceedings of the Architecture of Computing Systems, 2007