Ningyi Xu

Orcid: 0009-0004-6809-7694

According to our database, Ningyi Xu authored at least 67 papers between 2005 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2025

A Point Transformer Accelerator With Distribution-Aware Heuristic Distance Calculation.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., February, 2025

DeepGate4: Efficient and Effective Representation Learning for Circuit Design at Scale.

CoRR, February, 2025

LLSM: LLM-enhanced Logic Synthesis Model with EDA-guided CoT Prompting, Hybrid Embedding and AIG-tailored Acceleration.

Proceedings of the 30th Asia and South Pacific Design Automation Conference, 2025

2024

M2M: A Fine-Grained Mapping Framework to Accelerate Multiple DNNs on a Multi-Chiplet Architecture.

IEEE Trans. Very Large Scale Integr. Syst., October, 2024

CoDA: A Co-Design Framework for Versatile and Efficient Attention Accelerators.

IEEE Trans. Computers, August, 2024

Quantization and Hardware Architecture Co-Design for Matrix-Vector Multiplications of Large Language Models.

IEEE Trans. Circuits Syst. I Regul. Pap., June, 2024

INDM: Chiplet-Based Interconnect Network and Dataflow Mapping for DNN Accelerators.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., April, 2024

A Precision-Scalable Deep Neural Network Accelerator With Activation Sparsity Exploitation.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., January, 2024

Automating Energy-Efficient GPU Kernel Generation: A Fast Search-Based Compilation Approach.

CoRR, 2024

MARCA: Mamba Accelerator with ReConfigurable Architecture.

CoRR, 2024

Cross Anything: General Quadruped Robot Navigation through Complex Terrains.

CoRR, 2024

Integer or Floating Point? New Outlooks for Low-Bit Quantization on Large Language Models.

Shanghang Zhang

Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

VEGA: Implementing a Versatile and Efficient Deep Learning Processor with Graph-Based ALU.

Proceedings of the 42nd IEEE International Conference on Computer Design, 2024

Enhancing Vectorized Map Perception with Historical Rasterized Maps.

Proceedings of the Computer Vision - ECCV 2024, 2024

Leveraging Enhanced Queries of Point Sets for Vectorized Map Construction.

Proceedings of the Computer Vision - ECCV 2024, 2024

AFPQ: Asymmetric Floating Point Quantization for LLMs.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

Low-Complexity Precision-Scalable Multiply-Accumulate Unit Architectures for Deep Neural Network Accelerators.

IEEE Trans. Circuits Syst. II Express Briefs, April, 2023

NVP: A Flexible and Efficient Processor Architecture for Accelerating Diverse Computer Vision Tasks including DNN.

IEEE Trans. Circuits Syst. II Express Briefs, 2023

Large Trajectory Models are Scalable Motion Predictors and Planners.

CoRR, 2023

Integer or Floating Point? New Outlooks for Low-Bit Quantization on Large Language Models.

Shanghang Zhang

CoRR, 2023

History-Detr: Optimize Query Initialization Strategy by Using Historical Information and Kinematics.

Proceedings of the ACM Multimedia Asia 2023, 2023

Exploiting Hardware Utilization and Adaptive Dataflow for Efficient Sparse Convolution in 3D Point Clouds.

Proceedings of the Sixth Conference on Machine Learning and Systems, 2023

SpOctA: A 3D Sparse Convolution Accelerator with Octree-Encoding-Based Map Search and Inherent Sparsity-Aware Processing.

Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 2023

A Point Transformer Accelerator with Fine-Grained Pipelines and Distribution-Aware Dynamic FPS.

Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 2023

Adam Accumulation to Reduce Memory Footprints of Both Activations and Gradients for Large-Scale DNN Training.

Proceedings of the ECAI 2023 - 26th European Conference on Artificial Intelligence, September 30 - October 4, 2023, Kraków, Poland, 2023

COSA:Co-Operative Systolic Arrays for Multi-head Attention Mechanism in Neural Network using Hybrid Data Reuse and Fusion Methodologies.

Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

FLNA: An Energy-Efficient Point Cloud Feature Learning Accelerator with Dataflow Decoupling.

Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

2022

Efficient Compression Methods for Wire-Spread-Based Stochastic Computing Deep Neural Networks.

IEEE Trans. Circuits Syst. II Express Briefs, 2022

2021

A Low-Latency FPGA Implementation for Real-Time Object Detection.

Proceedings of the IEEE International Symposium on Circuits and Systems, 2021

CCASM: A Computation- and Communication-Aware Scheduling and Mapping Algorithm for NoC-Based DNN Accelerators.

Proceedings of the 14th IEEE International Conference on ASIC, 2021

2020

Crane: Mitigating Accelerator Under-utilization Caused by Sparsity Irregularities in CNNs.

IEEE Trans. Computers, 2020

Enhanced Power Decoupling Strategy for Virtual Synchronous Generator.

IEEE Access, 2020

2019

FlexSaaS: A Reconfigurable Accelerator for Web Search Selection.

ACM Trans. Reconfigurable Technol. Syst., 2019

2017

FxpNet: Training a deep convolutional neural network in fixed-point representation.

Proceedings of the 2017 International Joint Conference on Neural Networks, 2017

ForeGraph: Exploring Large-scale Graph Processing on Multi-FPGA Architecture.

Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017

FP-DNN: An Automated Framework for Mapping Deep Neural Networks onto FPGAs with RTL-HLS Hybrid Templates.

Proceedings of the 25th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2017

The Feniks FPGA Operating System for Cloud Computing.

Yongqiang Xiong

Thomas Moscibroda

Proceedings of the 8th Asia-Pacific Workshop on Systems, Mumbai, India, September 2, 2017, 2017

Using Data Compression for Optimizing FPGA-Based Convolutional Neural Network Accelerators.

Proceedings of the Advanced Parallel Processing Technologies, 2017

2016

ClickNP: Highly flexible and High-performance Network Processing with Reconfigurable Hardware.

Layong Larry Luo

Yongqiang Xiong

Proceedings of the ACM SIGCOMM 2016 Conference, Florianopolis, Brazil, August 22-26, 2016, 2016

Going Deeper with Embedded FPGA Platform for Convolutional Neural Network.

Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2016

2015

Real-Time High-Quality Stereo Vision System in FPGA.

Feng-Hsiung Hsu

IEEE Trans. Circuits Syst. Video Technol., 2015

2014

Large scale recurrent neural network on GPU.

Proceedings of the 2014 International Joint Conference on Neural Networks, 2014

Energy efficient neural networks for big data analytics.

Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2014

2012

Probabilistic Brain Fiber Tractography on GPUs.

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

The Colored Concept Map and Its Application in Learning Assistance Program.

Proceedings of the Hybrid Learning - 5th International Conference, 2012

The Analysis of Research Hotspots and Fronts of Knowledge Visualization Based on CiteSpace II.

Proceedings of the Hybrid Learning - 5th International Conference, 2012

Efficient Query Processing for Web Search Engine with FPGAs.

Feng-Hsiung Hsu

Proceedings of the 2012 IEEE 20th Annual International Symposium on Field-Programmable Custom Computing Machines, 2012

2011

An FPGA-based accelerator for LambdaRank in Web search engines.

Feng-Hsiung Hsu

ACM Trans. Reconfigurable Technol. Syst., 2011

A heterogeneous accelerator platform for multi-subject voxel-based brain network analysis.

Proceedings of the 2011 IEEE/ACM International Conference on Computer-Aided Design, 2011

Gemma in April: A matrix-like parallel programming architecture on OpenCL.

Proceedings of the Design, Automation and Test in Europe, 2011

2010

FPGA and GPU implementation of large scale SpMV.

Proceedings of the IEEE 8th Symposium on Application Specific Processors, 2010

Efficient PageRank and SpMV Computation on AMD GPUs.

Proceedings of the 39th International Conference on Parallel Processing, 2010

Making Human Connectome Faster: GPU Acceleration of Brain Network Analysis.

Proceedings of the 16th IEEE International Conference on Parallel and Distributed Systems, 2010

A compression method for inverted index and its FPGA-based decompression solution.

Feng-Hsiung Hsu

Proceedings of the International Conference on Field-Programmable Technology, 2010

LambdaRank acceleration for relevance ranking in web search engines (abstract only).

Feng-Hsiung Hsu

Proceedings of the ACM/SIGDA 18th International Symposium on Field Programmable Gate Arrays, 2010

FPMR: MapReduce framework on FPGA.

Proceedings of the ACM/SIGDA 18th International Symposium on Field Programmable Gate Arrays, 2010

2009

FPGA Acceleration of RankBoost in Web Search Engines.

Feng-Hsiung Hsu

ACM Trans. Reconfigurable Technol. Syst., 2009

Parallel Inference for Latent Dirichlet Allocation on Graphics Processing Units.

Proceedings of the Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009. Proceedings of a meeting held 7-10 December 2009, 2009

FTL design exploration in reconfigurable high-performance SSD for server applications.

Seungryoul Maeng

Feng-Hsiung Hsu

Proceedings of the 23rd international conference on Supercomputing, 2009

RankBoost Acceleration on both NVIDIA CUDA and ATI Stream Platforms.

Proceedings of the 15th IEEE International Conference on Parallel and Distributed Systems, 2009

FPGA-based acceleration of neural network for ranking in web search engine with a streaming architecture.

Feng-Hsiung Hsu

Proceedings of the 19th International Conference on Field Programmable Logic and Applications, 2009

An Efficient Lossless Compression Method for Internet Search Data in Hardware Accelerators.

Proceedings of the CSIE 2009, 2009 WRI World Congress on Computer Science and Information Engineering, March 31, 2009

2008

Distributed RankBoost Acceleration Using FPGA and MPI for Web Relevance Ranking.

Feng-Hsiung Hsu

Proceedings of the 14th International Conference on Parallel and Distributed Systems, 2008

2007

FPGA-based Accelerator Design for RankBoost in Web Search Engines.

Feng-Hsiung Hsu

Proceedings of the 2007 International Conference on Field-Programmable Technology, 2007

2006

A single receiving chip for DVB data broadcasting system.

IEEE Trans. Consumer Electron., 2006

2005

The design and implementation of a DVB receiving chip with PCI interface.

Proceedings of the 2005 Conference on Asia South Pacific Design Automation, 2005
