Yu Wang

ORCID: 0000-0001-6108-5157

Affiliations:
  • Tsinghua University, Department of Electronic Engineering, TNList, Beijing, China (PhD 2007)


According to our database, Yu Wang authored at least 450 papers between 2006 and 2024.

Bibliography

2024
Enhancing Timeliness in Asynchronous Vehicle Localization: A Signal-Multiplexing Network Measuring Approach.
IEEE Trans. Intell. Transp. Syst., October, 2024

Toward High-Accuracy and Real-Time Two-Stage Small Object Detection on FPGA.
IEEE Trans. Circuits Syst. Video Technol., September, 2024

TDPP: 2-D Permutation-Based Protection of Memristive Deep Neural Networks.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., March, 2024

Active Neural Topological Mapping for Multi-Agent Exploration.
IEEE Robotics Autom. Lett., January, 2024

GRAPHIC: Gather and Process Harmoniously in the Cache With High Parallelism and Flexibility.
IEEE Trans. Emerg. Top. Comput., 2024

OmniDrones: An Efficient and Flexible Platform for Reinforcement Learning in Drone Control.
IEEE Robotics Autom. Lett., 2024

An Efficient Flood Detection Method With Satellite Images Based on Algorithm-Hardware Co-Design.
IEEE Geosci. Remote. Sens. Lett., 2024

Few-shot In-Context Preference Learning Using Large Language Models.
CoRR, 2024

DiP-GO: A Diffusion Pruner via Few-step Gradient Optimization.
CoRR, 2024

CSKV: Training-Efficient Channel Shrinking for KV Cache in Long-Context Scenarios.
CoRR, 2024

A Survey on Self-play Methods in Reinforcement Learning.
CoRR, 2024

Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs.
CoRR, 2024

MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression.
CoRR, 2024

Can LLMs Learn by Teaching? A Preliminary Study.
CoRR, 2024

DiTFastAttn: Attention Compression for Diffusion Transformer Models.
CoRR, 2024

FlightBench: A Comprehensive Benchmark of Spatial Planning Methods for Quadrotors.
CoRR, 2024

ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation.
CoRR, 2024

MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization.
CoRR, 2024

HetHub: A Heterogeneous distributed hybrid training system for large-scale models.
CoRR, 2024

DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis.
CoRR, 2024

A Survey on Efficient Inference for Large Language Models.
CoRR, 2024

Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better.
CoRR, 2024

Localization matters too: How localization error affects UAV flight.
CoRR, 2024

Representation Learning for Frequent Subgraph Mining.
CoRR, 2024

LV-Eval: A Balanced Long-Context Benchmark with 5 Length Levels Up to 256K.
CoRR, 2024

DeSCo: Towards Generalizable and Scalable Deep Subgraph Counting.
Proceedings of the 17th ACM International Conference on Web Search and Data Mining, 2024

TCP: Triplet Contrastive-relationship Preserving for Class-Incremental Learning.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

Efficient Deployment of Large Language Model across Cloud-Device Systems.
Proceedings of the 37th IEEE International System-on-Chip Conference, 2024

PIP: Detecting Adversarial Examples in Large Vision-Language Models via Attention Patterns of Irrelevant Probe Questions.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

FlashDecoding++: Faster Large Language Model Inference with Asynchronization, Flat GEMM Optimization, and Heuristics.
Proceedings of the Seventh Annual Conference on Machine Learning and Systems, 2024

Language Agents with Reinforcement Learning for Strategic Play in the Werewolf Game.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Evaluating Quantized Large Language Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Skeleton-of-Thought: Prompting LLMs for Efficient Parallel Generation.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

A Unified Sampling Framework for Solver Searching of Diffusion Probabilistic Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

FlightLLM: Efficient Large Language Model Inference with a Complete Mapping Flow on FPGAs.
Proceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2024

DyPIM: Dynamic-Inference-Enabled Processing-In-Memory Accelerator.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2024

Invited: Automatic Hardware/Software Design for High-Speed Autonomous Unmanned Aerial Vehicles Guided by a Flight Model.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

DySpMM: From Fix to Dynamic for Sparse Matrix-Matrix Multiplication Accelerators.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

EPIM: Efficient Processing-In-Memory Accelerators based on Epitome.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

FlashEval: Towards Fast and Accurate Evaluation of Text-to-Image Diffusion Generative Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

LLM-Powered Hierarchical Language Agent for Real-time Human-AI Coordination.
Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, 2024

FEASTA: A Flexible and Efficient Accelerator for Sparse Tensor Algebra in Machine Learning.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

UPDP: A Unified Progressive Depth Pruner for CNN and Vision Transformer.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Accelerate Multi-Agent Reinforcement Learning in Zero-Sum Games with Subgame Curriculum Learning.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
CoGNN: An Algorithm-Hardware Co-Design Approach to Accelerate GNN Inference With Minibatch Sampling.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., December, 2023

A Cooperative Relative Localization System for Distributed Multi-Agent Networks.
IEEE Trans. Veh. Technol., November, 2023

MNSIM 2.0: A Behavior-Level Modeling Tool for Processing-In-Memory Architectures.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., November, 2023

Gibbon: An Efficient Co-Exploration Framework of NN Model and Processing-In-Memory Architecture.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., November, 2023

Improving Sample Efficiency of Multiagent Reinforcement Learning With Nonexpert Policy for Flocking Control.
IEEE Internet Things J., August, 2023

A Generic Graph-Based Neural Architecture Encoding Scheme With Multifaceted Information.
IEEE Trans. Pattern Anal. Mach. Intell., July, 2023

Adaptive Multidimensional Parallel Fault Simulation Framework on Heterogeneous System.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., June, 2023

Sgap: towards efficient sparse tensor algebra compilation for GPU.
CCF Trans. High Perform. Comput., June, 2023

Serving Multi-DNN Workloads on FPGAs: A Coordinated Architecture, Scheduling, and Mapping Perspective.
IEEE Trans. Computers, May, 2023

Dual-Timescale Resource Allocation for Collaborative Service Caching and Computation Offloading in IoT Systems.
IEEE Trans. Ind. Informatics, 2023

FeFET-Based Logic-in-Memory Supporting SA-Free Write-Back and Fully Dynamic Access With Reduced Bitline Charging Activity and Recycled Bitline Charge.
IEEE Trans. Circuits Syst. I Regul. Pap., 2023

TaskFlex Solver for Multi-Agent Pursuit via Automatic Curriculum Learning.
CoRR, 2023

MASP: Scalable GNN-based Planning for Multi-Agent Navigation.
CoRR, 2023

FlashDecoding++: Faster Large Language Model Inference on GPUs.
CoRR, 2023

Large Trajectory Models are Scalable Motion Predictors and Planners.
CoRR, 2023

TDPP: Two-Dimensional Permutation-Based Protection of Memristive Deep Neural Networks.
CoRR, 2023

Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding.
CoRR, 2023

Dynamic Ensemble of Low-fidelity Experts: Mitigating NAS "Cold-Start".
CoRR, 2023

CogDL: A Comprehensive Library for Graph Deep Learning.
Proceedings of the ACM Web Conference 2023, 2023

HyperGef: A Framework Enabling Efficient Fusion for Hypergraph Neural Network on GPUs.
Proceedings of the Sixth Conference on Machine Learning and Systems, 2023

Exploiting Hardware Utilization and Adaptive Dataflow for Efficient Sparse Convolution in 3D Point Clouds.
Proceedings of the Sixth Conference on Machine Learning and Systems, 2023

DF-GAS: a Distributed FPGA-as-a-Service Architecture towards Billion-Scale Graph-based Approximate Nearest Neighbor Search.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

TorchSparse++: Efficient Training and Inference Framework for Sparse Convolution on GPUs.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

A Three-Step Multi-Resolution Time-to-Digital Converter.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2023

OMS-DPM: Optimizing the Model Schedule for Diffusion Probabilistic Models.
Proceedings of the International Conference on Machine Learning, 2023

Learning Zero-Shot Cooperation with Humans, Assuming Humans Are Biased.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Ada3D: Exploiting the Spatial Redundancy with Adaptive Inference for Efficient 3D Object Detection.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

TSTC: Two-Level Sparsity Tensor Core Enabling both Algorithm Flexibility and Hardware Efficiency.
Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 2023

A Point Transformer Accelerator with Fine-Grained Pipelines and Distribution-Aware Dynamic FPS.
Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 2023

Realizing Extreme Endurance Through Fault-aware Wear Leveling and Improved Tolerance.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

UAV Swarm Planning accelerator on FPGA with low latency and fixed-point L-BFGS Quasi-Newton solver.
Proceedings of the International Conference on Field Programmable Technology, 2023

Minimizing Communication Conflicts in Network-On-Chip Based Processing-In-Memory Architecture.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2023

CLAP: Locality Aware and Parallel Triangle Counting with Content Addressable Memory.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2023

PIM-HLS: An Automatic Hardware Generation Tool for Heterogeneous Processing-In-Memory-based Neural Network Accelerators.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

Processing-In-Hierarchical-Memory Architecture for Billion-Scale Approximate Nearest Neighbor Search.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

An Efficient Accelerator for Point-based and Voxel-based Point Cloud Neural Networks.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

Memory-Efficient and Real-Time SPAD-based dToF Depth Sensor with Spatial and Statistical Correlation.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

MIPI 2023 Challenge on RGB+ToF Depth Completion: Methods and Results.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

TorchSparse++: Efficient Point Cloud Engine.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

MD-RadioMap: Multi-Drone Radio Map Building via Single-Anchor Ultra-Wideband Localization Network.
Proceedings of the 19th IEEE International Conference on Automation Science and Engineering, 2023

Asynchronous Multi-Agent Reinforcement Learning for Efficient Real-Time Multi-Robot Cooperative Exploration.
Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems, 2023

Learning Graph-Enhanced Commander-Executor for Multi-Agent Navigation.
Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems, 2023

Fictitious Cross-Play: Learning Global Nash Equilibrium in Mixed Cooperative-Competitive Games.
Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems, 2023

NTGAT: A Graph Attention Network Accelerator with Runtime Node Tailoring.
Proceedings of the 28th Asia and South Pacific Design Automation Conference, 2023

Dynamic Ensemble of Low-Fidelity Experts: Mitigating NAS "Cold-Start".
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

Memory-Oriented Structural Pruning for Efficient Image Restoration.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

Ensemble-in-One: Ensemble Learning within Random Gated Networks for Enhanced Adversarial Robustness.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Soft Error Tolerant Convolutional Neural Networks on FPGAs With Ensemble Learning.
IEEE Trans. Very Large Scale Integr. Syst., 2022

A Unified FPGA Virtualization Framework for General-Purpose Deep Neural Networks in the Cloud.
ACM Trans. Reconfigurable Technol. Syst., 2022

Exploring the Potential of Low-Bit Training of Convolutional Neural Networks.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022

INCAME: Interruptible CNN Accelerator for Multirobot Exploration.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022

Corrections to "MR-TopoMap: Multi-Robot Exploration Based on Topological Map in Communication Restricted Environment".
IEEE Robotics Autom. Lett., 2022

MR-TopoMap: Multi-Robot Exploration Based on Topological Map in Communication Restricted Environment.
IEEE Robotics Autom. Lett., 2022

A Framework to Co-Optimize Robot Exploration and Task Planning in Unknown Environments.
IEEE Robotics Autom. Lett., 2022

Point Cloud Change Detection With Stereo V-SLAM: Dataset, Metrics and Baseline.
IEEE Robotics Autom. Lett., 2022

MR-GMMapping: Communication Efficient Multi-Robot Mapping System via Gaussian Mixture Model.
IEEE Robotics Autom. Lett., 2022

Weakly-supervised semantic segmentation with superpixel guided local and global consistency.
Pattern Recognit., 2022

A Learning-Based AoA Estimation Method for Device-Free Localization.
IEEE Commun. Lett., 2022

Cross-layer Attention Network for Fine-grained Visual Categorization.
CoRR, 2022

Primal-dual Estimator Learning: an Offline Constrained Moving Horizon Estimation Method with Feasibility and Near-optimality Guarantees.
CoRR, 2022

Heuristic Adaptability to Input Dynamics for SpMM on GPUs.
CoRR, 2022

Multi-UAV Coverage Planning with Limited Endurance in Disaster Environment.
CoRR, 2022

A Mobile Robot Experiment System with Lightweight Simulator Generator for Deep Reinforcement Learning Algorithm.
Proceedings of the IEEE International Conference on Robotics and Biomimetics, 2022

MR-GMMExplore: Multi-Robot Exploration System in Unknown Environments based on Gaussian Mixture Model.
Proceedings of the IEEE International Conference on Robotics and Biomimetics, 2022

A Benchmark of Planning-based Exploration Methods in Photo-Realistic 3D Simulator.
Proceedings of the IEEE International Conference on Robotics and Biomimetics, 2022

The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

TA-GATES: An Encoding Scheme for Neural Network Architectures.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Understanding GNN Computational Graph: A Coordinated Computation, IO, and Memory Perspective.
Proceedings of the Fifth Conference on Machine Learning and Systems, 2022

Optimizing Graph-based Approximate Nearest Neighbor Search: Stronger and Smarter.
Proceedings of the 23rd IEEE International Conference on Mobile Data Management, 2022

WESCO: Weight-encoded Reliability and Security Co-design for In-memory Computing Systems.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2022

Efficient Autonomous Driving System Design: From Software to Hardware.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2022

DIMMining: pruning-efficient and parallel graph mining on near-memory-computing.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

MOM: Microphone based 3D Orientation Measurement.
Proceedings of the 21st ACM/IEEE International Conference on Information Processing in Sensor Networks, 2022

Relative Distributed Formation and Obstacle Avoidance with Multi-agent Reinforcement Learning.
Proceedings of the 2022 International Conference on Robotics and Automation, 2022

Explore-Bench: Data Sets, Metrics and Evaluations for Frontier-based and Deep-reinforcement-learning-based Autonomous Exploration.
Proceedings of the 2022 International Conference on Robotics and Automation, 2022

Multi-UAV Disaster Environment Coverage Planning with Limited-Endurance.
Proceedings of the 2022 International Conference on Robotics and Automation, 2022

SAVE: Spatial-Attention Visual Exploration.
Proceedings of the 2022 IEEE International Conference on Image Processing, 2022

On the Performance Bound of Multi-Agent Formation with Localization Uncertainty.
Proceedings of the IEEE International Conference on Communications, 2022

A-U3D: A Unified 2D/3D CNN Accelerator on the Versal Platform for Disparity Estimation.
Proceedings of the 32nd International Conference on Field-Programmable Logic and Applications, 2022

CLOSE: Curriculum Learning on the Sharing Extent Towards Better One-Shot NAS.
Proceedings of the Computer Vision - ECCV 2022, 2022

Learning Efficient Multi-agent Cooperative Visual Exploration.
Proceedings of the Computer Vision - ECCV 2022, 2022

Exploiting Parallelism with Vertex-Clustering in Processing-In-Memory-based GCN Accelerators.
Proceedings of the 2022 Design, Automation & Test in Europe Conference & Exhibition, 2022

Gibbon: Efficient Co-Exploration of NN Model and Processing-In-Memory Architecture.
Proceedings of the 2022 Design, Automation & Test in Europe Conference & Exhibition, 2022

Heuristic adaptability to input dynamics for SpMM on GPUs.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

CodedVTR: Codebook-based Sparse Voxel Transformer with Geometric Guidance.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

FedCor: Correlation-Based Active Client Selection Strategy for Heterogeneous Federated Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

VMAPD: Generate Diverse Solutions for Multi-Agent Games with Recurrent Trajectory Discriminators.
Proceedings of the IEEE Conference on Games, CoG 2022, Beijing, 2022

Primal-Dual Estimator Learning Method with Feasibility and Near-Optimality Guarantees.
Proceedings of the 61st IEEE Conference on Decision and Control, 2022

A one-for-all and o(v log(v))-cost solution for parallel merge style operations on sorted key-value arrays.
Proceedings of the ASPLOS '22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 28 February 2022, 2022

2021
FTT-NAS: Discovering Fault-tolerant Convolutional Neural Architecture.
ACM Trans. Design Autom. Electr. Syst., 2021

Machine Learning for Electronic Design Automation: A Survey.
ACM Trans. Design Autom. Electr. Syst., 2021

Enabling Lower-Power Charge-Domain Nonvolatile In-Memory Computing With Ferroelectric FETs.
IEEE Trans. Circuits Syst. II Express Briefs, 2021

Rescuing RRAM-Based Computing From Static and Dynamic Faults.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2021

Multi-Agent Vulnerability Discovery for Autonomous Driving with Hazard Arbitration Reward.
CoRR, 2021

A DRL based distributed formation control scheme with stream based collision avoidance.
CoRR, 2021

Efficient Sparse Matrix Kernels based on Adaptive Workload-Balancing and Parallel-Reduction.
CoRR, 2021

BoolNet: Minimizing The Energy Consumption of Binary Neural Networks.
CoRR, 2021

Ensemble-in-One: Learning Ensemble within Random Gated Networks for Enhanced Adversarial Robustness.
CoRR, 2021

FedGP: Correlation-Based Active Client Selection for Heterogeneous Federated Learning.
CoRR, 2021

The Surprising Effectiveness of MAPPO in Cooperative, Multi-Agent Games.
CoRR, 2021

CogDL: An Extensive Toolkit for Deep Learning on Graphs.
CoRR, 2021

Low-Cost Multi-Agent Navigation via Reinforcement Learning With Multi-Fidelity Simulator.
IEEE Access, 2021

Evaluating Efficient Performance Estimators of Neural Architectures.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Variational Automatic Curriculum Learning for Sparse-Reward Cooperative Multi-Agent Problems.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

SMMR-Explore: SubMap-based Multi-Robot Exploration System with Multi-robot Multi-target Potential Field Exploration Method.
Proceedings of the IEEE International Conference on Robotics and Automation, 2021

Discovering Diverse Multi-Agent Strategic Behavior via Reward Randomization.
Proceedings of the 9th International Conference on Learning Representations, 2021

Enhancing Adversarial Robustness For Image Classification By Regularizing Class Level Feature Distribution.
Proceedings of the 2021 IEEE International Conference on Image Processing, 2021

Exploiting Online Locality and Reduction Parallelism for Sampled Dense Matrix Multiplication on GPUs.
Proceedings of the 39th IEEE International Conference on Computer Design, 2021

Rerec: In-ReRAM Acceleration with Access-Aware Mapping for Personalized Recommendation.
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2021

FedSwap: A Federated Learning based 5G Decentralized Dynamic Spectrum Access System.
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2021

Depression Detection by Analysing Eye Movements on Emotional Images.
Proceedings of the IEEE International Conference on Acoustics, 2021

On the Performance of Multi-Agent Detection in Mobile Delay-Sensitive Networks.
Proceedings of the IEEE Global Communications Conference, 2021

Cooperative Dynamic Coverage Control in Wireless Camera Sensor Networks with Anisotropic Perception.
Proceedings of the IEEE Global Communications Conference, 2021

3M-AI: A Multi-task and Multi-core Virtualization Framework for Multi-FPGA AI Systems in the Cloud.
Proceedings of the FPGA '21: The 2021 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Virtual Event, USA, February 28, 2021

GAME: Gaussian Mixture Model Mapping and Navigation Engine on Embedded FPGA.
Proceedings of the 29th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2021

Hermes: Decentralized Dynamic Spectrum Access System for Massive Devices Deployment in 5G.
Proceedings of the EWSN '21: Proceedings of the 2021 International Conference on Embedded Wireless Systems and Networks, 2021

Adversarial Robustness Under Long-Tailed Distribution.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Reliability-Aware Training and Performance Modeling for Processing-In-Memory Systems.
Proceedings of the ASPDAC '21: 26th Asia and South Pacific Design Automation Conference, 2021

Efficient Computing Platform Design for Autonomous Driving Systems.
Proceedings of the ASPDAC '21: 26th Asia and South Pacific Design Automation Conference, 2021

MNSIM-TIME: Performance Modeling Framework for Training-In-Memory Architectures.
Proceedings of the 3rd IEEE International Conference on Artificial Intelligence Circuits and Systems, 2021

Ensemble of Pruned Networks for Reliable Classifiers.
Proceedings of the 3rd IEEE International Conference on Artificial Intelligence Circuits and Systems, 2021

2020
Algorithmic Fault Detection for RRAM-based Matrix Operations.
ACM Trans. Design Autom. Electr. Syst., 2020

One-Shot Refresh: A Low-Power Low-Congestion Approach for Dynamic Memories.
IEEE Trans. Circuits Syst., 2020

DNNVM: End-to-End Compiler Leveraging Heterogeneous Optimizations on FPGA-Based CNN Accelerators.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

Low Bit-Width Convolutional Neural Network on RRAM.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

Long Live TIME: Improving Lifetime and Security for NVM-Based Training-in-Memory Systems.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

Enabling Secure NVM-Based in-Memory Neural Network Computing by Sparse Fast Gradient Encryption.
IEEE Trans. Computers, 2020

WSODPB: Weakly supervised object detection with PCSNet and box regression module.
Neurocomputing, 2020

Nonparametric Topic Modeling with Neural Inference.
Neurocomputing, 2020

Semantic head enhanced pedestrian detection in a crowd.
Neurocomputing, 2020

Multi-shot NAS for Discovering Adversarially Robust Convolutional Neural Architectures at Targeted Capacities.
CoRR, 2020

aw_nas: A Modularized and Extensible NAS framework.
CoRR, 2020

BARS: Joint Search of Cell Topology and Layout for Accurate and Efficient Binary ARchitectures.
CoRR, 2020

A Surgery of the Neural Architecture Evaluators.
CoRR, 2020

Physical Adversarial Attack on Vehicle Detector in the Carla Simulator.
CoRR, 2020

Towards Lower Bit Multiplication for Convolutional Neural Network Training.
CoRR, 2020

FTT-NAS: Discovering Fault-Tolerant Neural Architecture.
CoRR, 2020

Efficient 16 Boolean logic and arithmetic based on bipolar oxide memristors.
Sci. China Inf. Sci., 2020

Optimizing CNN Accelerator With Improved Roofline Model.
Proceedings of the 33rd IEEE International System-on-Chip Conference, 2020

GE-SpMM: general-purpose sparse matrix-matrix multiplication on GPUs for graph neural networks.
Proceedings of the International Conference for High Performance Computing, 2020

DualLip: A System for Joint Lip Reading and Generation.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

33.2 A Fully Integrated Analog ReRAM Based 78.4TOPS/W Compute-In-Memory Chip with Fully Parallel MAC Computing.
Proceedings of the 2020 IEEE International Solid-State Circuits Conference, 2020

FeFET-based low-power bitwise logic-in-memory with direct write-back and data-adaptive dynamic sensing interface.
Proceedings of the ISLPED '20: ACM/IEEE International Symposium on Low Power Electronics and Design, 2020

CNN-based Monocular Decentralized SLAM on embedded FPGA.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

AVD-Net: Attention Value Decomposition Network For Deep Multi-Agent Reinforcement Learning.
Proceedings of the 25th International Conference on Pattern Recognition, 2020

GraphSDH: A General Graph Sampling Framework with Distribution and Hierarchy.
Proceedings of the 2020 IEEE High Performance Extreme Computing Conference, 2020

LessMine: Reducing Sample Space and Data Access for Dense Pattern Mining.
Proceedings of the 2020 IEEE High Performance Extreme Computing Conference, 2020

Communication Lower Bound in Convolution Accelerators.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020

MNSIM 2.0: A Behavior-Level Modeling Tool for Memristor-based Neuromorphic Computing Systems.
Proceedings of the GLSVLSI '20: Great Lakes Symposium on VLSI 2020, 2020

An Order Sampling Processing-in-Memory Architecture for Approximate Graph Pattern Mining.
Proceedings of the GLSVLSI '20: Great Lakes Symposium on VLSI 2020, 2020

Enable Efficient and Flexible FPGA Virtualization for Deep Learning in the Cloud.
Proceedings of the FPGA '20: The 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2020

INCAME: INterruptible CNN Accelerator for Multi-robot Exploration.
Proceedings of the FPGA '20: The 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2020

Enabling Efficient and Flexible FPGA Virtualization for Deep Learning in the Cloud.
Proceedings of the 28th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2020

CNN-based Feature-point Extraction for Real-time Visual SLAM on Embedded FPGA.
Proceedings of the 28th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2020

Distribution-Balanced Loss for Multi-label Classification in Long-Tailed Datasets.
Proceedings of the Computer Vision - ECCV 2020, 2020

A Generic Graph-Based Neural Architecture Encoding Scheme for Predictor-Based NAS.
Proceedings of the Computer Vision - ECCV 2020, 2020

DSA: More Efficient Budgeted Pruning via Differentiable Sparsity Allocation.
Proceedings of the Computer Vision - ECCV 2020, 2020

Reliable Classification with Ensemble Convolutional Neural Networks.
Proceedings of the IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems, 2020

Reliability Evaluation of Pruned Neural Networks against Errors on Parameters.
Proceedings of the IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems, 2020

Security Enhancement for RRAM Computing System through Obfuscating Crossbar Row Connections.
Proceedings of the 2020 Design, Automation & Test in Europe Conference & Exhibition, 2020

INCA: INterruptible CNN Accelerator for Multi-tasking in Embedded Robots.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

Attentional Separation-and-Aggregation Network for Self-supervised Depth-Pose Learning in Dynamic Scenes.
Proceedings of the 4th Conference on Robot Learning, 2020

Black Box Search Space Profiling for Accelerator-Aware Neural Architecture Search.
Proceedings of the 25th Asia and South Pacific Design Automation Conference, 2020

Adaptive Circuit Approaches to Low-Power Multi-Level/Cell FeFET Memory.
Proceedings of the 25th Asia and South Pacific Design Automation Conference, 2020

An Energy-Efficient Quantized and Regularized Training Framework For Processing-In-Memory Accelerators.
Proceedings of the 25th Asia and South Pacific Design Automation Conference, 2020

FTT-NAS: Discovering Fault-Tolerant Neural Architecture.
Proceedings of the 25th Asia and South Pacific Design Automation Conference, 2020

Soft Error Mitigation for Deep Convolution Neural Network on FPGA Accelerators.
Proceedings of the 2nd IEEE International Conference on Artificial Intelligence Circuits and Systems, 2020

Feature Variance Regularization: A Simple Way to Improve the Generalizability of Neural Networks.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Towards Energy-Efficient Systems for Artificial Intelligence in the Future.
Proceedings of the 19th IEEE International Conference on Cognitive Informatics & Cognitive Computing, 2020

2019
[DL] A Survey of FPGA-based Neural Network Inference Accelerators.
ACM Trans. Reconfigurable Technol. Syst., 2019

Fault-Tolerant Training Enabled by On-Line Fault Detection for RRAM-Based Neural Computing Systems.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2019

GraphH: A Processing-in-Memory Architecture for Large-Scale Graph Processing.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2019

TIME: A Training-in-Memory Architecture for RRAM-Based Deep Neural Networks.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2019

HyVE: Hybrid Vertex-Edge Memory Hierarchy for Energy-Efficient Graph Processing.
IEEE Trans. Computers, 2019

Designing scrubbing strategy for memories suffering MCUs through the selection of optimal interleaving distance.
Int. J. Comput. Sci. Eng., 2019

A DenseNet feature-based loop closure method for visual SLAM system.
Proceedings of the 2019 IEEE International Conference on Robotics and Biomimetics, 2019

On-Chip Instruction Generation for Cross-Layer CNN Accelerator on FPGA.
Proceedings of the 2019 IEEE Computer Society Annual Symposium on VLSI, 2019

Metric Learning in Codebook Generation of Bag-of-Words for Person Re-identification.
Proceedings of the 8th International Conference on Pattern Recognition Applications and Methods, 2019

An In-depth Comparison of Compilers for Deep Neural Networks on Hardware.
Proceedings of the 15th IEEE International Conference on Embedded Software and Systems, 2019

HDC-IM: Hyperdimensional Computing In-Memory Architecture based on RRAM.
Proceedings of the 26th IEEE International Conference on Electronics, Circuits and Systems, 2019

Augmentation Invariant Training.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshops, 2019

A General Logic Synthesis Framework for Memristor-based Logic Design.
Proceedings of the International Conference on Computer-Aided Design, 2019

Enabling Secure in-Memory Neural Network Computing by Sparse Fast Gradient Encryption.
Proceedings of the International Conference on Computer-Aided Design, 2019

A Fine-Grained Sparse Accelerator for Multi-Precision DNN.
Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

DNNVM: End-to-End Compiler Leveraging Operation Fusion on FPGA-based CNN Accelerators.
Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

Compressed CNN Training with FPGA-based Accelerator.
Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

A Configurable Multi-Precision CNN Computing Framework Based on Single Bit RRAM.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

Memory-Bound Proof-of-Work Acceleration for Blockchain Applications.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

Fault tolerance in neuromorphic computing systems.
Proceedings of the 24th Asia and South Pacific Design Automation Conference, 2019

Learning the sparsity for ReRAM: mapping and pruning sparse neural network for ReRAM based accelerator.
Proceedings of the 24th Asia and South Pacific Design Automation Conference, 2019

GraphSAR: a sparsity-aware processing-in-memory architecture for large-scale graph processing on ReRAMs.
Proceedings of the 24th Asia and South Pacific Design Automation Conference, 2019

Multi-task ADAS system on FPGA.
Proceedings of the IEEE International Conference on Artificial Intelligence Circuits and Systems, 2019

2018
Instruction Driven Cross-layer CNN Accelerator for Fast Detection on FPGA.
ACM Trans. Reconfigurable Technol. Syst., 2018

Bidirectional Database Storage and SQL Query Exploiting RRAM-Based Process-in-Memory Structure.
ACM Trans. Storage, 2018

Towards Real-Time Object Detection on Embedded Systems.
IEEE Trans. Emerg. Top. Comput., 2018

MNSIM: Simulation Platform for Memristor-Based Neuromorphic Computing System.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2018

Optimizing Cache Bypassing and Warp Scheduling for GPUs.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2018

Angel-Eye: A Complete Design Flow for Mapping CNN Onto Embedded FPGA.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2018

Hardware Trojan Detection in Third-Party Digital Intellectual Property Cores by Multilevel Feature Analysis.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2018

Stuck-at Fault Tolerance in RRAM Computing Systems.
IEEE J. Emerg. Sel. Topics Circuits Syst., 2018

Nonparametric Topic Modeling with Neural Inference.
CoRR, 2018

PoTrojan: powerful neural-level trojan designs in deep learning models.
CoRR, 2018

Low power driven loop tiling for RRAM crossbar-based CNN.
Proceedings of the 33rd Annual ACM Symposium on Applied Computing, 2018

GraphIA: an in-situ accelerator for large-scale graph processing.
Proceedings of the International Symposium on Memory Systems, 2018

Fault Tolerance for RRAM-Based Matrix Operations.
Proceedings of the IEEE International Test Conference, 2018

Hu-Fu: Hardware and Software Collaborative Attack Framework Against Neural Networks.
Proceedings of the 2018 IEEE Computer Society Annual Symposium on VLSI, 2018

RRAM Based Buffer Design for Energy Efficient CNN Accelerator.
Proceedings of the 2018 IEEE Computer Society Annual Symposium on VLSI, 2018

Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training.
Proceedings of the 6th International Conference on Learning Representations, 2018

An Efficient Reconfigurable Framework for General Purpose CNN-RNN Models on FPGAs.
Proceedings of the 23rd IEEE International Conference on Digital Signal Processing, 2018

Mixed size crossbar based RRAM CNN accelerator with overlapped mapping method.
Proceedings of the International Conference on Computer-Aided Design, 2018

Real-Time Object Detection and Semantic Segmentation Hardware System with Deep Learning Networks.
Proceedings of the International Conference on Field-Programmable Technology, 2018

NewGraph: Balanced Large-Scale Graph Processing on FPGAs with Low Preprocessing Overheads.
Proceedings of the 26th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2018

Design of fault-tolerant neuromorphic computing systems.
Proceedings of the 23rd IEEE European Test Symposium, 2018

Real-time object detection towards high power efficiency.
Proceedings of the 2018 Design, Automation & Test in Europe Conference & Exhibition, 2018

A peripheral circuit reuse structure integrated with a retimed data flow for low power RRAM crossbar-based CNN.
Proceedings of the 2018 Design, Automation & Test in Europe Conference & Exhibition, 2018

Rescuing memristor-based computing with non-linear resistance levels.
Proceedings of the 2018 Design, Automation & Test in Europe Conference & Exhibition, 2018

HyVE: Hybrid vertex-edge memory hierarchy for energy-efficient graph processing.
Proceedings of the 2018 Design, Automation & Test in Europe Conference & Exhibition, 2018

Long live TIME: improving lifetime for training-in-memory engines by structured gradient sparsification.
Proceedings of the 55th Annual Design Automation Conference, 2018

Training low bitwidth convolutional neural network on RRAM.
Proceedings of the 23rd Asia and South Pacific Design Automation Conference, 2018

2017
The First 25 Years of the FPL Conference: Significant Papers.
ACM Trans. Reconfigurable Technol. Syst., 2017

Exploiting Stable Data Dependency in Stream Processing Acceleration on FPGAs.
ACM Trans. Embed. Comput. Syst., 2017

Maximum Energy Efficiency Tracking Circuits for Converter-Less Energy Harvesting Sensor Nodes.
IEEE Trans. Circuits Syst. II Express Briefs, 2017

A Compact Memristor-Based Dynamic Synapse for Spiking Neural Networks.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2017

A General Framework for Hardware Trojan Detection in Digital Circuits by Statistical Learning Algorithms.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2017

Software-Hardware Codesign for Efficient Neural Network Acceleration.
IEEE Micro, 2017

A ReRAM-Based Nonvolatile Flip-Flop With Self-Write-Termination Scheme for Frequent-OFF Fast-Wake-Up Nonvolatile Processors.
IEEE J. Solid State Circuits, 2017

A Survey of FPGA Based Neural Network Accelerator.
CoRR, 2017

A Deep Learning Approach for Blind Drift Calibration of Sensor Networks.
CoRR, 2017

Exploring the Regularity of Sparse Structure in Convolutional Neural Networks.
CoRR, 2017

Energy-efficient SQL query exploiting RRAM-based process-in-memory structure.
Proceedings of the IEEE 6th Non-Volatile Memory Systems and Applications Symposium, 2017

Low-overhead implementation of logic encryption using gate replacement techniques.
Proceedings of the 18th International Symposium on Quality Electronic Design, 2017

Circuit design for beyond von Neumann applications using emerging memory: From nonvolatile logics to neuromorphic computing.
Proceedings of the 18th International Symposium on Quality Electronic Design, 2017

Streaming sorting network based BWT acceleration on FPGA for lossless compression.
Proceedings of the International Conference on Field Programmable Technology, 2017

Instruction driven cross-layer CNN accelerator with winograd transformation on FPGA.
Proceedings of the International Conference on Field Programmable Technology, 2017

ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA.
Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017

ForeGraph: Exploring Large-scale Graph Processing on Multi-FPGA Architecture.
Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017

An FPGA Design Framework for CNN Sparsification and Acceleration.
Proceedings of the 25th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2017

Fault-Tolerant Training with On-Line Fault Detection for RRAM-Based Neural Computing Systems.
Proceedings of the 54th Annual Design Automation Conference, 2017

TIME: A Training-in-memory Architecture for Memristor-based Deep Neural Networks.
Proceedings of the 54th Annual Design Automation Conference, 2017

Exploring the Granularity of Sparsity in Convolutional Neural Networks.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017

Binary convolutional neural network on RRAM.
Proceedings of the 22nd Asia and South Pacific Design Automation Conference, 2017

Computation-oriented fault-tolerance schemes for RRAM computing systems.
Proceedings of the 22nd Asia and South Pacific Design Automation Conference, 2017

2016
Guest Editorial: Design and Applications of Neuromorphic Computing System.
IEEE Trans. Multi Scale Comput. Syst., 2016

Harmonica: A Framework of Heterogeneous Computing Systems With Memristor-Based Neuromorphic Computing Accelerators.
IEEE Trans. Circuits Syst. I Regul. Pap., 2016

A Unified Methodology for Designing Hardware Random Number Generators Based on Any Probability Distribution.
IEEE Trans. Circuits Syst. II Express Briefs, 2016

Solar Power Prediction Assisted Intra-task Scheduling for Nonvolatile Sensor Nodes.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2016

Modeling Random Telegraph Noise as a Randomness Source and its Application in True Random Number Generation.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2016

All Spin Artificial Neural Networks Based on Compound Spintronic Synapse and Neuron.
IEEE Trans. Biomed. Circuits Syst., 2016

Technological Exploration of RRAM Crossbar Array for Matrix-Vector Multiplication.
J. Comput. Sci. Technol., 2016

Editorial: Special Issue on The 14th International Conference on Computer-Aided Design and Computer Graphics (CAD/Graphics 2015).
Integr., 2016

Leveraging Stochastic Memristor Devices in Neuromorphic Hardware Systems.
IEEE J. Emerg. Sel. Topics Circuits Syst., 2016

Exploring the Precision Limitation for RRAM-Based Analog Approximate Computing.
IEEE Des. Test, 2016

ESE: Efficient Speech Recognition Engine with Compressed LSTM on FPGA.
CoRR, 2016

Global and regional cortical connectivity maturation index (CCMI) of developmental human brain with quantification of short-range association tracts.
Proceedings of the Medical Imaging 2016: Biomedical Applications in Molecular, Structural, and Functional Imaging, San Diego, California, United States, 27 February, 2016

Angel-Eye: A Complete Design Flow for Mapping CNN onto Customized Hardware.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2016

4.7 A 65nm ReRAM-enabled nonvolatile processor with 6× reduction in restore time and 4× higher clock frequency using adaptive data retention and self-write-termination nonvolatile logic.
Proceedings of the 2016 IEEE International Solid-State Circuits Conference, 2016

Low power Convolutional Neural Networks on a chip.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2016

Heterogeneous systems with reconfigurable neuromorphic computing accelerators.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2016

PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory.
Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

NXgraph: An efficient graph processing system on a single machine.
Proceedings of the 32nd IEEE International Conference on Data Engineering, 2016

A data locality-aware design framework for reconfigurable sparse matrix-vector multiplication kernel.
Proceedings of the 35th International Conference on Computer-Aided Design, 2016

From model to FPGA: Software-hardware co-design for efficient neural network acceleration.
Proceedings of the 2016 IEEE Hot Chips 28 Symposium (HCS), 2016

Approximate Frequent Itemset Mining for streaming data on FPGA.
Proceedings of the 26th International Conference on Field Programmable Logic and Applications, 2016

SRI-SURF: A better SURF powered by scaled-RAM interpolator on FPGA.
Proceedings of the 26th International Conference on Field Programmable Logic and Applications, 2016

Going Deeper with Embedded FPGA Platform for Convolutional Neural Network.
Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2016

FPGP: Graph Processing Framework on FPGA: A Case Study of Breadth-First Search.
Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2016

Real-Time Pedestrian Detection and Tracking on Customized Hardware.
Proceedings of the 14th ACM/IEEE Symposium on Embedded Systems for Real-Time Multimedia, 2016

MNSIM: Simulation platform for memristor-based neuromorphic computing system.
Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition, 2016

Sparsity-oriented sparse solver design for circuit simulation.
Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition, 2016

Switched by input: power efficient structure for RRAM-based convolutional neural network.
Proceedings of the 53rd Annual Design Automation Conference, 2016

RRAM based learning acceleration.
Proceedings of the 2016 International Conference on Compilers, 2016

Performance-centric register file design for GPUs using racetrack memory.
Proceedings of the 21st Asia and South Pacific Design Automation Conference, 2016

2015
HS3-DPG: Hierarchical Simulation for 3-D P/G Network.
IEEE Trans. Very Large Scale Integr. Syst., 2015

Whitespace-Aware TSV Arrangement in 3-D Clock Tree Synthesis.
IEEE Trans. Very Large Scale Integr. Syst., 2015

GPU-Accelerated Sparse LU Factorization for Circuit Simulation with Performance Modeling.
IEEE Trans. Parallel Distributed Syst., 2015

Real-Time High-Quality Stereo Vision System in FPGA.
IEEE Trans. Circuits Syst. Video Technol., 2015

RRAM-Based Analog Approximate Computing.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2015

FASTrust: Feature analysis for third-party IP trust verification.
Proceedings of the 2015 IEEE International Test Conference, 2015

Leveraging emerging nonvolatile memory in high-level synthesis with loop transformations.
Proceedings of the IEEE/ACM International Symposium on Low Power Electronics and Design, 2015

Energy-efficient neuromorphic computation based on compound spin synapse with stochastic learning.
Proceedings of the 2015 IEEE International Symposium on Circuits and Systems, 2015

Hi-fi playback: tolerating position errors in shift operations of racetrack memory.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

Rebooting Computing and Low-Power Image Recognition Challenge.
Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 2015

Coordinated static and dynamic cache bypassing for GPUs.
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

Energy Efficient RRAM Spiking Neural Network for Real Time Classification.
Proceedings of the 25th edition on Great Lakes Symposium on VLSI, GLVLSI 2015, Pittsburgh, PA, USA, May 20, 2015

A self-aware data compression system on FPGA in Hadoop.
Proceedings of the 2015 International Conference on Field Programmable Technology, 2015

An FPGA-based real-time simultaneous localization and mapping system.
Proceedings of the 2015 International Conference on Field Programmable Technology, 2015

Significant papers from the first 25 years of the FPL conference.
Proceedings of the 25th International Conference on Field Programmable Logic and Applications, 2015

EURECA: On-Chip Configuration Generation for Effective Dynamic Data Access.
Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2015

FPGA Acceleration of Recurrent Neural Network Based Language Model.
Proceedings of the 23rd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2015

Spiking neural network with RRAM: can we use it for real-world application?
Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015

A fast parallel sparse solver for SPICE-based circuit simulators.
Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015

RENO: a high-efficient reconfigurable neuromorphic computing accelerator design.
Proceedings of the 52nd Annual Design Automation Conference, 2015

Merging the interface: power, area and accuracy co-optimization for RRAM crossbar-based mixed-signal computing system.
Proceedings of the 52nd Annual Design Automation Conference, 2015

A STT-RAM-based low-power hybrid register file for GPGPUs.
Proceedings of the 52nd Annual Design Automation Conference, 2015

An accurate and low-cost PM2.5 estimation method based on Artificial Neural Network.
Proceedings of the 20th Asia and South Pacific Design Automation Conference, 2015

Modeling and optimization of low power resonant clock mesh.
Proceedings of the 20th Asia and South Pacific Design Automation Conference, 2015

Technological exploration of RRAM crossbar array for matrix-vector multiplication.
Proceedings of the 20th Asia and South Pacific Design Automation Conference, 2015

2014
Hardware Acceleration for an Accurate Stereo Vision System Using Mini-Census Adaptive Support Region.
ACM Trans. Embed. Comput. Syst., 2014

PS3-RAM: A Fast Portable and Scalable Statistical STT-RAM Reliability/Energy Analysis Method.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2014

On-Chip Hybrid Power Supply System for Wireless Sensor Nodes.
ACM J. Emerg. Technol. Comput. Syst., 2014

Exploration of Electrical and Novel Optical Chip-to-Chip Interconnects.
IEEE Des. Test, 2014

Efficient region-aware P/G TSV planning for 3D ICs.
Proceedings of the Fifteenth International Symposium on Quality Electronic Design, 2014

Energy efficient spiking neural network design with RRAM devices.
Proceedings of the 2014 International Symposium on Integrated Circuits (ISIC), 2014

Large scale recurrent neural network on GPU.
Proceedings of the 2014 International Joint Conference on Neural Networks, 2014

A universal FPGA-based floating-point matrix processor for mobile systems.
Proceedings of the 2014 International Conference on Field-Programmable Technology, 2014

Online scheduling for FPGA computation in the Cloud.
Proceedings of the 2014 International Conference on Field-Programmable Technology, 2014

Accelerating frequent item counting with FPGA.
Proceedings of the 2014 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2014

ICE: Inline calibration for memristor crossbar-based computing engine.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2014

Energy efficient neural networks for big data analytics.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2014

Design Methodologies for 3D Mixed Signal Integrated Circuits: a Practical 12-bit SAR ADC Design Case.
Proceedings of the 51st Annual Design Automation Conference 2014, 2014

Run-Time Technique for Simultaneous Aging and Power Optimization in GPGPUs.
Proceedings of the 51st Annual Design Automation Conference 2014, 2014

Enabling FPGAs in the cloud.
Proceedings of the Computing Frontiers Conference, CF'14, 2014

Training itself: Mixed-signal training acceleration for memristor-based neural network.
Proceedings of the 19th Asia and South Pacific Design Automation Conference, 2014

The stochastic modeling of TiO2 memristor and its usage in neuromorphic system design.
Proceedings of the 19th Asia and South Pacific Design Automation Conference, 2014

Statistical analysis of random telegraph noise in digital circuits.
Proceedings of the 19th Asia and South Pacific Design Automation Conference, 2014

2013
On-Chip Sensor Network for Efficient Management of Power Gating-Induced Power/Ground Noise in Multiprocessor System on Chip.
IEEE Trans. Parallel Distributed Syst., 2013

NICSLU: An Adaptive Sparse Matrix Solver for Parallel Circuit Simulation.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2013

Unification of PR Region floorplanning and Fine-Grained Placement for Dynamic Partially Reconfigurable FPGAs.
J. Circuits Syst. Comput., 2013

Evaluation and mitigation of performance degradation under random telegraph noise for digital circuits.
IET Circuits Devices Syst., 2013

Assessment of Circuit Optimization Techniques Under NBTI.
IEEE Des. Test, 2013

Nonzero pattern analysis and memory access optimization in GPU-based sparse LU factorization for circuit simulation.
Proceedings of the 3rd Workshop on Irregular Applications - Architectures and Algorithms, 2013

RALP: Reconvergence-aware layer partitioning for 3D FPGAs.
Proceedings of the 2012 International Conference on Reconfigurable Computing and FPGAs, 2013

Whitespace-aware TSV arrangement in 3D clock tree synthesis.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2013

TSV-aware topology generation for 3D Clock Tree Synthesis.
Proceedings of the International Symposium on Quality Electronic Design, 2013

Memristor-based approximated computation.
Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 2013

ADAMS: asymmetric differential STT-RAM cell structure for reliable and high-performance applications.
Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 2013

DTW-Based Subsequence Similarity Search on AMD Heterogeneous Computing Platform.
Proceedings of the 10th IEEE International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing, 2013

Dynamic Stencil: Effective exploitation of run-time resources in reconfigurable clusters.
Proceedings of the 2013 International Conference on Field-Programmable Technology, 2013

Accelerating subsequence similarity search based on dynamic time warping distance with FPGA.
Proceedings of the 2013 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2013

HS3DPG: Hierarchical simulation for 3D P/G network.
Proceedings of the 18th Asia and South Pacific Design Automation Conference, 2013

Data dependency aware prefetch scheduling for Dynamic Partial reconfigurable designs.
Proceedings of the IEEE 10th International Conference on ASIC, 2013

2012
Variation-Aware Supply Voltage Assignment for Simultaneous Power and Aging Optimization.
IEEE Trans. Very Large Scale Integr. Syst., 2012

Parametric Yield-Driven Resource Binding in High-Level Synthesis with Multi-Vth/Vdd Library and Device Sizing.
J. Electr. Comput. Eng., 2012

Temporal Performance Degradation under RTN: Evaluation and Mitigation for Nanoscale Circuits.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2012

Improving energy efficiency of write-asymmetric memories by log style write.
Proceedings of the International Symposium on Low Power Electronics and Design, 2012

Probabilistic Brain Fiber Tractography on GPUs.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

Parallel Circuit Simulation on Multi/Many-core Systems.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

FPGA based memory efficient high resolution stereo vision system for video tolling.
Proceedings of the 2012 International Conference on Field-Programmable Technology, 2012

Parallel FPGA-based all pairs shortest paths for sparse networks: A human brain connectome case study.
Proceedings of the 22nd International Conference on Field Programmable Logic and Applications (FPL), 2012

Pub/Sub on stream: a multi-core based message broker with QoS support.
Proceedings of the Sixth ACM International Conference on Distributed Event-Based Systems, 2012

PS3-RAM: a fast portable and scalable statistical STT-RAM reliability analysis method.
Proceedings of the 49th Annual Design Automation Conference 2012, 2012

Sparse LU factorization for parallel circuit simulation on GPU.
Proceedings of the 49th Annual Design Automation Conference 2012, 2012

Yield-aware time-efficient testing and self-fixing design for TSV-based 3D ICs.
Proceedings of the 17th Asia and South Pacific Design Automation Conference, 2012

Thermal-aware power network design for IR drop reduction in 3D ICs.
Proceedings of the 17th Asia and South Pacific Design Automation Conference, 2012

An adaptive LU factorization algorithm for parallel circuit simulation.
Proceedings of the 17th Asia and South Pacific Design Automation Conference, 2012

A Reconfigurable Computing Approach for Efficient and Scalable Parallel Graph Exploration.
Proceedings of the 23rd IEEE International Conference on Application-Specific Systems, 2012

PDPR: Fine-Grained Placement for Dynamic Partially Reconfigurable FPGAs.
Proceedings of the Reconfigurable Computing: Architectures, Tools and Applications, 2012

2011
Power Gating Aware Task Scheduling in MPSoC.
IEEE Trans. Very Large Scale Integr. Syst., 2011

Leakage Power and Circuit Aging Cooptimization by Gate Replacement Techniques.
IEEE Trans. Very Large Scale Integr. Syst., 2011

An FPGA-based accelerator for LambdaRank in Web search engines.
ACM Trans. Reconfigurable Technol. Syst., 2011

Temperature-Aware NBTI Modeling and the Impact of Standby Leakage Reduction Techniques on Circuit Performance Degradation.
IEEE Trans. Dependable Secur. Comput., 2011

An EScheduler-Based Data Dependence Analysis and Task Scheduling for Parallel Circuit Simulation.
IEEE Trans. Circuits Syst. II Express Briefs, 2011

Leakage-Aware TSV-Planning with Power-Temperature-Delay Dependence in 3D ICs.
IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2011

A Hardware-Software Collaborated Method for Soft-Error Tolerant MPSoC.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2011

Circuit-level delay modeling considering both TDDB and NBTI.
Proceedings of the 12th International Symposium on Quality Electronic Design, 2011

A heterogeneous accelerator platform for multi-subject voxel-based brain network analysis.
Proceedings of the 2011 IEEE/ACM International Conference on Computer-Aided Design, 2011

Gemma in April: A matrix-like parallel programming architecture on OpenCL.
Proceedings of the Design, Automation and Test in Europe, 2011

Tree-Based Partitioning Approach for Network-on-Chip Synthesis.
Proceedings of the 12th International Conference on Computer-Aided Design and Computer Graphics, 2011

Network flow-based simultaneous retiming and slack budgeting for low power design.
Proceedings of the 16th Asia South Pacific Design Automation Conference, 2011

Rethinking thermal via planning with timing-power-temperature dependence for 3D ICs.
Proceedings of the 16th Asia South Pacific Design Automation Conference, 2011

On-chip hybrid power supply system for wireless sensor nodes.
Proceedings of the 16th Asia South Pacific Design Automation Conference, 2011

Incremental layout optimization for NoC designs based on MILP formulation.
Proceedings of the 2011 IEEE 9th International Conference on ASIC, 2011

FPGA Accelerated Parallel Sparse Matrix Factorization for Circuit Simulations.
Proceedings of the Reconfigurable Computing: Architectures, Tools and Applications, 2011

2010
Output remapping technique for critical paths soft-error rate reduction.
IET Comput. Digit. Tech., 2010

Fast-locking all-digital phase-locked loop with digitally controlled oscillator tuning word estimating and presetting.
IET Circuits Devices Syst., 2010

FPGA and GPU implementation of large scale SpMV.
Proceedings of the IEEE 8th Symposium on Application Specific Processors, 2010

Efficient PageRank and SpMV Computation on AMD GPUs.
Proceedings of the 39th International Conference on Parallel Processing, 2010

Making Human Connectome Faster: GPU Acceleration of Brain Network Analysis.
Proceedings of the 16th IEEE International Conference on Parallel and Distributed Systems, 2010

LambdaRank acceleration for relevance ranking in web search engines (abstract only).
Proceedings of the ACM/SIGDA 18th International Symposium on Field Programmable Gate Arrays, 2010

FPMR: MapReduce framework on FPGA.
Proceedings of the ACM/SIGDA 18th International Symposium on Field Programmable Gate Arrays, 2010

Simultaneous slack budgeting and retiming for synchronous circuits optimization.
Proceedings of the 15th Asia South Pacific Design Automation Conference, 2010

PS-FPG: pattern selection based co-design of floorplan and power/ground network with wiring resource optimization.
Proceedings of the 15th Asia South Pacific Design Automation Conference, 2010

Three-dimensional integrated circuits (3D IC) floorplan and power/ground network co-synthesis.
Proceedings of the 15th Asia South Pacific Design Automation Conference, 2010

Parametric yield driven resource binding in behavioral synthesis with multi-Vth/Vdd library.
Proceedings of the 15th Asia South Pacific Design Automation Conference, 2010

Minimizing leakage power in aging-bounded high-level synthesis with design time multi-Vth assignment.
Proceedings of the 15th Asia South Pacific Design Automation Conference, 2010

2009
Leakage Power Reduction through Dual Vth Assignment Considering Threshold Voltage Variation.
J. Circuits Syst. Comput., 2009

New-Age: A Negative Bias Temperature Instability-Estimation Framework for Microarchitectural Components.
Int. J. Parallel Program., 2009

Thermal-Aware Incremental Floorplanning for 3D ICs Based on MILP Formulation.
IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2009

Temperature-Aware NBTI Modeling Techniques in Digital Circuits.
IEICE Trans. Electron., 2009

On-line MPSoC Scheduling Considering Power Gating Induced Power/Ground Noise.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2009

Modern Floorplanning with Boundary Clustering Constraint.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2009

On the efficacy of input Vector Control to mitigate NBTI effects and leakage power.
Proceedings of the 10th International Symposium on Quality of Electronic Design (ISQED 2009), 2009

NBTI-aware statistical circuit delay assessment.
Proceedings of the 10th International Symposium on Quality of Electronic Design (ISQED 2009), 2009

Variation-aware supply voltage assignment for minimizing circuit degradation and leakage.
Proceedings of the 2009 International Symposium on Low Power Electronics and Design, 2009

RankBoost Acceleration on both NVIDIA CUDA and ATI Stream Platforms.
Proceedings of the 15th IEEE International Conference on Parallel and Distributed Systems, 2009

Multi-objective Floorplanning Based on Fuzzy Logic.
Proceedings of the Sixth International Conference on Fuzzy Systems and Knowledge Discovery, 2009

FPGA-based acceleration of neural network for ranking in web search engine with a streaming architecture.
Proceedings of the 19th International Conference on Field Programmable Logic and Applications, 2009

Gate replacement techniques for simultaneous leakage and aging optimization.
Proceedings of the Design, Automation and Test in Europe, 2009

An efficient technique for analysis of minimal buffer requirements of synchronous dataflow graphs with model checking.
Proceedings of the 7th International Conference on Hardware/Software Codesign and System Synthesis, 2009

A case study of on-chip sensor network in multiprocessor system-on-chip.
Proceedings of the 2009 International Conference on Compilers, 2009

A framework for estimating NBTI degradation of microarchitectural components.
Proceedings of the 14th Asia South Pacific Design Automation Conference, 2009

2008
Two-Phase Fine-Grain Sleep Transistor Insertion Technique in Leakage Critical Circuits.
IEEE Trans. Very Large Scale Integr. Syst., 2008

Output Remapping Technique for Soft-Error Rate Reduction in Critical Paths.
Proceedings of the 9th International Symposium on Quality of Electronic Design (ISQED 2008), 2008

A capacitive boosted buffer technique for high-speed process-variation-tolerant interconnect in UDVS application.
Proceedings of the 13th Asia South Pacific Design Automation Conference, 2008

Dynamic TDM virtual circuit implementation for NoC.
Proceedings of the IEEE Asia Pacific Conference on Circuits and Systems, 2008

2007
A Novel Gate-Level NBTI Delay Degradation Model with Stacking Effect.
Proceedings of the Integrated Circuit and System Design. Power and Timing Modeling, 2007

Modeling of PMOS NBTI Effect Considering Temperature Variation.
Proceedings of the 8th International Symposium on Quality of Electronic Design (ISQED 2007), 2007

A power gating scheme for ground bounce reduction during mode transition.
Proceedings of the 25th International Conference on Computer Design, 2007

Temperature-aware NBTI modeling and the impact of input vector control on performance degradation.
Proceedings of the 2007 Design, Automation and Test in Europe Conference and Exposition, 2007

2006
Signal-Path-Level Dual-VT Assignment for Leakage Power Reduction.
J. Circuits Syst. Comput., 2006

IR-drop Reduction Through Combinational Circuit Partitioning.
Proceedings of the Integrated Circuit and System Design. Power and Timing Modeling, 2006

Simultaneous Fine-grain Sleep Transistor Placement and Sizing for Leakage Optimization.
Proceedings of the 7th International Symposium on Quality of Electronic Design (ISQED 2006), 2006

Two-phase fine-grain sleep transistor insertion technique in leakage critical circuits.
Proceedings of the 2006 International Symposium on Low Power Electronics and Design, 2006

Genetic Algorithm Based Fine-Grain Sleep Transistor Insertion Technique for Leakage Optimization.
Proceedings of the Advances in Natural Computation, Second International Conference, 2006

Fine-grain Sleep Transistor Placement Considering Leakage Feedback Gate.
Proceedings of the IEEE Asia Pacific Conference on Circuits and Systems 2006, 2006

A New Thermal-Conscious System-Level Methodology for Energy-Efficient Processor Voltage Selection.
Proceedings of the IEEE Asia Pacific Conference on Circuits and Systems 2006, 2006

