Weng-Fai Wong
ORCID: 0000-0002-4281-2053
According to our database, Weng-Fai Wong authored at least 186 papers between 1989 and 2024.
Bibliography
2024
1.63 pJ/SOP Neuromorphic Processor With Integrated Partial Sum Routers for In-Network Computing.
IEEE Trans. Very Large Scale Integr. Syst., November, 2024
IEEE Trans. Neural Networks Learn. Syst., November, 2024
Optimizing the Number of Clusters for Billion-Scale Quantization-Based Nearest Neighbor Search.
IEEE Trans. Knowl. Data Eng., November, 2024
Enabling Energy-Efficient Deployment of Large Language Models on Memristor Crossbar: A Synergy of Large and Small.
CoRR, 2024
CoRR, 2024
CoRR, 2024
Integrating Deep Learning and Synthetic Biology: A Co-Design Approach for Enhancing Gene Expression via N-terminal Coding Sequences.
CoRR, 2024
Proceedings of the International Joint Conference on Neural Networks, 2024
Proceedings of the 53rd International Conference on Parallel Processing, 2024
Table-Lookup MAC: Scalable Processing of Quantised Neural Networks in FPGA Soft Logic.
Proceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2024
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2024
2023
Proc. ACM Manag. Data, December, 2023
Desire backpropagation: A lightweight training algorithm for multi-layer spiking neural networks based on spike-timing-dependent plasticity.
Neurocomputing, December, 2023
IEEE Trans. Parallel Distributed Syst., October, 2023
IEEE Trans. Computers, October, 2023
CQ+ Training: Minimizing Accuracy Loss in Conversion From Convolutional Neural Networks to Spiking Neural Networks.
IEEE Trans. Pattern Anal. Mach. Intell., October, 2023
Commun. ACM, July, 2023
IEEE Trans. Computers, June, 2023
HongTu: Scalable Full-Graph GNN Training on Multiple GPUs (via communication-optimized CPU data offloading).
CoRR, 2023
HyperSNN: A new efficient and robust deep learning model for resource constrained control applications.
CoRR, 2023
Proceedings of the Machine Learning and Knowledge Discovery in Databases: Research Track, 2023
1.7pJ/SOP Neuromorphic Processor with Integrated Partial Sum Routers for In-Network Computing.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2023
OpenEmbedding: A Distributed Parameter Server for Deep Learning Recommendation Models using Persistent Memory.
Proceedings of the 39th IEEE International Conference on Data Engineering, 2023
Proceedings of the Next Generation Arithmetic - 4th International Conference, 2023
Proceedings of the Next Generation Arithmetic - 4th International Conference, 2023
2022
ACM Trans. Reconfigurable Technol. Syst., 2022
Tensorox: Accelerating GPU Applications via Neural Approximation on Unused Tensor Cores.
IEEE Trans. Parallel Distributed Syst., 2022
NC-Net: Efficient Neuromorphic Computing Using Aggregated Subnets on a Crossbar-Based Architecture With Nonvolatile Memory.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022
Corrigendum to "Coreset: Hierarchical neuromorphic computing supporting large-scale neural networks with improved resource efficiency" [Neurocomputing (2022) 128-140].
Neurocomputing, 2022
Coreset: Hierarchical neuromorphic computing supporting large-scale neural networks with improved resource efficiency.
Neurocomputing, 2022
Low Latency Conversion of Artificial Neural Network Models to Rate-encoded Spiking Neural Networks.
CoRR, 2022
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022
Proceedings of the 19th International SoC Design Conference, 2022
REACT: a heterogeneous reconfigurable neural network accelerator with software-configurable NoCs for training and inference on wearables.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022
Proceedings of the Next Generation Arithmetic - Third International Conference, 2022
2021
IEEE ACM Trans. Comput. Biol. Bioinform., 2021
ACM Trans. Archit. Code Optim., 2021
OBET: On-the-Fly Byte-Level Error Tracking for Correcting and Detecting Faults in Unreliable DRAM Systems.
Sensors, 2021
Optimizing An In-memory Database System For AI-powered On-line Decision Augmentation Using Persistent Memory.
Proc. VLDB Endow., 2021
DTNN: Energy-efficient Inference with Dendrite Tree Inspired Neural Networks for Edge Vision Applications.
CoRR, 2021
Biomed. Signal Process. Control., 2021
IEEE Access, 2021
Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021
DeepFire: Acceleration of Convolutional Spiking Neural Network on Modern Field Programmable Gate Arrays.
Proceedings of the 31st International Conference on Field-Programmable Logic and Applications, 2021
Proceedings of the FPGA '21: The 2021 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Virtual Event, USA, February 28, 2021
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2021
Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021
2020
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020
IEEE Trans. Computers, 2020
A future intelligent traffic system with mixed autonomous vehicles and human-driven vehicles.
Inf. Sci., 2020
Proceedings of the International Conference on Neuromorphic Systems, 2020
Shenjing: A low power reconfigurable neuromorphic accelerator with partial-sum and spike networks-on-chip.
Proceedings of the 2020 Design, Automation & Test in Europe Conference & Exhibition, 2020
Proceedings of the 10th Conference on Innovative Data Systems Research, 2020
2019
IEEE Trans. Cloud Comput., 2019
IEEE Trans. Big Data, 2019
ACM Trans. Archit. Code Optim., 2019
Proceedings of the 20th ACM SIGPLAN/SIGBED International Conference on Languages, 2019
Proceedings of the 29th International Conference on Field Programmable Logic and Applications, 2019
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2019
Resource Efficient Personalized ECG Beat Classification via Temporal Logic Synthesis.
Proceedings of the 19th IEEE International Conference on Bioinformatics and Bioengineering, 2019
Proceedings of the Approximate Circuits, Methodologies and CAD, 2019
2018
Proceedings of the 25th IEEE International Conference on High Performance Computing, 2018
Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018
2017
Proceedings of the 33rd IEEE International Conference on Data Engineering, 2017
Proceedings of the 2017 IEEE High Performance Extreme Computing Conference, 2017
Proceedings of the Computational Methods in Systems Biology, 2017
Proceedings of the 22nd Asia and South Pacific Design Automation Conference, 2017
2016
IEEE Trans. Knowl. Data Eng., 2016
TreeFTL: An Efficient Workload-Adaptive Algorithm for RAM Buffer Management of NAND Flash-Based Devices.
IEEE Trans. Computers, 2016
2015
A Family of Bit-Representation-Optimized Formats for Fast Sparse Matrix-Vector Multiplication on the GPU.
IEEE Trans. Parallel Distributed Syst., 2015
A Code Generation Framework for Targeting Optimized Library Calls for Multiple Platforms.
IEEE Trans. Parallel Distributed Syst., 2015
In-memory Databases: Challenges and Opportunities From Software and Hardware Perspectives.
SIGMOD Rec., 2015
IEICE Electron. Express, 2015
DGCC: A New Dependency Graph based Concurrency Control Protocol for Multicore Database Systems.
CoRR, 2015
Proceedings of the 31st IEEE International Conference on Data Engineering, 2015
Proceedings of the Hybrid Systems Biology - Fourth International Workshop, 2015
Proceedings of the 2015 International Conference on Compilers, 2015
2014
IEEE Trans. Very Large Scale Integr. Syst., 2014
IEEE Trans. Parallel Distributed Syst., 2014
Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications, 2014
Proceedings of the SIGPLAN/SIGBED Conference on Languages, 2014
Proceedings of the 32nd IEEE International Conference on Computer Design, 2014
Proceedings of the 2014 International Conference on Compilers, 2014
A coherent hybrid SRAM and STT-RAM L1 cache architecture for shared memory multicores.
Proceedings of the 19th Asia and South Pacific Design Automation Conference, 2014
2013
GPU code generation for ODE-based applications with phased shared-data access patterns.
ACM Trans. Archit. Code Optim., 2013
On-chip caches built on multilevel spin-transfer torque RAM cells and its optimizations.
ACM J. Emerg. Technol. Comput. Syst., 2013
Accelerating sparse matrix-vector multiplication on GPUs using bit-representation-optimized schemes.
Proceedings of the International Conference for High Performance Computing, 2013
Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 2013
Optimizing and Auto-Tuning Iterative Stencil Loops for GPUs with the In-Plane Method.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013
TreeFTL: efficient RAM management for high performance of NAND flash-based storage systems.
Proceedings of the Design, Automation and Test in Europe, 2013
Proceedings of the 50th Annual Design Automation Conference 2013, 2013
2012
Proceedings of the 2012 SC Companion: High Performance Computing, 2012
Proceedings of the 2012 SC Companion: High Performance Computing, 2012
Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2012
ADAPT: Efficient workload-sensitive flash management based on adaptation, prediction and aggregation.
Proceedings of the IEEE 28th Symposium on Mass Storage Systems and Technologies, 2012
Proceedings of the 18th IEEE International Conference on Parallel and Distributed Systems, 2012
Proceedings of the 2012 International Conference on Field-Programmable Technology, 2012
Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012
Proceedings of the 2012 Design, Automation & Test in Europe Conference & Exhibition, 2012
Proceedings of the 49th Annual Design Automation Conference 2012, 2012
2011
Internet-based hardware/software co-design framework for embedded 3D graphics applications.
EURASIP J. Adv. Signal Process., 2011
Proceedings of the 7th International Conference on Virtual Execution Environments, 2011
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011
Proceedings of the 2011 International Symposium on Low Power Electronics and Design, 2011
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011
Proceedings of the ACM/SIGDA 19th International Symposium on Field Programmable Gate Arrays, 2011
A UML 2-based hardware-software co-design framework for body sensor network applications.
Proceedings of the Design, Automation and Test in Europe, 2011
2010
ACM Trans. Archit. Code Optim., 2010
Proceedings of the 18th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2010
2009
ACM Trans. Archit. Code Optim., 2009
Proceedings of the 22nd ACM Symposium on Operating Systems Principles 2009, 2009
The salvage cache: A fault-tolerant cache architecture for next-generation memory technologies.
Proceedings of the 27th International Conference on Computer Design, 2009
Optimal Placement-aware Trace-Based Scheduling of Hardware Reconfigurations for FPGA Accelerators.
Proceedings of the FCCM 2009, 2009
Proceedings of the 46th Design Automation Conference, 2009
Proceedings of the 46th Design Automation Conference, 2009
Proceedings of the Sixth International Workshop on Wearable and Implantable Body Sensor Networks, 2009
Proceedings of the 14th Asia South Pacific Design Automation Conference, 2009
2008
Softw. Pract. Exp., 2008
Defining neighborhood relations for fast spatial-temporal partitioning of applications on reconfigurable architectures.
Proceedings of the 2008 International Conference on Field-Programmable Technology, 2008
Proceedings of the Sixth International Symposium on Code Generation and Optimization (CGO 2008), 2008
Proceedings of the Compiler Construction, 17th International Conference, 2008
2007
J. VLSI Signal Process., 2007
Proceedings of the 28th IEEE Real-Time Systems Symposium (RTSS 2007), 2007
Proceedings of the 25th International Conference on Computer Design, 2007
DRIM: a low power dynamically reconfigurable instruction memory hierarchy for embedded systems.
Proceedings of the 2007 Design, Automation and Test in Europe Conference and Exposition, 2007
Proceedings of the Fifth International Symposium on Code Generation and Optimization (CGO 2007), 2007
Proceedings of the 21st International Conference on Advanced Information Networking and Applications (AINA 2007), 2007
2006
Proceedings of the 2006 IEEE International Conference on Field Programmable Technology, 2006
Proceedings of the Emerging Directions in Embedded and Ubiquitous Computing, 2006
Proceedings of the 15th International Conference on Parallel Architectures and Compilation Techniques (PACT 2006), 2006
2005
SIGARCH Comput. Archit. News, 2005
Using UML 2.0 for System Level Design of Real Time SoC Platforms for Stream Processing.
Proceedings of the 11th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA 2005), 2005
Proceedings of the 30th Annual IEEE Conference on Local Computer Networks (LCN 2005), 2005
Proceedings of the High Performance Computing, 2005
Proceedings of the 2005 International Conference on Field Programmable Logic and Applications (FPL), 2005
Proceedings of the 2005 Conference on Asia South Pacific Design Automation, 2005
Proceedings of the 2005 Conference on Asia South Pacific Design Automation, 2005
Proceedings of the Advances in Computer Systems Architecture, 10th Asia-Pacific Conference, 2005
Proceedings of the Fifth International Conference on Computer and Information Technology (CIT 2005), 2005
2004
Proceedings of the Web Content Caching and Distribution: 9th International Workshop, 2004
Proceedings of the 25th IEEE Real-Time Systems Symposium (RTSS 2004), 2004
Adaptive Compiler Directed Prefetching for EPIC Processors.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 2004
Proceedings of the 2004 International Conference on Computer-Aided Design, 2004
Proceedings of the 2004 IEEE International Conference on Field-Programmable Technology, 2004
Proceedings of the 2nd IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2004
Proceedings of the 2nd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2004), 2004
Proceedings of the 11th International Conference on Architectural Support for Programming Languages and Operating Systems, 2004
2003
Parallel Comput., 2003
Proceedings of the 2003 IEEE International Conference on Field-Programmable Technology, 2003
Proceedings of the Field Programmable Logic and Application, 13th International Conference, 2003
Proceedings of the 3rd IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2003), 2003
2002
Proceedings of the 20th International Conference on Computer Design (ICCD 2002), 2002
Proceedings of the 2002 IEEE International Conference on Field-Programmable Technology, 2002
Proceedings of the 2002 IEEE International Conference on Field-Programmable Technology, 2002
Proceedings of the 2002 IEEE International Conference on Cluster Computing (CLUSTER 2002), 2002
Proceedings of the 2002 IEEE International Conference on Cluster Computing (CLUSTER 2002), 2002
2001
Proceedings of the Embedded Software, First International Workshop, 2001
Proceedings of the 2001 International Conference on Compilers, 2001
2000
J. Syst. Archit., 2000
Proceedings of the Seventh International Conference on Parallel and Distributed Systems, 2000
SilkRoad: A Multithreaded Runtime System with Software Distributed Shared Memory for SMP Clusters.
Proceedings of the 2000 IEEE International Conference on Cluster Computing (CLUSTER 2000), November 28th, 2000
1999
Proceedings of the 13th International Parallel Processing Symposium / 10th Symposium on Parallel and Distributed Processing (IPPS / SPDP '99), 1999
1996
1995
IEEE Trans. Computers, 1995
Int. J. High Perform. Comput. Appl., 1995
Compiling Parallel Lisp for a Shared Memory Multiprocessor.
Proceedings of the Seventh IASTED/ISMM International Conference on Parallel and Distributed Computing and Systems, 1995
Highly Efficient Parallel Lisp Implementation on Distributed Systems.
Proceedings of the Parallel Computing: State-of-the-Art and Perspectives, 1995
Design and Implementation of Abstract Machine for Parallel Lisp Compilation.
Proceedings of the 1995 International Conference on Parallel Processing, 1995
1994
Fast Hardware-Based Algorithms for Elementary Function Computations Using Rectangular Multipliers.
IEEE Trans. Computers, 1994
A Simulation Study on the Interactions between Multithreaded Architectures and the Cache.
Int. J. High Speed Comput., 1994
Fast Evaluation of the Elementary Functions in Double Precision.
Proceedings of the 27th Annual Hawaii International Conference on System Sciences (HICSS-27), 1994
1992
Parallel Comput., 1992
1991
Int. J. High Speed Comput., 1991
1990
Microprocessing and Microprogramming, 1990
1989
Proceedings of the IEEE International Workshop on Tools for Artificial Intelligence: Architectures, 1989