Weng-Fai Wong

Orcid: 0000-0002-4281-2053

According to our database, Weng-Fai Wong authored at least 186 papers between 1989 and 2024.

Bibliography

2024
1.63 pJ/SOP Neuromorphic Processor With Integrated Partial Sum Routers for In-Network Computing.
IEEE Trans. Very Large Scale Integr. Syst., November, 2024

Optimizing for In-Memory Deep Learning With Emerging Memory Technology.
IEEE Trans. Neural Networks Learn. Syst., November, 2024

Optimizing the Number of Clusters for Billion-Scale Quantization-Based Nearest Neighbor Search.
IEEE Trans. Knowl. Data Eng., November, 2024

Enabling Energy-Efficient Deployment of Large Language Models on Memristor Crossbar: A Synergy of Large and Small.
CoRR, 2024

Sorbet: A Neuromorphic Hardware-Compatible Transformer-Based Spiking Language Model.
CoRR, 2024

Reconsidering the energy efficiency of spiking neural networks.
CoRR, 2024

SparrowSNN: A Hardware/software Co-design for Energy Efficient ECG Classification.
CoRR, 2024

Integrating Deep Learning and Synthetic Biology: A Co-Design Approach for Enhancing Gene Expression via N-terminal Coding Sequences.
CoRR, 2024

OneSpike: Ultra-low latency spiking neural networks.
Proceedings of the International Joint Conference on Neural Networks, 2024

IMI: In-memory Multi-job Inference Acceleration for Large Language Models.
Proceedings of the 53rd International Conference on Parallel Processing, 2024

Table-Lookup MAC: Scalable Processing of Quantised Neural Networks in FPGA Soft Logic.
Proceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2024

NOVA: NoC-based Vector Unit for Mapping Attention Layers on a CNN Accelerator.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2024

2023
HongTu: Scalable Full-Graph GNN Training on Multiple GPUs.
Proc. ACM Manag. Data, December, 2023

Desire backpropagation: A lightweight training algorithm for multi-layer spiking neural networks based on spike-timing-dependent plasticity.
Neurocomputing, December, 2023

Simeuro: A Hybrid CPU-GPU Parallel Simulator for Neuromorphic Computing Chips.
IEEE Trans. Parallel Distributed Syst., October, 2023

DeepFire2: A Convolutional Spiking Neural Network Accelerator on FPGAs.
IEEE Trans. Computers, October, 2023

CQ+ Training: Minimizing Accuracy Loss in Conversion From Convolutional Neural Networks to Spiking Neural Networks.
IEEE Trans. Pattern Anal. Mach. Intell., October, 2023

Achieving Green AI with Energy-Efficient Deep Learning Using Neuromorphic Computing.
Commun. ACM, July, 2023

Benchmarking Quantum(-Inspired) Annealing Hardware on Practical Use Cases.
IEEE Trans. Computers, June, 2023

LightRW: FPGA Accelerated Graph Dynamic Random Walks.
Proc. ACM Manag. Data, 2023

HongTu: Scalable Full-Graph GNN Training on Multiple GPUs (via communication-optimized CPU data offloading).
CoRR, 2023

HyperSNN: A new efficient and robust deep learning model for resource constrained control applications.
CoRR, 2023

Efficient Hyperdimensional Computing.
Proceedings of the Machine Learning and Knowledge Discovery in Databases: Research Track, 2023

1.7pJ/SOP Neuromorphic Processor with Integrated Partial Sum Routers for In-Network Computing.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2023

OpenEmbedding: A Distributed Parameter Server for Deep Learning Recommendation Models using Persistent Memory.
Proceedings of the 39th IEEE International Conference on Data Engineering, 2023

Towards a Better 16-Bit Number Representation for Training Neural Networks.
Proceedings of the Next Generation Arithmetic - 4th International Conference, 2023

Bedot: Bit Efficient Dot Product for Deep Generative Models.
Proceedings of the Next Generation Arithmetic - 4th International Conference, 2023

2022
ThunderGP: Resource-Efficient Graph Processing Framework on FPGAs with HLS.
ACM Trans. Reconfigurable Technol. Syst., 2022

Tensorox: Accelerating GPU Applications via Neural Approximation on Unused Tensor Cores.
IEEE Trans. Parallel Distributed Syst., 2022

NC-Net: Efficient Neuromorphic Computing Using Aggregated Subnets on a Crossbar-Based Architecture With Nonvolatile Memory.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022

Corrigendum to "Coreset: Hierarchical neuromorphic computing supporting large-scale neural networks with improved resource efficiency" [Neurocomputing (2022) 128-140].
Neurocomputing, 2022

Coreset: Hierarchical neuromorphic computing supporting large-scale neural networks with improved resource efficiency.
Neurocomputing, 2022

Low Latency Conversion of Artificial Neural Network Models to Rate-encoded Spiking Neural Networks.
CoRR, 2022

ReGraph: Scaling Graph Processing on HBM-enabled FPGAs with Heterogeneous Pipelines.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

Network-on-Chip-Centric Accelerator Architectures for Edge AI Computing.
Proceedings of the 19th International SoC Design Conference, 2022

REACT: a heterogeneous reconfigurable neural network accelerator with software-configurable NoCs for training and inference on wearables.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

Qtorch+: Next Generation Arithmetic for Pytorch Machine Learning.
Proceedings of the Next Generation Arithmetic - Third International Conference, 2022

2021
Synthesis of the Dynamical Properties of Feedback Loops in Bio-Pathways.
IEEE ACM Trans. Comput. Biol. Bioinform., 2021

GRAM: A Framework for Dynamically Mixing Precisions in GPU Applications.
ACM Trans. Archit. Code Optim., 2021

OBET: On-the-Fly Byte-Level Error Tracking for Correcting and Detecting Faults in Unreliable DRAM Systems.
Sensors, 2021

Optimizing An In-memory Database System For AI-powered On-line Decision Augmentation Using Persistent Memory.
Proc. VLDB Endow., 2021

DTNN: Energy-efficient Inference with Dendrite Tree Inspired Neural Networks for Edge Vision Applications.
CoRR, 2021

Energy efficient ECG classification with spiking neural network.
Biomed. Signal Process. Control., 2021

ZEM: Zero-Cycle Bit-Masking Module for Deep Learning Refresh-Less DRAM.
IEEE Access, 2021

ThundeRiNG: generating multiple independent random number sequences on FPGAs.
Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021

DeepFire: Acceleration of Convolutional Spiking Neural Network on Modern Field Programmable Gate Arrays.
Proceedings of the 31st International Conference on Field-Programmable Logic and Applications, 2021

ThunderGP: HLS-based Graph Processing Framework on FPGAs.
Proceedings of the FPGA '21: The 2021 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Virtual Event, USA, February 28, 2021

Posit Arithmetic for the Training and Deployment of Generative Adversarial Networks.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2021

Skew-Oblivious Data Routing for Data Intensive Applications on FPGAs with HLS.
Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021

Near Lossless Transfer Learning for Spiking Neural Networks.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
An FPGA-Based Hardware Emulator for Neuromorphic Chip With RRAM.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

NV-Journaling: Locality-Aware Journaling Using Byte-Addressable Non-Volatile Memory.
IEEE Trans. Computers, 2020

A future intelligent traffic system with mixed autonomous vehicles and human-driven vehicles.
Inf. Sci., 2020

NCPower: Power Modelling for NVM-based Neuromorphic Chip.
Proceedings of the International Conference on Neuromorphic Systems, 2020

Shenjing: A low power reconfigurable neuromorphic accelerator with partial-sum and spike networks-on-chip.
Proceedings of the 2020 Design, Automation & Test in Europe Conference & Exhibition, 2020

Is FPGA Useful for Hash Joins?
Proceedings of the 10th Conference on Innovative Data Systems Research, 2020

2019
Fault Tolerant Stencil Computation on Cloud-Based GPU Spot Instances.
IEEE Trans. Cloud Comput., 2019

MemepiC: Towards a Unified In-Memory Big Data Management System.
IEEE Trans. Big Data, 2019

A System-Level Simulator for RRAM-Based Neuromorphic Computing Chips.
ACM Trans. Archit. Code Optim., 2019

ApproxSymate: path sensitive program approximation using symbolic execution.
Proceedings of the 20th ACM SIGPLAN/SIGBED International Conference on Languages, 2019

On-The-Fly Parallel Data Shuffling for Graph Processing on OpenCL-Based FPGAs.
Proceedings of the 29th International Conference on Field Programmable Logic and Applications, 2019

Multi-objective Precision Optimization of Deep Neural Networks for Edge Devices.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2019

Resource Efficient Personalized ECG Beat Classification via Temporal Logic Synthesis.
Proceedings of the 19th IEEE International Conference on Bioinformatics and Bioengineering, 2019

Compilation and Other Software Techniques Enabling Approximate Computing.
Proceedings of the Approximate Circuits, Methodologies and CAD, 2019

2018
Making Strassen Matrix Multiplication Safe.
Proceedings of the 25th IEEE International Conference on High Performance Computing, 2018

Gloss: Seamless Live Reconfiguration and Reoptimization of Stream Programs.
Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018

2017
Parallelizing Skip Lists for In-Memory Multi-Core Database Systems.
Proceedings of the 33rd IEEE International Conference on Data Engineering, 2017

Exploiting half precision arithmetic in Nvidia GPUs.
Proceedings of the 2017 IEEE High Performance Extreme Computing Conference, 2017

Automated Property Synthesis of ODEs Based Bio-pathways Models.
Proceedings of the Computational Methods in Systems Biology, 2017

Efficient floating point precision tuning for approximate computing.
Proceedings of the 22nd Asia and South Pacific Design Automation Conference, 2017

2016
Exploiting Single-Threaded Model in Multi-Core In-Memory Systems.
IEEE Trans. Knowl. Data Eng., 2016

TreeFTL: An Efficient Workload-Adaptive Algorithm for RAM Buffer Management of NAND Flash-Based Devices.
IEEE Trans. Computers, 2016

PI: a Parallel in-memory skip list based Index.
CoRR, 2016

2015
A Family of Bit-Representation-Optimized Formats for Fast Sparse Matrix-Vector Multiplication on the GPU.
IEEE Trans. Parallel Distributed Syst., 2015

A Code Generation Framework for Targeting Optimized Library Calls for Multiple Platforms.
IEEE Trans. Parallel Distributed Syst., 2015

Multi-agent simulation on multiple GPUs.
Simul. Model. Pract. Theory, 2015

In-memory Databases: Challenges and Opportunities From Software and Hardware Perspectives.
SIGMOD Rec., 2015

3DFTL: a three-level demand-based translation strategy for flash device.
IEICE Electron. Express, 2015

DGCC: A New Dependency Graph based Concurrency Control Protocol for Multicore Database Systems.
CoRR, 2015

"Anti-Caching"-based elastic memory management for Big Data.
Proceedings of the 31st IEEE International Conference on Data Engineering, 2015

Parallelized Parameter Estimation of Biological Pathway Models.
Proceedings of the Hybrid Systems Biology - Fourth International Workshop, 2015

PAC: Program Analysis for Approximation-aware Compilation.
Proceedings of the 2015 International Conference on Compilers, 2015

2014
STT-RAM Cache Hierarchy With Multiretention MTJ Designs.
IEEE Trans. Very Large Scale Integr. Syst., 2014

Mapping Streaming Applications onto GPU Systems.
IEEE Trans. Parallel Distributed Syst., 2014

StreamJIT: a commensal compiler for high-performance stream programming.
Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications, 2014

ASAC: automatic sensitivity analysis for approximate computing.
Proceedings of the SIGPLAN/SIGBED Conference on Languages, 2014

Optimizing MLC-based STT-RAM caches by dynamic block size reconfiguration.
Proceedings of the 32nd IEEE International Conference on Computer Design, 2014

EnVM: Virtual memory design for new memory architectures.
Proceedings of the 2014 International Conference on Compilers, 2014

A coherent hybrid SRAM and STT-RAM L1 cache architecture for shared memory multicores.
Proceedings of the 19th Asia and South Pacific Design Automation Conference, 2014

2013
GPU code generation for ODE-based applications with phased shared-data access patterns.
ACM Trans. Archit. Code Optim., 2013

On-chip caches built on multilevel spin-transfer torque RAM cells and its optimizations.
ACM J. Emerg. Technol. Comput. Syst., 2013

Accelerating sparse matrix-vector multiplication on GPUs using bit-representation-optimized schemes.
Proceedings of the International Conference for High Performance Computing, 2013

A practical low-power memristor-based analog neural branch predictor.
Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 2013

Optimizing and Auto-Tuning Iterative Stencil Loops for GPUs with the In-Plane Method.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

TreeFTL: efficient RAM management for high performance of NAND flash-based storage systems.
Proceedings of the Design, Automation and Test in Europe, 2013

SAW: system-assisted wear leveling on the write endurance of NAND flash devices.
Proceedings of the 50th Annual Design Automation Conference 2013, 2013

2012
Approximate probabilistic analysis of biopathway dynamics.
Bioinform., 2012

Poster: Automated Mapping Streaming Applications onto GPUs.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Abstract: Mapping Streaming Applications onto GPU Systems.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Scalable framework for mapping streaming applications onto multi-GPU systems.
Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2012

ADAPT: Efficient workload-sensitive flash management based on adaptation, prediction and aggregation.
Proceedings of the IEEE 28th Symposium on Mass Storage Systems and Technologies, 2012

Automatic Refactoring of Legacy Fortran Code to the Array Slicing Notation.
Proceedings of the 18th IEEE International Conference on Parallel and Distributed Systems, 2012

Guppy: A GPU-like soft-core processor.
Proceedings of the 2012 International Conference on Field-Programmable Technology, 2012

Tulipse: A Visualization Framework for User-Guided Parallelization.
Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012

Extending the lifetime of NAND flash memory by salvaging bad blocks.
Proceedings of the 2012 Design, Automation & Test in Europe Conference & Exhibition, 2012

Observational wear leveling: an efficient algorithm for flash memory management.
Proceedings of the 49th Annual Design Automation Conference 2012, 2012

2011
Guest Editorial - BSN2010 Special Issue.
IEEE Trans. Biomed. Circuits Syst., 2011

Internet-based hardware/software co-design framework for embedded 3D graphics applications.
EURASIP J. Adv. Signal Process., 2011

Dynamic cache contention detection in multi-threaded applications.
Proceedings of the 7th International Conference on Virtual Execution Environments, 2011

Multi retention level STT-RAM cache designs with a dynamic refresh scheme.
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011

Processor caches with multi-level spin-transfer torque ram cells.
Proceedings of the 2011 International Symposium on Low Power Electronics and Design, 2011

Automated Architecture-Aware Mapping of Streaming Applications Onto GPUs.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Co-synthesis of FPGA-based application-specific floating point simd accelerators.
Proceedings of the ACM/SIGDA 19th International Symposium on Field Programmable Gate Arrays, 2011

A UML 2-based hardware-software co-design framework for body sensor network applications.
Proceedings of the Design, Automation and Test in Europe, 2011

2010
PiPA: Pipelined profiling and analysis on multicore systems.
ACM Trans. Archit. Code Optim., 2010

Interprocedural Placement-Aware Configuration Prefetching for FPGA-Based Systems.
Proceedings of the 18th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2010

2009
Tolerating process variations in large, set-associative caches: The buddy cache.
ACM Trans. Archit. Code Optim., 2009

Automatically patching errors in deployed software.
Proceedings of the 22nd ACM Symposium on Operating Systems Principles 2009, 2009

The salvage cache: A fault-tolerant cache architecture for next-generation memory technologies.
Proceedings of the 27th International Conference on Computer Design, 2009

Optimal Placement-aware Trace-Based Scheduling of Hardware Reconfigurations for FPGA Accelerators.
Proceedings of the FCCM 2009, 2009

A computing origami: folding streams in FPGAs.
Proceedings of the 46th Design Automation Conference, 2009

A DVS-based pipelined reconfigurable instruction memory.
Proceedings of the 46th Design Automation Conference, 2009

BSN Simulator: Optimizing Application Using System Level Simulation.
Proceedings of the Sixth International Workshop on Wearable and Implantable Body Sensor Networks, 2009

A UML-based approach for heterogeneous IP integration.
Proceedings of the 14th Asia South Pacific Design Automation Conference, 2009

2008
Fast, frequency-based, integrated register allocation and instruction scheduling.
Softw. Pract. Exp., 2008

Defining neighborhood relations for fast spatial-temporal partitioning of applications on reconfigurable architectures.
Proceedings of the 2008 International Conference on Field-Programmable Technology, 2008

Pipa: pipelined profiling and analysis on multi-core systems.
Proceedings of the Sixth International Symposium on Code Generation and Optimization (CGO 2008), 2008

How to Do a Million Watchpoints: Efficient Debugging Using Dynamic Instrumentation.
Proceedings of the Compiler Construction, 17th International Conference, 2008

2007
Editorial for the Special Issue on Field Programmable Technology.
J. VLSI Signal Process., 2007

A UML-Based Design Framework for Time-Triggered Applications.
Proceedings of the 28th IEEE Real-Time Systems Symposium (RTSS 2007), 2007

VOSCH: Voltage scaled cache hierarchies.
Proceedings of the 25th International Conference on Computer Design, 2007

DRIM: a low power dynamically reconfigurable instruction memory hierarchy for embedded systems.
Proceedings of the 2007 Design, Automation and Test in Europe Conference and Exposition, 2007

Ubiquitous Memory Introspection.
Proceedings of the Fifth International Symposium on Code Generation and Optimization (CGO 2007), 2007

An Inter-Core Communication Enabled Multi-Core Simulator Based on SimpleScalar.
Proceedings of the 21st International Conference on Advanced Information Networking and Applications (AINA 2007), 2007

2006
Generating hardware from OpenMP programs.
Proceedings of the 2006 IEEE International Conference on Field Programmable Technology, 2006

Co-optimization of Performance and Power in a Superscalar Processor Design.
Proceedings of the Emerging Directions in Embedded and Ubiquitous Computing, 2006

DEP: detailed execution profile.
Proceedings of the 15th International Conference on Parallel Architectures and Compilation Techniques (PACT 2006), 2006

2005
Dynamic memory optimization using pool allocation and prefetching.
SIGARCH Comput. Archit. News, 2005

Using UML 2.0 for System Level Design of Real Time SoC Platforms for Stream Processing.
Proceedings of the 11th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA 2005), 2005

Sensor Grid: Integration of Wireless Sensor Networks and the Grid.
Proceedings of the 30th Annual IEEE Conference on Local Computer Networks (LCN 2005), 2005

Cooperative Instruction Scheduling with Linear Scan Register Allocation.
Proceedings of the High Performance Computing, 2005

A Reconfigurable Instruction Memory Hierarchy for Embedded Systems.
Proceedings of the 2005 International Conference on Field Programmable Logic and Applications (FPL), 2005

An integrated performance and power model for superscalar processor designs.
Proceedings of the 2005 Conference on Asia South Pacific Design Automation, 2005

Design of clocked circuits using UML.
Proceedings of the 2005 Conference on Asia South Pacific Design Automation, 2005

Targeted Data Prefetching.
Proceedings of the Advances in Computer Systems Architecture, 10th Asia-Pacific Conference, 2005

A Performance and Power Co-optimization Approach for Modern Processors.
Proceedings of the Fifth International Conference on Computer and Information Technology (CIT 2005), 2005

2004
Data Integrity Framework and Language Support for Active Web Intermediaries.
Proceedings of the Web Content Caching and Distribution: 9th International Workshop, 2004

Model-Driven SoC Design via Executable UML to SystemC.
Proceedings of the 25th IEEE Real-Time Systems Symposium (RTSS 2004), 2004

Adaptive Compiler Directed Prefetching for EPIC Processors.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 2004

Configuration bitstream compression for dynamically reconfigurable FPGAs.
Proceedings of the 2004 International Conference on Computer-Aided Design, 2004

Windows CE for a reconfigurable system-on-a-chip processor.
Proceedings of the 2004 IEEE International Conference on Field-Programmable Technology, 2004

Tuning SoC platforms for multimedia processing: identifying limits and tradeoffs.
Proceedings of the 2nd IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2004

Static Identification of Delinquent Loads.
Proceedings of the 2nd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2004), 2004

Compiler orchestrated prefetching via speculation and predication.
Proceedings of the 11th International Conference on Architectural Support for Programming Languages and Operating Systems, 2004

2003
SilkRoad II: mixed paradigm cluster computing with RC_dag consistency.
Parallel Comput., 2003

Compiling to FPGAs via an EPIC compiler's intermediate representation.
Proceedings of the 2003 IEEE International Conference on Field-Programmable Technology, 2003

A Model for Hardware Realization of Kernel Loops.
Proceedings of the Field Programmable Logic and Application, 13th International Conference, 2003

The Performance Model of SilkRoad - A Multithreaded DSM System for Clusters.
Proceedings of the 3rd IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2003), 2003

2002
A Framework for Data Prefetching Using Off-Line Training of Markovian Predictors.
Proceedings of the 20th International Conference on Computer Design (ICCD 2002), 2002

PD-XML: extensible markup language for processor description.
Proceedings of the 2002 IEEE International Conference on Field-Programmable Technology, 2002

A co-simulation study of adaptive EPIC computing.
Proceedings of the 2002 IEEE International Conference on Field-Programmable Technology, 2002

Shell over a Cluster (SHOC): Towards Achieving Single System Image via the Shell.
Proceedings of the 2002 IEEE International Conference on Cluster Computing (CLUSTER 2002), 2002

SilkRoad II: A Multi-Paradigm Runtime System for Cluster Computing.
Proceedings of the 2002 IEEE International Conference on Cluster Computing (CLUSTER 2002), 2002

2001
Compiler Optimizations for Adaptive EPIC Processors.
Proceedings of the Embedded Software, First International Workshop, 2001

The emerging power crisis in embedded processors: what can a poor compiler do?
Proceedings of the 2001 International Conference on Compilers, 2001

2000
Multiple context multithreaded superscalar processor architecture.
J. Syst. Archit., 2000

ORION: An Adaptive Home-Based Software Distributed Shared Memory System.
Proceedings of the Seventh International Conference on Parallel and Distributed Systems, 2000

SilkRoad: A Multithreaded Runtime System with Software Distributed Shared Memory for SMP Clusters.
Proceedings of the 2000 IEEE International Conference on Cluster Computing (CLUSTER 2000), November 28th, 2000

1999
Optimizing floating point operations in Scheme.
Comput. Lang., 1999

Source Level Static Branch Prediction.
Comput. J., 1999

tmPVM - Task Migratable PVM.
Proceedings of the 13th International Parallel Processing Symposium / 10th Symposium on Parallel and Distributed Processing (IPPS / SPDP '99), 1999

1996
BaLinda Lisp: Design and Implementation.
Comput. Lang., 1996

1995
Fast Evaluation of the Elementary Functions in Single Precision.
IEEE Trans. Computers, 1995

Evaluation of the Hitachi S-3800 Supercomputer Using Six Benchmarks.
Int. J. High Perform. Comput. Appl., 1995

Compiling Parallel Lisp for a Shared Memory Multiprocessor.
Proceedings of the Seventh IASTED/ISMM International Conference on Parallel and Distributed Computing and Systems, 1995

Highly Efficient Parallel Lisp Implementation on Distributed Systems.
Proceedings of the Parallel Computing: State-of-the-Art and Perspectives, 1995

Design and Implementation of Abstract Machine for Parallel Lisp Compilation.
Proceedings of the 1995 International Conference on Parallel Processing, 1995

1994
Fast Hardware-Based Algorithms for Elementary Function Computations Using Rectangular Multipliers.
IEEE Trans. Computers, 1994

A Simulation Study on the Interactions between Multithreaded Architectures and the Cache.
Int. J. High Speed Comput., 1994

Fast Evaluation of the Elementary Functions in Double Precision.
Proceedings of the 27th Annual Hawaii International Conference on System Sciences (HICSS-27), 1994

1992
A Model of Speculative Parallelism.
Parallel Process. Lett., 1992

Evaluation of the continuation bit in the Cyclic Pipeline Computer.
Parallel Comput., 1992

1991
Effects of Multiple Instruction Stream Execution on Cache Performance.
Int. J. High Speed Comput., 1991

1990
A self interpreter for BaLinda Lisp.
ACM SIGPLAN Notices, 1990

A preliminary evaluation of a massively parallel processor: GAPP.
Microprocessing and Microprogramming, 1990

1989
BIDDLE: a bidirectional data driven Lisp engine.
Proceedings of the IEEE International Workshop on Tools for Artificial Intelligence: Architectures, 1989

