Walid A. Najjar

Orcid: 0000-0001-6761-6801

Affiliations:
  • University of California, Riverside, CA, USA


According to our database, Walid A. Najjar authored at least 139 papers between 1987 and 2021.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of two.

Awards

IEEE Fellow

IEEE Fellow 2007, "For contributions to dataflow and reconfigurable computing architectures".

Bibliography

2021
Efficient local locking for massively multithreaded in-memory hash-based operators.
VLDB J., 2021

Acceleration of Parallel-Blocked QR Decomposition of Tall-and-Skinny Matrices on FPGAs.
ACM Trans. Archit. Code Optim., 2021

2020
Heterogeneous Acceleration of HAR Applications.
IEEE Trans. Circuits Syst. Video Technol., 2020

High-Performance Parallel Radix Sort on FPGA.
Proceedings of the 28th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2020

2019
PDES-A: Accelerators for Parallel Discrete Event Simulation Implemented on FPGAs.
ACM Trans. Model. Comput. Simul., 2019

Accelerating In-Memory Database Selections Using Latency Masking Hardware Threads.
ACM Trans. Archit. Code Optim., 2019

Efficient Main-Memory Top-K Selection For Multicore Architectures.
Proc. VLDB Endow., 2019

GPU Accelerated Top-K Selection With Efficient Early Stopping.
Proceedings of the 10th International Workshop on Accelerating Analytics and Data Management Systems Using Modern Processor and Storage Architectures, 2019

2018
Massively parallel skyline computation for processing-in-memory architectures.
Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, 2018

2017
PDES-A: a Parallel Discrete Event Simulation Accelerator for FPGAs.
Proceedings of the 2017 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, 2017

2016
Optimizing hardware design for Human Action Recognition.
Proceedings of the 26th International Conference on Field Programmable Logic and Applications, 2016

Preface.
Proceedings of the 26th International Conference on Field Programmable Logic and Applications, 2016

FPGA-accelerated group-by aggregation using synchronizing caches.
Proceedings of the 12th International Workshop on Data Management on New Hardware, 2016

ROCCC 2.0.
Proceedings of the FPGAs for Software Programmers, 2016

2015
Evaluation and Acceleration of High-Throughput Fixed-Point Object Detection on FPGAs.
IEEE Trans. Circuits Syst. Video Technol., 2015

FHAST: FPGA-Based Acceleration of Bowtie in Hardware.
IEEE ACM Trans. Comput. Biol. Bioinform., 2015

High-Level Language Tools for Reconfigurable Computing.
Proc. IEEE, 2015

High performance FPGA and GPU complex pattern matching over spatio-temporal streams.
GeoInformatica, 2015

CAMs as Synchronizing Caches for Multithreaded Irregular Applications on FPGAs.
Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 2015

FPGA-based Multithreading for In-Memory Hash Joins.
Proceedings of the Seventh Biennial Conference on Innovative Data Systems Research, 2015

2014
A study on parallelizing XML path filtering using accelerators.
ACM Trans. Embed. Comput. Syst., 2014

Reconfigurable Computing.
IEEE Micro, 2014

Compiling irregular applications for reconfigurable systems.
Int. J. High Perform. Comput. Netw., 2014

High-Throughput Fixed-Point Object Detection on FPGAs.
Proceedings of the 22nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2014

2013
High-Performance XML Twig Filtering using GPUs.
Proceedings of the International Workshop on Accelerating Data Management Systems Using Modern Processor and Storage Architectures, 2013

Stream-Mode FPGA Acceleration of Complex Pattern Trajectory Querying.
Proceedings of the Advances in Spatial and Temporal Databases, 2013

A High Throughput No-Stall Golomb-Rice Hardware Decoder.
Proceedings of the 21st IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2013

FPGA code accelerators - the compiler perspective.
Proceedings of the 50th Annual Design Automation Conference 2013, 2013

Compiled multithreaded data paths on FPGAs for dynamic workloads.
Proceedings of the International Conference on Compilers, 2013

Efficient near-duplicate document detection using FPGAs.
Proceedings of the 2013 IEEE International Conference on Big Data (IEEE BigData 2013), 2013

2012
Multithreaded FPGA acceleration of DNA sequence mapping.
Proceedings of the IEEE Conference on High Performance Extreme Computing, 2012

2011
Efficient XML Path Filtering Using GPUs.
Proceedings of the International Workshop on Accelerating Data Management Systems Using Modern Processor and Storage Architectures, 2011

Exploring irregular memory accesses on FPGAs.
Proceedings of the first workshop on Irregular applications: architectures and algorithms, 2011

Massively parallel XML twig filtering using dynamic programming on FPGAs.
Proceedings of the 27th International Conference on Data Engineering, 2011

String Matching in Hardware Using the FM-Index.
Proceedings of the IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines, 2011

2010
Impact of high-level transformations within the ROCCC framework.
ACM Trans. Archit. Code Optim., 2010

Accelerating Dynamic Time Warping Subsequence Search with GPUs and FPGAs.
Proceedings of the ICDM 2010, 2010

Accelerating XML Query Matching through Custom Stack Generation on FPGAs.
Proceedings of the High Performance Embedded Architectures and Compilers, 2010

Exploration of Short Reads Genome Mapping in Hardware.
Proceedings of the International Conference on Field Programmable Logic and Applications, 2010

Designing Modular Hardware Accelerators in C with ROCCC 2.0.
Proceedings of the 18th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2010

2009
Energy-efficient encoding techniques for off-chip data buses.
ACM Trans. Embed. Comput. Syst., 2009

Tunable and Energy Efficient Bus Encoding Techniques.
IEEE Trans. Computers, 2009

Boosting XML Filtering with a Scalable FPGA-based Architecture.
CoRR, 2009

Reconfigurable Computing in the New Age of Parallelism.
Proceedings of the Embedded Computer Systems: Architectures, 2009

Boosting XML filtering through a scalable FPGA-based architecture.
Proceedings of the Fourth Biennial Conference on Innovative Data Systems Research, 2009

2008
Efficient hardware code generation for FPGAs.
ACM Trans. Archit. Code Optim., 2008

OpenFPGA CoreLib core library interoperability effort.
Parallel Comput., 2008

A Compiler Intermediate Representation for Reconfigurable Fabrics.
Int. J. Parallel Program., 2008

Compiled hardware acceleration of Molecular Dynamics code.
Proceedings of the FPL 2008, 2008

Compiler generated systolic arrays for wavefront algorithm acceleration on FPGAs.
Proceedings of the FPL 2008, 2008

2007
Dynamic Partial FPGA Reconfiguration in a Prototype Microprocessor System.
Proceedings of the FPL 2007, 2007

A one-shot configurable-cache tuner for improved energy and performance.
Proceedings of the 2007 Design, Automation and Test in Europe Conference and Exposition, 2007

Compiling code accelerators for FPGAs.
Proceedings of the 5th International Conference on Hardware/Software Codesign and System Synthesis, 2007

Compiling PCRE to FPGA for accelerating SNORT IDS.
Proceedings of the 2007 ACM/IEEE Symposium on Architecture for Networking and Communications Systems, 2007

2006
Efficient indexing data structures for flash-based sensor devices.
ACM Trans. Storage, 2006

Compile-time area estimation for LUT-based FPGAs.
ACM Trans. Design Autom. Electr. Syst., 2006

Dynamic Co-Processor Architecture for Software Acceleration on CSoCs.
Proceedings of the 24th International Conference on Computer Design (ICCD 2006), 2006

A code refinement methodology for performance-improved synthesis from C.
Proceedings of the 2006 International Conference on Computer-Aided Design, 2006

A Compiler Intermediate Representation for Reconfigurable Fabrics.
Proceedings of the 2006 International Conference on Field Programmable Logic and Applications (FPL), 2006

Automation of IP Core Interface Generation for Reconfigurable Computing.
Proceedings of the 2006 International Conference on Field Programmable Logic and Applications (FPL), 2006

Automatic Compilation Framework for Bloom Filter Based Intrusion Detection.
Proceedings of the Reconfigurable Computing: Architectures and Applications, 2006

Impact of Loop Unrolling on Area, Throughput and Clock Frequency in ROCCC: C to VHDL Compiler for FPGAs.
Proceedings of the Reconfigurable Computing: Architectures and Applications, 2006

2005
A highly configurable cache for low energy embedded systems.
ACM Trans. Embed. Comput. Syst., 2005

Splitting the sensor node.
Proceedings of the 3rd International Conference on Embedded Networked Sensor Systems, 2005

RISE - Co-S: high performance sensor storage and Co-processing architecture.
Proceedings of the Second Annual IEEE Communications Society Conference on Sensor and Ad Hoc Communications and Networks, 2005

Power Efficient Instruction Caches for Embedded Systems.
Proceedings of the Embedded Computer Systems: Architectures, 2005

Towards In-Situ Data Storage in Sensor Databases.
Proceedings of the Advances in Informatics, 2005

A tunable bus encoder for off-chip data buses.
Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005

Data Acquisition in Sensor Networks with Large Memories.
Proceedings of the 21st International Conference on Data Engineering Workshops, 2005

VALVE: Variable Length Value Encoder for Off-Chip Data Buses.
Proceedings of the 23rd International Conference on Computer Design (ICCD 2005), 2005

Techniques for synthesizing binaries to an advanced register/memory structure.
Proceedings of the ACM/SIGDA 13th International Symposium on Field Programmable Gate Arrays, 2005

MicroHash: An Efficient Index Structure for Flash-Based Sensor Devices.
Proceedings of the FAST '05 Conference on File and Storage Technologies, 2005

Optimized Generation of Data-Path from C Codes for FPGAs.
Proceedings of the 2005 Design, 2005

2004
Input data reuse in compiling window operations onto reconfigurable hardware.
Proceedings of the 2004 ACM SIGPLAN/SIGBED Conference on Languages, 2004

"How Long is Your Belt?" Towards a Single Device for Multiple Functions.
Proceedings of the IEEE/ACS International Conference on Pervasive Services (ICPS'04), 2004

A quantitative analysis of the speedup factors of FPGAs over processors.
Proceedings of the ACM/SIGDA 12th International Symposium on Field Programmable Gate Arrays, 2004

From Here to Main-stream: The Present and Future of Reconfigurable Computing.
Proceedings of the International Conference on Engineering of Reconfigurable Systems and Algorithms, 2004

2003
Automatic compilation to a coarse-grained reconfigurable system-on-chip.
ACM Trans. Embed. Comput. Syst., 2003

High-Level Language Abstraction for Reconfigurable Computing.
Computer, 2003

A Way-Halting Cache for Low-Energy High-Performance Systems.
IEEE Comput. Archit. Lett., 2003

Profiling tools for hardware/software partitioning of embedded applications.
Proceedings of the 2003 Conference on Languages, 2003

Energy Benefits of a Configurable Line Size Cache for Embedded Systems.
Proceedings of the 2003 IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2003), 2003

A Highly-Configurable Cache Architecture for Embedded Systems.
Proceedings of the 30th International Symposium on Computer Architecture (ISCA 2003), 2003

FV-MSB: A Scheme for Reducing Transition Activity on Data Buses.
Proceedings of the High Performance Computing - HiPC 2003, 10th International Conference, 2003

First results with eBlocks: embedded systems building blocks.
Proceedings of the 1st IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2003

Power efficient encoding techniques for off-chip data buses.
Proceedings of the International Conference on Compilers, 2003

2002
Mapping a Single Assignment Programming Language to Reconfigurable Systems.
J. Supercomput., 2002

Improving Software Performance with Configurable Logic.
Des. Autom. Embed. Syst., 2002

Fast Area Estimation to Support Compiler Optimizations in FPGA-Based Reconfigurable Systems.
Proceedings of the 10th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2002), 2002

Compiling ATR Probing Codes for Execution on FPGA Hardware.
Proceedings of the 10th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2002), 2002

2001
An automated process for compiling dataflow graphs into reconfigurable hardware.
IEEE Trans. Very Large Scale Integr. Syst., 2001

A New Adaptive Hardware Tree-Based Multicast Routing in K-Ary N-Cubes.
IEEE Trans. Computers, 2001

Resource Management in Dataflow-Based Multithreaded Execution.
J. Parallel Distributed Comput., 2001

Performance Evaluation of a New Hardware Supported Multicast Scheme for K-ary N-cubes.
Proceedings of the 15th International Parallel & Distributed Processing Symposium (IPDPS-01), 2001

Loop fusion and temporal common subexpression elimination in window-based loops.
Proceedings of the 15th International Parallel & Distributed Processing Symposium (IPDPS-01), 2001

Compiling SA-C Programs to FPGAs: Performance Results.
Proceedings of the Computer Vision Systems, Second International Workshop, 2001

One-Step Compilation of Image Processing Applications to FPGAs.
Proceedings of the 9th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, 2001

The Sisal Project: Real World Functional Programming.
Proceedings of the Compiler Optimizations for Scalable Parallel Systems Languages, 2001

A compiler framework for mapping applications to a coarse-grained reconfigurable computer architecture.
Proceedings of the 2001 International Conference on Compilers, 2001

2000
A High Level, Algorithmic Programming Language and Compiler for Reconfigurable Systems.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 2000

Compiling and Optimizing Image Processing Algorithms for FPGAs.
Proceedings of the Fifth International Workshop on Computer Architectures for Machine Perception (CAMP 2000), 2000

Compiling Image Processing Applications to Reconfigurable Hardware.
Proceedings of the 12th IEEE International Conference on Application-Specific Systems, 2000

1999
Advances in the dataflow computational model.
Parallel Comput., 1999

Combining Adaptive and Deterministic Routing: Evaluation of a Hybrid Router.
Proceedings of the Network-Based Parallel Computing: Communication, 1999

Cameron: High level Language Compilation for Reconfigurable Systems.
Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques, 1999

1997
Foreword to the special issues.
Int. J. Parallel Program., 1997

Preliminary Evaluation of a Hybrid Deterministic/Adaptive Router.
Proceedings of the Parallel Computer Routing and Communication, 1997

Empirical Evaluation of Deterministic and Adaptive Routing with Constant-Area Routers.
Proceedings of the 1997 Conference on Parallel Architectures and Compilation Techniques (PACT '97), 1997

1996
Generation, Optimization, and Evaluation of Multithreaded Code.
J. Parallel Distributed Comput., 1996

Comparison of two storage models in data-driven multithreaded architectures.
Proceedings of the Eighth IEEE Symposium on Parallel and Distributed Processing, 1996

Analysis of Buffer Design for Adaptive Routing in Direct Networks.
Proceedings of the MASCOTS '96, 1996

1995
Exploiting Data Structure Locality in the Dataflow Model.
J. Parallel Distributed Comput., 1995

Design of storage hierarchy in multithreaded architectures.
Proceedings of the 28th Annual International Symposium on Microarchitecture, Ann Arbor, Michigan, USA, November 29, 1995

Control of loop parallelism in multithreaded code.
Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques, 1995

Analysis of communications and overhead reduction in multithreaded execution.
Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques, 1995

1994
Authors' Reply.
IEEE Trans. Computers, 1994

An Analysis of Edge Fault Tolerance in Recursively Decomposable Regular Networks.
IEEE Trans. Computers, 1994

An evaluation of medium-grain dataflow code.
Int. J. Parallel Program., 1994

Modeling Adaptive Routing in k-ary n-cube Networks.
Proceedings of the MASCOTS '94, Proceedings of the Second International Workshop on Modeling, Analysis, and Simulation On Computer and Telecommunication Systems, January 31, 1994

An Evaluation of Optimized Threaded Code Generation.
Proceedings of the Parallel Architectures and Compilation Techniques, 1994

A model for dataflow based vector execution.
Proceedings of the 8th international conference on Supercomputing, 1994

Exploiting Locality in Hybrid Dataflow Programs.
Proceedings of the Multithreaded Computer Architecture, 1994

1993
Conditional Disconnection Probability in Star Graphs.
VLSI Design, 1993

A Quantitative Analysis of Dataflow Program Execution - Preliminaries to a Hybrid Design.
J. Parallel Distributed Comput., 1993

Evaluation of Idealized Adaptive Routing on k-ary n-cubes.
Proceedings of the Fifth IEEE Symposium on Parallel and Distributed Processing, 1993

An evaluation of bottom-up and top-down thread generation techniques.
Proceedings of the 26th Annual International Symposium on Microarchitecture, 1993

The Initial Performance of a Bottom-Up Clustering Algorithm for Dataflow Graphs.
Proceedings of the IFIP WG10.3. Working Conference on Architectures and Compilation Techniques for Fine and Medium Grain Parallelism, 1993

Generation and Quantitative Evaluation of Dataflow Clusters.
Proceedings of the conference on Functional programming languages and computer architecture, 1993

1992
An Analysis of Loop Latency in Dataflow Execution.
Proceedings of the 19th Annual International Symposium on Computer Architecture. Gold Coast, 1992

1991
A Quantitative Analysis of Locality in Dataflow Programs.
Proceedings of the 24th Annual IEEE/ACM International Symposium on Microarchitecture, 1991

Network resilience of star graphs: a comparative analysis.
Proceedings of the 19th annual conference on Computer Science, 1991

1990
Network Resilience: A Measure of Network Fault Tolerance.
IEEE Trans. Computers, 1990

A data-driven execution paradigm for distributed fault-tolerance.
Proceedings of the 4th ACM SIGOPS European Workshop, Bologna, Italy, September 3-5, 1990, 1990

1989
Limits on Scalability in Gracefully Degradable Large-Scale Systems.
Proceedings of the Eighth Symposium on Reliable Distributed Systems, 1989

A Single-Assignment Language in a Distributed Memory Multiprocessor.
Proceedings of the PARLE '89: Parallel Architectures and Languages Europe, 1989

1988
Network Disconnection in Distributed Systems.
Proceedings of the 8th International Conference on Distributed Computing Systems, 1988

1987
Parallel Discrete-Event Simulation.
IEEE Des. Test, 1987

Reliability and Performance Modelling of Hypercube-Based Multiprocessors.
Proceedings of the Computer Performance and Reliability, 1987

Multi-Level Execution In Data-Flow Architectures.
Proceedings of the International Conference on Parallel Processing, 1987
