Nachiket Kapre

ORCID: 0000-0002-2187-0406

According to our database, Nachiket Kapre authored at least 96 papers between 2004 and 2023.

Bibliography

2023
Ditty: Directory-based Cache Coherence for Multicore Safety-critical Systems.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2023

2022
RapidLayout: Fast Hard Block Placement of FPGA-optimized Systolic Arrays Using Evolutionary Algorithm.
ACM Trans. Reconfigurable Technol. Syst., 2022

HopliteML: Evolving Application Customized FPGA NoCs with Adaptable Routers and Regulators.
ACM Trans. Reconfigurable Technol. Syst., 2022

Managing HBM Bandwidth on Multi-Die FPGAs with FPGA Overlay NoCs.
Proceedings of the 30th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2022

2021
Worst-case latency analysis for the Versal NoC network packet switch.
Proceedings of the NOCS '21: International Symposium on Networks-on-Chip, 2021

Mocarabe: High-Performance Time-Multiplexed Overlays for FPGAs.
Proceedings of the 29th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2021

2020
HopliteBuf: Network Calculus-Based Design of FPGA NoCs with Provably Stall-Free FIFOs.
ACM Trans. Reconfigurable Technol. Syst., 2020

RapidLayout: Fast Hard Block Placement of FPGA-optimized Systolic Arrays using Evolutionary Algorithms.
CoRR, 2020

DarwiNN: efficient distributed neuroevolution under communication constraints.
Proceedings of the GECCO '20: Genetic and Evolutionary Computation Conference, 2020

RapidLayout: Fast Hard Block Placement of FPGA-Optimized Systolic Arrays using Evolutionary Algorithms.
Proceedings of the 30th International Conference on Field-Programmable Logic and Applications, 2020

Learn the Switches: Evolving FPGA NoCs with Stall-Free and Backpressure Based Routers.
Proceedings of the 30th International Conference on Field-Programmable Logic and Applications, 2020

Exploring the Impact of Switch Arity on Butterfly Fat Tree FPGA NoCs.
Proceedings of the 28th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2020

2019
Partitioning FPGA-Optimized Systolic Arrays for Fun and Profit.
Proceedings of the International Conference on Field-Programmable Technology, 2019

Scaling the Cascades: Interconnect-Aware FPGA Implementation of Machine Learning Problems.
Proceedings of the 29th International Conference on Field Programmable Logic and Applications, 2019

Timing-Aware Routing in the RapidWright Framework.
Proceedings of the 29th International Conference on Field Programmable Logic and Applications, 2019

HopliteBuf: FPGA NoCs with Provably Stall-Free FIFOs.
Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

Enhancing Butterfly Fat Tree NoCs for FPGAs with Lightweight Flow Control.
Proceedings of the 27th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2019

RapidRoute: Fast Assembly of Communication Structures for FPGA Overlays.
Proceedings of the 27th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2019

2018
CaffePresso: Accelerating Convolutional Networks on Embedded SoCs.
ACM Trans. Embed. Comput. Syst., 2018

FastTrack: Leveraging Heterogeneous FPGA Wires to Design Low-Cost High-Performance Soft NoCs.
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

Implementing NEF Neural Networks on Embedded FPGAs.
Proceedings of the International Conference on Field-Programmable Technology, 2018

DaCO: A High-Performance Token Dataflow Coprocessor Overlay for FPGAs.
Proceedings of the International Conference on Field-Programmable Technology, 2018

FastTrack: Exploiting Fast FPGA Wiring for Implementing NoC Shortcuts (Abstract Only).
Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2018

LegUp-NoC: High-Level Synthesis of Loops with Indirect Addressing.
Proceedings of the 26th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2018

Hoplite-Q: Priority-Aware Routing in FPGA Overlay NoCs.
Proceedings of the 26th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2018

2017
Hoplite: A Deflection-Routed Directional Torus NoC for FPGAs.
ACM Trans. Reconfigurable Technol. Syst., 2017

The structure and dynamics of knowledge network in domain-specific Q&A sites: a case study of stack overflow.
Empir. Softw. Eng., 2017

Out-of-Order Dataflow Scheduling for FPGA Overlays.
CoRR, 2017

Applying Models of Computation to OpenCL Pipes for FPGA Computing.
Proceedings of the 5th International Workshop on OpenCL, 2017

HopliteRT: An efficient FPGA NoC for real-time applications.
Proceedings of the International Conference on Field Programmable Technology, 2017

Enabling partial reconfiguration and low latency routing using segmented FPGA NoCs.
Proceedings of the 27th International Conference on Field Programmable Logic and Applications, 2017

Deflection-routed butterfly fat trees on FPGAs.
Proceedings of the 27th International Conference on Field Programmable Logic and Applications, 2017

120-core microAptiv MIPS Overlay for the Terasic DE5-NET FPGA board.
Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017

Implementing FPGA Overlay NoCs Using the Xilinx UltraScale Memory Cascades.
Proceedings of the 25th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2017

On Bit-Serial NoCs for FPGAs.
Proceedings of the 25th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2017

eBSP: Managing NoC traffic for BSP workloads on the 16-core Adapteva Epiphany-III processor.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2017

2016
Optimizing Soft Vector Processing in FPGA-Based Embedded Systems.
ACM Trans. Reconfigurable Technol. Syst., 2016

Software-Specific Named Entity Recognition in Software Engineering Social Content.
Proceedings of the IEEE 23rd International Conference on Software Analysis, 2016

Software-specific part-of-speech tagging: an experimental study on stack overflow.
Proceedings of the 31st Annual ACM Symposium on Applied Computing, 2016

Learning to Extract API Mentions from Informal Natural Language Discussions.
Proceedings of the 2016 IEEE International Conference on Software Maintenance and Evolution, 2016

Deflection routing for multi-level FPGA overlay NoCs.
Proceedings of the 2016 International Conference on Field-Programmable Technology, 2016

Boosting convergence of timing closure using feature selection in a Learning-driven approach.
Proceedings of the 26th International Conference on Field Programmable Logic and Applications, 2016

Vector FPGA acceleration of 1-D DWT computations using sparse matrix skeletons.
Proceedings of the 26th International Conference on Field Programmable Logic and Applications, 2016

Survey of domain-specific languages for FPGA computing.
Proceedings of the 26th International Conference on Field Programmable Logic and Applications, 2016

Hoplite-DSP: Harnessing the Xilinx DSP48 multiplexers to efficiently support NoCs on FPGAs.
Proceedings of the 26th International Conference on Field Programmable Logic and Applications, 2016

Case for Design-Specific Machine Learning in Timing Closure of FPGA Designs.
Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2016

Machine-Learning driven Auto-Tuning of High-Level Synthesis for FPGAs (Abstract Only).
Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2016

GPU-Accelerated High-Level Synthesis for Bitwidth Optimization of FPGA Datapaths.
Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2016

Improving Classification Accuracy of a Machine Learning Approach for FPGA Timing Closure.
Proceedings of the 24th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2016

Communication Optimization for the 16-Core Epiphany Floating-Point Processor Array.
Proceedings of the 24th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2016

Marathon: Statically-Scheduled Conflict-Free Routing on FPGA Overlay NoCs.
Proceedings of the 24th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2016

Evaluating Embedded FPGA Accelerators for Deep Learning Applications.
Proceedings of the 24th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2016

Preventive Detection of Mosquito Populations using Embedded Machine Learning on Low Power IoT Platforms.
Proceedings of the 7th Annual Symposium on Computing for Development, 2016

CaffePresso: an optimized library for deep learning on embedded accelerator-based platforms.
Proceedings of the 2016 International Conference on Compilers, 2016

2015
Communication Optimization of Iterative Sparse Matrix-Vector Multiply on GPUs and FPGAs.
IEEE Trans. Parallel Distributed Syst., 2015

A Case for Embedded FPGA-based SoCs in Energy-Efficient Acceleration of Graph Problems.
Supercomput. Front. Innov., 2015

G-DMA: improving memory access performance for hardware accelerated sparse graph computation.
Proceedings of the International Conference on ReConFigurable Computing and FPGAs, 2015

GraphMMU: Memory Management Unit for Sparse Graph Accelerators.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015

Enhancing Speedups for FPGA Accelerated SPICE through Frequency Scaling and Precision Reduction.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015

Limits of FPGA acceleration of 3D Green's Function computation for geophysical applications.
Proceedings of the 25th International Conference on Field Programmable Logic and Applications, 2015

Hoplite: Building austere overlay NoCs for FPGAs.
Proceedings of the 25th International Conference on Field Programmable Logic and Applications, 2015

On Data Forwarding in Deeply Pipelined Soft Processors.
Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2015

FPGA Acceleration of Irregular Iterative Computations using Criticality-Aware Dataflow Optimizations (Abstract Only).
Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2015

InTime: A Machine Learning Approach for Efficient Selection of FPGA CAD Tool Parameters.
Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2015

Zedwulf: Power-Performance Tradeoffs of a 32-Node Zynq SoC Cluster.
Proceedings of the 23rd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2015

Driving Timing Convergence of FPGA Designs through Machine Learning and Cloud Computing.
Proceedings of the 23rd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2015

Sparse Graph Processing with Soft-Processors.
Proceedings of the 23rd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2015

Energy-Efficient Acceleration of OpenCV Saliency Computation Using Soft Vector Processors.
Proceedings of the 23rd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2015

Custom FPGA-based soft-processors for sparse graph acceleration.
Proceedings of the 26th IEEE International Conference on Application-specific Systems, 2015

2014
Relax-Miracle: GPU parallelization of semi-analytic fourier-domain solvers for earthquake modeling.
Proceedings of the 21st International Conference on High Performance Computing, 2014

Fanout decomposition dataflow optimizations for FPGA-based Sparse LU factorization.
Proceedings of the 2014 International Conference on Field-Programmable Technology, 2014

Analysis and optimization of a deeply pipelined FPGA soft processor.
Proceedings of the 2014 International Conference on Field-Programmable Technology, 2014

Heterogeneous dataflow architectures for FPGA-based sparse LU factorization.
Proceedings of the 24th International Conference on Field Programmable Logic and Applications, 2014

Comparing soft and hard vector processing in FPGA-based embedded systems.
Proceedings of the 24th International Conference on Field Programmable Logic and Applications, 2014

MixFX-SCORE: Heterogeneous Fixed-Point Compilation of Dataflow Computations.
Proceedings of the 22nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2014

Timing Fault Detection in FPGA-Based Circuits.
Proceedings of the 22nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2014

Breaking Sequential Dependencies in FPGA-Based Sparse LU Factorization.
Proceedings of the 22nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2014

2013
System-level FPGA device driver with high-level synthesis support.
Proceedings of the 2013 International Conference on Field-Programmable Technology, 2013

Application Composition and Communication Optimization in Iterative Solvers Using FPGAs.
Proceedings of the 21st IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2013

Exploiting Input Parameter Uncertainty for Reducing Datapath Precision of SPICE Device Models.
Proceedings of the 21st IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2013

2012
SPICE²: Spatial Processors Interconnected for Concurrent Execution for Accelerating the SPICE Circuit Simulator Using an FPGA.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2012

Enhancing performance of Tall-Skinny QR factorization using FPGAs.
Proceedings of the 22nd International Conference on Field Programmable Logic and Applications (FPL), 2012

FX-SCORE: A Framework for Fixed-Point Compilation of SPICE Device Models Using Gappa++.
Proceedings of the 2012 IEEE 20th Annual International Symposium on Field-Programmable Custom Computing Machines, 2012

A High Throughput FPGA-Based Implementation of the Lanczos Method for the Symmetric Extremal Eigenvalue Problem.
Proceedings of the Reconfigurable Computing: Architectures, Tools and Applications, 2012

2011
SPICE²: A Spatial, Parallel Architecture for Accelerating the Spice Circuit Simulator.
PhD thesis, 2011

Spatial hardware implementation for sparse graph algorithms in GraphStep.
ACM Trans. Auton. Adapt. Syst., 2011

An NoC Traffic Compiler for Efficient FPGA Implementation of Sparse Graph-Oriented Workloads.
Int. J. Reconfigurable Comput., 2011

VLIW-SCORE: Beyond C for sequential control of SPICE FPGA acceleration.
Proceedings of the 2011 International Conference on Field-Programmable Technology, 2011

2010
An NoC Traffic Compiler for efficient FPGA implementation of Parallel Graph Applications.
Proceedings of the 5th International Workshop on Reconfigurable Communication-centric Systems on Chip, 2010

2009
Pipelining Saturated Accumulation.
IEEE Trans. Computers, 2009

Performance comparison of single-precision SPICE Model-Evaluation on FPGA, GPU, Cell, and multi-core processors.
Proceedings of the 19th International Conference on Field Programmable Logic and Applications, 2009

Accelerating SPICE Model-Evaluation using FPGAs.
Proceedings of the FCCM 2009, 2009

2007
Optimistic Parallelization of Floating-Point Accumulation.
Proceedings of the 18th IEEE Symposium on Computer Arithmetic (ARITH-18 2007), 2007

2006
Packet Switched vs. Time Multiplexed FPGA Overlay Networks.
Proceedings of the 14th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2006), 2006

GraphStep: A System Architecture for Sparse-Graph Algorithms.
Proceedings of the 14th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2006), 2006

2004
Design Patterns for Reconfigurable Computing.
Proceedings of the 12th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2004), 2004

