James C. Hoe

Orcid: 0000-0002-9302-5287

Affiliations:
  • Carnegie Mellon University, Pittsburgh, USA


According to our database, James C. Hoe authored at least 115 papers between 1994 and 2023.

Awards

IEEE Fellow 2013, "For contributions to high-level hardware design and synthesis".

Bibliography

2023
Exploiting the Common Case When Accelerating Input-Dependent Stream Processing by FPGA.
IEEE Trans. Computers, May, 2023

Perspectives on AI Architectures and Co-design for Earth System Predictability.
CoRR, 2023

Ensō: A Streaming Interface for NIC-Application Communication.
Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, 2023

2022
Pigasus 2.0: making the Pigasus IDS robust to attacks and different workloads.
Proceedings of the SIGCOMM '22 Poster and Demo Sessions, 2022

Flexible Hardware Accelerator Design Generation with Spiral.
Proceedings of the IEEE High Performance Extreme Computing Conference, 2022

2021
A Roadmap for Enabling a Future-Proof In-Network Computing Data Plane Ecosystem.
CoRR, 2021

We need kernel interposition over the network dataplane.
Proceedings of the HotOS '21: Workshop on Hot Topics in Operating Systems, 2021

HerQules: securing programs via hardware-enforced message queues.
Proceedings of the ASPLOS '21: 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2021

2020
High-Performance Memory Snapshotting for Real-Time, Consistent, Hypervisor-Based Monitors.
IEEE Trans. Dependable Secur. Comput., 2020

Achieving 100Gbps Intrusion Prevention on a Single Server.
Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation, 2020

Beyond Peak Performance: Comparing the Real Performance of AI-Optimized FPGAs and GPUs.
Proceedings of the International Conference on Field-Programmable Technology, 2020

Partial Reconfiguration for Design Optimization.
Proceedings of the 30th International Conference on Field-Programmable Logic and Applications, 2020

A Service-Oriented Memory Architecture for FPGA Computing.
Proceedings of the 30th International Conference on Field-Programmable Logic and Applications, 2020

2019
Efficient SpMV Operation for Large and Highly Sparse Matrices using Scalable Multi-way Merge Parallelization.
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

Quantifying the Benefits of Dynamic Partial Reconfiguration for Embedded Vision Applications.
Proceedings of the 29th International Conference on Field Programmable Logic and Applications, 2019

Processor Assisted Worklist Scheduling for FPGA Accelerated Graph Processing on a Shared-Memory Platform.
Proceedings of the 27th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2019

2018
SPIRAL: Extreme Performance Portability.
Proc. IEEE, 2018

Time-Shared Execution of Realtime Streaming Pipelines by Dynamic Partial Reconfiguration.
CoRR, 2018

PageRank Acceleration for Large Graphs with Scalable Hardware and Two-Step SpMV.
Proceedings of the 2018 IEEE High Performance Extreme Computing Conference, 2018

Time-Shared Execution of Realtime Computer Vision Pipelines by Dynamic Partial Reconfiguration.
Proceedings of the 28th International Conference on Field Programmable Logic and Applications, 2018

2017
Using Vivado-HLS for Structural Design: a NoC Case Study.
CoRR, 2017

Amorphous Dynamic Partial Reconfiguration with Flexible Boundaries to Remove Fragmentation.
CoRR, 2017

Using Vivado-HLS for Structural Design: a NoC Case Study (Abstract Only).
Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017

2016
3D Point Cloud Reduction Using Mixed-Integer Quadratic Programming.
Proceedings of the Deep Learning and Convolutional Neural Networks for Medical Image Computing, 2016

FFTs with Near-Optimal Memory Access Through Block Data Layouts: Algorithm, Architecture and Design Automation.
J. Signal Process. Syst., 2016

HAMLeT Architecture for Parallel Data Reorganization in Memory.
IEEE Micro, 2016

FPGA compute acceleration is first about energy efficiency: technical perspective.
Commun. ACM, 2016

A Study of Pointer-Chasing Performance on Shared-Memory Processor-FPGA Systems.
Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2016

2015
The CONNECT Network-on-Chip Generator.
Computer, 2015

Enabling portable energy efficiency with memory accelerated library.
Proceedings of the 48th International Symposium on Microarchitecture, 2015

DELPHI: a framework for RTL-based architecture design evaluation using DSENT models.
Proceedings of the 2015 IEEE International Symposium on Performance Analysis of Systems and Software, 2015

Data reorganization in memory using 3D-stacked DRAM.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

CoRAM++: Supporting data-structure-specific memory interfaces for FPGA computing.
Proceedings of the 25th International Conference on Field Programmable Logic and Applications, 2015

Nautilus: fast automated IP design space search using guided genetic algorithms.
Proceedings of the 52nd Annual Design Automation Conference, 2015

2014
FPGA-Accelerated Simulation of Computer Systems.
Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, ISBN: 978-3-031-01744-5, 2014

Highly-parallel special-purpose multicore architecture for SystemC/TLM simulations.
Proceedings of the XIVth International Conference on Embedded Computer Systems: Architectures, 2014

FFTs with near-optimal memory access through block data layouts.
Proceedings of the IEEE International Conference on Acoustics, 2014

Algorithm/hardware co-optimized SAR image reconstruction with 3D-stacked logic in memory.
Proceedings of the IEEE High Performance Extreme Computing Conference, 2014

HAMLeT: Hardware accelerated memory layout transform within 3D-stacked DRAM.
Proceedings of the IEEE High Performance Extreme Computing Conference, 2014

GraphGen: An FPGA Framework for Vertex-Centric Graph Computation.
Proceedings of the 22nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2014

Understanding the design space of DRAM-optimized hardware FFT accelerators.
Proceedings of the IEEE 25th International Conference on Application-Specific Systems, 2014

2013
C-to-CoRAM: compiling perfect loop nests to the portable CoRAM abstraction.
Proceedings of the 2013 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2013

Cross-platform FPGA accelerator development using CoRAM and CONNECT.
Proceedings of the 2013 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2013

3D Point Cloud Reduction Using Mixed-Integer Quadratic Programming.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013

A 3D-stacked logic-in-memory accelerator for application-specific data intensive computing.
Proceedings of the 2013 IEEE International 3D Systems Integration Conference (3DIC), 2013

2012
Computer Generation of Hardware for Linear Digital Signal Processing Transforms.
ACM Trans. Design Autom. Electr. Syst., 2012

Highly Efficient Performance Portable Tracking of Evolving Surfaces.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Improving fixed-point accuracy of FFT cores in O-OFDM systems.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

CONNECT: re-examining conventional wisdom for designing NoCs in the context of FPGAs.
Proceedings of the ACM/SIGDA 20th International Symposium on Field Programmable Gate Arrays, 2012

Prototype and evaluation of the CoRAM memory architecture for FPGA-based computing.
Proceedings of the ACM/SIGDA 20th International Symposium on Field Programmable Gate Arrays, 2012

Algorithm and architecture optimization for large size two dimensional discrete Fourier transform (abstract only).
Proceedings of the ACM/SIGDA 20th International Symposium on Field Programmable Gate Arrays, 2012

Memory Bandwidth Efficient Two-Dimensional Fast Fourier Transform Algorithm and Implementation for Large Problem Sizes.
Proceedings of the 2012 IEEE 20th Annual International Symposium on Field-Programmable Custom Computing Machines, 2012

2011
Automatic Pipelining From Transactional Datapath Specifications.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2011

Commercial Antivirus Software Effectiveness: An Empirical Study.
Computer, 2011

Integrating formal verification and high-level processor pipeline synthesis.
Proceedings of the IEEE 9th Symposium on Application Specific Processors, 2011

FIST: A fast, lightweight, FPGA-friendly packet latency estimator for NoC modeling in full-system simulations.
Proceedings of the NOCS 2011, 2011

CoRAM: an in-fabric memory architecture for FPGA-based computing.
Proceedings of the ACM/SIGDA 19th International Symposium on Field Programmable Gate Arrays, 2011

2010
High Performance Stereo Vision Designed for Massively Data Parallel Platforms.
IEEE Trans. Circuits Syst. Video Technol., 2010

High-Level Design and Validation of the BlueSPARC Multithreaded Processor.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2010

The Future of Architectural Simulation.
IEEE Micro, 2010

Single-Chip Heterogeneous Computing: Does the Future Include Custom Logic, FPGAs, and GPGPUs?
Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

Fast bilateral filtering by adapting block size.
Proceedings of the International Conference on Image Processing, 2010

Hardware implementation of the discrete Fourier transform with non-power-of-two problem size.
Proceedings of the IEEE International Conference on Acoustics, 2010

Automatic multithreaded pipeline synthesis from transactional datapath specifications.
Proceedings of the 47th Design Automation Conference, 2010

2009
ProtoFlex: Towards Scalable, Full-System Multiprocessor Simulations Using FPGAs.
ACM Trans. Reconfigurable Technol. Syst., 2009

Permuting streaming data using RAMs.
J. ACM, 2009

Chip-Level Redundancy in Distributed Shared-Memory Multiprocessors.
Proceedings of the 2009 15th IEEE Pacific Rim International Symposium on Dependable Computing, 2009

Implementing a high-performance multithreaded microprocessor: A case study in high-level design and validation.
Proceedings of the 7th ACM/IEEE International Conference on Formal Methods and Models for Codesign (MEMOCODE 2009), 2009

2009 MEMOCODE Co-Design Contest.
Proceedings of the 7th ACM/IEEE International Conference on Formal Methods and Models for Codesign (MEMOCODE 2009), 2009

Real time stereo vision using exponential step cost aggregation on GPU.
Proceedings of the International Conference on Image Processing, 2009

Automatic generation of streaming datapaths for arbitrary fixed permutations.
Proceedings of the Design, Automation and Test in Europe, 2009

Dependable VLSI: device, design and architecture: how should they cooperate?
Proceedings of the 14th Asia South Pacific Design Automation Conference, 2009

2008
MEMOCODE 2006 guest editors' introduction.
Des. Autom. Embed. Syst., 2008

MEMOCODE 2008 Co-Design Contest.
Proceedings of the 6th ACM & IEEE International Conference on Formal Methods and Models for Co-Design (MEMOCODE 2008), 2008

Domain-specific library generation for parallel software and hardware platforms.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

A complexity-effective architecture for accelerating full-system multiprocessor simulations using FPGAs.
Proceedings of the ACM/SIGDA 16th International Symposium on Field Programmable Gate Arrays, 2008

Formal datapath representation and manipulation for implementing DSP transforms.
Proceedings of the 45th Design Automation Conference, 2008

2007
Time-Multiplexed Multiple-Constant Multiplication.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2007

RAMP: Research Accelerator for Multiple Processors.
IEEE Micro, 2007

PAI: A Lightweight Mechanism for Single-Node Memory Recovery in DSM Servers.
Proceedings of the 13th IEEE Pacific Rim International Symposium on Dependable Computing (PRDC 2007), 2007

Multi-bit Error Tolerant Caches Using Two-Dimensional Error Coding.
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-40 2007), 2007

MEMOCODE 2007 Co-Design Contest.
Proceedings of the 5th ACM & IEEE International Conference on Formal Methods and Models for Co-Design (MEMOCODE 2007), May 30, 2007

PROToFLEX: FPGA-accelerated Hybrid Functional Simulator.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

FFT Compiler: from math to efficient hardware HLDVT invited short paper.
Proceedings of the IEEE International High Level Design Validation and Test Workshop, 2007

Generating FPGA-Accelerated DFT Libraries.
Proceedings of the IEEE Symposium on Field-Programmable Custom Computing Machines, 2007

2006
Statistical sampling of microarchitecture simulation.
ACM Trans. Model. Comput. Simul., 2006

SimFlex: Statistical Sampling of Computer System Simulation.
IEEE Micro, 2006

Reunion: Complexity-Effective Multicore Redundancy.
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-39 2006), 2006

Simulation sampling with live-points.
Proceedings of the 2006 IEEE International Symposium on Performance Analysis of Systems and Software, 2006

Spiral: Joint Runtime and Energy Optimization of Linear Transforms.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

Research accelerator for multiple processors.
Proceedings of the 2006 IEEE Hot Chips 18 Symposium (HCS), 2006

Fast and accurate resource estimation of automatically generated custom DFT IP cores.
Proceedings of the ACM/SIGDA 14th International Symposium on Field Programmable Gate Arrays, 2006

2005
TRUSS: A Reliable, Scalable Server Architecture.
IEEE Micro, 2005

TurboSMARTS: accurate microarchitecture simulation sampling in minutes.
Proceedings of the International Conference on Measurements and Modeling of Computer Systems, 2005

Automatic generation of customized discrete Fourier transform IPs.
Proceedings of the 42nd Design Automation Conference, 2005

2004
Operation-centric hardware description and synthesis.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2004

SimFlex: a fast, accurate, flexible full-system simulation framework for performance evaluation of server architecture.
SIGMETRICS Perform. Evaluation Rev., 2004

Fingerprinting: Bounding Soft-Error-Detection Latency and Bandwidth.
IEEE Micro, 2004

Efficient Resource Sharing in Concurrent Error Detecting Superscalar Microarchitectures.
Proceedings of the 37th Annual International Symposium on Microarchitecture (MICRO-37 2004), 2004

Synchronous extensions to operation centric hardware description languages.
Proceedings of the 2nd ACM & IEEE International Conference on Formal Methods and Models for Co-Design (MEMOCODE 2004), 2004

Custom-optimized multiplierless implementations of DSP algorithms.
Proceedings of the 2004 International Conference on Computer-Aided Design, 2004

Automatic cost minimization for multiplierless implementations of discrete signal transforms.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

In-system FPGA prototyping of an Itanium microarchitecture.
Proceedings of the ACM/SIGDA 12th International Symposium on Field Programmable Gate Arrays, 2004

Multiple constant multiplication by time-multiplexed mapping of addition chains.
Proceedings of the 41st Design Automation Conference, 2004

2003
Superscalar out-of-order demystified in four instructions.
Proceedings of the 2003 workshop on Computer architecture education, 2003

SMARTS: Accelerating Microarchitecture Simulation via Rigorous Statistical Sampling.
Proceedings of the 30th International Symposium on Computer Architecture (ISCA 2003), 2003

High-level modeling and FPGA prototyping of microprocessors.
Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2003

2001
Dual use of superscalar datapath for transient-fault detection and recovery.
Proceedings of the 34th Annual International Symposium on Microarchitecture, 2001

2000
Operation-centric hardware description and synthesis.
PhD thesis, 2000

Synthesis of Operation-Centric Hardware Descriptions.
Proceedings of the 2000 IEEE/ACM International Conference on Computer-Aided Design, 2000

1999
A Personal Supercomputer for Climate Research.
Proceedings of the ACM/IEEE Conference on Supercomputing, 1999

Hardware Synthesis from Term Rewriting Systems.
Proceedings of the VLSI: Systems on a Chip, 1999

1998
MPI-StarT: Delivering Network Performance to Numerical Applications.
Proceedings of the ACM/IEEE Conference on Supercomputing, 1998

1995
START-NG: Delivering Seamless Parallel Computing.
Proceedings of the Euro-Par '95 Parallel Processing, 1995

1994
Network substrate for parallel processing on a workstation cluster.
Proceedings of the Hot Interconnects II, 1994

