Guy Lemieux

Orcid: 0000-0002-7924-8695

Affiliations:
  • University of British Columbia, Canada


According to our database, Guy Lemieux authored at least 82 papers between 1996 and 2023.

Bibliography

2023
MEDUSA: A Multi-Resolution Machine Learning Congestion Estimation Method for 2D and 3D Global Routing.
ACM Trans. Design Autom. Electr. Syst., September, 2023

Cache Abstraction for Data Race Detection in Heterogeneous Systems with Non-coherent Accelerators.
ACM Trans. Embed. Comput. Syst., 2023

2022
Mixing Low-Precision Formats in Multiply-Accumulate Units for DNN Training.
Proceedings of the International Conference on Field-Programmable Technology, 2022

2020
Procrustes: a Dataflow and Accelerator for Sparse Deep Neural Network Training.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

2019
TinBiNN: Tiny Binarized Neural Network Overlay in about 5,000 4-LUTs and 5mW.
CoRR, 2019

Full Deep Neural Network Training On A Pruned Weight Budget.
Proceedings of the Second Conference on Machine Learning and Systems, SysML 2019, 2019

Software-based Dynamic Overlays Require Fast, Fine-grained Partial Reconfiguration.
Proceedings of the 10th International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies, 2019

Low-Level Loop Analysis and Pipelining of Applications Mapped to Xilinx FPGAs.
Proceedings of the 29th International Conference on Field Programmable Logic and Applications, 2019

2018
DropBack: Continuous Pruning During Training.
CoRR, 2018

An Accelerated OpenVX Overlay for Pure Software Programmers.
Proceedings of the International Conference on Field-Programmable Technology, 2018

Modular Block-RAM-Based Longest-Prefix Match Ternary Content-Addressable Memories.
Proceedings of the 28th International Conference on Field Programmable Logic and Applications, 2018

2017
Exploring automated space/time tradeoffs for OpenVX compute graphs.
Proceedings of the International Conference on Field Programmable Technology, 2017

Real-time object detection in software with custom vector instructions and algorithm changes.
Proceedings of the 28th IEEE International Conference on Application-specific Systems, 2017

2016
Modular Switched Multiported SRAM-Based Memories.
ACM Trans. Reconfigurable Technol. Syst., 2016

Automated Space/Time Scaling of Streaming Task Graph.
CoRR, 2016

A Multi-ported Memory Compiler Utilizing True Dual-Port BRAMs.
Proceedings of the 24th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2016

2015
Fast and Memory-Efficient Routing Algorithms for Field Programmable Gate Arrays With Sparse Intracluster Routing Crossbars.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2015

Area Optimization of Arithmetic Units by Component Sharing for FPGAs (Abstract Only).
Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2015

Wavefront Skipping using BRAMs for Conditional Algorithms on Vector Processors.
Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2015

Rapid Overlay Builder for Xilinx FPGAs.
Proceedings of the 23rd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2015

Modular SRAM-Based Binary Content-Addressable Memories.
Proceedings of the 23rd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2015

2014
Deep and narrow binary content-addressable memories using FPGA-based BRAMs.
Proceedings of the 2014 International Conference on Field-Programmable Technology, 2014

Soft vector processors with streaming pipelines.
Proceedings of the 2014 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2014

Modular multi-ported SRAM-based memories.
Proceedings of the 2014 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2014

2013
TputCache: High-frequency, multi-way cache for high-throughput FPGA applications.
Proceedings of the 23rd International Conference on Field Programmable Logic and Applications, 2013

An efficient FPGA overlay for portable custom instruction set extensions.
Proceedings of the 23rd International Conference on Field Programmable Logic and Applications, 2013

Safe Overclocking of Tightly Coupled CGRAs and Processor Arrays using Razor.
Proceedings of the 21st IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2013

Embedded supercomputing in FPGAs with the VectorBlox MXP Matrix Processor.
Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis, 2013

2012
Rapid Synthesis and Simulation of Computational Circuits in an MPPA.
J. Signal Process. Syst., 2012

VENICE: A compact vector processor for FPGA applications.
Proceedings of the 2012 International Conference on Field-Programmable Technology, 2012

Pipeline frequency boosting: Hiding dual-ported block RAM latency using intentional clock skew.
Proceedings of the 2012 International Conference on Field-Programmable Technology, 2012

Routing algorithms for FPGAs with sparse intra-cluster routing crossbars.
Proceedings of the 22nd International Conference on Field Programmable Logic and Applications (FPL), 2012

Parallel FPGA placement based on individual LUT placement (abstract only).
Proceedings of the ACM/SIGDA 20th International Symposium on Field Programmable Gate Arrays, 2012

Reducing the cost of floating-point mantissa alignment and normalization in FPGAs.
Proceedings of the ACM/SIGDA 20th International Symposium on Field Programmable Gate Arrays, 2012

Accelerator compiler for the VENICE vector processor.
Proceedings of the ACM/SIGDA 20th International Symposium on Field Programmable Gate Arrays, 2012

ZUMA: An Open FPGA Overlay Architecture.
Proceedings of the 2012 IEEE 20th Annual International Symposium on Field-Programmable Custom Computing Machines, 2012

2011
Performance and Cost Tradeoffs in Metal-Programmable Structured ASICs (MPSAs).
IEEE Trans. Very Large Scale Integr. Syst., 2011

Deterministic Timing-Driven Parallel Placement by Simulated Annealing Using Half-Box Window Decomposition.
Proceedings of the 2011 International Conference on Reconfigurable Computing and FPGAs, 2011

Configuration Bitstream Reduction for SRAM-based FPGAs by Enumerating LUT Input Permutations.
Proceedings of the 2011 International Conference on Reconfigurable Computing and FPGAs, 2011

Scalable and deterministic timing-driven parallel placement for FPGAs.
Proceedings of the ACM/SIGDA 19th International Symposium on Field Programmable Gate Arrays, 2011

The role of FPGAs in a converged future with heterogeneous programmable processors: pre-conference workshop.
Proceedings of the ACM/SIGDA 19th International Symposium on Field Programmable Gate Arrays, 2011

A CAD framework for Malibu: an FPGA with time-multiplexed coarse-grained elements.
Proceedings of the ACM/SIGDA 19th International Symposium on Field Programmable Gate Arrays, 2011

VEGAS: soft vector processor with scratchpad memory.
Proceedings of the ACM/SIGDA 19th International Symposium on Field Programmable Gate Arrays, 2011

2010
A 4 GHz Non-Resonant Clock Driver With Inductor-Assisted Energy Return to Power Grid.
IEEE Trans. Circuits Syst. I Regul. Pap., 2010

The impact of interconnect architecture on via-programmed structured ASICs (VPSAs).
Proceedings of the ACM/SIGDA 18th International Symposium on Field Programmable Gate Arrays, 2010

2009
Vector Processing as a Soft Processor Accelerator.
ACM Trans. Reconfigurable Technol. Syst., 2009

Estimating reliability and throughput of source-synchronous wave-pipelined interconnect.
Proceedings of the Third International Symposium on Networks-on-Chips, 2009

PGR: Period and glitch reduction via clock skew scheduling, delay padding and GlitchLess.
Proceedings of the 2009 International Conference on Field-Programmable Technology, 2009

Replace: An incremental placement algorithm for field programmable gate arrays.
Proceedings of the 19th International Conference on Field Programmable Logic and Applications, 2009

Towards reliable 5Gbps wave-pipelined and 3Gbps surfing interconnect in 65nm FPGAs.
Proceedings of the ACM/SIGDA 17th International Symposium on Field Programmable Gate Arrays, 2009

PERG-Rx: a hardware pattern-matching engine supporting limited regular expressions.
Proceedings of the ACM/SIGDA 17th International Symposium on Field Programmable Gate Arrays, 2009

2008
Interconnect Driver Design for Long Wires in Field-Programmable Gate Arrays.
J. Signal Process. Syst., 2008

GlitchLess: Dynamic Power Minimization in FPGAs Through Edge Alignment and Glitch Filtering.
IEEE Trans. Very Large Scale Integr. Syst., 2008

Perturb+mutate: Semisynthetic circuit generation for incremental placement and routing.
ACM Trans. Reconfigurable Technol. Syst., 2008

Energy Recovery from High-Frequency Clocks Using DC-DC Converters.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2008

PERG: A scalable FPGA-based pattern-matching engine with consolidated Bloomier filters.
Proceedings of the 2008 International Conference on Field-Programmable Technology, 2008

Vector processing as a soft-core CPU accelerator.
Proceedings of the ACM/SIGDA 16th International Symposium on Field Programmable Gate Arrays, 2008

Designing with extreme parallelism.
Proceedings of the ACM/SIGDA 16th International Symposium on Field Programmable Gate Arrays, 2008

Extreme parallel architectures for the masses.
Proceedings of the ACM/SIGDA 16th International Symposium on Field Programmable Gate Arrays, 2008

2007
A Survey and Taxonomy of GALS Design Styles.
IEEE Des. Test Comput., 2007

Congestion estimation and localization in FPGAs: a visual tool for interconnect prediction.
Proceedings of the Ninth International Workshop on System-Level Interconnect Prediction (SLIP 2007), 2007

A 3GHz Switching DC-DC Converter Using Clock-Tree Charge-Recycling in 90nm CMOS with Integrated Output Filter.
Proceedings of the 2007 IEEE International Solid-State Circuits Conference, 2007

A Case for Soft Vector Processors in FPGAs.
Proceedings of the 2007 International Conference on Field-Programmable Technology, 2007

GlitchLess: an active glitch minimization technique for FPGAs.
Proceedings of the ACM/SIGDA 15th International Symposium on Field Programmable Gate Arrays, 2007

2006
System-on-Chip: Reuse and Integration.
Proc. IEEE, 2006

Un/DoPack: re-clustering of large system-on-chip designs with interconnect variation for low-cost FPGAs.
Proceedings of the 2006 International Conference on Computer-Aided Design, 2006

Perturber: semi-synthetic circuit generation using ancestor control for testing incremental place and route.
Proceedings of the 2006 IEEE International Conference on Field Programmable Technology, 2006

Semi-Synthetic Circuit Generation Using Graph Monomorphism for Testing Incremental Placement and Incremental Routing Tools.
Proceedings of the 2006 International Conference on Field Programmable Logic and Applications (FPL), 2006

2005
FPGA Defect Tolerance: Impact of Granularity.
Proceedings of the 2005 IEEE International Conference on Field-Programmable Technology, 2005

Defect-Tolerant FPGA Switch Block and Connection Block with Fine-Grain Redundancy for Yield Enhancement.
Proceedings of the 2005 International Conference on Field Programmable Logic and Applications (FPL), 2005

Logic block clustering of large designs for channel-width constrained FPGAs.
Proceedings of the 42nd Design Automation Conference, 2005

An improved "soft" eFPGA design and implementation strategy.
Proceedings of the IEEE 2005 Custom Integrated Circuits Conference, 2005

2004
Directional and single-driver wires in FPGA interconnect.
Proceedings of the 2004 IEEE International Conference on Field-Programmable Technology, 2004

Design of interconnection networks for programmable logic.
Kluwer, ISBN: 978-1-4020-7700-5, 2004

2002
Analytical Framework for Switch Block Design.
Proceedings of the Field-Programmable Logic and Applications, 2002

Circuit design of routing switches.
Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2002

2001
Using sparse crossbars within LUT clusters.
Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2001

2000
Generating highly-routable sparse crossbars for PLDs.
Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2000

1998
Design and Implementation of the NUMAchine Multiprocessor.
Proceedings of the 35th Conference on Design Automation, 1998

1997
On two-step routing for FPGAs.
Proceedings of the 1997 International Symposium on Physical Design, 1997

1996
Segmented Routing for Speed-Performance and Routability in Field-Programmable Gate Arrays.
VLSI Design, 1996

