S. K. Nandy

Affiliations:
  • Indian Institute of Science (IISc), Department of Computational and Data Sciences, CAD Laboratory, Bangalore, India
  • ERNET, India


According to our database, S. K. Nandy authored at least 155 papers between 1986 and 2022.

Bibliography

2022
Symmetric Convolutional Filters: A Novel Way to Constrain Parameters in CNN.
CoRR, 2022

A Survey on High-Throughput Non-Binary LDPC Decoders: ASIC, FPGA, and GPU Architectures.
IEEE Commun. Surv. Tutorials, 2022

2021
Factorization of Boolean Polynomials: Parallel Algorithms and Experimental Evaluation.
Program. Comput. Softw., 2021

2020
Towards Accelerated Genome Informatics on Parallel HPC Platforms: The ReneGENE-GI Perspective.
J. Signal Process. Syst., 2020

2019
EX-DRIVE: An Execution Driven Functional Verification Flow.
J. Low Power Electron., 2019

Applying Modified Householder Transform to Kalman Filter.
Proceedings of the 32nd International Conference on VLSI Design and 18th International Conference on Embedded Systems, 2019

A Systematic Approach for Acceleration of Matrix-Vector Operations in CGRA through Algorithm-Architecture Co-Design.
Proceedings of the 32nd International Conference on VLSI Design and 18th International Conference on Embedded Systems, 2019

Parallel Factorization of Boolean Polynomials.
Proceedings of the Perspectives of System Informatics, 2019

2018
A Hardware Architecture for Radial Basis Function Neural Network Classifier.
IEEE Trans. Parallel Distributed Syst., 2018

Efficient Realization of Householder Transform Through Algorithm-Architecture Co-Design for Acceleration of QR Factorization.
IEEE Trans. Parallel Distributed Syst., 2018

Efficient Realization of Givens Rotation through Algorithm-Architecture Co-design for Acceleration of QR Factorization.
CoRR, 2018

Design Space Exploration of an Execution-Driven Functional Simulation Methodology.
Proceedings of the 31st International Conference on VLSI Design and 17th International Conference on Embedded Systems, 2018

An Algorithm-Architecture Co-Designed System for Dynamic Execution-Driven Pre-Silicon Verification.
Proceedings of the 8th International Symposium on Embedded Computing and System Design, 2018

ReneGENE-GI: Empowering Precision Genomics with FPGAs on HPCs.
Proceedings of the Applied Reconfigurable Computing. Architectures, Tools, and Applications, 2018

ReneGENE-Novo: Co-designed Algorithm-Architecture for Accelerated Preprocessing and Assembly of Genomic Short Reads.
Proceedings of the Applied Reconfigurable Computing. Architectures, Tools, and Applications, 2018

Achieving Efficient Realization of Kalman Filter on CGRA Through Algorithm-Architecture Co-design.
Proceedings of the Applied Reconfigurable Computing. Architectures, Tools, and Applications, 2018

2017
Accelerating BLAS and LAPACK via Efficient Floating Point Architecture Design.
Parallel Process. Lett., 2017

Energy aware synthesis of application kernels through composition of data-paths on a CGRA.
Integr., 2017

REDEFINE®™: a case for WCET-friendly hardware accelerators for real time applications (work-in-progress).
Proceedings of the 2017 International Conference on Compilers, 2017

2016
REFRESH: REDEFINE for Face Recognition Using SURE Homogeneous Cores.
IEEE Trans. Parallel Distributed Syst., 2016

Role based shared memory access control mechanisms in NoC based MP-SoC.
Nano Commun. Networks, 2016

Accelerating BLAS on Custom Architecture through Algorithm-Architecture Co-design.
CoRR, 2016

An Energy Efficient Dynamically Reconfigurable QR Decomposition for Wireless MIMO Communication.
Proceedings of the 29th International Conference on VLSI Design and 15th International Conference on Embedded Systems, 2016

Achieving Efficient QR Factorization by Algorithm-Architecture Co-design of Householder Transformation.
Proceedings of the 29th International Conference on VLSI Design and 15th International Conference on Embedded Systems, 2016

Efficient Realization of Table Look-Up Based Double Precision Floating Point Arithmetic.
Proceedings of the 29th International Conference on VLSI Design and 15th International Conference on Embedded Systems, 2016

VOP: Architecture of a Processor for Vector Operations in On-Line Learning of Neural Networks.
Proceedings of the 29th International Conference on VLSI Design and 15th International Conference on Embedded Systems, 2016

RHyMe: REDEFINE Hyper Cell Multicore for Accelerating HPC Kernels.
Proceedings of the 29th International Conference on VLSI Design and 15th International Conference on Embedded Systems, 2016

AccuRA: Accurate alignment of short reads on scalable reconfigurable accelerators.
Proceedings of the International Conference on Embedded Computer Systems: Architectures, 2016

Flexible resource allocation and management for application graphs on ReNÉ MPSoC.
Proceedings of the 7th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and the 5th Workshop on Design Tools and Architectures For Multicore Embedded Computing Platforms, 2016

Performance Evaluation of Feed-Forward Backpropagation Neural Network for Classification on a Reconfigurable Hardware Architecture.
Proceedings of the Applied Reconfigurable Computing - 12th International Symposium, 2016

2015
Scalable and Energy Efficient, Dynamically Reconfigurable Fast Fourier Transform Architecture.
J. Low Power Electron., 2015

Router Attack toward NoC-enabled MPSoC and Monitoring Countermeasures against such Threat.
Circuits Syst. Signal Process., 2015

A Flexible Scalable Hardware Architecture for Radial Basis Function Neural Networks.
Proceedings of the 28th International Conference on VLSI Design, 2015

Micro-architectural Enhancements in Distributed Memory CGRAs for LU and QR Factorizations.
Proceedings of the 28th International Conference on VLSI Design, 2015

Hardware Solution for Real-Time Face Recognition.
Proceedings of the 28th International Conference on VLSI Design, 2015

An accelerator for classification using radial basis function neural network.
Proceedings of the 28th IEEE International System-on-Chip Conference, 2015

A deterministic, minimal routing algorithm for a toroidal, rectangular honeycomb topology using a 2-tupled relative address.
Proceedings of the 28th IEEE International System-on-Chip Conference, 2015

Location Obfuscation for Location Data Privacy.
Proceedings of the 2015 IEEE World Congress on Services, 2015

Energy Aware Synthesis of Application Kernels Expressed in Functional Languages on a Coarse Grained Composable Reconfigurable Array.
Proceedings of the IEEE International Symposium on Nanoelectronic and Information Systems, 2015

Compiling HPC Kernels for the REDEFINE CGRA.
Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, 2015

High Performance Computing Cloud - A Platform-as-a-Service Perspective.
Proceedings of the International Conference on Cloud Computing and Big Data, 2015

2014
A framework for post-silicon realization of arbitrary instruction extensions on reconfigurable data-paths.
J. Syst. Archit., 2014

Efficient QR Decomposition Using Low Complexity Column-wise Givens Rotation (CGR).
Proceedings of the 2014 27th International Conference on VLSI Design, 2014

Scalable and energy-efficient reconfigurable accelerator for column-wise givens rotation.
Proceedings of the 22nd International Conference on Very Large Scale Integration, 2014

Co-exploration of NLA kernels and specification of Compute Elements in distributed memory CGRAs.
Proceedings of the XIVth International Conference on Embedded Computer Systems: Architectures, 2014

Synthesis of Instruction Extensions on HyperCell, a reconfigurable datapath.
Proceedings of the XIVth International Conference on Embedded Computer Systems: Architectures, 2014

Energy Efficient, Scalable, and Dynamically Reconfigurable FFT Architecture for OFDM Systems.
Proceedings of the 2014 Fifth International Symposium on Electronic System Design, 2014

Hardware architecture of bi-cubic convolution interpolation for real-time image scaling.
Proceedings of the 2014 International Conference on Field-Programmable Technology, 2014

Real Time Routing in Road Networks.
Proceedings of the 2014 IEEE Fourth International Conference on Big Data and Cloud Computing, 2014

Efficient Storage of Big-Data for Real-Time GPS Applications.
Proceedings of the 2014 IEEE Fourth International Conference on Big Data and Cloud Computing, 2014

Efficient and scalable CGRA-based implementation of Column-wise Givens Rotation.
Proceedings of the IEEE 25th International Conference on Application-Specific Systems, 2014

2013
Optimal Pipeline Depth and Supply Voltage for Power-constrained Processors.
Proceedings of the 26th International Conference on VLSI Design and 12th International Conference on Embedded Systems, 2013

High throughput, low latency, memory optimized 64K point FFT architecture using novel radix-4 butterfly unit.
Proceedings of the 2013 IEEE International Symposium on Circuits and Systems (ISCAS2013), 2013

Virtual Machine Placement Optimization Supporting Performance SLAs.
Proceedings of the IEEE 5th International Conference on Cloud Computing Technology and Science, 2013

Elastic Resources Framework in IaaS, Preserving Performance SLAs.
Proceedings of the 2013 IEEE Sixth International Conference on Cloud Computing, Santa Clara, CA, USA, June 28, 2013

2012
Resource Usage Monitoring in Clouds.
Proceedings of the 13th ACM/IEEE International Conference on Grid Computing, 2012

2011
Data Flow Graph Partitioning Algorithms and Their Evaluations for Optimal Spatio-temporal Computation on a Coarse Grain Reconfigurable Architecture.
IPSJ Trans. Syst. LSI Des. Methodol., 2011

A Method for Flexible Reduction over Binary Fields using a Field Multiplier.
Proceedings of SECRYPT 2011, the International Conference on Security and Cryptography, Seville, Spain, 2011

USHA: Unified software and hardware architecture for video decoding.
Proceedings of the IEEE 9th Symposium on Application Specific Processors, 2011

A Fully Pipelined Modular Multiple Precision Floating Point Multiplier with Vector Support.
Proceedings of the International Symposium on Electronic System Design, 2011

Accelerating Reduction for Enabling Fast Multiplication over Large Binary Fields.
Proceedings of the E-Business and Telecommunications - International Joint Conference, 2011

Interconnect-topology independent mapping algorithm for a Coarse Grained Reconfigurable Architecture.
Proceedings of the 2011 International Conference on Field-Programmable Technology, 2011

Dataflow Graph Partitioning for Optimal Spatio-Temporal Computation on a Coarse Grain Reconfigurable Architecture.
Proceedings of the Reconfigurable Computing: Architectures, Tools and Applications, 2011

2010
Enhancements for variable N-point streaming FFT/IFFT on REDEFINE, a runtime reconfigurable architecture.
Proceedings of the 2010 International Conference on Embedded Computer Systems: Architectures, 2010

Design space exploration of systolic realization of QR factorization on a runtime reconfigurable platform.
Proceedings of the 2010 International Conference on Embedded Computer Systems: Architectures, 2010

Accelerating Numerical Linear Algebra Kernels on a Scalable Run Time Reconfigurable Platform.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2010

Towards minimizing execution delays on dynamically reconfigurable processors: a case study on REDEFINE.
Proceedings of the 2010 International Conference on Compilers, 2010

I/O Virtualization Architecture for Security.
Proceedings of the 10th IEEE International Conference on Computer and Information Technology, 2010

2009
REDEFINE: Runtime reconfigurable polymorphic ASIC.
ACM Trans. Embed. Comput. Syst., 2009

Generic routing rules and a scalable access enhancement for the Network-on-Chip RECONNECT.
Proceedings of the Annual IEEE International SoC Conference, SoCC 2009, 2009

RETHROTTLE: Execution throttling in the REDEFINE SoC architecture.
Proceedings of the 2009 International Conference on Embedded Computer Systems: Architectures, 2009

High-throughput flexible constraint length Viterbi decoders on de Bruijn, shuffle-exchange and butterfly connected architectures.
Proceedings of the 2009 International Conference on Embedded Computer Systems: Architectures, 2009

Architecture of Run-Time Reconfigurable Channel Decoder.
Proceedings of IEEE International Conference on Communications, 2009

I/O Device Virtualization in the Multi-core era, a QoS Perspective.
Proceedings of the Workshops at the Grid and Pervasive Computing Conference, 2009

Streaming FFT on REDEFINE-v2: an application-architecture design space exploration.
Proceedings of the 2009 International Conference on Compilers, 2009

An Input Triggered Polymorphic ASIC for H.264 Decoding.
Proceedings of the 20th IEEE International Conference on Application-Specific Systems, 2009

Compiling Techniques for Coarse Grained Runtime Reconfigurable Architectures.
Proceedings of the Reconfigurable Computing: Architectures, 2009

2008
On the effectiveness of phase based regression models to trade power and performance using dynamic processor adaptation.
J. Syst. Archit., 2008

Realizing a flexible constraint length Viterbi decoder for software radio on a de Bruijn interconnection network.
Proceedings of the 2008 IEEE International Symposium on System-on-Chip, 2008

Reconfigurable Viterbi decoder on mesh connected multiprocessor architecture.
Proceedings of the 19th IEEE International Conference on Application-Specific Systems, 2008

RECONNECT: A NoC for polymorphic ASICs using a low overhead single cycle router.
Proceedings of the 19th IEEE International Conference on Application-Specific Systems, 2008

Synthesis of application accelerators on Runtime Reconfigurable Hardware.
Proceedings of the 19th IEEE International Conference on Application-Specific Systems, 2008

Architecture of a polymorphic ASIC for interoperability across multi-mode H.264 decoders.
Proceedings of the 19th IEEE International Conference on Application-Specific Systems, 2008

2007
Low-Power Hierarchical Scan Test for Multiple Clock Domains.
J. Low Power Electron., 2007

REDEFINE: Architecture of a SoC Fabric for Runtime Composition of Computation Structures.
Proceedings of the FPL 2007, 2007

Program Phase Directed Dynamic Cache Way Reconfiguration for Power Efficiency.
Proceedings of the 12th Conference on Asia South Pacific Design Automation, 2007

2006
Instruction Reuse in SPEC, media and packet processing benchmarks: A comparative study of power, performance and related microarchitectural optimizations.
J. Embed. Comput., 2006

Molecular Caches: A caching structure for dynamic creation of application-specific Heterogeneous cache regions.
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-39 2006), 2006

Framework for Enabling Highly Available Distributed Applications for Utility Computing.
Proceedings of the Parallel and Distributed Processing and Applications, 2006

On the Implementation of a Streaming Video over Peer to Peer network using Middleware Components.
Proceedings of the Fifth International Conference on Networking and the International Conference on Systems (ICN / ICONS / MCL 2006), 2006

Efficient Key Management and Distribution for MANET.
Proceedings of IEEE International Conference on Communications, 2006

A Cost Effective Pipelined Divider for Double Precision Floating Point Number.
Proceedings of the 2006 IEEE International Conference on Application-Specific Systems, 2006

High Performance VLSI Architecture Design for H.264 CAVLC Decoder.
Proceedings of the 2006 IEEE International Conference on Application-Specific Systems, 2006

A Framework for Measurement of End-To-End Qos Requirements in Loosely Coupled Systems.
Proceedings of the 20th International Conference on Advanced Information Networking and Applications (AINA 2006), 2006

Towards Self-Composing, Prioritized and Consequential Services.
Proceedings of the 2006 IEEE International Conference on Services Computing (SCC 2006), 2006

2005
Functional and architectural adaptation in pervasive computing environments.
Proceedings of the 3rd International Workshop on Middleware for Pervasive and Ad-hoc Computing (MPAC 2005), held at the ACM/IFIP/USENIX 6th International Middleware Conference, November 28, 2005

A low power and low cost scan test architecture for multi-clock domain SoCs using virtual divide and conquer.
Proceedings of the 2005 IEEE International Test Conference, 2005

A Framework for QoS Adaptive Grid Meta Scheduling.
Proceedings of the 16th International Workshop on Database and Expert Systems Applications (DEXA 2005), 2005

Throughput Driven, Highly Available Streaming Stored Playback Video Service over a Peer-to-Peer Network.
Proceedings of the 19th International Conference on Advanced Information Networking and Applications (AINA 2005), 2005

2004
On the Correctness of Program Execution When Cache Coherence Is Maintained Locally at Data-Sharing Boundaries in Distributed Shared Memory Multiprocessors.
Int. J. Parallel Program., 2004

On the effectiveness of prefetching and reuse in reducing L1 data cache traffic: a case study of Snort.
Proceedings of the 3rd Workshop on Memory Performance Issues, 2004

Streaming stored playback video over a peer-to-peer network.
Proceedings of IEEE International Conference on Communications, 2004

An Architectural View of the Entities Required for Execution of Task in Pervasive Space.
Proceedings of the 10th IEEE International Workshop on Future Trends of Distributed Computing Systems (FTDCS 2004), 2004

A framework for resource discovery in pervasive computing for mobile aware task execution.
Proceedings of the First Conference on Computing Frontiers, 2004

Power-performance trade-off using pipeline delays.
Proceedings of the 2004 Conference on Asia South Pacific Design Automation: Electronic Design and Solution Fair 2004, 2004

Exploiting program execution phases to trade power and performance for media workload.
Proceedings of the 2004 Conference on Asia South Pacific Design Automation: Electronic Design and Solution Fair 2004, 2004

Can Streaming Of Stored Playback Video Be Supported On Peer to Peer Infrastructure?
Proceedings of the 18th International Conference on Advanced Information Networking and Applications (AINA 2004), 2004

2003
On the Effectiveness of Flow Aggregation in Improving Instruction Reuse in Network Processing Applications.
Int. J. Parallel Program., 2003

Multivoltage scheduling with voltage-partitioned variable storage.
Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003

Traffic Profiling for Efficient Network Resource Utilization.
Proceedings of the International Conference on Internet Computing, 2003

Enhancing Speedup in Network Processing Applications by Exploiting Instruction Reuse with Flow Aggregation.
Proceedings of the 2003 Design, 2003

A complexity effective communication model for behavioral modeling of signal processing applications.
Proceedings of the 40th Design Automation Conference, 2003

Simultaneous MultiStreaming for Complexity-Effective VLIW Architectures.
Proceedings of the Advances in Computer Systems Architecture, 2003

Enhancing Speedup in Network Processing Applications by Exploiting Instruction Reuse with Flow Aggregation.
Proceedings of the Embedded Software for SoC, 2003

2002
Multithreaded Architectural Support for Speculative Trace Scheduling in VLIW Processors.
Proceedings of the 15th Annual Symposium on Integrated Circuits and Systems Design, 2002

On the Benefits of Speculative Trace Scheduling in VLIW Processors.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 2002

Speculative Trace Scheduling in VLIW Processors.
Proceedings of the 20th International Conference on Computer Design (ICCD 2002), 2002

Enforcing Cache Coherence at Data Sharing Boundaries without Global Control: A Hardware-Software Approach (Research Note).
Proceedings of the Euro-Par 2002, 2002

2001
ReDeEm_RTL: A Software Tool for Customizing Soft Cells for Embedded Applications.
Proceedings of the 14th International Conference on VLSI Design (VLSI Design 2001), 2001

Architecture of a Reconfigurable Low Power Gigabit ATM Switch.
Proceedings of the 14th International Conference on VLSI Design (VLSI Design 2001), 2001

Content adaptive motion estimation for mobile video encoders.
Proceedings of the 2001 International Symposium on Circuits and Systems, 2001

2000
Reconfigurable Filter Coprocessor Architecture for DSP Applications.
J. VLSI Signal Process., 2000

Harmony - An Architecture for Providing Quality of Service in Mobile Computing Environments.
J. Interconnect. Networks, 2000

Controller redesign based clock and register power minimization.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2000

Performance evaluation of multithreaded architectures for media processing applications.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2000

Design Space Exploration for Providing QoS Within the Harmony Framework.
Proceedings of the 2000 IEEE International Conference on Multimedia and Expo, 2000

Power minimization using control generated clocks.
Proceedings of the 37th Conference on Design Automation, 2000

1999
Architectural Synthesis of Computational Engines for Subband Adaptive Filtering.
J. VLSI Signal Process., 1999

A computational engine for multirate FIR digital filtering.
Signal Process., 1999

Synthesis of ASIPs for DSP algorithms.
Integr., 1999

Synthesis of Configurable Architectures for DSP Algorithms.
Proceedings of the 12th International Conference on VLSI Design (VLSI Design 1999), 1999

Automatic Generation of Tree Multipliers Using Placement-Driven Netlists.
Proceedings of the IEEE International Conference On Computer Design, 1999

Harmony - A Framework for Providing Quality of Service in Wireless Mobile Computing Environment.
Proceedings of the High Performance Computing, 1999

1998
Arbitrary Precision Arithmetic - SIMD Style.
Proceedings of the 11th International Conference on VLSI Design (VLSI Design 1998), 1998

1997
Signal compression through spatial frequency-based motion estimation.
Integr., 1997

Modeling multi-threaded architectures in PAMELA for real-time high performance applications.
Proceedings of the Fourth International Conference on High-Performance Computing, 1997

An asynchronous architecture for digital signal processors.
Proceedings of the European Design and Test Conference, 1997

1996
Spatial frequency based motion estimation for image sequence compression.
Proceedings of the 3rd International Conference on High Performance Computing, 1996

1995
Design and realization of high-performance wave-pipelined 8×8 b multiplier in CMOS technology.
IEEE Trans. Very Large Scale Integr. Syst., 1995

Wave pipelined architecture folding: a method to achieve low power and low area.
Proceedings of the 8th International Conference on VLSI Design (VLSI Design 1995), 1995

1994
Geometric Design Rule Check of VLSI Layouts in Mesh Connected Processors.
VLSI Design, 1994

Geometric Design Rule Check of VLSI Layouts in Distributed Computing Environment.
VLSI Design, 1994

A Methodology for Architecture Synthesis of Cascaded IIR Filters on TLU FPGAs.
Proceedings of the Seventh International Conference on VLSI Design, 1994

High Speed Digital Filtering on SRAM-Based FPGAs.
Proceedings of the Seventh International Conference on VLSI Design, 1994

A 600MHz Half-Bit Level Pipelined Multiplier Macrocell.
Proceedings of the Seventh International Conference on VLSI Design, 1994

TWTXBB: A Low Latency, High Throughput Multiplier Architecture Using a New 4 --> 2 Compressor.
Proceedings of the Seventh International Conference on VLSI Design, 1994

1993
NPCPL: Normal Process Complementary Pass Transistor Logic for Low Latency, High Throughput Designs.
Proceedings of the Sixth International Conference on VLSI Design, 1993

A Parallel Progressive Refinement Image Rendering Algorithm on a Scalable Multithreaded VLSI Processor Array.
Proceedings of the 1993 International Conference on Parallel Processing, 1993

A 400 MHz Wave-Pipelined 8 X 8-Bit Multiplier in CMOS Technology.
Proceedings of the 1993 International Conference on Computer Design: VLSI in Computers & Processors, 1993

Architectural Synthesis of Performance-Driven Multipliers with Accumulator Interleaving.
Proceedings of the 30th Design Automation Conference. Dallas, 1993

1990
VOXEL based modeling and rendering irregular solids.
Microprocessing and Microprogramming, 1990

Quasi dynamic approach to layout compaction.
Microprocessing and Microprogramming, 1990

1989
Special Purpose Architecture for Accelerating Bitmap DRC.
Proceedings of the 26th ACM/IEEE Design Automation Conference, 1989

1986
Dual quadtree representation for VLSI designs.
Proceedings of the 23rd ACM/IEEE Design Automation Conference. Las Vegas, 1986

