Ravi R. Iyer

Orcid: 0000-0001-5383-9561

  • Intel Labs, Hillsboro, OR, USA

According to our database, Ravi R. Iyer authored at least 142 papers between 1997 and 2024.



IEEE Fellow

IEEE Fellow 2015, "For contributions to computer architecture and cache/memory systems".








LLaVaOLMoBitnet1B: Ternary LLM goes Multimodal!
CoRR, 2024

RAPID: Enabling fast online policy learning in dynamic public cloud environments.
Neurocomputing, November, 2023

Eidetic: An In-Memory Matrix Multiplication Accelerator for Neural Networks.
IEEE Trans. Computers, June, 2023

Intent-Driven Orchestration: Enforcing Service Level Objectives for Cloud Native Deployments.
SN Comput. Sci., May, 2023

Quantization for Bayesian Deep Learning: Low-Precision Characterization and Robustness.
Proceedings of the IEEE International Symposium on Workload Characterization, 2023

Mem-Rec: Memory Efficient Recommendation System using Alternative Representation.
Proceedings of the Asian Conference on Machine Learning, 2023

Streaming Encoding Algorithms for Scalable Hyperdimensional Computing.
CoRR, 2022

Evolving Zero Cost Proxies For Neural Architecture Scoring.
CoRR, 2022

EZNAS: Evolving Zero-Cost Proxies For Neural Architecture Scoring.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

DPM-NFV: Dynamic Power Management Framework for 5G User Plane Function using Bayesian Optimization.
Proceedings of the IEEE Global Communications Conference, 2022

Advances in Microprocessor Cache Architectures Over the Last 25 Years.
IEEE Micro, 2021

Improving Robustness and Efficiency in Active Learning with Contrastive Loss.
CoRR, 2021

Mitigating Sampling Bias and Improving Robustness in Active Learning.
CoRR, 2021

RHNAS: Realizable Hardware and Neural Architecture Search.
CoRR, 2021

Cache Compression with Efficient in-SRAM Data Comparison.
Proceedings of the IEEE International Conference on Networking, Architecture and Storage, 2021

E2E Visual Analytics: Achieving >10X Edge/Cloud Optimizations.
Proceedings of the IEEE International Conference on Networking, Architecture and Storage, 2021

Trends and Opportunities for SRAM Based In-Memory and Near-Memory Computation.
Proceedings of the 22nd International Symposium on Quality Electronic Design, 2021

Compute-Capable Block RAMs for Efficient Deep Learning Acceleration on FPGAs.
Proceedings of the 29th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2021

A 93 TOPS/Watt Near-Memory Reconfigurable SAD Accelerator for HEVC/AV1/JEM Encoding.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2021

RLDRM: Closed Loop Dynamic Cache Allocation with Deep Reinforcement Learning for Network Function Virtualization.
Proceedings of the 6th IEEE Conference on Network Softwarization, 2020

Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks.
IEEE Micro, 2019

Bit Prudent In-Cache Acceleration of Deep Convolutional Neural Networks.
Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019

A Systematic and Realistic Network-on-Chip Traffic Modeling and Generation Technique for Emerging Many-Core Systems.
IEEE Trans. Multi Scale Comput. Syst., 2018

QoS Management on Heterogeneous Architecture for Multiprogrammed, Parallel, and Domain-Specific Applications.
IEEE Syst. J., 2017

Visual IoT: Ultra-Low-Power Processing Architectures and Implications.
IEEE Micro, 2017

Race-to-sleep + content caching + display caching: a recipe for energy-efficient video streaming on handhelds.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

Visual IoT: Architectural Challenges and Opportunities; Toward a Self-Learning and Energy-Neutral IoT.
IEEE Micro, 2016

Exploiting Core Criticality for Enhanced GPU Performance.
Proceedings of the 2016 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, 2016

The convergence of physical/digital worlds: implications on workloads & architecture.
Proceedings of the 2016 IEEE International Symposium on Workload Characterization, 2016

Cache QoS: From concept to reality in the Intel® Xeon® processor E5-2600 v3 product family.
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

Heterogeneous Computing [Guest editors' introduction].
IEEE Micro, 2015

Adaptive Keyframe Selection for Video Summarization.
Proceedings of the 2015 IEEE Winter Conference on Applications of Computer Vision, 2015

Towards Distributed Video Summarization.
Proceedings of the 23rd Annual ACM Conference on Multimedia Conference, MM '15, Brisbane, Australia, October 26, 2015

VIP: virtualizing IP chains on handheld platforms.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

Low-complexity HOG for efficient video saliency.
Proceedings of the 2015 IEEE International Conference on Image Processing, 2015

Domain knowledge based energy management in handhelds.
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

Design of a low power SoC testchip for wearables and IoTs.
Proceedings of the 2015 IEEE Hot Chips 27 Symposium (HCS), 2015

Platform-aware dynamic configuration support for efficient text processing on heterogeneous system.
Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015

A Case Study on the Communication and Computation Behaviors of Real Applications in NoC-Based MPSoCs.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2014

QoS management on heterogeneous architecture for parallel applications.
Proceedings of the 32nd IEEE International Conference on Computer Design, 2014

A systematic network-on-chip traffic modeling and generation methodology.
Proceedings of the 2014 IEEE Asia Pacific Conference on Circuits and Systems, 2014

Reducing cache and TLB power by exploiting memory region and privilege level semantics.
J. Syst. Archit., 2013

Machine Learning-Based Runtime Scheduler for Mobile Offloading Framework.
Proceedings of the IEEE/ACM 6th International Conference on Utility and Cloud Computing, 2013

Orchestrated scheduling and prefetching for GPGPUs.
Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013

OpenCL-Based Remote Offloading Framework for Trusted Mobile Cloud Computing.
Proceedings of the 19th IEEE International Conference on Parallel and Distributed Systems, 2013

OWL: cooperative thread array aware scheduling techniques for improving GPGPU performance.
Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2013

Dynamic QoS management for chip multiprocessors.
ACM Trans. Archit. Code Optim., 2012

SNARF: a social networking-inspired accelerator remoting framework.
Proceedings of the first edition of the MCC workshop on Mobile cloud computing, 2012

Leveraging Heterogeneity in DRAM Main Memories to Accelerate Critical Word Access.
Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012

Reducing L1 caches power by exploiting software semantics.
Proceedings of the International Symposium on Low Power Electronics and Design, 2012

QuickIA: Exploring heterogeneous architectures on real prototypes.
Proceedings of the 18th IEEE International Symposium on High Performance Computer Architecture, 2012

Exploiting Semantics of Virtual Memory to Improve the Efficiency of the On-Chip Memory System.
Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012

PCASA: Probabilistic control-adjusted Selective Allocation for shared caches.
Proceedings of the 2012 Design, Automation & Test in Europe Conference & Exhibition, 2012

Cache revive: architecting volatile STT-RAM caches for enhanced performance in CMPs.
Proceedings of the 49th Annual Design Automation Conference 2012, 2012

Accelerator-rich architectures: Implications, opportunities and challenges.
Proceedings of the 17th Asia and South Pacific Design Automation Conference, 2012

Optimizing datacenter power with memory system levers for guaranteed quality-of-service.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

Efficient interaction between OS and architecture in heterogeneous platforms.
ACM SIGOPS Oper. Syst. Rev., 2011

CHOP: Integrating DRAM Caches for CMP Server Platforms.
IEEE Micro, 2011

CogniServe: Heterogeneous Server Architecture for Large-Scale Recognition.
IEEE Micro, 2011

RAFT: A router architecture with frequency tuning for on-chip networks.
J. Parallel Distributed Comput., 2011

CoQoS: Coordinating QoS-aware shared resources in NoC-based SoCs.
J. Parallel Distributed Comput., 2011

Low-Power, Resilient Interconnection with Orthogonal Latin Squares.
IEEE Des. Test Comput., 2011

HeteroScouts: hardware assist for OS scheduling in heterogeneous CMPs.
Proceedings of the SIGMETRICS 2011, 2011

ISIS: An accelerator for Sphinx speech recognition.
Proceedings of the IEEE 9th Symposium on Application Specific Processors, 2011

Keynote I: The era of heterogeneity: Are we prepared?
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2011

Shared Resource Monitoring and Throughput Optimization in Cloud-Computing Datacenters.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Cost-effectively offering private buffers in SoCs and CMPs.
Proceedings of the 25th International Conference on Supercomputing, 2011, Tucson, AZ, USA, May 31, 2011

ACCESS: Smart scheduling for asymmetric cache CMPs.
Proceedings of the 17th International Conference on High-Performance Computer Architecture (HPCA-17 2011), 2011

Buffer-integrated-Cache: a cost-effective SRAM architecture for handheld and embedded platforms.
Proceedings of the 48th Design Automation Conference, 2011

Template-based memory access engine for accelerators in SoCs.
Proceedings of the 16th Asia South Pacific Design Automation Conference, 2011

Quality of service shared cache management in chip multiprocessor architecture.
ACM Trans. Archit. Code Optim., 2010

PIRATE: QoS and performance management in CMP architectures.
SIGMETRICS Perform. Evaluation Rev., 2010

Performance characterization and acceleration of Optical Character Recognition on handheld platforms.
Proceedings of the 2010 IEEE International Symposium on Workload Characterization, 2010

CHOP: Adaptive filter-based DRAM caching for CMP server platforms.
Proceedings of the 16th International Conference on High-Performance Computer Architecture (HPCA-16 2010), 2010

NCID: a non-inclusive cache, inclusive directory architecture for flexible and efficient cache hierarchies.
Proceedings of the 7th Conference on Computing Frontiers, 2010

Modeling virtual machine performance: challenges and approaches.
SIGMETRICS Perform. Evaluation Rev., 2009

Virtual platform architectures for resource metering in datacenters.
SIGMETRICS Perform. Evaluation Rev., 2009

VM³: Measuring, modeling and managing VM shared resources.
Comput. Networks, 2009

Hardware/Software Co-Simulation for Last Level Cache Exploration.
Proceedings of the International Conference on Networking, Architecture, and Storage, 2009

A case for dynamic frequency tuning in on-chip networks.
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

CMPSched$im: Evaluating OS/CMP interaction on shared cache management.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2009

Performance characterization and optimization of mobile augmented reality on handheld platforms.
Proceedings of the 2009 IEEE International Symposium on Workload Characterization, 2009

Rate-based QoS techniques for cache/memory in CMP platforms.
Proceedings of the 23rd international conference on Supercomputing, 2009

Accelerating mobile augmented reality on a handheld platform.
Proceedings of the 27th International Conference on Computer Design, 2009

Using checksum to reduce power consumption of display systems for low-motion content.
Proceedings of the 27th International Conference on Computer Design, 2009

Optimizing communication and capacity in a 3D stacked reconfigurable cache hierarchy.
Proceedings of the 15th International Conference on High-Performance Computer Architecture (HPCA-15 2009), 2009

HiPPAI: High Performance Portable Accelerator Interface for SoCs.
Proceedings of the 16th International Conference on High Performance Computing, 2009

Architecture Support for Improving Bulk Memory Copying and Initialization Performance.
Proceedings of the PACT 2009, 2009

Towards hybrid last level caches for chip-multiprocessors.
SIGARCH Comput. Archit. News, 2008

Towards modeling & analysis of consolidated CMP servers.
SIGARCH Comput. Archit. News, 2008

Characterization & analysis of a server consolidation benchmark.
Proceedings of the 4th International Conference on Virtual Execution Environments, 2008

Implications of cache asymmetry on server consolidation performance.
Proceedings of the 4th International Symposium on Workload Characterization (IISWC 2008), 2008

Performance and power optimization through data compression in Network-on-Chip architectures.
Proceedings of the 14th International Conference on High-Performance Computer Architecture (HPCA-14 2008), 2008

Achieving 10Gbps Network Processing: Are We There Yet?
Proceedings of the High Performance Computing, 2008

To Snoop or Not to Snoop: Evaluation of Fine-Grain and Coarse-Grain Snoop Filtering Techniques.
Proceedings of the Euro-Par 2008, 2008

Editorial: Special Section on CMP Architectures.
IEEE Trans. Parallel Distributed Syst., 2007

Hardware Support for Accelerating Data Movement in Server Platform.
IEEE Trans. Computers, 2007

From chaos to QoS: case studies in CMP resource management.
SIGARCH Comput. Archit. News, 2007

Exploring Large-Scale CMP Architectures Using ManySim.
IEEE Micro, 2007

I/O processing in a virtualized platform: a simulation-driven approach.
Proceedings of the 3rd International Conference on Virtual Execution Environments, 2007

QoS policies and architecture for cache/memory in CMP platforms.
Proceedings of the 2007 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, 2007

A Framework for Providing Quality of Service in Chip Multi-Processors.
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-40 2007), 2007

Understanding the Memory Performance of Data-Mining Workloads on Small, Medium, and Large-Scale CMPs Using Hardware-Software Co-simulation.
Proceedings of the 2007 IEEE International Symposium on Performance Analysis of Systems and Software, 2007

Accelerating Full-System Simulation through Characterizing and Predicting Operating System Performance.
Proceedings of the 2007 IEEE International Symposium on Performance Analysis of Systems and Software, 2007

Addressing Cache/Memory Overheads in Enterprise Java CMP Servers.
Proceedings of the IEEE 10th International Symposium on Workload Characterization, 2007

Exploring DRAM cache architectures for CMP server platforms.
Proceedings of the 25th International Conference on Computer Design, 2007

Design of a Dynamic Priority-Based Fast Path Architecture for On-Chip Interconnects.
Proceedings of the 15th Annual IEEE Symposium on High-Performance Interconnects, 2007

Constraint-Aware Large-Scale CMP Cache Design.
Proceedings of the High Performance Computing, 2007

qTLB: Looking Inside the Look-Aside Buffer.
Proceedings of the High Performance Computing, 2007

CacheScouts: Fine-Grain Monitoring of Shared Caches in CMP Platforms.
Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007

A Network Processor-Based, Content-Aware Switch.
IEEE Micro, 2006

Molecular Caches: A caching structure for dynamic creation of application-specific Heterogeneous cache regions.
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-39 2006), 2006

Architectural Characterization of VM Scaling on an SMP Machine.
Proceedings of the Frontiers of High Performance Computing and Networking, 2006

Exploring Small-Scale and Large-Scale CMP Architectures for Commercial Java Servers.
Proceedings of the 2006 IEEE International Symposium on Workload Characterization, 2006

Receive Side Coalescing for Accelerating TCP/IP Processing.
Proceedings of the High Performance Computing, 2006

Communist, utilitarian, and capitalist cache policies on CMPs: caches as a shared resource.
Proceedings of the 15th International Conference on Parallel Architectures and Compilation Techniques (PACT 2006), 2006

Exploring the cache design space for large scale CMPs.
SIGARCH Comput. Archit. News, 2005

An Experimental Evaluation of the HP V-Class and SGI Origin 2000 Multiprocessors using Microbenchmarks and Scientific Applications.
Int. J. Parallel Program., 2005

Anatomy and Performance of SSL Processing.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005

Direct Cache Access for High Bandwidth Network I/O.
Proceedings of the 32nd International Symposium on Computer Architecture (ISCA 2005), 2005

Performance characterization of iSCSI processing in a server platform.
Proceedings of the 24th IEEE International Performance Computing and Communications Conference, 2005

Hardware Support for Bulk Data Movement in Server Platforms.
Proceedings of the 23rd International Conference on Computer Design (ICCD 2005), 2005

Design and Implementation of a Content-Aware Switch Using a Network Processor.
Proceedings of the 13th Annual IEEE Symposium on High Performance Interconnects (HOTIC 2005), 2005

Optimal network processor topologies for efficient packet processing.
Proceedings of the Global Telecommunications Conference, 2005. GLOBECOM '05, St. Louis, Missouri, USA, 28 November, 2005

SpliceNP: a TCP splicer using a network processor.
Proceedings of the 2005 ACM/IEEE Symposium on Architecture for Networking and Communications Systems, 2005

Characterization and Evaluation of Cache Hierarchies for Web Servers.
World Wide Web, 2004

TCP Onloading for Data Center Servers.
Computer, 2004

ASPEN: Towards Effective Simulation of Threads and Engines in Evolving Platforms.
Proceedings of the 12th International Workshop on Modeling, 2004

CQoS: a framework for enabling QoS in shared caches of CMP platforms.
Proceedings of the 18th Annual International Conference on Supercomputing, 2004

Architectural Characterization of an XML-Centric Commercial Server Workload.
Proceedings of the 33rd International Conference on Parallel Processing (ICPP 2004), 2004

Architectural Characterization of TCP/IP Packet Processing on the Pentium M Microprocessor.
Proceedings of the 10th International Conference on High-Performance Computer Architecture (HPCA-10 2004), 2004

On Modeling and Analyzing Cache Hierarchies using CASPER.
Proceedings of the 11th International Workshop on Modeling, 2003

Design and analysis of static memory management policies for CC-NUMA multiprocessors.
J. Syst. Archit., 2002

Comparing the Memory System Performance of DSS Workloads on the HP V-Class and SGI Origin 2000.
Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 2002

Exploring the Cache Design Space for Web Servers.
Proceedings of the 15th International Parallel & Distributed Processing Symposium (IPDPS-01), 2001

Impact of CC-NUMA Memory Management Policies on the Application Performance of Multistage Switching Networks.
IEEE Trans. Parallel Distributed Syst., 2000

Design and Evaluation of a Switch Cache Architecture for CC-NUMA Multiprocessors.
IEEE Trans. Computers, 2000

Using Switch Directories to Speed Up Cache-to-Cache Transfers in CC-NUMA Multiprocessors.
Proceedings of the 14th International Parallel & Distributed Processing Symposium (IPDPS'00), 2000

Comparing the memory system performance of the HP V-class and SGI Origin 2000 multiprocessors using microbenchmarks and scientific applications.
Proceedings of the 13th international conference on Supercomputing, 1999

Switch Cache: A Framework for Improving the Remote Memory Access Latency of CC-NUMA Multiprocessors.
Proceedings of the Fifth International Symposium on High-Performance Computer Architecture, 1999

Impact of Switch Design on the Application Performance of Cache-Coherent Multiprocessors.
Proceedings of the 12th International Parallel Processing Symposium / 9th Symposium on Parallel and Distributed Processing (IPPS/SPDP '98), March 30, 1998

Performance of Multistage Bus Networks for a Distributed Shared Memory Multiprocessor.
IEEE Trans. Parallel Distributed Syst., 1997
