Laxmi N. Bhuyan

Orcid: 0000-0002-8759-0458

Affiliations:
  • University of California, Riverside, USA


According to our database, Laxmi N. Bhuyan authored at least 252 papers between 1982 and 2023.

Awards

ACM Fellow 2000, "For his significant contributions to the design and analysis of Interconnection Networks and Parallel Processing."

Bibliography

2023
GreenMD: Energy-efficient Matrix Decomposition on Heterogeneous Multi-GPU Systems.
ACM Trans. Parallel Comput., June, 2023

Improving Energy Saving of One-Sided Matrix Decompositions on CPU-GPU Heterogeneous Systems.
Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2023

2022
Synergy: A SmartNIC Accelerated 5G Dataplane and Monitor for Mobility Prediction.
Proceedings of the 30th IEEE International Conference on Network Protocols, 2022

Cottage: Coordinated Time Budget Assignment for Latency, Quality and Power Optimization in Web Search.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

2021
PAVER: Locality Graph-Based Thread Block Scheduling for GPUs.
ACM Trans. Archit. Code Optim., 2021

pMACH: Power and Migration Aware Container scHeduling.
Proceedings of the 29th IEEE International Conference on Network Protocols, 2021

SmartWatch: accurate traffic analysis and flow-state tracking for intrusion prevention using SmartNICs.
Proceedings of the CoNEXT '21: The 17th International Conference on emerging Networking EXperiments and Technologies, Virtual Event, Munich, Germany, December 7, 2021

2020
Gemini: Learning to Manage CPU Power for Latency-Critical Search Engines.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

Swan: a two-step power management for distributed search engines.
Proceedings of the ISLPED '20: ACM/IEEE International Symposium on Low Power Electronics and Design, 2020

SAOU: safe adaptive overclocking and undervolting for energy-efficient GPU computing.
Proceedings of the ISLPED '20: ACM/IEEE International Symposium on Low Power Electronics and Design, 2020

Slumber: static-power management for GPGPU register files.
Proceedings of the ISLPED '20: ACM/IEEE International Symposium on Low Power Electronics and Design, 2020

2019
P4NFV: P4 Enabled NFV Systems with SmartNICs.
Proceedings of the IEEE Conference on Network Function Virtualization and Software Defined Networks, 2019

GreenMM: energy efficient GPU matrix multiplication through undervolting.
Proceedings of the ACM International Conference on Supercomputing, 2019

Goldilocks: Adaptive Resource Provisioning in Containerized Data Centers.
Proceedings of the 39th IEEE International Conference on Distributed Computing Systems, 2019

μDPM: Dynamic Power Management for the Microsecond Era.
Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019

DREAM: DistRibuted Energy-Aware traffic Management for Data Center Networks.
Proceedings of the Tenth ACM International Conference on Future Energy Systems, 2019

2018
Juggler: a dependence-aware task-based execution framework for GPUs.
Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

Joint Server and Network Energy Saving in Data Centers for Latency-Sensitive Applications.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

CAMO: A novel cache management organization for GPGPUs.
Proceedings of the 23rd Asia and South Pacific Design Automation Conference, 2018

2017
Enabling Work-Efficiency for High Performance Vertex-Centric Graph Analytics on GPUs.
Proceedings of the Seventh Workshop on Irregular Applications: Architectures and Algorithms, 2017

Wireframe: supporting data-dependent parallelism through dependency graph execution in GPUs.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

TailCut: Power Reduction under Quality and Latency Constraints in Distributed Search Systems.
Proceedings of the 37th IEEE International Conference on Distributed Computing Systems, 2017

2016
Tumbler: An Effective Load-Balancing Technique for Multi-CPU Multicore Systems.
ACM Trans. Archit. Code Optim., 2016

GreenLA: green linear algebra software for GPU-accelerated heterogeneous computing.
Proceedings of the International Conference for High Performance Computing, 2016

DynSleep: Fine-grained Power Management for a Latency-Critical Data Center Application.
Proceedings of the 2016 International Symposium on Low Power Electronics and Design, 2016

Eliminating Intra-Warp Load Imbalance in Irregular Nested Patterns via Collaborative Task Engagement.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

CuMAS: Data Transfer Aware Multi-Application Scheduling for Shared GPUs.
Proceedings of the 2016 International Conference on Supercomputing, 2016

2015
Design and analysis of collaborative EPC and RAN caching for LTE mobile networks.
Comput. Networks, 2015

Efficient warp execution in presence of divergence with collaborative context collection.
Proceedings of the 48th International Symposium on Microarchitecture, 2015

PeerWave: Exploiting Wavefront Parallelism on GPUs with Peer-SM Synchronization.
Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

A multicore vacation scheme for thermal-aware packet processing.
Proceedings of the 33rd IEEE International Conference on Computer Design, 2015

Scalable SIMD-Efficient Graph Processing on GPUs.
Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

Stadium Hashing: Scalable and Flexible Hashing on GPUs.
Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

2014
Lock contention aware thread migrations.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2014

LightPlay: Efficient Replay with GPUs.
Proceedings of the Languages and Compilers for Parallel Computing, 2014

Optimistic Parallelism on GPUs.
Proceedings of the Languages and Compilers for Parallel Computing, 2014

fAHRW+: Fairness-aware and locality-enhanced scheduling for multi-server systems.
Proceedings of the 20th IEEE International Conference on Parallel and Distributed Systems, 2014

A scalable hash scheduler for decoding of multiple H.264/AVC streams on multi-core architecture.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2014

An efficient dynamic scheduling scheme for H.264/AVC encoding on multi-core architecture.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2014

CuSha: vertex-centric graph processing on GPUs.
Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing, 2014

A paradigm shift in GP-GPU computing: task based execution of applications with dynamic data dependencies.
Proceedings of the DIDC'14, 2014

Thermal-aware vacation and rate adaptation for network packet processing.
Proceedings of the tenth ACM/IEEE symposium on Architectures for networking and communications systems, 2014

Shuffling: a framework for lock contention aware thread scheduling for multicore multiprocessor systems.
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2013
ADAPT: A framework for coscheduling multithreaded programs.
ACM Trans. Archit. Code Optim., 2013

A dynamic self-scheduling scheme for heterogeneous multiprocessor architectures.
ACM Trans. Archit. Code Optim., 2013

A hybrid shared memory heterogeneous execution platform for PCIe-based GPGPUs.
Proceedings of the 20th Annual International Conference on High Performance Computing, 2013

Shared memory heterogeneous computation on PCIe-supported platforms.
Proceedings of the 23rd International Conference on Field programmable Logic and Applications, 2013

Thermal prediction and scheduling of network applications on multicore processors.
Proceedings of the Symposium on Architecture for Networking and Communications Systems, 2013

2012
Maintaining Data Consistency in Structured P2P Systems.
IEEE Trans. Parallel Distributed Syst., 2012

An Efficient Parallelized L7-Filter Design for Multicore Servers.
IEEE/ACM Trans. Netw., 2012

Load-Balancing Multipath Switching System with Flow Slice.
IEEE Trans. Computers, 2012

Thread Tranquilizer: Dynamically reducing performance variation.
ACM Trans. Archit. Code Optim., 2012

Analyzing performance and power efficiency of network processing over 10 GbE.
J. Parallel Distributed Comput., 2012

Peer-to-peer indirect reciprocity via personal currency.
J. Parallel Distributed Comput., 2012

P2P consistency support for large-scale interactive applications.
Comput. Networks, 2012

Speculative parallelization on GPGPUs.
Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2012

Improving the throughput and delay performance of network processors by applying push model.
Proceedings of the 20th IEEE International Workshop on Quality of Service, 2012

An efficient dynamic multiple-candidate motion vector approach for GPU-based hierarchical motion estimation.
Proceedings of the 31st IEEE International Performance Computing and Communications Conference, 2012

An Adaptive Dynamic Scheduling Scheme for H.264/AVC Decoding on Multicore Architecture.
Proceedings of the 2012 IEEE International Conference on Multimedia and Expo, 2012

Traffic-aware power optimization for network applications on multicore servers.
Proceedings of the 49th Annual Design Automation Conference 2012, 2012

2011
A QoS aware multicore hash scheduler for network applications.
Proceedings of the INFOCOM 2011. 30th IEEE International Conference on Computer Communications, 2011

Thread reinforcer: Dynamically determining number of threads via OS level monitoring.
Proceedings of the 2011 IEEE International Symposium on Workload Characterization, 2011

A new server I/O architecture for high speed networks.
Proceedings of the 17th International Conference on High-Performance Computer Architecture (HPCA-17 2011), 2011

E-AHRW: An Energy-Efficient Adaptive Hash Scheduler for Stream Processing on Multi-core Servers.
Proceedings of the 2011 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS), 2011

Predictive Model-Based Thermal Management for Network Applications.
Proceedings of the 2011 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS), 2011

No More Backstabbing... A Faithful Scheduling Policy for Multithreaded Programs.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

2010
Performance characterization of multi-thread and multi-core processors based XML application oriented networking systems.
J. Parallel Distributed Comput., 2010

Optimizing Throughput and Latency under Given Power Budget for Network Packet Processing.
Proceedings of the INFOCOM 2010. 29th IEEE International Conference on Computer Communications, 2010

A Balanced Consistency Maintenance Protocol for Structured P2P Systems.
Proceedings of the INFOCOM 2010. 29th IEEE International Conference on Computer Communications, 2010

Understanding Power Efficiency of TCP/IP Packet Processing over 10GbE.
Proceedings of the IEEE 18th Annual Symposium on High Performance Interconnects, 2010

Experience on Applying Push Model to Packet Processors in High Performance Routers.
Proceedings of the Global Communications Conference, 2010

A new IP lookup cache for high performance IP routers.
Proceedings of the 47th Design Automation Conference, 2010

LATA: a latency and throughput-aware packet processing system.
Proceedings of the 47th Design Automation Conference, 2010

A new TCB cache to efficiently manage TCP sessions for web servers.
Proceedings of the 2010 ACM/IEEE Symposium on Architecture for Networking and Communications Systems, 2010

Power optimization for multimedia transcoding on multicore servers.
Proceedings of the 2010 ACM/IEEE Symposium on Architecture for Networking and Communications Systems, 2010

2009
Editorial: EIC Farewell and New EIC Introduction.
IEEE Trans. Parallel Distributed Syst., 2009

Budget-Based Self-Optimized Incentive Search in Unstructured P2P Networks.
Proceedings of the INFOCOM 2009. 28th IEEE International Conference on Computer Communications, 2009

Performance characterization and cache-aware core scheduling in a virtualized multi-core server under 10GbE.
Proceedings of the 2009 IEEE International Symposium on Workload Characterization, 2009

A Hash-based Scalable IP lookup using Bloom and Fingerprint Filters.
Proceedings of the 17th annual IEEE International Conference on Network Protocols, 2009

Performance Measurement of an Integrated NIC Architecture with 10GbE.
Proceedings of the 17th IEEE Symposium on High Performance Interconnects, 2009

EINIC: an architecture for high bandwidth network I/O on multi-core processors.
Proceedings of the 2009 ACM/IEEE Symposium on Architecture for Networking and Communications Systems, 2009

An adaptive hash-based multilayer scheduler for L7-filter on a highly threaded hierarchical multi-core server.
Proceedings of the 2009 ACM/IEEE Symposium on Architecture for Networking and Communications Systems, 2009

2008
Editor's Note.
IEEE Trans. Parallel Distributed Syst., 2008

Ordered Round-Robin: An Efficient Sequence Preserving Packet Scheduler.
IEEE Trans. Computers, 2008

Fair link striping with FIFO delivery on heterogeneous channels.
Comput. Commun., 2008

The P2P war: Someone is monitoring your activities.
Comput. Networks, 2008

POND: The Power of Zone Overlapping in DHT Networks.
Proceedings of The 2008 IEEE International Conference on Networking, 2008

Performance Characterization of a Dual Quad-Core Based Application Oriented Networking System.
Proceedings of The 2008 IEEE International Conference on Networking, 2008

An effective pointer replication algorithm in P2P networks.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

PROD: Relayed file retrieving in overlay networks.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Cyber-Fraud is One Typo Away.
Proceedings of the INFOCOM 2008. 27th IEEE International Conference on Computer Communications, 2008

Quantum-Adaptive Scheduling for Multi-Core Network Processors.
Proceedings of the 28th IEEE International Conference on Distributed Computing Systems (ICDCS 2008), 2008

A Novel Service-Aware Message Scheduler for Cisco Application Oriented Networking Systems.
Proceedings of the 17th International Conference on Computer Communications and Networks, 2008

Intelligent Message Scheduling in Application Oriented Networking Systems.
Proceedings of IEEE International Conference on Communications, 2008

Revisiting the Cache Effect on Multicore Multithreaded Network Processors.
Proceedings of the 11th Euromicro Conference on Digital System Design: Architectures, 2008

Software techniques to improve virtualized I/O performance on multi-core systems.
Proceedings of the 2008 ACM/IEEE Symposium on Architecture for Networking and Communications Systems, 2008

A scalable multithreaded L7-filter design for multi-core servers.
Proceedings of the 2008 ACM/IEEE Symposium on Architecture for Networking and Communications Systems, 2008

2007
Hardware Support for Accelerating Data Movement in Server Platform.
IEEE Trans. Computers, 2007

Conserving network processor power consumption by exploiting traffic variability.
ACM Trans. Archit. Code Optim., 2007

Scalable and Decentralized Content-Aware Dispatching in Web Clusters.
Proceedings of the 26th IEEE International Performance Computing and Communications Conference, 2007

Adaptive Max-Min Fair Scheduling in Buffered Crossbar Switches Without Speedup.
Proceedings of the INFOCOM 2007. 26th IEEE International Conference on Computer Communications, 2007

Lexicographic Fairness in WDM Optical Cross-Connects.
Proceedings of the INFOCOM 2007. 26th IEEE International Conference on Computer Communications, 2007

Clustered K-Center: Effective Replica Placement in Peer-to-Peer Systems.
Proceedings of the Global Communications Conference, 2007

Program Mapping onto Network Processors by Recursive Bipartitioning and Refining.
Proceedings of the 44th Design Automation Conference, 2007

Flow-slice: a novel load-balancing scheme for multi-path switching systems.
Proceedings of the 2007 ACM/IEEE Symposium on Architecture for Networking and Communications Systems, 2007

Compiling PCRE to FPGA for accelerating SNORT IDS.
Proceedings of the 2007 ACM/IEEE Symposium on Architecture for Networking and Communications Systems, 2007

2006
Load Balancing in a Cluster-Based Web Server for Multimedia Applications.
IEEE Trans. Parallel Distributed Syst., 2006

Editorial: A Message from the New Editor-in-Chief.
IEEE Trans. Parallel Distributed Syst., 2006

Tulip: A New Hash Based Cooperative Web Caching Architecture.
J. Supercomput., 2006

A Network Processor-Based, Content-Aware Switch.
IEEE Micro, 2006

Application Oriented Networking (AON): Adding Intelligence to Next-Generation Internet Routers.
Proceedings of the Wireless Algorithms, 2006

Computing Real Time Jobs in P2P Networks.
Proceedings of the LCN 2006, 2006

Fair Scheduling over multiple servers with flow-dependent server rate.
Proceedings of the LCN 2006, 2006

Efficient server cooperation mechanism in content delivery network.
Proceedings of the 25th IEEE International Performance Computing and Communications Conference, 2006

Effective Load Balancing in P2P Systems.
Proceedings of the Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2006), 2006

2005
EaseCAM: An Energy and Storage Efficient TCAM-Based Router Architecture for IP Lookup.
IEEE Trans. Computers, 2005

Anatomy of UDP and M-VIA for cluster communication.
J. Parallel Distributed Comput., 2005

An Experimental Evaluation of the HP V-Class and SGI Origin 2000 Multiprocessors using Microbenchmarks and Scientific Applications.
Int. J. Parallel Program., 2005

Anatomy and Performance of SSL Processing.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005

QoS Aware Job Scheduling in a Cluster-Based Web Server for Multimedia Applications.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Efficient file sharing strategy in DHT based P2P systems.
Proceedings of the 24th IEEE International Performance Computing and Communications Conference, 2005

An efficient packet scheduling algorithm in network processors.
Proceedings of the INFOCOM 2005. 24th Annual Joint Conference of the IEEE Computer and Communications Societies, 2005

Hardware Support for Bulk Data Movement in Server Platforms.
Proceedings of the 23rd International Conference on Computer Design (ICCD 2005), 2005

On fair scheduling in heterogeneous link aggregated services.
Proceedings of the 14th International Conference On Computer Communications and Networks, 2005

Design and Implementation of a Content-Aware Switch Using a Network Processor.
Proceedings of the 13th Annual IEEE Symposium on High Performance Interconnects (HOTIC 2005), 2005

Performance Characterization of a 10-Gigabit Ethernet TOE.
Proceedings of the 13th Annual IEEE Symposium on High Performance Interconnects (HOTIC 2005), 2005

Enhancing Network Processor Simulation Speed with Statistical Input Sampling.
Proceedings of the High Performance Embedded Architectures and Compilers, 2005

Achieving fairness and throughput for best-effort traffic in input-queued crossbar switches.
Proceedings of the Global Telecommunications Conference, 2005. GLOBECOM '05, St. Louis, Missouri, USA, 28 November, 2005

Optimal network processor topologies for efficient packet processing.
Proceedings of the Global Telecommunications Conference, 2005. GLOBECOM '05, St. Louis, Missouri, USA, 28 November, 2005

Distributed packet processing in P2P networks.
Proceedings of the Global Telecommunications Conference, 2005. GLOBECOM '05, St. Louis, Missouri, USA, 28 November, 2005

QoS-aware object replica placement in CDNs.
Proceedings of the Global Telecommunications Conference, 2005. GLOBECOM '05, St. Louis, Missouri, USA, 28 November, 2005

Guaranteed smooth switch scheduling with low complexity.
Proceedings of the Global Telecommunications Conference, 2005. GLOBECOM '05, St. Louis, Missouri, USA, 28 November, 2005

Low power network processor design using clock gating.
Proceedings of the 42nd Design Automation Conference, 2005

SpliceNP: a TCP splicer using a network processor.
Proceedings of the 2005 ACM/IEEE Symposium on Architecture for Networking and Communications Systems, 2005

2004
NePSim: A Network Processor Simulator with a Power Evaluation Framework.
IEEE Micro, 2004

Assertion Based Verification and Analysis of Network Processor Architectures.
Des. Autom. Embed. Syst., 2004

An Efficient and Robust Web Caching System.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

Exploiting Client Cache: A Scalable and Efficient Approach to Build Large Web Cache.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

Load Balancing of DNS-Based Distributed Web Server Systems with Page Caching.
Proceedings of the 10th International Conference on Parallel and Distributed Systems, 2004

An efficient scheduling algorithm for combined input-crosspoint-queued (CICQ) switches.
Proceedings of the Global Telecommunications Conference, 2004. GLOBECOM '04, Dallas, Texas, USA, 29 November, 2004

Scheduling real-time multimedia tasks in network processors.
Proceedings of the Global Telecommunications Conference, 2004. GLOBECOM '04, Dallas, Texas, USA, 29 November, 2004

Utilizing Formal Assertions for System Design of Network Processors.
Proceedings of the 2004 Design, 2004

2003
Shared memory multiprocessor architectures for software IP routers.
IEEE Trans. Parallel Distributed Syst., 2003

Switch MSHR: A Technique to Reduce Remote Read Memory Access Time in CC-NUMA Multiprocessors.
IEEE Trans. Computers, 2003

Deficit round-robin scheduling for input-queued switches.
IEEE J. Sel. Areas Commun., 2003

Fair Scheduling for Input Buffered Switches.
Clust. Comput., 2003

Load Sharing in a Transcoding Cluster.
Proceedings of the Distributed Computing, 2003

A Cluster-Based Active Router Architecture Supporting Video/Audio Stream Transcoding Service.
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

Architectural analysis and instruction-set optimization for design of network protocol processors.
Proceedings of the 1st IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2003

Power efficient encoding techniques for off-chip data buses.
Proceedings of the International Conference on Compilers, 2003

2002
Fair Scheduling in Internet Routers.
IEEE Trans. Computers, 2002

Design and analysis of static memory management policies for CC-NUMA multiprocessors.
J. Syst. Archit., 2002

Comparing the Memory System Performance of DSS Workloads on the HP V-Class and SGI Origin 2000.
Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 2002

Fair Scheduling and Buffer Management in Internet Routers.
Proceedings of IEEE INFOCOM 2002, 2002

2001
Execution-Driven Simulation of IP Router Architectures.
Proceedings of the IEEE International Symposium on Network Computing and Applications (NCA 2001), 2001

2000
Impact of CC-NUMA Memory Management Policies on the Application Performance of Multistage Switching Networks.
IEEE Trans. Parallel Distributed Syst., 2000

Design and Evaluation of a Switch Cache Architecture for CC-NUMA Multiprocessors.
IEEE Trans. Computers, 2000

Exploring the Switch Design Space in a CC-NUMA Multiprocessor Environment.
Proceedings of the 14th International Parallel & Distributed Processing Symposium (IPDPS'00), 2000

Using Switch Directories to Speed Up Cache-to-Cache Transfers in CC-NUMA Multiprocessors.
Proceedings of the 14th International Parallel & Distributed Processing Symposium (IPDPS'00), 2000

Hardware spatial forwarding for widely shared data.
Proceedings of the 14th international conference on Supercomputing, 2000

Hierarchical Simulation of a Multiprocessor Architecture.
Proceedings of the IEEE International Conference On Computer Design: VLSI In Computers & Processors, 2000

A wave-pipelined router architecture using ternary associative memory.
Proceedings of the 10th ACM Great Lakes Symposium on VLSI 2000, 2000

1999
An Efficient Tree Cache Coherence Protocol for Distributed Shared Memory Multiprocessors.
IEEE Trans. Computers, 1999

A Flexible Clustering and Scheduling Scheme for Efficient Parallel Computation.
Proceedings of the 13th International Parallel Processing Symposium / 10th Symposium on Parallel and Distributed Processing (IPPS / SPDP '99), 1999

Comparing the memory system performance of the HP V-class and SGI Origin 2000 multiprocessors using microbenchmarks and scientific applications.
Proceedings of the 13th international conference on Supercomputing, 1999

The Impact of Link Arbitration on Switch Performance.
Proceedings of the Fifth International Symposium on High-Performance Computer Architecture, 1999

Switch Cache: A Framework for Improving the Remote Memory Access Latency of CC-NUMA Multiprocessors.
Proceedings of the Fifth International Symposium on High-Performance Computer Architecture, 1999

1998
Impact of Switch Design on the Application Performance of Cache-Coherent Multiprocessors.
Proceedings of the 12th International Parallel Processing Symposium / 9th Symposium on Parallel and Distributed Processing (IPPS/SPDP '98), March 30, 1998

Circular buffered switch design with wormhole routing and virtual channels.
Proceedings of the International Conference on Computer Design: VLSI in Computers and Processors, 1998

1997
Performance of Multistage Bus Networks for a Distributed Shared Memory Multiprocessor.
IEEE Trans. Parallel Distributed Syst., 1997

Evaluation of multi-queue buffered multistage interconnection networks under uniform and non-uniform traffic patterns.
Int. J. Syst. Sci., 1997

1996
Adaptive System-Level Diagnosis for Hypercube Multiprocessors.
IEEE Trans. Computers, 1996

Equalization of Digital Communication Channel Using Hartley-Neural Technique.
Proceedings of the Industrial and Engineering Applications of Artificial Intelligence and Expert Systems, 1996

Evaluating Virtual Channels for Cache-Coherent Shared-Memory Multiprocessors.
Proceedings of the 10th international conference on Supercomputing, 1996

An Efficient Hybrid Cache Coherence Protocol for Shared Memory Multiprocessors.
Proceedings of the 1996 International Conference on Parallel Processing, 1996

1995
Subcube Fault Tolerance in Hypercube Multiprocessors.
IEEE Trans. Computers, 1995

A Combinatorial Analysis of Subcube Reliability in Hypercubes.
IEEE Trans. Computers, 1995

Mapping Molecular Dynamics Computations on to Hypercubes.
Parallel Comput., 1995

High-performance computer architecture.
Future Gener. Comput. Syst., 1995

Accurate communication models for task scheduling in multicomputers.
Proceedings of the Seventh IEEE Symposium on Parallel and Distributed Processing, 1995

Fault-tolerant sorting in SIMD hypercubes.
Proceedings of IPPS '95, 1995

A Submesh Allocation Scheme for Mesh-Connected Multiprocessor Systems.
Proceedings of the 1995 International Conference on Parallel Processing, 1995

Partitioning an Arbitrary Multicomputer Architecture.
Proceedings of the 1995 International Conference on Parallel Processing, 1995

A dynamic cache sub-block design to reduce false sharing.
Proceedings of the 1995 International Conference on Computer Design (ICCD '95), 1995

Evaluation of multi-queue buffered multistage interconnection networks under uniform and nonuniform traffic patterns.
Proceedings of the 4th International Conference on Computer Communications and Networks (ICCCN '95), 1995

1994
Finite Buffer Analysis of Multistage Interconnection Networks.
IEEE Trans. Computers, 1994

A divide-and-conquer methodology for system-level diagnosis of processor arrays.
Proceedings of the Sixth IEEE Symposium on Parallel and Distributed Processing, 1994

Efficient and scalable cache coherence schemes for shared memory hypercube multiprocessors.
Proceedings of Supercomputing '94, 1994

A Distributed Cache Coherence Protocol for Hypercube Multiprocessors.
Proceedings of the 1994 International Conference on Parallel Processing, 1994

Performance and Reliability of the Multistage Bus Network.
Proceedings of the 1994 International Conference on Parallel Processing, 1994

1993
An Availability Model for MIN-Based Multiprocessors.
IEEE Trans. Parallel Distributed Syst., 1993

Design and Analysis of Cache Coherent Multistage Interconnection Networks.
IEEE Trans. Computers, 1993

Efficient Mapping of Applications on Cache Based Multiprocessors.
J. Parallel Distributed Comput., 1993

An Adaptive System-Level Diagnosis Approach for Hypercube Multiprocessors.
Proceedings of the Fifth IEEE Symposium on Parallel and Distributed Processing, 1993

Parallel Algorithms for Hypercube Allocation.
Proceedings of the Seventh International Parallel Processing Symposium, 1993

Parallel FFT Algorithms for Cache Based Shared Memory Multiprocessors.
Proceedings of the 1993 International Conference on Parallel Processing, 1993

An Adaptive System-Level Diagnosis Approach for Mesh Connected Multiprocessors.
Proceedings of the 1993 International Conference on Parallel Processing, 1993

An Adaptive Submesh Allocation Strategy For Two-Dimensional Mesh Connected Systems.
Proceedings of the 1993 International Conference on Parallel Processing, 1993

Fault Tolerant Subcube Allocation in Hypercubes.
Proceedings of the 1993 International Conference on Parallel Processing, 1993

1992
Design of an Adaptive Cache Coherence Protocol for Large Scale Multiprocessors.
IEEE Trans. Parallel Distributed Syst., 1992

Cache Coherent Shared Memory Hypercube Multiprocessors.
Proceedings of the Fourth IEEE Symposium on Parallel and Distributed Processing, 1992

Mapping Applications onto a Cache Coherent Multiprocessor.
Proceedings of Supercomputing '92, 1992

A Formal Specification and Verification Technique for Cache Coherence Protocols.
Proceedings of the 1992 International Conference on Parallel Processing, 1992

Extending Multistage Interconnection Networks for Multitasking.
Proceedings of the 1992 International Conference on Parallel Processing, 1992

1991
Analysis of Packet-Switched Multiple-Bus Multiprocessor Systems.
IEEE Trans. Computers, 1991

MVAMIN: Mean Value Analysis Algorithms for Multistage Interconnection Networks.
J. Parallel Distributed Comput., 1991

Multistage bus network (MBN): an interconnection network for cache coherent multiprocessors.
Proceedings of the Third IEEE Symposium on Parallel and Distributed Processing, 1991

Performance Analysis of Layered Task Graphs.
Proceedings of the International Conference on Parallel Processing, 1991

Performance Evaluation of Multistage Interconnection Networks with Finite Buffers.
Proceedings of the International Conference on Parallel Processing, 1991

Load balancing with network cooperation.
Proceedings of the 10th International Conference on Distributed Computing Systems (ICDCS 1991), 1991

1990
Performance Evaluation of a Dataflow Architecture.
IEEE Trans. Computers, 1990

Performance of Multiple-Bus Interconnections for Multiprocessors.
J. Parallel Distributed Comput., 1990

Dependability Modeling for Multiprocessors.
Computer, 1990

An adaptive cache coherence scheme for hierarchical shared-memory multiprocessors.
Proceedings of the Second IEEE Symposium on Parallel and Distributed Processing, 1990

Approximate Analysis of Multiprocessing Task Graphs.
Proceedings of the 1990 International Conference on Parallel Processing, 1990

Availability evaluation of MIN-connected multiprocessors using decomposition technique.
Proceedings of the 20th International Symposium on Fault-Tolerant Computing, 1990

1989
Analysis and Comparison of Cache Coherence Protocols for a Packet-Switched Multiprocessor.
IEEE Trans. Computers, 1989

Approximate Analysis of Single and Multiple Ring Networks.
IEEE Trans. Computers, 1989

Arbiter designs for multiprocessor interconnection networks.
Microprocessing and Microprogramming, 1989

Performance of Multiprocessor Interconnection Networks.
Computer, 1989

Analysis of Computation-Communication Issues in Dynamic Dataflow Architectures.
Proceedings of the 16th Annual International Symposium on Computer Architecture. Jerusalem, 1989

Analysis of MIN Based Multiprocessors with Private Cache Memories.
Proceedings of the International Conference on Parallel Processing, 1989

From Interconnection Network To Task Level Analysis.
Proceedings of the International Conference on Parallel Processing, 1989

A systolic approach to multistage interconnection network design.
Proceedings of the Computer Design: VLSI in Computers and Processors, 1989

1988
VLSI layout of binary tree structures.
Integr., 1988

Approximate Analysis of Task Graphs for Parallel Processing Systems.
Proceedings of the 1988 ACM SIGMETRICS conference on Measurement and modeling of computer systems, 1988

Design and analysis of multiple token ring networks.
Proceedings of the Seventh Annual Joint Conference of the IEEE Computer and Communications Societies. Networks: Evolution or Revolution?, 1988

A Queueing Network Model for a Cache Coherence Protocol on Multiple-bus Multiprocessors.
Proceedings of the International Conference on Parallel Processing, 1988

1987
Analysis of Interconnection Networks with Different Arbiter Designs.
J. Parallel Distributed Comput., 1987

Dependability evaluation of interconnection networks.
Inf. Sci., 1987

Guest Editor's Introduction: Interconnection Networks for Parallel and Distributed Processing.
Computer, 1987

Performance Analysis of Packet-Switched Multiple-Bus Multiprocessor Systems.
Proceedings of the 8th IEEE Real-Time Systems Symposium (RTSS '87), 1987

Analytical Modeling and Architectural Modifications of a Dataflow Computer.
Proceedings of the 14th Annual International Symposium on Computer Architecture. Pittsburgh, 1987

Design and Analysis of a Decentralized Multiple-Bus Multiprocessor.
Proceedings of the International Conference on Parallel Processing, 1987

Performance Analysis of the MIT Tagged Token Dataflow Architecture.
Proceedings of the International Conference on Parallel Processing, 1987

1986
Dependability Evaluation of Multicomputer Networks.
Proceedings of the International Conference on Parallel Processing, 1986

Effect of Arbitration Policies on the Performance of Interconnection Networks.
Proceedings of the International Conference on Parallel Processing, 1986

1985
Bandwidth Availability of Multiple-Bus Multiprocessors.
IEEE Trans. Computers, 1985

An Analysis of Processor-Memory Interconnection Networks.
IEEE Trans. Computers, 1985

Computation Availability of Multiple-Bus Multiprocessors.
Proceedings of the International Conference on Parallel Processing, 1985

Reliability Simulation of Multiprocessor Systems.
Proceedings of the International Conference on Parallel Processing, 1985

Introduction to session R2 (session overview): advanced computer architectures.
Proceedings of the 13th ACM Annual Conference on Computer Science, 1985

1984
Generalized Hypercube and Hyperbus Structures for a Computer Network.
IEEE Trans. Computers, 1984

On the Performance of Loosely Coupled Multiprocessors.
Proceedings of the 11th Annual Symposium on Computer Architecture, 1984

1983
Performance Analysis of FFT Algorithms on Multiprocessor Systems.
IEEE Trans. Software Eng., 1983

Design and Performance of Generalized Interconnection Networks.
IEEE Trans. Computers, 1983

An Interference Analysis of Interconnection Networks.
Proceedings of the International Conference on Parallel Processing, 1983

1982
On the Generalized Binary System.
IEEE Trans. Computers, 1982

A general class of processor interconnection strategies.
Proceedings of the 9th International Symposium on Computer Architecture (ISCA 1982), 1982

Design and performance of a general class of interconnection networks.
Proceedings of the International Conference on Parallel Processing, 1982

VLSI Performance of Multistage Interconnection Network Using 4*4 Switches.
Proceedings of the 3rd International Conference on Distributed Computing Systems, 1982

Applications of SIMD computers in signal processing.
Proceedings of the American Federation of Information Processing Societies: 1982 National Computer Conference, 1982

