2025
FPGA-based accelerator for adaptive banded event alignment in nanopore sequencing data analysis.
BMC Bioinform., December, 2025
Pirate: No Compromise Low-Bandwidth VR Streaming for Edge Devices.
Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2025
Load and MLP-Aware Thread Orchestration for Recommendation Systems Inference on CPUs.
Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2025
2024
Parallelization Strategies for DLRM Embedding Bag Operator on AMD CPUs.
IEEE Micro, 2024
Salient Store: Enabling Smart Storage for Continuous Learning Edge Servers.
CoRR, 2024
Synergistic and Efficient Edge-Host Communication for Energy Harvesting Wireless Sensor Networks.
CoRR, 2024
Revisiting DNN Training for Intermittently Powered Energy Harvesting Micro Computers.
CoRR, 2024
GPU Cluster Scheduling for Network-Sensitive Deep Learning.
CoRR, 2024
Towards SLO-Compliant and Cost-Effective Serverless Computing on Emerging GPU Architectures.
Proceedings of the 25th International Middleware Conference, 2024
Pushing the Performance Envelope of DNN-based Recommendation Systems Inference on GPUs.
Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024
GameStreamSR: Enabling Neural-Augmented Game Streaming on Commodity Mobile Platforms.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024
Paldia: Enabling SLO-Compliant and Cost-Effective Serverless Computing on Heterogeneous Hardware.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024
Foveated HDR: Efficient HDR Content Generation on Edge Devices Leveraging User's Visual Attention.
Proceedings of the 43rd IEEE/ACM International Conference on Computer-Aided Design, 2024
Usas: A Sustainable Continuous-Learning Framework for Edge Servers.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024
FAAStloop: Optimizing Loop-Based Applications for Serverless Computing.
Proceedings of the 2024 ACM Symposium on Cloud Computing, 2024
2023
Federated Learning with Spiking Neural Networks in Heterogeneous Systems.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2023
Optimizing CPU Performance for Recommendation Systems At-Scale.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023
EdgePC: Efficient Deep Learning Analytics for Point Clouds on Edge Devices.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023
Stash: A Comprehensive Stall-Centric Characterization of Public Cloud VMs for Distributed Deep Learning.
Proceedings of the 43rd IEEE International Conference on Distributed Computing Systems, 2023
2022
End-to-end Characterization of Game Streaming Applications on Mobile Platforms.
Proc. ACM Meas. Anal. Comput. Syst., 2022
Analysis of Distributed Deep Learning in the Cloud.
CoRR, 2022
Cocktail: A Multidimensional Optimization for Model Serving in Cloud.
Proceedings of the 19th USENIX Symposium on Networked Systems Design and Implementation, 2022
Skipper: Enabling efficient SNN training through activation-checkpointing and time-skipping.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022
Pushing Point Cloud Compression to the Edge.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022
Exploiting Frame Similarity for Efficient Inference on Edge Devices.
Proceedings of the 42nd IEEE International Conference on Distributed Computing Systems, 2022
Cypress: input size-sensitive container provisioning and request scheduling for serverless platforms.
Proceedings of the 13th Symposium on Cloud Computing, SoCC 2022, 2022
SandPiper: A Cost-Efficient Adaptive Framework for Online Recommender Systems.
Proceedings of the IEEE International Conference on Big Data, 2022
2021
Exploiting Activation based Gradient Output Sparsity to Accelerate Backpropagation in CNNs.
CoRR, 2021
Cocktail: Leveraging Ensemble Learning for Optimized Model Serving in Public Cloud.
CoRR, 2021
Structured in Space, Randomized in Time: Leveraging Dropout in RNNs for Efficient Training.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021
HoloAR: On-the-fly Optimization of 3D Holographic Processing for Augmented Reality.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021
Gesture-SNN: Co-optimizing accuracy, latency and energy of SNNs for neuromorphic vision sensors.
Proceedings of the IEEE/ACM International Symposium on Low Power Electronics and Design, 2021
GYAN: Accelerating Bioinformatics Tools in Galaxy with GPU-Aware Computation Mapping.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2021
GSSA: A Resource Allocation Scheme Customized for 3D NAND SSDs.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021
Kraken: Adaptive Container Provisioning for Deploying Dynamic DAGs in Serverless Platforms.
Proceedings of the SoCC '21: ACM Symposium on Cloud Computing, 2021
CASH: A Credit Aware Scheduling for Public Cloud Platforms.
Proceedings of the 21st IEEE/ACM International Symposium on Cluster, 2021
2020
Fifer: Tackling Underutilization in the Serverless Era.
CoRR, 2020
Towards Designing a Self-Managed Machine Learning Inference Serving System in Public Cloud.
CoRR, 2020
Selective Caching: Avoiding Performance Valleys in Massively Parallel Architectures.
Proceedings of the 28th Euromicro International Conference on Parallel, 2020
Fifer: Tackling Resource Underutilization in the Serverless Era.
Proceedings of the Middleware '20: 21st International Middleware Conference, 2020
Implications of Public Cloud Resource Heterogeneity for Inference Serving.
Proceedings of the WoSC@Middleware 2020: Proceedings of the 2020 Sixth International Workshop on Serverless Computing, 2020
Déjà View: Spatio-Temporal Compute Reuse for Energy-Efficient 360° VR Video Streaming.
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020
NEBULA: A Neuromorphic Spin-Based Ultra-Low Power Architecture for SNNs and ANNs.
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020
Selective Event Processing for Energy Efficient Mobile Gaming with SNIP.
Proceedings of the IEEE International Symposium on Workload Characterization, 2020
Characterizing Bottlenecks in Scheduling Microservices on Serverless Platforms.
Proceedings of the 40th IEEE International Conference on Distributed Computing Systems, 2020
Multiverse: Dynamic VM Provisioning for Virtualized High Performance Computing Clusters.
Proceedings of the 20th IEEE/ACM International Symposium on Cluster, 2020
2019
Distilling the Essence of Raw Video to Reduce Memory Usage and Energy at Edge Devices.
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019
CASH: compiler assisted hardware design for improving DRAM energy efficiency in CNN inference.
Proceedings of the International Symposium on Memory Systems, 2019
Opportunistic computing in GPU architectures.
Proceedings of the 46th International Symposium on Computer Architecture, 2019
Understanding Energy Efficiency in IoT App Executions.
Proceedings of the 39th IEEE International Conference on Distributed Computing Systems, 2019
Kube-Knots: Resource Harvesting through Dynamic Container Orchestration in GPU-based Datacenters.
Proceedings of the 2019 IEEE International Conference on Cluster Computing, 2019
SOML Read: Rethinking the Read Operation Granularity of 3D NAND SSDs.
Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019
Spock: Exploiting Serverless Functions for SLO and Cost Aware Resource Procurement in Public Cloud.
Proceedings of the 12th IEEE International Conference on Cloud Computing, 2019
2018
Stochastic Modeling and Optimization of Stragglers.
IEEE Trans. Cloud Comput., 2018
Performance and Power-Efficient Design of Dense Non-Volatile Cache in CMPs.
IEEE Trans. Computers, 2018
Quantifying Data Locality in Dynamic Parallelism in GPUs.
Proc. ACM Meas. Anal. Comput. Syst., 2018
Holistic Management of the GPGPU Memory Hierarchy to Manage Warp-level Latency Tolerance.
CoRR, 2018
A Learning-Guided Hierarchical Approach for Biomedical Image Segmentation.
Proceedings of the 31st IEEE International System-on-Chip Conference, 2018
CritICs Critiquing Criticality in Mobile Apps.
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018
Tolerating Write Disturbance Errors in PCM: Experimental Characterization, Analysis, and Mechanisms.
Proceedings of the 26th IEEE International Symposium on Modeling, 2018
Content Popularity-Based Selective Replication for Read Redirection in SSDs.
Proceedings of the 26th IEEE International Symposium on Modeling, 2018
Hybrid-comp: A criticality-aware compressed last-level cache.
Proceedings of the 19th International Symposium on Quality Electronic Design, 2018
Reviving Zombie Pages on SSDs.
Proceedings of the 2018 IEEE International Symposium on Workload Characterization, 2018
Parallelizing garbage collection with I/O to improve flash resource utilization.
Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing, 2018
FLOSS: FLOw sensitive scheduling on mobile platforms.
Proceedings of the 55th Annual Design Automation Conference, 2018
The Curious Case of Container Orchestration and Scheduling in GPU-based Datacenters.
Proceedings of the ACM Symposium on Cloud Computing, 2018
2017
HL-PCM: MLC PCM Main Memory with Accelerated Read.
IEEE Trans. Parallel Distributed Syst., 2017
Optimizing energy consumption in GPUs through feedback-driven CTA scheduling.
Proceedings of the 25th High Performance Computing Symposium, Virginia Beach, VA, USA, April 23, 2017
A Study on Performance and Power Efficiency of Dense Non-Volatile Caches in Multi-Core Systems.
Proceedings of the 2017 ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems, Urbana-Champaign, IL, USA, June 05, 2017
Race-to-sleep + content caching + display caching: a recipe for energy-efficient video streaming on handhelds.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017
REMAP: a reliability/endurance mechanism for advancing PCM.
Proceedings of the International Symposium on Memory Systems, 2017
DEMM: A Dynamic Energy-Saving Mechanism for Multicore Memories.
Proceedings of the 25th IEEE International Symposium on Modeling, 2017
Quantifying the Potential Benefits of On-chip Near-Data Computing in Manycore Processors.
Proceedings of the 25th IEEE International Symposium on Modeling, 2017
Characterizing diverse handheld apps for customized hardware acceleration.
Proceedings of the 2017 IEEE International Symposium on Workload Characterization, 2017
Phoenix: A Constraint-Aware Scheduler for Heterogeneous Datacenters.
Proceedings of the 37th IEEE International Conference on Distributed Computing Systems, 2017
A Scale-Out Enterprise Storage Architecture.
Proceedings of the 2017 IEEE International Conference on Computer Design, 2017
Leveraging value locality for efficient design of a hybrid cache in multicore processors.
Proceedings of the 2017 IEEE/ACM International Conference on Computer-Aided Design, 2017
Controlled Kernel Launch for Dynamic Parallelism in GPUs.
Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017
Exploring the Potential for Collaborative Data Compression and Hard-Error Tolerance in PCM Memories.
Proceedings of the 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2017
Co-training of Feature Extraction and Classification using Partitioned Convolutional Neural Networks.
Proceedings of the 54th Annual Design Automation Conference, 2017
Exploiting Intra-Request Slack to Improve SSD Performance.
Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017
2016
A Framework for Accelerating Bottlenecks in GPU Execution with Assist Warps.
CoRR, 2016
Exploiting Core Criticality for Enhanced GPU Performance.
Proceedings of the 2016 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Science, 2016
Exploring the potentials of parallel garbage collection in SSDs for enterprise storage systems.
Proceedings of the International Conference for High Performance Computing, 2016
OSCAR: Orchestrating STT-RAM cache traffic for heterogeneous CPU-GPU architectures.
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016
Storage consolidation: Not always a panacea, but can we ease the pain?
Proceedings of the 2016 IEEE International Symposium on Performance Analysis of Systems and Software, 2016
MLC PCM main memory with accelerated read.
Proceedings of the 2016 IEEE International Symposium on Performance Analysis of Systems and Software, 2016
Boosting Access Parallelism to PCM-Based Main Memory.
Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016
Re-NUCA: A Practical NUCA Architecture for ReRAM Based Last-Level Caches.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016
Scheduling Techniques for GPU Architectures with Processing-In-Memory Capabilities.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016
μC-States: Fine-grained GPU Datapath Power Management.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016
2015
Guest Editorial: SBAC-PAD 2013.
Int. J. Parallel Program., 2015
Anatomy of GPU Memory System for Multi-Application Execution.
Proceedings of the 2015 International Symposium on Memory Systems, 2015
A case for core-assisted bottleneck acceleration in GPUs: enabling flexible data compression with assist warps.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015
VIP: virtualizing IP chains on handheld platforms.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015
Evaluating the Combined Impact of Node Architecture and Cloud Workload Characteristics on Network Traffic and Performance/Cost.
Proceedings of the 2015 IEEE International Symposium on Workload Characterization, 2015
Domain knowledge based energy management in handhelds.
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015
Storage Consolidation on SSDs: Not Always a Panacea, but Can We Ease the Pain?
Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015
Exploiting Staleness for Approximating Loads on CMPs.
Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015
Exploiting Inter-Warp Heterogeneity to Improve GPGPU Performance.
Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015
2014
GemDroid: a framework to evaluate mobile platforms.
Proceedings of the ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems, 2014
Short-Circuiting Memory Traffic in Handheld Platforms.
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014
Managing GPU Concurrency in Heterogeneous Architectures.
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014
Application-aware Memory System for Fair and Efficient Execution of Concurrent GPGPU Applications.
Proceedings of the Seventh Workshop on General Purpose Processing Using GPUs, 2014
Trading cache hit rate for memory performance.
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014
2013
Editorial to special section on networks on chip: Architecture, tools, and methodologies.
ACM Trans. Design Autom. Electr. Syst., 2013
Cross-layered resource allocation in UWB noise-OFDM-based ad hoc surveillance networks.
EURASIP J. Wirel. Commun. Netw., 2013
Orchestrated scheduling and prefetching for GPGPUs.
Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013
HybridMR: A Hierarchical MapReduce Scheduler for Hybrid Data Centers.
Proceedings of the IEEE 33rd International Conference on Distributed Computing Systems, 2013
CloudPD: Problem determination and diagnosis in shared dynamic clouds.
Proceedings of the 2013 43rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2013
A heterogeneous multiple network-on-chip design: an application-aware approach.
Proceedings of the 50th Annual Design Automation Conference 2013, 2013
OWL: cooperative thread array aware scheduling techniques for improving GPGPU performance.
Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2013
Meeting midway: Improving CMP performance with memory-side prefetching.
Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 2013
Neither more nor less: Optimizing thread-level parallelism for GPGPUs.
Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 2013
2012
Cache invalidation strategies for Internet-based vehicular ad hoc networks.
Comput. Commun., 2012
D-factor: a quantitative model of application slow-down in multi-resource shared systems.
Proceedings of the ACM SIGMETRICS/PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems, 2012
Addressing End-to-End Memory Access Latency in NoC-Based Multicores.
Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012
Cache revive: architecting volatile STT-RAM caches for enhanced performance in CMPs.
Proceedings of the 49th Annual Design Automation Conference 2012, 2012
PEPON: performance-aware hierarchical power budgeting for NoC based multicores.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012
Application-aware prefetch prioritization in on-chip networks.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012
MROrchestrator: A Fine-Grained Resource Orchestration Framework for MapReduce Clusters.
Proceedings of the 2012 IEEE Fifth International Conference on Cloud Computing, 2012
2011
Aérgia: A Network-on-Chip Exploiting Packet Latency Slack.
IEEE Micro, 2011
RAFT: A router architecture with frequency tuning for on-chip networks.
J. Parallel Distributed Comput., 2011
METE: meeting end-to-end QoS in multicores through system-wide resource management.
Proceedings of the SIGMETRICS 2011, 2011
A dynamic energy management scheme for multi-tier data centers.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2011
A case for heterogeneous on-chip interconnects for CMPs.
Proceedings of the 38th International Symposium on Computer Architecture (ISCA 2011), 2011
Architecting on-chip interconnects for stacked 3D STT-RAM caches in CMPs.
Proceedings of the 38th International Symposium on Computer Architecture (ISCA 2011), 2011
ACCESS: Smart scheduling for asymmetric cache CMPs.
Proceedings of the 17th International Conference on High-Performance Computer Architecture (HPCA-17 2011), 2011
Migration, Assignment, and Scheduling of Jobs in Virtualized Environment.
Proceedings of the 3rd USENIX Workshop on Hot Topics in Cloud Computing, 2011
Modeling and synthesizing task placement constraints in Google compute clusters.
Proceedings of the ACM Symposium on Cloud Computing in conjunction with SOSP 2011, 2011
2010
Network-on-Chip Architectures - A Holistic Design Exploration
Lecture Notes in Electrical Engineering 45, Springer, ISBN: 978-90-481-3030-6, 2010
Cooperative Caching in Wireless P2P Networks: Design, Implementation, and Evaluation.
IEEE Trans. Parallel Distributed Syst., 2010
On the Effects of Process Variation in Network-on-Chip Architectures.
IEEE Trans. Dependable Secur. Comput., 2010
Towards characterizing cloud backend workloads: insights from Google compute clusters.
SIGMETRICS Perform. Evaluation Rev., 2010
A Superscalar software architecture model for Multi-Core Processors (MCPs).
J. Syst. Softw., 2010
Integration of admission, congestion, and peak power control in QoS-aware clusters.
J. Parallel Distributed Comput., 2010
A realistic mobility model for wireless networks of scale-free node connectivity.
Int. J. Mob. Commun., 2010
Coordinated power management of voltage islands in CMPs.
Proceedings of the SIGMETRICS 2010, 2010
CPM in CMPs: Coordinated Power Management in Chip-Multiprocessors.
Proceedings of the Conference on High Performance Computing Networking, 2010
Aérgia: exploiting packet latency slack in on-chip networks.
Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010
Performance Analysis of Communications & Radar Coexistence in a Covert UWB OSA System.
Proceedings of the Global Communications Conference, 2010
Cost-driven 3D integration with interconnect layers.
Proceedings of the 47th Design Automation Conference, 2010
2009
RandomCast: An Energy-Efficient Communication Scheme for Mobile Ad Hoc Networks.
IEEE Trans. Mob. Comput., 2009
Peer-to-peer unstructured anycasting using correlated swarms.
Proceedings of the 21st International Teletraffic Congress, 2009
A case for integrated processor-cache partitioning in chip multiprocessors.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009
A case for dynamic frequency tuning in on-chip networks.
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009
Application-aware prioritization mechanisms for on-chip networks.
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009
On Interest Locality in Content-Based Routing for Large-scale MANETs.
Proceedings of the IEEE 6th International Conference on Mobile Adhoc and Sensor Systems, 2009
Clock-like Flow Replacement Schemes for Resilient Flow Monitoring.
Proceedings of the 29th IEEE International Conference on Distributed Computing Systems (ICDCS 2009), 2009
Mass Purging of Stale TCP Flows in Per-Flow Monitoring Systems.
Proceedings of the 18th International Conference on Computer Communications and Networks, 2009
Cooperative Cache Invalidation Strategies for Internet-Based Vehicular Ad Hoc Networks.
Proceedings of the 18th International Conference on Computer Communications and Networks, 2009
Path-Centric On-Demand Rate Adaptation for Mobile Ad Hoc Networks.
Proceedings of the 18th International Conference on Computer Communications and Networks, 2009
Design and evaluation of a hierarchical on-chip interconnect for next-generation CMPs.
Proceedings of the 15th International Conference on High-Performance Computer Architecture (HPCA-15 2009), 2009
MDCSim: A multi-tier data center simulation platform.
Proceedings of the 2009 IEEE International Conference on Cluster Computing, August 31, 2009
2008
Proxy-RED: an AQM scheme for wireless local area networks.
Wirel. Commun. Mob. Comput., 2008
A dynamic quarantine scheme for controlling unresponsive TCP sessions.
Telecommun. Syst., 2008
Coscheduled distributed-Web servers on system area network.
J. Parallel Distributed Comput., 2008
On cache invalidation for internet-based vehicular ad hoc networks.
Proceedings of the IEEE 5th International Conference on Mobile Adhoc and Sensor Systems, 2008
MIRA: A Multi-layered On-Chip Interconnect Router Architecture.
Proceedings of the 35th International Symposium on Computer Architecture (ISCA 2008), 2008
Exploring Anti-Spam Models in Large Scale VoIP Systems.
Proceedings of the 28th IEEE International Conference on Distributed Computing Systems (ICDCS 2008), 2008
Performance and power optimization through data compression in Network-on-Chip architectures.
Proceedings of the 14th International Conference on High-Performance Computer Architecture (HPCA-14 2008), 2008
2007
Exploring IBA Design Space for Improved Performance.
IEEE Trans. Parallel Distributed Syst., 2007
An SSL Back-End Forwarding Scheme in Cluster-Based Web Servers.
IEEE Trans. Parallel Distributed Syst., 2007
A comprehensive performance and energy consumption analysis of scheduling alternatives in clusters.
J. Supercomput., 2007
An analytical model for interval caching in interactive video servers.
J. Netw. Comput. Appl., 2007
Performance Comparison of Coscheduling Algorithms for Non-Dedicated Clusters Through a Generic Framework.
Int. J. High Perform. Comput. Appl., 2007
Cache invalidation strategies for internet-based mobile ad hoc networks.
Comput. Commun., 2007
Memory-efficient content filtering hardware for high-speed intrusion detection systems.
Proceedings of the 2007 ACM Symposium on Applied Computing (SAC), 2007
A novel dimensionally-decomposed router for on-chip communication in 3D architectures.
Proceedings of the 34th International Symposium on Computer Architecture (ISCA 2007), 2007
Characterizing Network Traffic in a Cluster-based, Multi-tier Data Center.
Proceedings of the 27th IEEE International Conference on Distributed Computing Systems (ICDCS 2007), 2007
Design of a Dynamic Priority-Based Fast Path Architecture for On-Chip Interconnects.
Proceedings of the 15th Annual IEEE Symposium on High-Performance Interconnects, 2007
2006
A novel caching scheme for improving Internet-based mobile ad hoc networks performance.
Ad Hoc Networks, 2006
A Hybrid SoC Interconnect with Dynamic TDMA-Based Transaction-Less Buses and On-Chip Networks.
Proceedings of the 19th International Conference on VLSI Design (VLSI Design 2006), 2006
A Distributed Multi-Point Network Interface for Low-Latency, Deadlock-Free On-Chip Interconnects.
Proceedings of the 1st International ICST Conference on Nano-Networks, 2006
ViChaR: A Dynamic Virtual Channel Regulator for Network-on-Chip Routers.
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-39 2006), 2006
Clustered Mobility Model for Scale-Free Wireless Networks.
Proceedings of the LCN 2006, 2006
A Gracefully Degrading and Energy-Efficient Modular Router Architecture for On-Chip Networks.
Proceedings of the 33rd International Symposium on Computer Architecture (ISCA 2006), 2006
Exploring Fault-Tolerant Network-on-Chip Architectures.
Proceedings of the 2006 International Conference on Dependable Systems and Networks (DSN 2006), 2006
2005
A Holistic Approach to Designing Energy-Efficient Cluster Interconnects.
IEEE Trans. Computers, 2005
Performance analysis of a QoS capable cluster interconnect.
Perform. Evaluation, 2005
A multi-threaded PIPELINED Web server architecture for SMP/SoC machines.
Proceedings of the 14th international conference on World Wide Web, 2005
Improving Performance of Cluster-based Secure Application Servers with User-level Communication.
Proceedings of the 21st International Conference on Data Engineering, 2005
Rcast: A Randomized Communication Scheme for Improving Energy Efficiency in MANETs.
Proceedings of the 25th International Conference on Distributed Computing Systems (ICDCS 2005), 2005
A low latency router supporting adaptivity for on-chip interconnects.
Proceedings of the 42nd Design Automation Conference, 2005
A Load Balancing Scheme for Cluster-based Secure Network Servers.
Proceedings of the 2005 IEEE International Conference on Cluster Computing (CLUSTER 2005), September 26, 2005
Exploiting NIC Memory for Improving Cluster-Based Webserver Performance.
Proceedings of the 2005 IEEE International Conference on Cluster Computing (CLUSTER 2005), September 26, 2005
Design and analysis of an NoC architecture from performance, reliability and energy perspective.
Proceedings of the 2005 ACM/IEEE Symposium on Architecture for Networking and Communications Systems, 2005
2004
A unified bandwidth reservation and admission control mechanism for QoS provisioning in cellular networks.
Wirel. Commun. Mob. Comput., 2004
Caching and Scheduling in NAD-Based Multimedia Servers.
IEEE Trans. Parallel Distributed Syst., 2004
Cooperative Cache-Based Data Access in Ad Hoc Networks.
Computer, 2004
An adaptive power-conserving service discipline for bluetooth (APCB) wireless networks.
Comput. Commun., 2004
Coscheduling in Clusters: Is It a Viable Alternative?
Proceedings of the ACM/IEEE SC2004 Conference on High Performance Networking and Computing, 2004
A New Class of Scheduling Policies for Providing Time of Service Guarantees in Video-On-Demand Servers.
Proceedings of the Management of Multimedia Networks and Services: 7th IFIP/IEEE International Conference, 2004
Performance comparison of cache invalidation strategies for Internet-based mobile ad hoc networks.
Proceedings of the 2004 IEEE International Conference on Mobile Ad-hoc and Sensor Systems, 2004
Improving Response Time in Cluster-Based Web Servers through Coscheduling.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004
RL-RED: A flow control mechanism for 802.11-based wireless ad hoc networks.
Proceedings of the IASTED International Conference on Communications, Internet, and Information Technology, November 22, 2004
2003
Providing Time of Service Guarantees in Video-On-Demand Servers.
Proceedings of the Twelfth International World Wide Web Conference - Posters, 2003
A Caching Mechanism for Improving Internet based Mobile Ad Hoc Networks Performance.
Proceedings of the Twelfth International World Wide Web Conference - Posters, 2003
An End-to-End Resource Scheduling Scheme for the Presentation of Composite Multimedia Information in a Networked Environment.
Proceedings of the 9th International Conference on Multi-Media Modeling, 2003
Energy optimization techniques in cluster interconnects.
Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003
An Integrated Resource Sharing Policy for Multimedia Storage Servers Based on Network-Attached Disks.
Proceedings of the 23rd International Conference on Distributed Computing Systems (ICDCS 2003), 2003
A novel caching scheme for Internet based mobile ad hoc networks.
Proceedings of the 12th International Conference on Computer Communications and Networks, 2003
Performance Enhancement Techniques for InfiniBand Architecture.
Proceedings of the Ninth International Symposium on High-Performance Computer Architecture (HPCA'03), 2003
A control theoretic approach for designing adaptive AQM schemes.
Proceedings of the Global Telecommunications Conference, 2003
Impact of Job Allocation Strategies on Communication-Driven Coscheduling in Clusters.
Proceedings of the Euro-Par 2003. Parallel Processing, 2003
Co-Ordinated Coscheduling in Time-Sharing Clusters through a Generic Framework.
Proceedings of the 2003 IEEE International Conference on Cluster Computing (CLUSTER 2003), 2003
HaTCh: a two-level caching scheme for estimating the number of active flows.
Proceedings of the 42nd IEEE Conference on Decision and Control, 2003
A Simulation-Based Analysis of Scheduling Policies for Multimedia Server.
Proceedings of the 36th Annual Simulation Symposium (ANSS-36 2003), Orlando, Florida, USA, March 30, 2003
2002
MediaWorm: A QoS Capable Router Architecture for Clusters.
IEEE Trans. Parallel Distributed Syst., 2002
A Fast and Efficient Processor Allocation Scheme for Mesh-Connected Multicomputers.
IEEE Trans. Computers, 2002
An admission control scheme for QoS-sensitive cellular networks.
Proceedings of the 2002 IEEE Wireless Communications and Networking Conference Record, 2002
A Strategy to Compute the InfiniBand Arbitration Tables.
Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 2002
Power-Aware Prefetch in Mobile Environments.
Proceedings of the 22nd International Conference on Distributed Computing Systems (ICDCS'02), 2002
An adaptive power-conserving service discipline for Bluetooth.
Proceedings of the IEEE International Conference on Communications, 2002
Providing fairness in DiffServ architecture.
Proceedings of the Global Telecommunications Conference, 2002
Stabilized virtual buffer (SVB) - an active queue management scheme for Internet quality-of-service.
Proceedings of the Global Telecommunications Conference, 2002
Integrated Admission and Congestion Control for QoS Support in Clusters.
Proceedings of the 2002 IEEE International Conference on Cluster Computing (CLUSTER 2002), 2002
2001
Impact of Virtual Channels and Adaptive Routing on Application Performance.
IEEE Trans. Parallel Distributed Syst., 2001
Efficient processor management schemes for mesh-connected multicomputers.
Parallel Comput., 2001
On the Effectiveness of a Counter-Based Cache Invalidation Scheme and Its Resiliency to Failures in Mobile Environments.
Proceedings of the 20th Symposium on Reliable Distributed Systems (SRDS 2001), 2001
Calculation of Deadline Missing Probability in a QoS Capable Cluster Interconnect.
Proceedings of the IEEE International Symposium on Network Computing and Applications (NCA 2001), 2001
An Analytical Model for a QoS Capable Cluster Interconnect.
Proceedings of the 11th GI/ITG Conference on Measuring, 2001
QoS provisioning in clusters: an investigation of Router and NIC design.
Proceedings of the 28th Annual International Symposium on Computer Architecture, 2001
A Differential Bandwidth Reservation Policy for Multimedia Wireless Networks.
Proceedings of the 30th International Workshops on Parallel Processing (ICPP 2001 Workshops), 2001
Adaptive Block Rearrangement Algorithms for Video-On-Demand Servers.
Proceedings of the 2001 International Conference on Parallel Processing, 2001
Selective Checkpointing and Rollbacks in Multithreaded Distributed Systems.
Proceedings of the 21st International Conference on Distributed Computing Systems (ICDCS 2001), 2001
2000
A Reliable Statistical Admission Control Strategy for Interactive Video-on-Demand Servers with Interval Caching.
Proceedings of the 2000 International Conference on Parallel Processing, 2000
Investigating QoS Support for Traffic Mixes with the MediaWorm Router.
Proceedings of the Sixth International Symposium on High-Performance Computer Architecture, 2000
1999
A Testbed for Evaluation of Fault-Tolerant Routing in Multiprocessor Interconnection Networks.
IEEE Trans. Parallel Distributed Syst., 1999
Alternatives to Coscheduling a Network of Workstations.
J. Parallel Distributed Comput., 1999
Issues in the Design of a Reflective Library for Checkpointing C++ Objects.
Proceedings of the Eighteenth Symposium on Reliable Distributed Systems, 1999
A Closer Look at Coscheduling Approaches for a Network of Workstations.
Proceedings of the Eleventh Annual ACM Symposium on Parallel Algorithms and Architectures, 1999
A Parallel Optimal Branch-and-Bound Algorithm for MIN-Based Multiprocessors.
Proceedings of the International Conference on Parallel Processing 1999, 1999
LAPSES: A Recipe for High Performance Adaptive Router Design.
Proceedings of the Fifth International Symposium on High-Performance Computer Architecture, 1999
1998
A Fast and Efficient Processor Management Scheme for k-ary n-cubes.
J. Parallel Distributed Comput., 1998
The Penn State Computing Condominium Scheduling System.
Proceedings of the ACM/IEEE Conference on Supercomputing, 1998
Analyzing Cache Performance for Video Servers.
Proceedings of the 1998 International Conference on Parallel Processing Workshops, 1998
Virtual channel multiplexing in networks of workstations with irregular topology.
Proceedings of the 5th International Conference On High Performance Computing, 1998
1997
Performance Analysis of Buffering Schemes on Wormhole Routers.
IEEE Trans. Computers, 1997
Performance Benefits of Virtual Channels and Adaptive Routing: An Application-Driven Study.
Proceedings of the 11th international conference on Supercomputing, 1997
Good Processor Management = Fast Allocation + Efficient Scheduling.
Proceedings of the 1997 International Conference on Parallel Processing (ICPP '97), 1997
Communication in Parallel Applications: Characterization and Sensitivity Analysis.
Proceedings of the 1997 International Conference on Parallel Processing (ICPP '97), 1997
A Performance Modeling Technique for Mesh-Connected Multicomputers.
Proceedings of the 1997 International Conference on Parallel and Distributed Systems (ICPADS '97), 1997
Towards a Communication Characterization Methodology for Parallel Applications.
Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture (HPCA '97), 1997
1996
Performance Analysis of Finite-Buffered Asynchronous Multistage Interconnection Networks.
IEEE Trans. Parallel Distributed Syst., 1996
A probabilistic model for the fault tolerance of multilayer perceptrons.
IEEE Trans. Neural Networks, 1996
Allocation and Mapping Based Reliability Analysis of Multistage Interconnection Networks.
IEEE Trans. Computers, 1996
A Task-Based Dependability Model for k-ary n-Cubes.
Proceedings of the 1996 International Conference on Parallel Processing, 1996
Parallel Simulation of Mesh Routing Algorithms.
Proceedings of the 16th International Conference on Distributed Computing Systems, 1996
1995
Disjoint Task Allocation Algorithms for MIN Machines with Minimal Conflicts.
IEEE Trans. Parallel Distributed Syst., 1995
Distributed Fault Diagnosis in Multistage Network-Based Multiprocessors.
IEEE Trans. Computers, 1995
On Dependability Evaluation of Mesh-Connected Processors.
IEEE Trans. Computers, 1995
A Lazy Scheduling Scheme for Hypercube Computers.
J. Parallel Distributed Comput., 1995
Experimenting with a Shared Virtual Memory Environment for Hypercubes.
J. Parallel Distributed Comput., 1995
Cache Coherence in Multiprocessors: A Survey.
Adv. Comput., 1995
Processor Management Techniques for Mesh-Connected Multiprocessors.
Proceedings of the 1995 International Conference on Parallel Processing, 1995
Fault-Tolerant Routing in Mesh Networks.
Proceedings of the 1995 International Conference on Parallel Processing, 1995
Modeling Virtual Channel Flow Control in Hypercubes.
Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture (HPCA 1995), 1995
1994
Evaluation of a Parallel Branch-and-Bound Algorithm on a Class of Multiprocessors.
IEEE Trans. Parallel Distributed Syst., 1994
A Cache coherence protocol for MIN-based multiprocessors.
J. Supercomput., 1994
Performance Analysis of Cluster-Based Multiprocessors.
IEEE Trans. Computers, 1994
Hypercube Communication Delay with Wormhole Routing.
IEEE Trans. Computers, 1994
Limit Allocation: An Efficient Processor Management Scheme for Hypercubes.
Proceedings of the 1994 International Conference on Parallel Processing, 1994
Performance Analysis of Combining Multistage Interconnection Networks.
Proceedings of the 1994 International Conference on Parallel Processing, 1994
A Shared Memory Environment for Hypercubes.
Proceedings of the 1994 International Conference on Parallel Processing, 1994
Efficient Fully Adaptive Wormhole Routing in n-Dimensional Meshes.
Proceedings of the 14th International Conference on Distributed Computing Systems, 1994
A Switch Cache Design for MIN-Based Shared-Memory Multiprocessors.
Proceedings of the Parallel Processing: CONPAR 94, 1994
1993
An Availability Model for MIN-Based Multiprocessors.
IEEE Trans. Parallel Distributed Syst., 1993
A Cache Coherence Protocol for MIN-Based Multiprocessors With Limited Inclusion.
Proceedings of the 1993 International Conference on Parallel Processing, 1993
A Lazy Scheduling Scheme for Improving Hypercube Performance.
Proceedings of the 1993 International Conference on Parallel Processing, 1993
A Queuing Model for Finite-Buffered Multistage Interconnection Networks.
Proceedings of the 1993 International Conference on Parallel Processing, 1993
A Class of Partially Adaptive Routing Algorithms for n-dimensional Meshes.
Proceedings of the 1993 International Conference on Parallel Processing, 1993
Performance of multilayer neural networks in binary-to-binary mappings under weight errors.
Proceedings of International Conference on Neural Networks (ICNN'93), San Francisco, CA, USA, March 28, 1993
1992
A Unified Task-Based Dependability Model for Hypercube Computers.
IEEE Trans. Parallel Distributed Syst., 1992
Analytical Modeling of a Parallel Branch-and-Bound Algorithm on MIN-Based Multiprocessors.
Proceedings of the 6th International Parallel Processing Symposium, 1992
Multitasking in Multistage Interconnection Network Machines.
Proceedings of the 12th International Conference on Distributed Computing Systems, 1992
1991
A Top-Down Processor Allocation Scheme for Hypercube Computers.
IEEE Trans. Parallel Distributed Syst., 1991
A Parallel Branch-and-Bound Algorithm for MIN-Based Multiprocessors.
Proceedings of the 1991 ACM SIGMETRICS conference on Measurement and modeling of computer systems, 1991
On Subcube Dependability in a Hypercube.
Proceedings of the 1991 ACM SIGMETRICS conference on Measurement and modeling of computer systems, 1991
A Cache-Based Checkpointing Scheme for MIN-Based Multiprocessors.
Proceedings of the International Conference on Parallel Processing, 1991
Modeling wormhole routing in a hypercube.
Proceedings of the 11th International Conference on Distributed Computing Systems (ICDCS 1991), 1991
1990
Dependability Modeling for Multiprocessors.
Computer, 1990
A write update cache coherence protocol for MIN-based multiprocessors with accessibility-based split caches.
Proceedings of Supercomputing '90, New York, NY, USA, November 12-16, 1990
Fault-Tolerant Task Mapping Algorithms for MIN-Based Multiprocessors.
Proceedings of the 1990 International Conference on Parallel Processing, 1990
Availability evaluation of MIN-connected multiprocessors using decomposition technique.
Proceedings of the 20th International Symposium on Fault-Tolerant Computing, 1990
1989
A Conflict-Free Routing Scheme on Multistage Interconnection Networks.
IEEE Trans. Computers, 1989
Distributed Fault Diagnosis in the Butterfly Parallel Processor.
Proceedings of the International Conference on Parallel Processing, 1989
A Processor Allocation Scheme for Hypercube Computers.
Proceedings of the International Conference on Parallel Processing, 1989
An analytical model for computing hypercube availability.
Proceedings of the Nineteenth International Symposium on Fault-Tolerant Computing, 1989
1988
A Reliability Predictor for MIN-connected Multiprocessor Systems.
Proceedings of the International Conference on Parallel Processing, 1988
A quadtree communication structure for fast data searching and distribution.
Proceedings of the Twelfth International Computer Software and Applications Conference, 1988
1987
Dependability evaluation of interconnection networks.
Inf. Sci., 1987
1986
Dependability Evaluation of Multicomputer Networks.
Proceedings of the International Conference on Parallel Processing, 1986
1985
Bandwidth Availability of Multiple-Bus Multiprocessors.
IEEE Trans. Computers, 1985
Computation Availability of Multiple-Bus Multiprocessors.
Proceedings of the International Conference on Parallel Processing, 1985
Reliability Simulation of Multiprocessor Systems.
Proceedings of the International Conference on Parallel Processing, 1985