2025
Teaching Cloud Infrastructure and Scalable Application Deployment in an Undergraduate Computer Science Program.
Proceedings of the 56th ACM Technical Symposium on Computer Science Education V. 1, 2025
2024
Special Issue on The Past, Present, and Future of Warehouse-Scale Computing.
IEEE Micro, 2024
MoEtion: Efficient and Reliable Checkpointing for Mixture-of-Experts Models at Scale.
CoRR, 2024
AI Metropolis: Scaling Large Language Model-based Multi-Agent Simulation with Out-of-order Execution.
CoRR, 2024
Wave: A Split OS Architecture for Application Engines.
CoRR, 2024
Cloud Atlas: Efficient Fault Localization for Cloud Systems using Language Models and Causal Insight.
CoRR, 2024
SlipStream: Adapting Pipelines for Distributed Training of Large DNNs Amid Failures.
CoRR, 2024
cedar: Composable and Optimized Machine Learning Input Data Pipelines.
CoRR, 2024
ReCycle: Resilient Training of Large DNNs using Pipeline Adaptation.
Proceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles, 2024
High-throughput and Flexible Host Networking for Accelerated Computing.
Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation, 2024
SGLang: Efficient Execution of Structured Language Model Programs.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
2023
R³: Record-Replay-Retroaction for Database-Backed Applications.
Proc. VLDB Endow., 2023
Efficiently Programming Large Language Models using SGLang.
CoRR, 2023
Zelda: Video Analytics using Vision-Language Models.
CoRR, 2023
FlexShard: Flexible Sharding for Industry-Scale Sequence Recommendation Models.
CoRR, 2023
Honeycomb: Secure and Efficient GPU Executions via Static Validation.
Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, 2023
RecD: Deduplication for End-to-End Deep Learning Recommendation Model Training Infrastructure.
Proceedings of the Sixth Conference on Machine Learning and Systems, 2023
Transactions Make Debugging Easy.
Proceedings of the 13th Conference on Innovative Data Systems Research, 2023
2022
RAIL: Predictable, Low Tail Latency for NVMe Flash.
ACM Trans. Storage, 2022
Optimizing Video Analytics with Declarative Model Relationships.
Proc. VLDB Endow., 2022
Apiary: A DBMS-Backed Transactional Function-as-a-Service Framework.
CoRR, 2022
Towards μs tail latency and terabit ethernet: disaggregating the host network stack.
Proceedings of the SIGCOMM '22: ACM SIGCOMM 2022 Conference, Amsterdam, The Netherlands, August 22, 2022
Understanding data storage and ingestion for large-scale deep recommendation model training: industrial product.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022
Hermod: principled and practical scheduling for serverless functions.
Proceedings of the 13th Symposium on Cloud Computing, SoCC 2022, 2022
VIVA: An End-to-End System for Interactive Video Analytics.
Proceedings of the 12th Conference on Innovative Data Systems Research, 2022
A Progress Report on DBOS: A Database-oriented Operating System.
Proceedings of the 12th Conference on Innovative Data Systems Research, 2022
ShEF: shielded enclaves for cloud FPGAs.
Proceedings of the ASPLOS '22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 28 February 2022, 2022
SOL: safe on-node learning in cloud platforms.
Proceedings of the ASPLOS '22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 28 February 2022, 2022
RecShard: statistical feature-based memory optimization for industry-scale neural recommendation.
Proceedings of the ASPLOS '22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 28 February 2022, 2022
2021
DBOS: A DBMS-oriented Operating System.
Proc. VLDB Endow., 2021
Practical Scheduling for Real-World Serverless Computing.
CoRR, 2021
Understanding and Co-designing the Data Ingestion Pipeline for Industry-Scale RecSys Training.
CoRR, 2021
RAMBO: Resource Allocation for Microservices Using Bayesian Optimization.
IEEE Comput. Archit. Lett., 2021
INFaaS: Automated Model-less Inference Serving.
Proceedings of the 2021 USENIX Annual Technical Conference, 2021
Syrup: User-Defined Scheduling Across the Stack.
Proceedings of the SOSP '21: ACM SIGOPS 28th Symposium on Operating Systems Principles, 2021
ghOSt: Fast & Flexible User-Space Delegation of Linux Scheduling.
Proceedings of the SOSP '21: ACM SIGOPS 28th Symposium on Operating Systems Principles, 2021
A case against (most) context switches.
Proceedings of the HotOS '21: Workshop on Hot Topics in Operating Systems, 2021
SmartHarvest: harvesting idle CPUs safely and efficiently in the cloud.
Proceedings of the EuroSys '21: Sixteenth European Conference on Computer Systems, 2021
Interference-Aware Scheduling for Inference Serving.
Proceedings of the EuroMLSys@EuroSys 2021, 2021
Llama: A Heterogeneous & Serverless Framework for Auto-Tuning Video Analytics Pipelines.
Proceedings of the SoCC '21: ACM Symposium on Cloud Computing, 2021
Faa$T: A Transparent Auto-Scaling Cache for Serverless Applications.
Proceedings of the SoCC '21: ACM Symposium on Cloud Computing, 2021
2020
AsmDB: Understanding and Mitigating Front-End Stalls in Warehouse-Scale Computers.
IEEE Micro, 2020
The Hot Chips Renaissance.
IEEE Micro, 2020
RackSched: A Microsecond-Scale Scheduler for Rack-Scale Computers (Technical Report).
CoRR, 2020
DBOS: A Proposal for a Data-Centric Operating System.
CoRR, 2020
RackSched: A Microsecond-Scale Scheduler for Rack-Scale Computers.
Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation, 2020
A Polystore Based Database Operating System (DBOS).
Proceedings of the Heterogeneous Data Management, Polystores, and Analytics for Healthcare, 2020
Leveraging application classes to save power in highly-utilized data centers.
Proceedings of the SoCC '20: ACM Symposium on Cloud Computing, 2020
Interstellar: Using Halide's Scheduling Language to Analyze DNN Accelerators.
Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020
Classifying Memory Access Patterns for Prefetching.
Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020
2019
Pocket: Elastic Ephemeral Storage for Serverless Analytics.
login Usenix Mag., 2019
Outsourcing Everyday Jobs to Thousands of Cloud Functions with gg.
login Usenix Mag., 2019
INFaaS: Managed & Model-less Inference Serving.
CoRR, 2019
A New Frontier for Pull-Based Graph Processing.
CoRR, 2019
From Laptop to Lambda: Outsourcing Everyday Jobs to Thousands of Transient Functional Containers.
Proceedings of the 2019 USENIX Annual Technical Conference, 2019
Shinjuku: Preemptive Scheduling for μsecond-scale Tail Latency.
Proceedings of the 16th USENIX Symposium on Networked Systems Design and Implementation, 2019
A Case for Managed and Model-less Inference Serving.
Proceedings of the Workshop on Hot Topics in Operating Systems, 2019
Mind the Gap: A Case for Informed Request Scheduling at the NIC.
Proceedings of the 18th ACM Workshop on Hot Topics in Networks, 2019
Centralized Core-granular Scheduling for Serverless Functions.
Proceedings of the ACM Symposium on Cloud Computing, SoCC 2019, 2019
TANGRAM: Optimized Coarse-Grained Dataflow for Scalable NN Accelerators.
Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019
2018
QuMan: Profile-based Improvement of Cluster Utilization.
ACM Trans. Archit. Code Optim., 2018
Plasticine: A Reconfigurable Accelerator for Parallel Patterns.
IEEE Micro, 2018
Uncovering the Security Implications of Cloud Multi-Tenancy with Bolt.
IEEE Micro, 2018
Trevor: Automatic configuration and scaling of stream processing pipelines.
CoRR, 2018
DNN Dataflow Choice Is Overrated.
CoRR, 2018
Amdahl's law for tail latency.
Commun. ACM, 2018
Understanding Ephemeral Storage for Serverless Analytics.
Proceedings of the 2018 USENIX Annual Technical Conference, 2018
Selecta: Heterogeneous Cloud Storage Configuration for Data Analytics.
Proceedings of the 2018 USENIX Annual Technical Conference, 2018
Making pull-based graph processing performant.
Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018
Spatial: a language and compiler for application accelerators.
Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2018
Learning Memory Access Patterns.
Proceedings of the 35th International Conference on Machine Learning, 2018
GraphP: Reducing Communication for PIM-Based Graph Processing with Efficient Data Partition.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018
Memory Hierarchy for Web Search.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018
2017
Corrigendum to "The IX Operating System: Combining Low Latency, High Throughput and Efficiency in a Protected Dataplane".
ACM Trans. Comput. Syst., 2017
The IX Operating System: Combining Low Latency, High Throughput, and Efficiency in a Protected Dataplane.
ACM Trans. Comput. Syst., 2017
DRAF: A Low-Power DRAM-Based Reconfigurable Acceleration Fabric.
IEEE Micro, 2017
AppSwitch: Resolving the Application Identity Crisis.
CoRR, 2017
Persona: A High-Performance Bioinformatics Framework.
Proceedings of the 2017 USENIX Annual Technical Conference, 2017
Plasticine: A Reconfigurable Architecture For Parallel Patterns.
Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017
3D nanosystems enable embedded abundant-data computing: special session paper.
Proceedings of the Twelfth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis Companion, 2017
ReFlex: Remote Flash ≈ Local Flash.
Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017
TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory.
Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017
Bolt: I Know What You Did Last Summer... In The Cloud.
Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017
2016
Improving Resource Efficiency at Scale with Heracles.
ACM Trans. Comput. Syst., 2016
Security Implications of Data Mining in Cloud Scheduling.
IEEE Comput. Archit. Lett., 2016
Automatic Generation of Efficient Accelerators for Reconfigurable Hardware.
Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016
HRL: Efficient and flexible reconfigurable logic for near-data processing.
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016
Flash storage disaggregation.
Proceedings of the Eleventh European Conference on Computer Systems, 2016
Generating Configurable Hardware from Parallel Patterns.
Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, 2016
HCloud: Resource-Efficient Provisioning in Shared Cloud Systems.
Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, 2016
2015
Energy-Efficient Abundant-Data Computing: The N3XT 1,000x.
Computer, 2015
Convolution engine: balancing efficiency and flexibility in specialized computing.
Commun. ACM, 2015
Heracles: improving resource efficiency at scale.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015
Energy proportionality and workload consolidation for latency-critical applications.
Proceedings of the Sixth ACM Symposium on Cloud Computing, 2015
Tarcil: reconciling scheduling speed and quality in large shared clusters.
Proceedings of the Sixth ACM Symposium on Cloud Computing, 2015
Practical Near-Data Processing for In-Memory Analytics Frameworks.
Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015
2014
Quality-of-Service-Aware Scheduling in Heterogeneous Data centers with Paragon.
IEEE Micro, 2014
IX: A Protected Dataplane Operating System for High Throughput and Low Latency.
Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation, 2014
Towards energy proportionality for large-scale latency-critical workloads.
Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014
Dynamic management of TurboMode in modern multi-core chips.
Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture, 2014
Reconciling high server utilization and sub-millisecond quality-of-service.
Proceedings of the Ninth Eurosys Conference 2014, 2014
Quasar: resource-efficient and QoS-aware cluster management.
Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2014
2013
QoS-Aware scheduling in heterogeneous datacenters with paragon.
ACM Trans. Comput. Syst., 2013
Measuring and analyzing the energy use of enterprise computing systems.
Sustain. Comput. Informatics Syst., 2013
Selected Research from Hot Chips 24.
IEEE Micro, 2013
The Netflix Challenge: Datacenter Edition.
IEEE Comput. Archit. Lett., 2013
Locality-aware task management for unstructured parallelism: a quantitative limit study.
Proceedings of the 25th ACM Symposium on Parallelism in Algorithms and Architectures, 2013
Advancing computer systems without technology progress.
Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software, 2013
ZSim: fast and accurate microarchitectural simulation of thousand-core systems.
Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013
Convolution engine: balancing efficiency & flexibility in specialized computing.
Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013
iBench: Quantifying interference for datacenter applications.
Proceedings of the IEEE International Symposium on Workload Characterization, 2013
QoS-Aware Admission Control in Heterogeneous Datacenters.
Proceedings of the 10th International Conference on Autonomic Computing, 2013
Resource efficient computing for warehouse-scale datacenters.
Proceedings of the Design, Automation and Test in Europe, 2013
Paragon: QoS-aware scheduling for heterogeneous datacenters.
Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2013
2012
Improving System Energy Efficiency with Memory Rank Subsetting.
ACM Trans. Archit. Code Optim., 2012
Scalable and Efficient Fine-Grained Cache Partitioning with Vantage.
IEEE Micro, 2012
Decoupling Datacenter Storage Studies from Access to Large-Scale Applications.
IEEE Comput. Archit. Lett., 2012
Dune: Safe User-level Access to Privileged CPU Features.
Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation, 2012
Towards energy-proportional datacenter memory with mobile DRAM.
Proceedings of the 39th International Symposium on Computer Architecture (ISCA 2012), 2012
ECHO: Recreating network traffic maps for datacenters with tens of thousands of servers.
Proceedings of the 2012 IEEE International Symposium on Workload Characterization, 2012
SCD: A scalable coherence directory with flexible sharer set encoding.
Proceedings of the 18th IEEE International Symposium on High Performance Computer Architecture, 2012
Proceedings of the 2012 IEEE Hot Chips 24 Symposium (HCS), 2012
Green enterprise computing data: Assumptions and realities.
Proceedings of the 2012 International Green Computing Conference, 2012
A case of system-level hardware/software co-design and co-verification of a commodity multi-processor system with custom hardware.
Proceedings of the 10th International Conference on Hardware/Software Codesign and System Synthesis, 2012
2011
Commun. ACM, 2011
Understanding sources of inefficiency in general-purpose chips.
Commun. ACM, 2011
Time and Cost-Efficient Modeling and Generation of Large-Scale TPCC/TPCE/TPCH Workloads.
Proceedings of the Topics in Performance Evaluation, Measurement and Characterization, 2011
MARS: adaptive remote execution for multi-threaded mobile devices.
Proceedings of the 3rd ACM SOSP Workshop on Networking, 2011
Storage I/O generation and replay for datacenter applications.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2011
Vantage: scalable and efficient fine-grain cache partitioning.
Proceedings of the 38th International Symposium on Computer Architecture (ISCA 2011), 2011
Decoupling datacenter studies from access to large-scale applications: A modeling approach for storage workloads.
Proceedings of the 2011 IEEE International Symposium on Workload Characterization, 2011
Cross-Examination of Datacenter Workload Modeling Techniques.
Proceedings of the 31st IEEE International Conference on Distributed Computing Systems Workshops (ICDCS 2011 Workshops), 2011
A few ways can take you a long way: Efficient and highly associative caches with scalable partitioning for many-core CMPs.
Proceedings of the 2011 IEEE Hot Chips 23 Symposium (HCS), 2011
Hardware acceleration of transactional memory on commodity systems.
Proceedings of the 16th International Conference on Architectural Support for Programming Languages and Operating Systems, 2011
Dynamic Fine-Grain Scheduling of Pipeline Parallelism.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011
2010
An analysis of on-chip interconnection networks for large-scale chip multiprocessors.
ACM Trans. Archit. Code Optim., 2010
On the energy (in)efficiency of Hadoop clusters.
ACM SIGOPS Oper. Syst. Rev., 2010
Tainting is not pointless.
ACM SIGOPS Oper. Syst. Rev., 2010
Server Engineering Insights for Large-Scale Online Services.
IEEE Micro, 2010
Implementing and evaluating nested parallel transactions in software transactional memory.
Proceedings of the SPAA 2010: Proceedings of the 22nd Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2010
Evaluating Bufferless Flow Control for On-chip Networks.
Proceedings of the NOCS 2010, 2010
The ZCache: Decoupling Ways and Associativity.
Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010
Understanding sources of inefficiency in general-purpose chips.
Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010
Eigenbench: A simple exploration tool for orthogonal TM characteristics.
Proceedings of the 2010 IEEE International Symposium on Workload Characterization, 2010
Making nested parallel transactions practical using lightweight hardware support.
Proceedings of the 24th International Conference on Supercomputing, 2010
Implementing and Evaluating a Model Checker for Transactional Memory Systems.
Proceedings of the 15th IEEE International Conference on Engineering of Complex Computer Systems, 2010
FARM: A Prototyping Environment for Tightly-Coupled, Heterogeneous Architectures.
Proceedings of the 18th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2010
Evaluating impact of manageability features on device performance.
Proceedings of the 6th International Conference on Network and Service Management, 2010
Flexible architectural support for fine-grain scheduling.
Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems, 2010
2009
Optimizing Memory Transactions for Multicore Systems.
Proceedings of the Multicore Processors and Systems, 2009
The case for RAMClouds: scalable high-performance storage entirely in DRAM.
ACM SIGOPS Oper. Syst. Rev., 2009
Guest Editors' Introduction: Hot Chips Turns 20.
IEEE Micro, 2009
Power Management of Datacenter Workloads Using Per-Core Power Gating.
IEEE Comput. Archit. Lett., 2009
Nemesis: Preventing Authentication & Access Control Vulnerabilities in Web Applications.
Proceedings of the 18th USENIX Security Symposium, 2009
Future scaling of processor-memory interfaces.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009
Feedback-directed barrier optimization in a strongly isolated STM.
Proceedings of the 36th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2009
A memory system design framework: creating smart memories.
Proceedings of the 36th International Symposium on Computer Architecture (ISCA 2009), 2009
Phoenix rebirth: Scalable MapReduce on a large-scale shared-memory system.
Proceedings of the 2009 IEEE International Symposium on Workload Characterization, 2009
Fast memory snapshot for concurrent programming without synchronization.
Proceedings of the 23rd international conference on Supercomputing, 2009
The Stanford pervasive parallelism lab.
Proceedings of the 2009 IEEE Hot Chips 21 Symposium (HCS), 2009
Decoupling Dynamic Information Flow Tracking with a dedicated coprocessor.
Proceedings of the 2009 IEEE/IFIP International Conference on Dependable Systems and Networks, 2009
2008
Comparative evaluation of memory models for chip multiprocessors.
ACM Trans. Archit. Code Optim., 2008
Real-World Buffer Overflow Protection for Userspace and Kernelspace.
Proceedings of the 17th USENIX Security Symposium, 2008
Improving software concurrency with hardware-assisted memory snapshot.
Proceedings of the SPAA 2008: Proceedings of the 20th Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2008
Ased: availability, security, and debugging support using transactional memory.
Proceedings of the SPAA 2008: Proceedings of the 20th Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2008
Hardware Enforcement of Application Security Policies Using Tagged Memory.
Proceedings of the 8th USENIX Symposium on Operating Systems Design and Implementation, 2008
A Comparison of High-Level Full-System Power Models.
Proceedings of the Workshop on Power Aware Computing and Systems, 2008
STAMP: Stanford Transactional Applications for Multi-Processing.
Proceedings of the 4th International Symposium on Workload Characterization (IISWC 2008), 2008
Thread-safe dynamic binary translation using transactional memory.
Proceedings of the 14th International Conference on High-Performance Computer Architecture (HPCA-14 2008), 2008
2007
From chaos to QoS: case studies in CMP resource management.
SIGARCH Comput. Archit. News, 2007
RAMP: Research Accelerator for Multiple Processors.
IEEE Micro, 2007
Transactional Memory: The Hardware-Software Interface.
IEEE Micro, 2007
Models and Metrics to Enable Energy-Efficiency Optimizations.
Computer, 2007
Towards soft optimization techniques for parallel cognitive applications.
Proceedings of the SPAA 2007: Proceedings of the 19th Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2007
JouleSort: a balanced energy-efficiency benchmark.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2007
Transactional collection classes.
Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2007
Transactional programming in a multi-core environment.
Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2007
Potential show-stoppers for transactional synchronization.
Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2007
An effective hybrid transactional memory system with strong isolation guarantees.
Proceedings of the 34th International Symposium on Computer Architecture (ISCA 2007), 2007
Comparing memory systems for chip multiprocessors.
Proceedings of the 34th International Symposium on Computer Architecture (ISCA 2007), 2007
Raksha: a flexible information flow architecture for software security.
Proceedings of the 34th International Symposium on Computer Architecture (ISCA 2007), 2007
Evaluating MapReduce for Multi-core and Multiprocessor Systems.
Proceedings of the 13th International Conference on High-Performance Computer Architecture (HPCA-13 2007), 2007
A Scalable, Non-blocking Approach to Transactional Memory.
Proceedings of the 13th International Conference on High-Performance Computer Architecture (HPCA-13 2007), 2007
A practical FPGA-based framework for novel CMP research.
Proceedings of the ACM/SIGDA 15th International Symposium on Field Programmable Gate Arrays, 2007
Register pointer architecture for efficient embedded processors.
Proceedings of the 2007 Design, Automation and Test in Europe Conference and Exposition, 2007
ATLAS: a chip-multiprocessor with transactional memory support.
Proceedings of the 2007 Design, Automation and Test in Europe Conference and Exposition, 2007
A low power front-end for embedded processors using a block-aware instruction set.
Proceedings of the 2007 International Conference on Compilers, 2007
The OpenTM Transactional Application Programming Interface.
Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007
2006
Block-aware instruction set architecture.
ACM Trans. Archit. Code Optim., 2006
Executing Java programs with transactional memory.
Sci. Comput. Program., 2006
The Atomos transactional programming language.
Proceedings of the ACM SIGPLAN 2006 Conference on Programming Language Design and Implementation, 2006
Architectural Semantics for Practical Transactional Memory.
Proceedings of the 33rd International Symposium on Computer Architecture (ISCA 2006), 2006
Proceedings of the 2006 International Conference on Parallel Processing (ICPP 2006), 2006
The common case transactional behavior of multithreaded programs.
Proceedings of the 12th International Symposium on High-Performance Computer Architecture, 2006
Research accelerator for multiple processors.
Proceedings of the 2006 IEEE Hot Chips 18 Symposium (HCS), 2006
Transactional memory implementation overview.
Proceedings of the 2006 IEEE Hot Chips 18 Symposium (HCS), 2006
Simultaneously improving code size, performance, and energy in embedded processors.
Proceedings of the Conference on Design, Automation and Test in Europe, 2006
Tradeoffs in transactional memory virtualization.
Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, 2006
Testing implementations of transactional memory.
Proceedings of the 15th International Conference on Parallel Architectures and Compilation Techniques (PACT 2006), 2006
2005
Energy-efficient and high-performance instruction fetch using a block-aware ISA.
Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005
TAPE: a transactional application profiling environment.
Proceedings of the 19th Annual International Conference on Supercomputing, 2005
Heuristics for Profile-Driven Method-Level Speculative Parallelization.
Proceedings of the 34th International Conference on Parallel Processing (ICPP 2005), 2005
Automatic power management schemes for Internet servers and data centers.
Proceedings of the Global Telecommunications Conference, 2005. GLOBECOM '05, St. Louis, Missouri, USA, 28 November, 2005
Improving Instruction Delivery with a Block-Aware ISA.
Proceedings of the Euro-Par 2005, Parallel Processing, 11th International Euro-Par Conference, Lisbon, Portugal, August 30, 2005
Characterization of TCC on Chip-Multiprocessors.
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques (PACT 2005), 2005
2004
Transactional Coherence and Consistency: Simplifying Parallel Hardware and Software.
IEEE Micro, 2004
Transactional Memory Coherence and Consistency.
Proceedings of the 31st International Symposium on Computer Architecture (ISCA 2004), 2004
Programming with transactional coherence and consistency (TCC).
Proceedings of the 11th International Conference on Architectural Support for Programming Languages and Operating Systems, 2004
The Stream Virtual Machine.
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques (PACT 2004), 29 September, 2004
2003
Scalable Vector Processors for Embedded Systems.
IEEE Micro, 2003
Overcoming the Limitations of Conventional Vector Processors.
Proceedings of the 30th International Symposium on Computer Architecture (ISCA 2003), 2003
2002
Vector vs. superscalar and VLIW architectures for embedded multimedia benchmarks.
Proceedings of the 35th Annual International Symposium on Microarchitecture, 2002
2001
Hardware/compiler codevelopment for an embedded media processor.
Proc. IEEE, 2001
2000
Exploiting On-Chip Memory Bandwidth in the VIRAM Compiler.
Proceedings of the Intelligent Memory Systems, Second International Workshop, 2000
How to Solve the Current Memory Access and Data Transfer Bottlenecks: At the Processor Architecture or at the Compiler Level?
Proceedings of the 2000 Design, 2000
1998
A New Direction for Computer Architecture Research.
Computer, 1998
Embedded memories in system design - from technology to systems architecture.
Proceedings of the 1998 IEEE/ACM International Conference on Computer-Aided Design, 1998
1997
A case for intelligent RAM.
IEEE Micro, 1997
Scalable Processors in the Billion-Transistor Era: IRAM.
Computer, 1997
The Energy Efficiency of IRAM Architectures.
Proceedings of the 24th International Symposium on Computer Architecture, 1997
Intelligent RAM (IRAM): The Industrial Setting, Applications and Architectures.
Proceedings of the 1997 International Conference on Computer Design: VLSI in Computers & Processors, 1997
Pipelined Multi-Queue Management in a VLSI ATM Switch Chip with Credit-Based Flow-Control.
Proceedings of the 17th Conference on Advanced Research in VLSI (ARVLSI '97), 1997