Carole-Jean Wu

ORCID: 0000-0002-9032-7239

Affiliations:
  • Facebook AI Research
  • Arizona State University, AZ, USA


According to our database, Carole-Jean Wu authored at least 116 papers between 2011 and 2024.

Bibliography

2024
Beyond Efficiency: Scaling AI Sustainably.
IEEE Micro, 2024

Revisiting Reliability in Large-Scale Machine Learning Research Clusters.
CoRR, 2024

Characterizing and Efficiently Accelerating Multimodal Generation Model Inference.
CoRR, 2024

Unlocking the Potential of Renewable Energy Through Curtailment Prediction.
CoRR, 2024

Is Flash Attention Stable?
CoRR, 2024

Introducing v0.5 of the AI Safety Benchmark from MLCommons.
CoRR, 2024

Croissant: A Metadata Format for ML-Ready Datasets.
CoRR, 2024

HeteroSwitch: Characterizing and Taming System-Induced Data Heterogeneity in Federated Learning.
Proceedings of the Seventh Annual Conference on Machine Learning and Systems, 2024

Generative AI Beyond LLMs: System Implications of Multi-Modal Generation.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2024

MAD-Max Beyond Single-Node: Enabling Large Machine Learning Model Acceleration on Distributed Systems.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

CHAI: Clustered Head Attention for Efficient LLM Inference.
Proceedings of the Forty-first International Conference on Machine Learning, 2024


LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
Special Issue on Environmentally Sustainable Computing.
IEEE Micro, 2023

Architectural CO2 Footprint Tool: Designing Sustainable Computer Systems With an Architectural Carbon Modeling Tool.
IEEE Micro, 2023

Federated Ensemble Learning: Increasing the Capacity of Label Private Recommendation Systems.
IEEE Data Eng. Bull., 2023

Decoding Data Quality via Synthetic Corruptions: Embedding-guided Pruning of Code Data.
CoRR, 2023

Data Acquisition: A New Frontier in Data-centric AI.
CoRR, 2023

Carbon Responder: Coordinating Demand Response for the Datacenter Fleet.
CoRR, 2023

GEVO-ML: Optimizing Machine Learning Code with Evolutionary Computation.
CoRR, 2023

READ: Recurrent Adaptation of Large Transformers.
CoRR, 2023

Design Space Exploration and Optimization for Carbon-Efficient Extended Reality Systems.
CoRR, 2023

GreenScale: Carbon-Aware Systems for Edge Computing.
CoRR, 2023

Green Federated Learning.
CoRR, 2023

Towards MoE Deployment: Mitigating Inefficiencies in Mixture-of-Expert (MoE) Inference.
CoRR, 2023

FlexShard: Flexible Sharding for Industry-Scale Sequence Recommendation Models.
CoRR, 2023

Tectonic-Shift: A Composite Storage Fabric for Large-Scale ML Training.
Proceedings of the 2023 USENIX Annual Technical Conference, 2023


RecD: Deduplication for End-to-End Deep Learning Recommendation Model Training Infrastructure.
Proceedings of the Sixth Conference on Machine Learning and Systems, 2023

Carbon-Efficient Design Optimization for Computing Systems.
Proceedings of the 2nd Workshop on Sustainable Computer Systems, 2023

MP-Rec: Hardware-Software Co-design to Enable Multi-path Recommendation.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

Carbon Explorer: A Holistic Framework for Designing Carbon Aware Datacenters.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2022
EdgeWise: Energy-efficient CNN Computation on Edge Devices under Stochastic Communication Delays.
ACM Trans. Embed. Comput. Syst., September, 2022

CAMDNN: Content-Aware Mapping of a Network of Deep Neural Networks on Edge MPSoCs.
IEEE Trans. Computers, 2022

Chasing Carbon: The Elusive Environmental Footprint of Computing.
IEEE Micro, 2022

Understanding Scaling Laws for Recommendation Models.
CoRR, 2022

DataPerf: Benchmarks for Data-Centric AI Development.
CoRR, 2022

FEL: High Capacity Learning for Recommendation and Ranking via Federated Ensemble Learning.
CoRR, 2022

A Holistic Approach for Designing Carbon Aware Datacenters.
CoRR, 2022

On Sampling Collaborative Filtering Datasets.
Proceedings of the WSDM '22: The Fifteenth ACM International Conference on Web Search and Data Mining, Virtual Event / Tempe, AZ, USA, February 21, 2022

Towards Fair Federated Recommendation Learning: Characterizing the Inter-Dependence of System and Data Heterogeneity.
Proceedings of the RecSys '22: Sixteenth ACM Conference on Recommender Systems, Seattle, WA, USA, September 18, 2022

Infinite Recommendation Networks: A Data-Centric Approach.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022


PAPAYA: Practical, Private, and Scalable Federated Learning.
Proceedings of the Fifth Conference on Machine Learning and Systems, 2022

Understanding data storage and ingestion for large-scale deep recommendation model training: industrial product.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

ACT: designing sustainable computer systems with an architectural carbon modeling tool.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

Understanding the Power of Evolutionary Computation for GPU Code Optimization.
Proceedings of the IEEE International Symposium on Workload Characterization, 2022

FedGPO: Heterogeneity-Aware Global Parameter optimization for Efficient Federated Learning.
Proceedings of the IEEE International Symposium on Workload Characterization, 2022

Hercules: Heterogeneity-Aware Inference Serving for At-Scale Personalized Recommendation.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

A joint management middleware to improve training performance of deep recommendation systems with SSDs.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

RecShard: statistical feature-based memory optimization for industry-scale neural recommendation.
Proceedings of the ASPLOS '22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 28 February 2022, 2022

2021
Exploiting Parallelism Opportunities with Deep Learning Frameworks.
ACM Trans. Archit. Code Optim., 2021

Dynamic Temperature Management of Near-Sensor Processing for Energy-Efficient High-Fidelity Imaging.
Sensors, 2021

The Vision Behind MLPerf: Understanding AI Inference Performance.
IEEE Micro, 2021

Low-Precision Hardware Architectures Meet Recommendation Model Inference at Scale.
IEEE Micro, 2021

SecNDP: Secure Near-Data Processing with Untrusted Memory.
IACR Cryptol. ePrint Arch., 2021

Sustainable AI: Environmental Implications, Challenges and Opportunities.
CoRR, 2021

Understanding and Co-designing the Data Ingestion Pipeline for Industry-Scale RecSys Training.
CoRR, 2021

Socio-Technological Challenges and Opportunities: Paths Forward.
CoRR, 2021

SVP-CF: Selection via Proxy for Collaborative Filtering Data.
CoRR, 2021

Energy-Efficient Mapping for a Network of DNN Models at the Edge.
Proceedings of the IEEE International Conference on Smart Computing, 2021

TT-Rec: Tensor Train Compression for Deep Learning Recommendation Models.
Proceedings of the Fourth Conference on Machine Learning and Systems, 2021

Understanding and Improving Failure Tolerant Training for Deep Learning Recommendation with Partial Recovery.
Proceedings of the Fourth Conference on Machine Learning and Systems, 2021

AutoFL: Enabling Heterogeneity-Aware Energy Efficient Federated Learning.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

RecPipe: Co-designing Models and Hardware to Jointly Optimize Recommendation Quality and Performance.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

Understanding Capacity-Driven Scale-Out Neural Recommendation Inference.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2021

Understanding Training Efficiency of Deep Learning Recommendation Models at Scale.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

RecSSD: near data processing for solid state drive based recommendation inference.
Proceedings of the ASPLOS '21: 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2021

2020
GEVO: GPU Code Optimization Using Evolutionary Computation.
ACM Trans. Archit. Code Optim., 2020

MLPerf: An Industry Standard Benchmark Suite for Machine Learning Performance.
IEEE Micro, 2020

CPR: Understanding and Improving Failure Tolerant Training for Deep Learning Recommendation with Partial Recovery.
CoRR, 2020

AutoScale: Optimizing Energy Efficiency of End-to-End Edge Inference under Stochastic Variance.
CoRR, 2020

Developing a Recommendation Benchmark for MLPerf Training and Inference.
CoRR, 2020


AutoScale: Energy Efficiency Optimization for Stochastic Edge Inference Using Reinforcement Learning.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020


RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing.
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

DeepRecSys: A System for Optimizing End-To-End At-Scale Neural Recommendation Inference.
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

Cross-Stack Workload Characterization of Deep Recommendation Systems.
Proceedings of the IEEE International Symposium on Workload Characterization, 2020

The Architectural Implications of Facebook's DNN-Based Personalized Recommendation.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020

GEVO-ML: a proposal for optimizing ML code with evolutionary computation.
Proceedings of the GECCO '20: Genetic and Evolutionary Computation Conference, 2020

Emerging Neural Workloads and Their Impact on Hardware.
Proceedings of the 2020 Design, Automation & Test in Europe Conference & Exhibition, 2020

2019
Optimizing User Satisfaction of Mobile Workloads Subject to Various Sources of Uncertainties.
IEEE Trans. Mob. Comput., 2019

Configurable-ECC: Architecting a Flexible ECC Scheme to Support Different Sized Accesses in High Bandwidth Memory Systems.
IEEE Trans. Computers, 2019

MLPerf Training Benchmark.
CoRR, 2019

The Architectural Implications of Facebook's DNN-based Personalized Recommendation.
CoRR, 2019

Deep Learning Recommendation Model for Personalization and Recommendation Systems.
CoRR, 2019

Genetic improvement of GPU code.
Proceedings of the 6th International Workshop on Genetic Improvement, 2019

Machine Learning at Facebook: Understanding Inference at the Edge.
Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019

Understanding the Future of Energy Efficiency in Multi-Module GPUs.
Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019

2018
DORA: Optimizing Smartphone Energy Efficiency and Web Browser Performance under Interference.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2018

LATTE-CC: Latency Tolerance Aware Adaptive Cache Compression Management for Energy Efficient GPUs.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

2017
MCM-GPU: Multi-Chip-Module GPUs for Continued Performance Scalability.
Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

Understanding the thermal challenges of high-performance mobile devices with a detailed platform temperature model.
Proceedings of the 2017 IEEE International Symposium on Workload Characterization, 2017

Performance characterization, prediction, and optimization for heterogeneous systems with multi-level memory interference.
Proceedings of the 2017 IEEE International Symposium on Workload Characterization, 2017

2016
Using Low Cost Erasure and Error Correction Schemes to Improve Reliability of Commodity DRAM Systems.
IEEE Trans. Computers, 2016

RATT-ECC: Rate Adaptive Two-Tiered Error Correction Codes for Reliable 3D Die-Stacked Memory.
ACM Trans. Archit. Code Optim., 2016

ID-cache: instruction and memory divergence based cache management for GPUs.
Proceedings of the 2016 IEEE International Symposium on Workload Characterization, 2016

Ctrl-C: Instruction-Aware Control Loop Based Adaptive Cache Bypassing for GPUs.
Proceedings of the 34th IEEE International Conference on Computer Design, 2016

Improving smartphone user experience by balancing performance and energy with probabilistic QoS guarantee.
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

2015
E-ECC: Low Power Erasure and Error Correction Schemes for Increasing Reliability of Commodity DRAM Systems.
Proceedings of the 2015 International Symposium on Memory Systems, 2015

A study of mobile device utilization.
Proceedings of the 2015 IEEE International Symposium on Performance Analysis of Systems and Software, 2015

CAWA: coordinated warp scheduling and cache prioritization for critical warp acceleration of GPGPU workloads.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

Characterization and Throttling-Based Mitigation of Memory Interference for Heterogeneous Smartphones.
Proceedings of the 2015 IEEE International Symposium on Workload Characterization, 2015

2014
STEAM: A Smart Temperature and Energy Aware Multicore Controller.
ACM Trans. Embed. Comput. Syst., 2014

Architectural Thermal Energy Harvesting Opportunities for Sustainable Computing.
IEEE Comput. Archit. Lett., 2014

Characterizing the latency hiding ability of GPUs.
Proceedings of the 2014 IEEE International Symposium on Performance Analysis of Systems and Software, 2014

Quantifying the energy cost of data movement for emerging smart phone workloads on mobile platforms.
Proceedings of the 2014 IEEE International Symposium on Workload Characterization, 2014

ReMAP: Reuse and memory access cost aware eviction policy for last level cache management.
Proceedings of the 32nd IEEE International Conference on Computer Design, 2014

Quantitative Analysis of Control Flow Checking Mechanisms for Soft Errors.
Proceedings of the 51st Annual Design Automation Conference 2014, 2014

CAWS: criticality-aware warp scheduling for GPGPU workloads.
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2013
Performance, energy characterizations and architectural implications of an emerging mobile platform benchmark suite - MobileBench.
Proceedings of the IEEE International Symposium on Workload Characterization, 2013

2011
Adaptive timekeeping replacement: Fine-grained capacity management for shared CMP caches.
ACM Trans. Archit. Code Optim., 2011

PACMan: prefetch-aware cache management for high performance caching.
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011

SHiP: signature-based hit predictor for high performance caching.
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011

Characterization and dynamic mitigation of intra-application cache interference.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2011

