Christopher J. Hughes

Abhimanyu Rajeshkumar Bambhaniya

Won Woo Ro

Hung-Wei Tseng

Proceedings of the International Conference for High Performance Computing, 2024

2023

VEGETA: Vertically-Integrated Extensions for Sparse/Dense GEMM Tile Acceleration on CPUs.

Geonhwa Jeong

Sana Damani

Eric Qin

Sreenivas Subramoney

Hyesoon Kim

Tushar Krishna

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

2022

Graphite: optimizing graph neural networks on CPUs through cooperative software-hardware techniques.

Yao Yao

Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

2021

SumMerge: an efficient algorithm and implementation for weight repetition-aware DNN inference.

Rohan Baskar Prabhakar

Sachit Kuhar

Rohit Agrawal

Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021

RASA: Efficient Register-Aware Systolic Array Matrix Engine for CPU.

Geonhwa Jeong

Eric Qin

Ananda Samajdar

Sreenivas Subramoney

Hyesoon Kim

Tushar Krishna

Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021

2020

SAVE: Sparsity-Aware Vector Engine for Accelerating DNN Training and Inference on CPUs.

Sara S. Baghsorkhi

Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

SuSy: A Programming Model for Productive Construction of High-Performance Systolic Arrays on FPGAs.

Pradeep Dubey

Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2020

SparseTrain: Leveraging Dynamic Sparsity in Software for Training DNNs on General-Purpose SIMD Processors.

Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

2019

SparseTrain: Leveraging Dynamic Sparsity in Training DNNs on General-Purpose SIMD Processors.

CoRR, 2019

T2S-Tensor: Productively Generating High-Performance Spatial Hardware for Dense Tensor Computations.

Nitish Kumar Srivastava

Timothy G. Mattson

Pradeep Dubey

Proceedings of the 27th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2019

Forgive-TM: Supporting Lazy Conflict Detection In Eager Hardware Transactional Memory.

Sunjae Park

Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques, 2019

2018

Dynamic fine-grained sparse memory accesses.

Berkin Akin

Chiachen Chou

Jongsoo Park

Rajat Agarwal

Proceedings of the International Symposium on Memory Systems, 2018

Transactional pre-abort handlers in hardware transactional memory.

Sunjae Park

Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, 2018

2017

Banshee: bandwidth-efficient DRAM caching via software/hardware cooperation.

Xiangyao Yu

Nadathur Satish

Onur Mutlu

Srinivas Devadas

Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

2016

PleaseTM: Enabling transaction conflict management in requester-wins hardware transactional memory.

Sunjae Park

Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

2015

Single-Instruction Multiple-Data Execution.

Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, ISBN: 978-3-031-01746-9, 2015

IMP: indirect memory prefetcher.

Xiangyao Yu

Nadathur Satish

Srinivas Devadas

Proceedings of the 48th International Symposium on Microarchitecture, 2015

2013

Locality-aware task management for unstructured parallelism: a quantitative limit study.

Richard M. Yoo

Changkyu Kim

Christos Kozyrakis

Proceedings of the 25th ACM Symposium on Parallelism in Algorithms and Architectures, 2013

Performance evaluation of Intel® transactional synchronization extensions for high-performance computing.

Richard M. Yoo

Konrad Lai

Ravi Rajwar

Proceedings of the International Conference for High Performance Computing, 2013

Location-aware cache management for many-core processors with deep cache hierarchy.

Jongsoo Park

Richard M. Yoo

Daya Shanker Khudia

Daehyun Kim

Proceedings of the International Conference for High Performance Computing, 2013

Exploring SIMD for Molecular Dynamics, Using Intel® Xeon® Processors and Intel® Xeon Phi Coprocessors.

Simon J. Pennycook

Mikhail Smelyanskiy

Stephen A. Jarvis

Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

2011

DeFT: Design space exploration for on-the-fly detection of coherence misses.

Guru Venkataramani

Sanjeev Kumar

ACM Trans. Archit. Code Optim., 2011

Moguls: a model to explore the memory hierarchy for bandwidth improvements.

Guangyu Sun

Proceedings of the 38th International Symposium on Computer Architecture (ISCA 2011), 2011

LIME: a framework for debugging load imbalance in multi-threaded execution.

Jungju Oh

Guru Venkataramani

Proceedings of the 33rd International Conference on Software Engineering, 2011

2010

Performance and Energy Implications of Many-Core Caches for Throughput Computing.

Changkyu Kim

IEEE Micro, 2010

2009

Parallel scalability in speech recognition.

Wonyong Sung

Kurt Keutzer

IEEE Signal Process. Mag., 2009

Scalable HMM based inference engine in large vocabulary continuous speech recognition.

Wonyong Sung

Kurt Keutzer

Proceedings of the 2009 IEEE International Conference on Multimedia and Expo, 2009

2008

Convergence of Recognition, Mining, and Synthesis Workloads and Its Implications.

Jatin Chhugani

Pradeep Dubey

Proc. IEEE, 2008

Atomic Vector Operations on Chip Multiprocessors.

Changkyu Kim

Victor W. Lee

Anthony D. Nguyen

Proceedings of the 35th International Symposium on Computer Architecture (ISCA 2008), 2008

2007

Carbon: architectural support for fine-grained parallelism on chip multiprocessors.

Sanjeev Kumar

Anthony D. Nguyen

Proceedings of the 34th International Symposium on Computer Architecture (ISCA 2007), 2007

Physical simulation for animation and visual effects: parallelization and characterization for chip multiprocessors.

Proceedings of the 34th International Symposium on Computer Architecture (ISCA 2007), 2007

Computer Vision on Multi-Core Processors: Articulated Body Tracking.

Trista Pei-Chun Chen

Dmitry Budnikov

Proceedings of the 2007 IEEE International Conference on Multimedia and Expo, 2007

2006

Hybrid transactional memory.

Sanjeev Kumar

Michael Chu

Partha Kundu

Anthony D. Nguyen

Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2006

Incremental approximate matrix factorization for speeding up support vector machines.

Gang Wu

Edward Y. Chang

Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006

2005

Memory-side prefetching for linked data structures for processor-in-memory systems.

J. Parallel Distributed Comput., 2005

2004

A Formal Approach to Frequent Energy Adaptations for Multimedia Applications.

Proceedings of the 31st International Symposium on Computer Architecture (ISCA 2004), 2004

2003

General-Purpose Processors for Multimedia Applications: Predictability and Energy Efficiency.

PhD thesis, 2003

2002

RSIM: Simulating Shared-Memory Multiprocessors with ILP Processors.

Parthasarathy Ranganathan

Vijay S. Pai

Computer, 2002

Soft Real-Time Scheduling on Simultaneous Multithreaded Processors.

Rohit Jain

Proceedings of the 23rd IEEE Real-Time Systems Symposium (RTSS'02), 2002

Joint local and global hardware adaptations for energy.

Ruchira Sasanka

Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-X), 2002

2001

Saving energy with architectural and frequency adaptations for multimedia applications.

Jayanth Srinivasan

Proceedings of the 34th Annual International Symposium on Microarchitecture, 2001

Variability in the execution of multimedia applications and implications for architecture.

Proceedings of the 28th Annual International Symposium on Computer Architecture, 2001

Speculative precomputation: long-range prefetching of delinquent loads.

Jamison D. Collins

Hong Wang

Dean M. Tullsen