2025
BeaverTalk: Oregon State University's IWSLT 2025 Simultaneous Speech Translation System.
CoRR, May, 2025
FlashKAT: Understanding and Addressing Performance Bottlenecks in the Kolmogorov-Arnold Transformer.
CoRR, May, 2025
Towards Universal Semantics With Large Language Models.
CoRR, May, 2025
ML For Hardware Design Interpretability: Challenges and Opportunities.
CoRR, April, 2025
Hessian-aware Training for Enhancing DNNs Resilience to Parameter Corruptions.
CoRR, April, 2025
MatrixKAN: Parallelized Kolmogorov-Arnold Network.
CoRR, February, 2025
2024
Monte Carlo / Dynamic Code (MC/DC): An accelerated Python package for fully transient neutron transport and rapid methods development.
J. Open Source Softw., April, 2024
Neurons for Neutrons: A Transformer Model for Computation Load Estimation on Domain-Decomposed Neutron Transport Problems.
CoRR, 2024
LLM-Ref: Enhancing Reference Handling in Technical Writing with Large Language Models.
CoRR, 2024
Swift: High-Performance Sparse Tensor Contraction for Scientific Applications.
CoRR, 2024
LLM-RankFusion: Mitigating Intrinsic Inconsistency in LLM-based Ranking.
CoRR, 2024
FLAASH: Flexible Accelerator Architecture for Sparse High-Order Tensor Contraction.
CoRR, 2024
LeaPformer: Enabling Linear Transformers for Autoregressive and Simultaneous Tasks via Learned Proportions.
Proceedings of the Forty-first International Conference on Machine Learning, 2024
Simultaneous Masking, Not Prompting Optimization: A Paradigm Shift in Fine-tuning LLMs for Simultaneous Translation.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024
Simul-LLM: A Framework for Exploring High-Quality Simultaneous Translation with Large Language Models.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024
2023
RAPID: Enabling fast online policy learning in dynamic public cloud environments.
Neurocomputing, November, 2023
PROMPT: Learning dynamic resource allocation policies for network applications.
Future Gener. Comput. Syst., August, 2023
Partitioning-Guided K-Means: Extreme Empty Cluster Resolution for Extreme Model Compression.
CoRR, 2023
Improving Autoregressive NLP Tasks via Modular Linearized Attention.
Proceedings of the Machine Learning and Knowledge Discovery in Databases: Research Track, 2023
Shiftable Context: Addressing Training-Inference Context Mismatch in Simultaneous Speech Translation.
Proceedings of the International Conference on Machine Learning, 2023
Implicit Memory Transformer for Computationally Efficient Simultaneous Speech Translation.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023
2022
Polymorphic Accelerators for Deep Neural Networks.
IEEE Trans. Computers, 2022
Improving Methodology for Tropical Cyclone Seasonal Forecasting in the Australian and the South Pacific Ocean Regions by Selecting and Averaging Models via Metropolis-Gibbs Sampling.
Remote. Sens., 2022
PROMPT: Learning Dynamic Resource Allocation Policies for Edge-Network Applications.
CoRR, 2022
Chapter Four - Routerless networks-on-chip.
Adv. Comput., 2022
2020
AI for Computer Architecture: Principles, Practice, and Prospects.
Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, ISBN: 978-3-031-01770-4, 2020
UVMBench: A Comprehensive Benchmark Suite for Researching Unified Virtual Memory in GPUs.
CoRR, 2020
The gem5 Simulator: Version 20.0+.
CoRR, 2020
Accelerated Reply Injection for Removing NoC Bottleneck in GPGPUs.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020
A Deep Reinforcement Learning Framework for Architectural Exploration: A Routerless NoC Case Study.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020
EquiNox: Equivalent NoC Injection Routers for Silicon Interposer-Based Throughput Processors.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020
2019
A Survey of Machine Learning Applied to Computer Architecture Design.
CoRR, 2019
Optimizing Routerless Network-on-Chip Designs: An Innovative Learning-Based Framework.
CoRR, 2019
Design Space Exploration of Memory Controller Placement in Throughput Processors with Deep Learning.
IEEE Comput. Archit. Lett., 2019
On Trade-off Between Static and Dynamic Power Consumption in NoC Power Gating.
Proceedings of the 2019 IEEE/ACM International Symposium on Low Power Electronics and Design, 2019
Dynamically linked MSHRs for adaptive miss handling in GPUs.
Proceedings of the ACM International Conference on Supercomputing, 2019
Express Link Placement for NoC-Based Many-Core Platforms.
Proceedings of the 48th International Conference on Parallel Processing, 2019
Characterizing On-Chip Traffic Patterns in General-Purpose GPUs: A Deep Learning Approach.
Proceedings of the 37th IEEE International Conference on Computer Design, 2019
Shortcut Mining: Exploiting Cross-Layer Shortcut Reuse in DCNN Accelerators.
Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019
2018
Tolerating Soft Errors in Deep Learning Accelerators with Reliable On-Chip Memory Designs.
Proceedings of the 2018 IEEE International Conference on Networking, 2018
CART: Cache Access Reordering Tree for Efficient Cache and Memory Accesses in GPUs.
Proceedings of the 36th IEEE International Conference on Computer Design, 2018
Routerless Network-on-Chip.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018
2017
CALM: Contention-Aware Latency-Minimal Application Mapping for Flattened Butterfly On-Chip Networks.
ACM Trans. Design Autom. Electr. Syst., 2017
XPro: A Cross-End Processing Architecture for Data Analytics in Wearables.
Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017
2016
Providing Balanced Mapping for Multiple Applications in Many-Core Chip Multiprocessors.
IEEE Trans. Computers, 2016
A Filtering Mechanism to Reduce Network Bandwidth Utilization of Transaction Execution.
ACM Trans. Archit. Code Optim., 2016
Simulation of NoC power-gating: Requirements, optimizations, and the Agate simulator.
J. Parallel Distributed Comput., 2016
Maximizing the performance of NoC-based MPSoCs under total power and power density constraints.
Proceedings of the 17th International Symposium on Quality Electronic Design, 2016
2015
Power punch: Towards non-blocking power-gating of NoC routers.
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015
TAPP: temperature-aware application mapping for NoC-based many-core processors.
Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015
2014
Futility Scaling: High-Associativity Cache Partitioning.
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014
Smart butterfly: reducing static power dissipation of network-on-chip with core-state-awareness.
Proceedings of the International Symposium on Low Power Electronics and Design, 2014
Balancing On-Chip Network Latency in Multi-application Mapping for Chip-Multiprocessors.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014
Mitigating the Mismatch between the Coherence Protocol and Conflict Detection in Hardware Transactional Memory.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014
MP3: Minimizing performance penalty for power-gating of Clos network-on-chip.
Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture, 2014
Application mapping for express channel-based networks-on-chip.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2014
2013
An Analytical Performance Model for Partitioning Off-Chip Memory Bandwidth.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013
RAIR: Interference Reduction in Regionalized Networks-on-Chip.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013
Bubble coloring: avoiding routing- and protocol-induced deadlocks with minimal virtual channel requirement.
Proceedings of the International Conference on Supercomputing, 2013
In-network traffic regulation for Transactional Memory.
Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013
Worm-Bubble Flow Control.
Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013
2012
Efficient implementation of globally-aware network flow control.
J. Parallel Distributed Comput., 2012
NoRD: Node-Router Decoupling for Effective Power-gating of On-Chip Routers.
Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012
2011
Critical Bubble Scheme: An Efficient Implementation of Globally Aware Network Flow Control.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011
2004
A comparative study for solution methods of a multicomponent distillation model.
Proceedings of the IEEE International Conference on Systems, 2004