Tulika Mitra

Orcid: 0000-0003-4136-4188

Affiliations:
  • National University of Singapore, Singapore


According to our database, Tulika Mitra authored at least 197 papers between 1997 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Chameleon: Dual Memory Replay for Online Continual Learning on Edge Devices.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., June, 2024

Flip: Data-centric Edge CGRA Accelerator.
ACM Trans. Design Autom. Electr. Syst., January, 2024

Condensed Sample-Guided Model Inversion for Knowledge Distillation.
CoRR, 2024

Generalizing Teacher Networks for Effective Knowledge Distillation Across Student Architectures.
CoRR, 2024

SparrowSNN: A Hardware/software Co-design for Energy Efficient ECG Classification.
CoRR, 2024

ICED: An Integrated CGRA Framework Enabling DVFS-Aware Acceleration.
Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024

ASADI: Accelerating Sparse Attention Using Diagonal-based In-Situ Computing.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024

PACE: A Scalable and Energy Efficient CGRA in a RISC-V SoC for Edge Computing Applications.
Proceedings of the 36th IEEE Hot Chips Symposium, 2024

Shedding the Bits: Pushing the Boundaries of Quantization with Minifloats on FPGAs.
Proceedings of the 34th International Conference on Field-Programmable Logic and Applications, 2024

CRISP: Hybrid Structured Sparsity for Class-Aware Model Pruning.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2024

SWAT: Scalable and Efficient Window Attention-based Transformers Acceleration on FPGAs.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

ZeD: A Generalized Accelerator for Variably Sparse Matrix Computations in ML.
Proceedings of the 2024 International Conference on Parallel Architectures and Compilation Techniques, 2024

2023
The 2022 International Conference on Computer-Aided Design (ICCAD).
IEEE Des. Test, April, 2023

Post-Training Quantization with Low-precision Minifloats and Integers on FPGAs.
CoRR, 2023

Accelerating Unstructured SpGEMM using Structured In-situ Computing.
CoRR, 2023

InkStream: Real-time GNN Inference on Streaming Graphs via Incremental Update.
CoRR, 2023

Accelerating Edge AI with Morpher: An Integrated Design, Compilation and Simulation Framework for CGRAs.
CoRR, 2023

FLEX: Introducing FLEXible Execution on CGRA with Spatio-Temporal Vector Dataflow.
Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 2023

2022
HiMap: Fast and Scalable High-Quality Mapping on CGRA via Hierarchical Abstraction.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022

ASCENT: Communication Scheduling for SDF on Bufferless Software-Defined NoC.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022

ChordMap: Automated Mapping of Streaming Applications Onto CGRA.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022

Preventing Catastrophic Forgetting and Distribution Mismatch in Knowledge Distillation via Synthetic Data.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022

Load balancing for a user-level virtualized 5G cloud-RAN.
Proceedings of MobiArch '22: 17th ACM Workshop on Mobility in the Evolving Internet Architecture, 2022

Power-Performance Characterization of TinyML Systems.
Proceedings of the IEEE 40th International Conference on Computer Design, 2022

LISA: Graph Neural Network based Portable Mapping on Spatial Accelerators.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

GraphWave: A Highly-Parallel Compute-at-Memory Graph Processing Accelerator.
Proceedings of the 2022 Design, Automation & Test in Europe Conference & Exhibition, 2022

PANORAMA: divide-and-conquer approach for mapping complex loop kernels on CGRA.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

REVAMP: a systematic framework for heterogeneous CGRA realization.
Proceedings of the ASPLOS '22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 28 February 2022, 2022

Robust and Resource-Efficient Data-Free Knowledge Distillation by Generative Pseudo Replay.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
oo7: Low-Overhead Defense Against Spectre Attacks via Program Analysis.
IEEE Trans. Software Eng., 2021

Editorial: Reimagining ACM Transactions on Embedded Computing Systems (TECS).
ACM Trans. Embed. Comput. Syst., 2021

Power-Efficient Heterogeneous Many-Core Design With NCFET Technology.
IEEE Trans. Computers, 2021

Neural Network-Based Performance Prediction for Task Migration on S-NUCA Many-Cores.
IEEE Trans. Computers, 2021

Report on the 2020 Embedded Systems Week (ESWEEK): A Virtual Event during a Pandemic, September 20-25.
IEEE Des. Test, 2021

FSA: fronthaul slicing architecture for 5G using dataplane programmable switches.
Proceedings of the ACM MobiCom '21: The 27th Annual International Conference on Mobile Computing and Networking, 2021

2020
KLEESpectre: Detecting Information Leakage through Speculative Cache Attacks via Symbolic Execution.
ACM Trans. Softw. Eng. Methodol., 2020

SPECTRUM: A Software-defined Predictable Many-core Architecture for LTE/5G Baseband Processing.
ACM Trans. Embed. Comput. Syst., 2020

High-Throughput CNN Inference on Embedded ARM Big.LITTLE Multicore Processors.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

Neural Network Inference on Mobile SoCs.
IEEE Des. Test, 2020

Survey on Education for Cyber-Physical Systems.
IEEE Des. Test, 2020

Guest Editors' Introduction: Selected Papers from IEEE VLSI Test Symposium.
IEEE Des. Test, 2020

ESWEEK 2019 Conference Report.
IEEE Des. Test, 2020

IsoRAN: Isolation and Scaling for 5G RAN via User-Level Data Plane Virtualization.
CoRR, 2020

Mobile Application Processors: Techniques for Software Power-Performance Optimization.
IEEE Consumer Electron. Mag., 2020

Simultaneous Progressing Switching Protocols for Timing Predictable Real-Time Network-on-Chips.
Proceedings of the 26th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, 2020

Poster: IsoRAN: Isolation and Scaling for 5G RAN via User-Level Data Plane Virtualization.
Proceedings of the 2020 IFIP Networking Conference, 2020

Time-Predictable Software-Defined Architecture with SDF-Based Compiler Flow for 5G Baseband Processing.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Unified Thread- and Data-Mapping for Multi-Threaded Multi-Phase Applications on SPM Many-Cores.
Proceedings of the 2020 Design, Automation & Test in Europe Conference & Exhibition, 2020

BrezeFlow: Unified Debugger for Android CPU Power Governors and Schedulers on Edge Devices.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

Slicing 5G fronthaul networks using programmable switches.
Proceedings of the CoNEXT '20: The 16th International Conference on emerging Networking EXperiments and Technologies, 2020

2019
Synergy: An HW/SW Framework for High Throughput CNNs on Embedded Heterogeneous SoC.
ACM Trans. Embed. Comput. Syst., 2019

CASCADE: High Throughput Data Streaming via Decoupled Access-Execute CGRA.
ACM Trans. Embed. Comput. Syst., 2019

Scratchpad-Memory Management for Multi-Threaded Applications on Many-Core Architectures.
ACM Trans. Embed. Comput. Syst., 2019

OPTiC: Optimizing Collaborative CPU-GPU Computing on Mobile Devices With Thermal Constraints.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2019

High-Throughput CNN Inference on Embedded ARM big.LITTLE Multi-Core Processors.
CoRR, 2019

Scalable Optimal Greedy Scheduler for Asymmetric Multi-/Many-Core Processors.
Proceedings of the Embedded Computer Systems: Architectures, Modeling, and Simulation, 2019

SPECTRUM: a software defined predictable many-core architecture for LTE baseband processing.
Proceedings of the 20th ACM SIGPLAN/SIGBED International Conference on Languages, 2019

4D-CGRA: Introducing Branch Dimension to Spatio-Temporal Application Mapping on CGRAs.
Proceedings of the International Conference on Computer-Aided Design, 2019

Prediction-Based Task Migration on S-NUCA Many-Cores.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2019

Time-Predictable Computing by Design: Looking Back, Looking Forward.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

HyCUBE: A 0.9V 26.4 MOPS/mW, 290 pJ/op, Power Efficient Accelerator for IoT Applications.
Proceedings of the IEEE Asian Solid-State Circuits Conference, 2019

2018
LOCUS: Low-Power Customizable Many-Core Architecture for Wearables.
ACM Trans. Embed. Comput. Syst., 2018

Guest Editors' Introduction: Special Issue on Time-Critical Systems Design Part II.
IEEE Des. Test, 2018

Time-Critical Systems Design: A Survey.
IEEE Des. Test, 2018

Guest Editors' Introduction: Special Issue on Time-Critical Systems Design.
IEEE Des. Test, 2018

oo7: Low-overhead Defense against Spectre Attacks via Binary Analysis.
CoRR, 2018

Synergy: A HW/SW Framework for High Throughput CNNs on Embedded Heterogeneous SoC.
CoRR, 2018

Scalable Dynamic Task Scheduling on Adaptive Many-Core.
Proceedings of the 12th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip, 2018

Software Support for Heterogeneous Computing.
Proceedings of the 2018 IEEE Computer Society Annual Symposium on VLSI, 2018

Stitch: Fusible Heterogeneous Accelerators Enmeshed with Many-Core Architecture for Wearables.
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

PR³: Power Efficient and Low Latency Baseband Processing for LTE Femtocells.
Proceedings of the 2018 IEEE Conference on Computer Communications, 2018

QoS-aware stochastic power management for many-cores.
Proceedings of the 55th Annual Design Automation Conference, 2018

Dnestmap: mapping deeply-nested loops on ultra-low power CGRAs.
Proceedings of the 55th Annual Design Automation Conference, 2018

Analytical Two-Level Near Threshold Cache Exploration for Low Power Biomedical Applications.
Proceedings of the Advanced Computer Architecture - 12th Conference, 2018

2017
Application-Specific Processors.
Proceedings of the Handbook of Hardware/Software Codesign, 2017

Introduction to Hardware/Software Codesign.
Proceedings of the Handbook of Hardware/Software Codesign, 2017

TC-Release++: An Efficient Timestamp-Based Coherence Protocol for Many-Core Architectures.
IEEE Trans. Parallel Distributed Syst., 2017

CGPredict: Embedded GPU Performance Estimation from Single-Threaded Applications.
ACM Trans. Embed. Comput. Syst., 2017

Optimal Greedy Algorithm for Many-Core Scheduling.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2017

Defragmentation of Tasks in Many-Core Architecture.
ACM Trans. Archit. Code Optim., 2017

A Rapid Data Communication Exploration Tool for Hybrid CPU-FPGA Architectures.
Proceedings of the 25th Euromicro International Conference on Parallel, 2017

Mobile heterogeneous computing: a software perspective.
Proceedings of the 15th IEEE/ACM Symposium on Embedded Systems for Real-Time Multimedia, 2017

Design Space exploration of FPGA-based accelerators with multi-level parallelism.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2017

Scalable probabilistic power budgeting for many-cores.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2017

HyCUBE: A CGRA with Reconfigurable Single-cycle Multi-hop Interconnect.
Proceedings of the 54th Annual Design Automation Conference, 2017

2016
Design of Multiple-Target Tracking System on Heterogeneous System-on-Chip Devices.
IEEE Trans. Veh. Technol., 2016

Adaptive Isolation for Predictability and Security (Dagstuhl Seminar 16441).
Dagstuhl Reports, 2016

Automated partitioning of android applications for trusted execution environments.
Proceedings of the 38th International Conference on Software Engineering, 2016

Efficient Timestamp-Based Cache Coherence Protocol for Many-Core Architectures.
Proceedings of the 2016 International Conference on Supercomputing, 2016

Combined on-line lifetime-energy optimization for asymmetric multicores.
Proceedings of the 2016 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems, 2016

Distributed fair scheduling for many-cores.
Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition, 2016

Lin-analyzer: a high-level performance analysis tool for FPGA-based accelerators.
Proceedings of the 53rd Annual Design Automation Conference, 2016

Improving mobile gaming performance through cooperative CPU-GPU thermal management.
Proceedings of the 53rd Annual Design Automation Conference, 2016

Distributed scheduling for many-cores using cooperative game theory.
Proceedings of the 53rd Annual Design Automation Conference, 2016

2015
Instruction Cache Locking Using Temporal Reuse Profile.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2015

Heterogeneous Multi-core Architectures.
IPSJ Trans. Syst. LSI Des. Methodol., 2015

Energy-efficient execution of data-parallel applications on heterogeneous mobile platforms.
Proceedings of the 33rd IEEE International Conference on Computer Design, 2015

SelectDirectory: a selective directory for cache coherence in many-core architectures.
Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015

Power-Performance Modelling of Mobile Gaming Workloads on Heterogeneous MPSoCs.
Proceedings of the 52nd Annual Design Automation Conference, 2015

Improving GPGPU energy-efficiency through concurrent kernel execution and DVFS.
Proceedings of the 13th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2015

Approximation-aware scheduling on heterogeneous multi-core architectures.
Proceedings of the 20th Asia and South Pacific Design Automation Conference, 2015

2014
Graph Minor Approach for Application Mapping on CGRAs.
ACM Trans. Reconfigurable Technol. Syst., 2014

Task Scheduling on Adaptive Multi-Core.
IEEE Trans. Computers, 2014

Energy-efficient computing with heterogeneous multi-cores.
Proceedings of the 2014 International Symposium on Integrated Circuits (ISIC), 2014

Design space exploration of multiple loops on FPGAs using high level synthesis.
Proceedings of the 32nd IEEE International Conference on Computer Design, 2014

WCET-Centric dynamic instruction cache locking.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2014

Integrated CPU-GPU Power Management for 3D Mobile Games.
Proceedings of the 51st Annual Design Automation Conference 2014, 2014

Dark silicon as a challenge for hardware/software co-design.
Proceedings of the 2014 International Conference on Hardware/Software Codesign and System Synthesis, 2014

Price theory based power management for heterogeneous multi-cores.
Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2014

2013
An analytical approach for fast and accurate design space exploration of instruction caches.
ACM Trans. Embed. Comput. Syst., 2013

Introduction to the special issue on application-specific processors.
ACM Trans. Embed. Comput. Syst., 2013

Lifetime Reliability Aware Architectural Adaptation.
Proceedings of the 26th International Conference on VLSI Design and 12th International Conference on Embedded Systems, 2013

Implementation of core coalition on FPGAs.
Proceedings of the 21st IEEE/IFIP International Conference on VLSI and System-on-Chip, 2013

Energy-aware synthesis of application specific MPSoCs.
Proceedings of the 2013 IEEE 31st International Conference on Computer Design, 2013

A just-in-time customizable processor.
Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 2013

Correction to "Graph Minor Approach for Application Mapping on CGRAs".
Proceedings of the 2013 International Conference on Field-Programmable Technology, 2013

Hierarchical power management for asymmetric multi-core in dark silicon era.
Proceedings of the 50th Annual Design Automation Conference 2013, 2013

Integrated instruction cache analysis and locking in multitasking real-time systems.
Proceedings of the 50th Annual Design Automation Conference 2013, 2013

Power-performance modeling on asymmetric multi-cores.
Proceedings of the International Conference on Compilers, 2013

Shared cache aware task mapping for WCRT minimization.
Proceedings of the 18th Asia and South Pacific Design Automation Conference, 2013

2012
Bahurupi: A polymorphic heterogeneous multi-core architecture.
ACM Trans. Archit. Code Optim., 2012

Timing analysis of concurrent programs running on shared cache multi-cores.
Real Time Syst., 2012

Online scheduling for multi-core shared reconfigurable fabric.
Proceedings of the 2012 Design, Automation & Test in Europe Conference & Exhibition, 2012

WCET-centric partial instruction cache locking.
Proceedings of the 49th Annual Design Automation Conference 2012, 2012

2011
Customized MPSoC synthesis for task sequence.
Proceedings of the IEEE 9th Symposium on Application Specific Processors, 2011

A novel online hardware task scheduling and placement algorithm for 3D partially reconfigurable FPGAs.
Proceedings of the 2011 International Conference on Field-Programmable Technology, 2011

Shared reconfigurable fabric for multi-core customization.
Proceedings of the 48th Design Automation Conference, 2011

2010
Scratchpad allocation for concurrent embedded software.
ACM Trans. Program. Lang. Syst., 2010

Modeling shared cache and bus in multi-cores for timing analysis.
Proceedings of the 13th International Workshop on Software and Compilers for Embedded Systems, 2010

Design space exploration of instruction set customizable MPSoCs for multimedia applications.
Proceedings of the 2010 International Conference on Embedded Computer Systems: Architectures, 2010

Efficient custom instructions generation for system-level design.
Proceedings of the International Conference on Field-Programmable Technology, 2010

Instruction cache locking using temporal reuse profile.
Proceedings of the 47th Design Automation Conference, 2010

Improved procedure placement for set associative caches.
Proceedings of the 2010 International Conference on Compilers, 2010

2009
Cache-aware timing analysis of streaming applications.
Real Time Syst., 2009

Temperature Aware Scheduling for Embedded Processors.
J. Low Power Electron., 2009

Cache-aware optimization of BAN applications.
Des. Autom. Embed. Syst., 2009

An efficient framework for dynamic reconfiguration of instruction-set customization.
Des. Autom. Embed. Syst., 2009

Runtime Adaptive Extensible Embedded Processors - A Survey.
Proceedings of the Embedded Computer Systems: Architectures, 2009

Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores.
Proceedings of the 30th IEEE Real-Time Systems Symposium, 2009

A hybrid local-global approach for multi-core thermal management.
Proceedings of the 2009 International Conference on Computer-Aided Design, 2009

Probabilistic modeling of data cache behavior.
Proceedings of the 9th ACM & IEEE International conference on Embedded software, 2009

Runtime reconfiguration of custom instructions for real-time embedded systems.
Proceedings of the Design, Automation and Test in Europe, 2009

Dynamic thermal management via architectural adaptation.
Proceedings of the 46th Design Automation Conference, 2009

A DVS-based pipelined reconfigurable instruction memory.
Proceedings of the 46th Design Automation Conference, 2009

Generating test programs to cover pipeline interactions.
Proceedings of the 46th Design Automation Conference, 2009

Evaluating design trade-offs in customizable processors.
Proceedings of the 46th Design Automation Conference, 2009

2008
The worst-case execution-time problem - overview of methods and survey of tools.
ACM Trans. Embed. Comput. Syst., 2008

Temperature aware task sequencing and voltage scaling.
Proceedings of the 2008 International Conference on Computer-Aided Design, 2008

Defining neighborhood relations for fast spatial-temporal partitioning of applications on reconfigurable architectures.
Proceedings of the 2008 International Conference on Field-Programmable Technology, 2008

Processor customization for wearable bio-monitoring platforms.
Proceedings of the 2008 International Conference on Field-Programmable Technology, 2008

Exploring locking & partitioning for predictable shared caches on multi-cores.
Proceedings of the 45th Design Automation Conference, 2008

Cache modeling in probabilistic execution time analysis.
Proceedings of the 45th Design Automation Conference, 2008

Static analysis for fast and accurate design space exploration of caches.
Proceedings of the 6th International Conference on Hardware/Software Codesign and System Synthesis, 2008

2007
Chronos: A timing analyzer for embedded software.
Sci. Comput. Program., 2007

Timing Analysis of Body Area Network Applications.
Proceedings of the 7th Intl. Workshop on Worst-Case Execution Time (WCET) Analysis, 2007

Disjoint Pattern Enumeration for Custom Instructions Identification.
Proceedings of the FPL 2007, 2007

Cache-Aware Timing Analysis of Streaming Applications.
Proceedings of the 19th Euromicro Conference on Real-Time Systems, 2007

Instruction-set customization for real-time embedded systems.
Proceedings of the 2007 Design, Automation and Test in Europe Conference and Exposition, 2007

A Retargetable Software Timing Analyzer Using Architecture Description Language.
Proceedings of the 12th Conference on Asia South Pacific Design Automation, 2007

Worst-Case Execution Time and Energy Analysis.
Proceedings of the Compiler Design Handbook: Optimizations and Machine Code Generation, 2007

2006
Modeling out-of-order processors for WCET analysis.
Real Time Syst., 2006

Handling Constraints in Multi-Objective GA for Embedded System Design.
Proceedings of the 19th International Conference on VLSI Design (VLSI Design 2006), 2006

Estimating the Worst-Case Energy Consumption of Embedded Software.
Proceedings of the 12th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS 2006), 2006

Efficient detection and exploitation of infeasible paths for software timing analysis.
Proceedings of the 43rd Design Automation Conference, 2006

Exploiting forwarding to improve data bandwidth of instruction-set extensions.
Proceedings of the 43rd Design Automation Conference, 2006

Integrated scratchpad memory optimization and task scheduling for MPSoC architectures.
Proceedings of the 2006 International Conference on Compilers, 2006

2005
Modeling Control Speculation for Timing Analysis.
Real Time Syst., 2005

Exploiting Branch Constraints without Exhaustive Path Enumeration.
Proceedings of the 5th Intl. Workshop on Worst-Case Execution Time (WCET) Analysis, 2005

WCET Centric Data Allocation to Scratchpad Memory.
Proceedings of the 26th IEEE Real-Time Systems Symposium (RTSS 2005), 2005

Analyzing Loop Paths for Execution Time Estimation.
Proceedings of the Distributed Computing and Internet Technology, 2005

Satisfying real-time constraints with custom instructions.
Proceedings of the 3rd IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2005

2004
Modeling Out-of-Order Processors for Software Timing Analysis.
Proceedings of the 25th IEEE Real-Time Systems Symposium (RTSS 2004), 2004

Design space exploration of caches using compressed traces.
Proceedings of the 18th Annual International Conference on Supercomputing, 2004

Configuration bitstream compression for dynamically reconfigurable FPGAs.
Proceedings of the 2004 International Conference on Computer-Aided Design, 2004

Characterizing embedded applications for instruction-set extensible processors.
Proceedings of the 41st Design Automation Conference, 2004

Scalable custom instructions identification for instruction-set extensible processors.
Proceedings of the 2004 International Conference on Compilers, 2004

Impact of Java Memory Model on Out-of-Order Multiprocessors.
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques (PACT 2004), 29 September, 2004

2003
Compactly representing parallel program executions.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2003

A Model for Hardware Realization of Kernel Loops.
Proceedings of the Field Programmable Logic and Application, 13th International Conference, 2003

Compression-Domain Editing of 3D Models.
Proceedings of the 2003 Data Compression Conference (DCC 2003), 2003

Using Formal Techniques to Debug the AMBA System-on-Chip Bus Protocol.
Proceedings of the 2003 Design, 2003

Accurate timing analysis by modeling caches, speculation and their interaction.
Proceedings of the 40th Design Automation Conference, 2003

Accurate estimation of cache-related preemption delay.
Proceedings of the 1st IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2003

2002
A Decoupled Architecture for Application-Specific File Prefetching.
Proceedings of the FREENIX Track: 2002 USENIX Annual Technical Conference, 2002

Timing Analysis of Embedded Software for Speculative Processors.
Proceedings of the 15th International Symposium on System Synthesis (ISSS 2002), 2002

Compression-Domain Parallel Rendering.
Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 2002

Specifying multithreaded Java semantics for program verification.
Proceedings of the 24th International Conference on Software Engineering, 2002

A co-simulation study of adaptive EPIC computing.
Proceedings of the 2002 IEEE International Conference on Field-Programmable Technology, 2002

An FPGA Implementation of Triangle Mesh Decompression.
Proceedings of the 10th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2002), 2002

2000
Zodiac: A history-based interactive video authoring system.
Multim. Syst., 2000

On-the-Fly rendering of losslessly compressed irregular volume data.
Proceedings of the 11th IEEE Visualization Conference, 2000

Application-Specific File Prefetching for Multimedia Programs.
Proceedings of the 2000 IEEE International Conference on Multimedia and Expo, 2000

1999
Dynamic 3D Graphics Workload Characterization and the Architectural Implications.
Proceedings of the 32nd Annual IEEE/ACM International Symposium on Microarchitecture, 1999

Dynamic Vectorization: A Mechanism for Exploiting Far-Flung ILP in Ordinary Programs.
Proceedings of the 26th Annual International Symposium on Computer Architecture, 1999

1998
Implementation and Evaluation of the Parallel Mesa Library.
Proceedings of the International Conference on Parallel and Distributed Systems, 1998

A Breadth-First Approach To Efficient Mesh Traversal.
Proceedings of the 1998 ACM SIGGRAPH/EUROGRAPHICS Workshop on Graphics Hardware, Lisbon, Portugal, August 31, 1998

1997
Improving Superscalar Instruction Dispatch and Issue by Exploiting Dynamic Code Sequences.
Proceedings of the 24th International Symposium on Computer Architecture, 1997

