Venkatesh Akella

ORCID: 0000-0003-3014-5326

Affiliations:
  • University of California, Davis, USA


According to our database, Venkatesh Akella authored at least 87 papers between 1989 and 2024.


Bibliography

2024
TEGRA - Scaling Up Terascale Graph Processing with Disaggregated Computing.
CoRR, 2024

CachedArrays: Optimizing Data Movement for Heterogeneous Memory Systems.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

2023
Efficient Large Scale DLRM Implementation on Heterogeneous Memory Systems.
Proceedings of the High Performance Computing - 38th International Conference, 2023

Scalable Hardware Acceleration of Graph Processing with Photonic Interconnects.
Proceedings of the International Conference on Photonics in Switching and Computing, 2023

2022
A Model for Scalable and Balanced Accelerators for Graph Processing.
IEEE Comput. Archit. Lett., 2022

LLM: Realizing Low-Latency Memory by Exploiting Embedded Silicon Photonics for Irregular Workloads.
Proceedings of the High Performance Computing - 37th International Conference, 2022

SoK: Limitations of Confidential Computing via TEEs for High-Performance Compute Systems.
Proceedings of the 2022 IEEE International Symposium on Secure and Private Execution Environment Design (SEED), 2022

2021
HTA: A Scalable High-Throughput Accelerator for Irregular HPC Workloads.
Proceedings of the High Performance Computing - 36th International Conference, 2021

A Case Against Hardware Managed DRAM Caches for NVRAM Based Systems.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2021

Performance Analysis of Scientific Computing Workloads on General Purpose TEEs.
Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

2020
FPGA and GPU-based acceleration of ML workloads on Amazon cloud - A case study using gradient boosted decision tree library.
Integr., 2020

Performance Analysis of Scientific Computing Workloads on Trusted Execution Environments.
CoRR, 2020

Predicting soil permanganate oxidizable carbon (POXC) by coupling DRIFT spectroscopy and artificial neural networks (ANN).
Comput. Electron. Agric., 2020

HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems.
Proceedings of the ICPP 2020: 49th International Conference on Parallel Processing, 2020

AutoTM: Automatic Tensor Movement in Heterogeneous Memory Systems using Integer Linear Programming.
Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020

2019
Multiplier-Free Implementation of Galois Field Fourier Transform on a FPGA.
IEEE Trans. Circuits Syst. II Express Briefs, 2019

2018
A case for exposing extra-architectural state in the ISA: position paper.
Proceedings of the 7th International Workshop on Hardware and Architectural Support for Security and Privacy, 2018

Improving Provisioned Power Efficiency in HPC Systems with GPU-CAPP.
Proceedings of the 25th IEEE International Conference on High Performance Computing, 2018

Scalable Hardware Accelerator for Mini-Batch Gradient Descent.
Proceedings of the 2018 on Great Lakes Symposium on VLSI, 2018

2017
Improving Execution Time of Parallel Programs on Large Scale Chip Multiprocessors with Constant Average Power Processing.
Proceedings of the 2017 IEEE International Conference on Computer Design, 2017

Design and Evaluation of AWGR-Based Photonic NoC Architectures for 2.5D Integrated High Performance Computing Systems.
Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

2016
Photonic Interconnects for Interposer-based 2.5D/3D Integrated Systems on a Chip.
Proceedings of the Second International Symposium on Memory Systems, 2016

HogWild++: A New Mechanism for Decentralized Asynchronous Stochastic Gradient Descent.
Proceedings of the IEEE 16th International Conference on Data Mining, 2016

Design space exploration of FPGA-based Deep Convolutional Neural Networks.
Proceedings of the 21st Asia and South Pacific Design Automation Conference, 2016

2014
Simultaneously Reducing Latency and Power Consumption in OpenFlow Switches.
IEEE/ACM Trans. Netw., 2014

PDG_GEN: A Methodology for Fast and Accurate Simulation of On-Chip Networks.
IEEE Trans. Computers, 2014

Runtime Adaptation of Applications Using Design Of Experiments: A Smartphone-Based Case Study.
IEEE Embed. Syst. Lett., 2014

2013
MELOADES: Methodology for long-term online adaptation of embedded software for heterogeneous devices.
J. Syst. Archit., 2013

Scalability and performance of a distributed AWGR-based all-optical token interconnect architecture.
Proceedings of the 2013 Optical Fiber Communication Conference and Exposition and the National Fiber Optic Engineers Conference (OFC/NFOEC), 2013

Update rate tradeoffs for improving online power modeling in smartphones.
Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 2013

2012
Efficient Configurable Decoder Architecture for Nonbinary Quasi-Cyclic LDPC Codes.
IEEE Trans. Circuits Syst. I Regul. Pap., 2012

AWGR-Based Optical Topologies for Scalable and Efficient Global Communications in Large-Scale Multi-Processor Systems.
JOCN, 2012

DCOF - An Arbitration Free Directly Connected Optical Fabric.
IEEE J. Emerg. Sel. Topics Circuits Syst., 2012

DCAF - A Directly Connected Arbitration-Free Photonic Crossbar for Energy-Efficient High Performance Computing.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

2011
Exploiting data-level parallelism for energy-efficient implementation of LDPC decoders and DCT on an FPGA.
ACM Trans. Reconfigurable Technol. Syst., 2011

Hardware Implementation of a Backtracking-Based Reconfigurable Decoder for Lowering the Error Floor of Quasi-Cyclic LDPC Codes.
IEEE Trans. Circuits Syst. I Regul. Pap., 2011

Memory System Optimization for FPGA-Based Implementation of Quasi-Cyclic LDPC Codes Decoders.
IEEE Trans. Circuits Syst. I Regul. Pap., 2011

Buffering and Flow Control in Optical Switches for High Performance Computing.
JOCN, 2011

Inferring packet dependencies to improve trace based simulation of on-chip networks.
Proceedings of NOCS 2011, 2011

Resilient microring resonator based photonic networks.
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011

Addressing system-level trimming issues in on-chip nanophotonic networks.
Proceedings of the 17th International Conference on High-Performance Computer Architecture (HPCA-17 2011), 2011

2010
QSN - A Simple Circular-Shift Network for Reconfigurable Quasi-Cyclic LDPC Decoders.
IEEE Trans. Circuits Syst. II Express Briefs, 2010

Optical Router Control Architecture and Contention Resolution Algorithms Capable of Asynchronous, Variable-Length Packet Switching.
JOCN, 2010

Markov decision process (MDP) framework for software power optimization using call profiles on mobile phones.
Des. Autom. Embed. Syst., 2010

Performance Evaluation of a Multicore System with Optically Connected Memory Modules.
Proceedings of NOCS 2010, 2010

DOS: a scalable optical switch for datacenters.
Proceedings of the 2010 ACM/IEEE Symposium on Architecture for Networking and Communications Systems, 2010

2009
Markov decision process (MDP) framework for optimizing software on mobile phones.
Proceedings of the 9th ACM & IEEE International conference on Embedded software, 2009

Accelerating FPGA-based emulation of quasi-cyclic LDPC codes with vector processing.
Proceedings of the Design, Automation and Test in Europe, 2009

FPGA-based low-complexity high-throughput tri-mode decoder for quasi-cyclic LDPC codes.
Proceedings of the 47th Annual Allerton Conference on Communication, Control, and Computing, 2009

2008
Design and evaluation of an optical CPU-DRAM interconnect.
Proceedings of the 26th International Conference on Computer Design, 2008

OCDIMM: Scaling the DRAM Memory Wall Using WDM Based Optical Interconnects.
Proceedings of the 16th Annual IEEE Symposium on High Performance Interconnects (HOTI 2008), 2008

Credit-based dynamic reliability management using online wearout detection.
Proceedings of the 5th Conference on Computing Frontiers, 2008

2007
Using Application Bisection Bandwidth to Guide Tile Size Selection for the Synchroscalar Tile-Based Architecture.
Trans. High Perform. Embed. Archit. Compil., 2007

Life Cycle Aware Computing: Reusing Silicon Technology.
Computer, 2007

2006
Synchroscalar: Evaluation of an embedded, multi-core architecture for media applications.
J. Embed. Comput., 2006

Segmented Bitline Cache: Exploiting Non-uniform Memory Access Patterns.
Proceedings of the High Performance Computing, 2006

Tile size selection for low-power tile-based architectures.
Proceedings of the Third Conference on Computing Frontiers, 2006

2005
Proactive Energy Optimization Algorithms for Wavelet-Based Video Codecs on Power-Aware Processors.
Proceedings of the 2005 IEEE International Conference on Multimedia and Expo, 2005

Scheduling optical packets in wavelength, time, and space domains for all-optical packet switching routers.
Proceedings of IEEE International Conference on Communications, 2005

Complexity metric driven energy optimization framework for implementing MPEG-21 scalable video decoders.
Proceedings of the 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2005

Generic modeling of complexity for motion-compensated wavelet video decoders.
Proceedings of the Electronic Imaging: Image and Video Communications and Processing 2005, 2005

2004
Efficient orchestration of sub-word parallelism in media processors.
Proceedings of the Sixteenth Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2004), 2004

Synchroscalar: A Multiple Clock Domain, Power-Aware, Tile-Based Embedded Processor.
Proceedings of the 31st International Symposium on Computer Architecture (ISCA 2004), 2004

Rate-distortion-complexity adaptive video compression and streaming.
Proceedings of the 2004 International Conference on Image Processing, 2004

2003
High-performance optical-label switching packet routers and smart edge routers for the next-generation Internet.
IEEE J. Sel. Areas Commun., 2003

Synchroscalar: Initial Lessons in Power-Aware Design of a Tile-Based Embedded Architecture.
Proceedings of the Power-Aware Computer Systems, Third International Workshop, 2003

Improving DSP Performance with a Small Amount of Field Programmable Logic.
Proceedings of the Field Programmable Logic and Application, 13th International Conference, 2003

2001
An Asynchronous Superscalar Architecture for Exploiting Instruction-Level Parallelism.
Proceedings of the 7th International Symposium on Advanced Research in Asynchronous Circuits and Systems (ASYNC 2001), 2001

1999
Automatic Insertion of Gated Clocks at Register Transfer Level.
Proceedings of the 12th International Conference on VLSI Design (VLSI Design 1999), 1999

1998
Micropipelined asynchronous discrete cosine transform (DCT/IDCT) processor.
IEEE Trans. Very Large Scale Integr. Syst., 1998

Asynchronous Comparison-Based Decoders for Delay-Insensitive Codes.
IEEE Trans. Computers, 1998

1997
Asynchronous Processor Survey.
Computer, 1997

1996
Limitations of VLSI Implementation of Delay-Insensitive Codes.
Digest of Papers: FTCS-26, the 26th Annual International Symposium on Fault-Tolerant Computing, 1996

Counterflow pipeline based dynamic instruction scheduling.
Proceedings of the 2nd International Symposium on Advanced Research in Asynchronous Circuits and Systems (ASYNC '96), 1996

1995
Asynchronous 2-D discrete cosine transform core processor.
Proceedings of the 1995 International Conference on Computer Design (ICCD '95), 1995

1994
High-level optimizations in compiling process descriptions to asynchronous circuits.
J. VLSI Signal Process., 1994

Specification and Validation of Control-Intensive IC's in hopCP.
IEEE Trans. Software Eng., 1994

CFSIM: A Concurrent Compiled Code Functional Simulator for hopCP.
Int. J. Comput. Simul., 1994

Testing two-phase transition signaling based self-timed circuits in a synthesis environment.
Proceedings of the 7th International Symposium on High Level Synthesis, 1994

Performance Analysis and Optimization of Asynchronous Circuits.
Proceedings of the 1994 IEEE International Conference on Computer Design: VLSI in Computer & Processors, 1994

A technique for estimating power in asynchronous circuits.
Proceedings of the International Symposium on Advanced Research in Asynchronous Circuits and Systems, 1994

1993
A transformational approach to asynchronous high-level synthesis.
Proceedings of the VLSI 93, 1993

1992
VLSI asynchronous systems: specification and synthesis.
Microprocess. Microsystems, 1992

From Process-Oriented Functional Specifications to Efficient Asynchronous Circuits.
Proceedings of the Fifth International Conference on VLSI Design, 1992

SHILPA: a high-level synthesis system for self-timed circuits.
Proceedings of the 1992 IEEE/ACM International Conference on Computer-Aided Design, 1992

1989
HOP: A process model for synchronous hardware; semantics and experiments in process composition.
Integr., 1989

Parallel Composition of Lockstep Synchronous Processes for Hardware Validation: Divide-and-Conquer Composition.
Proceedings of the Automatic Verification Methods for Finite State Systems, 1989
