Mauricio J. Serrano

According to our database, Mauricio J. Serrano authored at least 42 papers between 1992 and 2022.

Bibliography

2022
Accelerating Inference and Language Model Fusion of Recurrent Neural Network Transducers via End-to-End 4-bit Quantization.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

2021

4-Bit Quantization of LSTM-Based Speech Recognition Models.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

2020
Efficient AI System Design With Cross-Layer Approximate Computing.
Proc. IEEE, 2020

2019
DeepTools: Compiler and Execution Runtime Extensions for RaPiD AI Accelerator.
IEEE Micro, 2019

BlueConnect: Decomposing all-reduce for deep learning on heterogeneous network hierarchy.
IBM J. Res. Dev., 2019

Efficient implementation of sparse matrix-sparse vector multiplication for large scale graph analytics.
Proceedings of the 2019 IEEE High Performance Extreme Computing Conference, 2019

2018
Graph Programming Interface (GPI): A Linear Algebra Programming Model for Large Scale Graph Computations.
Int. J. Parallel Program., 2018

2017
Enabling massive deep neural networks with the GraphBLAS.
Proceedings of the 2017 IEEE High Performance Extreme Computing Conference, 2017

2016
Efficient implementation of scatter-gather operations for large scale graph analytics.
Proceedings of the 2016 IEEE High Performance Extreme Computing Conference, 2016

Graph programming interface (GPI): a linear algebra programming model for large scale graph computations.
Proceedings of the ACM International Conference on Computing Frontiers, CF'16, 2016

2015
Active Memory Cube: A processing-in-memory architecture for exascale systems.
IBM J. Res. Dev., 2015

2014
Simple, portable and fast SIMD intrinsic programming: generic simd library.
Proceedings of the 2014 Workshop on Programming models for SIMD/Vector processing, 2014

2013
Trace construction using enhanced performance monitoring.
Proceedings of the Computing Frontiers Conference, 2013

2011
Improving the performance of trace-based systems by false loop filtering.
Proceedings of the 16th International Conference on Architectural Support for Programming Languages and Operating Systems, 2011

2009
Placement optimization using data context collected during garbage collection.
Proceedings of the 8th International Symposium on Memory Management, 2009

Building Approximate Calling Context from Partial Call Traces.
Proceedings of the CGO 2009, 2009

2008
Perfdiff: a framework for performance difference analysis in a virtual machine environment.
Proceedings of the Sixth International Symposium on Code Generation and Optimization (CGO 2008), 2008

2007
Call-chain Software Instruction Prefetching in J2EE Server Applications.
Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007

2006
Accurate, efficient, and adaptive calling context profiling.
Proceedings of the ACM SIGPLAN 2006 Conference on Programming Language Design and Implementation, 2006

2004
Prefetch injection based on hardware monitoring and object metadata.
Proceedings of the ACM SIGPLAN 2004 Conference on Programming Language Design and Implementation, 2004

Whole-Stack Analysis and Optimization of Commercial Workloads on Server Systems.
Proceedings of the Network and Parallel Computing, IFIP International Conference, 2004

2003
Stack allocation and synchronization optimizations for Java using escape analysis.
ACM Trans. Program. Lang. Syst., 2003

2002
Efficiently Adapting Java Binaries in Limited Memory Contexts.
Int. J. Parallel Program., 2002

Value-Profile Guided Stride Prefetching for Irregular Code.
Proceedings of the Compiler Construction, 11th International Conference, 2002

2001
Characterizing the memory behavior of Java workloads: a structured view and opportunities for optimizations.
Proceedings of the Joint International Conference on Measurements and Modeling of Computer Systems, 2001

Register-sensitive selection, duplication, and sequencing of instructions.
Proceedings of the 15th international conference on Supercomputing, 2001

A framework for efficient reuse of binary code in Java.
Proceedings of the 15th international conference on Supercomputing, 2001

2000
The Jalapeño virtual machine.
IBM Syst. J., 2000

Quicksilver: a quasi-static compiler for Java.
Proceedings of the 2000 ACM SIGPLAN Conference on Object-Oriented Programming Systems, 2000

1999
Escape Analysis for Java.
Proceedings of the 1999 ACM SIGPLAN Conference on Object-Oriented Programming Systems, 1999

Dependence Analysis for Java.
Proceedings of the Languages and Compilers for Parallel Computing, 1999

The Jalapeño Dynamic Optimizing Compiler for Java.
Proceedings of the ACM 1999 Conference on Java Grande, JAVA '99, San Francisco, CA, USA, 1999

1998
Thin locks: featherweight Synchronization for Java (with retrospective).
Proceedings of the 20 Years of the ACM SIGPLAN Conference on Programming Language Design and Implementation 1979-1999, 1998

Thin Locks: Featherweight Synchronization for Java.
Proceedings of the ACM SIGPLAN '98 Conference on Programming Language Design and Implementation (PLDI), 1998

1996
Performance Estimation in a Simultaneous Multithreading Processor.
Proceedings of the MASCOTS '96, 1996

1995
Optimized code restructuring of OS/2 executables.
Proceedings of the 1995 Conference of the Centre for Advanced Studies on Collaborative Research, 1995

1994
The Impact of Unresolved Branches on Branch Prediction Scheme Performance.
Proceedings of the 21st Annual International Symposium on Computer Architecture. Chicago, 1994

Performance Estimation of Multistreamed, Superscalar Processors.
Proceedings of the 27th Annual Hawaii International Conference on System Sciences (HICSS-27), 1994

A Model for Performance Estimation in a Multistreamed Superscalar Processor.
Proceedings of the Computer Performance Evaluation, 1994

1993
Optimal Architectures and Algorithms for Mesh-Connected Parallel Computers with Separable Row/Column Buses.
IEEE Trans. Parallel Distributed Syst., 1993

1992
Optimal Aspect Ratio and Number of Separable Row/Column Buses for Mesh-Connected Parallel Computers.
Proceedings of the 6th International Parallel Processing Symposium, 1992

