Andreas Moshovos
ORCID: 0000-0001-7768-367X
Affiliations:
- University of Toronto, Canada
According to our database, Andreas Moshovos authored at least 135 papers between 1997 and 2024.
Awards
ACM Fellow 2017, "For contributions to high-performance architecture including memory dependence prediction and snooping coherence".
Bibliography
2024
Low-Bitwidth Floating Point Quantization for Efficient High-Quality Diffusion Models.
CoRR, 2024
Proceedings of the Seventh Annual Conference on Machine Learning and Systems, 2024
Proceedings of the IEEE International Symposium on Circuits and Systems, 2024
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024
2023
39 000-Subexposures/s Dual-ADC CMOS Image Sensor With Dual-Tap Coded-Exposure Pixels for Single-Shot HDR and 3-D Computational Imaging.
IEEE J. Solid State Circuits, November, 2023
Proceedings of the 13th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies, 2023
2022
Schrödinger's FP: Dynamic Adaptation of Floating-Point Containers for Deep Learning Training.
CoRR, 2022
CoRR, 2022
A 39,000 Subexposures/s CMOS Image Sensor with Dual-tap Coded-exposure Data-memory Pixel for Adaptive Single-shot Computational Imaging.
Proceedings of the IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits 2022), 2022
Mokey: enabling narrow fixed-point inference for out-of-the-box floating-point transformer models.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022
Proceedings of the 2022 IEEE Hot Chips 34 Symposium, 2022
2021
Proceedings of the Fourth Conference on Machine Learning and Systems, 2021
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021
2020
TensorDash: Exploiting Sparsity to Accelerate Deep Neural Network Training and Inference.
CoRR, 2020
GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference.
CoRR, 2020
GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020
Late Breaking Results: Building an On-Chip Deep Learning Memory Hierarchy Brick by Brick.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020
2019
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2019
Proceedings of the 46th International Symposium on Computer Architecture, 2019
Proceedings of the IEEE International Symposium on Workload Characterization, 2019
Proceedings of the Computational Intelligence Methods for Bioinformatics and Biostatistics, 2019
Proceedings of the 19th IEEE International Conference on Bioinformatics and Bioengineering, 2019
Proceedings of the 2019 IEEE EMBS International Conference on Biomedical & Health Informatics, 2019
Bit-Tactical: A Software/Hardware Approach to Exploiting Value and Bit Sparsity in Neural Networks.
Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019
2018
Parallel Comput., 2018
CoRR, 2018
Bit-Tactical: Exploiting Ineffectual Computations in Convolutional Neural Networks: Which, Why, and How.
CoRR, 2018
Identifying and Exploiting Ineffectual Computations to Enable Hardware Acceleration of Deep Learning.
Proceedings of the 16th IEEE International New Circuits and Systems Conference, 2018
Proceedings of the 11th International Workshop on Network on Chip Architectures, 2018
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018
Proceedings of the 2018 IEEE International Symposium on Workload Characterization, 2018
Proceedings of the 2018 IEEE International Symposium on Workload Characterization, 2018
Proceedings of the 2018 IEEE International Symposium on Workload Characterization, 2018
Loom: exploiting weight and activation precisions to accelerate convolutional neural networks.
Proceedings of the 55th Annual Design Automation Conference, 2018
2017
Loom: Exploiting Weight and Activation Precisions to Accelerate Convolutional Neural Networks.
CoRR, 2017
CoRR, 2017
Tartan: Accelerating Fully-Connected and Convolutional Layers in Deep Learning Networks by Exploiting Numerical Precision Variability.
CoRR, 2017
Dynamic Stripes: Exploiting the Dynamic Precision Requirements of Activation Values in Neural Networks.
CoRR, 2017
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017
Proceedings of the 5th International Conference on Learning Representations, 2017
2016
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016
Proceedings of the 2016 IEEE International Symposium on Performance Analysis of Systems and Software, 2016
Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016
Proceedings of the 2016 IEEE International Symposium on Workload Characterization, 2016
Proceedings of the 2016 International Conference on Supercomputing, 2016
2015
Proceedings of the 48th International Symposium on Microarchitecture, 2015
Proceedings of the 48th International Symposium on Microarchitecture, 2015
Proceedings of the 2015 IEEE International Symposium on Performance Analysis of Systems and Software, 2015
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015
2014
ACM Trans. Archit. Code Optim., 2014
Proceedings of the XIVth International Conference on Embedded Computer Systems: Architectures, 2014
Proceedings of the 2014 International Conference on ReConFigurable Computing and FPGAs, 2014
Proceedings of the 2014 International Conference on ReConFigurable Computing and FPGAs, 2014
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014
Proceedings of the 32nd IEEE International Conference on Computer Design, 2014
Proceedings of the 22nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2014
An Architectural Approach to Characterizing and Eliminating Sources of Inefficiency in a Soft Processor Design.
Proceedings of the 22nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2014
2013
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013
Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software, 2013
STREX: boosting instruction cache reuse in OLTP workloads through stratified transaction execution.
Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013
Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013
Proceedings of the 23rd International Conference on Field programmable Logic and Applications, 2013
Proceedings of the Design, Automation and Test in Europe, 2013
Proceedings of the Design, Automation and Test in Europe, 2013
2012
NCOR: An FPGA-Friendly Nonblocking Data Cache for Soft Processors with Runahead Execution.
Int. J. Reconfigurable Comput., 2012
Proceedings of the 2012 International Conference on Reconfigurable Computing and FPGAs, 2012
Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012
Proceedings of the 2012 Design, Automation & Test in Europe Conference & Exhibition, 2012
Proceedings of the Eighth International Workshop on Data Management on New Hardware, 2012
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012
2011
IEEE Trans. Very Large Scale Integr. Syst., 2011
2010
IEEE Trans. Very Large Scale Integr. Syst., 2010
Proceedings of the ReConFig'10: 2010 International Conference on Reconfigurable Computing and FPGAs, 2010
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2010
Proceedings of the International Conference on Field-Programmable Technology, 2010
2009
A physical-level study of the compacted matrix instruction scheduler for dynamically-scheduled superscalar processors.
Proceedings of the 2009 International Conference on Embedded Computer Systems: Architectures, 2009
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009
Proceedings of the 15th International Conference on High-Performance Computer Architecture (HPCA-15 2009), 2009
Proceedings of the 19th International Conference on Field Programmable Logic and Applications, 2009
Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, 2009
2008
IEEE Trans. Very Large Scale Integr. Syst., 2008
Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-41 2008), 2008
A physical level study and optimization of CAM-based checkpointed register alias table.
Proceedings of the 2008 International Symposium on Low Power Electronics and Design, 2008
Proceedings of the 4th International Symposium on Workload Characterization (IISWC 2008), 2008
Proceedings of the High Performance Embedded Architectures and Compilers, 2008
Proceedings of the 13th International Conference on Architectural Support for Programming Languages and Operating Systems, 2008
2007
IEEE Comput. Archit. Lett., 2007
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-40 2007), 2007
Proceedings of the 2007 International Symposium on Low Power Electronics and Design, 2007
Proceedings of the 34th International Symposium on Computer Architecture (ISCA 2007), 2007
2006
IEEE Micro, 2006
Proceedings of the 33rd International Symposium on Computer Architecture (ISCA 2006), 2006
BranchTap: improving performance with very few checkpoints through adaptive speculation control.
Proceedings of the 20th Annual International Conference on Supercomputing, 2006
2005
IEEE Trans. Very Large Scale Integr. Syst., 2005
Proceedings of the 32nd International Symposium on Computer Architecture (ISCA 2005), 2005
Proceedings of the 23rd International Conference on Computer Design (ICCD 2005), 2005
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques (PACT 2005), 2005
2004
Proceedings of the 2004 International Symposium on Low Power Electronics and Design, 2004
Proceedings of the 10th International Conference on High-Performance Computer Architecture (HPCA-10 2004), 2004
2003
Clust. Comput., 2003
Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003
2002
IEEE Trans. Computers, 2002
Asymmetric-frequency clustering: a power-aware back-end for high-performance processors.
Proceedings of the 2002 International Symposium on Low Power Electronics and Design, 2002
Branch Predictor Prediction: A Power-Aware Branch Predictor for High-Performance Processors.
Proceedings of the 20th International Conference on Computer Design (ICCD 2002), 2002
2001
Microarchitectural innovations: boosting microprocessor performance beyond semiconductor technology scaling.
Proc. IEEE, 2001
Instruction flow-based front-end throttling for power-aware high-performance processors.
Proceedings of the 2001 International Symposium on Low Power Electronics and Design, 2001
Proceedings of the 15th international conference on Supercomputing, 2001
Proceedings of the Seventh International Symposium on High-Performance Computer Architecture (HPCA'01), 2001
2000
J. Instr. Level Parallelism, 2000
Instruction distribution heuristics for quad-cluster, dynamically-scheduled, superscalar processors.
Proceedings of the 33rd Annual IEEE/ACM International Symposium on Microarchitecture, 2000
CHIMAERA: a high-performance architecture with a tightly-coupled reconfigurable functional unit.
Proceedings of the 27th International Symposium on Computer Architecture (ISCA 2000), 2000
Memory Dependence Speculation Tradeoffs in Centralized, Continuous-Window Superscalar Processors.
Proceedings of the Sixth International Symposium on High-Performance Computer Architecture, 2000
1999
Proceedings of the 32nd Annual IEEE/ACM International Symposium on Microarchitecture, 1999
Improving virtual function call target prediction via dependence-based pre-computation.
Proceedings of the 13th international conference on Supercomputing, 1999
1998
ASPLOS-VIII: Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems, 1998
1997
Proceedings of the Thirtieth Annual IEEE/ACM International Symposium on Microarchitecture, 1997
Proceedings of the 24th International Symposium on Computer Architecture, 1997