Kazuki Irie

According to our database, Kazuki Irie authored at least 50 papers between 2014 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
MoEUT: Mixture-of-Experts Universal Transformers.
CoRR, 2024

Dissecting the Interplay of Attention Paths in a Statistical Mechanics Theory of Transformers.
CoRR, 2024

Exploring the Promise and Limits of Real-Time Recurrent Learning.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Self-organising Neural Discrete Representation Learning à la Kohonen.
Proceedings of the Artificial Neural Networks and Machine Learning - ICANN 2024, 2024

2023
Unsupervised Learning of Temporal Abstractions With Slot-Based Transformers.
Neural Comput., 2023

SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention.
CoRR, 2023

Automating Continual Learning.
CoRR, 2023

Mindstorms in Natural Language-Based Societies of Mind.
CoRR, 2023

Accelerating Neural Self-Improvement via Bootstrapping.
CoRR, 2023

Topological Neural Discrete Representation Learning à la Kohonen.
CoRR, 2023

Contrastive Training of Complex-Valued Autoencoders for Object Discovery.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Images as Weight Matrices: Sequential Image Generation Through Synaptic Learning Rules.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Practical Computational Power of Linear Transformers and Their Recurrent and Self-Referential Extensions.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Approximating Two-Layer Feedforward Networks for Efficient Transformers.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

2022
Learning to Control Rapidly Changing Synaptic Connections: An Alternative Type of Memory in Sequence Processing Artificial Neural Networks.
CoRR, 2022

Neural Differential Equations for Learning to Program Neural Nets Through Continuous Learning Rules.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

A Modern Self-Referential Weight Matrix That Learns to Modify Itself.
Proceedings of the International Conference on Machine Learning, 2022

The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns via Spotlights of Attention.
Proceedings of the International Conference on Machine Learning, 2022

The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization.
Proceedings of the Tenth International Conference on Learning Representations, 2022

CTL++: Evaluating Generalization on Never-Seen Compositional Patterns of Known Functions, and Compatibility of Neural Representations.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

2021
Improving Baselines in the Wild.
CoRR, 2021

Training and Generating Neural Networks in Compressed Weight Space.
CoRR, 2021

Linear Transformers Are Secretly Fast Weight Memory Systems.
CoRR, 2021

Going Beyond Linear Transformers with Recurrent Fast Weight Programmers.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Linear Transformers Are Secretly Fast Weight Programmers.
Proceedings of the 38th International Conference on Machine Learning, 2021

The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

2020
Advancing neural language modeling in automatic speech recognition.
PhD thesis, 2020

The RWTH ASR System for TED-LIUM Release 2: Improving Hybrid HMM with SpecAugment.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

How Much Self-Attention Do We Need? Trading Attention for Feed-Forward Layers.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Domain Robust, Fast, and Compact Neural Language Models.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

2019
RWTH ASR Systems for LibriSpeech: Hybrid vs Attention - w/o Data Augmentation.
CoRR, 2019

Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling.
CoRR, 2019

Model Unit Exploration for Sequence-to-Sequence Speech Recognition.
CoRR, 2019

RWTH ASR Systems for LibriSpeech: Hybrid vs Attention.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Language Modeling with Deep Transformers.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

On the Choice of Modeling Unit for Sequence-to-Sequence Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

A Comparison of Transformer and LSTM Encoder Decoder Models for ASR.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Training Language Models for Long-Span Cross-Sentence Evaluation.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018
Improved Training of End-to-end Attention Models for Speech Recognition.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Investigation on Estimation of Sentence Probability by Combining Forward, Backward and Bi-directional LSTM-RNNs.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Prediction of LSTM-RNN Full Context States as a Subtask for N-Gram Feedforward Language Models.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

RADMM: Recurrent Adaptive Mixture Model with Applications to Domain Robust Language Modeling.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

2017
The 2016 RWTH Keyword Search System for Low-Resource Languages.
Proceedings of the Speech and Computer - 19th International Conference, 2017

Investigations on byte-level convolutional neural networks for language modeling in low resource speech recognition.
Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

2016
Automatic Speech Recognition Based on Neural Networks.
Proceedings of the Speech and Computer - 18th International Conference, 2016

LSTM, GRU, Highway and a Bit of Attention: An Empirical Overview for Language Modeling in Speech Recognition.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Investigation on log-linear interpolation of multi-domain neural network language model.
Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

2015
Bag-of-words input for long history representation in neural network-based language models for speech recognition.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

On efficient training of word classes and their application to recurrent neural network language models.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

2014
The RWTH English lecture recognition system.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014
