Nitish Shirish Keskar

ORCID: 0000-0002-2223-8496

According to our database, Nitish Shirish Keskar authored at least 38 papers between 2015 and 2023.

Bibliography

2023
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models.
Trans. Mach. Learn. Res., 2023

2022
Generating Negative Samples for Sequential Recommendation.
CoRR, 2022

Modeling Multi-hop Question Answering as Single Sequence Prediction.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021
Combining Data-driven Supervision with Human-in-the-loop Feedback for Entity Resolution.
CoRR, 2021

Mirostat: a Neural Text decoding Algorithm that directly controls perplexity.
Proceedings of the 9th International Conference on Learning Representations, 2021

Unsupervised Paraphrasing with Pretrained Language Models.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

GeDi: Generative Discriminator Guided Sequence Generation.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, 2021

Char2Subword: Extending the Subword Embedding Space Using Robust Character Compositionality.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, 2021

2020
Unsupervised Paraphrase Generation via Dynamic Blocking.
CoRR, 2020

Char2Subword: Extending the Subword Embedding Space from Pre-trained Models Using Robust Character Compositionality.
CoRR, 2020

Mirostat: A Perplexity-Controlled Neural Text Decoding Algorithm.
CoRR, 2020

ProGen: Language Modeling for Protein Generation.
CoRR, 2020

Improving out-of-distribution generalization via multi-task self-supervised pretraining.
CoRR, 2020

Limits of Detecting Text Generated by Large-Scale Language Models.
Proceedings of the Information Theory and Applications Workshop, 2020

Simple Data Augmentation with the Mask Token Improves Domain Adaptation for Dialog Act Tagging.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

The Thieves on Sesame Street are Polyglots - Extracting Multilingual Models from Monolingual APIs.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Assessing Local Generalization Capability in Deep Models.
Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

2019
Balancing Communication and Computation in Distributed Optimization.
IEEE Trans. Autom. Control., 2019

A limited-memory quasi-Newton algorithm for bound-constrained non-smooth optimization.
Optim. Methods Softw., 2019

Global Capacity Measures for Deep ReLU Networks via Path Sampling.
CoRR, 2019

CTRL: A Conditional Transformer Language Model for Controllable Generation.
CoRR, 2019

Pretrained AI Models: Performativity, Mobility, and Change.
CoRR, 2019

XLDA: Cross-Lingual Data Augmentation for Natural Language Inference and Question Answering.
CoRR, 2019

Unifying Question Answering and Text Classification via Span Extraction.
CoRR, 2019

Coarse-grain Fine-grain Coattention Network for Multi-evidence Question Answering.
Proceedings of the 7th International Conference on Learning Representations, 2019

A Closer Look at Deep Learning Heuristics: Learning rate restarts, Warmup and Distillation.
Proceedings of the 7th International Conference on Learning Representations, 2019

Neural Text Summarization: A Critical Evaluation.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

2018
Identifying Generalization Properties in Neural Networks.
CoRR, 2018

The Natural Language Decathlon: Multitask Learning as Question Answering.
CoRR, 2018

Using Mode Connectivity for Loss Landscape Analysis.
CoRR, 2018

An Analysis of Neural Language Modeling at Multiple Scales.
CoRR, 2018

Regularizing and Optimizing LSTM Language Models.
Proceedings of the 6th International Conference on Learning Representations, 2018

2017
Improving Generalization Performance by Switching from Adam to SGD.
CoRR, 2017

Weighted Transformer Network for Machine Translation.
CoRR, 2017

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima.
Proceedings of the 5th International Conference on Learning Representations, 2017

2016
A second-order method for convex ℓ1-regularized optimization with active-set prediction.
Optim. Methods Softw., 2016

adaQN: An Adaptive Quasi-Newton Algorithm for Training RNNs.
Proceedings of the Machine Learning and Knowledge Discovery in Databases, 2016

2015
A nonmonotone learning rate strategy for SGD training of deep neural networks.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

