Yi Tay

ORCID: 0000-0001-6896-4496

According to our database, Yi Tay authored at least 129 papers between 2016 and 2024.

Bibliography

2024
Scaling Instruction-Finetuned Language Models.
J. Mach. Learn. Res., 2024

Vibe-Eval: A hard evaluation suite for measuring progress of multimodal language models.
CoRR, 2024

Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models.
CoRR, 2024


2023
PolyViT: Co-training Vision Transformers on Images, Videos and Audio.
Trans. Mach. Learn. Res., 2023

PaLM: Scaling Language Modeling with Pathways.
J. Mach. Learn. Res., 2023

Efficient Transformers: A Survey.
ACM Comput. Surv., 2023

PaLI-X: On Scaling up a Multilingual Vision and Language Model.
CoRR, 2023

PaLM 2 Technical Report.
CoRR, 2023

Larger language models do in-context learning differently.
CoRR, 2023

Surprise: Result List Truncation via Extreme Value Theory.
Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023

Recommender Systems with Generative Retrieval.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

The Flan Collection: Designing Data and Methods for Effective Instruction Tuning.
Proceedings of the International Conference on Machine Learning, 2023


UL2: Unifying Language Learning Paradigms.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Recitation-Augmented Language Models.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Language models are multilingual chain-of-thought reasoners.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

UniMax: Fairer and More Effective Language Sampling for Large-Scale Multilingual Pretraining.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Inverse Scaling Can Become U-Shaped.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Symbol tuning improves in-context learning in language models.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Transcending Scaling Laws with 0.1% Extra Compute.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling?
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

DSI++: Updating Transformer Memory with New Documents.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

CoLT5: Faster Long-Range Transformers with Conditional Computation.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022
Deep Learning for Recommender Systems.
Proceedings of the Recommender Systems Handbook, 2022

Emergent Abilities of Large Language Models.
Trans. Mach. Learn. Res., 2022

Inverse scaling can become U-shaped.
CoRR, 2022

Scaling Instruction-Finetuned Language Models.
CoRR, 2022

Unifying Language Learning Paradigms.
CoRR, 2022

Transformer Memory as a Differentiable Search Index.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Confident Adaptive Language Modeling.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

A New Generation of Perspective API: Efficient Multilingual Character-level Transformers.
Proceedings of the KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 14, 2022

HyperPrompt: Prompt-based Task-Conditioning of Transformers.
Proceedings of the International Conference on Machine Learning, 2022

Charformer: Fast Character Transformers via Gradient-based Subword Tokenization.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Scale Efficiently: Insights from Pretraining and Finetuning Transformers.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Scarf: Self-Supervised Contrastive Learning using Random Feature Corruption.
Proceedings of the Tenth International Conference on Learning Representations, 2022

ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning.
Proceedings of the Tenth International Conference on Learning Representations, 2022

The Efficiency Misnomer.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Dense Feature Memory Augmented Transformers for COVID-19 Vaccination Search Classification.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: EMNLP 2022 - Industry Track, Abu Dhabi, UAE, December 7, 2022

SCENIC: A JAX Library for Computer Vision Research and Beyond.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Improving Compositional Generalization with Self-Training for Data-to-Text Generation.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

Sharpness-Aware Minimization Improves Language Model Generalization.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

ED2LM: Encoder-Decoder to Language Model for Faster Document Re-ranking Inference.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

2021
Rethinking search: making domain experts out of dilettantes.
SIGIR Forum, 2021

PolyViT: Co-training Vision Transformers on Images, Videos and Audio.
CoRR, 2021

Improving Compositional Generalization with Self-Training for Data-to-Text Generation.
CoRR, 2021

Born Again Neural Rankers.
CoRR, 2021

Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers.
CoRR, 2021

The Benchmark Lottery.
CoRR, 2021

Are Pre-trained Convolutions Better than Pre-trained Transformers?
CoRR, 2021

Rethinking Search: Making Experts out of Dilettantes.
CoRR, 2021

Switch Spaces: Learning Product Spaces with Sparse Gating.
CoRR, 2021

Label Smoothed Embedding Hypothesis for Out-of-Distribution Detection.
CoRR, 2021

Generative Models are Unsupervised Predictors of Page Quality: A Colossal-Scale Study.
Proceedings of the WSDM '21, 2021

Self-Instantiated Recurrent Units with Dynamic Soft Recursion.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Knowledge Router: Learning Disentangled Representations for Knowledge Graphs.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

Synthesizer: Rethinking Self-Attention for Transformer Models.
Proceedings of the 38th International Conference on Machine Learning, 2021

OmniNet: Omnidirectional Representations from Transformers.
Proceedings of the 38th International Conference on Machine Learning, 2021

Beyond Fully-Connected Layers with Quaternions: Parameterization of Hypercomplex Multiplications with 1/n Parameters.
Proceedings of the 9th International Conference on Learning Representations, 2021

HyperGrid Transformers: Towards A Single Model for Multiple Tasks.
Proceedings of the 9th International Conference on Learning Representations, 2021

Long Range Arena: A Benchmark for Efficient Transformers.
Proceedings of the 9th International Conference on Learning Representations, 2021

Are Neural Rankers still Outperformed by Gradient Boosted Decision Trees?
Proceedings of the 9th International Conference on Learning Representations, 2021

Do Transformer Modifications Transfer Across Implementations and Applications?
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

On Orthogonality Constraints for Transformers.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

Are Pretrained Convolutions Better than Pretrained Transformers?
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

StructFormer: Joint Unsupervised Induction of Dependency and Constituency Structure from Masked Language Modeling.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

How Reliable are Model Diagnostics?
Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021

2020
Holistic Multi-Modal Memory Network for Movie Question Answering.
IEEE Trans. Image Process., 2020

HyperGrid: Efficient Multi-Task Transformers with Grid-wise Decomposable Hyper Projections.
CoRR, 2020

HyperML: A Boosting Metric Learning Approach in Hyperbolic Space for Recommender Systems.
Proceedings of the WSDM '20: The Thirteenth ACM International Conference on Web Search and Data Mining, 2020

Choppy: Cut Transformer for Ranked List Truncation.
Proceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval, 2020

Sparse Sinkhorn Attention.
Proceedings of the 37th International Conference on Machine Learning, 2020

Jacobian Adversarially Regularized Networks for Robustness.
Proceedings of the 8th International Conference on Learning Representations, 2020

Poison Attacks against Text Datasets with Conditional Adversarially Regularized Autoencoder.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, 2020

What It Thinks Is Important Is Important: Robustness Transfers Through Input Gradients.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Interactive Machine Comprehension with Information Seeking Agents.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

Would you Rather? A New Benchmark for Learning Machine Alignment with Cultural Values and Social Preferences.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

Reverse Engineering Configurations of Neural Text Generation Models.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

Multi-Level Head-Wise Match and Aggregation in Transformer for Textual Sequence Matching.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Neural architectures for natural language understanding
PhD thesis, 2019

Pair-Linking for Collective Entity Disambiguation: Two Could Be Better Than All.
IEEE Trans. Knowl. Data Eng., 2019

Deep Learning Based Recommender System: A Survey and New Perspectives.
ACM Comput. Surv., 2019

Quaternion Knowledge Graph Embedding.
CoRR, 2019

Interact and Decide: Medley of Sub-Attention Networks for Effective Group Recommendation.
Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2019

Compositional De-Attention Networks.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Quaternion Knowledge Graph Embeddings.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Quaternion Collaborative Filtering for Recommendation.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

DeepRec: An Open-source Toolkit for Deep Learning based Recommendation.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Bridging the Gap between Relevance Matching and Semantic Matching for Short Text Similarity Modeling.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

Confusionset-guided Pointer Networks for Chinese Spelling Check.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

Lightweight and Efficient Neural Natural Language Processing with Quaternion Networks.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

Simple and Effective Curriculum Pointer-Generator Networks for Reading Comprehension over Long Narratives.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

Robust Representation Learning of Biomedical Names.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

Holographic Factorization Machines for Recommendation.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
Hyperbolic Recommender Systems.
CoRR, 2018

Next Item Recommendation with Self-Attention.
CoRR, 2018

Multi-Cast Attention Networks for Retrieval-based Question Answering and Response Prediction.
CoRR, 2018

Multi-range Reasoning for Machine Comprehension.
CoRR, 2018

A Compare-Propagate Architecture with Alignment Factorization for Natural Language Inference.
CoRR, 2018

Latent Relational Metric Learning via Memory-based Attention for Collaborative Ranking.
Proceedings of the 2018 World Wide Web Conference on World Wide Web, 2018

Hyperbolic Representation Learning for Fast and Efficient Neural Question Answering.
Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, 2018

Densely Connected Attention Propagation for Reading Comprehension.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Recurrently Controlled Recurrent Networks.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Multi-Cast Attention Networks.
Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018

Multi-Pointer Co-Attention Networks for Recommendation.
Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018

Hermitian Co-Attention Networks for Text Matching in Asymmetrical Domains.
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

CoupleNet: Paying Attention to Couples with Coupled Attention for Relationship Recommendation.
Proceedings of the Twelfth International Conference on Web and Social Media, 2018

Attentive Gated Lexicon Reader with Contrastive Contextual Co-Attention for Sentiment Classification.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018

Co-Stack Residual Affinity Networks with Multi-level Attention Refinement for Matching Text Sequences.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018

Multi-Granular Sequence Encoding via Dilated Compositional Units for Reading Comprehension.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018

Compare, Compress and Propagate: Enhancing Neural Architectures with Alignment Factorization for Natural Language Inference.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018

Reasoning with Sarcasm by Reading In-Between.
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018

Learning to Attend via Word-Aspect Associative Fusion for Aspect-Based Sentiment Analysis.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

Cross Temporal Recurrent Networks for Ranking Question Answer Pairs.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

SkipFlow: Incorporating Neural Coherence Features for End-to-End Automatic Text Scoring.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017
Learning to Rank Question Answer Pairs with Holographic Dual LSTM Architecture.
CoRR, 2017

Enabling Efficient Question Answer Retrieval via Hyperbolic Neural Networks.
CoRR, 2017

Translational Recommender Networks.
CoRR, 2017

Random Semantic Tensor Ensemble for Scalable Knowledge Graph Link Prediction.
Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, 2017

Learning to Rank Question Answer Pairs with Holographic Dual LSTM Architecture.
Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2017

Cross-Device User Linking: URL, Session, Visiting Time, and Device-log Embedding.
Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2017

Multi-Task Neural Network for Non-discrete Attribute Prediction in Knowledge Graphs.
Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, 2017

Dyadic Memory Networks for Aspect-based Sentiment Analysis.
Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, 2017

NeuPL: Attention-based Semantic Matching and Pair-Linking for Entity Disambiguation.
Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, 2017

Non-Parametric Estimation of Multiple Embeddings for Link Prediction on Dynamic Knowledge Graphs.
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

2016
Cross Device Matching for Online Advertising with Neural Feature Ensembles : First Place Solution at CIKM Cup 2016.
CoRR, 2016

Learning Term Embeddings for Taxonomic Relation Identification Using Dynamic Weighting Neural Network.
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016
