Yikang Shen

Orcid: 0000-0001-6836-0510

According to our database, Yikang Shen authored at least 70 papers between 2014 and 2024.

Bibliography

2024
Stick-breaking Attention.
CoRR, 2024

Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler.
CoRR, 2024

Scaling Granite Code Models to 128K Context.
CoRR, 2024

The infrastructure powering IBM's Gen AI model development.
CoRR, 2024

Octo-planner: On-device Language Model for Planner-Action Agents.
CoRR, 2024

Efficient Continual Pre-training by Mitigating the Stability Gap.
CoRR, 2024

Parallelizing Linear Transformers with the Delta Rule over Sequence Length.
CoRR, 2024

Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training.
CoRR, 2024

Granite Code Models: A Family of Open Foundation Models for Code Intelligence.
CoRR, 2024

JetMoE: Reaching Llama2 Performance with 0.1M Dollars.
CoRR, 2024

Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models.
CoRR, 2024

Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision.
CoRR, 2024

Scattered Mixture-of-Experts Implementation.
CoRR, 2024

API Pack: A Massive Multilingual Dataset for API Call Generation.
CoRR, 2024

Diversity Measurement and Subset Selection for Instruction Tuning Datasets.
CoRR, 2024

Improving Reinforcement Learning from Human Feedback with Efficient Reward Model Ensemble.
CoRR, 2024

Structured Code Representations Enable Data-Efficient Adaptation of Code Language Models.
CoRR, 2024

Gated Linear Attention Transformers with Hardware-Efficient Training.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

LaMAGIC: Language-Model-based Topology Generation for Analog Integrated Circuits.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

SALMON: Self-Alignment with Instructable Reward Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

The Consensus Game: Language Model Generation via Equilibrium Search.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

FlexAttention for Efficient High-Resolution Vision-Language Models.
Proceedings of the Computer Vision - ECCV 2024, 2024

Aligning Large Multimodal Models with Factually Augmented RLHF.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

Visual Chain-of-Thought Prompting for Knowledge-Based Visual Reasoning.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Autonomous Tree-search Ability of Large Language Models.
CoRR, 2023

SALMON: Self-Alignment with Principle-Following Reward Models.
CoRR, 2023

GraphText: Graph Reasoning in Text Space.
CoRR, 2023

An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training.
CoRR, 2023

ModuleFormer: Learning Modular Large Language Models From Uncurated Data.
CoRR, 2023

See, Think, Confirm: Interactive Prompting Between Vision and Language Models for Knowledge-based Visual Reasoning.
CoRR, 2023

Adaptive Online Replanning with Diffusion Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Planning with Large Language Models for Code Generation.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Hyper-Decision Transformer for Efficient Online Policy Adaptation.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Transformer-Patcher: One Mistake Worth One Neuron.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

TextPSG: Panoptic Scene Graph Generation from Textual Descriptions.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Sparse Universal Transformer.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Visual Dependency Transformers: Dependency Tree Emerges from Reversed Attention.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Mod-Squad: Designing Mixtures of Experts As Modular Multi-Task Learners.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Mod-Squad: Designing Mixture of Experts As Modular Multi-Task Learners.
CoRR, 2022

Syntactic Inductive Biases for Deep Learning Methods.
CoRR, 2022

Prompting Decision Transformer for Few-Shot Policy Generalization.
Proceedings of the International Conference on Machine Learning, 2022

Mixture of Attention Heads: Selecting Attention Heads Per Token.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Unsupervised Dependency Graph Network.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

Phrase-aware Unsupervised Constituency Parsing.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021
Self-Instantiated Recurrent Units with Dynamic Soft Recursion.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Explicitly Modeling Syntax in Language Models with Incremental Parsing and a Dynamic Oracle.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

Long Range Arena: A Benchmark for Efficient Transformers.
Proceedings of the 9th International Conference on Learning Representations, 2021

Learning Task Decomposition with Ordered Memory Policy Network.
Proceedings of the 9th International Conference on Learning Representations, 2021

StructFormer: Joint Unsupervised Induction of Dependency and Constituency Structure from Masked Language Modeling.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

2020
Explicitly Modeling Syntax in Language Model improves Generalization.
CoRR, 2020

Recursive Top-Down Production for Sentence Generation with Latent Trees.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, 2020

Exploiting Syntactic Structure for Better Language Modeling: A Syntactic Distance Approach.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

2019
Investigating Biases in Textual Entailment Datasets.
CoRR, 2019

Ordered Memory.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks.
Proceedings of the 7th International Conference on Learning Representations, 2019

2018
Biological Event Trigger Identification with Noise Contrastive Estimation.
IEEE ACM Trans. Comput. Biol. Bioinform., 2018

Generating Contradictory, Neutral, and Entailing Sentences.
CoRR, 2018

Neural Language Modeling by Jointly Learning Syntax and Lexicon.
Proceedings of the 6th International Conference on Learning Representations, 2018

BanditSum: Extractive Summarization as a Contextual Bandit.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018

Straight to the Tree: Constituency Parsing with Neural Syntactic Distance.
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018

2017
Self-organized Hierarchical Softmax.
CoRR, 2017

Exploration of Tree-based Hierarchical Softmax for Recurrent Language Models.
Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017

Word Embedding Based Correlation Model for Question/Answer Matching.
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

2016
Multidimensional scaling based knowledge provision for new questions in community Question Answering systems.
Proceedings of the 2016 International Joint Conference on Neural Networks, 2016

Convolutional Neural Network based sentiment analysis using Adaboost combination.
Proceedings of the 2016 International Joint Conference on Neural Networks, 2016

2015
Question/Answer Matching for CQA System via Combining Lexical and Sequential Information.
Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015

2014
Influencing Factors Analysis of People's Answering Behaviours on Social Network Based Questions.
Proceedings of the 2014 IEEE 11th Intl Conf on Ubiquitous Intelligence and Computing and 2014 IEEE 11th Intl Conf on Autonomic and Trusted Computing and 2014 IEEE 14th Intl Conf on Scalable Computing and Communications and Its Associated Workshops, 2014

Choosing the Best Auto-Encoder-Based Bagging Classifier: An Empirical Study.
Proceedings of the Neural Information Processing - 21st International Conference, 2014

