2025
Post-training an LLM for RAG? Train on Self-Generated Demonstrations.
CoRR, February, 2025
2024
Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts.
Proceedings of the Findings of the Association for Computational Linguistics, 2024
2023
MGEL: Multigrained Representation Analysis and Ensemble Learning for Text Moderation.
IEEE Trans. Neural Networks Learn. Syst., October, 2023
Binary and Ternary Natural Language Generation.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023
2022
BiT: Robustly Binarized Multi-distilled Transformer.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
2021
Current Challenges and Future Directions in Podcast Information Access.
Proceedings of the SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021
Neural Instant Search for Music and Podcast.
Proceedings of the KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2021
Representation of Music Creators on Wikipedia, Differences in Gender and Genre.
Proceedings of the Fifteenth International AAAI Conference on Web and Social Media, 2021
Detecting Extraneous Content in Podcasts.
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021
Leveraging Semantic Information to Facilitate the Discovery of Underserved Podcasts.
Proceedings of the CIKM '21: The 30th ACM International Conference on Information and Knowledge Management, Virtual Event, Queensland, Australia, November 1, 2021
2020
The Spotify Podcasts Dataset.
CoRR, 2020
TREC 2020 Podcasts Track Overview.
Proceedings of the Twenty-Ninth Text REtrieval Conference, 2020
100,000 Podcasts: A Spoken English Document Corpus.
Proceedings of the 28th International Conference on Computational Linguistics, 2020
Query Understanding for Surfacing Under-served Music Content.
Proceedings of the CIKM '20: The 29th ACM International Conference on Information and Knowledge Management, 2020
2019
Inferring Advertiser Sentiment in Online Articles using Wikipedia Footnotes.
Proceedings of the Companion of The 2019 World Wide Web Conference, 2019
On the Complexity of Opinions and Online Discussions.
Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, 2019
Unsupervised Neologism Normalization Using Embedding Space Mapping.
Proceedings of the 5th Workshop on Noisy User-generated Text, 2019
2018
Extreme Multilabel Classification for Social Media Chairs' Welcome and Organization.
Proceedings of the Companion of The Web Conference 2018 on The Web Conference 2018, 2018
2017
Lightweight Multilingual Entity Extraction and Linking.
Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, 2017
DocTag2Vec: An Embedding Based Multi-label Learning Approach for Document Tagging.
Proceedings of the 2nd Workshop on Representation Learning for NLP, 2017
Post-Processing Techniques for Improving Predictions of Multilabel Learning Approaches.
Proceedings of the Eighth International Joint Conference on Natural Language Processing, 2017
Automatically Identifying Good Conversations Online (Yes, They Do Exist!).
Proceedings of the Eleventh International Conference on Web and Social Media, 2017
Finding Good Conversations Online: The Yahoo News Annotated Comments Corpus.
Proceedings of the 11th Linguistic Annotation Workshop, 2017
2016
Weakly supervised user intent detection for multi-domain dialogues.
Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016
Humor in Collective Discourse: Unsupervised Funniness Detection in the New Yorker Cartoon Caption Contest.
Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016
2015
The Cohort and Speechify Libraries for Rapid Construction of Speech Enabled Applications for Android.
Proceedings of the SIGDIAL 2015 Conference, 2015
Automatic formatted transcripts for videos.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
2014
Knowledge Acquisition Strategies for Goal-Oriented Dialog Systems.
Proceedings of the SIGDIAL 2014 Conference, 2014
Investigating Critical Speech Recognition Errors in Spoken Short Messages.
Proceedings of the Situated Dialog in Speech-Based Human-Computer Interaction, 2014
Learning situated knowledge bases through dialog.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
Conversational Strategies for Robustly Managing Dialog in Public Spaces.
Proceedings of the Workshop on Dialogue in Motion, 2014
2013
Predicting Tasks in Goal-Oriented Spoken Dialog Systems using Semantic Knowledge Bases.
Proceedings of the SIGDIAL 2013 Conference, 2013
Deploying speech interfaces to the masses.
Proceedings of the 18th International Conference on Intelligent User Interfaces, 2013
Situated Multiparty Interaction between Humans and Agents.
Proceedings of the Human-Computer Interaction. Interaction Modalities and Techniques, 2013
2012
The Structure and Generality of Spoken Route Instructions.
Proceedings of the SIGDIAL 2012 Conference, 2012
2010
Instruction Taking in the TeamTalk System.
Proceedings of the Dialog with Robots, 2010
2009
Using Wikipedia for Hierarchical Finer Categorization of Named Entities.
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, 2009
2008
Vaakkriti: Sanskrit Tokenizer.
Proceedings of the Third International Joint Conference on Natural Language Processing, 2008