CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training.
CoRR, April, 2025
Maximize Your Data's Potential: Enhancing LLM Accuracy with Two-Phase Pretraining.
CoRR, 2024
Nemotron-CC: Transforming Common Crawl into a Refined Long-Horizon Pretraining Dataset.
CoRR, 2024
Survey of Hallucination in Natural Language Generation.
ACM Comput. Surv., December, 2023
Learn What NOT to Learn: Towards Generative Safety in Chatbots.
CoRR, 2023
A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity.
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, 2023
Context Generation Improves Open Domain Question Answering.
Proceedings of the Findings of the Association for Computational Linguistics: EACL 2023, 2023
Plausible May Not Be Faithful: Probing Object Hallucination in Vision-Language Pre-training.
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2023
Model Debiasing via Gradient-based Explanation on Representation.
Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society, 2023
Generative Long-form Question Answering: Relevance, Faithfulness and Succinctness.
CoRR, 2022
AiSocrates: Towards Answering Ethical Quandary Questions.
CoRR, 2022
QA4QG: Using Question Answering to Constrain Multi-Hop Question Generation.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022
Read before Generate! Faithful Long Form Question Answering with Machine Reading.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022
Retrieval-Free Knowledge-Grounded Dialogue Response Generation with Adapters.
Proceedings of the Second DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering, 2022
Retrieval-Free Knowledge-Grounded Dialogue Response Generation with Adapters.
CoRR, 2021
Improve Query Focused Abstractive Summarization by Incorporating Answer Relevance.
Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021
Dimsum @LaySumm 20: BART-based Approach for Scientific Document Summarization.
CoRR, 2020
CAiRE-COVID: A Question Answering and Multi-Document Summarization System for COVID-19 Research.
CoRR, 2020
Improving Spoken Question Answering Using Contextualized Word Representation.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020
Proceedings of the First Workshop on Scholarly Document Processing, 2020
CAiRE-COVID: A Question Answering and Query-focused Multi-Document Summarization System for COVID-19 Scholarly Information Management.
Proceedings of the 1st Workshop on NLP for COVID-19 @ EMNLP 2020, Online, December 2020
Multi-hop Question Generation with Graph Convolutional Network.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, 2020
Generalizing Question Answering System with Pre-trained Language Model Fine-tuning.
Proceedings of the 2nd Workshop on Machine Reading for Question Answering, 2019
Multimodal music emotion classification using AdaBoost with decision stumps.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013
These words are music to my ears: Recognizing music emotion from lyrics using AdaBoost.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013
Personalized music emotion classification via active learning.
Proceedings of the second international ACM workshop on Music information retrieval with user-centered and multimodal strategies, 2012