2025
Language Surgery in Multilingual Large Language Models.
CoRR, June, 2025

What Do Indonesians Really Need from Language Technology? A Nationwide Survey.
CoRR, June, 2025

Simulating LLM-to-LLM Tutoring for Multilingual Math Feedback.
CoRR, June, 2025

Evaluating Vision-Language and Large Language Models for Automated Student Assessment in Indonesian Classrooms.
CoRR, June, 2025

IndoSafety: Culturally Grounded Safety for LLMs in Indonesian Languages.
CoRR, June, 2025

FinChain: A Symbolic Benchmark for Verifiable Chain-of-Thought Financial Reasoning.
CoRR, June, 2025

Simulating Training Data Leakage in Multiple-Choice Benchmarks for LLM Evaluation.
CoRR, May, 2025

Llama-3-Nanda-10B-Chat: An Open Generative Large Language Model for Hindi.
CoRR, April, 2025

Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia.
CoRR, March, 2025

Llama-3.1-Sherkala-8B-Chat: An Open Large Language Model for Kazakh.
CoRR, March, 2025

Unveiling Cultural Blind Spots: Analyzing the Limitations of mLLMs in Procedural Text Comprehension.
CoRR, February, 2025

Instruction Tuning on Public Government and Cultural Data for Low-Resource Language: a Case Study in Kazakh.
CoRR, February, 2025

Qorgau: Evaluating LLM Safety in Kazakh-Russian Bilingual Contexts.
CoRR, February, 2025

Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs.
CoRR, February, 2025

Synthetic Data Generation for Culturally Nuanced Commonsense Reasoning in Low-Resource Languages.
CoRR, February, 2025

KazMMLU: Evaluating Language Models on Kazakh, Russian, and Regional Knowledge of Kazakhstan.
CoRR, February, 2025

Commonsense Reasoning in Arab Culture.
CoRR, February, 2025

LLM360 K2: Building a 65B 360-Open-Source Large Language Model from Scratch.
CoRR, January, 2025

Cracking the Code: Multi-domain LLM Evaluation on Real-World Professional Exams in Indonesia.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

2024
IndoCulture: Exploring Geographically Influenced Cultural Commonsense Reasoning Across Eleven Indonesian Provinces.
Trans. Assoc. Comput. Linguistics, 2024

Libra-Leaderboard: Towards Responsible AI through a Balanced Leaderboard of Safety and Capability.
CoRR, 2024

INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge.
CoRR, 2024

SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages.
CoRR, 2024

CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark.
CoRR, 2024

CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark.
Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Are Multilingual LLMs Culturally-Diverse Reasoners? An Investigation into Multicultural Proverbs and Sayings.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Zero-shot Sentiment Analysis in Low-Resource Languages Using a Multilingual Sentiment Lexicon.
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics, 2024

ArabicMMLU: Assessing Massive Multitask Language Understanding in Arabic.
Findings of the Association for Computational Linguistics: ACL 2024, 2024

Cendol: Open Instruction-tuned Generative Large Language Models for Indonesian Languages.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

CMMLU: Measuring massive multitask language understanding in Chinese.
Findings of the Association for Computational Linguistics: ACL 2024, 2024

2023
LLM360: Towards Fully Transparent Open-Source LLMs.
CoRR, 2023

Are Multilingual LLMs Culturally-Diverse Reasoners? An Investigation into Multicultural Proverbs and Sayings.
CoRR, 2023

Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models.
CoRR, 2023

Bactrian-X: A Multilingual Replicable Instruction-Following Model with Low-Rank Adaptation.
CoRR, 2023

NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages.
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, 2023

Large Language Models Only Pass Primary School Exams in Indonesia: A Comprehensive Test on IndoMMLU.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages.
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2023

NusaCrowd: Open Source Initiative for Indonesian NLP Resources.
Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022
FFCI: A Framework for Interpretable Automatic Evaluation of Summarization.
J. Artif. Intell. Res., 2022

NusaCrowd: Open Source Initiative for Indonesian NLP Resources.
CoRR, 2022

NusaCrowd: A Call for Open and Reproducible NLP Research in Indonesian Languages.
CoRR, 2022

NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages.
CoRR, 2022

LipKey: A Large-Scale News Dataset for Absent Keyphrases Generation and Abstractive Summarization.
Proceedings of the 29th International Conference on Computational Linguistics, 2022

One Country, 700+ Languages: NLP Challenges for Underrepresented Languages and Dialects in Indonesia.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021
Discourse Probing of Pretrained Language Models.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

IndoBERTweet: A Pretrained Language Model for Indonesian Twitter with Effective Domain-Specific Vocabulary Initialization.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Top-down Discourse Parsing via Sequence Labelling.
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021

Evaluating the Efficacy of Summarization Evaluation across Languages.
Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021

2020
Towards Computational Linguistics in Minangkabau Language: Studies on Sentiment Analysis and Machine Translation.
Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation, 2020

Liputan6: A Large-scale Indonesian Dataset for Text Summarization.
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, 2020

IndoLEM and IndoBERT: A Benchmark Dataset and Pre-trained Language Model for Indonesian NLP.
Proceedings of the 28th International Conference on Computational Linguistics, 2020

2019
Improved Document Modelling with a Neural Discourse Parser.
Proceedings of the 17th Annual Workshop of the Australasian Language Technology Association, 2019

2017
Inset lexicon: Evaluation of a word list for Indonesian sentiment analysis in microblogs.
Proceedings of the 2017 International Conference on Asian Language Processing, 2017

2016
A Publicly Available Indonesian Corpora for Automatic Abstractive and Extractive Chat Summarization.
Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016

2015
A Comparative Study on Twitter Sentiment Analysis: Which Features are Good?
Natural Language Processing and Information Systems, 2015

HBE: Hashtag-Based Emotion Lexicons for Twitter Sentiment Analysis.
Proceedings of the 7th Forum for Information Retrieval Evaluation, 2015

The Use of POS Sequence for Analyzing Sentence Pattern in Twitter Sentiment Analysis.
Proceedings of the 29th IEEE International Conference on Advanced Information Networking and Applications Workshops, 2015

A Study on Natural Expressive Speech: Automatic Memorable Spoken Quote Detection.
Natural Language Dialog Systems and Intelligent Assistants, 2015

2014
Memorable spoken quote corpora of TED public speaking.
Proceedings of the 2014 17th Oriental Chapter of the International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques (COCOSDA), 2014

The use of semantic and acoustic features for open-domain TED talk summarization.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014