Abdelrahman Boda Sadallah

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

KazMMLU: Evaluating Language Models on Kazakh, Russian, and Regional Knowledge of Kazakhstan.

[DOI]

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Commonsense Reasoning in Arab Culture.

[DOI]

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Instruction Tuning on Public Government and Cultural Data for Low-Resource Language: a Case Study in Kazakh.

[DOI]

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Qorǵau: Evaluating Safety in Kazakh-Russian Bilingual Contexts.

[DOI]

Zain Muhammad Mujahid

Preslav Nakov

Proceedings of the Findings of the Association for Computational Linguistics, 2025

Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia.

[DOI]

Samuel Cahyawijaya

Mohammad Rifqi Farhansyah

Joel Ruben Antony Moniz

Tack Hwa Wong

Thant Thiri Maung

Frederikus Hudi

David Anugraha

Muhammad Ravi Shulthan Habibi

Muhammad Reza Qorib

Amit Agarwal

Mohamed Fazli Mohamed Imam

Hitesh Laxmichand Patel

Vicky Feliren

Bahrul Ilmi Nasution

Manuel Antonio Rufino

Genta Indra Winata

Rian Adam Rajagede

Carlos Rafael Catalan

Priyaranjan Pattnayak

Salsabila Zahirah Pranida

Kevin Pratama

Yeshil Bangera

Adisai Na-Thalang

Patricia Nicole Monderin

Kanyakorn Veerakanjana

Piyalitt Ittichaiwong

Matthew Theodore Roque

Karissa Vincentio

Takdanai Kreangphet

Phakphum Artkaew

Kadek Hendrawan Palgunadi

Hanif Muhammad Zhafran

Fenal Ashokbhai Ilasariya

Haochen Li

John Amadeo Daniswara

Filbert Aurelian Tjiaranata

Eryawan Presma Yulianrifat

Can Udomcharoenchaikit

Fadil Risdian Ansori

Mahardika Krisna Ihsani

Isaiah Edri W. Flores

Lester James Validad Miranda

Ming Shan Hee

Ikhlasul Akmal Hanif

M. Alif Al Hakim

Muhammad Rizky Sya'ban

Kun Kerdthaisong

Daniel Fernando Erazo Florez

Tirana Noor Fatyanosa

Peerat Limkonchotiwat

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024

IndoCulture: Exploring Geographically Influenced Cultural Commonsense Reasoning Across Eleven Indonesian Provinces.

[DOI]

Trans. Assoc. Comput. Linguistics, 2024

Libra-Leaderboard: Towards Responsible AI through a Balanced Leaderboard of Safety and Capability.

[DOI]

CoRR, 2024

INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge.

[DOI]

Azril Hafizi Amirudin

Fabian Farestam

Shayekh Bin Islam

Perttu Isotalo

Maral Jabbarishiviari

Gabriel Adriano de Melo

Johan Samir Obando-Ceron

Marjana Prifti Skenduli

Arshia Soltani Moakhar

Bardia Soltani Moakhar

Ran Tamir

Ayush Kumar Tarun

Azmine Toushik Wasi

Thenuka Ovin Weerasinghe

CoRR, 2024

SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages.

[DOI]

Rahmad Mahendra

Lester James V. Miranda

Muhammad Ravi Shulthan Habibi

Onno Pepijn Kampman

Joel Ruben Antony Moniz

Patrick Amadeus Irawan

Bin Wang

Muhammad Dehan Al Kautsar

Chenxi Whitehouse

Ivan Halim Parmonangan

Sonny Lazuardi Hermawan

Dan John Velasco

Willy Fitra Hendria

Yasmin Moslem

Noah Flynn

Peerat Limkonchotiwat

CoRR, 2024

CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark.

[DOI]

David Romero

Chenyang Lyu

Haryo Akbarianto Wibowo

David Ifeoluwa Adelani

Henok Biadglign Ademtew

Hernán Maina

Israel Abebe Azime

Jesús-Germán Ortiz-Barajas

Jay P. Gala

Jiahui Geng

Jinheon Baek

Jocelyn Dunstan

Laura Alonso Alemany

Kumaranage Ravindu Yasas Nagasinghe

Luciana Benotti

Luis Fernando D'Haro

Marcelo Viridiano

Marcos Estecha-Garitagoitia

Maria Camila Buitrago Cabrera

Mario Rodríguez-Cantelar

Mélanie Jouitteau

Mihail Mihaylov

Mohamed Fazli Mohamed Imam

Munkhjargal Gochoo

Munkh-Erdene Otgonbold

Tiago Timponi Torrent

Toqeer Ehsan

Vladimir Araujo

Yova Kementchedjhieva

CoRR, 2024

CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark.

[DOI]

David Romero

Chenyang Lyu

Haryo Akbarianto Wibowo

Santiago Góngora

Aishik Mandal

Sukannya Purkayastha

Munkh-Erdene Otgonbold

Tiago Timponi Torrent

Frederico Belcavello

Marcelo Viridiano

Christian Salamea Palacios

Vladimir Araujo

Yova Kementchedjhieva

Mihail Mihaylov

Israel Abebe Azime

Henok Biadglign Ademtew

Bontu Fufa Balcha

Naome A. Etori

David Ifeoluwa Adelani

Rada Mihalcea

Atnafu Lambebo Tonja

Maria Camila Buitrago Cabrera

Gisela Vallejo

Marcos Estecha-Garitagoitia

Ruochen Zhang

Mario Rodríguez-Cantelar

Toqeer Ehsan

Rendi Chevi

Mohamed Fazli Mohamed Imam

Kumaranage Ravindu Yasas Nagasinghe

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Are Multilingual LLMs Culturally-Diverse Reasoners? An Investigation into Multicultural Proverbs and Sayings.

[DOI]

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages.

[DOI]

Rahmad Mahendra

Lester James V. Miranda

Muhammad Ravi Shulthan Habibi

Onno Kampman

Joel Ruben Antony Moniz

Patrick Amadeus Irawan

Bin Wang

Muhammad Dehan Al Kautsar

Chenxi Whitehouse

Ivan Halim Parmonangan

Sonny Lazuardi Hermawan

Dan John Velasco

Willy Fitra Hendria

Yasmin Moslem

Noah Flynn

Abdelrahman Boda Sadallah

Peerat Limkonchotiwat

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Zero-shot Sentiment Analysis in Low-Resource Languages Using a Multilingual Sentiment Lexicon.

[DOI]

Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics, 2024

ArabicMMLU: Assessing Massive Multitask Language Understanding in Arabic.

[DOI]

Proceedings of the Findings of the Association for Computational Linguistics, 2024

Cendol: Open Instruction-tuned Generative Large Language Models for Indonesian Languages.

[DOI]

Emmanuel Dave

Nuur Shadieq

Muhammad Ihza Mahendra

Dea Annisayanti Putri

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

CMMLU: Measuring massive multitask language understanding in Chinese.

[DOI]

Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023

LLM360: Towards Fully Transparent Open-Source LLMs.

[DOI]

CoRR, 2023

Are Multilingual LLMs Culturally-Diverse Reasoners? An Investigation into Multicultural Proverbs and Sayings.

[DOI]

CoRR, 2023

Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models.

[DOI]

CoRR, 2023

Bactrian-X: A Multilingual Replicable Instruction-Following Model with Low-Rank Adaptation.

[DOI]

CoRR, 2023

NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages.

[DOI]

Jhonson Lee

Nuur Shadieq

Tjeng Wawan Cenggoro

Hanung Wahyuning Linuwih

Bryan Wilie

Galih Pradipta Muridan

Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, 2023

Large Language Models Only Pass Primary School Exams in Indonesia: A Comprehensive Test on IndoMMLU.

[DOI]

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages.

[DOI]

Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2023

NusaCrowd: Open Source Initiative for Indonesian NLP Resources.

[DOI]

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022

FFCI: A Framework for Interpretable Automatic Evaluation of Summarization.

[DOI]

J. Artif. Intell. Res., 2022

NusaCrowd: Open Source Initiative for Indonesian NLP Resources.

[DOI]

CoRR, 2022

NusaCrowd: A Call for Open and Reproducible NLP Research in Indonesian Languages.

[DOI]

CoRR, 2022

NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages.

[DOI]

CoRR, 2022

LipKey: A Large-Scale News Dataset for Absent Keyphrases Generation and Abstractive Summarization.

[DOI]

Proceedings of the 29th International Conference on Computational Linguistics, 2022

One Country, 700+ Languages: NLP Challenges for Underrepresented Languages and Dialects in Indonesia.

[DOI]

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

Context-Aware Sentence Classification in Evidence-Based Medicine.

[DOI]

Biaoyan Fang

Proceedings of the 20th Annual Workshop of the Australasian Language Technology Association, 2022

2021

Discourse Probing of Pretrained Language Models.

[DOI]

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

IndoBERTweet: A Pretrained Language Model for Indonesian Twitter with Effective Domain-Specific Vocabulary Initialization.

[DOI]

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Top-down Discourse Parsing via Sequence Labelling.

[DOI]

Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021

Evaluating the Efficacy of Summarization Evaluation across Languages.

[DOI]

Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021

Handling Variance of Pretrained Language Models in Grading Evidence in the Medical Literature.

[DOI]

Biaoyan Fang

Proceedings of the 19th Annual Workshop of the Australasian Language Technology Association, 2021

2020

Towards Computational Linguistics in Minangkabau Language: Studies on Sentiment Analysis and Machine Translation.

[DOI]

Ikhwan Koto

Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation, 2020

Liputan6: A Large-scale Indonesian Dataset for Text Summarization.

[DOI]

Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, 2020

IndoLEM and IndoBERT: A Benchmark Dataset and Pre-trained Language Model for Indonesian NLP.

[DOI]

Proceedings of the 28th International Conference on Computational Linguistics, 2020

2019

Improved Document Modelling with a Neural Discourse Parser.

[DOI]

Proceedings of the 17th Annual Workshop of the Australasian Language Technology Association, 2019

2017

Inset lexicon: Evaluation of a word list for Indonesian sentiment analysis in microblogs.

[DOI]

Gemala Y. Rahmaningtyas

Proceedings of the 2017 International Conference on Asian Language Processing, 2017

2016

A Publicly Available Indonesian Corpora for Automatic Abstractive and Extractive Chat Summarization.

[DOI]

Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016

2015

A Comparative Study on Twitter Sentiment Analysis: Which Features are Good?

[DOI]

Mirna Adriani

Proceedings of the Natural Language Processing and Information Systems, 2015

HBE: Hashtag-Based Emotion Lexicons for Twitter Sentiment Analysis.

[DOI]

Mirna Adriani

Proceedings of the 7th Forum for Information Retrieval Evaluation, 2015

The Use of POS Sequence for Analyzing Sentence Pattern in Twitter Sentiment Analysis.

[DOI]