Fajri Koto

According to our database, Fajri Koto authored at least 40 papers between 2014 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Cracking the Code: Multi-domain LLM Evaluation on Real-World Professional Exams in Indonesia.
CoRR, 2024

SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages.
CoRR, 2024

CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark.
CoRR, 2024

IndoCulture: Exploring Geographically-Influenced Cultural Commonsense Reasoning Across Eleven Indonesian Provinces.
CoRR, 2024

Are Multilingual LLMs Culturally-Diverse Reasoners? An Investigation into Multicultural Proverbs and Sayings.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

Zero-shot Sentiment Analysis in Low-Resource Languages Using a Multilingual Sentiment Lexicon.
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics, 2024

ArabicMMLU: Assessing Massive Multitask Language Understanding in Arabic.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

Cendol: Open Instruction-tuned Generative Large Language Models for Indonesian Languages.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

CMMLU: Measuring massive multitask language understanding in Chinese.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023
LLM360: Towards Fully Transparent Open-Source LLMs.
CoRR, 2023

Are Multilingual LLMs Culturally-Diverse Reasoners? An Investigation into Multicultural Proverbs and Sayings.
CoRR, 2023

Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models.
CoRR, 2023

Bactrian-X: A Multilingual Replicable Instruction-Following Model with Low-Rank Adaptation.
CoRR, 2023

NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages.
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, 2023

Large Language Models Only Pass Primary School Exams in Indonesia: A Comprehensive Test on IndoMMLU.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages.
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2023

2022
FFCI: A Framework for Interpretable Automatic Evaluation of Summarization.
J. Artif. Intell. Res., 2022

NusaCrowd: Open Source Initiative for Indonesian NLP Resources.
CoRR, 2022

NusaCrowd: A Call for Open and Reproducible NLP Research in Indonesian Languages.
CoRR, 2022

NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages.
CoRR, 2022

LipKey: A Large-Scale News Dataset for Absent Keyphrases Generation and Abstractive Summarization.
Proceedings of the 29th International Conference on Computational Linguistics, 2022

One Country, 700+ Languages: NLP Challenges for Underrepresented Languages and Dialects in Indonesia.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021
Discourse Probing of Pretrained Language Models.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

IndoBERTweet: A Pretrained Language Model for Indonesian Twitter with Effective Domain-Specific Vocabulary Initialization.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Top-down Discourse Parsing via Sequence Labelling.
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021

Evaluating the Efficacy of Summarization Evaluation across Languages.
Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021

2020
Towards Computational Linguistics in Minangkabau Language: Studies on Sentiment Analysis and Machine Translation.
Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation, 2020

Liputan6: A Large-scale Indonesian Dataset for Text Summarization.
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, 2020

IndoLEM and IndoBERT: A Benchmark Dataset and Pre-trained Language Model for Indonesian NLP.
Proceedings of the 28th International Conference on Computational Linguistics, 2020

2019
Improved Document Modelling with a Neural Discourse Parser.
Proceedings of the 17th Annual Workshop of the Australasian Language Technology Association, 2019

2017
Inset lexicon: Evaluation of a word list for Indonesian sentiment analysis in microblogs.
Proceedings of the 2017 International Conference on Asian Language Processing, 2017

2016
A Publicly Available Indonesian Corpora for Automatic Abstractive and Extractive Chat Summarization.
Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016

2015
A Comparative Study on Twitter Sentiment Analysis: Which Features are Good?
Proceedings of the Natural Language Processing and Information Systems, 2015

HBE: Hashtag-Based Emotion Lexicons for Twitter Sentiment Analysis.
Proceedings of the 7th Forum for Information Retrieval Evaluation, 2015

The Use of POS Sequence for Analyzing Sentence Pattern in Twitter Sentiment Analysis.
Proceedings of the 29th IEEE International Conference on Advanced Information Networking and Applications Workshops, 2015

A Study on Natural Expressive Speech: Automatic Memorable Spoken Quote Detection.
Proceedings of the Natural Language Dialog Systems and Intelligent Assistants, 2015

2014
Memorable spoken quote corpora of TED public speaking.
Proceedings of the 2014 17th Oriental Chapter of the International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques (COCOSDA), 2014

The use of semantic and acoustic features for open-domain TED talk summarization.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014