2025

LLM360 K2: Building a 65B 360-Open-Source Large Language Model from Scratch.

,

,

,

Willie Neiswanger

,

,

,

,

,

,

Omkar Pangarkar

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

CoRR, January, 2025

Against The Achilles' Heel: A Survey on Red Teaming for Generative Models.

,

,

,

,

,

,

,

,

,

Timothy Baldwin

,

,

J. Artif. Intell. Res., 2025

2024

Libra-Leaderboard: Towards Responsible AI through a Balanced Leaderboard of Safety and Capability.

CoRR, 2024

ToolGen: Unified Tool Retrieval and Calling via Generation.

,

,

,

,

Timothy Baldwin

,

CoRR, 2024

Loki: An Open-Source Tool for Fact Verification.

,

,

,

,

,

,

,

,

,

Timothy Baldwin

CoRR, 2024

EXAMS-V: A Multi-Discipline Multilingual Multimodal Exam Benchmark for Evaluating Vision Language Models.

Rocktim Jyoti Das

,

Simeon Emilov Hristov

,

,

Dimitar Iliyanov Dimitrov

,

,

CoRR, 2024

A Chinese Dataset for Evaluating the Safeguards in Large Language Models.

,

,

,

,

,

,

,

,

Timothy Baldwin

CoRR, 2024

Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language Models as Agents.

,

,

,

,

Timothy Baldwin

CoRR, 2024

Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs.

,

,

Rusiru Thushara

,

Mohammad Qazim Bhat

,

,

,

,

,

,

,

,

,

Timothy Baldwin

,

,

,

,

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Do-Not-Answer: Evaluating Safeguards in LLMs.

,

,

,

,

Timothy Baldwin

Proceedings of the Findings of the Association for Computational Linguistics: EACL 2024, 2024

A Chinese Dataset for Evaluating the Safeguards in Large Language Models.

,

,

,

,

,

,

,

,

Timothy Baldwin

Proceedings of the Findings of the Association for Computational Linguistics, 2024

Demystifying Instruction Mixing for Fine-tuning Large Language Models.

,

,

,

,

,

,

Timothy Baldwin

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, 2024

ArabicMMLU: Assessing Massive Multitask Language Understanding in Arabic.

,

,

,

,

Abdelrahman Boda Sadallah

,

,

Khalid Almubarak

,

,

,

,

,

,

Timothy Baldwin

Proceedings of the Findings of the Association for Computational Linguistics, 2024

Fact-Checking the Output of Large Language Models via Token-Level Uncertainty Quantification.

Ekaterina Fadeeva

,

Aleksandr Rubashevskii

,

Artem Shelmanov

,

Sergey Petrakov

,

,

,

Evgenii Tsymbalov

,

,

Alexander Panchenko

,

Timothy Baldwin

,

,

Proceedings of the Findings of the Association for Computational Linguistics, 2024

EXAMS-V: A Multi-Discipline Multilingual Multimodal Exam Benchmark for Evaluating Vision Language Models.

Rocktim Jyoti Das

,

Simeon Emilov Hristov

,

,

Dimitar Dimitrov

,

,

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

CMMLU: Measuring massive multitask language understanding in Chinese.

,

,

,

,

,

,

,

Timothy Baldwin

Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023

Understanding the Instruction Mixture for Large Language Model Fine-tuning.

,

,

,

,

,

CoRR, 2023

LLM360: Towards Fully Transparent Open-Source LLMs.

,

,

Willie Neiswanger

,

,

,

,

,

,

,

Omkar Pangarkar

,

,

,

,

,

,

,

,

,

,

,

,

Roberto Iriondo

,

,

,

,

,

,

CoRR, 2023

Can Large Language Model Comprehend Ancient Chinese? A Preliminary Test on ACLUE.

,

CoRR, 2023

Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models.

CoRR, 2023

Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs.

,

,

,

,

Timothy Baldwin

CoRR, 2023

Bactrian-X: A Multilingual Replicable Instruction-Following Model with Low-Rank Adaptation.

,

,

,

Alham Fikri Aji

,

Timothy Baldwin

CoRR, 2023

Can Large Language Model Comprehend Ancient Chinese? A Preliminary Test on ACLUE.

,

Proceedings of the Ancient Language Processing Workshop, 2023

Location Aware Modular Biencoder for Tourism Question Answering.

,

,

Timothy Baldwin

Proceedings of the Findings of the Association for Computational Linguistics: IJCNLP-AACL 2023, 2023

Large Language Models Only Pass Primary School Exams in Indonesia: A Comprehensive Test on IndoMMLU.

,

,

,

Timothy Baldwin

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

2022

Neural Character-Level Syntactic Parsing for Chinese.

,

,

,

,

,

J. Artif. Intell. Res., 2022

MultiSpanQA: A Dataset for Multi-Span Question Answering.

,

,

Maria Vasardani

,

Timothy Baldwin

Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

CULG: Commercial Universal Language Generation.

,

,

,

,

,

Timothy Baldwin

,

Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Track, 2022

Sentiment-Aware Word and Sentence Level Pre-training for Sentiment Analysis.

,

,

,

,

,

,

,

,

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

2021

KFCNet: Knowledge Filtering and Contrastive Learning Network for Generative Commonsense Reasoning.

,

,

,

,

Timothy Baldwin

,

CoRR, 2021

KFCNet: Knowledge Filtering and Contrastive Learning for Generative Commonsense Reasoning.

,

,

,

,

Timothy Baldwin

,

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, 2021

2020

Target Word Masking for Location Metonymy Resolution.

,

Maria Vasardani

,

,

Timothy Baldwin

Proceedings of the 28th International Conference on Computational Linguistics, 2020

2019

UniMelb at SemEval-2019 Task 12: Multi-model combination for toponym resolution.

,

,

Timothy Baldwin

,

,

Maria Vasardani

Proceedings of the 13th International Workshop on Semantic Evaluation, 2019

Classifying Relation via Piecewise Convolutional Neural Networks with Transfer Learning.

,

,

,

,

,

Proceedings of the Man-Machine Interactions 6, 2019

Place Questions and Human-Generated Answers: A Data Analysis Approach.

,

,

Maria Vasardani

,

Timothy Baldwin

,

,

Proceedings of the Geospatial Technologies for Local and Regional Development, 2019

2018

Neural Character-level Dependency Parsing for Chinese.

,

,

,

Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018