2025
EDINET-Bench: Evaluating LLMs on Complex Financial Tasks using Japanese Financial Statements.
CoRR, June 2025

llm-jp-modernbert: A ModernBERT Model Trained on a Large-Scale Japanese Corpus with Long Context Length.
CoRR, April 2025

Developing Japanese CLIP Models Leveraging an Open-weight LLM for Large-scale Dataset Translation.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

2024
Constructing Multimodal Datasets from Scratch for Rapid Development of a Japanese Visual Language Model.
CoRR, 2024

LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs.
CoRR, 2024

A Comprehensive Analysis of Memorization in Large Language Models.
Proceedings of the 17th International Natural Language Generation Conference, 2024