Anya Belz

CoRR, 2024

Assessing the Portability of Parameter Matrices Trained by Parameter-Efficient Finetuning Methods.

Mohammed Sabry

CoRR, 2024

(Mostly) Automatic Experiment Execution for Human Evaluations of NLP Systems.

Craig Thomson

Proceedings of the 17th International Natural Language Generation Conference, 2024

Filling Gaps in Wikipedia: Leveraging Data-to-Text Generation to Improve Encyclopedic Coverage of Underrepresented Groups.

Massimiliano Pronesti

Proceedings of the 17th International Natural Language Generation Conference, 2024

Differences in Semantic Errors Made by Different Types of Data-to-text Systems.

Rudali Huidrom

Proceedings of the 17th International Natural Language Generation Conference, 2024

QCET: An Interactive Taxonomy of Quality Criteria for Comparable and Repeatable Evaluation of NLP Systems.

Proceedings of the 17th International Natural Language Generation Conference, 2024

Assessing the Portability of Parameter Matrices Trained by Parameter-Efficient Finetuning Methods.

Mohammed Mohammed

Proceedings of the Findings of the Association for Computational Linguistics: EACL 2024, 2024

High-quality Data-to-Text Generation for Severely Under-Resourced Languages with Out-of-the-box Large Language Models.

Proceedings of the Findings of the Association for Computational Linguistics: EACL 2024, 2024

Beyond Abstracts: A New Dataset, Prompt Design Strategy and Method for Biomedical Synthesis Generation.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, 2024

2023

Data-to-text Generation for Severely Under-Resourced Languages with GPT-3.5: A Bit of Help Needed from Google Translate.

CoRR, 2023

Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP.

CoRR, 2023

PEFT-Ref: A Modular Reference Architecture and Typology for Parameter-Efficient Finetuning Techniques.

Mohammed Sabry

CoRR, 2023

How to Control Sentiment in Text Generation: A Survey of the State-of-the-Art in Sentiment-Control Techniques.

Proceedings of the 13th Workshop on Computational Approaches to Subjectivity, 2023

Towards a Consensus Taxonomy for Annotating Errors in Automatically Generated Text.

Rudali Huidrom

Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing, 2023

Mod-D2T: A Multi-layer Dataset for Modular Data-to-Text Generation.

Proceedings of the 16th International Natural Language Generation Conference, 2023

Exploring Variation of Results from Different Experimental Conditions.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

Non-Repeatable Experiments and Non-Reproducible Results: The Reproducibility Crisis in Human Evaluation in NLP.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022

A Metrological Perspective on Reproducibility in NLP.

Alex Papadopoulos-Korfiatis

Comput. Linguistics, 2022

User-Driven Research of Medical Note Generation Software.

Tom Knoll

Francesco Moramarco

Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Consultation Checklists: Standardising the Human Evaluation of Medical Note Generation.

Aleksandar Savkov

Francesco Moramarco

Alex Papadopoulos-Korfiatis

Mark Perera

Alex Papadopoulos-Korfiatis

Ehud Reiter

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: EMNLP 2022 - Industry Track, Abu Dhabi, UAE, December 7, 2022

Human Evaluation and Correlation with Automatic Metrics in Consultation Note Generation.

Francesco Moramarco

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

Quantified Reproducibility Assessment of NLP Results.

Maja Popovic

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021

Quantifying Reproducibility in NLP and ML.

CoRR, 2021

The Human Evaluation Datasheet 1.0: A Template for Recording Details of Human Evaluation Experiments in NLP.

Anastasia Shimorina

CoRR, 2021

A Reproduction Study of an Annotation-based Human Evaluation of MT Outputs.

Maja Popovic

Proceedings of the 14th International Conference on Natural Language Generation, 2021

Another PASS: A Reproduction Study of the Human Evaluation of a Football Report Generation System.

Thiago Castro Ferreira

Brian Davis

Proceedings of the 14th International Conference on Natural Language Generation, 2021

The ReproGen Shared Task on Reproducibility of Human Evaluations in NLG: Overview and Results.

Proceedings of the 14th International Conference on Natural Language Generation, 2021

A Systematic Review of Reproducibility Research in Natural Language Processing.

Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021

2020

Twenty Years of Confusion in Human Evaluation: NLG Needs Evaluation Sheets and Standardised Definitions.

David M. Howcroft

Miruna-Adriana Clinciu

Proceedings of the 13th International Conference on Natural Language Generation, 2020

Disentangling the Properties of Human Evaluation Methods: A Classification System to Support Comparability, Meta-Evaluation and Reproducibility Testing.

David M. Howcroft

Proceedings of the 13th International Conference on Natural Language Generation, 2020

ReproGen: Proposal for a Shared Task on Reproducibility of Human Evaluations in NLG.
