2025
SENAI: Towards Software Engineering Native Generative Artificial Intelligence.
CoRR, March, 2025

On Inter-Dataset Code Duplication and Data Leakage in Large Language Models.
IEEE Trans. Software Eng., January, 2025

2024
ALPINE: An Adaptive Language-Agnostic Pruning Method for Language Models for Code.
CoRR, 2024

CONCORD: Towards a DSL for Configurable Graph Code Representation.
CoRR, 2024

Enhancing Identifier Naming Through Multi-Mask Fine-Tuning of Language Models of Code.
Proceedings of the IEEE International Conference on Source Code Analysis and Manipulation, 2024

Naturalness of Attention: Revisiting Attention in Code Language Models.
Proceedings of the 2024 ACM/IEEE 44th International Conference on Software Engineering: New Ideas and Emerging Results, 2024

2023
On Inter-Dataset Code Duplication and Data Leakage in Large Language Models.
Dataset, December, 2023

Calibrating Deep Learning-based Code Smell Detection using Human Feedback.
Proceedings of the 23rd IEEE International Working Conference on Source Code Analysis and Manipulation, 2023

DACOS - A Manually Annotated Dataset of Code Smells.
Proceedings of the 20th IEEE/ACM International Conference on Mining Software Repositories, 2023