Generating Research Data Metadata from Their Accompanying README Files

Proceedings of Natural Scientific Language Processing (NSLP) @ LREC 2026

DOI: 10.63317/27oe6uwv2fws

Abstract

Software repositories have conventionally been used for software development, but they have recently also come to serve as research data repositories. Research data published in such repositories are frequently accompanied by README files, yet the data often lack structured metadata. To address this issue, this paper investigates the feasibility of generating research data metadata from the accompanying README files. First, we analyzed the occurrence patterns of metadata-related information in README files; the analysis showed that README files can serve as valuable resources for metadata generation. We then conducted an experiment in which large language models (LLMs) extracted metadata-related information from README files, and evaluated their performance. The experimental results showed that LLMs can extract metadata-related information with high performance.

Details

Paper ID: lrec2026-ws-nslp-17
Pages: pp. 180-185
BibKey: sekido-etal-2026-generating
Editors: Georg Rehm, Stefan Dietze, Danilo Dessi, Diana Maynard, Sonja Schimmler
Publisher: European Language Resources Association (ELRA)
ISSN: N/A
ISBN: N/A
Workshop: Proceedings of Natural Scientific Language Processing (NSLP) @ LREC 2026
Location: Palma, Mallorca, Spain
Date: 11 - 16 May 2026

Authors

  • Kotaro Sekido
  • Yu Watanabe
  • Koichiro Ito
  • Shigeki Matsubara