SpecNFS: A Challenge Dataset Towards Extracting Formal Models from Natural Language Specifications

Proceedings of the Thirteenth International Conference on Language Resources and Evaluation (LREC 2022)

DOI:10.63317/44sgctjot5ob

Abstract

Can NLP assist in building formal models for verifying complex systems? We study this challenge in the context of parsing Network File System (NFS) specifications. We define a semantic-dependency problem over SpecIR, a representation language we introduce to model sentences appearing in NFS specification documents (RFCs) as IF-THEN statements, and present an annotated dataset of 1,198 sentences. We develop and evaluate semantic-dependency parsing systems for this problem. Evaluations show that even when using a state-of-the-art language model, there is significant room for improvement, with the best models achieving F1 scores of only 60.5 and 33.3 on the named-entity-recognition and dependency-link-prediction sub-tasks, respectively. We also release additional unlabeled data and other domain-related texts. Experiments show that these additional resources increase the F1 measure when used for simple domain-adaptation and transfer-learning-based approaches, suggesting fruitful directions for further research.

Details

Paper ID
lrec2022-main-233
Pages
pp. 2166-2176
BibKey
ghosh-etal-2022-specnfs
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
79-10-95546-38-2
Conference
Thirteenth Language Resources and Evaluation Conference
Location
Marseille, France
Date
20 June 2022 to 25 June 2022

Authors

  • Sayontan Ghosh
  • Amanpreet Singh
  • Alex Merenstein
  • Wei Su
  • Scott A. Smolka
  • Erez Zadok
  • Niranjan Balasubramanian

Links