Guidelines and a Corpus for Extracting Biographical Events

Proceedings of the 18th Joint ACL - ISO Workshop on Interoperable Semantic Annotation within LREC2022

DOI:10.63317/58wkromnqtfm

Abstract

Although biographies are widespread on the Semantic Web, resources and approaches for automatically extracting biographical events are limited. This limitation reduces the amount of structured, machine-readable biographical information, especially about people belonging to underrepresented groups. Our work addresses this limitation by providing a set of guidelines for the semantic annotation of life events. The guidelines are designed to be interoperable with existing ISO standards for semantic annotation: ISO-TimeML (ISO-24617-1) and SemAF (ISO-24617-4). The guidelines were tested through an annotation task on Wikipedia biographies of underrepresented writers, namely authors born in non-Western countries, migrants, or members of ethnic minorities. A total of 1,000 sentences were annotated by 4 annotators, with an average Inter-Annotator Agreement of 0.825. The resulting corpus was mapped onto OntoNotes. This mapping allowed us to expand our corpus, showing that existing resources may be exploited for the biographical event extraction task.

Details

Paper ID
lrec2022-ws-isa-03
Pages
pp. 20-26
BibKey
stranisci-etal-2022-guidelines
Publisher
European Language Resources Association (ELRA)
Workshop
Proceedings of the 18th Joint ACL - ISO Workshop on Interoperable Semantic Annotation within LREC2022
Date
20-25 June 2022

Authors

  • Marco Antonio Stranisci
  • Enrico Mensa
  • Rossana Damiano
  • Daniele Radicioni
  • Ousmane Diakite

Links