Semantic Frame Parsing for Information Extraction : the CALOR corpus

Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Abstract

This paper presents a publicly available corpus of French encyclopedic history texts annotated according to the Berkeley FrameNet formalism. The main difference in our approach compared to previous works on semantic parsing with FrameNet is that we are not interested here in full text parsing but rather on partial parsing. The goal is to select from the FrameNet resources the minimal set of frames that are going to be useful for the applicative framework targeted, in our case Information Extraction from encyclopedic documents. Such an approach leverage the manual annotation of larger corpus than those obtained through full text parsing and therefore open the door to alternative methods for Frame parsing than those used so far on the FrameNet 1.5 benchmark corpus. The approaches compared in this study rely on an integrated sequence labeling model which jointly optimizes frame identification and semantic role segmentation and identification. The models compared are CRFs and multitasks bi-LSTMs.

Resources

Details

Paper ID

lrec2018-main-159

Pages

N/A

DOI

10.63317/5hpmtq47z62u

BibKey

marzinotto-etal-2018-semantic

Editors

Nicoletta Calzolari, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Koiti Hasida, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis, Takenobu Tokunaga

Publisher

European Language Resources Association (ELRA)

ISSN

2522-2686

ISBN

79-10-95546-00-9

Conference

Eleventh International Conference on Language Resources and Evaluation

Location

Miyazaki, Japan

Date

7 - 12 May 2018

Authors

GM
Gabriel Marzinotto
JA
Jeremy Auguste
FB
Frederic Bechet
GD
Geraldine Damnati
AN
Alexis Nasr

Links

URL

DOI