SPADE: Evaluation Dataset for Monolingual Phrase Alignment

Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Abstract

We create the SPADE (Syntactic Phrase Alignment Dataset for Evaluation) for systematic research on syntactic phrase alignment in paraphrasal sentences. This is the first dataset to shed light on syntactic and phrasal paraphrases under a linguistically motivated grammar. Existing datasets for evaluating phrasal paraphrase detection define a phrase as simply a sequence of words without syntactic structure, owing to difficulties caused by the non-homographic nature of phrase correspondences in sentential paraphrases. Unlike these, the SPADE provides gold parse trees annotated by a linguistic expert and gold phrase alignments identified by three annotators. In total, 20,276 phrases are extracted from 201 sentential paraphrases, yielding 15,721 alignments that at least one annotator regarded as paraphrases. The SPADE is available at the Linguistic Data Consortium for future research on paraphrases. In addition, two metrics are proposed to evaluate the extent to which automatic phrase alignment results agree with those identified by humans. These metrics allow objective comparison of the performance of different methods evaluated on the SPADE. Benchmarks showing the performance of humans and the state-of-the-art method are presented as a reference for future SPADE users.

Details

Paper ID

lrec2018-main-220

Pages

N/A

DOI

10.63317/5894atgjvr8a

BibKey

arase-tsujii-2018-spade

Editors

Nicoletta Calzolari, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Koiti Hasida, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis, Takenobu Tokunaga

Publisher

European Language Resources Association (ELRA)

ISSN

2522-2686

ISBN

979-10-95546-00-9

Conference

Eleventh International Conference on Language Resources and Evaluation

Location

Miyazaki, Japan

Date

7 - 12 May 2018

Authors

Yuki Arase
Junichi Tsujii
