
Paper Information

lrec2026-main-103

EPOP: A Benchmark Corpus for Assessing NLP Models on Structured Information Extraction in Plant Health


Abstract

We introduce the EPOP (Epidemiomonitoring of Plants) corpus, a new annotated resource for structured information extraction in the domain of plant health epidemiology. The corpus consists of translated news reports that reflect real-world phytosanitary monitoring scenarios. It includes annotations for named entities (e.g., Plant, Pest, Vector, Disease, Dissemination Pathway), identity coreferences, and both binary and complex n-ary relations that represent key events such as Transmits or Causes, along with their modalities. A distinctive feature of EPOP is its normalization layer, in which mentions of species and geographical locations are linked to canonical identifiers in the NCBI Taxonomy and GeoNames, respectively, enabling semantic disambiguation and integration with external knowledge bases. As the first publicly available corpus of its kind, EPOP presents a realistic and challenging benchmark, with high linguistic variability, entity role ambiguity, and long-distance relations. We report baseline results on three core tasks, namely named entity recognition, normalization (entity linking), and relation extraction, using both fine-tuned BERT-based models and hard-prompted large language models. These experiments demonstrate the utility of EPOP while also identifying areas for improvement, particularly in the extraction of complex relations. The corpus is released under an open license to support research in environmental NLP, crop protection, and knowledge graph enrichment.

