Detecting Potentially Under-annotated Explicit Discourse Connectives in the Penn Discourse Treebank (PDTB-3) with LLMs

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/5392qqgkyzs5

Abstract

Accurate identification of explicit discourse connectives is crucial for analysing discourse relations, which in turn supports NLP tasks such as summarisation and question answering. However, annotation inconsistencies remain a challenge, particularly for ambiguous prepositions that have both discourse and non-discourse usages. This paper presents a pipeline that combines large language model (LLM) prompting, cross-model agreement, and syntactic pattern analysis to detect likely under-annotated connectives. Evaluated on four prepositions (by, with, without, and for), the approach identifies likely under-annotations for some, but not all, of them. The results show that the method is promising, but that its generalisability depends on improved prompt design, model choice, and syntactic analysis tools. The findings highlight both the potential and the limitations of LLM-based approaches to corpus error detection, and demonstrate how improved discourse annotation can contribute to more reliable data for downstream NLP tasks.
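The cross-model agreement step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `flag_under_annotated`, the data layout (token identifiers mapped to per-model yes/no judgments), and the `min_agree` threshold are all assumptions made for illustration. The idea shown is simply that a token is flagged as a likely under-annotation when enough models judge it to be a discourse usage and the corpus does not already annotate it as a connective.

```python
from typing import Dict, List, Set


def flag_under_annotated(
    judgments: Dict[str, Dict[str, bool]],
    annotated: Set[str],
    min_agree: int = 2,
) -> List[str]:
    """Return token ids that models agree are discourse usages but the corpus omits.

    judgments: hypothetical mapping of token id -> {model name: True if that
               model judged the token to be a discourse connective}.
    annotated: token ids already annotated as connectives in the corpus.
    min_agree: assumed minimum number of models that must say "discourse".
    """
    flagged = []
    for token_id, votes in judgments.items():
        # Tokens already annotated cannot be under-annotations.
        if token_id in annotated:
            continue
        # Count the models that judged this token a discourse usage.
        if sum(votes.values()) >= min_agree:
            flagged.append(token_id)
    return sorted(flagged)
```

In a fuller pipeline, the flagged tokens would presumably then be filtered by syntactic patterns and reviewed manually; the sketch covers only the agreement filter.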

Details

Paper ID
lrec2026-main-158
Pages
pp. 2012-2023
BibKey
chuang-etal-2026-detecting
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11-16 May 2026

Authors

  • Yueh-Ting Chuang

  • Xixian Liao

  • Bonnie Webber

Links