SciCiteVal: A Multi-Domain Dataset for Scientific Citation Verification

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/4m84m2k77g97

Abstract

Citations are an integral and important part of scientific papers. However, there exist erroneous citations ranging from careless mistakes to deliberate misconduct, and there are currently few studies or benchmark datasets dedicated to automated citation verification. To bridge this gap, we introduce SciCiteVal, a novel, manually annotated dataset for citation verification. Each instance in SciCiteVal pairs a citation context from a citing paper with the corresponding evidence passage extracted from the full text of the cited source. The dataset features a comprehensive taxonomy, where each citation is annotated as "Correct", "Incorrect", or "Unrelated", with the "Incorrect" category further divided into five fine-grained sub-categories. The completed dataset comprises over 1,000 annotated citations, distributed as 302 "Correct", 302 "Incorrect", and 430 "Unrelated" instances. We establish a benchmark by evaluating different Large Language Models (LLMs), providing baseline performance and a detailed analysis. We release SciCiteVal as a resource to support the development of citation verification systems and to facilitate research on evidence-based tasks.
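
As an illustration of the instance structure described in the abstract, the following is a minimal sketch of how one record and the reported label distribution might be represented. The released file format is not specified on this page, so the class and field names (`CitationInstance`, `citation_context`, `evidence_passage`, `incorrect_subcategory`) are hypothetical, and the five sub-category names are deliberately left unspecified.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Label(Enum):
    """Top-level labels named in the abstract."""
    CORRECT = "Correct"
    INCORRECT = "Incorrect"
    UNRELATED = "Unrelated"


@dataclass(frozen=True)
class CitationInstance:
    """One record; field names are assumptions, not the dataset's schema."""
    citation_context: str   # text around the citation in the citing paper
    evidence_passage: str   # passage taken from the cited source's full text
    label: Label
    # One of five fine-grained types; their names are not given here,
    # so this stays a free-form optional string.
    incorrect_subcategory: Optional[str] = None

    def __post_init__(self) -> None:
        # A sub-category is only meaningful for "Incorrect" citations.
        if self.incorrect_subcategory is not None and self.label is not Label.INCORRECT:
            raise ValueError("sub-category applies only to Incorrect instances")


# Label counts exactly as reported in the abstract.
REPORTED_COUNTS: Dict[Label, int] = {
    Label.CORRECT: 302,
    Label.INCORRECT: 302,
    Label.UNRELATED: 430,
}


def label_shares(counts: Dict[Label, int]) -> Dict[Label, float]:
    """Return each label's fraction of the total number of instances."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


if __name__ == "__main__":
    print("total instances:", sum(REPORTED_COUNTS.values()))
    for label, share in label_shares(REPORTED_COUNTS).items():
        print(f"{label.value}: {share:.1%}")
```

The reported counts sum to 1,034 instances, which is consistent with the "over 1,000" figure, and "Unrelated" is the largest class, so a classifier evaluated on this distribution would likely need per-class metrics in addition to overall accuracy.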

Details

Paper ID
lrec2026-main-125
Pages
pp. 1603-1611
BibKey
liu-etal-2026-sciciteval
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11-16 May 2026

Authors

  • Qinyue Liu

  • Yongxin Zhou

  • Cyril Labbe

Links