LREC 2026 workshop

Comparable Corpora in Cross-linguistic Research: Nominal Number in English, Czech, and Greek

Proceedings of the 19th Workshop on Building and Using Comparable Corpora (BUCC)

DOI:10.63317/4nz4cptoiwv3

Abstract

The paper examines the use of comparable corpora for contrastive research on the category of nominal number across three languages—English, Czech, and Greek. Two objectives are pursued: a cross-linguistic analysis of number and an assessment of the impact of automatic annotation on linguistic findings. For this study, corpora of comparable size and composition were compiled for the three languages from the Leipzig Corpora Collection. The data were automatically annotated using two open-access tools, Stanza and UDPipe, producing six datasets (two per language), each containing about 5 million sentences and 100 million tokens. Although derived from the same source, the paired datasets for each language differ in sentence and word segmentation, in the number of nouns identified, and in the number values assigned. These differences, nevertheless, do not appear to substantially affect the overall picture of number in the languages examined. The distribution of lemmas by the ratio of singular and plural forms challenges the view commonly presented in grammars that most nouns occur in both numbers and that singular-only and plural-only nouns are rare. However, a closer analysis of nouns assumed to have defective number indicates that answers to more nuanced questions vary depending on the annotation tool used.
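The abstract's central measure, the distribution of noun lemmas by the ratio of singular to plural forms, and its comparison across two annotation tools can be illustrated with a small sketch. This is not the authors' procedure. It assumes the annotations are exported in the Universal Dependencies CoNLL-U format, which UDPipe produces natively and Stanza can also write, and that nouns carry UPOS `NOUN` with the UD feature `Number=Sing` or `Number=Plur`. The function names, the 5% / 95% thresholds for "singular-dominant" and "plural-dominant", and the toy English sentence are all hypothetical. Other `Number` values, such as the Czech `Dual`, are simply ignored here.

```python
from collections import Counter, defaultdict


def parse_feats(feats: str) -> dict:
    """Parse a CoNLL-U FEATS column such as 'Case=Nom|Number=Plur'."""
    if feats == "_":
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))


def number_counts(conllu_text: str) -> dict:
    """Count Number=Sing / Number=Plur tokens per NOUN lemma in CoNLL-U text."""
    counts = defaultdict(Counter)
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank sentence separators and comment lines
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        lemma, upos, feats = cols[2], cols[3], cols[5]
        if upos != "NOUN":
            continue
        number = parse_feats(feats).get("Number")
        # Assumption: only Sing and Plur are counted; e.g. Dual is ignored.
        if number in ("Sing", "Plur"):
            counts[lemma][number] += 1
    return counts


def plural_share(counter: Counter) -> float:
    """Share of plural tokens among a lemma's Sing + Plur tokens (NaN if none)."""
    total = counter["Sing"] + counter["Plur"]
    return counter["Plur"] / total if total else float("nan")


def classify(counter: Counter, low: float = 0.05, high: float = 0.95) -> str:
    """Bin a lemma by plural share; the thresholds are hypothetical."""
    if counter["Sing"] + counter["Plur"] == 0:
        return "unattested"
    share = plural_share(counter)
    if share <= low:
        return "singular-dominant"
    if share >= high:
        return "plural-dominant"
    return "both numbers"


def compare_tools(counts_a: dict, counts_b: dict) -> dict:
    """Return lemmas whose number class differs between two annotations."""
    diffs = {}
    for lemma in sorted(set(counts_a) | set(counts_b)):
        class_a = classify(counts_a.get(lemma, Counter()))
        class_b = classify(counts_b.get(lemma, Counter()))
        if class_a != class_b:
            diffs[lemma] = (class_a, class_b)
    return diffs


# Hypothetical toy annotation of one English sentence (tab-separated CoNLL-U).
SAMPLE = "\n".join([
    "# text = Dogs and a dog saw scissors.",
    "1\tDogs\tdog\tNOUN\tNNS\tNumber=Plur\t5\tnsubj\t_\t_",
    "2\tand\tand\tCCONJ\tCC\t_\t4\tcc\t_\t_",
    "3\ta\ta\tDET\tDT\tDefinite=Ind|PronType=Art\t4\tdet\t_\t_",
    "4\tdog\tdog\tNOUN\tNN\tNumber=Sing\t1\tconj\t_\t_",
    "5\tsaw\tsee\tVERB\tVBD\tMood=Ind|Tense=Past|VerbForm=Fin\t0\troot\t_\t_",
    "6\tscissors\tscissors\tNOUN\tNNS\tNumber=Plur\t5\tobj\t_\t_",
    "7\t.\t.\tPUNCT\t.\t_\t5\tpunct\t_\t_",
])

if __name__ == "__main__":
    tool_a = number_counts(SAMPLE)
    # Simulate a second tool that tags "scissors" as singular.
    tool_b = number_counts(SAMPLE.replace(
        "scissors\tNOUN\tNNS\tNumber=Plur", "scissors\tNOUN\tNN\tNumber=Sing"))
    for lemma, counter in sorted(tool_a.items()):
        print(lemma, dict(counter), classify(counter))
    print(compare_tools(tool_a, tool_b))
    # → {'scissors': ('plural-dominant', 'singular-dominant')}
```

On a real corpus, the same comparison run over the Stanza and UDPipe outputs for one language would list exactly the lemmas whose apparent number behaviour depends on the tool, which is the kind of divergence the abstract reports for nouns assumed to have defective number.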

Details

Paper ID
lrec2026-ws-bucc-05
Pages
pp. 30–40
BibKey
diamantopoulos-etal-2026-comparable
Editors
Reinhard Rapp, Ayla Rigouts Terryn, Serge Sharoff, Pierre Zweigenbaum
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the 19th Workshop on Building and Using Comparable Corpora (BUCC)
Location
Palma, Mallorca, Spain
Date
11–16 May 2026

Authors

  • Konstantinos Diamantopoulos

  • Magda Ševčíková
