LREC 2018, Main Conference
Manually Annotated Corpus of Polish Texts Published between 1830 and 1918
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
Abstract
The paper presents a manually annotated historical corpus of 625,000 tokens, comprising fiction, drama, popular science, essays and newspapers of the period. The corpus provides three layers: transliteration, transcription and morphosyntactic annotation. Both the annotation process and the corpus itself are described in detail.