Designing and Evaluating a Russian Tagset

Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008)

DOI:10.63317/35g8pne924yf

Abstract

This paper reports on the principles behind the design of a tagset covering Russian morphosyntactic phenomena, the modifications made to the core tagset, and its evaluation. The tagset is based on the MULTEXT-East framework, and its design decisions aimed to balance the parameters important to linguists against the feasibility of detecting and disambiguating them automatically. The final tagset contains about 500 tags and achieves about 95% accuracy on the disambiguated portion of the Russian National Corpus. We have also produced a test set that can be shared with other researchers.

Details

Paper ID
lrec2008-main-539
Pages
N/A
BibKey
sharoff-etal-2008-designing
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-4-0
Conference
Sixth International Conference on Language Resources and Evaluation
Location
Marrakech, Morocco
Date
28 May 2008 to 30 May 2008

Authors

  • Serge Sharoff

  • Mikhail Kopotev

  • Tomaž Erjavec

  • Anna Feldman

  • Dagmar Divjak
