Back to Main Conference 2008
LREC 2008main

A Comparative Study on Language Identification Methods

Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008)

DOI:10.63317/3sfctgtdb5ts

Abstract

In this paper we present two experiments conducted for comparison of different language identification algorithms. Short words-, frequent words- and n-gram-based approaches are considered and combined with the Ad-Hoc Ranking classification method. The language identification process can be subdivided into two main steps: first a document model is generated for the document and a language model for the language; second the language of the document is determined on the basis of the language model and is added to the document as additional information. In this work we present our evaluation results and discuss the importance of a dynamic value for the out-of-place measure.

Details

Paper ID
lrec2008-main-118
Pages
N/A
BibKey
grothe-etal-2008-comparative
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-4-0
Conference
Sixth International Conference on Language Resources and Evaluation
Location
Marrakech, Morocco
Date
28 May 2008 30 May 2008

Authors

  • LG

    Lena Grothe

  • ED

    Ernesto William De Luca

  • AN

    Andreas Nürnberger

Links