LREC 2014 main

Facing the Identification Problem in Language-Related Scientific Data Analysis.

Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014)

DOI:10.63317/36iw5a9dwb2s

Abstract

This paper describes the problems that must be addressed when studying large amounts of data over time that require entity normalization. Here the normalization is applied not to the usual genres of news or political speech but to academic discourse about language resources, technologies and sciences. The paper reports on the normalization processes needed to produce data usable for computing statistics in three past studies: on the LRE Map, the ISCA Archive and the LDC Bibliography. It shows the need for human expertise during normalization and the necessity of adapting the work to the study objectives. It investigates possible improvements for reducing the workload needed to produce comparable results. Through this paper, we show the need to define and agree on persistent, unique international identifiers.

Details

Paper ID
lrec2014-main-717
Pages
N/A
BibKey
mariani-etal-2014-facing
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-8-4
Conference
Ninth International Conference on Language Resources and Evaluation
Location
Reykjavik, Iceland
Date
26 May 2014 to 31 May 2014

Authors

  • Joseph Mariani
  • Christopher Cieri
  • Gil Francopoulo
  • Patrick Paroubek
  • Marine Delaborde
