Back to Main Conference 2004
LREC 2004main

A Methodology and Associated Tools for Building Interlingual Wordnets

Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC 2004)

DOI:10.63317/24itz3shga32

Abstract

The paper describes the methodology and the tools we developed for the purpose of building a Romanian wordnet. The work is carried out within the BalkaNet European project and is concerned with wordnets for Bulgarian, Czech, Greek, Romanian, Serbian and Turkish all of them aligned via an interlingual index (ILI) to Princeton Wordnet. The wordnets structuring follows the principles adopted in EuroWordNet. In order to ensure maximal cross-lingual lexical coverage, the consortium decided to implement the same concepts, represented by a common set of ILI concepts. We describe the selection of concepts to be implemented in all the monolingual wordnets The methodologies adopted by each partner were different and they depended on the language resources and personnel available. For the Romanian wordnet,we decided that it should be based on the reference lexicographic descriptions of Romanian which we had in electronic forms: EXPD, a heavily XML annotated explanatory dictionary (developed in the previous CONCEDE project and based on the standard Explanatory Dictionary of Romanian), SYND, a published dictionary of synonyms which we keyboarded, encoded and completed with more than 4000 new synonymy sets extracted from EXPD, EnRoD, a Romanian-English dictionary, most part of it being extracted automatically from parallel corpora and further hand validated and extended. Besides these monolingual resources, as all the other members of the consortium, we had at our disposal the interlingual mapping of the Princeton Wordnet. All the above mentioned resources have been incorporated into a user-friendly system, WnBuilder, which allows for cooperative work of a large number of lexicographers. When the distributed work is put together, the synsets are validated. Several errors show up, the most frequent and difficult to solve being the case of a literal with the same sense number appearing in different synsets. We discuss reasons for such conflicts as well as their correction, supported by another utility program called WnCorrector. The full paper presents WnBuilder and WnCorrector, as well as the status of the Romanian wordnet development.

Details

Paper ID
lrec2004-main-159
Pages
N/A
BibKey
tufis-barbu-2004-methodology
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-1-6
Conference
Fourth International Conference on Language Resources and Evaluation
Location
Lisbon, Portugal
Date
26 May 2004 28 May 2004

Authors

  • DT

    Dan Tufis

  • EB

    Eduard Barbu

Links