Back to Main Conference 2010
LREC 2010main

Constructing a Broad-coverage Lexicon for Text Mining in the Patent Domain

Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC 2010)

DOI:10.63317/44tky77uusy8

Abstract

For mining intellectual property texts (patents), a broad-coverage lexicon that covers general English words together with terminology from the patent domain is indispensable. The patent domain is very diffuse as it comprises a variety of technical domains (e.g. Human Necessities, Chemistry & Metallurgy and Physics in the International Patent Classification). As a result, collecting a lexicon that covers the language used in patent texts is not a straightforward task. In this paper we describe the approach that we have developed for the semi-automatic construction of a broad-coverage lexicon for classification and information retrieval in the patent domain and which combines information from multiple sources. Our contribution is twofold. First, we provide insight into the difficulties of developing lexical resources for information retrieval and text mining in the patent domain, a research and development field that is expanding quickly. Second, we create a broad coverage lexicon annotated with rich lexical information and containing both general English word forms and domain terminology for various technical domains.

Details

Paper ID
lrec2010-main-258
Pages
N/A
BibKey
oostdijk-etal-2010-constructing
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-6-7
Conference
Seventh International Conference on Language Resources and Evaluation
Location
Valletta, Malta
Date
17 May 2010 23 May 2010

Authors

  • NO

    Nelleke Oostdijk

  • SV

    Suzan Verberne

  • CK

    Cornelis Koster

Links