LREC 2006 (main conference)

Building Carefully Tagged Bilingual Corpora to Cope with Linguistic Idiosyncrasy

Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC 2006)

DOI:10.63317/2wym9fdkqsxm

Abstract

We illustrate the effectiveness of a medium-sized, carefully tagged bilingual core corpus, which we call "semantic typology patterns", and give examples as concrete evidence of its usefulness. The most important characteristic of these semantic typology patterns is a bridging mechanism between two languages that is based on sequences of syntactic codes and semantic codes. This characteristic gives the bilingual core corpus both wide coverage and flexible applicability, even though its volume is not very large. Further work is needed to develop an intuitive sense of the appropriate coarseness and fineness of patterns. Here, coarseness concerns generalization in phrase-level and clause-level semantic patterns, and fineness concerns word-level semantic patterns. Based on this sense, we will complete the core tagged bilingual corpora while enhancing the necessary support functions and utilities.
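The abstract describes the bridging mechanism only at a high level: a source-side sequence of syntactic and semantic codes is linked to a target-side structure. The Python sketch below is a hypothetical toy illustration of that general idea, not the authors' pattern format or matching procedure. The lexicon, the semantic codes (`human`, `food`, `concrete`), the romanized Japanese-to-English pattern pair, and the names `Var`, `match`, and `translate` are all our own assumptions. Each variable slot must satisfy a syntactic code and, optionally, a semantic code, so a single pattern covers many sentences while the semantic constraint rejects inappropriate fillers.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple, Union

# Hypothetical toy lexicon: word -> (syntactic code, semantic codes).
LEXICON: Dict[str, Tuple[str, FrozenSet[str]]] = {
    "kare": ("N", frozenset({"human"})),
    "ringo": ("N", frozenset({"food", "concrete"})),
    "hon": ("N", frozenset({"concrete"})),
}

# Hypothetical word-level transfer dictionary (source word -> target phrase).
TRANSFER: Dict[str, str] = {"kare": "he", "ringo": "an apple", "hon": "a book"}


@dataclass(frozen=True)
class Var:
    """A pattern slot constrained by a syntactic code and, optionally, a semantic code."""
    name: str
    syn: str
    sem: Optional[str] = None  # None means "any semantic code"


Element = Union[str, Var]  # a literal word or a typed variable slot


def match(pattern: List[Element], tokens: List[str]) -> Optional[Dict[str, str]]:
    """Align tokens one-to-one with the pattern; return variable bindings or None."""
    if len(pattern) != len(tokens):
        return None
    bindings: Dict[str, str] = {}
    for elem, tok in zip(pattern, tokens):
        if isinstance(elem, str):
            if elem != tok:  # literal words must match exactly
                return None
            continue
        entry = LEXICON.get(tok)
        if entry is None:  # unknown word cannot fill a typed slot
            return None
        syn, sems = entry
        if syn != elem.syn:  # syntactic code check
            return None
        if elem.sem is not None and elem.sem not in sems:  # semantic code check
            return None
        bindings[elem.name] = tok
    return bindings


def translate(source: List[Element], target: List[str], tokens: List[str]) -> Optional[str]:
    """Fill the target template with transferred bindings if the source pattern matches."""
    bindings = match(source, tokens)
    if bindings is None:
        return None
    return " ".join(TRANSFER[bindings[t]] if t in bindings else t for t in target)


# One hypothetical bilingual pattern pair: "N1(human) ga N2(food) wo taberu" <-> "N1 eats N2".
SOURCE: List[Element] = [Var("N1", "N", "human"), "ga", Var("N2", "N", "food"), "wo", "taberu"]
TARGET: List[str] = ["N1", "eats", "N2"]

print(translate(SOURCE, TARGET, "kare ga ringo wo taberu".split()))  # he eats an apple
print(translate(SOURCE, TARGET, "kare ga hon wo taberu".split()))    # None
```

In this toy setting, widening a slot's semantic code (or setting `sem=None`) makes the pattern coarser and more widely applicable, while narrowing it makes the pattern finer. This is only one possible way to picture the coarseness/fineness trade-off the abstract mentions.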

Details

Paper ID: lrec2006-main-442
Pages: N/A
BibKey: nitta-etal-2006-building
Editor: N/A
Publisher: European Language Resources Association (ELRA)
ISSN: 2522-2686
ISBN: 2-9517408-2-4
Conference: Fifth International Conference on Language Resources and Evaluation
Location: Genoa, Italy
Date: 24–26 May 2006

Authors

  • Yoshihiko Nitta
  • Masashi Saraki
  • Satoru Ikehara
