
TyPTex: Inductive Typological Text Classification by Multivariate Statistical Analysis for NLP Systems Tuning/Evaluation

Proceedings of the Second International Conference on Language Resources and Evaluation (LREC 2000)

DOI:10.63317/29rg2k6jq37a

Abstract

The increasing use of natural language processing (NLP) methods based on huge corpora requires that the lexical, morpho-syntactic and syntactic homogeneity of texts be mastered. We have developed a methodology and associated tools for text calibration, or "profiling", based on multivariate analysis of linguistic features, within the ELRA benchmark called "Contribution to the construction of contemporary French corpora". We have integrated these tools within a modular architecture based on a generic model that allows us, on the one hand, to annotate the corpus flexibly with the output of NLP and statistical tools and, on the other hand, to trace the results of these tools back through the annotation layers to the primary textual data. This allows us to justify our interpretations.

Details

Paper ID
lrec2000-main-193
Pages
N/A
BibKey
folch-etal-2000-typtex
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
N/A
Conference
Second International Conference on Language Resources and Evaluation
Location
Athens, Greece
Date
31 May 2000 - 2 June 2000

Authors

  • Helka Folch

  • Serge Heiden

  • Benoît Habert

  • Serge Fleury

  • Gabriel Illouz

  • Pierre Lafon

  • Julien Nioche

  • Sophie Prévost
