LREC 2022 (main conference)

From FreEM to D’AlemBERT: a Large Corpus and a Language Model for Early Modern French

Proceedings of the Thirteenth International Conference on Language Resources and Evaluation (LREC 2022)

DOI:10.63317/22guf3yfi2fk

Abstract

Language models for historical states of language are becoming increasingly important to allow the optimal digitisation and analysis of old textual sources. Because these historical states are at the same time more complex to process and more scarce in the corpora available, this paper presents recent efforts to overcome this difficult situation. These efforts include producing a corpus, creating the model, and evaluating it with an NLP task currently used by scholars in other ongoing projects.

Details

Paper ID
lrec2022-main-359
Pages
pp. 3367-3374
BibKey
gabay-etal-2022-freem
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
979-10-95546-38-2
Conference
Thirteenth Language Resources and Evaluation Conference
Location
Marseille, France
Date
20 June 2022 to 25 June 2022

Authors

  • Simon Gabay
  • Pedro Ortiz Suarez
  • Alexandre Bartz
  • Alix Chagué
  • Rachel Bawden
  • Philippe Gambette
  • Benoît Sagot

Links