LREC-COLING 2024 (main)

LinguaMeta: Unified Metadata for Thousands of Languages

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

DOI:10.63317/3hwcoqkcp23x

Abstract

We introduce LinguaMeta, a unified resource for language metadata for thousands of languages, including language codes, names, number of speakers, writing systems, countries, official status, coordinates, and language varieties. The resources are drawn from various existing repositories and supplemented with our own research. Each data point is tagged for its origin, allowing us to easily trace back to and improve existing resources with more up-to-date and complete metadata. The resource is intended for use by researchers and organizations who aim to extend technology to thousands of languages.

Details

Paper ID
lrec2024-main-0921
Pages
pp. 10530-10538
BibKey
ritchie-etal-2024-linguameta
Editor
N/A
Publisher
European Language Resources Association (ELRA) and ICCL
ISSN
2522-2686
ISBN
979-10-95546-34-4
Conference
Joint International Conference on Computational Linguistics, Language Resources and Evaluation
Location
Turin, Italy
Date
20-25 May 2024

Authors

  • Sandy Ritchie
  • Daan van Esch
  • Uche Okonkwo
  • Shikhar Vashishth
  • Emily Drummond

Links