Request Correction
Use this form to request corrections to the paper metadata. Select the fields that need correction and provide the correct information.
Correction Guidelines
- Click the edit button next to a field to report a correction.
- Fill in the suggested correction value for each field you want to correct.
- Provide your name and email so we can contact you if needed.
Paper Information
Rebelòt: Datasets and Token-Level Language Identification for Lombard-Italian-English Code-Mixing
Paper Fields
Lombard is an endangered and under-resourced Gallo-Italic language variety that coexists with Standard Italian. As with other language varieties of Italy, code-switching and code-mixing are common between Lombard and Italian in everyday conversation, and with English online. This linguistic complexity, together with the lack of a unified written standard, poses challenges for Natural Language Processing tools. We introduce Rebelòt, a novel multi-domain, token-level annotated dataset for Lombard-Italian-English code-mixing. Furthermore, we develop and evaluate three variants of a token-level Language Identification (LID) tool based on a pre-trained encoder architecture, fine-tuned on both authentic data from our corpus and synthetically generated code-mixed text. Our evaluation shows that the best-performing model variant achieves a token-level accuracy of over 0.99 and substantially outperforms widely used off-the-shelf LID baselines at the sentence level.
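The abstract reports a token-level accuracy for the tagger but compares it with off-the-shelf tools at the sentence level. One way to connect the two can be sketched as follows. This is a minimal illustration and not the paper's evaluation protocol. It assumes the hypothetical tag codes "lmo", "ita", and "eng", which may not match the Rebelòt tag set. It also assumes that a sentence-level label is obtained by majority vote over token tags, which is a stand-in for whatever aggregation the authors actually use.

```python
from collections import Counter


def token_accuracy(gold, pred):
    """Fraction of tokens whose predicted language tag equals the gold tag.

    gold and pred are lists of sentences; each sentence is a list of
    per-token tags (hypothetical codes such as "lmo", "ita", "eng").
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred contain different numbers of sentences")
    correct = total = 0
    for g_sent, p_sent in zip(gold, pred):
        if len(g_sent) != len(p_sent):
            raise ValueError("a gold and a predicted sentence differ in length")
        correct += sum(g == p for g, p in zip(g_sent, p_sent))
        total += len(g_sent)
    return correct / total if total else 0.0


def sentence_label(tags):
    """Collapse token tags into one sentence label by majority vote.

    Assumption for illustration only: ties go to the tag seen first,
    because Counter.most_common orders equal counts by first occurrence.
    """
    if not tags:
        raise ValueError("cannot label an empty sentence")
    return Counter(tags).most_common(1)[0][0]


gold = [["lmo", "lmo", "ita"], ["ita", "eng"]]
pred = [["lmo", "ita", "ita"], ["ita", "eng"]]
print(token_accuracy(gold, pred))            # 4 of 5 tokens correct
print([sentence_label(s) for s in pred])     # one label per sentence
```

Majority voting discards the mixing inside a sentence, which is why a token-level tagger can still be scored against sentence-level baselines. A per-sentence distribution of tags would keep more of that information.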
Authors
Expand an author to correct their information. Use the remove button to request author removal, or add a new author.
PDF Attachment
You may attach a PDF as a corrected version of the paper. Max file size: 10MB. Only PDF files are accepted.
Your Information
Author Declaration *