LREC 2020 workshop

An Annotated Corpus of Emerging Anglicisms in Spanish Newspaper Headlines

Proceedings of the 4th Workshop on Computational Approaches to Code Switching

DOI:10.63317/38mj45r92qnh

Abstract

The extraction of anglicisms (lexical borrowings from English) is relevant both for lexicographic purposes and for downstream NLP tasks. In this paper we present (1) a corpus of 21,570 newspaper headlines written in European Spanish and annotated with emergent anglicisms and (2) a conditional random field (CRF) baseline model with handcrafted features for anglicism extraction. We describe the headline corpus and the annotation tagset and guidelines, and introduce the CRF model as a baseline for the task of detecting anglicisms. This work is a first step towards the creation of an anglicism extractor for Spanish newswire.
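
As a rough illustration of what "handcrafted features" for a token-level CRF tagger can look like, the sketch below builds one feature dictionary per headline token and decodes BIO labels into spans. Everything in it is an assumption made for illustration: the feature set, the function names `token_features`, `headline_to_features` and `bio_to_spans`, the `B-ENG`/`I-ENG`/`O` labels, and the example headline are hypothetical and are not taken from the paper. The feature dictionaries follow the general dict-per-token shape that CRF toolkits commonly accept, but no CRF library is called here.

```python
from typing import Dict, List, Tuple


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Build an illustrative feature dict for the token at position i."""
    tok = tokens[i]
    feats: Dict[str, object] = {
        "bias": 1.0,
        "lower": tok.lower(),
        "suffix3": tok[-3:].lower(),
        "is_title": tok.istitle(),
        "is_upper": tok.isupper(),
        "has_digit": any(ch.isdigit() for ch in tok),
    }
    # Character trigrams may hint at spelling patterns unusual in Spanish.
    low = tok.lower()
    for k in range(len(low) - 2):
        feats["tri=" + low[k:k + 3]] = True
    # Context features: neighbouring tokens, or sentence-boundary flags.
    if i > 0:
        feats["prev_lower"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True
    if i < len(tokens) - 1:
        feats["next_lower"] = tokens[i + 1].lower()
    else:
        feats["EOS"] = True
    return feats


def headline_to_features(tokens: List[str]) -> List[Dict[str, object]]:
    """One feature dict per token: the usual input shape for a CRF tagger."""
    return [token_features(tokens, i) for i in range(len(tokens))]


def bio_to_spans(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Collect (text, type) spans from BIO labels.

    An I- label that does not continue an open span of the same type
    is treated like O and dropped.
    """
    spans: List[Tuple[str, str]] = []
    current: List[str] = []
    current_type = ""
    for tok, lab in zip(tokens, labels):
        if lab.startswith("B-"):
            if current:
                spans.append((" ".join(current), current_type))
            current, current_type = [tok], lab[2:]
        elif lab.startswith("I-") and current and lab[2:] == current_type:
            current.append(tok)
        else:
            if current:
                spans.append((" ".join(current), current_type))
            current, current_type = [], ""
    if current:
        spans.append((" ".join(current), current_type))
    return spans


# Made-up headline and labels, for illustration only.
tokens = ["Las", "fake", "news", "llegan", "al", "streaming"]
labels = ["O", "B-ENG", "I-ENG", "O", "O", "B-ENG"]
features = headline_to_features(tokens)
spans = bio_to_spans(tokens, labels)
```

In a full pipeline, the per-token feature lists and gold label sequences would be passed to a CRF trainer, and `bio_to_spans` would turn predicted labels back into extracted anglicism candidates.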

Details

Paper ID
lrec2020-ws-calcs-1
Pages
pp. 1-8
BibKey
alvarez-mellado-2020-annotated
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the 4th Workshop on Computational Approaches to Code Switching
Date
11 May 2020 to 16 May 2020

Authors

  • Elena Alvarez-Mellado
