A Corpus-Based Profiling of Regional English Variants in Global Media: Insights from Olympic Journalism

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/3kzf8hoic5ht

Abstract

This paper investigates the distinctive linguistic characteristics of regional English variants through a quantitative analysis of global media coverage. The study applies classification techniques that integrate GPT-based embeddings with Support Vector Machines to a novel corpus, the Olympic Journalism English Variants Corpus. Comprising news articles on the Olympic Games published by prominent news outlets in the United States, China, Spain, and Mexico between 2020 and 2023, this corpus enables a fine-grained analysis of 164 linguistic features across lexical, syntactic, readability, and sentiment dimensions. The findings reveal strong and interpretable distinctions in features such as verb ratio, nominality, and readability. The study not only demonstrates the enhanced classification capability of the model (optimized F1 score = 97.2) but also yields a deeper, data-driven stylistic analysis of each English variant. This work provides a potential template that can be extended to other World Englishes research.
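As a reading aid for the reported F1 score, the following is a minimal sketch of how a multi-class F1 can be computed over variant labels. The abstract does not say how the score is averaged across the four variants, so macro averaging is an assumption here, and the function name `macro_f1` and the label strings are hypothetical, not taken from the paper.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()

    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)

    return sum(scores) / len(scores)


# Hypothetical labels standing in for the four outlet countries.
gold = ["US", "US", "CN", "CN", "ES", "MX"]
pred = ["US", "CN", "CN", "CN", "ES", "MX"]
score = macro_f1(gold, pred)
```

A value such as 97.2 would correspond to this score expressed as a percentage; a weighted or micro average would generally give a different number on imbalanced data.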

Details

Paper ID
lrec2026-main-503
Pages
pp. 6340-6348
BibKey
mao-2026-corpus
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 to 16 May 2026

Authors

  • Felix Mao
