Entity Linking for Faroese Using Large Language Models with Web Search
The Fourth Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL 2026)
Abstract
Entity linking connects mentions in text to entries in a knowledge base. For low-resource languages, it has rarely been a research priority, because named entity recognition and knowledge base creation must be addressed first. We present the first study of entity linking for Faroese, a North Germanic language with approximately 70,000 speakers. Unlike traditional systems that rely on separate candidate retrieval and ranking components, we employ an end-to-end approach using GPT-5 with integrated web search. Our method prompts the model to identify named entities and link them directly to Wikipedia pages through a three-tier fallback strategy: Faroese Wikipedia first, then English Wikipedia, and finally any available Wikipedia edition. We evaluate our approach on 1,010 manually annotated examples from a Faroese NER dataset, analyzing entity mentions of four types: Person, Location, Organization, and Miscellaneous. Human evaluation shows that our system achieves 87.5% precision and 87.3% recall, with particularly strong performance on locations (93–95% precision, 92–95% recall). Persons are more challenging (86–88% precision, 72–83% recall). The majority of links (76.5%) point to Faroese Wikipedia, demonstrating the model’s ability to leverage language-specific knowledge bases. A Wikipedia API search baseline without any LLM achieves F1 = 0.57–0.60 on the same evaluation data, confirming that the LLM’s contextual reasoning provides substantial gains over simple search. We validate our approach across three models (GPT-5, Gemini 3 Flash, GPT-5.4 Mini), which achieve F1 scores of 0.74–0.87, confirming that the method generalizes across providers. This work establishes initial performance benchmarks for Faroese entity linking and demonstrates the viability of LLM-based approaches for low-resource languages.
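The preference order behind the three-tier fallback can be illustrated with a short sketch. In the approach described above, the fallback is expressed in the prompt given to the model with web search; the code below is not that implementation. It only makes the ordering explicit (Faroese, then English, then any remaining edition). The names `resolve_link`, `toy_lookup`, and `TOY_PAGES` are hypothetical, and the lookup table is invented toy data rather than anything taken from the evaluation set.

```python
from typing import Callable, Optional, Sequence, Tuple

# Preference order from the abstract: Faroese Wikipedia, then English Wikipedia.
PREFERRED_EDITIONS: Tuple[str, ...] = ("fo", "en")

# A lookup takes (mention, edition code) and returns a page URL or None.
Lookup = Callable[[str, str], Optional[str]]


def resolve_link(
    mention: str,
    lookup: Lookup,
    other_editions: Sequence[str] = (),
) -> Optional[Tuple[str, str]]:
    """Return (edition, url) for the first edition that has a page, else None."""
    for edition in (*PREFERRED_EDITIONS, *other_editions):
        url = lookup(mention, edition)
        if url is not None:
            return edition, url
    return None


# Toy stand-in for a real search step (assumption: illustrative entries only).
TOY_PAGES = {
    ("Tórshavn", "fo"): "Tórshavn",
    ("Tórshavn", "en"): "Tórshavn",
    ("NATO", "en"): "NATO",
    ("Nordisk Råd", "da"): "Nordisk_Råd",
}


def toy_lookup(mention: str, edition: str) -> Optional[str]:
    """Look the mention up in the toy table and build a Wikipedia-style URL."""
    title = TOY_PAGES.get((mention, edition))
    if title is None:
        return None
    return f"https://{edition}.wikipedia.org/wiki/{title}"


if __name__ == "__main__":
    for mention in ("Tórshavn", "NATO", "Nordisk Råd", "Unknown"):
        print(mention, "->", resolve_link(mention, toy_lookup, other_editions=("da",)))
```

In this sketch, a mention with pages in both the Faroese and English editions resolves to the Faroese one, and a mention with no page in any listed edition returns `None`, which corresponds to leaving the entity unlinked.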