Cross-Lingual Abstractive Keyphrase Generation for Historical Newspapers

Proceedings of Shaping Multilingual, Multimodal AI for the Social Sciences and Humanities (LLMs4SSH) @ LREC 2026

Abstract

We investigate large language models (LLMs) for cross-lingual abstractive keyphrase generation from historical newspapers. The task is to produce a small set of English keyphrases for articles written in German, French, and Luxembourgish, which requires combining translation, abstraction, and normalization. We conduct a human-centered pilot study that compares model outputs using human selections, LLM-as-judge assessments, and inter-annotator agreement analysis, followed by a medium-scale application to multilingual data from the Impresso corpus. The results show that LLM-generated keyphrases can support semantic enrichment and exploratory analysis of historical collections; they also highlight the subjective and methodologically challenging nature of keyphrase evaluation.