LREC 2026 workshop

Flipper: An Extended Document-Level Financial Dataset for Training and Evaluation with Annotated Discourse Phenomena

The 7th Financial Narrative Processing Workshop

DOI:10.63317/22f2djwbns96

Abstract

We present a new resource for Machine Translation (MT): a training and evaluation dataset of parallel sections drawn from authentic documents in the financial domain. We cover five language pairs: English-French, English-Spanish, English-German, English-Italian and French-Spanish. The dataset contains 122k parallel sections and 118M tokens (source and target combined). MT has improved greatly in recent years, but certain phenomena still cause errors, particularly when the relevant context spans more than a single sentence. The resulting errors include mistranslated pronouns, incorrect gender or number agreement, and inconsistent terminology, which can be especially problematic in high-stakes domains like finance. We therefore construct the dataset at document level (rather than with sentence-level alignment) and also produce fine-grained annotations of context-sensitive phenomena. The annotation was performed using preexisting tools and custom scripts. The annotated phenomena are formality, gender, terminology consistency, verb form and sentence reordering. These annotations aim to improve document-level evaluation of MT models by enabling evaluation solely on texts containing a particular phenomenon of interest. Our primary contribution is the creation and public release of Flipper, a multilingual document-level parallel dataset in the financial domain, designed to support both training and targeted evaluation of context-sensitive machine translation.

Details

Paper ID
lrec2026-ws-fnp-07
Pages
pp. 78-86
BibKey
nakhl-etal-2026-flipper
Editors
Mo El-Haj, Antonio Moreno Sandoval, Ana Garcia-Serrano, Chung-Chi Chen, Paul Rayson, Yanco Amor Torterolo Orta, Paloma Martinez, Jordi Porta
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
The 7th Financial Narrative Processing Workshop
Location
Palma, Mallorca, Spain
Date
11 - 16 May 2026

Authors

  • Mariam Nakhlé
  • Rachel Atherly
  • Gabriela Nicole Gonzalez Saez
  • Marco Dinarelli
  • Raheel Qader
  • Hervé Blanchon
