Causal Connections: Leveraging Multilingual Fine-Tuning for Financial QA@FinCausal 2026

The 7th Financial Narrative Processing Workshop

Abstract

This paper describes team HSA_CORAL’s submission to the FinCausal 2026 shared task on extracting cause–effect relations from financial narratives via extractive question answering in English and Spanish. We compare three modeling families: (i) encoder-only token tagging with multilingual BERT, (ii) encoder–decoder generation with multilingual BART, and (iii) decoder-only LLMs (Llama 3.1 and GPT variants) using prompt refinement, few-shot demonstrations, and supervised fine-tuning. Across settings, prompt refinement and few-shot demonstrations yield competitive performance, but supervised fine-tuning is the main driver of improvement. Our best system, GPT-4.1 Mini fine-tuned on the combined English and Spanish training data, ties for the highest score on English (4.8140) and ranks third on Spanish (4.7753) under the shared task’s LLM-as-a-judge metric. Overall, the results highlight the value of task-specific adaptation and multilingual fine-tuning for cross-lingual transfer in financial causality QA.