Request Correction

Use this form to request corrections to the paper metadata. Select the fields that need correction and provide the correct information.

Correction Guidelines

Click the edit button next to a field to report a correction.
Fill in the suggested correction value for each field you want to correct.
Provide your name and email so we can contact you if needed.

Paper Information

lrec2026-ws-resourceful-09

Cross-Lingual Mathematical Reasoning in LLMs: Evaluating Performance on Icelandic vs. English Problems

Paper Fields

Title

Cross-Lingual Mathematical Reasoning in LLMs: Evaluating Performance on Icelandic vs. English Problems

Abstract

We investigate whether large language models (LLMs) exhibit performance differences when solving mathematical problems presented in a low-resource language (Icelandic) versus a high-resource language (English). Using 847 multiple-choice problems from the Icelandic Mathematics Competition corpus (STAK), we evaluate two state-of-the-art models (Gemini-3-Flash-Preview and GPT-5.4-mini) in both multiple-choice (MC) and open-ended (OE) formats, with correctness determined by a three-judge quorum (Gemini-3-Flash, GPT-5.4-mini, Claude Sonnet 4.6) achieving 97.6% unanimous agreement. Our results reveal significant cross-lingual performance gaps that vary by model: Gemini-3-Flash shows a consistent English advantage of 2.4–10.0 percentage points across both evaluation modes, while GPT-5.4-mini exhibits no significant language effects. Notably, GPT-5.4-mini demonstrates a substantial MC deficit, achieving only 42% in that format despite reaching 69–71% accuracy on OE problems. Analysis of answer patterns reveals a strong option position bias in GPT-5.4-mini, with systematic over-selection of option B and under-selection of option D. These findings suggest that language does affect LLM mathematical reasoning for some models, but the effect is model-dependent and interacts with evaluation format, with implications for deploying LLMs in educational contexts for speakers of low-resource languages.

Authors

Expand an author to correct their information. Use the remove button to request author removal, or add a new author.

PDF Attachment

You may attach a PDF as a corrected version of the paper. Max file size: 10MB. Only PDF files are accepted.

Your Information

Name

Comment

Author Declaration *

I declare that I have notified all co-authors of the proposed corrections and obtained their consent, and that all modifications adhere to research ethics standards and the LREC correction policy.