Request Correction

Use this form to request corrections to the paper metadata. Select the fields that need correction and provide the correct information.

Correction Guidelines

Click the edit button next to a field to report a correction.
Fill in the suggested correction value for each field you want to correct.
Provide your name and email so we can contact you if needed.

Paper Information

lrec2026-main-070

Gretino: A Greek and Latin Dataset to Benchmark Retrieval Systems in Classical Languages

Paper Fields

Title

Gretino: A Greek and Latin Dataset to Benchmark Retrieval Systems in Classical Languages

Abstract

Semantic similarity search is a method for exploring large text corpora and retrieving conceptually related content. Although widely used in modern language applications, it remains underexplored in the context of classical literature, where it could provide scholars with tools to uncover meaningful connections across authors, genres, and languages, surpassing the limitations of rule-based or keyword search systems. To promote the adoption of semantic retrieval in classical languages, we introduce Gretino, the first benchmark dataset for evaluating semantic search systems in Latin, Ancient Greek, and cross-lingual settings. Gretino comprises 240 carefully designed queries, each paired with five semantically relevant passages in Latin and Greek. The dataset is divided into two subsets: Gretino Silver, consisting of 200 queries and 1,000 targets (evenly split between Latin and Greek), generated with the assistance of ChatGPT and subsequently reviewed; and Gretino Gold, a manually curated high-quality subset of 40 queries and 200 targets, fully based on authentic classical texts. We evaluate four pre-trained language models (GreBERTa, LaBERTa, PhilBERTa, and SPhilBERTa) and demonstrate the potential of a contrastive learning approach based on SimCSE (Gao et al., 2021) for fine-tuning, showing that training on carefully curated bilingual corpora, with texts aligned in the two languages, can improve retrieval performance.

Authors

Expand an author to correct their information. Use the remove button to request author removal, or add a new author.

PDF Attachment

You may attach a PDF as a corrected version of the paper. Max file size: 10MB. Only PDF files are accepted.

Your Information

Name

Comment

Author Declaration *

I declare that I have notified all co-authors of the proposed corrections and obtained their consent, and that all modifications adhere to research ethics standards and the LREC correction policy.
