The MultiplEYE Text Corpus: Towards a Diverse and Ever-Expanding Multilingual Text Corpus
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Abstract
We present the MultiplEYE Text Corpus, a large-scale, document-level, multi-parallel resource designed to advance cross-linguistic research on reading and language processing. The corpus provides paragraph-level alignment for texts in 39 languages spanning seven language families and seven scripts. Unlike many existing multilingual corpora, a substantial number of documents were originally written in languages other than English, reducing English-centric bias and supporting more typologically diverse investigations. The texts are carefully selected to balance linguistic richness with experimental feasibility, particularly for eye-tracking-while-reading studies. Developed within a multi-lab initiative, the MultiplEYE Text Corpus follows unified translation, alignment, and experimental design guidelines to ensure cross-linguistic comparability. Its inclusion of texts varying in type and difficulty enables research on discourse-level processing, genre effects, and individual differences across a wide range of languages. The text corpus and accompanying metadata provide a robust foundation for multilingual psycholinguistic and computational modeling research. Data and materials are publicly available at https://doi.org/10.23668/psycharchives.21750.
Details
Authors
- Ramunė Kasperė
- Anna Bondar
- Sergiu Nisioi
- Maja Stegenwallner-Schütz
- Hanne B. Søndergaard Knudsen
- Ana Matić
- Eva Pavlinušić Vilus
- Dorota Klimek-Jankowska
- Chiara Tschirner
- Not Battesta Soliva
- Deborah N. Jakobi
- Cui Ding
- Dima Abu Romi
- Cengiz Acarturk
- Matilda Agdler
- Anton Marius Alexandru
- Mohd Faizan Ansari
- Annalisa Arcidiacono
- Elizabete Ausma Velta Barisa
- Ana Bautista
- Lisa Beinborn
- Yevgeni Berzak
- Nedeljka Bjelanović
- Anna Isabelle Bothmann
- Jan Brasser
- Caterina Cacioli
- Anila Çepani
- Ilze Ceple
- Adelina Cerpja
- Dalí Chirino
- Jan Chromý
- Alessandro Corona Mendozza
- Iria de-Dios-Flores
- Nazik Dinçtopal Deniz
- Ana Došen
- Kristian Elersič
- Inmaculada Fajardo
- Zigmunds Freibergs
- Angelina Ganebnaya
- Shan Gao
- Jéssica Gomes
- Annjo Klungervik Greenall
- Alba Haveriku
- Miao He
- Anamaria Hodivoianu
- Yu-Yin Hsu
- Amanda Isaksen
- Andreia Janeiro
- Kristine Jensen de López
- Aleksandar Jevremovic
- Vojislav Jovanovic
- Hanna Kędzierska
- Nik Kharlamov
- Sara Kosutar
- Nelda Kote
- Vanja Kovic
- Izabela Krejtz
- Thyra Krosness
- Oleksandra Kuvshynova
- Eilam Lavy
- Ella Lion
- Marta Łockiewicz
- Kaidi Lõo
- Paula Luegi
- Mircea Mihai Marin
- Clara Martin
- Svitlana Matvieieva
- Diane C. Mézière
- Xavier Mínguez-López
- Valeriia Modina
- Jurgita Motiejūnienė
- Marie-Luise Müller
- Tolgonai Nasipbek kyzy
- Jamal Abdul Nasir
- Johanne S. K. Nedergård
- Ayşegül Özkan
- Patrizia Paggio
- Marijan Palmović
- Maria Christina Panagiotopoulou
- Alberto Parola
- Helena Pérez
- Klaudia Petersen
- Anja Podlesek
- Eva Pospíšilová
- Marta Praulina
- Mikuláš Preininger
- Loredana Pungă
- Diego Rossini
- Špela Rot
- Habib Sani Yahaya
- Irina A. Sekerina
- Anne Gabija Skadina
- Jordi Solé-Casals
- Lonneke van der Plas
- Saara M. Varjopuro
- Spyridoula Varlokosta
- João Veríssimo
- Oskari Juhapekka Virtanen
- Nemanja Vračar
- Mila Vulchanova
- Ahmad Mustapha Wali
- Peizheng Wu
- Nilgün Yücel
- Stefan Frank
- Nora Hollenstein
- Lena Jäger