
Design and Methodological Architecture of a Multilingual Corpus of Interpreter-mediated Public Service Telephone Interactions

Proceedings of Shaping Multilingual, Multimodal AI for the Social Sciences and Humanities (LLMs4SSH) @ LREC 2026

DOI:10.63317/4z95d2arkz5w

Abstract

Multimodality in Social Sciences and Humanities (SSH) research is often associated with the integration of text and visual data. However, interpreter-mediated telephone interaction presents a different configuration of complexity, where acoustic, temporal, discursive, and pragmatic dimensions converge. This paper presents the design and methodological architecture of PRAGMACOR (Corpus Pragmatics and Telephone Interpreting: Analysis of Face-Threatening Acts, Ref. PID2021-127196NA-I00), a multilingual corpus of interpreter-mediated public service telephone interactions (Chinese–Spanish, English–Spanish, French–Spanish, German–Spanish), as a case study in multimodal and plurilingual SSH infrastructure. The corpus integrates aligned audio recordings, orthographic transcriptions enriched with speech phenomena, temporal segmentation into speech acts, and multilayer pragmatic annotation of Face-Threatening Acts (FTAs), validated through a structured double-annotation and expert review process. Beyond textual data, the infrastructure captures prosodic overlap, turn-taking dynamics, and pragmatic mediation, enabling the study of cross-linguistic transfer and relational negotiation in asymmetrical institutional contexts. Datasets such as PRAGMACOR have proved essential for training LLMs for speech-to-speech translation (Sakai et al., 2024). Attention is given to the ethical and technical design of the corpus, including local automatic transcription, systematic removal of personally identifiable information, and irreversible voice anonymization through spectral and temporal signal transformation. These procedures ensure both research usability and compliance with responsible data governance principles.
By conceptualising interpreter-mediated interaction as an acoustic-discursive multimodal object and a plurilingual pragmatic process, this paper argues that PRAGMACOR provides a replicable model for the development of SSH-oriented infrastructures capable of supporting advanced research in multilingual communication, discourse analysis, and the future evaluation of language technologies.

Details

Paper ID
lrec2026-ws-llms4ssh-16
Pages
pp. 153-157
BibKey
lazarogutierrez-etal-2026-design
Editors
Arturo Montejo-Raez, Cristina Grisot, Joanna Blochowiak, Nikola Ljubešić, Elena Battaner, German Rigau
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of Shaping Multilingual, Multimodal AI for the Social Sciences and Humanities (LLMs4SSH) @ LREC 2026
Location
Palma, Mallorca, Spain
Date
11 - 16 May 2026

Authors

  • Raquel Lazaro Gutierrez
  • Daniel López Padilla
  • Jorge Rico
  • María José Vilella Sánchez
  • Fernando Manuel Espinoza-Cuadros

Links