Back to Main Conference 2024
LREC-COLING 2024main

Improving Text Readability through Segmentation into Rheses

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

DOI:10.63317/5ncsscqua2qm

Abstract

Enhancing text readability is crucial for readers with challenges like dyslexia. This paper delves into the segmentation of sentences into rheses, i.e. rhythmic and semantic units. Their aim is to clarify sentence structures for improved comprehension, through a harmonious balance between syntactic accuracy, the natural rhythm of reading aloud, and the delineation of meaningful units. This study relates and compares our various attempts to improve a pre-existing rhesis segmentation tool, which is based on the selection of candidate segmentations. We also release TeRheSe (Texts with Rhesis Segmentation), a bilingual dataset, segmented into rheses, comprising 12 books from classic literature in French and English. We evaluated our approaches on this dataset, showing the efficiency of a novel approach based on token classification, reaching a F1-score of 90.0% in English (previously 85.3%) and 91.3% in French (previously 88.0%). We also study the potential of leveraging prosodic elements, though its definitive impact remains inconclusive.

Details

Paper ID
lrec2024-main-0781
Pages
pp. 8925-8930
BibKey
jamelot-etal-2024-improving
Editor
N/A
Publisher
European Language Resources Association (ELRA) and ICCL
ISSN
2522-2686
ISBN
979-10-95546-34-4
Conference
Joint International Conference on Computational Linguistics, Language Resources and Evaluation
Location
Turin, Italy
Date
20 May 2024 25 May 2024

Authors

  • AJ

    Antoine Jamelot

  • SQ

    Solen Quiniou

  • SH

    Sophie Hamon

Links