LREC 2012 (main conference)

Clause-based Discourse Segmentation of Arabic Texts

Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012)

DOI:10.63317/26588rymf6s8

Abstract

This paper describes a rule-based approach to segmenting Arabic texts into clauses. Our method relies on an extensive analysis of a large set of lexical cues as well as punctuation marks. Our analysis was carried out on two different corpus genres: news articles and elementary school textbooks. We propose a three-step segmentation algorithm: first using only punctuation marks, then relying only on lexical cues, and finally using both punctuation marks and lexical cues. The results were compared with manual segmentations produced by experts.
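
The three segmentation steps named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not list the cues or rules used, so the punctuation set, the three connectives (لأن "because", لكن "but", ثم "then"), the function names, and the sample sentence are all assumptions chosen for illustration. The sketch also ignores Arabic clitics, so a cue is matched only when it stands as a separate word.

```python
import re

# Hypothetical cue sets, for illustration only; the paper's actual lists are not given here.
PUNCTUATION = ".!?؟،؛:"
LEXICAL_CUES = ("لأن", "لكن", "ثم")

# Zero-width position right after a punctuation mark, plus any following whitespace.
PUNCT_PATTERN = re.compile(r"(?<=[" + re.escape(PUNCTUATION) + r"])\s*")
# Whitespace that is immediately followed by a cue standing as a separate word.
CUE_PATTERN = re.compile(
    r"\s+(?=(?:" + "|".join(map(re.escape, LEXICAL_CUES)) + r")\s)"
)


def _clean(parts):
    """Drop empty pieces and surrounding whitespace."""
    return [p.strip() for p in parts if p.strip()]


def split_on_punctuation(text):
    """Step 1: segment using punctuation marks only."""
    return _clean(PUNCT_PATTERN.split(text))


def split_on_cues(text):
    """Step 2: segment using lexical cues only (cut before each cue)."""
    return _clean(CUE_PATTERN.split(text))


def split_on_both(text):
    """Step 3: segment on punctuation first, then on cues inside each piece."""
    return [
        clause
        for chunk in split_on_punctuation(text)
        for clause in split_on_cues(chunk)
    ]


# "The boy went to school but the teacher was absent, so he returned home."
SAMPLE = "ذهب الولد إلى المدرسة لكن المعلم غاب، فرجع إلى البيت."

if __name__ == "__main__":
    for step in (split_on_punctuation, split_on_cues, split_on_both):
        print(step.__name__, len(step(SAMPLE)), step(SAMPLE))
```

On the sample sentence, the punctuation-only and cue-only steps each find one boundary, in different places, and the combined step finds both. That is the kind of difference a comparison against manual segmentations would measure.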

Details

Paper ID
lrec2012-main-559
Pages
pp. 2826-2832
BibKey
keskes-etal-2012-clause
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-7-7
Conference
Eighth International Conference on Language Resources and Evaluation
Location
Istanbul, Turkey
Date
21 May 2012 - 27 May 2012

Authors

  • Iskandar Keskes

  • Farah Benamara

  • Lamia Hadrich Belguith

Links