LREC 2006 (main conference)

A Hebrew Tree Bank Based on Cantillation Marks

Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC 2006)

DOI:10.63317/4om9aybeiiiq

Abstract

In the Masoretic text of the Hebrew Bible (HB), the cantillation marks function like a punctuation system that shows the division and subdivision of each verse, forming a tree structure similar to the prosodic tree in modern linguistics. However, this structure is hidden in a complicated set of diacritic symbols, and its rich information is accessible only to a few trained scholars. To make the structural information available to the general public and to automatic processing by computer, we built a tree bank in which the hierarchical structure of each HB verse is explicitly represented in XML format. We encoded the punctuation system in a context-free grammar (CFG), which a CYK parser then used to generate trees automatically for the whole HB. The results show that (1) the CFG correctly encoded the annotation rules and (2) the annotation done by the Masoretes is highly consistent.
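The abstract describes the pipeline only at a high level. The following Python sketch illustrates the general technique it names: a context-free grammar in Chomsky normal form, parsed bottom-up with the CYK algorithm, with the resulting tree serialized as XML. Everything specific here is an assumption for illustration: the grammar, its labels (SIL, ATN, ZAQ, TIP, CJ and the `_W` word-group labels) and the functions `cyk_parse` and `_build` are hypothetical. The grammar covers only five accents (silluq, atnah, zaqef, tipha, munah), ignores the many other Masoretic accents and their ordering constraints, and is not the grammar the authors built.

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL toy grammar in Chomsky normal form; not the authors' grammar.
# Each word is represented only by the accent it carries.
# Lexical rules: accent -> possible preterminal labels.
LEXICAL = {
    "silluq": ["SIL", "SIL_W"],  # verse-final disjunctive accent
    "atnah": ["ATN", "ATN_W"],   # main verse divider
    "zaqef": ["ZAQ"],
    "tipha": ["TIP"],
    "munah": ["CJ"],             # conjunctive: joins the following word
}

# Binary rules as (parent, left, right).
BINARY = [
    ("VERSE", "ATN", "SIL"),   # verse = atnah half + silluq half
    ("ATN", "ZAQ", "ATN"),     # zaqef / tipha segments subdivide a half
    ("ATN", "TIP", "ATN"),
    ("SIL", "ZAQ", "SIL"),
    ("SIL", "TIP", "SIL"),
    ("ATN", "CJ", "ATN_W"),    # conjunctive words attach only to the
    ("ATN_W", "CJ", "ATN_W"),  # immediately following disjunctive word
    ("SIL", "CJ", "SIL_W"),
    ("SIL_W", "CJ", "SIL_W"),
    ("ZAQ", "CJ", "ZAQ"),
    ("TIP", "CJ", "TIP"),
]

# Index the rules by their right-hand side for fast lookup.
RULES_BY_RHS = {}
for parent, left, right in BINARY:
    RULES_BY_RHS.setdefault((left, right), []).append(parent)


def cyk_parse(words, start="VERSE"):
    """Parse a list of (word, accent) pairs; return an XML tree or None."""
    n = len(words)
    # chart[i][j] maps each label derivable for words[i:j] to a backpointer.
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, (_, accent) in enumerate(words):
        for label in LEXICAL.get(accent, []):
            chart[i][i + 1][label] = None  # None marks a leaf
    # Fill longer spans from shorter ones (bottom-up).
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for left in chart[i][k]:
                    for right in chart[k][j]:
                        for parent in RULES_BY_RHS.get((left, right), []):
                            # Keep the first derivation found for each label.
                            chart[i][j].setdefault(parent, (k, left, right))
    if start not in chart[0][n]:
        return None
    return _build(chart, words, 0, n, start)


def _build(chart, words, i, j, label):
    """Follow backpointers to turn the chart into nested XML elements."""
    node = ET.Element(label)
    back = chart[i][j][label]
    if back is None:
        word, accent = words[i]
        node.set("accent", accent)
        node.text = word
    else:
        k, left, right = back
        node.append(_build(chart, words, i, k, left))
        node.append(_build(chart, words, k, j, right))
    return node


# A made-up seven-word verse; the words are placeholders, not real data.
verse = [("w1", "tipha"), ("w2", "munah"), ("w3", "atnah"),
         ("w4", "munah"), ("w5", "tipha"), ("w6", "munah"), ("w7", "silluq")]
print(ET.tostring(cyk_parse(verse), encoding="unicode"))
```

Running it prints one line of XML whose root `VERSE` splits at the atnah word into an `ATN` subtree and a `SIL` subtree. The toy rules are written so that a conjunctive attaches only to the next disjunctive word, which keeps this example unambiguous; for genuinely ambiguous input the sketch simply keeps the first derivation it finds, and a sequence the grammar cannot derive returns `None`.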

Details

Paper ID
lrec2006-main-002
Pages
N/A
BibKey
wu-lowery-2006-hebrew
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-2-4
Conference
Fifth International Conference on Language Resources and Evaluation
Location
Genoa, Italy
Date
24–26 May 2006

Authors

  • Andi Wu

  • Kirk Lowery
