Back to Main Conference 2008
LREC 2008main

UnsuParse: unsupervised Parsing with unsupervised Part of Speech Tagging

Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008)

DOI:10.63317/5nmad429s97i

Abstract

Based on simple methods such as observing word and part of speech tag co-occurrence and clustering, we generate syntactic parses of sentences in an entirely unsupervised and self-inducing manner. The parser learns the structure of the language in question based on measuring “breaking points” within sentences. The learning process is divided into two phases, learning and application of learned knowledge. The basic learning works in an iterative manner which results in a hierarchical constituent representation of the sentence. Part-of-Speech tags are used to circumvent the data sparseness problem for rare words. The algorithm is applied on untagged data, on manually assigned tags and on tags produced by an unsupervised part of speech tagger. The results are unsurpassed by any self-induced parser and challenge the quality of trained parsers with respect to finding certain structures such as noun phrases.

Details

Paper ID
lrec2008-main-455
Pages
N/A
BibKey
hanig-etal-2008-unsuparse
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-4-0
Conference
Sixth International Conference on Language Resources and Evaluation
Location
Marrakech, Morocco
Date
28 May 2008 30 May 2008

Authors

  • CH

    Christian Hänig

  • SB

    Stefan Bordag

  • UQ

    Uwe Quasthoff

Links