LREC 2004 main

Information Extraction from Hindi Texts

Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC 2004)

DOI:10.63317/4zo5feiim2hw

Abstract

The paper presents an information extraction system that takes Hindi text as input and uses an anaphor/pronoun resolution mechanism to improve the information content it retrieves. The system consists of three major modules: the language parser, the resolution system and the information extractor. The language parser is based on HPSG (Head-Driven Phrase Structure Grammar) and provides both syntactic and semantic information to the anaphor resolution system. HPSG was chosen because it imposes a set of constraints on co-referential structures in the language, which narrows the search for an antecedent to a more precise location in the discourse. The semantic information included in its parses may help remove ambiguity in anaphor/pronoun resolution. The anaphor resolution system uses a few heuristic rules to resolve intrasentential references, while centering theory is used for intersentential resolution.
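As a rough illustration of how the two resolution stages described in the abstract might fit together, here is a minimal Python sketch. It is not the authors' implementation. It assumes each sentence has already been parsed into mentions with hand-assigned features (grammatical role, number, clause index), roughly the kind of information an HPSG parser could supply. All names (`Mention`, `ROLE_RANK`, `resolve_intra`, `resolve_inter`, `resolve_discourse`) are hypothetical. Several further points are assumptions: the subject > object > other ranking of forward-looking centers (Cf), the simplified binding-style locality checks, and the "intrasentential first, then intersentential" fallback order. The sketch also omits centering transitions (Continue, Retain, Shift) entirely.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

# Hypothetical mention record; the fields stand in for features an
# HPSG-style parser could supply (an assumption, not the paper's format).
@dataclass
class Mention:
    text: str
    role: str                     # assumed labels: "subj", "obj", "other"
    number: str                   # "sg" or "pl"
    clause: int                   # index of the verb this mention is an argument of
    pronoun: bool = False
    reflexive: bool = False
    entity: Optional[str] = None  # discourse entity the mention refers to

    def __post_init__(self) -> None:
        # A full noun phrase names its own entity; pronouns start unresolved.
        if not self.pronoun and self.entity is None:
            self.entity = self.text

# Assumed Cf ranking by grammatical role: subject > object > other.
ROLE_RANK = {"subj": 0, "obj": 1, "other": 2}

def forward_centers(sentence: List[Mention]) -> List[Mention]:
    """Cf of a sentence: resolved mentions, highest-ranked first (stable for ties)."""
    resolved = [m for m in sentence if m.entity is not None]
    return sorted(resolved, key=lambda m: ROLE_RANK.get(m.role, len(ROLE_RANK)))

def resolve_intra(pron: Mention, preceding: List[Mention]) -> Optional[str]:
    """Heuristic intrasentential step (assumed rules): nearest number-agreeing
    candidate that satisfies binding-style locality, loosely after HPSG
    Principles A and B."""
    for cand in reversed(preceding):
        if cand.entity is None or cand.number != pron.number:
            continue
        same_clause = cand.clause == pron.clause
        # Reflexives need a clause-mate antecedent; ordinary pronouns must not have one.
        if pron.reflexive == same_clause:
            return cand.entity
    return None

def resolve_inter(pron: Mention, prev_cf: List[Mention], excluded: Set[str]) -> Optional[str]:
    """Simplified centering-style step: the highest-ranked agreeing Cf of the
    previous sentence, skipping entities that a clause-mate already refers to."""
    for cand in prev_cf:
        if cand.number == pron.number and cand.entity not in excluded:
            return cand.entity
    return None

def resolve_discourse(sentences: List[List[Mention]]) -> None:
    """Resolve pronouns in place, sentence by sentence (assumed fallback order)."""
    prev_cf: List[Mention] = []
    for sent in sentences:
        for i, m in enumerate(sent):
            if not m.pronoun:
                continue
            preceding = sent[:i]
            m.entity = resolve_intra(m, preceding)
            if m.entity is None and not m.reflexive:
                # Disjoint reference: a plain pronoun may not share a clause-mate's entity.
                excluded = {c.entity for c in preceding
                            if c.clause == m.clause and c.entity is not None}
                m.entity = resolve_inter(m, prev_cf, excluded)
        prev_cf = forward_centers(sent)

# Illustrative input with hand-assigned features:
# "Ram ne Shyam ko kitaab dii. Usne use padhaa." (Ram gave Shyam a book. He read it.)
s1 = [Mention("Ram", "subj", "sg", 0),
      Mention("Shyam", "other", "sg", 0),
      Mention("kitaab", "obj", "sg", 0)]
s2 = [Mention("usne", "subj", "sg", 0, pronoun=True),
      Mention("use", "obj", "sg", 0, pronoun=True)]
resolve_discourse([s1, s2])
print([(m.text, m.entity) for m in s2])  # [('usne', 'Ram'), ('use', 'kitaab')]
```

In this toy run, "usne" takes the preferred center of the previous sentence (Ram, the subject). "use" skips Ram because its clause-mate "usne" already refers to Ram, so it falls through to the next-ranked center, "kitaab". A realistic system would need richer agreement and animacy features, and the full set of centering constraints, to handle less tidy input.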

Details

Paper ID
lrec2004-main-008
Pages
N/A
BibKey
dutta-etal-2004-information
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-1-6
Conference
Fourth International Conference on Language Resources and Evaluation
Location
Lisbon, Portugal
Date
26-28 May 2004

Authors

  • Kamlesh Dutta
  • Saroj Kaushik
  • Nupur Prakash
