ParsCit: an Open-source CRF Reference String Parsing Package
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008)
Abstract
We describe ParsCit, a freely available, open-source implementation of a reference string parsing package. At the core of ParsCit is a trained conditional random field (CRF) model used to label the token sequences in the reference string. A heuristic model wraps this core with added functionality to identify reference strings in a plain text file and to retrieve the citation contexts. The package comes with utilities to run it as a web service or as a standalone utility. We evaluate ParsCit on three distinct reference string datasets and show that it compares well with other previously published work.
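To make the idea of labeling token sequences concrete, the following is a minimal sketch in Python. It is not ParsCit's code or API: the function names, the feature set, the whitespace tokenization, and the label names are all assumptions chosen for illustration, and the labels are assigned by hand where a trained CRF would predict them.

```python
import re
from itertools import groupby


def tokenize(reference):
    """Split a reference string on whitespace (a simplifying assumption)."""
    return reference.split()


def token_features(tokens, i):
    """Hypothetical per-token features of the kind a CRF labeler might use."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_capitalized": tok[:1].isupper(),
        "has_digit": any(c.isdigit() for c in tok),
        "looks_like_year": bool(re.fullmatch(r"\(?(19|20)\d{2}\)?[.,]?", tok)),
        "ends_with_period": tok.endswith("."),
        "relative_position": round(i / max(len(tokens) - 1, 1), 2),
    }


def group_fields(tokens, labels):
    """Merge consecutive tokens that share a label into (label, text) fields."""
    pairs = zip(labels, tokens)
    return [
        (label, " ".join(tok for _, tok in run))
        for label, run in groupby(pairs, key=lambda pair: pair[0])
    ]


# A made-up reference string, used only for illustration.
reference = "A. Author. 2008. A sample title. In Proceedings of LREC."
tokens = tokenize(reference)

# One feature dictionary per token: the input a sequence labeler would see.
features = [token_features(tokens, i) for i in range(len(tokens))]

# Hand-assigned labels standing in for the output of a trained CRF.
labels = ["author", "author", "date", "title", "title", "title",
          "booktitle", "booktitle", "booktitle", "booktitle"]

for label, text in group_fields(tokens, labels):
    print(f"{label}: {text}")
```

In this sketch the sequence model is the only missing piece: it would map the per-token feature dictionaries to one label per token, and the grouping step then turns runs of identical labels into fields.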