Exploiting Anchor Text as a Lexical Resource
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC 2004)
Abstract
Anchor texts, the strings associated with hyperlinks on a web page, are currently employed to express millions of referrals to sites and topics on the World Wide Web. We consider how these strings might be exploited as a lexical resource, particularly when viewed from the perspective of their target documents rather than their sources. We find that for many target pages, incoming anchors form a miniature corpus of reference expressions whose properties, in relation both to other target sites and to each other, can be put to use for mining lexical information.
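To make the target-side view concrete, the following is a minimal sketch, not the authors' implementation, of how incoming anchor strings might be grouped by the page they point to. It uses only the Python standard library. The names `AnchorCollector` and `anchors_by_target`, the lowercasing and whitespace normalisation, and the `example` URLs are all illustrative assumptions that do not come from the paper.

```python
from collections import Counter, defaultdict
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorCollector(HTMLParser):
    """Collect (target URL, anchor text) pairs from one HTML page (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pairs = []
        self._href = None   # target of the <a> element currently open, if any
        self._text = []     # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the source page's URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Assumed normalisation: collapse whitespace and lowercase.
            text = " ".join("".join(self._text).split()).lower()
            if text:
                self.pairs.append((self._href, text))
            self._href = None


def anchors_by_target(pages):
    """Map each target URL to a Counter of its incoming anchor strings.

    `pages` maps a source page URL to that page's HTML.
    """
    index = defaultdict(Counter)
    for base_url, html in pages.items():
        collector = AnchorCollector(base_url)
        collector.feed(html)
        collector.close()
        for target, text in collector.pairs:
            index[target][text] += 1
    return index


# Made-up source pages that all link to the same (made-up) target.
pages = {
    "http://a.example/": '<p>See <a href="http://target.example/">LREC</a>.</p>',
    "http://b.example/": '<a href="http://target.example/">language resources\n conference</a>',
    "http://target.example/news.html": '<a href="/">LREC</a>',
}
index = anchors_by_target(pages)
for text, count in index["http://target.example/"].most_common():
    print(count, text)
```

Each value in `index` plays the role of the per-target "miniature corpus": the set of distinct strings that different source pages use to refer to the same document, together with how often each one occurs.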