EHRI Annotator: A Web-Based Tool for Named Entity Recognition and Linking in Holocaust-Related Texts
Proceedings of The Second Workshop on Holocaust Testimonies as Language Resources (HTRes)
Abstract
This paper presents the EHRI Annotator, a web-based tool for multilingual named entity recognition (NER) and entity linking (EL) in Holocaust-related texts. The tool was developed to support services provided by the European Holocaust Research Infrastructure (EHRI), primarily its digital scholarly editions (EHRI Online Editions). It does this by streamlining two steps: detecting named entities in documents, and linking them to their unique identifiers in EHRI and third-party controlled vocabularies and gazetteers. The EHRI Annotator builds on previous work on domain-specific NER and extends it to support multilingual EL. The tool adopts a dual entity linking architecture that applies a different matching approach depending on the entity type. Entities to be linked to EHRI vocabularies and authority sets, which are modestly sized, are handled by semantic matching. Locations to be linked to the extensive GeoNames gazetteer are handled by string-matching-based retrieval with domain-specific relevance weighting. A preliminary evaluation of the linking component on 264 entities from a manually annotated dataset of Holocaust testimonies yields an Accuracy@5 of 77.7%. User testing confirms the tool’s usability but also highlights areas for improvement.
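To make the dual linking architecture and the Accuracy@5 figure more concrete, the following is a minimal sketch under stated assumptions. It is not the EHRI Annotator's implementation. The entity type label "LOC", the use of `difflib.SequenceMatcher` as the string similarity, the multiplicative relevance weight, and the toy gazetteer IDs are all hypothetical. The semantic matching branch is only routed to and is not implemented, because it would require an embedding model. Accuracy@5 is computed in its usual sense: the share of entities whose correct identifier appears among the top five candidates.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Sequence

# Hypothetical routing rule: locations go to a gazetteer (GeoNames in the paper)
# via string matching; every other entity type goes to semantic matching.
STRING_MATCHED_TYPES = {"LOC"}


def choose_linker(entity_type: str) -> str:
    """Return the name of the (hypothetical) linking strategy for an entity type."""
    return "string" if entity_type in STRING_MATCHED_TYPES else "semantic"


def rank_locations(mention: str, gazetteer: Dict[str, str],
                   relevance: Dict[str, float], k: int = 5) -> List[str]:
    """Rank gazetteer IDs by string similarity multiplied by a relevance weight.

    `gazetteer` maps a candidate ID to its name; `relevance` maps a candidate ID
    to a domain-specific weight (missing IDs default to 1.0). Both the similarity
    measure and the multiplicative weighting are illustrative assumptions.
    """
    def score(item_id: str) -> float:
        similarity = SequenceMatcher(
            None, mention.lower(), gazetteer[item_id].lower()).ratio()
        return similarity * relevance.get(item_id, 1.0)

    return sorted(gazetteer, key=score, reverse=True)[:k]


def accuracy_at_k(gold: Sequence[str], ranked: Sequence[Sequence[str]],
                  k: int = 5) -> float:
    """Fraction of entities whose gold ID is among their top-k ranked candidates."""
    if len(gold) != len(ranked):
        raise ValueError("gold and ranked must have the same length")
    if not gold:
        return 0.0
    hits = sum(1 for g, candidates in zip(gold, ranked) if g in list(candidates)[:k])
    return hits / len(gold)


if __name__ == "__main__":
    # Toy gazetteer: two identically named places plus a similar-looking one.
    gazetteer = {"X1": "Minsk", "X2": "Minsk", "X3": "Pinsk"}
    relevance = {"X1": 1.5}  # hypothetical boost for the domain-relevant place
    print(choose_linker("LOC"), rank_locations("Minsk", gazetteer, relevance))
    print(accuracy_at_k(["X1", "Y"], [["X1", "X2"], ["Z"]]))
```

In this toy setup the relevance weight breaks the tie between the two identically named candidates. This is the kind of disambiguation that a domain-specific weighting over a very large gazetteer is meant to support.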