Candidate Ranking for Maintenance of an Online Dictionary
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
Abstract
Traditionally, the process by which a lexicographer identifies a lexical item to add to a dictionary -- a database of lexical items -- has been time-consuming and subjective. In the modern age of online dictionaries, queries for lexical entries not currently in the database are indistinguishable from the much larger set of misspelled queries, so potential new or trending entries are easily lost. In this project, we develop a system that uses machine learning techniques to assign each of these "misspells" a probability of being a novel or missing entry, incorporating signals from orthography, usage by trusted online sources, and dictionary query patterns.
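
To make the idea of ranking candidates by probability concrete, the following is a minimal sketch and not the system described in this paper. It assumes a logistic model over three hypothetical features, one per signal family named above. The feature names, the weights, and the example counts are all illustrative assumptions; a real system would learn the weights from labeled data.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """A failed dictionary query with hypothetical, precomputed signals."""
    term: str
    query_count: int          # assumed: how often users looked up the term
    trusted_source_hits: int  # assumed: occurrences in trusted online sources
    min_edit_distance: int    # assumed: distance to the closest existing entry


# Hand-picked weights for illustration only; not taken from the paper.
WEIGHTS = {
    "bias": -4.0,
    "log_queries": 0.3,
    "log_trusted": 1.0,
    "edit_distance": 0.6,
}


def features(c: Candidate) -> dict:
    """Map raw counts to model features (log-scaled to dampen large counts)."""
    return {
        "bias": 1.0,
        "log_queries": math.log1p(c.query_count),
        "log_trusted": math.log1p(c.trusted_source_hits),
        "edit_distance": float(c.min_edit_distance),
    }


def novelty_probability(c: Candidate) -> float:
    """Logistic score: estimated probability that the term is a missing entry."""
    z = sum(WEIGHTS[name] * value for name, value in features(c).items())
    return 1.0 / (1.0 + math.exp(-z))


def rank(candidates: list) -> list:
    """Order candidates from most to least likely to be a genuine new entry."""
    return sorted(candidates, key=novelty_probability, reverse=True)


# Made-up example inputs: one plausible new word, one plausible misspelling.
examples = [
    Candidate("recieve", query_count=900, trusted_source_hits=0, min_edit_distance=2),
    Candidate("hangry", query_count=500, trusted_source_hits=40, min_edit_distance=1),
]
ranked = rank(examples)
```

In this sketch a frequently queried term that never appears in trusted sources scores lower than a less frequently queried term with trusted-source usage, which is the kind of separation the ranking is meant to surface for a lexicographer's review.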