Candidate Ranking for Maintenance of an Online Dictionary

Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

DOI:10.63317/2nvn3ihrgqho

Abstract

Traditionally, the process whereby a lexicographer identifies a lexical item to add to a dictionary (a database of lexical items) has been time-consuming and subjective. In the modern age of online dictionaries, all queries for lexical entries not currently in the database are indistinguishable from a larger list of misspellings, meaning that potential new or trending entries can easily get lost. In this project, we develop a system that uses machine learning techniques to assign these "misspells" a probability of being a novel or missing entry, incorporating signals from orthography, usage by trusted online sources, and dictionary query patterns.

Details

Paper ID
lrec2018-main-134
Pages
N/A
BibKey
broad-etal-2018-candidate
Editors
Nicoletta Calzolari, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Koiti Hasida, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis, Takenobu Tokunaga
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
979-10-95546-00-9
Conference
Eleventh International Conference on Language Resources and Evaluation
Location
Miyazaki, Japan
Date
7-12 May 2018

Authors

  • Claire Broad

  • Helen Langone

  • David Guy Brizan
