Request Correction
Use this form to request corrections to the paper metadata. Select the fields that need correction and provide the correct information.
Correction Guidelines
- Click the edit button next to a field to report a correction.
- Fill in the suggested correction value for each field you want to correct.
- Provide your name and email so we can contact you if needed.
Paper Information
The Potential for Misleading Results in Text Sanitisation with Standard Evaluation Metrics
Paper Fields
Data privacy is an important facet of modern life, especially for data that carries potentially sensitive information, such as medical or legal documents. However, it is particularly difficult to ensure that private information has been removed or masked in unstructured data, e.g. free-flowing text. Evaluating systems that automatically detect and remove personally identifiable information (PII) from text is also challenging. Here we present a case study of a system that seemingly performed well; under closer scrutiny, however, its high performance was due to the shortcomings of standard binary classification metrics in the context of high target class prevalence. We then give a short analysis of different possible metrics in these high-prevalence scenarios, clearly showing the superiority of the Matthews Correlation Coefficient. This is particularly important because readily available data in this domain is rare, and systems are often compared using biographies from Wikipedia, which have a naturally high prevalence. This can be further aggravated by certain reasonable pre-processing or evaluation formalisms, as in the case study discussed here.
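The effect described in the abstract can be illustrated with a minimal sketch. The token counts below are hypothetical and are not taken from the paper: 100 tokens, of which 90 are PII (the target class), scored against a degenerate system that masks every token. The helper names (`confusion_counts`, `f1`, `mcc`) are illustrative only, and returning 0.0 when the MCC denominator is zero is a common convention assumed here, not something the abstract specifies.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, where 1 = PII token."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def f1(tp, tn, fp, fn):
    """F1 score; true negatives do not enter the formula at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient, using all four cells.

    Assumption: return 0.0 when the denominator is zero.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical high-prevalence data: 90 PII tokens, 10 non-PII tokens.
y_true = [1] * 90 + [0] * 10

# Degenerate system: mask every token regardless of content.
mask_everything = [1] * 100

counts = confusion_counts(y_true, mask_everything)
f1_all = f1(*counts)    # about 0.947, looks strong
mcc_all = mcc(*counts)  # 0.0, no better than chance

print(f"F1  = {f1_all:.3f}")
print(f"MCC = {mcc_all:.3f}")
```

In this sketch the mask-everything system reaches an F1 of about 0.947 because F1 ignores true negatives, while its MCC is 0.0, which reflects that it never distinguishes PII from non-PII.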