
Request Correction

Use this form to request corrections to the paper metadata. Select the fields that need correction and provide the correct information.

Correction Guidelines

  1. Click the edit button next to a field to report a correction.
  2. Fill in the suggested correction value for each field you want to correct.
  3. Provide your name and email so we can contact you if needed.

Paper Information

lrec2026-ws-rail-13

Open but Unvetted: The Ethics of African Language Data

Paper Fields


Title

Open but Unvetted: The Ethics of African Language Data

Abstract

Creative Commons (CC) licenses are prevalent in African natural language processing (NLP) corpus releases, but their compatibility implications are rarely examined systematically. CC-BY-SA and CC-BY-NC cannot be combined in a single published dataset; a NoDerivs (ND) clause prohibits redistribution of tokenised or annotated derivatives. This paper presents an empirical audit of license provenance across more than twenty corpus families used in African NLP, applying established compatibility rules to three case-study languages: Kituba/Munukutuba, Zarma, and Moore. Four failure modes are documented with primary-source evidence: outright prohibition (JW300, removed from OPUS after a legal audit confirmed a Terms of Service violation); composite license misrepresentation (WAXAL, whose CC-BY 4.0 claim is contradicted by its Hugging Face dataset card); an ND restriction not reflected in the CC-BY label (Tanzil); and data persistence failure (the Congolese Radio Corpus, where 402 of 405 source URLs are no longer accessible). A due-diligence checklist and a survey of legally compliant enrichment opportunities conclude the paper. We argue that lawful data use is an ethical baseline: for African language communities with limited institutional recourse, license violations are not only legal risks but ethical failures that compound existing power asymmetries.
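The two compatibility rules stated above (ShareAlike and NonCommercial terms conflict, and ND forbids redistributing derivatives) can be expressed as a small check. This is a hypothetical sketch, not the paper's audit procedure. The `combined_license` function and the license table are illustrative, and the sketch rests on several simplifying assumptions: every component uses the same CC 4.0 version, non-CC licenses are ignored, and CC-designated "BY-SA compatible" licenses are not modelled.

```python
"""Hypothetical sketch: can several CC-licensed corpora be merged into one
redistributed derivative dataset (e.g. tokenised or annotated)?
Assumes all components use CC 4.0; ignores non-CC and BY-SA-compatible licenses."""

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CCLicense:
    name: str
    nc: bool = False  # NonCommercial
    sa: bool = False  # ShareAlike
    nd: bool = False  # NoDerivatives


# Hypothetical lookup table covering the common CC 4.0 variants.
LICENSES = {
    "CC0": CCLicense("CC0"),
    "CC-BY": CCLicense("CC-BY"),
    "CC-BY-SA": CCLicense("CC-BY-SA", sa=True),
    "CC-BY-NC": CCLicense("CC-BY-NC", nc=True),
    "CC-BY-NC-SA": CCLicense("CC-BY-NC-SA", nc=True, sa=True),
    "CC-BY-ND": CCLicense("CC-BY-ND", nd=True),
    "CC-BY-NC-ND": CCLicense("CC-BY-NC-ND", nc=True, nd=True),
}


def combined_license(names: list[str]) -> Optional[str]:
    """Return the least restrictive license under which the merged derivative
    could be released, or None if the components cannot be combined."""
    parts = [LICENSES[n] for n in names]

    # NoDerivatives: tokenising or annotating creates a derivative,
    # which an ND component forbids redistributing.
    if any(p.nd for p in parts):
        return None

    needs_nc = any(p.nc for p in parts)
    share_alike = {p.name for p in parts if p.sa}

    if share_alike:
        # ShareAlike pins the output to that exact license, so two
        # different SA licenses cannot both be satisfied.
        if len(share_alike) > 1:
            return None
        (sa_name,) = share_alike
        # CC-BY-SA cannot carry the NC restriction another component requires.
        if needs_nc and not LICENSES[sa_name].nc:
            return None
        return sa_name

    if needs_nc:
        return "CC-BY-NC"
    # Attribution carries through; only an all-CC0 mix stays CC0.
    if all(p.name == "CC0" for p in parts):
        return "CC0"
    return "CC-BY"
```

Under these assumptions, `combined_license(["CC-BY-SA", "CC-BY-NC"])` returns `None`, and so does any mix that contains an ND component. One example is a corpus whose actual terms add a NoDerivs restriction that its CC-BY label omits. By contrast, `combined_license(["CC-BY", "CC-BY-SA"])` returns `"CC-BY-SA"`. The check only works if each component is recorded under its actual terms rather than its advertised label.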


Authors

Expand an author to correct their information. Use the remove button to request author removal, or add a new author.


PDF Attachment

You may attach a PDF as a corrected version of the paper. Max file size: 10MB. Only PDF files are accepted.


Your Information

Author Declaration *
