

Paper Information

lrec2026-main-618

Historical Medical Knowledge Graphs and Ontologies from the Medical History of British India Corpus (1850-1950)


Abstract

This research presents a reproducible framework for constructing biomedical knowledge graphs and ontologies from digitized historical archives. Focusing on the Medical History of British India corpus (468 reports; ∼22.5M words; 1850–1950), our pipeline combines BioBERT-based entity recognition, LLM-guided relation extraction with LLM-based filtering, and clustering-based ontology induction. Reliability is strengthened through canonicalization, schema mapping to standardized biomedical relation types, and multi-metric edge scoring with temporal decay; a manual evaluation of 500 validated triples yields a precision of 0.892. The resulting resources comprise 282,882 extracted relations, consolidated into 22,360 unique surface forms and organized into 71 thematic clusters. Frequent categories include After Treatment (∼1,242 mentions), Date of Inoculation (∼540), and diverse causal relations. The induced ontology highlights six epidemic diseases (plague, cholera, malaria, kala azar, leprosy, and smallpox) together with their characteristic interventions (e.g., quinine therapy, vaccination campaigns, hospital disinfection). Temporal analyses capture historically plausible trajectories: plague interventions peaking in the 1890s, cholera’s long-run decline, and tuberculosis departments rising after 1910. All code, relation inventories, ontologies, and visualizations are released in a GitHub repository, enabling reproducibility and supporting research in historical NLP, biomedical informatics, and digital humanities.
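The abstract mentions multi-metric edge scoring with temporal decay but does not give the formula. The following is only an illustrative sketch of one way such a score could be computed, not the authors' method: the `Edge` fields, the two metrics (log-scaled frequency and extractor confidence), the weights, and the exponential half-life are all assumptions introduced here for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Edge:
    """A hypothetical knowledge-graph edge (field names are assumptions)."""
    head: str
    relation: str
    tail: str
    frequency: int     # how many times the triple was extracted
    confidence: float  # extractor confidence in [0, 1]
    year: int          # year of the source report


def edge_score(edge, reference_year, max_frequency,
               half_life_years=20.0, w_freq=0.5, w_conf=0.5):
    """Combine two metrics, then down-weight by distance in time.

    Assumed design: a weighted sum of log-scaled frequency and
    confidence, multiplied by an exponential decay that halves the
    score every `half_life_years` away from `reference_year`.
    """
    # Log-scale frequency so very common triples do not dominate.
    freq_term = math.log1p(edge.frequency) / math.log1p(max_frequency)
    base = w_freq * freq_term + w_conf * edge.confidence
    # Exponential temporal decay based on distance from the reference year.
    age = abs(reference_year - edge.year)
    decay = 0.5 ** (age / half_life_years)
    return base * decay


# Example with made-up values (not taken from the corpus).
example = Edge("quinine", "treats", "malaria",
               frequency=10, confidence=1.0, year=1900)
same_year = edge_score(example, reference_year=1900, max_frequency=10)
one_half_life = edge_score(example, reference_year=1920, max_frequency=10)
```

With these assumed defaults, an edge scored one half-life (20 years) away from the reference year receives half the score it would get in the reference year itself.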

