EpiGator: An Event-based Surveillance System for Infectious Disease Outbreaks
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Abstract
We present EpiGator, a novel event-based system for global surveillance of infectious disease outbreaks. It automatically processes streams of news articles and generates outbreak reports, a task that is crucial for medical authorities. The goal of our work is to combine our experience in outbreak surveillance with state-of-the-art large language models (LLMs), which allows us to reduce the overall cost of system development and maintenance. The EpiGator pipeline combines keyword filtering, relevance classification, event-based clustering, and multi-document summarization. A key novelty lies in using a fine-tuned LLM to identify articles relevant to ongoing outbreaks, followed by a zero-shot information extraction pipeline that normalizes the event features and clusters the related articles. For each cluster, we generate an outbreak summary using instruction-tuned LLMs. We evaluate EpiGator output against disease outbreak reports written by medical specialists.
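
The four stages named above can be illustrated with a minimal sketch. It is not the EpiGator implementation: all names (Article, KEYWORDS, DISEASES, COUNTRIES, run_pipeline, and the stage functions) are hypothetical, and the three LLM-based steps (relevance classification, zero-shot information extraction, summarization) are replaced here by simple string-matching placeholders so that the example runs on its own.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    title: str
    text: str


# Hypothetical vocabularies, chosen only for this illustration.
DISEASES = ("cholera", "measles", "ebola")
COUNTRIES = ("kenya", "peru")
KEYWORDS = ("outbreak",) + DISEASES


def _full_text(article):
    return (article.title + " " + article.text).lower()


def keyword_filter(articles):
    """Stage 1: keep articles that mention at least one keyword."""
    return [a for a in articles if any(k in _full_text(a) for k in KEYWORDS)]


def is_relevant(article):
    """Stage 2 placeholder: stands in for a fine-tuned LLM classifier."""
    return "outbreak" in _full_text(article)


def extract_event(article):
    """Stage 3a placeholder: stands in for zero-shot information extraction.

    Returns a normalized (disease, country) key, with "unknown" for
    features that cannot be found.
    """
    text = _full_text(article)
    disease = next((d for d in DISEASES if d in text), "unknown")
    country = next((c for c in COUNTRIES if c in text), "unknown")
    return disease, country


def cluster_by_event(articles):
    """Stage 3b: group articles that share the same normalized event key."""
    clusters = defaultdict(list)
    for article in articles:
        clusters[extract_event(article)].append(article)
    return dict(clusters)


def summarize(event, articles):
    """Stage 4 placeholder: stands in for an instruction-tuned LLM summarizer."""
    disease, country = event
    return f"{disease} in {country}: {len(articles)} article(s)"


def run_pipeline(articles):
    """Chain the four stages and return one summary per event cluster."""
    candidates = keyword_filter(articles)
    relevant = [a for a in candidates if is_relevant(a)]
    clusters = cluster_by_event(relevant)
    return {event: summarize(event, group) for event, group in clusters.items()}
```

In this sketch the cheap keyword filter runs before the relevance check, so the more expensive model-based step would see only candidate articles. Clustering on a normalized event key is one simple way to group related articles and is an assumption of this example, not a description of the clustering used in the system.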