Validating a Pipeline to Create a Comparable Corpus of Government-Issued Travel Advisories from the Internet Archives
Proceedings of the 19th Workshop on Building and Using Comparable Corpora (BUCC)
Abstract
Citizens use government-issued travel advisories to obtain information about destination countries for tourism and for other purposes, such as temporary work stays or permanent relocation. However, qualitative evidence suggests that travel advisories may be influenced by considerations beyond the current security situation. Systematic and rigorous quantitative analyses of advisories are scarce because relevant corpus data are not readily available and official government websites often pose practical obstacles. We validate a pipeline that generates a time-series cross-sectional dataset of government-issued travel advisories for three English-speaking issuing countries based on the Internet Archive’s Wayback Machine. Using official government data sources that are prohibited from being scraped and used for research, we illustrate that our approach provides (near-)complete coverage. The resulting corpus and code are intended to support downstream research on comparative risk communication, international relations, and text analysis using natural language processing methods.