Communication on social media platforms such as X (formerly Twitter) has evolved significantly, and data from these platforms is frequently used in scientific research. However, since X’s API changes in 2019, a tweet’s precise geolocation is no longer included in the data, which prevents a geographical assessment of tweets. This project aims to extract location mentions from the text of tweets and to map them to Ukraine’s administrative regions. We have developed a specialized geoparsing pipeline with prebuilt components for Ukrainian, Russian, and English. The main advantage of the pipeline’s architecture is that all components are interchangeable, which allows custom-developed solutions to be integrated. Initial tests on our hand-labeled Ukrainian dataset show promising results in accurately identifying and mapping location mentions despite challenges such as declension and the presence of multiple languages in a single tweet. Additional experiments on publicly available benchmark data further suggest that the pipeline can be transferred to other geographical regions. Both the geoparsing pipeline and its online documentation are publicly available.
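The idea of interchangeable components can be illustrated with a minimal sketch. This is not the published pipeline or its API: the names `GeoparsingPipeline`, `dictionary_recognizer`, `gazetteer_resolver`, and `TOY_GAZETTEER` are hypothetical, and the toy gazetteer stands in for whatever recognition and resolution components the actual pipeline uses. It only shows how a recognition step (text to location mentions) and a resolution step (mention to administrative region) could be swapped independently.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical component signatures: any callable matching these can be swapped in.
Recognizer = Callable[[str], list[str]]    # text -> location mentions
Resolver = Callable[[str], Optional[str]]  # mention -> region name, or None

# Toy gazetteer (illustrative only): surface form -> administrative region.
# Inflected forms are listed explicitly here; a real component would need
# proper handling of declension (e.g. lemmatization).
TOY_GAZETTEER = {
    "kyiv": "Kyiv City",
    "київ": "Kyiv City",
    "києві": "Kyiv City",
    "lviv": "Lviv Oblast",
    "львів": "Lviv Oblast",
}


def dictionary_recognizer(text: str) -> list[str]:
    """Return lowercased tokens that appear in the toy gazetteer."""
    tokens = [t.strip(".,!?:;\"'()#@").lower() for t in text.split()]
    return [t for t in tokens if t in TOY_GAZETTEER]


def gazetteer_resolver(mention: str) -> Optional[str]:
    """Look up the region for a mention; None if it is unknown."""
    return TOY_GAZETTEER.get(mention.lower())


@dataclass
class GeoparsingPipeline:
    """Chains a recognizer and a resolver; both are replaceable."""
    recognizer: Recognizer
    resolver: Resolver

    def run(self, text: str) -> list[tuple[str, str]]:
        results = []
        for mention in self.recognizer(text):
            region = self.resolver(mention)
            if region is not None:  # drop mentions that cannot be mapped
                results.append((mention, region))
        return results


pipeline = GeoparsingPipeline(dictionary_recognizer, gazetteer_resolver)
print(pipeline.run("Сьогодні у Києві, tomorrow in Lviv!"))
```

Because the pipeline depends only on the two callable signatures, replacing the dictionary lookup with, for example, a language-specific named-entity recognizer would not require changes to the resolution step.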
inproceedings
BibTeXKey: MMR+25