HIV has a devastating social, demographic, and economic effect on Africa. With 3.7% of its population infected, the Ivory Coast has the highest prevalence rate in West Africa and a generalized epidemic. The disease has spread beyond the high-risk groups and affects the entire population, which calls for the development of national HIV-prevention plans.
There is growing interest in mining mobile phone data for epidemiological purposes. Mobile phone communication drives the era of big data by generating huge amounts of Call Detail Records (CDRs). Cell phone service providers collect these records whenever a phone is used to send a text message or make a call. Each record contains the time of the action, the identifiers (IDs) of the sender and the receiver, and the cell towers used to communicate. In this way, mobile phones provide approximate spatio-temporal localization of users and create an immense resource for the analysis of human mobility and behavioral patterns.
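To make the record structure concrete, here is a minimal sketch of how a CDR could be represented and aggregated into tower-to-tower link counts. The field names, the sample records, and the 19:00 to 05:00 night window are illustrative assumptions, not the schema or the definition used in the study.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CallDetailRecord:
    """One call or text message (hypothetical field names)."""
    timestamp: datetime
    sender_id: str
    receiver_id: str
    sender_tower: int
    receiver_tower: int


def is_night(ts, start_hour=19, end_hour=5):
    """True if ts falls in the assumed night window (wraps past midnight)."""
    return ts.hour >= start_hour or ts.hour < end_hour


def tower_link_counts(records, night_only=False):
    """Count communications per undirected tower pair."""
    counts = Counter()
    for rec in records:
        if night_only and not is_night(rec.timestamp):
            continue
        # Sort the pair so that A->B and B->A count as the same link.
        link = tuple(sorted((rec.sender_tower, rec.receiver_tower)))
        counts[link] += 1
    return counts


# Made-up records, for illustration only.
records = [
    CallDetailRecord(datetime(2012, 1, 5, 14, 30), "u1", "u2", 101, 205),
    CallDetailRecord(datetime(2012, 1, 5, 23, 10), "u2", "u3", 205, 101),
    CallDetailRecord(datetime(2012, 1, 6, 2, 45), "u1", "u3", 101, 310),
]

overall_links = tower_link_counts(records)
night_links = tower_link_counts(records, night_only=True)
print(overall_links)
print(night_links)
```

Mapping each tower to its department and summing the counts would give department-level connectivity of the kind described in the following paragraphs; that mapping is omitted here because it depends on tower location data.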
To better understand the spatial epidemiology of HIV across the 50 departments of the Ivory Coast, BioSense researchers analyzed the collective communication and mobility connections extracted from mobile phone data and linked them to prevalence rates estimated from publicly available surveys.
HIV prevalence rate in (a) the 10 administrative regions and (b) the 50 departments. Estimated values range between 0.6% and 5.7%. The figures reveal the spatial variability of the distribution of HIV across the country and enable us to identify hot spots of the epidemic: departments severely hit by HIV.
Fusing mobile phone and epidemic data enabled us to identify key elements that correlate with the rate of infection and could serve as a proxy for epidemic monitoring. To build a predictive disease model, we relied on machine learning algorithms coupled with feature engineering and recursive feature elimination. Our findings indicate that night connectivity and activity, the spatial area covered by users, and overall migrations are strongly linked to HIV. We discovered that strong ties and hubs in the communication network align with HIV hot spots. The strong ties created by user mobility revealed pathways that connect regions with higher prevalence.
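Recursive feature elimination, mentioned above, repeatedly fits a model, drops the least important feature, and refits until a target number of features remains. The sketch below illustrates the idea with a small ridge-regularized linear model written in plain Python and run on synthetic data. The choice of model, the feature names, and the generated values are assumptions made for illustration; the text does not say which learning algorithm was used, and none of the numbers are results from the study.

```python
import random
from statistics import mean, pstdev


def standardize(column):
    """Rescale a column to zero mean and unit variance."""
    m, s = mean(column), pstdev(column)
    return [(v - m) / s for v in column]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_linear(columns, y, ridge=1e-6):
    """Least-squares coefficients via the (lightly regularized) normal equations."""
    k = len(columns)
    gram = [
        [sum(p * q for p, q in zip(columns[i], columns[j])) + (ridge if i == j else 0.0)
         for j in range(k)]
        for i in range(k)
    ]
    rhs = [sum(p * q for p, q in zip(columns[i], y)) for i in range(k)]
    return solve(gram, rhs)


def recursive_feature_elimination(features, target, n_keep):
    """Drop the feature with the smallest |coefficient| until n_keep remain."""
    cols = {name: standardize(vals) for name, vals in features.items()}
    y_mean = mean(target)
    y = [v - y_mean for v in target]
    remaining = list(cols)
    while len(remaining) > n_keep:
        coefs = fit_linear([cols[name] for name in remaining], y)
        weakest = min(range(len(remaining)), key=lambda i: abs(coefs[i]))
        remaining.pop(weakest)
    return remaining


# Synthetic data: two informative features and two pure-noise features.
random.seed(0)
n_units = 50  # e.g. one row per spatial unit; values are made up
features = {
    name: [random.gauss(0, 1) for _ in range(n_units)]
    for name in ("night_connectivity", "in_migration", "noise_a", "noise_b")
}
synthetic_target = [
    2.0 * a + 1.0 * b + random.gauss(0, 0.1)
    for a, b in zip(features["night_connectivity"], features["in_migration"])
]

selected = recursive_feature_elimination(features, synthetic_target, n_keep=2)
print(selected)
```

In practice a library implementation such as scikit-learn's `RFE` class, wrapped around whichever estimator suits the data, would likely be preferable to hand-written linear algebra; the plain-Python version is used here only to keep the sketch free of dependencies.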
Strong connectivity ties for (a) overall communication and (b) night communication. The graphs emphasize the strongest links, from which communication hubs emerge. The hubs correspond to HIV hot spots, and larger hubs tend to have higher prevalence rates. The visibly sparser and weaker social connectivity in the northern part of the country may have limited epidemic spread by making it harder for the disease to propagate.
The features identified from mobile phone data can be measured continuously and used to monitor changes in the HIV prevalence rate and to provide early warning signs of a possible increase in the infected population.
Feature contribution analysis for the night connectivity and in-migration features. The plots indicate that higher connectivity during the night and larger incoming migration flows are associated with higher HIV prevalence. Red arrows indicate warning signs.
This study is a first attempt to link mobile phone data and HIV epidemiology. It lays a foundation for further research into ways to explain the heterogeneity of HIV and to build predictive tools aimed at advancing public-health campaigns and decision making for HIV interventions. Together with other “big data” approaches to HIV epidemiology that rely on Twitter data and social networks, it fits well into the wider initiative of digital epidemiology.