Understanding Covid-19 Stay-At-Home Orders: A Machine Learning Approach

Carter Rhea
May 31, 2023


In early 2020, the COVID-19 pandemic rocked the world, sending many countries into lockdowns and curfews and instilling a general fear of the viral infection in every neighborhood. While the repercussions of this event are still being felt in everyday life, we have collectively started to move past it. We can now ask questions such as “Why did certain governments issue Stay-At-Home Orders (SAHOs) while others did not?” and “What political, scientific, economic, social, or external factors drove governments to implement SAHOs?” Recently, researchers have begun answering these questions with the help of machine learning algorithms.

*Disclaimer: I am an author of the paper discussed in this post.*

The leading question is simple: “What factors drove countries to issue SAHOs?”

To answer this question, we gathered a set of over 200 variables from several sources, spanning social, political, economic, and health-related factors.

Since Africa is an understudied political region, we restricted our research to the 54 countries on the African continent. Our approach was to allow the data to speak for itself instead of relying on underlying political theory to drive hypotheses. That being said, our team members are well-versed researchers in political psychology and political science.

Once the data was gathered, our goal was relatively straightforward: use the existing data on political, scientific, economic, social, and external factors to predict whether or not an African country issued a SAHO during the first six and a half months of the COVID-19 pandemic. Given my background in machine learning applications, this looked like a perfect use case for a decision-based model. With such a model, we can make a prediction and, more importantly, understand which factors mattered most in making it. In other words, once we trained the model to accurately predict whether or not a country would issue a SAHO, we could enumerate the most important features in the decision-making process.

The Data

Although our initial data collection yielded a total of 227 variables, we dropped nearly 50% of them because the values collected for African countries were unreliable. We additionally dropped any variables deemed duplicates. This left us with a total of 88 variables (including the dependent variable, the issuance of a SAHO). We divide the data into a training set and a test set, where the test set has approximately 25% of the data; additionally, we ensure that the test and training sets have an equal number of countries that did issue SAHOs (denoted as saho in figure 1) and those that did not issue SAHOs (denoted as open in figure 1).
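For readers who want to try a similar split, here is a minimal sketch using sklearn's train_test_split. It is not the code from the paper. The data is synthetic, the 87 predictor columns assume that the dependent variable is one of the 88 variables, and the 0/1 labels are hypothetical. Stratifying on the label is one common way to keep the two classes comparably represented in both sets; treating it as the balancing step described above is an assumption.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset (assumption): 54 countries, 87 predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(54, 87))
y = rng.integers(0, 2, size=54)  # hypothetical labels: 1 = "saho", 0 = "open"

# Hold out roughly 25% of the countries; stratify=y keeps the share of
# "saho" and "open" countries similar in the training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

print("train:", X_train.shape, "share of saho:", y_train.mean())
print("test: ", X_test.shape, "share of saho:", y_test.mean())
```

With only 54 rows, the test set holds about 14 countries, so any score computed on it will be noisy.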

Figure 1: This graphic shows the distribution of countries in the test and training sets.

The Algorithm

Because we have a large number of variables (88) but few observations (n=54), we opt for a decision-based machine learning algorithm. Specifically, we use the sklearn implementation of a random forest, since its bootstrapping technique should increase the reliability of our results on such a small dataset. As described in the paper, we split our feature selection algorithm into an initial variable selection and a final importance calculation to further reduce bias and overfitting.

For the initial variable selection, we apply a Monte-Carlo implementation of the random forest algorithm with 1,000 iterations using all 88 variables. At each iteration, we record the top 50 variables ranked by importance, as given by the feature_importances_ attribute in sklearn. This attribute is the mean decrease in the impurity measure (i.e., the entropy or Gini index) over the splits computed for a given variable; it is therefore a proxy for feature importance in the tree. After these 1,000 runs, we retain the top 30 variables from our initial list of 88 based on their mean feature importance score across all runs. For the final calculation, we again run the Monte-Carlo implementation, this time using only the top 30 variables from the initial selection. The feature importances from this second run determine the final values.
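The two-stage procedure can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code from the paper. The data is synthetic, the helper mean_importances is a hypothetical name, it runs 10 iterations instead of 1,000 to stay fast, and it averages the importances over all runs without the per-iteration top-50 bookkeeping.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def mean_importances(X, y, n_runs, seed=0):
    """Average feature_importances_ over n_runs forests with different seeds."""
    total = np.zeros(X.shape[1])
    for i in range(n_runs):
        forest = RandomForestClassifier(n_estimators=100, random_state=seed + i)
        forest.fit(X, y)
        total += forest.feature_importances_  # mean decrease in impurity
    return total / n_runs


# Synthetic stand-in (assumption): 54 countries, 87 predictors,
# where only column 0 actually determines the label.
rng = np.random.default_rng(0)
X = rng.normal(size=(54, 87))
y = (X[:, 0] > 0).astype(int)

n_runs = 10  # the post describes 1,000 iterations

# Stage 1: rank all predictors and keep the 30 with the highest mean importance.
stage1 = mean_importances(X, y, n_runs)
top30 = np.argsort(stage1)[::-1][:30]

# Stage 2: refit on the retained predictors only and rank them again.
stage2 = mean_importances(X[:, top30], y, n_runs, seed=1000)
ranking = top30[np.argsort(stage2)[::-1]]

print("ten most important columns:", ranking[:10])
```

Changing the seed on every run is what makes this a Monte-Carlo estimate: each forest sees different bootstrap samples and different random feature subsets, and averaging smooths out the run-to-run variation.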

This two-stage procedure is meant to give more reliable importance estimates than a single fit. With an F1-score of roughly 85%, the model predicts SAHOs well enough for its feature importances to be informative, and we take the top ten variables as the most important predictors of whether or not an African country issued a SAHO. A feature importance plot is shown in figure 2.
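For context, the F1-score is the harmonic mean of precision and recall. Here is a small sketch of how it can be computed with sklearn's f1_score. The labels and predictions are made up for illustration and are not results from the paper.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Made-up labels for eight held-out countries (1 = "saho", 0 = "open").
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]

precision = precision_score(y_true, y_pred)  # correct "saho" calls / all "saho" calls
recall = recall_score(y_true, y_pred)        # correct "saho" calls / all true "saho"
f1 = f1_score(y_true, y_pred)                # 2 * precision * recall / (precision + recall)

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```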

Figure 2: Feature importance vs. Feature name.

The most interesting result is that several different types of variables (e.g., economic, social, and political) are important! This implies that, as social scientists, we can’t ignore any of these factors when studying political decisions.

Although this post mainly focuses on the machine learning side of things, you can read all about the results in our published paper:

https://www.sciencedirect.com/science/article/pii/S221242092300078X


Written by Carter Rhea

PhD Student in Astrophysics at the University of Montreal working on machine learning in astronomy. Co-founder of cadena.ca
