Capstone Project Has Students Create an Algorithm to Predict the 2020 Presidential Election

November 19, 2020

For their capstone project, online MSDS students Matt Thomas, Ben Rogers, Spencer Marusco, and Chad Sopata built a model to forecast the 2020 presidential election. 

The four students will graduate from the UVA School of Data Science in December as part of the inaugural online graduating class.

“We built the model as part of the project to compare with other well-known election model forecasts,” Thomas said. “I think most of us picked this project because it allowed us to use probabilistic models, which we don't always get to do but is becoming more popular in the data science community. After the election, the plan is to evaluate our model along with others to see what went right or wrong and see what it tells us about how elections are decided.”

They began by researching models from previous elections.

“The first part of the project began researching existing models and past models, and their performance and how they're structured, so a wide variety of different approaches,” Sopata said.

The students explained that their model is a combination of two models: a polling-based model and a fundamentals model. The fundamentals model draws from information other than polls, such as demographic data. 

Rogers and Marusco focused on the polling model, while Thomas and Sopata focused on the fundamentals model. 

Thomas broke down how they specifically created the fundamentals model. 

“To make the fundamentals model, we collected many economic indicators and demographic data, such as per capita income, consumer sentiment, and state-by-state breakdowns of education level, race, and age,” he explained. “Using data from elections between 1992-2016, we made a lasso regression and random forest models to try to find the best predictors.”
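
To illustrate how a lasso regression can help "find the best predictors," here is a minimal sketch, not the students' actual code. It assumes two made-up, standardized indicators (one strong, one weak) and a hypothetical penalty `alpha`; the data values and function names are invented for illustration only. The lasso penalty shrinks the strong predictor's coefficient and sets the weak one to exactly zero, which is how it narrows a long list of candidate factors. The random forest step is not shown.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 if it is within the threshold."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(rows, targets, alpha, sweeps=100):
    """Minimize (1/(2n)) * ||y - Xb||^2 + alpha * ||b||_1 by coordinate descent.

    Assumes features and targets are already centered, so no intercept is fit.
    """
    n = len(rows)
    p = len(rows[0])
    coefs = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            rho = 0.0
            scale = 0.0
            for x, y in zip(rows, targets):
                # Prediction using every feature except feature j.
                partial = sum(coefs[k] * x[k] for k in range(p) if k != j)
                rho += x[j] * (y - partial)
                scale += x[j] ** 2
            coefs[j] = soft_threshold(rho / n, alpha) / (scale / n)
    return coefs


# Made-up data: column 0 is a strong hypothetical indicator, column 1 a weak one.
rows = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
targets = [2.0 * a + 0.1 * b for a, b in rows]

coefs = lasso_coordinate_descent(rows, targets, alpha=0.5)
print(coefs)  # the weak indicator's coefficient is driven to zero
```

In practice a team would more likely use an existing library implementation and choose the penalty by cross-validation; the hand-written loop here only makes the shrinkage step visible.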

Sopata added that it was important to study each state individually. 

“Some of the fundamentals model was economic data, so we might get something from the Bureau of Labor Statistics. It might be something like per capita income,” Sopata said. “One of the things that was very important for us was to get state level information more than national level information, because obviously we have an electoral college as we're all painfully aware of now and so you need accurate state level predictions.”

The students also gathered U.S. Census data and worked to narrow their model down to eight or nine of the most relevant factors.

In hindsight, with election results in, the combined model proved fairly accurate.

The students called every state correctly except Florida, North Carolina, and Georgia, all swing states that were difficult for forecasting models to predict.

Thomas, Rogers, Marusco, and Sopata began their capstone project in the summer and finished right before the election in November. 

“We researched models mostly over the summer,” Thomas said. “This semester we mostly spent building the dashboard.”

Their dashboard combined both models to predict a winner, with the polling model becoming the dominant predictor as the election got closer. 

“Our final predictions combine the two models, initially 50-50. As election day approached, the polling model got more and more weight, the idea being that the closer we are to election day, the better the polls become at predicting the winner,” Thomas explained.
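
The weighting Thomas describes can be sketched as a time-dependent blend. The article gives only the starting point (50-50) and the direction (more weight on polls over time), so everything else here is an assumption: the linear ramp, the 150-day horizon, the final weight of 1.0, and the function names are all hypothetical, not the team's actual schedule.

```python
def polling_weight(days_until_election, horizon_days=150,
                   start_weight=0.5, end_weight=1.0):
    """Weight given to the polling model (assumed linear ramp).

    Returns start_weight at horizon_days or more before the election and
    end_weight on election day. All defaults are illustrative assumptions.
    """
    # Fraction of the horizon still remaining, clamped to [0, 1].
    remaining = min(max(days_until_election / horizon_days, 0.0), 1.0)
    return end_weight - (end_weight - start_weight) * remaining


def blend(poll_share, fundamentals_share, days_until_election):
    """Combine the two models' predicted vote shares for one state."""
    w = polling_weight(days_until_election)
    return w * poll_share + (1.0 - w) * fundamentals_share


# Hypothetical vote-share predictions for a single state.
early = blend(0.52, 0.48, days_until_election=150)  # equal weights
late = blend(0.52, 0.48, days_until_election=0)     # polls only
```

With these made-up inputs, the early blend sits halfway between the two models, while the election-day blend simply equals the polling model's prediction.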

While these students do not know if they want to pursue election analytics in the future, they enjoyed getting to apply their data science skills to a historic election. 

To learn more, check out their website here