My MSDS Capstone Project: Saving America’s Small Business

David Diaz
July 11, 2024

Building on the rigorous, foundational curriculum of the M.S. in Data Science program, capstone projects give students the space to apply that knowledge and practice the soft skills data scientists need today.

All the projects are vetted beforehand, with each project having its own scope and specific industry impact.

Early in the fall semester, all the capstone project topics were presented to us. Afterward, each of us ranked the projects, and those rankings were used to match us to a project that aligned with our personal or professional interests.

Given that I grew up watching my dad establish his own small business in California, I was thrilled to be matched with the project sponsored by the Small Business Administration: “Prediction on Federal Purchasing Behaviors on Small Business.”

A California small business

In support of President Biden’s mission to advance equity for underserved communities through federal contracting, this project aims to create an online tool that matches small businesses to government agencies. Specifically, it will give small businesses a platform for seeing which federal contracting opportunities best match their services, based on previous government contracting data, making federal contracting more accessible.

To accomplish this, my group and I took a deep dive into the world of federal contracting by performing individual research, exploring the dataset in full, and working with our sponsor to better understand contracting mechanisms and the project scope. We also reached out to local small businesses to garner interest in our project and help guide the development of the online tool.

To build the primary matching model, my group and I reviewed the model types we learned about in Professor Porter’s Statistical Learning class. We then used feature importance as a starting point to filter for the most informative data features before basing our refined model on random forests.
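As a rough illustration of that workflow (not the project's actual code), the sketch below ranks features by importance with an initial random forest, keeps the top few, and refits a forest on the reduced set. It assumes scikit-learn is available. The synthetic dataset, the cutoff of 8 features, and the forest settings are all placeholder assumptions, since the real contracting data and model settings are not described here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): the real contracting dataset is not shown.
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Step 1: fit an initial forest and rank features by impurity-based importance.
initial = RandomForestClassifier(n_estimators=200, random_state=0)
initial.fit(X_train, y_train)
ranked = np.argsort(initial.feature_importances_)[::-1]

# Step 2: keep the top-k features (k = 8 is an arbitrary illustrative cutoff).
top_k = 8
keep = ranked[:top_k]

# Step 3: refit a forest on the reduced feature set and score it on held-out data.
refined = RandomForestClassifier(n_estimators=200, random_state=0)
refined.fit(X_train[:, keep], y_train)
accuracy = refined.score(X_test[:, keep], y_test)

print(f"kept feature indices: {sorted(keep.tolist())}")
print(f"held-out accuracy: {accuracy:.3f}")
```

In practice the cutoff would be chosen by comparing held-out performance across several values of `top_k` rather than fixed in advance.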

David Diaz addresses the audience during his group's capstone project presentation. (Photo by Alyssa Brown)

All in all, the capstone project has allowed me to gain hands-on experience as a data scientist, exposing me to the project management and communications side of the job. It is an invaluable experience that has ultimately prepared me for a career in the field.

At the University of Virginia’s School of Data Science, you learn early on that data can be messy. There is beauty in that, however. It allows you to make a project your very own, capstone included.