M.S. in Data Science students are required to complete a capstone project. Capstone projects challenge students to acquire and analyze data to solve real-world problems. Project teams consist of two to four students and a faculty advisor. Teams select their capstone project at the beginning of the year and work on the project over the course of two semesters.
Most projects are sponsored by an organization (academic, commercial, non-profit, or government) seeking valuable recommendations to address strategic and operational issues. Depending on the needs of the sponsor, teams may develop web-based applications that can support ongoing decision-making. The capstone project concludes with a paper and presentation.
The learning objectives of the capstone project are:
- Synthesizing the concepts you have learned in courses throughout the program (this requires that the question posed by the project be complex enough to call for the analytical approaches learned in the program and that the available data be of sufficient size to qualify as ‘big’)
- Gaining experience with ‘raw’ data, which exposes you to the data pipeline process you are likely to encounter in the ‘real world’
- Demonstrating oral and written communication skills through a formal paper and presentation of project outcomes
- Acquiring team-building skills on a long-term, complex data science project
Capstone projects have been sponsored by a variety of organizations and industries, including: Capital One, City of Charlottesville, Deloitte Consulting LLP, Metropolitan Museum of Art, MITRE Corporation, a multinational banking firm, The Public Library of Science, S&P Global Market Intelligence, UVA Brain Institute, UVA Center for Diabetes Technology, UVA Health System, U.S. Army Research Laboratory, Virginia Department of Health, Virginia Department of Motor Vehicles, Virginia Office of the Governor, Wikipedia, and more.
View previous examples of capstone projects and check out answers to frequently asked questions.
What does the process look like?
- The School of Data Science periodically puts out a Call for Proposals. Prospective project sponsors complete applications and work with the SDS Associate Director for Research Development to finalize the materials and ensure that projects meet the requirements.
- Finalized lists of projects are then presented to students at a “pitch day” near the start of the term in which their capstone project work begins (fall term for residential MSDS students, term 4 for online MSDS students). During pitch day, project sponsors describe their capstone projects and students have the opportunity to ask questions.
- Students then individually rank their top project choices using a mechanism described to them by one of the SDS faculty members. An algorithm then sorts students into groups based on their choices and on the desired number of groups, which is determined by the total number of students enrolled that semester and a target group size of approximately 3 to 4 students.
- Project assignments are communicated by faculty. Each group is assigned a faculty mentor, who meets with approximately 4 groups each week in a seminar-style format.
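To give a feel for how ranking-based matching can work, here is a minimal Python sketch. It is not the algorithm the School of Data Science actually uses, which is not described here. The function names, the popularity scoring, and the greedy assignment order are all assumptions made for illustration. The sketch derives the number of groups from enrollment and a target group size, keeps the most popular projects, and places each student in their highest-ranked project that still has room.

```python
import math
from collections import defaultdict


def number_of_groups(n_students, target_size=4):
    """Smallest number of groups such that no group exceeds target_size."""
    return math.ceil(n_students / target_size)


def assign_students(rankings, target_size=4):
    """Greedy, illustrative matching (hypothetical, not the School's method).

    rankings maps each student to a list of projects, most preferred first.
    Assumes students collectively rank at least as many distinct projects
    as there are groups to fill.
    """
    n_groups = number_of_groups(len(rankings), target_size)

    # Popularity score: a project earns more points the higher it is ranked.
    scores = defaultdict(int)
    for choices in rankings.values():
        for position, project in enumerate(choices):
            scores[project] += len(choices) - position

    # Staff only the n_groups most popular projects (ties broken by name).
    staffed = sorted(scores, key=lambda p: (-scores[p], p))[:n_groups]
    groups = {project: [] for project in staffed}

    for student, choices in rankings.items():
        for project in choices:
            # Take the first ranked project that is staffed and not full.
            if project in groups and len(groups[project]) < target_size:
                groups[project].append(student)
                break
        else:
            # None of the student's choices had room: use the smallest group.
            smallest = min(groups, key=lambda p: len(groups[p]))
            groups[smallest].append(student)
    return groups


# Made-up example data: seven students ranking three projects.
example_rankings = {
    "ana": ["A", "B", "C"],
    "ben": ["A", "C", "B"],
    "cy": ["B", "A", "C"],
    "dee": ["A", "B", "C"],
    "eli": ["A", "B", "C"],
    "fay": ["A", "B", "C"],
    "gus": ["C", "B", "A"],
}
print(assign_students(example_rankings))
```

A greedy pass like this favors students who appear earlier in the input, so a real matching process would likely randomize the order or optimize overall satisfaction across all students.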
What is the seminar approach to mentoring capstones?
We use a seminar approach to managing capstones to provide faculty mentorship and streamlined logistics. In this approach, one mentor supervises three to four loosely related projects and meets with these groups on a regular basis. Project teams often encounter similar roadblocks and issues, so meeting together to share information and report on progress toward key milestones is highly beneficial.
Do all capstone projects have sponsors?
Not necessarily. Generally, each group in our capstone program works with a sponsor from outside the School of Data Science. Some sponsors are corporate, some are from nonprofit and governmental organizations, and some are professors in other departments at UVA. As with everything we do at the School of Data Science, the capstone program is constantly evolving as we look for ways to improve it.
Among the challenges we continue to encounter when curating capstone projects with external sponsors are scoping and defining a question of sufficient depth for our students, obtaining data of sufficient size, obtaining access to the data early enough for adequate analysis to be performed, and navigating a myriad of legal issues (including conflicts of interest). While we continue to strive to use sponsored projects and work to solve these issues, we also look for ways to leverage openly available data to solve interesting societal problems that allow students to apply the skills learned throughout the program. While not all capstones have sponsors, all capstones have clients. That is, the work is being done for someone who cares about and is invested in the outcome.
Why do we have to work in groups?
Because data science is a team sport!
All of our capstones, online and residential, are group projects. While group projects require more coordination and collaboration than individual projects, there are benefits to group work as well. These are big projects, and each would be too much to ask of one person. Also, most data science jobs involve a high degree of group work and, as a result, building this capability in our students is one of our core learning objectives for the capstone project.
I don’t like the topic area of the capstone project I was assigned through the algorithmic matching. What can I do?
First, remember that the point of the capstone projects isn’t the subject matter; it’s the data science. For example, if you couldn’t care less about political speeches, maybe you can appreciate the challenge of building a document store and running Latent Dirichlet Allocation on cloud computing. You might not care enough about the election to bother to vote, but you can still get a lot by learning how to generate causal inferences from the vote-by-mail natural experiment. You might hate social media, but you might need to learn how to wrangle a tough API like Twitter’s and how to run recurrent neural networks on the time series output. Professional data scientists often find themselves in positions in which they work on topics they find boring, but use methods they enjoy. That said, there are many ways to tackle a subject, and we are more than happy to work with you to find an approach to the work that most aligns with your interests.
Why don’t we have a say in the capstone topics?
You can influence which project you work on through the ranking process after “pitch day” and by encouraging your company or department to submit a proposal during the Call for Proposals process. At a minimum, it takes several months to work with a sponsor to adequately scope a project, confirm access to the data, and put the appropriate legal agreements into place. Before you ever see a project presented on pitch day, a lot of work has taken place to get it to that point!
Can I work on a project for my current employer?
Each spring, we put forward a public call for capstone projects. You are encouraged to share this call widely with your community, including your employer, non-profit organizations, or any entity that might have a big data problem that we can help solve. As a reminder, capstone projects are group projects so the project would require sufficient student interest after ‘pitch day’. In addition, you (the student) cannot serve as the project sponsor (someone else within your employer organization must serve in that capacity).
If my project doesn’t have a corporate sponsor, am I losing out on a career opportunity?
The completion of a capstone that produces good results presented in a paper and through code on GitHub will provide more career opportunities than the sponsor of the project. Although it does happen from time to time, it is rare that capstones lead to a direct job offer with the capstone sponsor's company. The purpose of the capstone is to provide you with the opportunity to do relevant and quality work (as described above in the learning objectives) which can be included on a resume and discussed during job interviews. We have an excellent career services team led by Reggie Leonard (email@example.com). Capstone projects are just one networking opportunity available to you in the program.
What is the SIEDS conference and is there an equivalent for the online program?
The SIEDS conference takes place annually in the spring. Traditionally, graduates of the residential MSDS program submit papers for publication and, if accepted, present their results at the conference. For the online program, we will make publication opportunities available for the final papers and also provide an opportunity to present results to an online audience. At a minimum, we anticipate online students presenting at School of Data Science venues.
Capstone Project Reflections From Alumni
“Capstone projects are opportunities for you to deliver valuable, quantifiable results that you can use as a testimony of your long-term project success to the company you work for and other companies in future interviews.” — Gabriel Rushin, MSDS 2017, Procter & Gamble, Senior Machine Learning Engineer Manager
“For my capstone project, I worked to develop a clustering model to assess biogeographic ancestry, using DNA profiles. I felt like I was finally doing real-world data science and loved working with such an important organization as the Department of Defense.” — Colleen Callahan, Online MSDS 2021, Associate Research Analyst, CNA (Arlington, Virginia)
Capstone Project Reflections From Sponsors
“For us, the level of expertise, and special expertise, of the capstone students gives us ‘extra legs’ and an extra push to move a project forward. The team was asked to provide a replicable prototype air quality sensor that connected to the Cville Things Network, a free and community supported IoT network in Charlottesville. Their final product was a fantastic example that included clear circuit diagrams for replication by citizen scientists.” — Lucas Ames, Founder, Smart Cville
“Working with students on an exploratory project allowed us to focus on the data part of the problem rather than the business part, while testing with little risk. If our hypothesis falls flat, we gain valuable information; if it is validated or exceeded, we gain valuable information and are a few steps closer to a new product offering than when we started.” — Ellen Loeshelle, Senior Director of Product Management, Clarabridge