Gearing Up For the Long Haul, Students Select Capstone Projects

Chris Longchamp
November 16, 2022
Residential MSDS Capstones

The Capstone Project selection process is one of the most exciting times for M.S. in Data Science students. As part of the 11-month residential program, students are required to complete a capstone project that spans the academic year. Before a project is assigned, students go through three steps: 1) reviewing the pitchbook, 2) attending the capstone fair, and 3) completing a survey for the matching process.

First, the pitchbook. The capstone pitchbook is arguably one of the most important resources for students during the early stages of the capstone experience. The pitchbook allows students to get a sneak peek at the different companies and their proposed projects. Capstone projects can vary by research topic or expected deliverable and depend heavily on the capstone sponsor. The pitchbook is also the students’ first touchpoint with the projects, since it lets them begin thinking of questions to ask the sponsors when they present their projects at the capstone fair.

As mentioned, the capstone fair is when the sponsors present their projects to students, either in person on Grounds or over Zoom. The capstone fair usually lasts a couple of hours, with students rotating from table to table or between breakout rooms. This is when students can ask the capstone sponsors questions like, “What does the project timeline look like?” or “How is the data for the project collected and evaluated?” This forum ultimately gives students a better idea of who the capstone sponsors are and the scope of the projects.

After the capstone fair, students are asked to rank their project choices. Decision criteria are personal to each student and could depend on many things, such as which subject area interests them most or how the presenter responded to their questions. Students must think about the tradeoffs among their criteria before submitting their final preference rankings to the school. They are usually given a week to reflect on their experiences at the capstone fair before the rankings are due.

After students submit their preferences, the School runs an algorithm to assign each student to a project. The algorithm was created by data science professor Jon Kropko. Kropko’s algorithm works by minimizing a function that sums over all the students’ rankings, under the constraint that each project has either zero, three, or four students working on it. Students are asked to rank every project, since the full preference list is what the algorithm takes into consideration when making assignments. The algorithm splits the students into groups of three or four, depending on the project and how the numbers work out according to the preference lists.
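
To make the idea concrete, here is a minimal sketch of this kind of matching. It is not Professor Kropko’s actual algorithm, whose details are not given here. It assumes the cost of an assignment is simply the sum of the ranks students gave to the projects they receive, and it finds the best assignment by brute force, which only works for a handful of students. The function name assign_projects and the example rankings are made up for illustration.

```python
from collections import Counter
from itertools import product

# Per the description above, a project ends up with zero, three, or four students.
ALLOWED_SIZES = {0, 3, 4}


def assign_projects(rankings):
    """Brute-force search for the lowest-cost feasible assignment.

    rankings[s][p] is the rank student s gave project p (1 = first choice).
    Returns (total_cost, assignment), where assignment[s] is the project index
    given to student s, or None if no assignment satisfies the size rule.
    """
    n_projects = len(rankings[0])
    best = None
    # Try every way of giving each student one project (only viable for tiny inputs).
    for assignment in product(range(n_projects), repeat=len(rankings)):
        sizes = Counter(assignment)
        if any(sizes.get(p, 0) not in ALLOWED_SIZES for p in range(n_projects)):
            continue  # some project would have 1, 2, or 5+ students
        # Assumed cost: the sum of the ranks students gave their assigned projects.
        cost = sum(rankings[s][p] for s, p in enumerate(assignment))
        if best is None or cost < best[0]:
            best = (cost, assignment)
    return best


# Hypothetical data: seven students rank three projects.
# Five students put project 0 first, and two put project 1 first.
example_rankings = [[1, 2, 3]] * 5 + [[2, 1, 3]] * 2

result = assign_projects(example_rankings)
if result is not None:
    total_cost, assignment = result
    print("total cost:", total_cost)
    print("group sizes:", dict(Counter(assignment)))
```

In this toy example, not everyone can get their first choice because five students want a project that can hold at most four, so one of them is moved to their second choice. A real matching for a full cohort would need a proper optimization method rather than trying every combination.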

Finally, students are notified of their capstone project and meet their teammates. Each capstone group is also assigned a faculty advisor who will help the team set up client meetings, monitor progress of the project, and provide advice to the students as they navigate the project throughout the year.

For me, there were different factors that I had to weigh when choosing a capstone project, such as how interesting the topic was, what skills could be gained from working on the project, and how organized the sponsor was during their pitch presentation. Ultimately, my decision was to select a project that focused on identifying illegal fishing patterns throughout the world. I ended up getting my first choice, and I am working with three other students on the project. I’m looking forward to working on this particular capstone because both the data and the questions that the GA-CCRI, the capstone sponsor, is asking are challenging. Illegal fishing is hard to track, so it is hard to determine whether or not a given activity should be considered illegal fishing. This is the challenge my group will work on. The data is challenging because there is a lot of it, and it is very unstructured, to say the least.