More than 600 undergraduate students are enrolled in the Data Science Minor at the University of Virginia and represent more than 50 majors, a display of the field’s growing popularity and applicability in a range of subjects. As part of the “Data Science Project” course taught by Brian Wright, assistant professor of data science and undergraduate program director, students formed teams to solve a data problem.
Three fourth-year students, Stephanie Fissel, Beza Gashe, and Claire Yoon, formed a team that explored algorithmic bias in facial recognition technology. Although they are majoring in economics, human biology, and computer science, respectively, all three were drawn to data science for its real-world applications.
Professor Wright, who directs undergraduate programs at the School of Data Science, is not surprised at how popular the minor is across Grounds. “We designed the minor in data science to be available to any student, of any major or any background, who is interested in coming and learning about data science.”
His students reflect the minor’s success across Grounds. “I first realized my passion for data science after taking a Forge class called ‘Node’ which introduced me to coding, data analysis, and data visualization,” said Fissel, an economics major. “When I discovered the UVA data science minor, I wanted to dive deeper into the subject because it pairs well with my major.”
Gashe agreed, though her decision to pursue data science came from closer to home. “I remember talking to my father about the advancements in technology and the growth of the digital world. He explained that everything is becoming digital, so it is critical to learn programming, data analysis, statistics, and data ethics.” A human biology major, she credits data science with expanding her knowledge beyond subjects related to the medical field.
When asked what intersections they see between data science and their majors, the students gave varied answers.
Yoon reflected, “There are many intersections between computer science and data science. Both fields deal with data and require data interpretation skills, writing codes, and understanding systems in general. Learning computer science allows me to better understand data science and vice versa.”
For Gashe, data science can play a crucial role in advancing the understanding of human biology and its societal implications. “What I love about my major is that it not only covers biology but the social factors that influence the health and well-being of communities and individuals.” She’s learning that data science techniques can be applied to biological data to provide new interpretations and information.
“For example, by interpreting data that investigates disease prevalence, projections can inform public health officials and the public about what to prepare for, where to allocate resources, and which populations to prioritize in order to maximize prevention.”
The three approached their group project from different skill sets, interests, and backgrounds, yet they worked as a cohesive team.
According to Gashe, “The project allowed us to put our strengths and skills into practice. We were also able to step outside of our comfort zones and explore new methods and techniques that we previously did not touch on in other courses.”
Yoon agreed. “This project was challenging because I was not familiar with Python machine learning and knew nothing about image classification. However, we wanted to challenge ourselves and we ended up successfully developing CNN models!”
We asked the team to walk us through their project on algorithmic bias in facial recognition technology. Read their responses below, including why they chose the topic, how they organized the problem, and what results they presented.
Data Science Project: Algorithmic Bias in Facial Recognition Technology
The problem we focused on was the algorithmic bias in facial recognition technology. But first, what is facial recognition technology?
Facial recognition technology (FRT) relies on massive datasets to “learn” faces so that it can accurately identify or verify the identity of a person. It captures, analyzes, and compares patterns based on a person’s facial features. Facial biometrics continues to be preferred over other biometric methods because it’s easy to deploy and implement, does not require any physical interaction with the end user, and is very quick.
Facial recognition is now implemented across many industries because of its wide applicability, benefits, and convenience. Many issues, small and large, that we faced a decade ago are being solved thanks to this technology. For example, it can be used to:
Help find missing people and identify perpetrators
Protect businesses against theft
Improve medical treatment
Strengthen security measures in banks and airports
Unlock sensitive information on mobile devices
Make shopping more efficient
Drastically reduce human touchpoints
Help organize photos
Back to the project. For all their benefits, FRTs are controversial because algorithmic bias can result in racial discrimination. Researchers have identified persistent inaccuracies in some FRT algorithms when they are used to detect faces of color. In addition, the algorithms were less accurate for women than for men. So, we wondered whether our machine learning models could correctly identify a person from a group of people of the same race and gender, and we compared our results with previously published research.
To access a lot of individual images, we decided our subjects would be celebrities. We organized them into six groups: Caucasian male, Caucasian female, African American male, African American female, Asian male, and Asian female. Each group had three different celebrities. To collect the celebrities’ pictures, we used both web scraping and web crawling.
Definition: Web scraping is the automated process of collecting structured information from the internet. A scraper extracts and copies data from the pages it accesses. Web crawling, the technique search engines use, scans the internet for pages that match keywords and then indexes the results.
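To make the scraping half of that definition concrete, here is a minimal sketch of pulling image links out of a page using only the Python standard library. It is not the team's code: the `ImageLinkParser` class, the `extract_image_urls` helper, the sample HTML, and the `example.com` URLs are all hypothetical, and the page is a hard-coded string so that nothing is fetched from the internet.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageLinkParser(HTMLParser):
    """Collects the src attribute of every <img> tag it sees (illustrative helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Resolve relative paths against the page URL.
            self.image_urls.append(urljoin(self.base_url, src))


def extract_image_urls(html, base_url):
    """Return absolute image URLs found in an HTML string."""
    parser = ImageLinkParser(base_url)
    parser.feed(html)
    return parser.image_urls


# Hypothetical page content standing in for a real search-results page.
SAMPLE_HTML = """
<html><body>
  <img src="/photos/person_a_01.jpg" alt="Person A">
  <img src="https://cdn.example.com/person_a_02.jpg">
  <img alt="no source attribute">
</body></html>
"""

urls = extract_image_urls(SAMPLE_HTML, "https://example.com/gallery/")
```

A real scraper would also download each URL and check that the file is a usable photo of the intended person, which is where much of the cleaning effort tends to go.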
After gathering the images from the internet, we developed and trained Convolutional Neural Network (CNN) models using PyTorch to classify images within each gender and racial group. Given our limited time and GPU resources, we decided to use a pre-trained network through transfer learning (the technique of reusing a model trained on one task as the starting point for a related task), since it allowed us to reduce training time and use smaller datasets. We used 120 images per person for training and 40 images per person for validation and testing. You can check out our project website to learn more.
The results? We compared the web scraping model with the web crawling model and found that the web scraping model had a higher accuracy rate. The web crawling model's lower accuracy may have been due to the higher chance of random images being pulled from the internet based on the keyword input. The results may also have differed because the images for the web scraping model were cleaned and normalized, while those for the web crawling model were not.
When looking more closely at the six groups, we found that the Caucasian female group had a fluctuating but consistent trend. The group we found most interesting was the Asian male group, where the results were more dynamic than in the other groups. One reason could be that the individuals chosen, members of the international boy band sensation BTS, have similar features; another could be the model’s limitations when trained on randomly pulled photos. Comparing results gave us a better understanding of how facial recognition software identifies certain groups of people and made us aware of the potential implicit biases that can emerge from these results.
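A group-by-group comparison like this one comes down to computing accuracy separately for each group. The sketch below shows one way to do that in plain Python. The `accuracy_by_group` function and the toy records are hypothetical and for illustration only; they are not the team's data or results.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Compute the fraction of correct predictions for each group.

    Each record is a (group, true_label, predicted_label) tuple.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, true_label, predicted_label in records:
        total[group] += 1
        if true_label == predicted_label:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


# Made-up predictions for two placeholder groups (not real results).
toy_records = [
    ("group_a", "person_1", "person_1"),
    ("group_a", "person_2", "person_2"),
    ("group_a", "person_3", "person_1"),
    ("group_a", "person_1", "person_1"),
    ("group_b", "person_4", "person_5"),
    ("group_b", "person_5", "person_5"),
]

group_accuracy = accuracy_by_group(toy_records)
```

Large gaps between groups in a table like this are the kind of signal that prompts a closer look at the training images or at the model itself.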
How was their project received? Professor Wright was impressed with the group’s work and presentation. “This is a great example of what students can accomplish in the minor and the Data Science Project course specifically. We encourage students to stretch themselves and to not be afraid to take on tough challenges, which is just what this group was able to do.”
According to Gashe, both the class and the project were well worth it. “I learned a lot regarding the ethics surrounding facial recognition and how we must be cognizant of the implicit bias that this technology may create and perpetuate.”