Data science students Sana Syed, Sudeepti Surapaneni, and Logan Lee are increasing the accessibility of art through their capstone project at the School of Data Science.
Every MSDS student completes a capstone project prior to graduating.
“Experiential learning is the cornerstone of UVA’s master’s program in data science. The capstone program is one of the primary ways that the School of Data Science engages with the community,” said Claudia Scholz, the Associate Director for Research Development at the School of Data Science. “Students apply their data science knowledge to help solve problems in the real world.”
These MSDS students have been working on their capstone in close collaboration with the Metropolitan Museum of Art and the Wikimedia Foundation. Their project is titled “Creating a Global Museum of Art.”
Jennie Choi, the General Manager of Collection Information at the MET, explained that creating open access to art is something the MET values highly.
“We believe there's something in our collection for everyone. Everyone on the planet can find something that will inspire them in this museum, and we want to make that again as accessible as possible through any means,” Choi said. “There are so many ways into our collection, so that's why we all work here. We think art is important in the world.”
Choi explained that the MET has been working to increase accessibility since 2017 with the launch of Open Access.
“That means that all our public domain images are available free of use without permission,” Choi noted. “We launched our public API [Application Programming Interface] a year ago in October, and we wanted to celebrate that, and we want people to use our data.”
Images of artwork at the MET are available on the museum's website. Hundreds of thousands of images are available, each with detailed information on the piece, artist, medium, and more.
These images fall into two categories: images in the public domain and images under copyright. There are over 400,000 images of artwork in the public domain, which can be downloaded and shared without restriction. Images of artwork under copyright can be requested via a form available on the website.
Syed, Surapaneni, and Lee worked with the MET to improve the search of the museum’s online collection. Specifically, they looked into how objects in paintings were classified and thus labelled online.
Syed broke down how a computer sees a painting and labels it.
“When I or you look at a digital image what we see is this picture of dogs, but what the computer sees are these rows of numbers which represent pixels,” she explained. “Because you can reduce that image to a bunch of numbers, and the computers are really good at looking at patterns, this is a specific way that you can hunt for patterns. You can actually get decision-making that was otherwise only thought to be the domain of humans.”
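Syed's point about rows of numbers can be made concrete with a small sketch. The pixel values and function names below are invented for illustration and are not taken from the students' project; the example only shows how a simple pattern (here, a vertical edge) can be found by arithmetic on pixel values.

```python
# A tiny 4x6 grayscale "image": each number is one pixel's brightness
# (0 = black, 255 = white). The values are made up for illustration.
image = [
    [0, 0, 0, 255, 255, 255],
    [0, 0, 0, 255, 255, 255],
    [0, 0, 0, 255, 255, 255],
    [0, 0, 0, 255, 255, 255],
]


def horizontal_contrast(img):
    """For each pixel, the absolute brightness jump to its right-hand neighbor."""
    return [[abs(row[x + 1] - row[x]) for x in range(len(row) - 1)] for row in img]


def strongest_edge_column(img):
    """Column where the summed brightness jump is largest (a crude edge detector)."""
    contrast = horizontal_contrast(img)
    column_totals = [sum(column) for column in zip(*contrast)]
    return column_totals.index(max(column_totals))


# The dark-to-light boundary sits between columns 2 and 3.
edge_column = strongest_edge_column(image)
print(edge_column)
```

Real image classifiers learn far richer patterns than a single edge, but they start from the same kind of numeric grid.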
Through this process, the students learned that cultural considerations are often not factored into computer decision-making, which can be problematic.
“[...] Everything is very subjective, and there is a lot of meaning behind the arts,” Lee explained. “Also, there are a lot of cultural problems, because every culture has a different meaning [behind] certain objects.”
Surapaneni gave the example of Mughal art, a style of South Asian miniature painting. Many of these paintings feature men in dresses, and the computer classifies those men as women.
“We’re trying to address that bias,” Surapaneni said. “As of right now, we're visualizing that bias with Grad-CAMS [Gradient-weighted Class Activation Mapping], which basically look like heat maps, so that kind of gives us insight into actually, what our model is using, or what areas of the image it’s looking at to make a decision.”
Syed explained that these heat maps designate what within an image was most important in classifying it by giving that portion of the image the brightest color.
“So, if you ask me as a human, ‘how do you decide whether an image has a picture of a bird?’ I'll probably say, ‘well the feathers, or the wings, or the beak.’”
In a painting of a bird, the classifier picks up on feathers, wings, a beak, and so on, and successfully labels it as a bird. In an image with a dress, however, such as the Mughal paintings, the classifier treats the dress, rather than any facial features, as the most important part of the image and thus misclassifies the men as women.
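The heat maps described above can be sketched in a few lines. This is a minimal illustration of the Grad-CAM calculation only, assuming the feature-map activations and their gradients have already been extracted from a trained network; the numbers and the `grad_cam` function name are made up for this example and do not come from the students' model.

```python
def grad_cam(activations, gradients):
    """Combine feature maps into one heat map scaled to the range 0..1.

    activations[k][i][j]: response of feature map k at location (i, j).
    gradients[k][i][j]: gradient of the class score with respect to that response.
    """
    height, width = len(activations[0]), len(activations[0][0])
    heat = [[0.0] * width for _ in range(height)]
    for fmap, grad in zip(activations, gradients):
        # Channel weight: the gradient averaged over all spatial locations.
        weight = sum(sum(row) for row in grad) / (height * width)
        for i in range(height):
            for j in range(width):
                heat[i][j] += weight * fmap[i][j]
    # ReLU: keep only locations that push the score for this class up.
    heat = [[max(0.0, value) for value in row] for row in heat]
    peak = max(max(row) for row in heat)
    if peak > 0:
        heat = [[value / peak for value in row] for row in heat]
    return heat


# Two made-up 2x2 feature maps: map 0 responds at the top left,
# map 1 responds at the bottom right and matters more to the class score.
activations = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 2.0]],
]
gradients = [
    [[0.1, 0.1], [0.1, 0.1]],
    [[0.5, 0.5], [0.5, 0.5]],
]

heat = grad_cam(activations, gradients)
# Location with the "brightest color" in the heat map.
hottest = max(
    ((i, j) for i in range(2) for j in range(2)),
    key=lambda pos: heat[pos[0]][pos[1]],
)
print(hottest)
```

In practice the activations and gradients come from a convolutional layer of the classifier, and the resulting map is resized and overlaid on the original image. A bright region over a dress rather than a face is the kind of signal the students describe.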
“Where this is problematic is, for example, in South Asian images of men wearing more flowing outfits, or Japanese men wearing a kimono,” Syed explained. “Would those images also get misclassified?”
This bias goes beyond Mughal artwork and reveals the complexity of labelling artwork.
What is the solution?
Capstone mentor Raf Alvarado, the Program Director at the School of Data Science, explained how the team is working to alter the classification system.
“What we’re looking at as a solution is the possibility of pre-filtering and pre-recognizing based on culture area or perhaps representational tradition,” Alvarado explained. “These are things you can get at by looking at numeric features associated with an image, like its color index or something like that, and then pre-filter things into various collections and culture areas, and then try your recognition by labels.”
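One way to picture the pre-filtering idea is a two-step pipeline: compute a simple numeric color feature, route the image to the closest collection, then apply a labeler specific to that collection. The sketch below is hypothetical throughout. The collection names, reference colors, and placeholder labelers are invented for illustration, and mean color stands in for whatever features the team might actually use.

```python
def mean_color(pixels):
    """Average (R, G, B) over a list of pixels: one crude numeric color feature."""
    count = len(pixels)
    return tuple(sum(pixel[c] for pixel in pixels) / count for c in range(3))


def nearest_group(feature, centroids):
    """Name of the group whose reference color is closest to the feature."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda name: squared_distance(feature, centroids[name]))


# Hypothetical reference colors for two collections (made-up numbers).
centroids = {
    "collection_a": (200.0, 170.0, 90.0),
    "collection_b": (60.0, 80.0, 120.0),
}

# Hypothetical per-collection labelers; real ones would be trained models.
labelers = {
    "collection_a": lambda pixels: "label from a model tuned on collection_a",
    "collection_b": lambda pixels: "label from a model tuned on collection_b",
}


def label_image(pixels):
    """Pre-filter by color feature, then label with the matching collection's model."""
    group = nearest_group(mean_color(pixels), centroids)
    return group, labelers[group](pixels)


# A tiny two-pixel "image" with warm tones.
warm_image = [(210, 180, 100), (190, 160, 80)]
group, label = label_image(warm_image)
print(group, label)
```

The design choice is that recognition happens only after the image has been placed in a cultural or representational context, so a garment can be interpreted differently depending on the tradition it comes from.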
Alvarado went on to explain that the project is bringing awareness to the representational complexity of culture within an image.
Scholz praised the project for broadening the impact of art by increasing access to the images.
“This collaboration with the Met demonstrates the role that data science can play in making art findable and accessible to broader audiences, increasing exposure for each work, thereby expanding its impact and meaning,” Scholz said.
Lane Rasberry, Wikimedian-in-Residence at the School of Data Science, explained that this is all part of a larger effort by Wikipedia to index art from every museum in the world online.
“We're trying to get [...] the sum of all paintings in the world for the benefit of the public for them to ask the questions about the art, enjoy in the way that they like,” Rasberry explained.
Syed noted that the team learned far more over the course of the project than they had expected.
“I think when a movement like this is put together, you really don’t know where it's going to end up, and that's part of the beauty of knowledge and the power that goes with that.”