Building a bridge between innovation and ethics in data science

A lively room with diverse students engaged in collaborative group work around tables, using laptops for Data Viz Day.

When Rakesh Ravi, an M.S. in Data Science student, was working as a data analyst in digital marketing, he often wondered about the impact of his work.

“You get involved in the numbers, the analytics, in solving a problem, and you lose sight of the effects of your work on actual people,” he says. “One of the most important concepts a data scientist has to grasp is an understanding of your impact on the world around you – an algorithm can’t do that.”

Data-driven technologies now find application in virtually all aspects of contemporary life, so data scientists must consider what effects their work has and what effects it should have. How does data science relate to issues of privacy, bias, discrimination, and inequality? What new ethical problems emerge from artificial intelligence and algorithmic decision-making? How might values of justice, care and the good life guide responses to these pressing issues?

Ravi and his fellow M.S. in Data Science students at the School of Data Science are immersed in the mechanics of data science – machine learning, natural language processing, data integration and engineering. Crucially, they are also studying an equally important aspect of the field – data science ethics and justice.  

“It’s important that among all of our more technical classes, there are courses close to the humanities that allow us to think about the impact of data science in the real world, beyond just innovation,” says MSDS ’19 student Shaoran Li, who worked as a business intelligence analyst in the financial industry prior to joining the MSDS program.

“A lot of us math heads put priority in learning new skills and in technical, hands-on work for careers,” she says. “Having an understanding of data ethics will also be really useful for us in future work environments when discussions arise around how our data are collected, managed, analyzed and archived.

“You can be ethically responsible and generate a profit – it is a choice you can make.”

The Ethics of Big Data course, part of the MSDS curriculum at SDS, argues that questions about privacy, bias and justice in data science lack simple, ready-made answers. It holds that data ethics and justice is an ongoing process at every stage of data production, dissemination and use, and that this process necessarily involves risk and uncertainty that data practitioners ought to confront. Dr. Samuel Lengen, who teaches the course and works with the DSI Center for Data Ethics and Justice, has a background in social science.

“While I have done extensive research on the ethics of digital technologies, teaching aspiring data scientists was a novel experience for me,” he says. “It was my great pleasure to witness just how willing our Master’s students are to engage tough questions, consider the potential harm of data applications, and try to imagine a data science that can help to make the world a better place for all.”

The Center for Data Ethics and Justice, led by UVA Department of Anthropology professor Dr. Jarrett Zigon, strives to bring a social science and humanities perspective to data science at the DSI. In class, this means getting students to ask questions about the social, economic, and political complexities into which data science intervenes.

“For students, I admit, embarking on this journey can be challenging at first,” Lengen says. “When I asked my students to answer a set of questions about the data used in their capstone projects—such as who collected their data, what data was collected, why that data was collected, who it was collected from—they made the somewhat uncomfortable discovery that their knowledge was limited and that under the surface of these deceptively simple questions lay unknown worlds with complicated ethical implications.”

This recognition—that having a lot of data is not the same as knowing it—is important. One of the first demands of the ethics course, therefore, is that students try to get to know their data.

“I encourage students to think about the perspectives, ambitions and assumptions that made the production of their data possible as well as consider those that were left out. Ideally, what comes out of this engagement is the realization that data sets contain histories, biases, or inequalities that have consequences—not just for their work but also for the lives of others.

“It is therefore crucial that data science students learn to interrogate their own intentions, views, and preconceptions by placing them in conversation with others,” Lengen says.

The need for this kind of data ethics is clear. Recent scandals involving Cambridge Analytica and Facebook have made evident that existing conventions and rules around data privacy, surveillance, and manipulation are insufficient in the digital age. There is growing recognition that the world needs data scientists who can think proactively and creatively through the systemic ethical issues raised by data, and who are willing to face complex ethical problems in order to envision a better future.

By making data ethics a mandatory part of the curriculum, the DSI recognizes the need for a new generation of critical data scientists.

“Choosing problems is as important as solving them,” Ravi says. “A lot of us want to leave the world a better place than we found it.”

This story was co-written with Dr. Samuel Lengen.