How Data Science Could Reshape What We Know About the Law
Karolina Naranjo-Velasco's journey to the law and data science began with understanding war and how it deepens inequalities.
Growing up in Santander, Colombia, she witnessed the social and economic effects of armed conflict on families, including her own, and saw how war impacted the most disadvantaged in her country.
During vacations at her family’s cattle ranch, she noticed that the rural population had endured multiple forms of violence and experienced social and economic exclusion. This exclusion, in turn, produced several waves of victimization, such as forced displacement and a lack of access to economic opportunities.
She wanted to help find solutions to these seemingly intractable problems, a passion that led her to pursue social justice as a personal and professional vocation.
“That’s why I studied law — to defend people and make a difference,” she said.
After years as an attorney, she realized interpreting data can affect people’s lives in tangible ways. Serving as a victim reparation and assistance director in her region, she was inspired to pursue an education in computational methods to explore how the legal community could benefit from the more effective use of automated systems, particularly in decision-making.
This belief in data science as a pivotal tool in the legal field led her to the University of Virginia, where she’s pursuing a doctoral degree in data science after earning a master’s degree in the field.
Naranjo-Velasco’s life and mission are powerful illustrations of how data science and the law can intersect, leading to important insights about issues that have a profound impact on society, both inside and outside courtrooms. It’s work that’s being done not only by Naranjo-Velasco at the School of Data Science but across Grounds at the University’s Law School.
Promoting justice through data science
From her days as a practicing attorney to her shift to data science, Naranjo-Velasco has seen how consequential, yet sometimes inscrutable, legal data can be.
The data contained in legal documents, such as court opinions, can have far-reaching impacts and can even trigger sweeping changes, she said. These documents are often hard for the public to understand because they use complex language.
Naranjo-Velasco concluded that, despite this complexity, data science tools can make the documents easier to comprehend and more accessible.
A concept known as “law-as-data” has served as the basis for statistical and computational approaches to handling the unique characteristics of legal documents, she said. In addition, the boom in large language models, or LLMs, can help close this gap in comprehension.
“We can train those models, for example, for summarization of legal documents to enhance legal analysis,” said Naranjo-Velasco.
Naranjo-Velasco is continuing to build a new skill set while at UVA, from optimizing training methods for artificial intelligence systems to better understanding how to use and evaluate algorithms.
Looking ahead, she’s excited about the possibilities of using natural language processing, especially LLMs, to analyze legal decisions. She’s also encouraged by the possibilities offered by the rapidly evolving world of legal tech and computational legal studies, where new tools, such as chatbots, are being created to help ensure more people can receive answers on legal matters.
There are many ways to be innovative in these areas, she said.
She is hopeful that artificial intelligence tools can make complex legal concepts more accessible.
“An AI system that simplifies technical language and documents into easily readable and understandable legal text helps a lot. Knowing our rights can help us feel protected by the law,” she said.
Quantifying judges
Like Naranjo-Velasco, researchers at UVA’s Law School are embracing the use of data to better understand the law and the judiciary.
Li Zhang leads the Legal Data Lab at UVA’s Law School. Founded in 2016, it was one of the first data labs of its kind at a U.S. law school.
Before pursuing his Ph.D. as a computational social scientist specializing in the intersection of human-computer interaction and political communication, Zhang had long had an academic interest in the history of science and technology.
Unlike some previous major scientific developments, with data science, Zhang said, “this time you can be in the thick of it and study the impact it is going to unleash on human society.”
He soon became passionate about applying his skills to teach, promote, and support data-driven scholarship.
Even though he didn’t have a background in legal research, in 2023 he jumped at the chance to support the work of the faculty and students at one of the top law schools in the country.
There is a “growing interest among students for empirical studies,” Zhang said. “And that often involves big data, harvesting, cleaning, and processing data in creative ways.”
Zhang has been involved with projects relating to judicial behavior, including an ongoing collaboration with the school’s Supreme Court Litigation Clinic, which wants to build a model that can predict the success of petitions for certiorari, the formal requests asking the Supreme Court to hear a case.
“It’s pretty challenging from the data gathering to the modeling,” Zhang said, explaining that because the court grants so few cert petitions relative to how many litigants seek review, the issue of class imbalance arises.
“I’m still experimenting with different techniques to augment the dataset,” he said.
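To make the class-imbalance problem concrete, here is a minimal sketch of one generic remedy, random oversampling of the rare class. It is an illustration only, not the method Zhang or the clinic uses: the function name `oversample_minority`, the `petition_id` field, and the 2-to-98 split are hypothetical toy values, not real cert statistics.

```python
import random
from collections import Counter


def oversample_minority(rows, labels, seed=0):
    """Randomly duplicate rows of smaller classes until every class
    has as many examples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows that belong to this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Draw (with replacement) enough copies to reach the target size.
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


# Hypothetical toy data: 1 = petition granted, 0 = petition denied.
labels = [1] * 2 + [0] * 98
rows = [{"petition_id": i} for i in range(100)]

balanced_rows, balanced_labels = oversample_minority(rows, labels)
print(Counter(balanced_labels))
```

In practice, a resampling step like this is applied only to the training split, so that duplicated examples do not leak into the data used to evaluate the model.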
Zhang also recently partnered with a law professor who, using data science methods, developed an innovative new way to measure the ideology of federal court judges.
Kevin Cope, an associate professor of law at UVA, is both a practitioner and teacher of legal empirical analysis.
Working with Zhang, he computed what he called Jurist-Derived Judicial Ideology Scores, which place nearly every Article III federal judge since 1990 on an ideological scale. The ratings are based on hundreds of thousands of pages of comments from experts such as lawyers or former law clerks describing the ideological bent of over 2,000 judges.
“The idea is we can use the wisdom of experts to measure judges across courts, and even over time — that’s something that’s proved elusive thus far,” Cope said.
That’s because judges across the country preside over such a wide variety of matters. “It’s a very difficult measurement problem to compare them with each other because we don’t have a frame of reference. They’re hearing completely different cases,” he said.
Ultimately, the dataset Cope worked with consisted of 100 large books of hardcopy records, many of which contained notes that were difficult to interpret or were missing pages.
“Trying to automate the system for quantifying this information proved extremely difficult,” he said.
Zhang, though, was able to design a method for recovering and quantifying all of the problematic or missing pages.
“Every project has really unique challenges, but I think this project truly was unique,” Cope said.
Thus far, Cope has published his ideological scores for federal circuit judges. Soon, he’ll share his ideology scores on U.S. district court judges, as well as findings on other traits, including temperament and judging style — data that could offer untold new insights about the men and women who make decisions that have the potential to reshape American society.
The future of data science and the law
So, where do data science and the study of law go from here? On Grounds, the future certainly appears promising.
“UVA has a great tradition of quantitative analysis, of empirical legal studies on our faculty,” said Cope.
“Increasingly, we have students who are interested in empirical work as well,” he added.
Cope is also hopeful that data analysis will continue to be emphasized by legal educators and in the profession.
“As with many other professions, much of what lawyers do will be supplemented or supplanted by artificial intelligence in the coming years,” he said. “To stay relevant, lawyers would be well-advised to become conversational in the languages of data analysis and machine learning, in order to either apply these technologies themselves or collaborate with legal data scientists. And to prepare modern lawyers, law schools will need to make those skills a bigger part of the curriculum.”
Zhang also believes lawyers will rely more on data science.
“It’s just a matter of time before it becomes an integral part of their practices,” he said.
As for Naranjo-Velasco, she plans to continue focusing on the intersection of the law and data science throughout her doctoral work at UVA and beyond, finding ways in which her unique background can help her promote a more just society by enhancing AI tools for legal analysis.
It’s a passion that is fueled by her past in Colombia and her discovery of the potential of data science to make a positive impact on society.
“Data science has the power to bring objectivity to law,” she said. “It also allows us to take many sources of information and put them together to understand the facts. That’s why it’s a valuable tool for both lawyers and data scientists.”