
PyData Virginia 2025 Spotlights UVA and Charlottesville

PyData Virginia 2025 was held in Charlottesville for the first time on April 18-19. The conference featured technical talks for a technical audience, with University of Virginia alumni, students, and faculty among the speakers. Over two days, it sparked discussion and learning about all things artificial intelligence, from building and training models to applications and ethics.
PyData is an international community of developers and users of open-source data tools. The organization hosts meetups and conferences worldwide, connecting data scientists with the newest technology, ideas, and research.
A key takeaway from PyData Virginia 2025 was that Charlottesville is an up-and-coming technology hub. The city is home to AI startups and technology companies, some of them founded by UVA alumni. Its proximity to the University and its diverse, supportive community of data scientists have helped earn it recognition as a center of innovation. One of the conference's welcome speakers remarked that Charlottesville offers the technical prowess of Silicon Valley along with the stunning scenery of the Shenandoah Valley.
Day One, held at the Violet Crown Theater on the Downtown Mall, featured talks by 35 speakers from a variety of backgrounds, while Day Two, held at the University of Virginia Darden School of Business, offered technical skill-building workshops.
Keynote: Building AI-First Organizations
The conference began with a keynote from Rajkumar Venkatesan, an author, professor of business administration at UVA Darden, and academic director of the LaCross Institute for Ethical Artificial Intelligence in Business. His talk, "Building AI-First Organizations," covered how data, privacy, people, and ethics figure into building an organization that prioritizes AI while remaining competitive in today's market.
Venkatesan addressed four questions that companies adopting AI often ask: Is data still an asset? Will privacy regulations hurt AI? Where do humans fit into this new environment? And what is an ethical view of data? He argued that each of these questions calls for a customer-focused perspective, and that this perspective can often head off problems such as a loss of trust or of accessibility.
"The ultimate goal is to use data and algorithms to personalize all aspects of the customer experience, redefining engagement, growth, retention," he said. "So, if you're looking at how do we make our AI projects relevant, it's about connecting what you're doing to how it is going to affect your customers, and what aspects of the customer journey and customer life cycle you are influencing."
He concluded with four tenets to keep in mind when building an AI-first organization. "Data is still king," Venkatesan said. "Customer focus can overcome privacy concerns; design to build trust in AI; and adopt a value-chain perspective for ethical AI."
Generative AI and Market Intelligence
After the keynote, attendees could choose among three auditoriums, with three talks running concurrently for the rest of the day. Topics ranged from Bayesian risk analysis to addressing climate change with AI.
MacKenzye Leroy, lead data scientist on S&P Global's MI Enterprise Technology & Internal Productivity Team and an alumnus of the School of Data Science MSDS program, presented his process for building and evaluating generative AI applications in his talk, "Evaluating LLMs at S&P Global: Building a Robust Evaluation Framework for GenAI Productivity Tools."
Leroy recounted how he led a team that created a GenAI-powered market intelligence sales assistant at S&P Global, designed to help sales staff quickly and reliably find legal and procedural information. He walked the audience through the trial-and-error process of developing and refining this model.
The talk covered testing the sales assistant with LLM-specific metrics, generating question-and-answer pairs, creating ground truth, and implementing evaluation. Leroy discussed the challenges of building a tool with essentially zero tolerance for risk. "In general, when dealing with large language models, one of the challenges is that the business impact can be significant if you have an error," he said. "Another challenge is language is really hard to evaluate. There are, functionally, infinite ways to say something that means the same thing."
Because the model would need to answer legal questions involving precise, high-stakes jargon, Leroy said, it needed especially robust verification. The team had been relying on subject matter experts (SMEs) at S&P Global to write question-and-answer pairs to test the model, but the work was becoming a burden for SMEs who had other obligations.
Ultimately, Leroy's solution was to have the model generate its own Q&A pairs, which SMEs would then verify before use. This hybrid approach, combining AI-generated test cases with human review, proved the fastest, most cost-effective, and most successful way to ensure the model's accuracy.
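The workflow Leroy described, in which a model drafts question-and-answer pairs and SMEs approve them before they become ground truth, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not S&P Global's implementation: `draft_qa_pairs` is a hypothetical stand-in for an LLM call, the SME review is simulated with a callback, and token-level F1 (a common overlap metric for question answering) stands in for whatever LLM-specific metrics the team actually used.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable


@dataclass
class QAPair:
    question: str
    answer: str
    approved: bool = False


def draft_qa_pairs(document: str) -> list[QAPair]:
    """Hypothetical stand-in for prompting an LLM to draft Q&A pairs from a document."""
    # A real pipeline would send `document` to a model; this stub returns fixed pairs.
    return [
        QAPair("What is the notice period?", "thirty days"),
        QAPair("Who signs the form?", "the account owner"),
    ]


def sme_review(pairs: list[QAPair], approve: Callable[[QAPair], bool]) -> list[QAPair]:
    """Keep only the pairs a subject matter expert approves; these become ground truth."""
    ground_truth = []
    for pair in pairs:
        if approve(pair):
            pair.approved = True
            ground_truth.append(pair)
    return ground_truth


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1: gives partial credit when wording differs but tokens overlap."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def evaluate(assistant: Callable[[str], str], ground_truth: list[QAPair]) -> float:
    """Mean token F1 of the assistant's answers against SME-approved reference answers."""
    if not ground_truth:
        return 0.0
    scores = [token_f1(assistant(p.question), p.answer) for p in ground_truth]
    return sum(scores) / len(scores)


# Example run with a simulated SME who rejects the second draft pair.
candidates = draft_qa_pairs("example policy text")
truth = sme_review(candidates, approve=lambda p: p.question != "Who signs the form?")
score = evaluate(lambda question: "a thirty days notice", truth)
print(len(truth), round(score, 3))  # 1 0.667
```

Token overlap only partly addresses Leroy's point that there are countless ways to say the same thing: a correct answer phrased in entirely different words could score zero, which is one reason automatic metrics are usually paired with human review.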
Brain Data and Autism
Siwen Liao, a second-year physics and statistics major at UVA, presented research she and her professor have been conducting on differences in MRI-derived brain data among males and females with and without autism spectrum disorder (ASD) in her talk, "The Art of Brain Data in ASD Subjects: Celebrating Neurodiversity Through Aesthetic Data Visualization."
Liao began by establishing the motivation for this research: ASD is now diagnosed in 1 in 36 children, and males are diagnosed about four times as often as females. Liao explained that this gap is tied to what she described as the "female protective effect," in which females tend to mask symptoms more than males, making traditional behavior-based tests less successful at detecting ASD in them.
The project analyzed MRI-derived brain data from about 300 individuals. Liao and her professor 3D-printed a small model of every brain to look for trends. They also scanned the brains to create 3D computer models that could reveal differences in brain shape and structure and allow closer focus on specific brain regions.
Her analysis began with a multivariate statistical analysis of the data, followed by a look at the 3D-printed brains and the resulting computer models. "We can see that the males have a very similar distribution, but the females tend to differ a little bit, which is really interesting," she said. "We can see that the top regions contribute a lot to emotional processing, and that's something that can be studied further, because evidently we saw there were some differences in how ASD manifests itself in females versus males."
She explained that brain scans and behavior-based tests are equally important in diagnosing ASD and that neither should be relied on alone. Liao hopes the findings and continuation of her research will draw attention to the gender gap and help refine both approaches to testing for ASD.
Eviction Trends in Virginia
Lastly, a talk by Michele Claibourn focused on analyzing eviction trends in Virginia. Claibourn, assistant professor of data science (by courtesy), assistant professor at the Batten School of Leadership and Public Policy, and director of community-centered analysis at the Center for Community Partnerships, was joined by Samantha Toet, a data scientist at the Center for Community Partnerships and a UVA alumna, to present their work in the talk "Exploring Eviction Trends in Virginia."

Their research examined where landlords file more eviction actions and which characteristics contribute to higher filing rates. These included the practice of "serial filing," in which the same landlord files multiple evictions in a calendar year; "nuisance filings," in which landlords file evictions as a rent-collection measure; "rent burden," in which rent is 30% or more of a renter's income; and the difference between landlords who own one or a few properties and landlords who operate as a business.
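Two of these definitions are concrete enough to express as code. The sketch below flags records under the definitions given here: a landlord with more than one filing in a calendar year counts as a serial filer, and a household is rent-burdened when rent is at least 30% of income. The `Filing` record, its field names, and the sample rows are hypothetical illustrations, not the researchers' data or code. Some analyses instead key serial filing on the same landlord and the same tenant, which would only require adding the tenant to the grouping key.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Filing:
    landlord: str   # plaintiff name (hypothetical field)
    tenant: str     # defendant name (hypothetical field)
    filed_on: date


def serial_filers(filings: list[Filing]) -> set[tuple[str, int]]:
    """Return (landlord, year) pairs with more than one filing in that calendar year."""
    counts = Counter((f.landlord, f.filed_on.year) for f in filings)
    return {key for key, n in counts.items() if n > 1}


def is_rent_burdened(monthly_rent: float, monthly_income: float) -> bool:
    """Rent burden: rent is 30% or more of the renter's income."""
    return monthly_rent >= 0.30 * monthly_income


# Made-up sample records for illustration only.
filings = [
    Filing("Acme Properties LLC", "Tenant A", date(2024, 2, 5)),
    Filing("Acme Properties LLC", "Tenant A", date(2024, 6, 3)),
    Filing("J. Smith", "Tenant B", date(2024, 3, 12)),
    Filing("J. Smith", "Tenant C", date(2023, 11, 20)),
]

print(serial_filers(filings))        # {('Acme Properties LLC', 2024)}
print(is_rent_burdened(900, 3000))   # True (900 is exactly 30% of 3000)
print(is_rent_burdened(800, 3000))   # False
```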
Claibourn began by noting that five of the 10 U.S. cities with the highest eviction rates are in Virginia. These rates led her and Toet to look for potential eviction hubs and to compare overall rates before and after COVID-19, in the hope that their findings could inform efforts to reduce evictions in the state.

"Evictions harm people, not just when they are displaced from their home, but by having a court record about having an eviction filed on you. Regardless of the outcome itself, [it] creates harm for people," Claibourn said. "It reduces access to future housing, to jobs, to schools, to transportation, to childcare. It impacts people's health and their ability to do their jobs well. It impacts communities by destabilizing them, by having a variety of displacement that makes folks have less social support. ... So, it matters."
Toet then took the audience through a statistical review of their findings. In 2024, she said, there were almost 20,000 landlords in Virginia. In 46 counties, 0%-30% of landlords operated as businesses; in 65 counties, 31%-50% did; and in 13 counties, 51%-100% did.
The researchers found that 93% of serial nuisance filings were made by landlords acting as businesses, and that when the plaintiff was non-residential and the landlord was acting as a business, there was a 97% chance that the filing was serial.
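The two figures above condition in opposite directions: the 93% is the share of serial nuisance filings that came from business landlords, while the 97% is the share of filings that were serial among those with a non-residential, business plaintiff. The sketch below shows how each kind of conditional share is computed from filing-level flags. It assumes "serial nuisance filing" means a filing flagged as both serial and nuisance, and the records are made-up toy values chosen only to illustrate the arithmetic; they are not the researchers' data and do not reproduce their percentages.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FilingFlags:
    serial: bool
    nuisance: bool
    business_landlord: bool
    nonresidential_plaintiff: bool


def share(
    records: list[FilingFlags],
    outcome: Callable[[FilingFlags], bool],
    given: Callable[[FilingFlags], bool],
) -> float:
    """P(outcome | given): among records satisfying `given`, the fraction that satisfy `outcome`."""
    subset = [r for r in records if given(r)]
    if not subset:
        return float("nan")
    return sum(outcome(r) for r in subset) / len(subset)


# Toy records for illustration only.
records = [
    FilingFlags(serial=True, nuisance=True, business_landlord=True, nonresidential_plaintiff=True),
    FilingFlags(serial=True, nuisance=True, business_landlord=True, nonresidential_plaintiff=False),
    FilingFlags(serial=True, nuisance=True, business_landlord=False, nonresidential_plaintiff=False),
    FilingFlags(serial=True, nuisance=False, business_landlord=True, nonresidential_plaintiff=True),
    FilingFlags(serial=False, nuisance=False, business_landlord=True, nonresidential_plaintiff=True),
    FilingFlags(serial=False, nuisance=False, business_landlord=False, nonresidential_plaintiff=False),
    FilingFlags(serial=True, nuisance=False, business_landlord=True, nonresidential_plaintiff=True),
]

# Direction of the 93% figure: P(business landlord | serial nuisance filing)
p_business_given_serial_nuisance = share(
    records,
    outcome=lambda r: r.business_landlord,
    given=lambda r: r.serial and r.nuisance,
)

# Direction of the 97% figure: P(serial | non-residential plaintiff and business landlord)
p_serial_given_nonres_business = share(
    records,
    outcome=lambda r: r.serial,
    given=lambda r: r.nonresidential_plaintiff and r.business_landlord,
)

print(round(p_business_given_serial_nuisance, 3))  # 0.667
print(round(p_serial_given_nonres_business, 3))    # 0.75
```

Swapping `outcome` and `given` generally changes the answer, which is why the two percentages cannot be compared directly.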
Claibourn and Toet concluded that eviction filings are clearly being used as a rent-collection tool, and they are using their findings to advocate for alternatives and resources that help both tenants and landlords avoid eviction filings.