Projects

REDS projects vary significantly in discipline, level of coding complexity, and methodology, allowing students from virtually any background to participate and derive benefit from the program. The projects fall into three broad themes or clusters:

Apply Data Science to Social Issues: Students use the tools of data science to study social issues, such as health, education, housing, etc. These projects are best suited to students with some understanding of statistics.
Improve How Data Science Works: Students develop tools to reduce the potential harm posed by datafication, automated decision systems, and AI. These projects are best suited to students with prior coding experience (Python) and some understanding of machine learning concepts.
Study Data in the World: Students interrogate data itself, challenging the perceived inevitability of artificial intelligence and the reification of data. These projects are best suited to students with strong skills in writing and literature reviews.

Apply Data Science to Social Issues

Community-Centered Analysis in Central Virginia
Project mentors: Beth Mitchell and Michele Claibourn

The Community-Centered Analysis team at the UVA Center for Community Partnerships is committed to authentic partnership and shared power. We work with community members, rather than “for” them. Through deep engagement with impacted communities, we gather insights and information to accurately measure and articulate regional patterns of racial and socioeconomic inequity. Guided by principles of justice and respect, transparency and reproducibility, and accessibility and collaboration, our work highlights regional assets and opportunities for collective action. We generate contextualized analysis and data platforms and tools, such as interactive maps and data visualizations, combined with narrative storytelling. These tools make shared information accessible for civic and community leaders to use in the pursuit of a more just community. Students will learn methods of data collection, data visualization, and analysis to contribute to data projects illuminating issues like education, housing, health, and environment.

Minimum qualifications: Experience with or strong interest in learning techniques for data processing and preparation, the R statistical language, and communication and storytelling with data.

Exploring and Visualizing Telehealth Access
Project mentors: Don Brown, Suchetha Sharma, and Johanna Loomba

Telemedicine can improve access to care for patients lacking resources for in-person visits. In collaboration with the Karen Rheuban Center for Telehealth, the UVA School of Data Science (SDS), and iTHRIV, we are applying data science to identify which patients benefit most from telehealth. This insight will guide funding, service expansion, and targeting of high-impact areas. Our team has developed a research-ready telehealth dataset combining UVA patient data with public Social Drivers of Health (SDoH) data. This structured dataset enables analysis of clinical and demographic features of telehealth users, currently focusing on comparisons between patients in Southwest Virginia and other regions. REDS participants will use aggregated datasets to explore and compare patient populations across Virginia, including their own regions. They will create summary tables, visualizations, and predictive models to evaluate telehealth impact. Using Python, participants will work with Pandas, Matplotlib, Seaborn, and Plotly to analyze data and build interactive regional maps.

Minimum qualifications: Basic knowledge of Python.

Investigating Early Childhood Growth Trends
Project mentor: Heman Shakeri

Join us in exploring the complex dynamics of early childhood growth and malnutrition through the analysis of rich longitudinal data from birth cohort studies. This project offers an opportunity to analyze real-world health data, uncover trends in growth faltering, and identify key factors that impact children's development in their crucial early years. By analyzing growth metrics such as length-for-age, weight-for-height, and weight-for-age z-scores, you will contribute to understanding the multifaceted issue of childhood malnutrition.

Minimum qualifications: Completion of introductory courses in statistics or data science; basic proficiency in programming, preferably in R or Python; ability to work with large datasets and perform data cleaning and manipulation; strong analytical skills and attention to detail.
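
The growth metrics named in the project description above are z-scores: a child's measurement expressed in standard-deviation units relative to a reference population of the same age and sex. A minimal sketch of one common way to compute them, the LMS (Box-Cox) method, is shown below. The project's actual reference tables and tooling are not stated here, so the function name `lms_z_score` and the reference values are hypothetical and serve only to illustrate the arithmetic.

```python
import math


def lms_z_score(measurement, power, median, cv):
    """Return a z-score using the LMS method.

    power  (L): Box-Cox power for the age/sex group
    median (M): reference median for the age/sex group
    cv     (S): reference coefficient of variation
    """
    if measurement <= 0 or median <= 0 or cv <= 0:
        raise ValueError("measurement, median, and cv must be positive")
    if power == 0:
        # Limiting case of the formula as the power approaches zero.
        return math.log(measurement / median) / cv
    return ((measurement / median) ** power - 1) / (power * cv)


# Hypothetical reference values, NOT taken from any published growth table.
HYPOTHETICAL_REFERENCE = {"power": 1.0, "median": 75.0, "cv": 0.04}

# A child measuring 72.0 cm against a hypothetical reference median of 75.0 cm.
z = lms_z_score(72.0, **HYPOTHETICAL_REFERENCE)
print(round(z, 2))
```

In practice, the reference parameters would come from a published growth standard, and a length-for-age z-score below -2 is commonly used as a threshold for stunting.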

Improve How Data Science Works

Analyzing and Characterizing Slurm Job Workloads
Project mentor: Yue Cheng

REDS participants will analyze GPU job workloads from UVA's Rivanna/Afton high-performance computing clusters. You will work with datasets of job logs and system metrics to clean, process, and visualize the Slurm workloads. We are interested in identifying patterns in GPU utilization, job queueing times, and scheduling behaviors. Daily tasks may include writing Python scripts to parse Slurm logs, generating summary statistics, creating plots or dashboards to highlight trends, and discussing findings with the DS2 research lab. The goal of the project is to find insights that could help improve the efficiency of the GPU infrastructure. These efficiencies affect the energy and cooling costs of computing and, by extension, its carbon footprint.

Minimum qualifications: Proficiency with Python.
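
To give a sense of the log-parsing tasks described above, here is a minimal sketch that computes job queueing time (start time minus submit time) from pipe-delimited accounting records. It assumes a layout like the one produced by `sacct --parsable2 --format=JobID,Partition,Submit,Start,End,State`; the project's actual log format is not specified, and the sample rows and the helper names `parse_time` and `queue_waits` are hypothetical.

```python
import csv
import io
from datetime import datetime
from statistics import median

# Hypothetical sample records for illustration only; not real cluster data.
SAMPLE_LOG = """JobID|Partition|Submit|Start|End|State
1001|gpu|2024-01-01T08:00:00|2024-01-01T08:05:00|2024-01-01T09:05:00|COMPLETED
1002|gpu|2024-01-01T08:10:00|2024-01-01T08:40:00|2024-01-01T08:50:00|FAILED
1003|gpu|2024-01-01T09:00:00|Unknown|Unknown|PENDING
"""

TIME_FORMAT = "%Y-%m-%dT%H:%M:%S"


def parse_time(value):
    """Parse a timestamp, returning None for placeholders such as 'Unknown'."""
    try:
        return datetime.strptime(value, TIME_FORMAT)
    except ValueError:
        return None


def queue_waits(log_text):
    """Map each job ID to its queue wait in seconds, skipping unstarted jobs."""
    waits = {}
    for row in csv.DictReader(io.StringIO(log_text), delimiter="|"):
        submit = parse_time(row["Submit"])
        start = parse_time(row["Start"])
        if submit is None or start is None:
            continue  # job has not started, so no wait can be computed yet
        waits[row["JobID"]] = (start - submit).total_seconds()
    return waits


waits = queue_waits(SAMPLE_LOG)
print(waits)
print("median wait (s):", median(waits.values()))
```

The same per-job dictionary could feed summary statistics or plots grouped by partition or job state.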

Defense Against Malicious Large Language Models
Project mentors: Tianhao Wang and Yan Pang

In response to malicious users who jailbreak or fine-tune standard LLMs and sell them on the dark web, this project aims to develop a reliable defense method to detect outputs generated by compromised models. By identifying these outputs, we can help prevent users from being misled by deceptive content, such as phishing emails. Our approach involves introducing a backdoor trigger tag into a normal LLM, which, once activated, facilitates the recognition of maliciously generated text. Undergraduate students will contribute to dataset generation and preprocessing, and they will also participate in model training and subsequent result analysis.

Minimum qualifications: Students involved in this project are expected to have a basic understanding of LLMs and some prior experience with PyTorch.

Fairness and Compositionality in Large Language Models
Project mentors: Jundong Li and Song Wang

This project builds upon the insights of the Compositional Evaluation Benchmark (CEB) to evaluate and improve fairness in large language models (LLMs). The focus is on developing methods to test and quantify biases within LLMs by designing targeted evaluation sets and analyzing model behaviors in compositional scenarios. Undergraduate participants will take charge of curating datasets, implementing fairness metrics, and experimenting with small-scale language models to understand their compositionality and fairness properties. The project aims to create tools and methodologies that promote equitable AI systems, giving students hands-on experience with cutting-edge NLP research and ethical AI principles.

Minimum qualifications: Completion of an introductory programming course (e.g., Python or equivalent); familiarity with machine learning or natural language processing concepts.

Study Data in the World

Digital Identity, Surveillance, and Technology Governance
Project mentor: Aaron Martin

This research project investigates digital identity, surveillance, and technology governance through a critical lens. Potential case studies include biometrics, synthetic data, humanitarian innovation, digital connectivity, and data markets. Project members will work with Dr. Aaron Martin as research collaborators and potential co-authors to undertake literature reviews and other desk research on relevant topics with the aim of developing articles for peer-reviewed outlets.

Minimum qualifications: Experience conducting literature reviews and synthesizing research articles in the social sciences.

Civic Technology Documentation
Project mentors: Lane Rasberry and Jon Kropko

This project will develop and publish documentation for a civic technology resource. Community organizations have produced an abundance of technological resources, and large groups of community stakeholders use these resources, yet both the technology and its use often occur without documentation. There is almost no storytelling about the impact of volunteer-managed technology on communities. Your summer project will be to write these stories so that civic tech contributors can receive credit, and communities can learn from each other's efforts in developing local technologies.

Minimum qualifications: Experience or interest in technical documentation, library science, and/or digital publishing.

Where Does Data (Science) Come From?
Project mentor: Mar Hicks

Faculty and students in the School of Data Science are engaged in a variety of research projects that require the use and collection of both small and large data sets. Sometimes these data sets are collected or created by the researchers themselves, but just as often they are pre-existing data sets that researchers find and use. The students on this project will survey the data sets currently in use by SDS faculty and students to determine what kinds of data are shaping current and near-future SDS research and goals at UVA and beyond, and why. The broader impacts of this project include greater insight into the current shape of data science as a field, based on data from the non-data-science fields and sources that shape the questions and methods of data science research.

Minimum qualifications: Evidence of strong written and oral communication skills.
