REDS projects vary significantly in discipline, level of coding complexity, and methodology, allowing students from virtually any background to participate and derive benefit from the program. The projects fall into three broad themes or clusters:
Community-Centered Analysis in Central Virginia
Project mentors: Beth Mitchell and Michele Claibourn
The Community-Centered Analysis team at the UVA Center for Community Partnerships is committed to authentic partnership and shared power. We work with community members, rather than “for” them. Through deep engagement with impacted communities, we gather insights and information to accurately measure and articulate regional patterns of racial and socioeconomic inequity. Guided by principles of justice and respect, transparency and reproducibility, and accessibility and collaboration, our work highlights regional assets and opportunities for collective action. We generate contextualized analysis and data platforms and tools, such as interactive maps and data visualizations, combined with narrative storytelling. These tools make shared information accessible for civic and community leaders to use in the pursuit of a more just community. Students will learn methods of data collection, data visualization, and analysis to contribute to data projects illuminating issues like education, housing, health, and environment.
Minimum qualifications: Experience with or strong interest in learning techniques for data processing and preparation, the R statistical language, and communication and storytelling with data.
Exploring and Visualizing Telehealth Access
Project mentors: Don Brown, Suchetha Sharma & Johanna Loomba
Telemedicine can improve access to care for patients lacking resources for in-person visits. In collaboration with the Karen Rheuban Center for Telehealth, the UVA School of Data Science (SDS), and iTHRIV, we are applying data science to identify which patients benefit most from telehealth. This insight will guide funding, service expansion, and targeting of high-impact areas. Our team has developed a research-ready telehealth dataset combining UVA patient data with public Social Drivers of Health (SDoH) data. This structured dataset enables analysis of clinical and demographic features of telehealth users, currently focusing on comparisons between patients in Southwest Virginia and other regions. REDS participants will use aggregated datasets to explore and compare patient populations across Virginia, including their own regions. They will create summary tables, visualizations, and predictive models to evaluate telehealth impact. Using Python, participants will work with Pandas, Matplotlib, Seaborn, and Plotly to analyze data and build interactive regional maps.
Minimum qualifications: Basic knowledge of Python.
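To give a sense of the kind of regional summary table described for the telehealth project, here is a minimal Pandas sketch. The rows, region labels, and column names (region, visit_type, n_visits) are made up for illustration and are not drawn from the project's dataset, whose actual layout is not described here.

```python
import pandas as pd

# Made-up aggregated counts for illustration only; column names are assumptions.
visits = pd.DataFrame(
    {
        "region": ["Southwest", "Southwest", "Central", "Central", "Northern"],
        "visit_type": ["telehealth", "in_person", "telehealth", "in_person", "in_person"],
        "n_visits": [30, 70, 20, 180, 50],
    }
)

# One row per region, one column per visit type; regions with no
# visits of a given type get 0 instead of NaN.
summary = visits.pivot_table(
    index="region",
    columns="visit_type",
    values="n_visits",
    aggfunc="sum",
    fill_value=0,
)

# Share of all visits in each region that were delivered by telehealth.
total = summary["telehealth"] + summary["in_person"]
summary["telehealth_share"] = summary["telehealth"] / total

print(summary)
```

A table like this could then feed a Matplotlib or Plotly chart comparing regions.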
Investigating Early Childhood Growth Trends
Project mentor: Heman Shakeri
Join us in exploring the complex dynamics of early childhood growth and malnutrition through the analysis of rich longitudinal data from birth cohort studies. This project offers an opportunity to analyze real-world health data, uncover trends in growth faltering, and identify key factors that impact children's development in their crucial early years. By analyzing growth metrics such as length-for-age, weight-for-height, and weight-for-age z-scores, you will contribute to understanding the multifaceted issue of childhood malnutrition.
Minimum qualifications: Completion of introductory courses in statistics or data science; basic proficiency in programming, preferably in R or Python; ability to work with large datasets and perform data cleaning and manipulation; strong analytical skills and attention to detail.
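As a rough illustration of how a growth z-score such as length-for-age can be computed, the sketch below uses the LMS formula, z = ((x / M)^L - 1) / (L * S), which underlies the WHO growth standards. The reference values for L, M, and S and the child's measurements are hypothetical placeholders, not values from the WHO tables or from the cohort data, and the project may compute these scores differently.

```python
import math


def lms_zscore(x, L, M, S):
    """Z-score of measurement x given LMS reference parameters.

    L is the Box-Cox power, M the reference median, S the coefficient
    of variation. When L is 0 the formula reduces to log(x / M) / S.
    """
    if L == 0:
        return math.log(x / M) / S
    return ((x / M) ** L - 1) / (L * S)


# Hypothetical reference parameters by age in months (NOT WHO table values).
reference = {0: (1, 50.0, 0.04), 6: (1, 66.0, 0.04), 12: (1, 75.0, 0.04)}

# Hypothetical lengths (cm) for one child measured at three ages.
lengths = {0: 49.0, 6: 62.0, 12: 68.0}

# Length-for-age z-score at each visit.
laz = {age: lms_zscore(lengths[age], *reference[age]) for age in lengths}

# A length-for-age z-score below -2 is the conventional cutoff for stunting.
stunted = [laz[age] < -2 for age in sorted(laz)]

for age in sorted(laz):
    print(age, round(laz[age], 2))
```

Tracking how these scores drift downward across visits is one simple way to describe growth faltering in longitudinal data.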
Analyzing and Characterizing Slurm Job Workloads
Project mentor: Yue Cheng
REDS participants will analyze GPU job workloads from UVA's Rivanna / Afton high performance computing clusters. You will work with datasets of job logs and system metrics to clean, process, and visualize the Slurm workloads. We are interested in identifying patterns in GPU utilization, job queueing times, and scheduling behaviors. Daily tasks may include writing Python scripts to parse Slurm logs, generating summary statistics, creating plots or dashboards to highlight trends, and discussing findings with the DS2 research lab. The goal of the project is to find insights that could help improve the efficiency of the GPU infrastructure. Efficiency gains affect the energy and cooling costs of computing and, by extension, its carbon footprint.
Minimum qualifications: Proficiency with Python.
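The kind of log parsing and summary statistics mentioned for the Slurm project might look something like the sketch below. It assumes the job logs resemble the pipe-delimited output of `sacct --parsable2` with the JobID, Partition, Submit, Start, End, and State fields; the sample rows are made up, and the actual Rivanna / Afton log format may differ.

```python
import csv
import io
from datetime import datetime
from statistics import median

# Made-up sample rows in a pipe-delimited, sacct-like layout (an assumption).
SAMPLE_LOG = """JobID|Partition|Submit|Start|End|State
1001|gpu|2025-01-06T09:00:00|2025-01-06T09:05:00|2025-01-06T10:05:00|COMPLETED
1002|gpu|2025-01-06T09:10:00|2025-01-06T09:40:00|2025-01-06T09:55:00|FAILED
1003|gpu|2025-01-06T09:15:00|2025-01-06T10:15:00|2025-01-06T12:15:00|COMPLETED
"""


def parse_jobs(text):
    """Return one dict per job with queue wait and run time in minutes."""
    jobs = []
    for row in csv.DictReader(io.StringIO(text), delimiter="|"):
        # Real logs can contain non-timestamp values (e.g. jobs that never
        # started); this sketch assumes every field is a valid ISO timestamp.
        submit = datetime.fromisoformat(row["Submit"])
        start = datetime.fromisoformat(row["Start"])
        end = datetime.fromisoformat(row["End"])
        jobs.append(
            {
                "job_id": row["JobID"],
                "state": row["State"],
                "wait_min": (start - submit).total_seconds() / 60,
                "run_min": (end - start).total_seconds() / 60,
            }
        )
    return jobs


jobs = parse_jobs(SAMPLE_LOG)
median_wait = median(job["wait_min"] for job in jobs)
failed_share = sum(job["state"] == "FAILED" for job in jobs) / len(jobs)

print(f"median queue wait: {median_wait:.0f} min")
print(f"share of failed jobs: {failed_share:.2f}")
```

The same per-job records could be loaded into a data frame for plotting queue times by partition or time of day.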
Defense Against Malicious Large Language Models
Project mentors: Tianhao Wang and Yan Pang
In response to malicious users who jailbreak or fine-tune standard LLMs and sell them on the dark web, this project aims to develop a reliable defense method to detect outputs generated by compromised models. By identifying these outputs, we can help prevent users from being misled by deceptive content, such as phishing emails. Our approach involves introducing a backdoor trigger tag into a normal LLM, which, once activated, facilitates the recognition of maliciously generated text. Undergraduate students will contribute to dataset generation and preprocessing, and they will also participate in model training and subsequent result analysis.
Minimum qualifications: Students involved in this project are expected to have a basic understanding of LLMs and some prior experience with PyTorch.
Fairness and Compositionality in Large Language Models
Project mentors: Jundong Li and Song Wang
This project builds upon the insights of the Compositional Evaluation Benchmark (CEB) to evaluate and improve fairness in large language models (LLMs). The focus is on developing methods to test and quantify biases within LLMs by designing targeted evaluation sets and analyzing model behaviors in compositional scenarios. Undergraduate participants will take charge of curating datasets, implementing fairness metrics, and experimenting with small-scale language models to understand their compositionality and fairness properties. The project aims to create tools and methodologies that promote equitable AI systems, giving students hands-on experience with cutting-edge NLP research and ethical AI principles.
Minimum qualifications: Introductory programming course (e.g., Python or equivalent); familiarity with machine learning or natural language processing concepts.
Digital Identity, Surveillance, and Technology Governance
Project mentor: Aaron Martin
This research project investigates digital identity, surveillance, and technology governance through a critical lens. Potential case studies include biometrics, synthetic data, humanitarian innovation, digital connectivity, and data markets. Project members will work with Dr. Aaron Martin as research collaborators and potential co-authors to undertake literature reviews and other desk research on relevant topics with the aim of developing articles for peer-reviewed outlets.
Minimum qualifications: Experience conducting literature reviews and synthesizing research articles in the social sciences.
Civic Technology Documentation
Project mentors: Lane Rasberry and Jon Kropko
This project will develop and publish documentation for a civic technology resource. Community organizations have produced an abundance of technological resources, and large groups of community stakeholders use these resources, yet both the technology and its use often occur without documentation. There is almost no storytelling about the impact of volunteer-managed technology on communities. Your summer project will be to write these stories so that civic tech contributors can receive credit, and communities can learn from each other's efforts in developing local technologies.
Minimum qualifications: Experience or interest in technical documentation, library science, and/or digital publishing.
Where Does Data (Science) Come From?
Project mentor: Mar Hicks
Faculty and students in the School of Data Science are engaged in a variety of research projects that require the use and collection of both small and large data sets. Sometimes these data sets are collected or created by the researchers themselves, but just as often they are pre-existing data sets that researchers find and use. The students on this project will undertake a survey of the datasets currently in use by SDS faculty and students to try to determine what kinds of data are shaping current and near-future SDS research and goals at UVA and beyond — and why. The broader impacts and implications of this project include giving more insight into the current shape of data science as a field, based on data from the non-data-science fields and sources that shape the questions and methods of data science research.
Minimum qualifications: Evidence of strong written and oral communication skills.