Sharing Knowledge

Casey Wagner
September 17, 2019

UVA's School of Data Science is building a culture of open access to drive discovery

The University of Virginia School of Data Science is partnering with the UVA Library to expand data science resources by establishing the Open Data Lab, a data sharing network that will support the entire research process, from data collection through analysis and data use.

“The Open Data Lab is a framework to share open datasets while simultaneously enabling computation,” Open Data Lab Project Manager and Data Scientist Peter Alonzi said. “The goal is to support the complete research life cycle, consisting of data and its storage, analytics, and its execution and narrative.” 

The network will allow users to access and study data sets through an integrated execution environment, an online computation system built for large volumes of data. The Open Data Lab aims to resolve issues associated with storing and analyzing information from large data sets.

“The main benefit of this platform is access to open datasets and computation tools,” Dr. Alonzi said. “This means many of the burdens of working with data at scale are mitigated.”

Data at scale can refer to any data set that is large in volume or diverse in type. Traditional data analytics methods are not well equipped to study this kind of data, but the Open Data Lab will aid students and researchers in understanding these vast and complex data sets through newly designed computing algorithms. 

The Open Data Lab is designed to support several different use cases, accounting for a broad range of free-use data possibilities. Scientists with novel data sets can transfer their findings into the system, allowing their analysis and computations to be freely accessed. The system will also give researchers access to other scientists’ data sets and source code, allowing them to reproduce results or develop their own adaptations.

The Open Data Lab is being implemented in progressive stages, beginning with the ‘technology exploration’ phase. This step consisted of testing data storage and accessibility, starting with data from the DSI Master of Science in Data Science (MSDS) 2018 Capstone groups. The project will eventually expand beyond the DSI to include the entire UVA and data science communities.

To follow the project, check out the GitHub repository here: