Nearly all of our daily technology-driven activities — checking an online weather forecast, using a smartphone to find the nearest gas station, even basic web searches — are made possible by open data.
Open data is information, usually from government organizations or large corporations, that anyone can freely access and use. More institutions are now recognizing the importance of transparency and the innovation made possible with open data.
The University of Virginia Data Science Institute is partnering with the UVA Library to expand data science resources through the establishment of the Open Data Lab, a data sharing network that will support research through its entire process — from the data collection stage through analysis and data use.
“The Open Data Lab is a framework to share open datasets while simultaneously enabling computation,” UVA Senior Research Data Scientist Peter Alonzi said. “The goal is to support the complete research life cycle, consisting of data and its storage, analytics and its execution and narrative.”
The network will allow users to access and study data sets through an integrated execution environment, an online computation system built to handle large volumes of data. The Open Data Lab aims to resolve issues associated with storing and analyzing information from large data sets.
“The main benefit of this platform is access to open datasets and computation tools,” Alonzi said. “This means many of the burdens of working with data at scale are mitigated.”
Data at scale can refer to any data set that is large in volume or diverse in type. Traditional data analytics methods are not well equipped to study this kind of data, but the Open Data Lab will aid students and researchers in understanding these vast and complex data sets through newly designed computing algorithms.
The Open Data Lab is designed to support several different use cases, accounting for a broad range of free-use data possibilities. Scientists with novel data sets can transfer their findings into the system, allowing their analysis and computations to be freely accessed. The system will also give researchers access to other scientists’ data sets and source code, allowing them to reproduce results or develop their own adaptations.
“The Open Data Lab will create greater cost-effectiveness and higher productivity,” Alonzi said, “allowing for better data reuse to be achieved.”
The Open Data Lab is being implemented in progressive stages, beginning with the “technology exploration” phase. This step consisted of testing data storage and accessibility, starting with data from the DSI Master of Science in Data Science (MSDS) 2018 Capstone groups. The project will eventually expand beyond the DSI to include the entire UVA and data science communities.
The program will expand upon UVA data science services currently being provided through the StatLab, a program that provides students and researchers with expert consultation and training in data science and computational methods.