Recent advancements in natural language processing research have yielded popular chatbot systems, including Apple’s Siri and Microsoft’s Cortana.
One potential application of this technology, with implications for international relations, diplomacy, and security, is training military personnel to interact effectively with foreign civilian populations. To develop such systems, however, a dataset of possible user statements must first be collected.
Master of Science in Data Science capstone project researchers Vaibhav Sharma, Beni Shpringer, and Michael Yang, along with UVA School of Engineering M.S. student Martin Bolger and Ph.D. students Sodiq Adewole and Erfaneh Gharavi, sought to develop new methods for collecting, generating, and labeling data to aid in the creation of educational, free-input dialogue simulations.
This project included an evaluation of various online data collection methods, the creation of a labeling system for capturing text details, and the development of a data generation algorithm for creating a large dataset from a small input set. The approach developed in this project can be used to obtain data for training natural language processing models for a chatbot system.
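To give a sense of how a large dataset can be generated from a small input set, the sketch below expands a few labeled seed statements by swapping in interchangeable words and phrases. It is only an illustration under stated assumptions and is not the team's algorithm: the seed statements, the labels, the substitution table, and the function names `expand` and `generate_dataset` are all hypothetical.

```python
from itertools import product

# Hypothetical seed data: (statement, label) pairs. The labels are illustrative only.
SEED = [
    ("hello can you help me find the market", "request_directions"),
    ("hello how are you today", "greeting"),
]

# Hypothetical substitution table: a word mapped to interchangeable alternatives.
ALTERNATIVES = {
    "hello": ["hello", "hi", "good morning"],
    "help": ["help", "please help"],
    "market": ["market", "clinic", "school"],
}


def expand(statement, alternatives):
    """Return every variant of a statement obtained by substituting alternatives."""
    # For each word, collect its options; words without alternatives stay fixed.
    options = [alternatives.get(word, [word]) for word in statement.split()]
    # The Cartesian product of the options yields one variant per combination.
    return [" ".join(choice) for choice in product(*options)]


def generate_dataset(seed, alternatives):
    """Expand every seed statement; each variant keeps the label of its seed."""
    dataset = []
    for statement, label in seed:
        for variant in expand(statement, alternatives):
            dataset.append((variant, label))
    return dataset


if __name__ == "__main__":
    data = generate_dataset(SEED, ALTERNATIVES)
    print(len(data), "labeled statements generated from", len(SEED), "seeds")
    for text, label in data[:5]:
        print(label, "|", text)
```

Because each variant inherits the label of the statement it came from, the labeling effort is spent only on the small seed set. A real system would also need to filter out variants that are ungrammatical or that change the meaning of the original statement.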
In future research, the team plans to examine the results of using this approach in the development of an educational simulation with a free-input dialogue system.