In cooperation with the U.S. National Science Foundation National Radio Astronomy Observatory (NSF NRAO), this project builds on previous work to develop a machine learning pipeline that streamlines proposal submissions for astronomers using the Atacama Large Millimeter/submillimeter Array (ALMA). As their capstone project in the Master's in Data Science Residential program, researchers Riya Pulla, Krishna Kumar, and Carter Day explored improvements to the ALMA proposal classification model.

They aimed to improve the accuracy and usability of the frequency range recommendations provided to astronomers and to create a user interface for scientists.

Analytically, the team focused on optimizing the topic modelling segment of the existing pipeline by incorporating astroBERT, an astrophysics semantic language model, and comparing various clustering techniques to find which model provided the best average measurement hit rate per project.
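The summary does not define the hit rate metric, so the following is only a minimal sketch under an assumed definition: a project's hit rate is the fraction of its measured frequencies that fall inside any recommended frequency range, and the reported figure is the mean of those per-project rates. The function names, the data layout, and the example values are all hypothetical and are not taken from the team's pipeline.

```python
from typing import Dict, List, Tuple

# Assumed representation: a recommended range is (low_GHz, high_GHz), inclusive.
FrequencyRange = Tuple[float, float]


def project_hit_rate(measured_ghz: List[float],
                     recommended: List[FrequencyRange]) -> float:
    """Fraction of one project's measured frequencies covered by any range."""
    if not measured_ghz:
        return 0.0
    hits = sum(
        any(low <= freq <= high for low, high in recommended)
        for freq in measured_ghz
    )
    return hits / len(measured_ghz)


def average_hit_rate(projects: Dict[str, List[float]],
                     recommendations: Dict[str, List[FrequencyRange]]) -> float:
    """Mean of the per-project hit rates (assumed definition of the metric)."""
    rates = [
        project_hit_rate(freqs, recommendations.get(name, []))
        for name, freqs in projects.items()
    ]
    return sum(rates) / len(rates) if rates else 0.0


# Invented example data, for illustration only.
projects = {
    "proj_a": [100.0, 230.5, 345.0],
    "proj_b": [90.0, 110.0],
}
recommendations = {
    "proj_a": [(84.0, 116.0), (211.0, 275.0)],
    "proj_b": [(84.0, 116.0)],
}

print(f"average hit rate: {average_hit_rate(projects, recommendations):.3f}")
```

Averaging per project rather than pooling all measurements keeps a project with many measurements from dominating the score, which is one plausible reading of "per project" in the description above.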

They found that incorporating astroBERT does not necessarily improve performance: the best combination pairs Latent Dirichlet Allocation with Spectral Clustering, achieving an average hit rate of 96 percent.
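As a rough illustration of how Latent Dirichlet Allocation and Spectral Clustering can be chained, the sketch below uses scikit-learn to turn texts into topic mixtures and then cluster those mixtures. It is not the team's implementation: the toy abstracts, the `cluster_abstracts` helper, and every parameter value (number of topics, number of clusters, RBF affinity) are assumptions chosen only to keep the example small.

```python
from sklearn.cluster import SpectralClustering
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy stand-ins for proposal abstracts (invented for illustration).
abstracts = [
    "molecular gas in protoplanetary disks around young stars",
    "dust continuum emission from protoplanetary disks",
    "carbon monoxide lines in disks around young stars",
    "star formation in high redshift galaxies",
    "molecular gas and dust in distant galaxies",
    "redshift survey of star forming galaxies",
]


def cluster_abstracts(texts, n_topics=3, n_clusters=2, seed=0):
    """Hypothetical helper: LDA topic mixtures followed by spectral clustering."""
    # Bag-of-words counts are the usual input for LDA.
    counts = CountVectorizer(stop_words="english").fit_transform(texts)

    # Each row of topic_mix is one document's distribution over topics.
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    topic_mix = lda.fit_transform(counts)

    # Cluster documents by the similarity of their topic mixtures.
    clusterer = SpectralClustering(
        n_clusters=n_clusters, affinity="rbf", random_state=seed
    )
    labels = clusterer.fit_predict(topic_mix)
    return topic_mix, labels


topic_mix, labels = cluster_abstracts(abstracts)
for text, label in zip(abstracts, labels):
    print(label, text)
```

In a comparison like the one described above, the topic-mixture step could be swapped for embeddings from a language model such as astroBERT while the clustering step stays the same, so that each combination can be scored on the same metric.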

The team also developed an interactive exploratory dashboard that lets researchers visualize key aspects of their proposals, such as topic clusters, commonly observed frequencies, and associated chemicals of scientific interest.

Researchers: Carter Day, Krishna Kumar, Riya Pulla

Sponsor: Adele Plunkett

Faculty Advisor: Aidong Zhang

Completed in: 2025