Data Science News

Using Neural Networks to Investigate the Spread of Disease

September 18, 2019
Eli Draizan and Jack Lanchantin
Presidential Fellows Jack Lanchantin and Eli Draizen 

Malaria affects between 300 and 600 million people every year. UVA doctoral students Eli Draizen and Jack Lanchantin want to help change that. 

Their research started with questions about disease pathways. How do diseases, such as malaria, enter and attack the human body? Is it possible to identify the pathways these disease agents (pathogens) take and stop them? 

Before teaming up with Lanchantin, Draizen, a Ph.D. candidate in Biomedical Engineering, had already been studying pathogen and cell relations, specifically in a Toxoplasma Gondii lab. 

In a Toxo-what lab?

Toxoplasma Gondii (taak·sow·plaz·muh gon·dee), although a mouthful of a name, is a microscopic single-cell parasite. However, this small pathogen is correlated with a deadly, widespread disease—malaria. 

“Malaria is caused by a parasite called Plasmodium falciparum, which diverged from T. gondii  about 350 million years ago,” Draizen explained. “Since they share a common evolutionary history, learning about T. gondii will help us understand more about malaria even though they are not the same parasite.” 

Studying Parasites 

“In a cell, there are hundreds of thousands of protein molecules floating around in a very crowded environment,” says Draizen. “However, only some of the proteins will interact with each other. These interactions are the basis of many functions of a cell—to respond to the environment, produce energy, and control the cell cycle, etc.—and if any of the interactions happen incorrectly, it could lead to disease.”

With a better understanding of this invasion process through studying T. gondii, a relative of the parasite which causes malaria, scientists hope to gain a better grasp on how this disease that kills millions of people actually starts, targets, and enters human cells.

“We have to look at this parasite [T. Gondii] to understand how malaria operates,” Draizen says. “We are interested in interactions that happen on the surface of cells. We hypothesis that these parasites have proteins that look like human proteins and ‘trick’ the human proteins to interact with them instead of their normal interaction partner. We are trying to develop new algorithms to identify two proteins that are going to interact with each other.”

Presidential Fellows in Data Science

Draizen and Lanchantin’s project is one of six projects funded by the Presidential Fellows in Data Science Program through the School of Data Science and the Office of Graduate and Postdoctoral Affairs. The fellowship provides training and funding for graduate students across a variety of fields of study to collaborate on projects that address real-world problems using traditional research methods alongside cutting-edge data science tools and techniques.

Draizen has experience in biomedical research through his work as a Postbaccalaureate Fellow at the National Center for Biotechnology Information, a Graduate Student Intern at Harvard Medical School, and a Predoctoral Fellow at the National Institutes of Health. His experience is a great fit for this presidential fellow’s project, which requires an understanding of specific pathogens and their interactions with cells. 

As Draizen began the process of identifying patterns among pathways that pathogens take, he also began teaching himself how to develop algorithms predicting which proteins will interact. He soon realized that he wanted to work with someone who knew this material well.

That is where Jack Lanchantin comes in. 

Lanchantin is a Ph.D. candidate in Computer Science at the University of Virginia. As a 5th year doctoral student, he has studied computational models extensively, honing in on his passion for attention methods for deep learning, as well as interpretable machine learning models for biomedical data. Lanchantin has been a part of numerous research projects, many of which have been published including “Memory Matching Networks for Genomic Sequence Classification,” “Neural Message Passing for Multi-Label Classification,” and many more. 

Lanchantin’s interest in applying computer science to issues in the biomedical field sparked an interest in teaming up with Draizen.  

“Our lab has been focusing on these biomedical machine-learning methods, different applications, but we never really understood the tasks well from just a computer science background,” explained Lanchantin.

With Draizen looking for someone with expertise in developing algorithms and Lanchantin wanting to apply machine-learning methods to biomedical data, the two make a perfect team for a project that overlaps into both domains.

Data Engineering 

The initial step is an extensive data searching and cleaning process on the biomedical side, facilitated by Draizen.

“We are collecting the data by taking known protein-protein interactions—crystal structures that are found in the protein data bank, which is the database with every known protein structure out there,” Draizen said. “We can mine that information to find any patterns in how two proteins bind.” 

After this extensive data mining and cleaning step, which Draizen noted was a time-consuming process, Lanchantin steps in to initiate the computer science side of the project. 

“The next step is to see if we can come up with a computational model that can accurately predict where two proteins will interact,” Lanchantin stated. 

To do this, Draizen and Lanchantin are using graph neural networks.

Graph Neural Networks

“Graph Neural Networks is a general class of methods that are used to classify data that have no spatial structure to them,” Lanchantin explained. 

This process allows proteins to be generalized as graphs. Lanchantin described these graphs as having various nodes, each of which identify a residue in a protein. 

“It’s not like a typical image where you have this spatial structure or linear sequence,” Lanchantin said. “It [Graph Neural Networks] is basically trying to classify structures that don’t have this smooth spatial structure.”

Graph Neural Networks are used in other areas of research, too. Lanchantin noted that this is the method most often used to classify and analyze social nodes and networks. These two doctoral researchers decided to take this approach and apply it to their research with proteins and pathogens.

Goals and Potential Outcomes

If Draizen and Lanchantin can validate that this method works, the potential impacts of this project are significant. Their research could make it possible to accurately identify the pathways that malaria parasites are targeting. Scientists, doctors, pharmacists, and researchers could then fully understand how the malaria pathogen finds healthy host cells and binds with them. 

With this process identified and interpreted, it could open the doors for the development of a drug capable of inhibiting these pathways. 

While it is too early to know the results and outcomes from their work together, Draizen and Lanchantin each bring expertise from their own areas of research to this interdisciplinary  collaboration. Together, they have begun a project that has the potential to not only better understand malaria, but also possibly develop a drug that can inhibit the disease, thus changing millions of lives.