Finally, machine learning interprets gene regulation clearly
The algorithms are a type of artificial neural network (ANN). Inspired by the way neurons connect and branch in the brain, ANNs are the computational foundation of advanced machine learning. Despite their name, ANNs are not used exclusively to study brains.
Biologists such as Tareen and Kinney use ANNs to analyze data from an experimental method called a “massively parallel reporter assay” (MPRA), which investigates DNA. Using this data, quantitative biologists can build ANNs that predict which molecules control specific genes, a process called gene regulation.
Cells don’t need all proteins all the time. Instead, they rely on complex molecular mechanisms to turn the genes that produce proteins on or off as needed. When that regulation fails, disorder and disease usually follow.
“That mechanistic knowledge — understanding how something like gene regulation works — is very often the difference between being able to develop molecular therapies against diseases, and not being able to,” Kinney said.
Unfortunately, the way standard ANNs are shaped from MPRA data is very different from how scientists ask questions in the life sciences. This misalignment makes it difficult for biologists to interpret how gene regulation occurs.
Now, Kinney and Tareen have developed a new approach that bridges the gap between computational tools and how biologists think. They created custom ANNs that mathematically reflect common concepts in biology concerning genes and the molecules that control them. In this way, the pair are essentially forcing their machine learning algorithms to process data in a way that a biologist can understand.
These efforts, Kinney explained, highlight how modern, industrial AI technologies can be optimized for use in the life sciences. Having verified this new strategy for making custom ANNs, Kinney’s lab is applying it to investigate a wide variety of biological systems, including key gene circuits involved in human disease.