Researchers at UT Southwestern Medical Center in Dallas have developed an artificial intelligence (AI) method that writes its own algorithms and may one day operate as an “automated scientist” to extract meaning from complex datasets.
“Our work is the first step in allowing researchers to use AI to directly convert complex data into new human-understandable insights,” said Milo Lin, Ph.D., assistant professor in the Lyda Hill Department of Bioinformatics, Biophysics, and the Center for Alzheimer’s and Neurodegenerative Diseases at UT Southwestern.
He noted that while researchers are increasingly employing AI and machine learning models, “these high-performing models provide limited new direct insights into the data.”
Lin co-led the study, published in Nature Computational Science, with first author Paul J. Blazek, M.D., Ph.D., who worked on the project as part of his thesis at UT Southwestern.
Building better neural networks
According to UTSW, the field of AI has exploded in recent years, with significant crossover from basic and applied scientific discovery to popular use.
One common branch of AI, known as neural networks, emulates the structure of the human brain by mimicking the way biological neurons signal one another, the university said. Neural networks are a form of machine learning, which creates outputs based on input data after learning on a “training” dataset, the school noted.
Lin said that although this tool has found significant use in applications such as image and speech recognition, conventional neural networks have significant drawbacks.
Most notably, they often don’t generalize far beyond the data they train on, and the rationale for their output is a “black box,” meaning researchers don’t have a way to understand how a neural network algorithm reached its conclusion.
UTSW said the study was supported by its High Impact Grant Program, which was started in 2001 and supports high-risk research offering high potential impact in basic science or medicine.
What is deep distilling?
Seeking to address both issues, the researchers said they developed a method called deep distilling.
According to UTSW, deep distilling automatically discovers algorithms, or “rules,” that explain observed input-output patterns in limited training data, the datasets used to train machine learning models.
That’s accomplished by training an essence neural network (ENN), previously developed in the Lin Lab, on input-output data. The parameters of the ENN that encode the learned algorithm are then translated into succinct computer code that users can read, the university said.
Testing an ‘automated scientist’
Researchers said they tested deep distilling in a variety of scenarios in which traditional neural networks cannot produce human-comprehensible rules and generalize poorly to very different data.
Those included cellular automata, in which grids contain hypothetical cells in distinct states that evolve over time according to a set of rules. Cellular automata are often used as model systems for emergent behavior in the physical, life, and computer sciences, UTSW noted.
The school said the type of grid the researchers used has 256 possible sets of rules. After seeing grids from only 16 of those rule sets, deep distilling was able to “learn” to correctly predict the hypothetical cells’ behavior under every rule set, summarizing all 256 in a single algorithm.
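To make the idea of one algorithm covering 256 rule sets concrete, here is a minimal Python sketch. It assumes the grids are one-dimensional binary cellular automata in which each cell's next state depends on itself and its two neighbors, a family that has exactly 256 rule sets; the article does not specify the type of automaton. The function name step, the wrap-around edges, and the rule-numbering convention are illustrative choices, and the code is not the algorithm deep distilling produced.

```python
def step(cells, rule):
    """Advance a 1D binary cellular automaton by one time step.

    cells: list of 0/1 cell states (edges wrap around; an assumption).
    rule:  integer from 0 to 255 selecting one of the 256 rule sets.
           Bit k of the rule is the next state for the neighborhood
           whose (left, center, right) states spell k in binary.
    """
    n = len(cells)
    next_cells = []
    for i in range(n):
        left = cells[(i - 1) % n]
        center = cells[i]
        right = cells[(i + 1) % n]
        # Encode the three-cell neighborhood as a number from 0 to 7.
        neighborhood = (left << 2) | (center << 1) | right
        # Look up that neighborhood's outcome in the rule's bits.
        next_cells.append((rule >> neighborhood) & 1)
    return next_cells


if __name__ == "__main__":
    row = [0, 0, 0, 1, 0, 0, 0]
    for _ in range(4):
        print("".join("#" if c else "." for c in row))
        row = step(row, 90)
```

Because the rule number is just another input, the same few lines handle every one of the 256 rule sets, which is the kind of compact summary the article describes.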
In another test, researchers trained deep distilling to accurately classify a shape’s orientation as vertical or horizontal.
According to UTSW, the method only needed a few training images of perfectly horizontal or vertical lines. However, it was able to use the short algorithm it found to correctly solve much more complicated cases, such as patterns with multiple lines or gradients and shapes made of boxes and zigzag, diagonal, or dotted lines.
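For a sense of what a short, readable orientation rule could look like, here is a hand-written Python sketch. It is a simple heuristic chosen for illustration, not the algorithm deep distilling found, which the article does not reproduce. The function name classify_orientation, the list-of-lists image format, and the tie-breaking choice are all assumptions.

```python
def classify_orientation(image):
    """Label a 2D grid of pixel intensities "horizontal" or "vertical".

    image: list of equal-length rows of numbers (hypothetical format).
    """
    rows, cols = len(image), len(image[0])

    # Total intensity change between side-by-side pixels in each row.
    change_along_rows = sum(
        abs(image[r][c + 1] - image[r][c])
        for r in range(rows)
        for c in range(cols - 1)
    )
    # Total intensity change between stacked pixels in each column.
    change_down_columns = sum(
        abs(image[r + 1][c] - image[r][c])
        for r in range(rows - 1)
        for c in range(cols)
    )

    # A horizontal pattern varies little along a row but a lot from one
    # row to the next; ties fall back to "vertical" (arbitrary choice).
    if change_down_columns > change_along_rows:
        return "horizontal"
    return "vertical"
```

A rule of this kind is written in terms of a general property of the image rather than specific pixel positions, which is one reason a short algorithm can carry over from single lines to dotted lines or gradients.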
Eventually, Lin said, deep distilling could be applied to the vast datasets generated by high-throughput scientific studies, such as those used for drug discovery, and act as an “automated scientist” — capturing patterns in results not easily discernible to the human brain, such as how DNA sequences encode functional rules of biomolecular interactions.
UT Southwestern said that deep distilling potentially could serve as a decision-making aid to doctors, offering insights on its “thought process” through the generated algorithms.