While it has long been speculated that the number of COVID-19 cases is significantly higher than the number reported, a new machine learning algorithm created by UT Southwestern Medical Center researchers from the Lyda Hill Department of Bioinformatics supports this theory.
According to the algorithm, over 71 million people in the U.S. have contracted the virus. That is almost three times the 26.7 million publicly reported confirmed cases, according to Jungsik Noh, Ph.D., a UT Southwestern assistant professor in the Lyda Hill Department of Bioinformatics.
Noh is the first author of a new study on the algorithm, which was published on February 8 in PLOS ONE.
Although Noh’s machine learning algorithm offers only rough estimates because of the uncertainties surrounding the coronavirus, Noh believes its estimates are more accurate and miss fewer infections than the confirmed case counts currently used to guide public health policy, according to a statement.
“The estimates of actual infections reveal for the first time the true severity of COVID-19 across the U.S. and in countries worldwide,” Noh said in a statement.
Over time, the algorithm’s estimates and the reported case counts have grown closer, but they still differ significantly. For example, the algorithm estimates that Brazil had over 36 million cumulative cases on February 4, around four times its 9.4 million confirmed cases.
Funding for the project came from Dallas-based Lyda Hill Philanthropies, which funds advances in science and nature, empowers nonprofits, and works to improve Texas and Colorado communities.
A personal project
The algorithm grew out of a personal need for Noh.
During summer 2020, he was considering whether he should send his sixth-grade daughter back to school in person, but couldn’t find the data he needed to determine if it would be safe to do so.
Drawing on his background in statistical methods for biomedical data, he decided to build a machine learning algorithm himself. After finding that the area where he lived had an infection rate of around 1 percent at the time, Noh decided to send his daughter back to school.
How it works
The algorithm starts from the number of reported deaths, which are considered more accurate and complete than lab-confirmed case counts, and applies an infection fatality rate of 0.66 percent taken from an earlier study of the pandemic in China.
The algorithm also accounts for other factors, such as the average number of days from the onset of symptoms to death or recovery, according to a statement.
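The death-based approach described above can be illustrated with a deliberately simplified Python sketch. It is not the study’s model: it only divides reported deaths by the 0.66 percent infection fatality rate and shifts them back by a single fixed infection-to-death delay. The 21-day delay (`LAG_DAYS`) and the function names are hypothetical choices for illustration, since the article does not give the timing values the algorithm uses.

```python
from typing import List, Sequence

# Infection fatality rate of 0.66 percent, the value cited in the article.
IFR = 0.0066

# Hypothetical infection-to-death delay in days (an assumption for this sketch;
# the article says such timing is considered but does not state the values).
LAG_DAYS = 21


def back_calculate_infections(daily_deaths: Sequence[float],
                              ifr: float = IFR,
                              lag: int = LAG_DAYS) -> List[float]:
    """Estimate daily new infections from daily reported deaths.

    Deaths reported on day i + lag are attributed to infections on day i and
    scaled up by 1 / ifr. The result has len(daily_deaths) - lag entries,
    because infections in the final `lag` days have not yet led to deaths.
    """
    if not 0 < ifr < 1:
        raise ValueError("ifr must be between 0 and 1")
    if lag < 0:
        raise ValueError("lag must be non-negative")
    return [deaths / ifr for deaths in daily_deaths[lag:]]


def cumulative_infections(daily_deaths: Sequence[float], ifr: float = IFR) -> float:
    """Total infections implied by total reported deaths: sum(deaths) / ifr."""
    return sum(daily_deaths) / ifr


if __name__ == "__main__":
    deaths = [0, 0, 1, 2, 3]
    print(back_calculate_infections(deaths, lag=2))
    print(cumulative_infections([33, 33]))
```

For example, `cumulative_infections([33, 33])` returns roughly 10,000, because each reported death implies about 1 / 0.0066 ≈ 152 infections. A fuller model would spread each death over a distribution of possible delays rather than one fixed lag.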
To confirm his findings, Noh compared his results to existing prevalence rates found in other studies that used blood tests to check for antibodies to the SARS-CoV-2 virus, which causes COVID-19. His algorithm’s estimates of infections turned out to be similar to the percentage of people who tested positive for the antibodies.
“The currently infected population is the cause of future infections and deaths,” Noh said. “Its actual size in a region is a crucial variable required when determining the severity of COVID-19 and building strategies against regional outbreaks.”
Daily updated estimates of cumulative infections and of the number of people currently infected, across the U.S. and in the 50 countries hit hardest by the ongoing pandemic, are published online.
Noh’s online model uses COVID-19 death data from Johns Hopkins University and The COVID Tracking Project to run its daily updates.
Gaudenz Danuser, Ph.D., chair of the Lyda Hill Department of Bioinformatics and professor of cell biology, was the study’s senior author.