Research team creates statistical model to predict COVID-19 resistance
Researchers from Johns Hopkins Medicine and The Johns Hopkins University have created and preliminarily tested what they believe may be one of the first models for predicting who has the highest probability of being resistant to COVID-19 in spite of exposure to SARS-CoV-2, the virus that causes it.
The study is reported online in the journal PLOS ONE.
For its study, the research team set out to determine if a machine-learning statistical model could use health characteristics stored in electronic health records — providing patient data such as comorbidities (other medical conditions) and prescribed medications — as a means to pinpoint people with a natural ability to avoid SARS-CoV-2 infection. Those persons, says Yang, could then be studied to better understand the factors enabling their resistance.
To demonstrate the model’s ability to predict COVID-19 resistance, the researchers first acquired data from a clinical registry called the Johns Hopkins COVID-19 Precision Medicine Analytics Platform Registry (JH-CROWN). The registry contains information for patients seen within the Johns Hopkins Health System who have been suspected of, or confirmed as, having a SARS-CoV-2 infection.
For their resistance study, the researchers included only individuals who received a COVID-19 test between June 10, 2020, and Dec. 15, 2020, and who reported “potential exposure to the virus” as the reason for testing.
The ending date was the point at which large-scale COVID-19 vaccination efforts started in the United States. Choosing this date, the researchers say, kept protection from vaccines, as opposed to natural resistance, from influencing their findings.
The 8,536 study participants who reported exposure as their reason for getting tested for COVID-19 were divided into two groups: those who either did not share a residence (called a “household” in this study) with any COVID-19 patients or whose residence had 10 or more patients; and those who shared a residence with 10 or fewer people, at least one of whom was a COVID-19 patient. The first group, with 8,476 of the participants, was designated the Training and Testing Set, while the second group, called the Household Index (HHI) Set, had 60 members and was used as a separate testing set.
Keeping the household number to 10 or fewer, the researchers say, excluded people living in apartment complexes, dormitories and other higher-density, multi-unit living areas where exposure to a particular person positive for SARS-CoV-2 would be less intense.
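The selection and split described above can be pictured with a minimal Python sketch. It is not the researchers' code: the record fields (test_date, reason, household_size, household_has_patient) and the set labels are hypothetical names chosen for illustration, not the JH-CROWN schema, and placing every eligible participant outside the HHI criteria into the Training and Testing Set is a simplifying assumption.

```python
from dataclasses import dataclass
from datetime import date

# Testing window and household cap as stated in the article.
WINDOW_START = date(2020, 6, 10)
WINDOW_END = date(2020, 12, 15)
MAX_HOUSEHOLD_SIZE = 10


@dataclass
class Participant:
    # Hypothetical fields for illustration only (not the registry's schema).
    test_date: date
    reason: str
    household_size: int
    household_has_patient: bool


def is_eligible(p: Participant) -> bool:
    """Tested inside the window and reported exposure as the reason."""
    in_window = WINDOW_START <= p.test_date <= WINDOW_END
    return in_window and p.reason == "potential exposure"


def assign_set(p: Participant):
    """Return 'HHI', 'training_testing', or None if excluded from the study."""
    if not is_eligible(p):
        return None
    # HHI Set: shares a residence of 10 or fewer people with a COVID-19 patient.
    if p.household_has_patient and p.household_size <= MAX_HOUSEHOLD_SIZE:
        return "HHI"
    # Assumption of this sketch: all other eligible participants go here.
    return "training_testing"
```

With the counts reported in the article, the two sets account for all participants: 8,476 + 60 = 8,536.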
To identify patterns and cluster participants so that those naturally resistant to SARS-CoV-2 would stand out, both study sets were analyzed using the Maximal-frequent All-confident pattern Selection Pattern-based Clustering (MASPC) algorithm. MASPC is designed specifically for analyzing electronic health record data. It combines patient demographic information (age, sex and race), the International Statistical Classification of Diseases and Related Health Problems (ICD) medical diagnostic codes relevant to each case, outpatient medication orders and the number of comorbidities (other diseases) present.
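The "frequent" and "all-confident" parts of the algorithm's name can be illustrated with a small Python sketch. This is not MASPC itself. It assumes the name refers to the standard all-confidence measure from pattern mining (the support of a pattern divided by the largest support of any single code in it). It checks only pairs of codes, and it omits the maximality filter and the clustering step. The function names, thresholds and patient records are invented for illustration.

```python
from itertools import combinations


def support(pattern, records):
    """Fraction of records that contain every code in the pattern."""
    return sum(1 for r in records if pattern <= r) / len(records)


def all_confidence(pattern, records):
    """Pattern support divided by the highest single-code support in it."""
    max_item_support = max(support(frozenset([c]), records) for c in pattern)
    if max_item_support == 0:
        return 0.0
    return support(pattern, records) / max_item_support


def frequent_all_confident_pairs(records, min_support, min_all_conf):
    """Code pairs meeting both thresholds (pairs only, for brevity)."""
    codes = sorted(set().union(*records))
    selected = []
    for a, b in combinations(codes, 2):
        pattern = frozenset([a, b])
        if (support(pattern, records) >= min_support
                and all_confidence(pattern, records) >= min_all_conf):
            selected.append((a, b))
    return selected


# Toy records: each is the set of diagnosis codes for one made-up patient.
records = [
    frozenset({"E11", "I10"}),
    frozenset({"E11", "I10", "J45"}),
    frozenset({"J45"}),
    frozenset({"I10"}),
]

pairs = frequent_all_confident_pairs(records, min_support=0.5, min_all_conf=0.6)
print(pairs)  # [('E11', 'I10')]
```

In this toy data the pair E11 and I10 appears in half of the records and has an all-confidence of about 0.67, so it is the only pair that passes both thresholds. Patterns of this kind are what a pattern-based clustering step would then use to group patients.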