AI model proactively predicts if a COVID-19 test might be positive or not
A machine-learning study from Florida Atlantic University’s College of Engineering and Computer Science provides new evidence on how molecular and serology tests are correlated, and on which features are most useful in distinguishing between positive and negative COVID-19 test outcomes.
Researchers from the college trained five classification algorithms to predict COVID-19 test results. They built an accurate predictive model from easy-to-obtain features: symptom-related features such as fever, temperature and number of days post-symptom onset, along with demographic features such as age and gender.
The study demonstrates that machine-learning models trained on simple symptom and demographic features can help predict COVID-19 infections. The results, published in the journal Smart Health, identify the key symptom features associated with COVID-19 infection and offer a means of rapid screening and cost-effective infection detection.
For the study, the researchers used a testbed of results collected from 2,467 donors, each of whom had taken one or more types of COVID-19 test. They combined symptom and demographic information to design a set of features for predictive modeling with the five types of machine-learning models. By cross-checking test types and results, they examined the correlation between serology and molecular tests. For test outcome prediction, they labeled each of the 2,467 donors as positive or negative based on their serology or molecular test results, and created symptom features to represent each donor for machine learning.
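The labeling, cross-checking and feature steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's code: the donor record format (a list of (test_type, is_positive) pairs), the "any positive test means positive" labeling rule and the day-count bin edges are all hypothetical, since the article does not specify them.

```python
import bisect
from collections import Counter

# ASSUMPTION: each donor is a list of (test_type, is_positive) pairs, where
# test_type is "molecular" or "serology". This record format is hypothetical.


def label_donor(tests):
    """Label a donor positive if any of their tests is positive.

    ASSUMPTION: the article does not say how conflicting results were
    resolved; "any positive" is one plausible rule, used for illustration.
    """
    return any(is_positive for _, is_positive in tests)


def cross_check(donors):
    """Tally (molecular, serology) outcomes for donors who took both test types."""
    table = Counter()
    for tests in donors:
        molecular = [pos for kind, pos in tests if kind == "molecular"]
        serology = [pos for kind, pos in tests if kind == "serology"]
        if molecular and serology:
            # Repeated tests of one type are collapsed to "any positive".
            table[(any(molecular), any(serology))] += 1
    return table


# Hypothetical bin edges for days post-symptom onset:
# [0, 7), [7, 14), [14, 21), [21, +inf)
DAY_BINS = [0, 7, 14, 21]


def bin_one_hot(value, edges=DAY_BINS):
    """One-hot encode a numeric value; bin i covers [edges[i], edges[i + 1])."""
    index = max(bisect.bisect_right(edges, value) - 1, 0)  # clamp below-range values
    return [1 if i == index else 0 for i in range(len(edges))]
```

Off-diagonal cells of the cross_check table, such as (False, True), count donors whose molecular and serology results disagree. That is the kind of comparison a cross-check of test types would surface, although the study's actual correlation analysis may differ.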
Using these binned features with the five machine-learning algorithms, the predictive models achieved AUC scores of more than 81 percent (AUC, the area under the ROC curve, provides an aggregate measure of performance across all possible classification thresholds) and classification accuracy of more than 76 percent.
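Because AUC aggregates over all thresholds, it can be computed without drawing the ROC curve: it equals the probability that a randomly chosen positive donor receives a higher model score than a randomly chosen negative donor, with ties counted as half. The Python sketch below uses that pairwise formulation on made-up scores; it illustrates the metric and is not the study's evaluation code.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative).

    Ties between a positive and a negative count as half a win. This pairwise
    form is equivalent to the area under the ROC curve; it runs in
    O(positives * negatives) time, which is fine for a few thousand donors.
    """
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]) returns 0.75, because three of the four positive/negative pairs are ranked correctly.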
The five machine-learning models used by the researchers were Random Forest, XGBoost, Logistic Regression, Support Vector Machine (SVM) and Neural Network. They compared the models' performance using three metrics: accuracy, F1-score and AUC.
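Unlike AUC, accuracy and F1-score depend on a chosen decision threshold. The sketch below computes both from model scores at a threshold of 0.5, which is an assumption, since the article does not state the threshold used. Libraries such as scikit-learn provide these metrics as accuracy_score, f1_score and roc_auc_score in sklearn.metrics; plain Python is used here to keep the example self-contained.

```python
def apply_threshold(scores, t=0.5):
    """Turn model scores into positive/negative predictions (t is an assumed cutoff)."""
    return [s >= t for s in scores]


def accuracy(labels, preds):
    """Fraction of donors whose predicted label matches the true label."""
    return sum(1 for y, p in zip(labels, preds) if bool(y) == bool(p)) / len(labels)


def f1_score(labels, preds):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for y, p in zip(labels, preds) if y and p)
    fp = sum(1 for y, p in zip(labels, preds) if not y and p)
    fn = sum(1 for y, p in zip(labels, preds) if y and not p)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

With labels [1, 1, 0, 0] and scores [0.9, 0.4, 0.6, 0.1], a 0.5 cutoff yields predictions [True, False, True, False], giving an accuracy of 0.5 and an F1-score of 0.5, even though these scores rank three of the four positive/negative pairs correctly. This is one reason to report a threshold-free metric such as AUC alongside accuracy and F1.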