New AI technique significantly boosts Medicare fraud detection

Feb. 12, 2024
Study explores ‘vast sea’ of big Medicare data, Parts B and D.

New research from the College of Engineering and Computer Science at Florida Atlantic University pinpoints fraudulent activity in the “vast sea” of big Medicare data. Since identification of fraud is the first step in stopping it, this novel technique could conserve substantial resources for the Medicare system.

For the study, researchers systematically tested two imbalanced big Medicare datasets, Part B and Part D. Part B involves Medicare’s coverage of services like doctor’s visits, outpatient care, and other medical services not covered under hospitalization. Part D, on the other hand, relates to Medicare’s prescription drug benefit and covers medication costs. These datasets were labeled using the List of Excluded Individuals and Entities (LEIE), which is provided by the U.S. Office of Inspector General.

Researchers delved deep into the influence of Random Undersampling (RUS), a straightforward yet potent data sampling technique, and of their novel ensemble supervised feature selection technique. RUS works by randomly removing samples from the majority class until a specific balance between the minority and majority classes is reached.
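The RUS step described above can be sketched in a few lines. This is a minimal illustration, not the study's exact implementation; the `ratio` parameter and the helper's name are assumptions for the sketch.

```python
import numpy as np

def random_undersample(X, y, ratio=1.0, seed=0):
    """Randomly drop majority-class rows until the majority class
    holds `ratio` times as many samples as the minority class
    (ratio=1.0 gives a fully balanced set). Illustrative only."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    majority = classes[np.argmax(counts)]
    keep_n = int(counts.min() * ratio)          # majority rows to keep
    maj_idx = np.flatnonzero(y == majority)
    kept_maj = rng.choice(maj_idx, size=keep_n, replace=False)
    min_idx = np.flatnonzero(y == minority)
    idx = np.concatenate([min_idx, kept_maj])
    rng.shuffle(idx)                            # avoid class-ordered output
    return X[idx], y[idx]
```

With a 90:10 class split and `ratio=1.0`, the result contains 10 samples of each class.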

The experimental design investigated various scenarios, ranging from using each technique in isolation to employing them in combination. After analyzing the individual scenarios, researchers selected the techniques that yielded the best results and compared performance across all scenarios.

Results of the study, published in the Journal of Big Data, demonstrate that intelligent data reduction techniques improve the classification of highly imbalanced big Medicare data. Applying the two techniques together, RUS and supervised feature selection, outperformed models that utilize all available features and data. Findings showed that either ordering, feature selection followed by RUS or RUS followed by feature selection, yielded the best performance.

Consequently, for the classification of either dataset, researchers discovered that the technique with the largest amount of data reduction, performing feature selection and then applying RUS, also yields the best performance. Reducing the number of features leads to more explainable models, and performance is significantly better than when using all features.
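The best-performing order, columns first and then rows, can be sketched as a single pipeline step. This is a hedged sketch: `top_features` stands in for the output of the paper's ensemble feature selection, and the row sampling is a simple balanced undersample, not the study's exact code.

```python
import numpy as np

def select_then_undersample(X, y, top_features, seed=0):
    """Sketch of the largest-data-reduction pipeline: keep only the
    selected feature columns, then balance the classes by randomly
    undersampling rows. `top_features` is a placeholder list of
    column indices from a precomputed feature ranking."""
    rng = np.random.default_rng(seed)
    X = X[:, top_features]                      # feature (column) reduction first
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])                                          # then RUS-style row reduction
    rng.shuffle(keep)
    return X[keep], y[keep]
```

Doing the column reduction first means the row-sampling step touches less data, which is consistent with the article's point that this ordering reduces the data the most.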

For feature selection, researchers incorporated a supervised method based on feature ranking lists. Through an innovative ensemble approach, these lists were combined to yield a conclusive feature ranking, and features were then selected based on their position in that consolidated list. To furnish a benchmark, models also were built utilizing all features of the datasets.
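Combining several ranking lists into one conclusive ranking can be illustrated with a simple mean-rank scheme. The paper's exact combination rule is not described in this release, so the averaging below is an assumption standing in for the ensemble method.

```python
def aggregate_rankings(ranked_lists, k):
    """Merge several feature-ranking lists (best feature first) into
    one consensus ranking by average list position, then keep the
    top-k features. Mean-rank aggregation is illustrative only."""
    positions = {}
    for ranking in ranked_lists:
        for pos, feat in enumerate(ranking):
            positions.setdefault(feat, []).append(pos)
    consensus = sorted(
        positions,
        key=lambda f: sum(positions[f]) / len(positions[f]),
    )
    return consensus[:k]
```

For example, given three rankers that mostly agree that feature "a" is strongest, the consensus keeps "a" at the top even though one ranker placed it second.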

For both the Medicare Part B and Part D datasets, researchers conducted experiments in five scenarios that exhausted the possible ways to utilize, or omit, the RUS and feature selection data reduction techniques. For both datasets, researchers found that these data reduction techniques improved classification results.

FAU release