275 million new genetic variants identified in NIH precision medicine data

Feb. 21, 2024

Researchers have discovered more than 275 million previously unreported genetic variants, identified from data shared by nearly 250,000 participants of the National Institutes of Health’s All of Us Research Program.

Half of the genomic data are from participants of non-European genetic ancestry. This previously unexplored cache of variants provides researchers with new pathways to better understand the genetic influences on health and disease, especially in communities that have been left out of research in the past. The findings are detailed in a paper in Nature, along with three companion articles in other Nature journals.

Nearly 4 million of the newly identified variants are in genomic regions that may be tied to disease risk. The genomic data detailed in the study are available to registered researchers in the Researcher Workbench, the program’s platform for data analysis.

To date, more than 90% of participants in large genomics studies have been of European genetic ancestry. In an accompanying commentary in Nature Medicine, NIH Institute and Center directors noted that this imbalance has led to a narrow understanding of the biology of diseases and has impeded the development of new treatments and prevention strategies for all populations.

In a companion study published in Communications Biology, a research team led by Baylor College of Medicine, Houston, examined how frequently the genes and variants recommended by the American College of Medical Genetics and Genomics occur across different genetic ancestry groups in the All of Us dataset. These genes and variants mirror those in the Hereditary Disease Risk research results that the program offers to participants. The authors found significant variability in the frequency of disease-associated variants across genetic ancestry groups, as well as differences relative to other large genomic datasets.

In a separate study, investigators used the All of Us dataset to calibrate and implement 10 polygenic risk scores for common diseases across diverse genetic ancestry groups. These scores estimate an individual’s risk of disease by taking into account genetic and family history factors. If they do not account for genetic diversity, polygenic risk scores can produce inaccurate results that misrepresent a person’s risk for disease, making genetic tools inequitable. Because they were calibrated using the diverse All of Us data, these polygenic risk scores are applicable to a broader population.
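For readers unfamiliar with the method, here is a minimal Python sketch of the general idea behind a polygenic risk score and why ancestry-aware calibration matters. It assumes the common textbook formulation (a weighted sum of effect-allele counts) and a simple calibration step that expresses a person's score relative to a reference group's score distribution. The weights, reference scores, and group labels are hypothetical, family history is not modeled, and this is not the method used in the All of Us study.

```python
from statistics import mean, pstdev


def raw_prs(dosages, weights):
    """Weighted sum of effect-allele counts (0, 1, or 2 copies per variant)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def calibrate(score, reference_scores):
    """Express a raw score as a z-score against a reference group's distribution."""
    mu = mean(reference_scores)
    sd = pstdev(reference_scores)
    return (score - mu) / sd


# Hypothetical per-variant effect weights (illustrative only, not real data)
weights = [0.2, -0.1, 0.35]

# Hypothetical raw scores from two reference ancestry groups (illustrative only)
reference = {
    "group_A": [0.1, 0.3, 0.5, 0.7],
    "group_B": [0.4, 0.6, 0.8, 1.0],
}

# One person's effect-allele counts for the same three variants
person = [1, 0, 2]
score = raw_prs(person, weights)

# The same raw score lands at a different position in each reference group
for group, ref in reference.items():
    print(group, round(calibrate(score, ref), 3))
```

With these hypothetical numbers, the same raw score sits about 2.2 standard deviations above the mean of group_A but only about 0.9 above the mean of group_B, which illustrates how a score calibrated against one population can misstate risk in another.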

More than 750,000 people have enrolled in All of Us to date. Ultimately, the program plans to engage at least one million people who reflect the diversity of the United States and contribute data from DNA, electronic health records, wearable devices, surveys, and more over time. The program regularly expands and refreshes the dataset as more participants share information. 

NIH release