Covering a broad range of ancestries with genetic testing

Aug. 25, 2021

Clinical laboratories have never been more important in the medical decision-making process. From novel therapies that can only be prescribed to patients with certain genetic variants to increasing interest in family planning based on actionable clinical data, the demand for accurate and reliable genetic tests continues to grow.

But the development of such genetic tests has been limited by several factors. The content for many tests is shaped by genomic databases populated almost entirely by information from people of European descent. For example, a 2009 evaluation revealed that 96% of 1.7 million genome-wide association study samples were of European ancestry; over the next 7 years, the number of samples grew more than 20-fold, yet the proportion from African and Hispanic or Latin American ancestry increased by just 2.5% and 0.5%, respectively.1-2 Inconclusive or erroneous results are far more common when tests designed using data from one ancestry are applied to others. Technological shortcomings have also made it difficult to access clinically important information, such as the ability to phase variants to determine whether a patient has a copy of a certain variant on both alleles or two copies on one allele. In addition, standard sequencing technologies cannot represent complex disease-causing structural variants accurately.

As more healthcare decisions are guided by data produced by clinical lab experts, it is essential to overcome these issues to ensure that all patients and their physicians have access to the most relevant and accurate information. Thanks to a combination of increasingly diverse genomic data and technical advances, clinical labs have more opportunity than ever to improve the quality and applicability of the tests they offer.

Two examples illustrate these advances nicely: spinal muscular atrophy (SMA) and cystic fibrosis (CF), the only two genetic diseases for which universal carrier screening is currently recommended regardless of a parent’s ethnicity. Ultimately, these examples provide a framework for how thousands of other molecular tests might be improved for use in a broader population of carriers and patients. This is especially important for public health, since more than 70% of the roughly 7,000 rare diseases have a genetic basis and collectively, nearly 1 in 10 people in the US are affected by a rare disorder.

Spinal muscular atrophy

SMA is a rare and debilitating disease of the central nervous system and a leading cause of infant death. The introduction of life-saving therapies within the last five years has transformed not only the prognosis for patients but also the utility of genetic testing for carrier screening and diagnosis.

Detecting pathogenic variants that cause SMA is technically challenging. Copy number changes, single nucleotide variants, and insertion-deletion variants can all identify patients with SMA, as well as couples at risk for passing on the disease through recessive inheritance. Conventional genetic testing technologies often fail to detect the full spectrum of variant types.

Another complicating factor in SMA screening is identifying so-called silent carriers. Typically, carrier screening tests for SMA simply count the number of SMN1 gene copies in an individual. People with the disease have no copies, whereas carriers typically have one copy, leaving one chromosome without any. However, as scientists developed tools that could accurately phase genes into maternal and paternal haplotypes, they discovered that some people have two copies of the SMN1 gene on the same chromosome and no copies on the other. A standard carrier screen would tally the two copies and find the individual at no risk for passing on the disease. In reality, that person is just as much a carrier as someone with only one copy of the SMN1 gene.
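
The difference between counting copies and phasing them can be shown in a minimal sketch. It is an illustration only, not a clinical algorithm: the `Smn1Genotype` class and both classifier functions are hypothetical names introduced here, and the model considers only SMN1 copy number, ignoring the single nucleotide and insertion-deletion variants mentioned earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Smn1Genotype:
    """Hypothetical model: SMN1 copies carried on each of the two chromosomes."""
    copies_chrom_a: int
    copies_chrom_b: int

    @property
    def total_copies(self) -> int:
        return self.copies_chrom_a + self.copies_chrom_b


def classify_by_total_count(genotype: Smn1Genotype) -> str:
    """Simplified copy-count screen: looks only at the total number of copies."""
    if genotype.total_copies == 0:
        return "affected"
    if genotype.total_copies == 1:
        return "carrier"
    return "not a carrier"


def classify_by_phase(genotype: Smn1Genotype) -> str:
    """Phase-aware screen: a chromosome with zero copies can be passed on."""
    empty_chromosomes = [genotype.copies_chrom_a, genotype.copies_chrom_b].count(0)
    if empty_chromosomes == 2:
        return "affected"
    if empty_chromosomes == 1:
        return "carrier"
    return "not a carrier"


if __name__ == "__main__":
    # Two copies on one chromosome, none on the other: the silent carrier case.
    silent_carrier = Smn1Genotype(copies_chrom_a=2, copies_chrom_b=0)
    print(classify_by_total_count(silent_carrier))  # count-based view
    print(classify_by_phase(silent_carrier))        # phase-aware view
```

For the two-plus-zero genotype, the count-based function reports no carrier status while the phase-aware function reports a carrier. The two functions agree on every other genotype in this simplified model.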

The inability to phase variants meant that clinical labs missed silent carriers for years and, as a result, gave them inaccurate information about their risk of having an affected child. Even worse, this problem has not been evenly distributed. The silent carrier genotype is several times more common among people of African ancestry than among other groups, such as those with European heritage.

In addition, SMA tests can be difficult for labs to perform. The most commonly used approach, based on MLPA (multiplex ligation-dependent probe amplification) technology, takes days to generate results. It can also lack consistency across labs when distinguishing between copy numbers for genes associated with SMA.3

Recent improvements have made SMA testing faster, more accurate, and more representative for patients of all ancestries. By incorporating technology that can detect not only gene copy number but also silent carrier-linked gene duplication events, screening for SMA can now be performed at higher throughput with results that better capture the full range of pathogenic variants in all populations.4 These newer tests can be run with blood samples or buccal swabs, which also expands their accessibility and utility, particularly for carrier screening.

Cystic fibrosis

Another example comes from CF, a serious autosomal recessive genetic disorder that affects several organs and leads to frequent lung infections and difficulty breathing. Like SMA testing, CF screening is needed both to diagnose patients with the condition and to identify carrier couples at risk of transmitting the disease to their kids.

CF testing has been around for more than two decades, and knowledge of CF-causing mutations in the CFTR gene has relied heavily on the CFTR2 database. This database is the gold standard for linking genotype with phenotype in CF, with data from nearly 90,000 patients and more than 350 pathogenic variants. Like so many genetic databases, CFTR2 has been populated with information collected almost entirely from one ancestral group — 95% of patients represented are generally of European descent.

Conventional screening tests for CF are targeted panels made up of the variants reported most frequently in these repositories, making them excellent at capturing risk among people of European ancestry. For other populations, though, conventional CF screening can miss the most important variants. An analysis of CF screening in non-European populations found that people of Hispanic, African, or Asian ancestry were more likely to get negative results from typical tests. The same study also identified several novel variants in those populations that appear to be associated with CF carrier status.5

For clinical laboratories serving diverse patient populations, the standard CF screening is a suboptimal approach. Fortunately, the availability of new information — rather than development of novel technologies — has significantly improved this situation.

One study compared standard screening to a sequencing-based approach in which researchers sequenced all coding bases in the CFTR gene.6 The population studied was not only large, with more than 115,000 individuals, but also representative of the diverse U.S. demographic. Caucasians made up just over half of the cohort, and groups of African, Latin American, or Asian descent made up at least 10% each. With such a heterogeneous cohort, the scientists reported more than 200 variants that were pathogenic or likely pathogenic across various ancestral groups, including many that were underrepresented in CFTR2.

By incorporating the most common variants for each ancestry from this study, the global gnomAD (Genome Aggregation Database) repository,7 and other representative sources, it is now possible to redesign or update CF tests, so results are relevant for much more diverse patient populations. This will improve screening accuracy and expand the benefits of actionable information to more people.
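
One way to picture the effect of adding each ancestry's most common variants to a panel is the rough sketch below. Everything in it is an assumption made for illustration: the function names, the group and variant labels, and the allele counts are hypothetical placeholders, not values drawn from CFTR2, gnomAD, or the studies cited here.

```python
def panel_detection_rate(panel: set[str], allele_counts: dict[str, int]) -> float:
    """Fraction of a group's observed pathogenic alleles that the panel covers."""
    total = sum(allele_counts.values())
    if total == 0:
        return 0.0
    covered = sum(n for variant, n in allele_counts.items() if variant in panel)
    return covered / total


def top_variants_per_group(counts_by_group: dict[str, dict[str, int]], n: int) -> set[str]:
    """Build a panel from the n most frequent variants in every group."""
    panel: set[str] = set()
    for allele_counts in counts_by_group.values():
        ranked = sorted(allele_counts, key=allele_counts.get, reverse=True)
        panel.update(ranked[:n])
    return panel


if __name__ == "__main__":
    # Hypothetical placeholder counts, chosen only to illustrate the idea.
    counts_by_group = {
        "group_1": {"var_A": 70, "var_B": 20, "var_C": 10},
        "group_2": {"var_A": 20, "var_D": 50, "var_E": 30},
    }

    # A panel designed from group_1 data alone.
    single_group_panel = top_variants_per_group(
        {"group_1": counts_by_group["group_1"]}, n=2
    )
    # A panel that takes the top variants from every group.
    multi_group_panel = top_variants_per_group(counts_by_group, n=2)

    for name, counts in counts_by_group.items():
        print(
            name,
            panel_detection_rate(single_group_panel, counts),
            panel_detection_rate(multi_group_panel, counts),
        )
```

With these placeholder counts, the panel built from one group covers most of that group's alleles but only a small share of the other group's. The panel built from both groups covers each group well, at the cost of including more variants.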


The examples of CF and SMA demonstrate that conventional genetic tests run in clinical labs can be significantly improved using technologies that are already available. In both cases, the incorporation of more robust data — especially data from studies that better represent the target testing population — is essential for generating more reliable and relevant results for all individuals. With the increasing availability of large-scale sequencing studies from worldwide populations and the advent of powerful long-read sequencing methods that can untangle challenging DNA sequences and variants involved in established and emerging diseases, we should rethink how genetic tests are designed and used. We are now armed with the right resources to ensure that molecular tests are thoughtfully designed for people of all ancestries. In some cases, more representative data will be enough to expand the value of testing. In others, the adoption of more informative technologies — such as those that make it possible to phase variants or resolve complex sequences — may be needed. As studies uncover more pathogenic variants in more populations, those newer technologies can scale to a plethora of genetic tests and should represent a useful investment for clinical labs.


  1. Need AC, Goldstein DB. Next generation disparities in human genomics: concerns and remedies. Trends Genet. 2009 Nov;25(11):489-94. doi: 10.1016/j.tig.2009.09.012. PMID: 19836853.
  2. Popejoy AB, Fullerton SM. Genomics is failing on diversity. Nature. 2016 Oct 13;538(7624):161-164. doi: 10.1038/538161a. PMID: 27734877; PMCID: PMC5089703.
  3. Schorling DC, Becker J, Pechmann A, Langer T, Wirth B, Kirschner J. Discrepancy in redetermination of SMN2 copy numbers in children with SMA. Neurology. 2019 Aug 6;93(6):267-269. doi: 10.1212/WNL.0000000000007836. PMID: 31235659.
  4. Milligan JN, Larson JL, Filipovic-Sadic S, Laosinchai-Wolf W, et al. Multisite evaluation and validation of a sensitive diagnostic and screening system for spinal muscular atrophy that reports SMN1 and SMN2 copy number, along with disease modifier and gene duplication variants. J Mol Diagn. 2021 Mar 30:S1525-1578(21)00069-6. doi: 10.1016/j.jmoldx.2021.03.004.
  5. Schrijver I, Pique L, Graham S, Pearl M, et al. The spectrum of CFTR variants in nonwhite cystic fibrosis patients: implications for molecular diagnostic testing. J Mol Diagn. 2016 Jan;18(1):39-50. doi: 10.1016/j.jmoldx.2015.07.005. PMID: 26708955.
  6. Beauchamp KA, Johansen Taber KA, Grauman PV, Spurka L, et al. Sequencing as a first-line methodology for cystic fibrosis carrier screening. Genet Med. 2019 Nov;21(11):2569-2576. doi: 10.1038/s41436-019-0525-y. Epub 2019 Apr 30. Erratum in: Genet Med. 2019 May 15. PMID: 31036917; PMCID: PMC6831513.
  7. Karczewski KJ, et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020;581:434-443. doi: 10.1038/s41586-020-2308-7. PMID: 32461654.