No two people have an identical genome sequence. If we compare a sequence region such as a gene from multiple people, however, there is a “consensus sequence,” which consists of the statistically predominant (by far) nucleotide at each location. On a simple level, these consensus sequences form the reference genome for comparison against individual genomic data. In preparing these consensus sequences, however, it becomes apparent that scattered about the human genome are points (single nucleotides) where no one nucleotide clearly predominates. That is, one or more of the other three nucleotides are found at that location at an appreciable frequency in the population. Because of this relatively high frequency of occurrence, such a site is referred to as a polymorphism rather than a mutation and is known as a single nucleotide polymorphism (SNP).
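As a rough illustration of the consensus idea, the sketch below takes a handful of pre-aligned, equal-length sequences, picks the predominant nucleotide at each position, and flags positions where a second nucleotide appears at or above a chosen frequency. The function name, the toy sequences, and the threshold are all assumptions made for illustration; real reference genomes are built from far larger data sets and far more careful statistics.

```python
from collections import Counter

def consensus_and_polymorphic_sites(sequences, minor_freq_threshold):
    """Return (consensus string, list of candidate polymorphic positions).

    Assumes the sequences are already aligned and of equal length.
    """
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")

    consensus = []
    polymorphic_positions = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in sequences)
        ranked = counts.most_common()
        consensus.append(ranked[0][0])  # predominant nucleotide
        # Frequency of the most common non-consensus nucleotide, if any
        minor_freq = ranked[1][1] / len(sequences) if len(ranked) > 1 else 0.0
        if minor_freq >= minor_freq_threshold:
            polymorphic_positions.append(pos)
    return "".join(consensus), polymorphic_positions

# Toy data (hypothetical): five individuals, one variable site at index 2
samples = ["ACGTA", "ACGTA", "ACTTA", "ACGTA", "ACTTA"]
consensus, sites = consensus_and_polymorphic_sites(samples, minor_freq_threshold=0.2)
print(consensus, sites)
```

The threshold here is set high only because the toy data set is tiny; what matters is the distinction between a rare sporadic change and a variant seen repeatedly across individuals.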
The significance and utility of SNPs
Some SNPs have a direct or even pathogenic impact on coding sequences, but many do not—either because they are outside of coding regions, or because they are silent mutations (ones that do not change amino-acid sequence) when in coding regions. Even without direct mechanistic impact, SNPs can be a useful tool in molecular diagnostics applications in cases where one is in close physical proximity on a chromosome to a gene of interest (close linkage, in genetic parlance). This close linkage means that the gene and that SNP will in most cases stay together during genetic recombination and passage to the next generation. This becomes particularly useful if in the distant past an individual developed a clinically meaningful spontaneous mutation in the closely linked gene.
Thus, either in the case of an SNP with direct mechanistic impact on a gene or one in close linkage to a mutation which has spread in a population, analysis of the form of an SNP present in an individual can provide clinically relevant information on the directly impacted or linked gene. In either case, a particular form of the relevant SNP serves as an easily assayable marker for the associated genotype. Such SNPs are very common in the human genome, and browsing through one of the SNP association databases such as dbSNP (http://www.ncbi.nlm.nih.gov/SNP/) or SNPedia (http://snpedia.com/index.php/SNPedia) turns up literally thousands of SNPs with known associations to phenotypes.
(A word of caution, however: particularly in the case of SNPs linked to a gene allelic form, the linkage is not absolute, and dissociation of the SNP from the linked allele can occur, with increasing frequency as the SNP and gene are farther apart. In addition, phenotypes can arise from complex polygenic traits, where an SNP relates to only one factor in the final phenotype. Thus many exotic SNP linkages, such as rs3057 (“perfect pitch”), have some meaningful statistical association with the end phenotype but are by no means a sure indicator by themselves. An understanding of the statistical significance of a particular SNP is therefore crucial to appropriate interpretation.)
With that prelude, it’s the “easily assayable” nature of SNPs which is the focus of this month’s article. While array methods covered in one of last year’s installments of The Primer (September 2013, Volume 45, No. 9) can provide information on literally tens of thousands of SNPs in a single individual sample, there are simpler, faster, and arguably easier methods to interrogate the status of small numbers of SNPs at one time in a diagnostic setting. We’ll examine a few of the more common clinically employed methodologies for this smaller-scale application.
SNP detection methods
The first methodology is based on allele-specific polymerase chain reaction (PCR), where a PCR product is formed only from a particular SNP form. Consider, for example, an SNP where the nucleotide is found as either a “T” in association with a “normal” phenotype, or a “G” in association with a pathogenic state. An approach known by a number of names, including Amplification Refractory Mutation System PCR (ARMS-PCR), is effectively based around a normal PCR reaction, either endpoint or real-time analyzed, where one of the primers is designed such that its 3′ nucleotide is directly across from the SNP location. This terminal nucleotide will anneal (and thus allow for polymerase extension) only to the SNP form of interest. In our current example we might design this primer to end with “C,” so a PCR product will be produced only when the template material has the pathogenic-associated “G” SNP genotype. Generally we’d also make the alternate “A”-terminated primer and run that in a parallel PCR reaction, allowing for the determination of whether a sample is homozygous “T”/wild type (only “A” primer reactions give product); homozygous “G”/pathogenic (only “C” primer reactions give product); or heterozygous (both reactions yield product).
In practice, it is sometimes found that the “wrong” primer can still allow amplification occasionally, leading to rare false-positive calls. This situation can arise in the event of a tautomeric shift (transient rearrangement of a nucleotide base structure, allowing for non-canonical pairing) right as the polymerase comes by. Incorporation of intentionally designed destabilizing primer/template mismatches near the 3′ primer end can greatly reduce this.
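The two-reaction ARMS-PCR logic described above is simple enough to write down directly. The sketch below is only an illustration: the function names are hypothetical, it assumes each reaction has already been scored as product or no product, and it adds a “no call” outcome for the case where neither reaction amplifies (for example, a failed extraction).

```python
# Watson-Crick pairing used to choose the primer's 3' terminal base
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def arms_primer_3prime_base(template_snp_base):
    """Return the 3' terminal primer base that pairs with the template SNP base."""
    return COMPLEMENT[template_snp_base]

def call_arms_genotype(product_with_wild_type_primer, product_with_variant_primer):
    """Call a genotype from two parallel allele-specific reactions."""
    if product_with_wild_type_primer and product_with_variant_primer:
        return "heterozygous"
    if product_with_wild_type_primer:
        return "homozygous wild type"
    if product_with_variant_primer:
        return "homozygous variant"
    return "no call"  # neither reaction amplified; repeat the assay

# Example from the text: "T" is wild type, "G" is the pathogenic-associated form
wild_type_primer_end = arms_primer_3prime_base("T")  # primer ends in "A"
variant_primer_end = arms_primer_3prime_base("G")    # primer ends in "C"
print(wild_type_primer_end, variant_primer_end)
print(call_arms_genotype(True, True))
```

In a real assay, a control amplicon in each tube is commonly used to distinguish a true negative from a failed reaction; that refinement is left out here.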
Another approach can be used in probe-based real-time PCR strategies. Here, the primers are designed to flank the SNP location and amplify the region regardless of the SNP allele. For a 5′-fluorogenic exonuclease probe system (e.g., “TaqMan”), differentiation is done by having two differently labelled probes in the reaction. Each probe exactly overlays the SNP location near its midpoint, and through careful probe design the correctly matching probe will show significantly better hybridization than its mismatched counterpart. The better-hybridized probe then leads to stronger signal generation, and allows direct determination of heterozygous or either homozygous state by analysis of which fluorescent label appears as the reaction proceeds.
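To make the two-probe readout concrete, here is a minimal sketch of how a genotype might be called from the endpoint signal of each label. The function name, the signal floor, and the dominance ratio are hypothetical values chosen for illustration, not figures from any instrument or kit; in practice, calls are usually made by clustering samples against controls on a two-dye scatter plot.

```python
def call_two_probe_genotype(signal_wild_type, signal_variant,
                            min_signal=0.2, dominance_ratio=3.0):
    """Call a genotype from background-corrected signals of two probe labels.

    min_signal and dominance_ratio are illustrative assumptions only.
    """
    wild_type_up = signal_wild_type >= min_signal
    variant_up = signal_variant >= min_signal

    if not wild_type_up and not variant_up:
        return "no call"  # neither probe generated meaningful signal
    # One label clearly dominating suggests a homozygous sample; the
    # mismatched probe may still contribute a little background signal.
    if wild_type_up and (not variant_up or
                         signal_wild_type >= dominance_ratio * signal_variant):
        return "homozygous wild type"
    if variant_up and (not wild_type_up or
                       signal_variant >= dominance_ratio * signal_wild_type):
        return "homozygous variant"
    return "heterozygous"  # both labels rise to comparable levels

print(call_two_probe_genotype(1.0, 0.05))
print(call_two_probe_genotype(0.9, 0.8))
```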
For hybridizing (non-nuclease) probes, the approach is similar but is based on a probe melt temperature determination at the end of the PCR reaction. If the probe is a perfect match to its hybridizing sequence over the SNP, it will have a melting temperature (Tm) as much as 1°C to 3°C higher than it has on amplicon that mismatches the probe at the SNP location. Heterozygosity is detected as an intermediate melting behavior between the two homozygous conditions. (The net fluorescence signal observed is the aggregate of wild type and mutant allele signals.) Note that while this approach can use a single probe in a single reaction, it only provides data that the probe exactly matched, or didn’t exactly match, the SNP region and does not strictly prove the identity of any mismatch reducing the probe Tm. In practice, however, it is generally a safe assumption that an observed mismatch arises from the expected SNP; the probes are quite short, and the likelihood of a completely different mutation occurring under the probe next to your SNP is small.
Binding dye-based real-time PCR methods
Not to be left behind by their probe-based brethren, binding dye-based real-time PCR methods can also be applied to targeted SNP analysis. Here, the most common approach is based around a technique known as high-resolution melt or HRM. Again, this employs primers which flank the SNP location of interest and traditional PCR to amplify the region regardless of the SNP allele. At the end of the PCR cycling, a melt curve is collected much as it would be in a traditional binding dye-detected real-time PCR. Following the last extension cycle, the reaction is cooled to roughly 60°C, and the temperature is gradually raised. As the rising temperature causes dissociation of the PCR amplification products, the double strand-specific binding dye decreases fluorescence, and in effect a “titration curve” is collected with distinct stepwise drops in fluorescence occurring as the reaction temperature passes the Tm of each product. Tiny differences in sequence, including the particular nucleotide pair present at the SNP location near the middle of the amplicon, can make small characteristic differences in the Tm and in the exact shape of this denaturation curve.
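One common way to read a Tm off a melt curve is to look for the temperature at which fluorescence is falling fastest, that is, the peak of the negative derivative of fluorescence with respect to temperature. The sketch below does this on a simulated curve. The sigmoidal curve shape, the 0.1°C step, and the function names are assumptions made for illustration, and real data would need smoothing before differentiation.

```python
import math

def simulated_melt_curve(tm, temps, width=0.8):
    """Toy sigmoidal melt curve: fluorescence falls from about 1 to about 0 near tm."""
    return [1.0 / (1.0 + math.exp((t - tm) / width)) for t in temps]

def melt_peak_tm(temps, fluorescence):
    """Estimate Tm as the temperature where -dF/dT (central difference) is largest."""
    best_temp, best_neg_slope = None, float("-inf")
    for i in range(1, len(temps) - 1):
        neg_slope = -(fluorescence[i + 1] - fluorescence[i - 1]) / (temps[i + 1] - temps[i - 1])
        if neg_slope > best_neg_slope:
            best_temp, best_neg_slope = temps[i], neg_slope
    return best_temp

# Ramp from 60.0 to 95.0 degrees C in 0.1 degree steps (hypothetical settings)
temps = [60.0 + 0.1 * i for i in range(351)]
curve = simulated_melt_curve(82.3, temps)
estimated_tm = melt_peak_tm(temps, curve)
print(round(estimated_tm, 1))
```

With a temperature step this fine, two products whose Tm values differ by a few tenths of a degree give separable peaks in principle, which is the resolution that SNP calling by melt analysis depends on.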
HRM differs from traditional binding dye real-time PCR in several ways. First, it differs in the choice of dye; specialized dyes other than SYBR Green provide better resolution of the melt curve. Second, HRM generally uses a slower ramp speed at which the sample temperature is raised to allow discrimination of closer temperature data points along the curve. Third, HRM protocols employ software with algorithms to correct for well-to-well variations such that the beginnings of key inflection points in the melt curve appear synchronized between samples, allowing for easier calling of samples which demonstrate a curve shape alteration partway through the melting process.
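The well-to-well correction can be illustrated with its simplest ingredient: rescaling each raw curve so that the fully double-stranded plateau reads 1 and the fully melted plateau reads 0. This is only a sketch under stated assumptions. The function name and the plateau windows are hypothetical, commercial HRM packages use their own (often additional) corrections such as temperature shifting, and nothing here represents any vendor's algorithm.

```python
import math

def normalize_melt_curve(temps, fluorescence, pre_melt, post_melt):
    """Rescale a raw melt curve to run from 1 (pre-melt) to 0 (post-melt).

    pre_melt and post_melt are (low, high) temperature windows chosen by the user.
    """
    def mean_in(window):
        values = [f for t, f in zip(temps, fluorescence) if window[0] <= t <= window[1]]
        if not values:
            raise ValueError("no data points fall inside the window")
        return sum(values) / len(values)

    high = mean_in(pre_melt)
    low = mean_in(post_melt)
    if high == low:
        raise ValueError("pre-melt and post-melt plateaus are identical")
    return [(f - low) / (high - low) for f in fluorescence]

# Two wells with the same underlying melt shape but different raw signal levels
temps = [70.0 + 0.5 * i for i in range(41)]  # 70.0 to 90.0 degrees C
shape = [1.0 / (1.0 + math.exp(t - 82.0)) for t in temps]
well_a = [5000.0 * s + 200.0 for s in shape]
well_b = [3200.0 * s + 150.0 for s in shape]

norm_a = normalize_melt_curve(temps, well_a, (70.0, 72.0), (88.0, 90.0))
norm_b = normalize_melt_curve(temps, well_b, (70.0, 72.0), (88.0, 90.0))
print(max(abs(a - b) for a, b in zip(norm_a, norm_b)))
```

After normalization the two wells overlay, so any remaining difference between samples reflects curve shape rather than how much dye or product happened to be in the well.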
When run with control samples of each homozygous SNP genotype, HRM allows for the calling of homozygous or heterozygous sample SNP status by comparison of curve shapes. (Heterozygous samples in particular yield two amplicons with the SNP base differing between them. These amplicons generate some cross-hybridized products with a resulting single-base mispairing in their middle, and these products generally melt slightly before either of the homozygous products.) For labs wishing to do HRM methods now, specialized real-time PCR machines (or modifications to existing machines), along with turnkey software packages or add-in modules intended for this purpose, are available.
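Comparison against controls can be sketched as a nearest-curve test: measure how far a normalized sample curve sits from each homozygous control, accept the closest control if it is close enough, and otherwise flag the sample as unlike either control. Everything specific below is an assumption for illustration, including the mean-squared-difference metric, the tolerance, the simulated Tm values, and the function names; real HRM software uses its own clustering methods.

```python
import math

def sigmoid_melt(tm, temps, width=0.6):
    """Toy normalized melt curve for a single duplex species."""
    return [1.0 / (1.0 + math.exp((t - tm) / width)) for t in temps]

def curve_distance(curve_a, curve_b):
    """Mean squared difference between two curves sampled at the same temperatures."""
    return sum((a - b) ** 2 for a, b in zip(curve_a, curve_b)) / len(curve_a)

def call_by_controls(sample, controls, tolerance):
    """Return the name of the closest control, or a flag if none is close enough."""
    distances = {name: curve_distance(sample, curve) for name, curve in controls.items()}
    best = min(distances, key=distances.get)
    if distances[best] <= tolerance:
        return best
    return "unlike controls: possible heterozygote or unexpected variant"

temps = [75.0 + 0.1 * i for i in range(151)]  # 75.0 to 90.0 degrees C
controls = {
    "homozygous T": sigmoid_melt(82.0, temps),
    "homozygous G": sigmoid_melt(82.5, temps),
}

# A sample very close to the "T" control, and a simulated heterozygote whose
# mispaired cross-hybridized products melt early (hypothetical mixture)
sample_t = sigmoid_melt(82.02, temps)
heteroduplex = sigmoid_melt(81.0, temps)
sample_het = [0.5 * h + 0.25 * t + 0.25 * g
              for h, t, g in zip(heteroduplex, controls["homozygous T"], controls["homozygous G"])]

print(call_by_controls(sample_t, controls, tolerance=1e-4))
print(call_by_controls(sample_het, controls, tolerance=1e-4))
```

Including a known heterozygous control in the `controls` dictionary would let the same function name heterozygotes directly rather than merely flagging them.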
Like the probe-based methods described above, HRM does not explicitly confirm the identity of the SNP nucleotide; other mutations either at the SNP or elsewhere along the amplicon can lead to a distinguishably different melt curve than the controls. However, unlike in probe-based methods, HRM may often provide a hint that variants are not as expected, through minor differences in curve shape. It can thus serve to flag unusual specimens in need of more in-depth analysis, as well as providing calls on known SNP loci.
The idea of more in-depth analysis leads to the obvious question of when individual SNP calling will be replaced by next-generation sequencing (NGS) methods. At present, when only one or a few SNPs in a sample are of interest, both NGS and directed region-specific sequencing methods are much slower and more costly than a simple PCR reaction of one of the forms described here. As larger numbers of SNPs are of interest, SNP arrays are also an option. As NGS methods continue to come down in price and time to result, and gain more bioinformatics support, they will begin to supplant these simpler methods. For the near future, however, and for labs already having a real-time PCR instrument, simple methods will remain of utility and will be encountered by the molecular laboratorian.