Identifying Natural Variation

Not all individuals of a species are identical to each other.  Natural variation is a foundation for genetics and evolutionary biology, and it is detected and studied using many different methods.  The oldest, most familiar, and still among the best methods is simple but detailed observation of the morphology, structure, and behavior of organisms.  The eminent corn geneticist Barbara McClintock was often described as having a “feeling for the organism” in relation to her plants, a quality that many biologists share whether the organism is corn, E. coli, fruit flies, mice, or any other species that is studied closely.  This detailed and sometimes intuitive observation is a key tool in identifying and understanding biological variation.

Other methods supplement such direct observations.  For example, agglutination (clumping) assays are widely used for studying natural variation in blood types, and various transplantation or grafting assays are used in both plants and animals.  To a geneticist, all of these methods reveal different phenotypes, where a phenotype is any quality that can be measured by some assay.  The phenotype may have an underlying basis in the genotype or the genome, or it may be due to non-genetic factors.  For example, the length of our hair is a common phenotype with almost no underlying genetic basis; the color and texture of our hair are phenotypes that have both an underlying genetic basis and significant non-genetic components (which can often disguise the genetic basis).

A key to thinking about natural variation is to distinguish between genetic variation, which is due to underlying variation in the genome, and non-genetic or environmental variation.  Genetic variation can be transmitted to future generations and provides the foundation for evolutionary change.  The importance and some of the challenges of distinguishing genetic from environmental variation are discussed in more detail in Chapter 10 and Chapter 16.  In the current era, genomes can be sequenced readily and genetic variation observed directly from the DNA sequence.  However, this is a very recent development in the history of studying biological variation.  In this box, we discuss a few of the tools that were used before genome sequencing was common and that are still valuable methods for observing variation.  These tools were originally developed for other experimental purposes and were applied to studying natural biological variation shortly afterward.

Protein variation using polyacrylamide gel electrophoresis 

Gel electrophoresis was described in detail in Tool Box 2-1.  In that box, the focus was on separating nucleic acids by size using the negative charges on their phosphate backbones.  With nucleic acids, the gel matrix is typically made from agarose and the gels are often run horizontally.  Similar principles of electrophoresis can be used to separate polypeptides.  For separating polypeptides, the gel matrix is usually made of polyacrylamide and the gels are often run vertically; the method is called polyacrylamide gel electrophoresis, abbreviated PAGE.  In addition, the charges on a polypeptide chain come not from the backbone but mostly from the side chains of the amino acids, as shown in Figures 2-14 and 2-15.  Two polypeptides that are each 500 amino acids long may therefore carry different net electric charges (and have different isoelectric points, the pH at which a protein's net charge is zero) because they do not have the same amino acid sequences.  Thus, PAGE can separate polypeptides based not only on their sizes but also on their electric charges, and it is very widely used, in many variations, for separating and analyzing polypeptides.
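To make the idea of net charge concrete, here is a minimal Python sketch that estimates the net charge of an unfolded polypeptide at a chosen pH using the Henderson-Hasselbalch relationship.  The pKa values are assumed textbook approximations (published tables differ by a few tenths of a unit), the two 10-residue peptides are hypothetical, and the effects of protein folding on ionization are ignored.

```python
# Approximate pKa values (assumed; published tables vary slightly).
BASIC_PKA = {"N_TERM": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
ACIDIC_PKA = {"C_TERM": 2.0, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def net_charge(sequence: str, ph: float) -> float:
    """Estimate the net charge of an unfolded polypeptide at a given pH."""
    # Every chain has one free amino terminus and one free carboxyl terminus,
    # plus whatever ionizable side chains its sequence contains.
    basic = [BASIC_PKA["N_TERM"]] + [BASIC_PKA[aa] for aa in sequence if aa in BASIC_PKA]
    acidic = [ACIDIC_PKA["C_TERM"]] + [ACIDIC_PKA[aa] for aa in sequence if aa in ACIDIC_PKA]
    # Basic groups carry +1 when protonated: fraction = 1 / (1 + 10^(pH - pKa)).
    positive = sum(1.0 / (1.0 + 10 ** (ph - pka)) for pka in basic)
    # Acidic groups carry -1 when deprotonated: fraction = 1 / (1 + 10^(pKa - pH)).
    negative = sum(1.0 / (1.0 + 10 ** (pka - ph)) for pka in acidic)
    return positive - negative


# Two hypothetical peptides of equal length that differ at one position (E vs K).
peptide_1 = "GKDAESLKVA"
peptide_2 = "GKDAKSLKVA"

for name, seq in [("peptide_1", peptide_1), ("peptide_2", peptide_2)]:
    print(f"{name}: length {len(seq)}, net charge at pH 7.0 = {net_charge(seq, 7.0):+.2f}")
```

With these assumed values the second peptide comes out roughly two charge units more positive than the first, even though both are the same length, which is the kind of difference that allows PAGE under non-denaturing conditions to separate them.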

PAGE has also been widely used to look at natural variation.  The proteins of different individuals can be isolated, separated based on their sizes and/or charges, and then stained and compared.  If a protein is found at different locations on the gels from two individuals, the most likely explanation is a change in its amino acid sequence.  From that, it is reasonable to infer that there has been an underlying change in the nucleotide sequence of the gene encoding that protein.  PAGE alone usually cannot determine what the amino acid change has been, but it can reveal that the organisms differ.  Electrophoresis was the method used to show that sickle cell anemia arises from a single amino acid substitution in the beta-globin chain of hemoglobin, for example, which we depict in Figure 16-16.  The phenotype of the disease state (sickle cell anemia) could thus be correlated with a molecular phenotype, the electrophoretic mobility of hemoglobin.
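The sickle cell example can be made concrete with a simple bookkeeping sketch.  It compares the first eight residues of the normal beta-globin chain (VHLTPEEK, numbered from the first residue of the mature chain) with the same stretch in sickle cell hemoglobin, where the glutamic acid at position 6 is replaced by valine.  The charge rule used here (acidic side chains -1, lysine and arginine +1, histidine and chain termini ignored) is a deliberate simplification, not a model of how mobility is actually measured.

```python
# Simplified side-chain charges near neutral pH; histidine and chain termini ignored.
SIDE_CHAIN_CHARGE = {"D": -1, "E": -1, "K": +1, "R": +1}


def side_chain_charge(peptide: str) -> int:
    """Sum of the simplified side-chain charges in a peptide."""
    return sum(SIDE_CHAIN_CHARGE.get(aa, 0) for aa in peptide)


def differences(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """List (1-based position, residue in a, residue in b) where two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length")
    return [(i + 1, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


HBA_BETA_1_8 = "VHLTPEEK"  # normal adult hemoglobin, beta chain residues 1-8
HBS_BETA_1_8 = "VHLTPVEK"  # sickle cell hemoglobin, same residues (Glu6Val)

print("substitutions:", differences(HBA_BETA_1_8, HBS_BETA_1_8))
print("HbA side-chain charge:", side_chain_charge(HBA_BETA_1_8))
print("HbS side-chain charge:", side_chain_charge(HBS_BETA_1_8))
```

The single substitution removes one negatively charged side chain from each beta chain; since a hemoglobin molecule contains two beta chains, the charge difference between the whole proteins is doubled.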

While PAGE has been important for recognizing the extent of natural variation and can be readily applied to all organisms, it has some limitations.  First, it can underestimate the extent of variation in polypeptides, since not all amino acid changes result in changes in electrophoretic mobility on PAGE.  Second, since it examines only changes in polypeptides, it detects only changes in the protein-coding regions of genes and not in any other part of the genome.  As discussed in Section 3-6, changes in protein-coding regions are only one source of the genomic variation important in evolution.
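A rough sense of why charge-based separation underestimates variation comes from counting how many single amino acid substitutions change side-chain charge at all.  This sketch treats only aspartate and glutamate as negative and lysine and arginine as positive (histidine counted as uncharged), and it weights all 380 possible substitutions equally, which real mutation patterns do not.  Size and shape changes that can also alter mobility are ignored.

```python
from itertools import permutations

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Simplified charge classes near neutral pH (histidine treated as uncharged).
CHARGE = {aa: 0 for aa in AMINO_ACIDS}
CHARGE.update({"D": -1, "E": -1, "K": 1, "R": 1})


def count_charge_changing() -> tuple[int, int]:
    """Return (substitutions that change side-chain charge, all substitutions)."""
    # Ordered pairs (original residue, replacement residue), excluding no-change pairs.
    all_subs = list(permutations(AMINO_ACIDS, 2))
    changing = [(old, new) for old, new in all_subs if CHARGE[old] != CHARGE[new]]
    return len(changing), len(all_subs)


changing, total = count_charge_changing()
print(f"{changing} of {total} substitutions change side-chain charge ({changing / total:.0%})")
```

Under these assumptions only about a third of substitutions (136 of 380) alter side-chain charge; the rest would be invisible to a comparison that relies purely on charge.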

DNA sequence changes using restriction endonucleases 

A more direct method to look at DNA sequence variation in all parts of the genome is provided by a class of enzymes known as restriction endonucleases.  As we will discuss in Chapter 11, restriction endonucleases are enzymes that are important in protecting bacteria from invading DNA.  While that is their natural function, they are very widely used in many molecular biology applications with little thought given to their origins; most genetics and molecular biology laboratories have racks of different restriction endonucleases (or restriction enzymes) in their freezers, and few people know which bacterium any of them was originally isolated from, only which commercial vendor supplied it.  They are a standard tool for many experiments in molecular biology.

Restriction enzymes bind to DNA at a specific recognition sequence and make a precise cut in the backbones of both strands.  The recognition sequences for different restriction enzymes range from four bases to as many as twelve bases; most of the commonly used ones have recognition sites of five or six bases.  For example, as shown in Figure A, the widely used restriction endonuclease EcoRI recognizes and cuts the sequence 5’-GAATTC-3’, BamHI recognizes and cuts 5’-GGATCC-3’, SmaI recognizes and cuts 5’-CCCGGG-3’, and so on; hundreds of different enzymes are commercially available.  The sites are usually palindromes (that is, the sequence reads the same 5’ to 3’ on both antiparallel strands) because many of these enzymes function as dimers, with each of the two subunits binding half of the site and cutting one of the two strands.

Figure A

As tools for manipulating DNA sequences in the laboratory, restriction enzymes are invaluable simply because they precisely cut a specific sequence.  The process is shown in Figure A.  SmaI cuts between the C and the G at the center of its recognition site and leaves blunt ends on the molecules that it cuts.  Other restriction enzymes, such as EcoRI and BamHI, make staggered cuts on the two strands, between the G and the A for EcoRI and between the two G’s for BamHI.  Thus, they leave a single-stranded overhang of four bases at each cut end, known as a “sticky end”.  Suppose one wants to insert a segment of DNA from one source (a plant, for example) into a bacterial plasmid so that the bacteria can produce many copies of the sequence or express the specific gene.  The process is shown in Figure B.  DNA from both the plant and the plasmid is purified separately and cut with BamHI so that both have complementary sticky ends.  The DNA molecules are combined in solution, the sticky ends from the plant DNA form base pairs with the sticky ends from the plasmid DNA, and the plant DNA is inserted into the plasmid; the backbones are then sealed by DNA ligase, an enzyme involved in DNA replication as discussed in Chapter 4.
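The relationship between a palindromic site, where an enzyme cuts, and the kind of end it leaves can be sketched in a few lines of Python.  Each site is written with a caret (^) marking the cut on the top strand, a notation common in enzyme catalogs; for a palindromic site the bottom strand is cut at the mirror-image position, so the cut offset alone determines whether the ends are blunt or carry a 5’ or 3’ overhang.  The helper names are hypothetical, and the ligation check is a simplification that considers only end type and base pairing.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(site: str) -> bool:
    """A site is palindromic if it equals its own reverse complement."""
    return site == reverse_complement(site)


def cut_ends(pattern: str) -> tuple[str, str]:
    """Describe the ends left by a palindromic site written with '^' at the top-strand cut."""
    site = pattern.replace("^", "")
    if not is_palindromic(site):
        raise ValueError(f"{site} is not palindromic")
    k = pattern.index("^")  # top strand is cut after k bases of the site
    n = len(site)
    # Mirror symmetry places the bottom-strand cut n - k bases from the left end.
    width = n - 2 * k
    if width > 0:
        return "5' overhang", site[k:n - k]
    if width < 0:
        return "3' overhang", site[n - k:k]
    return "blunt", ""


def can_ligate(pattern_a: str, pattern_b: str) -> bool:
    """Ends can join if both are blunt or both carry base-pairing overhangs of the same type."""
    type_a, over_a = cut_ends(pattern_a)
    type_b, over_b = cut_ends(pattern_b)
    return type_a == type_b and over_a == reverse_complement(over_b)


ENZYMES = {"EcoRI": "G^AATTC", "BamHI": "G^GATCC", "SmaI": "CCC^GGG"}

for name, pattern in ENZYMES.items():
    print(name, cut_ends(pattern))

print("BamHI end + BamHI end:", can_ligate(ENZYMES["BamHI"], ENZYMES["BamHI"]))
print("EcoRI end + BamHI end:", can_ligate(ENZYMES["EcoRI"], ENZYMES["BamHI"]))
```

EcoRI and BamHI both leave four-base 5’ overhangs, but an AATT overhang cannot pair with a GATC overhang, which is why the plant DNA and the plasmid are cut with the same enzyme.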

Figure B

Restriction enzymes can also be used as tools for analyzing natural variation in DNA sequences simply because their recognition sites are specific.  An example is shown in Figure C.  One individual may have the sequence GAATTC at a specific site while a different individual may have AAATTC at the same site.  The change in the nucleotide sequence can be detected because the site in the first individual can be cut in vitro with EcoRI while the site in the second individual cannot be cut.  This results in a difference in the lengths of the restriction fragments from the two individuals that can be readily detected by agarose gel electrophoresis.  Such a difference is known as a restriction fragment length polymorphism, or RFLP.  RFLPs provide yet another phenotype, one that may or may not correlate with a phenotype observed in another assay; it is a more direct look at the underlying DNA sequence.  Because many hundreds of restriction enzymes are known, each with a specific recognition site, tens of thousands of RFLPs have been found as natural variation among genomes.
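An RFLP can be simulated by digesting two versions of the same stretch of DNA in silico.  The sketch below applies EcoRI (G^AATTC) to two hypothetical 38-base alleles that differ only at the first base of the site (GAATTC versus AAATTC), mirroring the example in the text; fragment lengths are counted on the top strand of a linear molecule, and the sequences are invented for illustration.

```python
def digest(seq: str, site: str, cut_offset: int) -> list[int]:
    """Return top-strand fragment lengths after cutting a linear sequence at every site."""
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)  # cut position inside this occurrence of the site
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


ECORI_SITE, ECORI_CUT = "GAATTC", 1  # EcoRI cuts G^AATTC

# Hypothetical alleles: identical except for one base in the EcoRI site.
allele_1 = "ACGTACGTACGT" + "GAATTC" + "TTGCA" * 4
allele_2 = "ACGTACGTACGT" + "AAATTC" + "TTGCA" * 4

print("allele 1 fragments:", digest(allele_1, ECORI_SITE, ECORI_CUT))
print("allele 2 fragments:", digest(allele_2, ECORI_SITE, ECORI_CUT))
```

The first allele yields two fragments (13 and 25 bases) while the second stays as a single uncut 38-base fragment; on a gel, that difference in band pattern is the RFLP.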

Figure C

While we have discussed variation arising from a change in a recognition site, restriction fragment length polymorphisms also revealed another important source of genomic variation.  A change in a restriction site can arise from a single nucleotide change, which can be detected by digesting the DNA from both sources with many different enzymes.  But another common variation occurs when both individuals have the restriction sites but the distance between them is different; that is, the length of the DNA separating the restriction sites differs.  This led to the recognition of copy number variations, or CNVs, which are sequences that are present in both genomes but with a different number of copies.  It also led to the detection of inserted sequences that vary between genomes.  Thus, in addition to their importance in manipulating DNA sequences in the laboratory, restriction endonucleases have provided critical methods to analyze the structures and sequences of natural genomes.
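The copy number case can be sketched the same way.  Here two hypothetical alleles share the same pair of EcoRI sites but carry three versus five tandem copies of a made-up six-base repeat unit between them.  Because both cuts fall at the same offset within their sites, the fragment between the sites is simply the distance between the two site positions.

```python
ECORI_SITE = "GAATTC"
REPEAT_UNIT = "TTAGGC"  # hypothetical repeat unit, not a real CNV sequence
SPACER = "ACGTACGT"     # hypothetical unique sequence flanking the repeats


def make_allele(copies: int) -> str:
    """Build a hypothetical allele: site, spacer, tandem repeats, spacer, site."""
    return ECORI_SITE + SPACER + REPEAT_UNIT * copies + SPACER + ECORI_SITE


def inter_site_lengths(seq: str, site: str) -> list[int]:
    """Distances between successive occurrences of a site (the internal fragment lengths)."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return [b - a for a, b in zip(positions, positions[1:])]


short_len = inter_site_lengths(make_allele(3), ECORI_SITE)[0]
long_len = inter_site_lengths(make_allele(5), ECORI_SITE)[0]
print("fragment lengths:", short_len, long_len)
print("extra copies inferred:", (long_len - short_len) // len(REPEAT_UNIT))
```

The same logic applies to an inserted sequence: an insertion between two conserved sites lengthens the fragment by the length of the insert, even though neither site has changed.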

While restriction enzymes are still widely used, often as a first and inexpensive approach for analyzing a genome, they remain an indirect method for looking at DNA sequence variation.  Methods for rapidly sequencing genomes are now the most significant tool for identifying natural variation.  These methods are considered in Tool Box 3-2.