This is an exercise straight out of one of my courses, Basic Population Genetics for Dog Breeders, but it is so important for breeders to understand that I'm making it available here. Please take the time to work your way through it. There are some simple simulations you do with colored "alleles", then some computer simulations where you can do some experiments that will allow you to explore the factors that can affect your breed's gene pool in ways you wouldn't expect. These simple exercises will change the way you view the genetic stability of your breed - I promise. This might be a fun thing to do the next time you get together with a group of fellow breeders.
Things to think about when you're done -
- How big is your breed - not in total numbers of dogs, but in the size of the breeding population?
- What fraction of the dogs in your breed are allowed to breed?
- Does your breed have restrictions in the standard that remove dogs from the breeding population?
- Do you have breeding restrictions on the puppies you place?
- Is your breed relatively unpopular or dominated by just a few kennels?
- How vulnerable is your breed to genetic drift?
You probably know that for every gene location - a locus - an animal has two alleles, one that came from the sire and one from the dam. Which one of the two alleles gets passed on to each offspring is random, so the pair of alleles that the offspring inherits for each gene is determined only by chance.
The Binomial Situation
Take out a coin. Every coin has two sides - heads and tails - and for this reason when we flip a coin we are talking about binomial probability. If it's a fair coin, there is a 50:50 chance of getting heads every time you flip it. You might get 5 heads in a row, but nevertheless at the next toss the chance of getting heads is still one out of two.
If you only flip it once and get heads, then the outcome of the trial is 100% heads. If you flip it 5 times and get 4 heads and 1 tails, then the outcome of the trial is 80% heads. If you flip it 100 times, or 1000 times, the probability of getting an extreme result goes down, and the outcome should tend towards 50:50. This is the basis of the "bell curve" - most of the results will be close to 50:50 and deviations from this will become rarer as they become more extreme.
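If you'd rather let a computer do the tossing, here is a minimal Python sketch of the same idea. The function names (`fraction_heads`, `spread`), the trial sizes, and the 200 repeats per size are choices made for this illustration, not part of the original exercise.

```python
import random

def fraction_heads(n_flips, rng):
    """Flip a fair coin n_flips times; return the fraction that came up heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

def spread(n_flips, n_trials, rng):
    """Return the lowest and highest fraction of heads seen across n_trials trials."""
    results = [fraction_heads(n_flips, rng) for _ in range(n_trials)]
    return min(results), max(results)

rng = random.Random(1)  # fixed seed so a rerun gives the same numbers
for n_flips in (5, 20, 100, 1000):
    low, high = spread(n_flips, 200, rng)
    print(f"{n_flips:>5} flips per trial: lowest {low:.2f}, highest {high:.2f}")
```

The gap between the lowest and highest result should shrink as the number of flips per trial goes up.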
What does this have to do with dog breeding? Remember, which of the two possible alleles an offspring inherits from each parent is determined randomly - but when there are only a small number of "trials" (puppies in this case), the results could be extreme just by chance. This is relevant to dogs because in terms of "large" vs "small", the typical size of a litter is small. Because litters are statistically small, extreme results from a binomial sampling can occur.
Simulating Binomial Sampling With Beans
It's easy to demonstrate what we're talking about here. I'm sure you understand the example of the coin toss (or if you didn't, get out a coin and do some tossing). Let's do the same sort of thing, but now using beans to represent the alleles a dog could inherit at a particular locus.
This is where you get to play with the beans! Get out a small bowl and some cups. Start with two types (colors) of beans, and count out 50 of each into the bowl. Mix them all up with your hand. Now we're going to simulate inheritance.
On a piece of paper make three columns - label the first column LL (for light-light, or you could use RR for red-red, or whatever color beans you're using), the middle column LD, and the third column DD. Before we do any bean breeding, let's do a bit of math. Based on what we've talked about already, answer these questions:
1) If you reach into the bowl and randomly (without peeking!) select one bean, what is the probability it will be a "light" bean?
2) If you put that bean back in the bowl and mix it around, what is the probability of selecting another light bean?
The probability for each independent draw is 50% (0.5). So, what is the probability of drawing two light beans in a row (LL)? It's the product of their independent probabilities - (0.5) x (0.5) = 0.25, or 25%.
If we know this for the light beans, it must be equally true of the dark beans (or whatever color you're using) (DD).
Now, what is the probability of choosing a light bean first and a dark bean second (LD)? It's still (0.5) x (0.5) = 0.25, or 25%.
And what about drawing the dark bean first and the light bean second (DL)? The calculation is the same, (0.5) x (0.5) = 0.25, so the probability of getting DL is also 25%.
Go back to your piece of paper with the columns, and above LL write 25%, and above DD write 25%. What about the middle column? If we consider all the beans of a particular color to be identical, then LD is the same as DL. So the probability of getting two different colors is 25% + 25% = 50%, and you can write that above the middle column.
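The arithmetic for the three columns can be written out as a short Python sketch. The variable names (`p`, `expected`, `column`) are just labels chosen for this illustration.

```python
from itertools import product

p = {"L": 0.5, "D": 0.5}  # chance of drawing each color on any single draw

expected = {"LL": 0.0, "LD": 0.0, "DD": 0.0}
for first, second in product("LD", repeat=2):
    # Independent draws: multiply the two probabilities.
    probability = p[first] * p[second]
    # LD and DL count as the same combination, so both go in the LD column.
    column = first + second if first == second else "LD"
    expected[column] += probability

for column, probability in expected.items():
    print(f"{column}: {probability:.0%}")
```

Changing the two values in `p` (keeping them summing to 1) shows what the expected columns would be for a bowl that is not a 50:50 mix.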
Okay, back to our beans. Mix them up, then reach into the bowl and pull out TWO beans at once, and put a tick mark in the appropriate column (if two light beans, count one for "LL"). Put those beans back (so there are always 100 beans in the bowl in a 50:50 ratio), draw another pair, and log the result. Do this a total of 20 times. Now tally up the number of occurrences of each combination (e.g., 8 LL, 4 LD, 8 DD). Then divide each of these numbers by the total number of draws (20) to determine the frequency of each outcome (e.g., 8/20 = 0.4, or 40%). How close did this come to the statistically "expected" outcome?
Draw a line under those data and repeat this exercise 2 more times, calculating the fractional outcomes as before.
In all likelihood, none of the three trials produced exactly the results you predicted at the beginning, because extreme deviations from the expected values can occur with small sample sizes. Since we did all three of these trials exactly the same way, we can pool the results and calculate the overall frequency of each outcome by adding the results in each column and dividing by 60. For example, if for LL you got 8, 5, and 3 for the three trials, the total is 16, and you divide that by 60; likewise for the other two outcomes. What you should find is that the sum of the three trials (a total of 60 draws) comes closer to the expected values you wrote at the top of the columns than the trials with only 20 draws. (Did you??? If you didn't, do another few draws of 20 just to convince yourself that eventually you will come close to the expected values.)
The bottom line here is that when you are working with a small sample (20), you are more likely to get frequencies that are different than expected. As the number of samples increases, the proportions should get closer and closer to the predictions.
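Here is a rough Python version of the bean bowl, in case you want to check your tallies against a few thousand virtual draws. It is only a sketch: `draw_pairs` and its parameters are names invented for this illustration, and it assumes the two beans are pulled out together and both returned before the next draw.

```python
import random
from collections import Counter

def draw_pairs(n_light, n_dark, n_draws, rng):
    """Tally LL, LD and DD over n_draws pairs pulled from the bowl."""
    bowl = ["L"] * n_light + ["D"] * n_dark
    tally = Counter({"LL": 0, "LD": 0, "DD": 0})
    for _ in range(n_draws):
        first, second = rng.sample(bowl, 2)  # two beans at once; both go back afterwards
        tally[first + second if first == second else "LD"] += 1
    return tally

rng = random.Random()  # unseeded, so every run differs, like a fresh set of draws
pooled = Counter({"LL": 0, "LD": 0, "DD": 0})
for trial in range(1, 4):
    tally = draw_pairs(50, 50, 20, rng)
    pooled.update(tally)
    print(f"trial {trial}:", {k: f"{v / 20:.0%}" for k, v in tally.items()})
print("pooled: ", {k: f"{v / 60:.0%}" for k, v in pooled.items()})
```

Because the two beans come out of the bowl together, the exact chance of LL is 50/100 x 49/99, about 24.7%, which is close enough to 25% that you won't notice the difference with beans. Try raising the number of draws per trial from 20 to 2000 and watch the percentages settle down.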
Physical Simulation of Genetic Drift
Get out a cup, and put in it 100 beans in the proportions you got from your first trial above. (Just multiply the numbers in each column by 5 to get the number of beans that column contributes: all light for LL, all dark for DD, and half of each color for LD.) If by some fluke you ended up with equal numbers of light and dark beans, pick the second trial (or third). We want to do a new simulation with a population where the frequencies of D and L are not equal. Do the same thing you did before - record the results of 20 draws of a pair of beans, this time just once. Compute the proportions, then mix up another bowl of 100 beans in these new proportions. Do one more set of 20 draws and record your data.
Let's look at your data. You know you started with L and D beans in equal proportion at the very beginning. You then used the data from the first round of draws to create the next generation of our bean population, which just by chance alone has a different proportion of L and D. And you repeated this again, creating another new generation that probably had yet another ratio of L:D. With every subsequent generation, the frequency of alleles in the population will vary, just by chance.
This change in the frequency of alleles in our population with each generation is called genetic drift. If you continued doing these trials, say 100 or 1000 of them, you would see that the effect of genetic drift on the genetics of a population can be profound. But instead of playing with beans for a few more hours, we can do the same kind of simulation very quickly using a computer program that will create a virtual population of alleles, then randomly select, replace, and select again, in the same way you just have, for as many generations as you want. Using this, we can do a bunch of experiments very quickly.
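To give a feel for what such a program does under the hood, here is a minimal Python sketch of the bean procedure repeated over many generations. It is an illustration only, not the code of any particular simulator: the names `next_generation` and `drift` are made up here, and it assumes each new bowl is drawn at random from a bowl mixed in the previous generation's proportions.

```python
import random

def next_generation(freq_light, n_beans, rng):
    """Draw n_beans at random from a bowl with the given light fraction; return the new fraction."""
    light = sum(rng.random() < freq_light for _ in range(n_beans))
    return light / n_beans

def drift(freq_light, n_beans, n_generations, rng):
    """Return the light-bean frequency in every generation, starting value included."""
    history = [freq_light]
    for _ in range(n_generations):
        freq_light = next_generation(freq_light, n_beans, rng)
        history.append(freq_light)
    return history

rng = random.Random()
history = drift(0.5, 40, 30, rng)  # 20 pairs = 40 beans per generation
print(" ".join(f"{f:.2f}" for f in history))
```

Run it a few times. Each run wanders differently, and once a color hits 0.00 or 1.00 it stays there, because there are no beans of the other color left to draw.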
Computer Simulation of Genetic Drift
<If you skipped doing the bean experiment because you thought you would be fine just reading it instead, or because you thought it was hokey and a waste of time, please reconsider. Without doing this yourself, you won't understand how the computer simulation works that we're about to do. Get out your beans and just do it. Trust me, it will make what follows much easier to understand.>
You can put the beans away for now, and go to the Red Lynx Population Genetics Simulator (http://scit.us/redlynx/).
To see how it works, run some simulations using the default settings - 2000 generations, population size of 800, and initial frequency of each of our alleles (A1 and A2) of 50%.
Each time you click on "Run Simulation", it will do the same thing you just did for beans - starting with a 50:50 mix of alleles, it will create 800 new individuals with alleles drawn at random. Then starting over again with alleles at their new frequencies, it will repeat again for 2000 generations. It will draw a line for each run showing how the frequency of the A1 allele changed over time. The total number of alleles in the population stays the same over time, so if A1 goes up, A2 must go down. If A1 goes all the way to 100%, that means A2 has - just by chance - been lost from the population. Likewise, if A1 goes to zero, then all of the alleles in the population are A2. Each time you click on run, it does another simulation of 2000 generations and plots a new line.
Okay, let's do some experiments. From the bean counting experiment we did above, we decided that if a population is very large, the proportions of alleles drawn randomly should be close to what is predicted. When the population is small, just by chance you can get a result that is extreme.
We can do that experiment now with hundreds of generations in a few seconds. Try this:
1) Run 10 simulations with the default settings (population size of 800) except change the number of generations to 200, which is more reasonable for purebred dogs. How many times was the A1 allele lost from the population (its frequency went to zero)? How many times did the A1 allele go to fixation (100% A1) - i.e., A2 was lost, and all individuals were therefore homozygous for A1?
2) Clear the graph, change the population size to 400, run 10 simulations, and note as above the number of times A1 was eliminated or became fixed in the population.
3) Do the same thing with population sizes of 100, 50, and 25. You should be getting the picture. What is the effect of population size on the genetic stability of our virtual population?
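The three steps above can also be sketched in Python if you want to tally the outcomes automatically. This is a simplified stand-in, not the simulator's own code: `run_to_end` and `count_outcomes` are names invented here, and it assumes every individual carries two alleles and that each generation is drawn at random from the previous one.

```python
import random

def run_to_end(start_freq, pop_size, n_generations, rng):
    """Return the final A1 frequency after drift in a population of pop_size individuals."""
    n_alleles = 2 * pop_size  # assumption: two alleles per individual
    freq = start_freq
    for _ in range(n_generations):
        if freq in (0.0, 1.0):  # lost or fixed: nothing more can change
            break
        freq = sum(rng.random() < freq for _ in range(n_alleles)) / n_alleles
    return freq

def count_outcomes(start_freq, pop_size, n_generations, n_runs, rng):
    """Count how many of n_runs ended with A1 lost or fixed."""
    finals = [run_to_end(start_freq, pop_size, n_generations, rng) for _ in range(n_runs)]
    return {"lost": finals.count(0.0), "fixed": finals.count(1.0)}

rng = random.Random()
for pop_size in (800, 400, 100, 50, 25):
    print(pop_size, count_outcomes(0.5, pop_size, 200, 10, rng))
```

The first argument to `count_outcomes` is the starting frequency of A1, so you can change it to try other starting points.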
Now, let's simulate something more interesting. Let's pretend A1 is the allele for PRA or some other genetic disorder, and we'll make it rare in the population - say 10%. Change initial frequency to 10%, put the population size back to 800, and run 10 trials, followed by population sizes of 400, 100, 50, and 25, as before.
As before, you will notice that population size has a large influence on the stability of the allele in the population, with the results getting more and more unpredictable as the population gets smaller. In these simulations, you probably found that many times your rare PRA allele was completely eliminated from the population, but occasionally (and more frequently at small population sizes), the frequency of this allele increased, perhaps substantially.
What is the size of the reproductive population in your breed? Think about some genetic disease that occurs in your breed that is caused by an autosomal recessive allele - e.g., PRA, or von Willebrand disease. This disease gene could start out being rare in your breed, but in just a few generations - by chance alone - it could be lost entirely, or it could become very common and even fixed in the breed. Of course, as this allele becomes more common, the frequency of affected animals will go up (because the number of homozygous offspring will increase), and suddenly a genetic disorder shows up in your breed. This isn't a spontaneous mutation - it is an allele that has been there all along, and just by chance has become more common by genetic drift.
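To see how quickly affected animals pile up as a recessive allele drifts upward, you can reuse the bean arithmetic (multiply the two independent probabilities). The sketch below assumes random mating, which real breeding programs don't follow, so treat the numbers as a rough guide only. The function name `genotype_fractions` and the labels "clear", "carrier", and "affected" are choices made for this illustration.

```python
def genotype_fractions(q):
    """Expected fractions for a recessive disease allele at frequency q, assuming random mating."""
    p = 1.0 - q
    return {"clear": p * p, "carrier": 2 * p * q, "affected": q * q}

for q in (0.05, 0.10, 0.25, 0.50):
    f = genotype_fractions(q)
    print(f"allele at {q:.0%}: carriers {f['carrier']:.1%}, affected {f['affected']:.2%}")
```

Notice that at an allele frequency of 10% only about 1 dog in 100 is expected to be affected while about 18 in 100 are carriers, which is why a drifting allele can go unnoticed for a long time.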
The frequencies of all alleles can vary each generation because of genetic drift, not just disease alleles. Just by chance, dogs might get larger, or bolder, or a rare color could become more common, or they might become more sensitive to a particular disease or have more allergies. The point to remember is these changes are occurring because of changes in allele frequencies of the population.
The Power of Genetic Drift
You now might be worrying about your own breed, wondering how large your breeding population is, and what nasty gene might be lurking in your gene pool waiting for the chance - just by chance - to become a serious problem. This is definitely something breeders should be thinking about. In most breeds, only a small percentage of puppies born each year are bred, and those are not selected randomly from each litter. Under these conditions, as you have seen, some dramatic shifts in allele frequencies can be occurring by chance without breeders even being aware. Population size is far more influential on the genetic status of a breed than most breeders realize.
The next set of exercises uses a different population genetics simulator: http://www.radford.edu/~rsheehy/Gen_flash/popgen/
Basic genetic drift
You will see two graph axes and along the top the selection options. Let’s start with something you’re familiar with -
Population size = 100
A1 allele = 0.5
# of populations = 1
Number of generations = 200
Leave all others at the defaults
<If your screen is wide enough, you might see a box marked “Finite” in the extreme upper left. Leave that box unchecked.>
Hit Run. You will now see in the top graph that it has plotted the change in frequency of both A1 and A2 alleles, and notice that the two lines are mirror images of each other. As the frequency of one goes up, the frequency of the other MUST go down. At the bottom you will see a graph of the genotypes that result from the changes in allele frequencies. Remember from last time that you calculated the frequencies of LL, DD, and LD to be 0.25, 0.25, and 0.5. (These are the numbers you wrote as the “expected” values from the previous experiment on genetic drift.)
Now you see each of the possible genotypes plotted for the A1 and A2 alleles. The lines for the homozygous combinations (A1A1 and A2A2) should be similar to the lines in the top graph, but the third line is for the heterozygous combination (A1A2). You can see that the heterozygous condition is lost from the population if one of the alleles goes extinct (of course). With the loss of one of the alleles, you’ve really lost 2 possible genotypes from the population, as well as the phenotypes those combinations produced. Everybody in the population is now fixed for the remaining allele in the homozygous state, and if this happens to be a gene that has a detrimental effect on the animal there’s nothing breeders can do about it. You can’t breed away from the problem because there is no alternative allele that you can select for. Obviously, this is bad.
You can play around with population size as you did in the previous exercise and see the effect it has on genotype, which is really what you are working with as a breeder because genotype determines phenotype.
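If you want to see the genotype counts as numbers rather than lines on a graph, here is a small Python sketch that builds each generation one individual at a time. It is not the simulator's code: `breed_generation` and `allele_freq` are names made up for this illustration, and it assumes each offspring gets two alleles drawn independently at the current A1 frequency.

```python
import random
from collections import Counter

def breed_generation(freq_a1, pop_size, rng):
    """Create pop_size offspring, each with two alleles drawn at random; return genotype counts."""
    counts = Counter({"A1A1": 0, "A1A2": 0, "A2A2": 0})
    for _ in range(pop_size):
        n_a1 = (rng.random() < freq_a1) + (rng.random() < freq_a1)  # A1 copies: 0, 1 or 2
        counts[("A2A2", "A1A2", "A1A1")[n_a1]] += 1
    return counts

def allele_freq(counts):
    """A1 frequency implied by the genotype counts."""
    total = sum(counts.values())
    return (2 * counts["A1A1"] + counts["A1A2"]) / (2 * total)

rng = random.Random()
freq = 0.5
for generation in range(1, 201):
    counts = breed_generation(freq, 100, rng)
    freq = allele_freq(counts)
    if generation % 25 == 0:
        print(generation, dict(counts), f"A1 = {freq:.2f}")
```

When one allele is lost, two of the three counts drop to zero together: the heterozygotes and one of the homozygotes.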
Fitness effects
Biologists use the word “fitness” to refer to the likelihood that an animal will pass on its genes to the next generation. Animals that die before they reproduce have a fitness of zero. Animals that have more offspring have a greater fitness than ones with fewer offspring. In our population simulation, we can observe what happens when a particular genotype has a negative effect on fitness.
Leaving the settings as they were when you started the first exercise, we will now assign a detrimental effect to the A2A2 homozygous condition. In the boxes at the top under “Fitness”, leave 1’s in the A1A1 and A1A2 boxes (1 is no detrimental effect, 0 is lethal), and change A2A2 to 0.9 – a reduction in fitness of 10%. Run the simulation with these settings.
You will see that there is now a solid black line in the upper graph. This is the “theoretical” curve, and you can compare that with the behavior of the A1 allele. You should see that a reduction in fitness of only 10% has a pretty significant effect on the frequency of the A2 allele – it is eliminated from the population more quickly than before, and of course the heterozygote combination is eliminated as well. At the upper left, it will tell you the “mean generations to fixation” = some number. You can compare how this number changes with duplicate runs under the same conditions, and as you’ve seen before, the more trials you do the closer the average response will get to the theoretical one.
Now do some experiments by reducing the fitness of the homozygous A2A2 in 10% steps, running 5 trials of each and recording the number of generations to fixation (e.g., 0.9, 0.8, 0.7, 0.6, etc.). As you would expect, as the fitness penalty for the homozygous A2A2 increases, both the homozygous and heterozygous combinations are lost more quickly from the population. But it takes just a tiny penalty to have an effect. Inbreeding causes inbreeding depression, which is essentially a reduction in fitness. In most populations of animals, the effects of inbreeding depression begin to appear at inbreeding coefficients above 5%. What you have just seen is that even a very small reduction in fitness can profoundly affect the allele frequencies in a population, but it just takes a bit longer on average than with more severe penalties.
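For a feel of how the size of the penalty changes the pace, here is a Python sketch of the standard textbook calculation for selection on one gene with two alleles, with drift left out entirely. Whether the simulator's black “theoretical” line is computed exactly this way is an assumption. The function names and the 90% target are also choices made here (without drift a recessive allele never quite reaches zero, so the sketch stops when A1 reaches 90%).

```python
def select(freq_a1, w11, w12, w22):
    """A1 frequency after one generation of selection, given the three genotype fitnesses."""
    p, q = freq_a1, 1.0 - freq_a1
    mean_fitness = p * p * w11 + 2 * p * q * w12 + q * q * w22
    return (p * p * w11 + p * q * w12) / mean_fitness

def generations_until(target, w22, start=0.5, limit=100000):
    """Generations for A1 to climb from start to target when only A2A2 is penalized."""
    freq, generation = start, 0
    while freq < target and generation < limit:
        freq = select(freq, 1.0, 1.0, w22)
        generation += 1
    return generation

for w22 in (0.9, 0.8, 0.7, 0.6, 0.5):
    print(f"A2A2 fitness {w22}: A1 reaches 90% after {generations_until(0.9, w22)} generations")
```

In a real (finite) population, drift is layered on top of this steady push, which is why your individual runs scatter around the theoretical curve.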
Bottlenecks
A bottleneck is a drastic reduction in the size of the population. Most purebred dog breeds have a bottleneck somewhere in their past. Many breeds were drastically reduced during wars, distemper epidemics, or when they were no longer needed as a working breed. Other breeds were affected by artificial bottlenecks produced by the extreme dominance of a few very popular dogs in the breeding population, as happened in Standard Poodles when the Wycliffe kennel dominated the breed in the 1950’s. In some cases, a breed was reduced to only a few dogs – Norwegian Lundehunds are all descendants of 6 dogs that were all that remained of the breed after a series of unfortunate events – and 3 of these dogs were siblings and 5 shared a grandmother.
A drastic reduction in population size can substantially alter the allele frequencies in the subsequent population. We can simulate the effects of bottlenecks of various sizes.
Start again with these settings –
Population size = 500
A1 allele = 0.5
# of populations = 1
Number of generations = 200
Leave all others at the defaults (fitness of all genotypes = 1)
Run a few simulations at these settings so you can get an idea of the pattern this produces.
Now click the “Bottleneck!” box. We will start our bottleneck at generation 50, end it at 55, and reduce the population during the bottleneck from 500 to 100. Do 5 runs and record which allele (A1 or A2) was more frequent at the end of the run and whether either allele was fixed or lost (you will see the changes in genotypes in the bottom graph).
Reduce the population bottleneck to 50 (10% of the original population size), and run again 5 times, recording which allele was more frequent at the end and whether either allele was fixed or lost. Reduce the bottleneck again to 10 and repeat, then to 5 and repeat.
Repeat these trials with the initial frequency of A1 at 0.8 (so A2 is 0.2) instead of 0.5, again reducing the size of the bottleneck in steps.
You will see that a bottleneck changes the genetic trajectory of the population, and the more severe the bottleneck, the less predictable are the consequences.
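The bottleneck experiment can be sketched in Python along the same lines as the earlier drift runs. This is an illustration, not the simulator's code: the names `size_at` and `run_with_bottleneck` are invented here, and it assumes two alleles per individual and a bottleneck that lasts from the start generation through the end generation inclusive.

```python
import random

def size_at(generation, normal_size, bottleneck_size, start, end):
    """Population size in a given generation; smaller from start to end inclusive."""
    return bottleneck_size if start <= generation <= end else normal_size

def run_with_bottleneck(start_freq, normal_size, bottleneck_size, rng,
                        n_generations=200, start=50, end=55):
    """Return the final A1 frequency after drift with a temporary drop in population size."""
    freq = start_freq
    for generation in range(1, n_generations + 1):
        if freq in (0.0, 1.0):  # lost or fixed: nothing more can change
            break
        n_alleles = 2 * size_at(generation, normal_size, bottleneck_size, start, end)
        freq = sum(rng.random() < freq for _ in range(n_alleles)) / n_alleles
    return freq

rng = random.Random()
for bottleneck_size in (100, 50, 10, 5):
    finals = [run_with_bottleneck(0.5, 500, bottleneck_size, rng) for _ in range(5)]
    print(f"bottleneck of {bottleneck_size:>3}:", " ".join(f"{f:.2f}" for f in finals))
```

Each row shows the final A1 frequency of five runs. Change the first argument from 0.5 to 0.8 to start with A1 at 80%.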
Think about this. Registration numbers for many breeds are declining. Popular sires are common. Breeders neuter most of their puppies or impose breeding restrictions to “protect their line.” All of these things have genetic consequences for the breed that will persist for dozens – or hundreds – of generations.
Just for fun, you can do some more runs while imposing a fitness penalty for homozygous A2A2 as you did above (or improvise – apply a fitness penalty to the heterozygote and see what happens).
Founder effect
The founder effect is really just a special case of a bottleneck. Some subset of a larger population is separated off to form a new population. (In a bottleneck, the rest of the original population usually died.) The number of founder dogs is the size of the bottleneck in this case. The fewer the number of animals used to start the new population, the less likely it is that the gene pool of the subset is the same as the original population. Many breeds were founded with just a few dogs, and many went on to suffer through a bottleneck or two as well. A breed starts out at founding with a population of animals that are declared to be “purebred” whatever-they-are. Generations later, the dogs might still look like the original founders because breeders are selecting for type, but the genes that aren’t under selection are subject to the effects of genetic drift and bottlenecks, with completely unpredictable results.
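A quick Python sketch shows how much the luck of the draw matters when a breed is founded. It assumes the founders are unrelated and picked at random from a large source population, which is more generous than most real breed histories, and the 10% starting frequency and the function names are choices made for this illustration.

```python
import random

def chance_allele_missing(freq, n_founders):
    """Chance that no founder carries the allele: all 2 * n_founders allele draws must miss it."""
    return (1.0 - freq) ** (2 * n_founders)

def founder_freq(freq, n_founders, rng):
    """Allele frequency among founders picked at random from a large source population."""
    n_alleles = 2 * n_founders
    return sum(rng.random() < freq for _ in range(n_alleles)) / n_alleles

rng = random.Random()
for n_founders in (3, 6, 20, 100):
    sample = [founder_freq(0.1, n_founders, rng) for _ in range(5)]
    print(f"{n_founders:>3} founders: allele missing {chance_allele_missing(0.1, n_founders):.0%} of the time;",
          "five example groups:", " ".join(f"{f:.2f}" for f in sample))
```

With only a handful of founders, an allele that sits at 10% in the source population can easily be absent altogether or start out several times more common in the new population.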
It doesn’t matter how many years of experience you’ve had as a breeder. It doesn’t matter how carefully you select the dogs in your breeding program and choose among the offspring to continue on with. Underneath the traits you can see, all sorts of things can be (and probably are) happening to allele frequencies that matter for other things, with effects that will last for generations. And for all of these effects we’ve looked at, the size of the population has a most profound effect.