(Note: this is also a test of how to use LaTeX on WordPress.)
The diffusion and the coalescent are probably the two most important models in population genetics. The two processes essentially describe the same thing (the evolution of a population), and I have known for a while (since attending Yun Song’s class in 2012) that there is a nice identity that links the two:
$$\mathbb{E}\left[X_t^n \mid X_0 = x\right] = \mathbb{E}\left[x^{A_n(t)} \mid A_n(0) = n\right]$$
There is quite a bit of notation in here, so let’s start: $X_t$ is the diffusion process at time $t$, and $A_n(t)$ is the ancestral process at $t$, started from $n$ lineages. The diffusion process describes how allele frequencies change forward in time, and the ancestral process describes how ancestral lineages merge when we track them backwards in time; both of them can be thought of as measuring genetic drift. From a probability-theoretic point of view, it is also interesting to note that the left-hand side describes the moments of $X_t$, and the right-hand side is the probability-generating function of $A_n(t)$ (equivalently, its moment-generating function evaluated at $\log x$). Thus, both specify the distribution of the respective processes completely!
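As a quick sanity check, the case of a sample of size two can be worked out by hand. Write $X_t$ for the allele frequency at time $t$ (started at $x$) and $A_2(t)$ for the number of ancestral lineages of two samples after tracing back a time $t$. The sketch assumes the usual coalescent time scaling, in which a pair of lineages coalesces at rate 1; the units of time are an assumption here, not something fixed above. Under that assumption the two lineages have coalesced with probability $1 - e^{-t}$, so $A_2(t)$ equals 1 with that probability and 2 otherwise, and the identity reads
$$\mathbb{E}\left[X_t^2 \mid X_0 = x\right] = x\,(1 - e^{-t}) + x^2\,e^{-t}.$$
At $t = 0$ this is $x^2$, as it should be, and as $t \to \infty$ it tends to $x$, the probability that the allele eventually fixes.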
For the last few years, I took it to be some arcane math, especially since the equation is most frequently applied in very theoretical papers. However, I recently discovered a really nice proof that provides an intuitive understanding of why the equality holds. Ironically, it is also (I think) the original source of the equation: Tavaré (1984).
For the proof, consider a population where we know that allele $a$ has allele frequency $x$ at time $0$, and that time $0$ is further in the past than time $t$. Now, at time $t$, we sample $n$ individuals from the population, and we want to calculate the probability that they all carry allele $a$. On the one hand, if the population has allele frequency $X_t$ at time $t$, each sampled individual is of type $a$ with probability $X_t$, so the probability of the event is $X_t^n$; averaging over the unknown $X_t$ gives $\mathbb{E}[X_t^n \mid X_0 = x]$. On the other hand, we can start the ancestral process with the $n$ sampled individuals at time $t$ and track their lineages backwards in time until time $0$, by which point some of them will have coalesced and only $A_n(t)$ lineages are left. For all $n$ sampled individuals to have type $a$, all the surviving lineages must have type $a$, which happens with probability $x^{A_n(t)}$; averaging over $A_n(t)$ gives the other expectation. Thus, both sides of the equation are expectations of the indicator of the same event!
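The two ways of computing this probability can also be compared numerically. The following is a minimal sketch, not part of the original argument: it approximates the diffusion by a haploid Wright-Fisher model with a finite population (an assumption, with one unit of time equal to `pop_size` generations) and simulates the ancestral process as a pure death process in which `k` lineages coalesce at rate `k(k-1)/2`. The function names, parameter values, and seeds are all made up for illustration.

```python
import random


def forward_moment(x, n, t, pop_size=50, reps=4000, seed=1):
    """Monte Carlo estimate of E[X_t^n | X_0 = x], going forward in time.

    Assumption: the diffusion is approximated by a haploid Wright-Fisher
    model with pop_size individuals, run for t * pop_size generations.
    """
    rng = random.Random(seed)
    generations = round(t * pop_size)
    total = 0.0
    for _ in range(reps):
        count = round(x * pop_size)  # copies of the allele at time 0
        for _ in range(generations):
            p = count / pop_size
            # Binomial resampling: each child picks a random parent.
            count = sum(rng.random() < p for _ in range(pop_size))
        # Probability that n sampled individuals all carry the allele.
        total += (count / pop_size) ** n
    return total / reps


def backward_pgf(x, n, t, reps=20000, seed=2):
    """Monte Carlo estimate of E[x^{A_n(t)} | A_n(0) = n], going backward."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        k, elapsed = n, 0.0
        while k > 1:
            # With k lineages, the next coalescence happens at rate k(k-1)/2.
            elapsed += rng.expovariate(k * (k - 1) / 2)
            if elapsed > t:
                break
            k -= 1
        # Each of the k surviving lineages carries the allele with prob. x.
        total += x ** k
    return total / reps


if __name__ == "__main__":
    x, n, t = 0.3, 3, 0.5
    print("forward  E[X_t^n]     ~", forward_moment(x, n, t))
    print("backward E[x^A_n(t)]  ~", backward_pgf(x, n, t))
```

The two estimates should agree up to Monte Carlo noise and a small bias from the finite population size; increasing `pop_size` and `reps` tightens the match at the cost of run time.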
Let be an exponential with parameter