Population genetics theory

(Note: this is also a test of how to use LaTeX on WordPress.)

The diffusion and the coalescent are probably the two most important models in population genetics. Both describe the same thing (the evolution of a population), and I have known for a while (since attending Yun Song’s class in 2012) that there is a nice identity linking the two:

    \[\mathbb{E}[X_t^n|X_s=x] = \mathbb{E}[x^{A_s}|A_t=n]\]

There is quite a bit of notation here, so let’s start: X_t is the diffusion process at time t, and A_t is the ancestral process at time t. The diffusion process describes how allele frequencies change forward in time, and the ancestral process describes how ancestral lineages merge when we track them backwards in time; both of them can be thought of as measuring genetic drift. From a probability-theoretic point of view, it is also interesting to note that the left-hand side gives the moments of X_t, and the right-hand side is the probability-generating function of A_s. Since X_t is bounded and A_s takes only finitely many values, each side specifies the distribution of the respective process completely!
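As a quick sanity check, take n = 2 and assume the neutral case with time scaled so that each pair of lineages coalesces at rate 1 (this scaling is an assumption made here for illustration). The two lineages are still distinct at time s with probability e^{-(t-s)}, and have merged into one otherwise, so the right-hand side becomes

    \[\mathbb{E}[x^{A_s}|A_t=2] = x^2 e^{-(t-s)} + x\left(1-e^{-(t-s)}\right)\]

At s = t this is x^2, as it should be, and as the gap t - s grows it tends to x, the probability that allele A eventually fixes.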

For the last few years, I took it to be some arcane math, especially since the equation is most frequently applied in very theoretical papers. However, I recently discovered a really nice proof that provides an intuitive understanding of why the equality holds. Ironically, it is also (I think) the original source of the equation: Tavaré, 1984.

For the proof, consider a population in which allele A has frequency X_s = x at time s, where s is further in the past than t. At time t, we sample n individuals from the population, and we want to calculate the probability that they all carry allele A. On the one hand, the population has allele frequency X_t at time t, so each sampled individual is of type A with probability X_t, and, given X_t, the probability of the event is X_t^n. On the other hand, we can start the ancestral process with the n sampled individuals at time t and track their lineages backwards in time until time s, by which point some of them will have coalesced and only A_s lineages are left. Then, for all n individuals to have type A, all A_s surviving lineages must have type A, which happens with probability x^{A_s}. Thus, averaging over X_t on the one hand and over A_s on the other, both sides of the equation give the probability of the same event!
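The argument can also be checked numerically. The following is a minimal sketch, not taken from Tavaré's paper: it assumes a neutral haploid Wright-Fisher population (as a discrete stand-in for the diffusion, with time measured in units of the population size) and Kingman's coalescent for the ancestral process. The function names, `pop_size`, `reps` and `duration` (meaning t - s) are all made up for this illustration.

```python
import numpy as np


def forward_moment(x, n, duration, pop_size=500, reps=20000, rng=None):
    """Monte Carlo estimate of E[X_t^n | X_s = x], with duration = t - s.

    Assumption: a neutral haploid Wright-Fisher population of pop_size
    individuals stands in for the diffusion, with time measured in
    units of pop_size generations.
    """
    rng = np.random.default_rng() if rng is None else rng
    freq = np.full(reps, float(x))
    for _ in range(int(round(duration * pop_size))):
        # Each generation is a binomial resampling of the previous one.
        freq = rng.binomial(pop_size, freq) / pop_size
    return float(np.mean(freq ** n))


def backward_pgf(x, n, duration, reps=20000, rng=None):
    """Monte Carlo estimate of E[x^{A_s} | A_t = n], with duration = t - s.

    Assumption: Kingman's coalescent, in which k lineages merge into
    k - 1 at rate k * (k - 1) / 2.
    """
    rng = np.random.default_rng() if rng is None else rng
    total = 0.0
    for _ in range(reps):
        lineages, elapsed = n, 0.0
        while lineages > 1:
            # Waiting time until the next coalescence among the lineages.
            elapsed += rng.exponential(2.0 / (lineages * (lineages - 1)))
            if elapsed > duration:
                break
            lineages -= 1
        total += x ** lineages
    return total / reps


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    print("forward :", forward_moment(0.3, 4, 0.5, rng=rng))
    print("backward:", backward_pgf(0.3, 4, 0.5, rng=rng))
```

The two printed numbers should agree up to Monte Carlo noise and a small error from using a finite population in place of the diffusion limit.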

\textbf{Conditional exponential}

Let X be an exponential random variable with rate \lambda. Then, for \alpha > 0,

    \[\mathbb{E}[X|X<\alpha] = \frac{1}{\lambda} - \frac{\alpha e^{-\lambda \alpha}}{1-e^{-\lambda \alpha}}\]
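
One way to see this (a sketch using integration by parts, offered as a check rather than as the original derivation) is to divide the partial expectation by P(X < \alpha) = 1 - e^{-\lambda \alpha}:

    \[\mathbb{E}[X|X<\alpha] = \frac{\int_0^\alpha x \lambda e^{-\lambda x}\,dx}{1-e^{-\lambda \alpha}} = \frac{\frac{1}{\lambda}\left(1-e^{-\lambda \alpha}\right) - \alpha e^{-\lambda \alpha}}{1-e^{-\lambda \alpha}}\]

As \alpha grows, the correction term vanishes and we recover the unconditional mean 1/\lambda; for small \alpha, the conditional mean is approximately \alpha/2, as for a uniform distribution on [0, \alpha].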