The behavior you are seeing is simply the effect of *doubling* how far back in the past we start with each iteration: those doubling steps are coarse compared with the much smaller *variance* in the distribution of how far back we actually *have* to start the chain for it to couple by time 0.

In other words, in the “next-to-last” iteration of the while loop, we start the chain at time -n=-(2^(j-1)-1), and advance to time zero, but the two bounding states haven’t coupled. So in the next (final) iteration, we start the chain farther back in time… but we didn’t really *have* to look so much farther back. Instead of roughly *doubling* the number of time steps in the past that we start, we could have just *incremented* the number of steps by a smaller fixed amount.
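To make the coarseness concrete, here is a minimal sketch of just the arithmetic, not the sampler from the post. The function name final_depth_doubling and the parameter required (the number of steps back we would truly need in order to couple by time 0) are hypothetical, introduced only for illustration.

```python
def final_depth_doubling(required):
    """Return (j, depth): the first iteration j whose start time -(2^j - 1)
    reaches at least `required` steps into the past, and that depth.

    `required` is a hypothetical stand-in for how far back we truly need
    to start for the two bounding states to couple by time 0.
    """
    j = 1
    while 2**j - 1 < required:
        j += 1
    return j, 2**j - 1


# Every required depth from 32 through 63 ends at the same final iteration.
outcomes = {final_depth_doubling(r) for r in range(32, 64)}
print(outcomes)
```

Under these assumptions, any required depth between 32 and 63 is reported as the same outcome (iteration 6, starting 63 steps back), so a distribution whose spread is narrower than one doubling step collapses onto very few observed values.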

For example, let’s temporarily modify this increment to: updates.insert(0, (rng_next, 100)). In other words, instead of doubling, we always look 100 time steps farther back in the past with each iteration. If we do this, and repeatedly sample using monotone_cftp(), then we observe a “finer” variability in the distribution of the number of iterations required… but at the cost of much slower execution. Every iteration re-runs the chain from its new start time all the way to time 0, so with a fixed increment the total number of updates grows quadratically in the mixing time. With the doubling approach, the number of iterations is only logarithmic, and the total number of updates stays within a small constant factor of the number of steps we actually need. (When testing this, I recommend using a much smaller chain, such as Shuffle(26) described in the post, since Tiling((20,20,20)) will be prohibitively slow.)
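The cost difference can be sketched with a simple count, assuming (as in the description above) that each iteration advances the chain from its start time all the way to time 0. The helper total_updates, its parameters, and the value 100000 are hypothetical and are not part of the code in the post; this counts updates only and does not run any chain.

```python
def total_updates(required, extra_steps):
    """Count the work done until the start time is `required` steps back.

    extra_steps(k) is how many steps iteration k (0-based) prepends to the
    past. Each iteration is assumed to re-run the chain from the new start
    time to time 0, so it costs `start` single-step updates.
    Returns (iterations, final start depth, total updates).
    """
    start = 0
    total = 0
    iterations = 0
    while start < required:
        start += extra_steps(iterations)
        total += start
        iterations += 1
    return iterations, start, total


required = 100_000  # hypothetical number of steps truly needed to couple
doubling = total_updates(required, lambda k: 2**k)  # prepend 1, 2, 4, ...
fixed = total_updates(required, lambda k: 100)      # always prepend 100
print(doubling)
print(fixed)
```

With these made-up numbers, doubling finishes in 17 iterations and about 2.6 times the required number of updates, while the fixed increment of 100 takes 1000 iterations and roughly 500 times as many updates.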

]]>