You are comparing two different cases. One is "the probability of landing heads on the next flip" and the other is "sum of the number of heads." The latter is governed by the law of large numbers (with the Central Limit Theorem describing how tightly the result clusters around its expected value), which explains why the proportion of heads converges so rapidly (in many cases). Summing behaves very differently from simply asking "what's the next result," and it's the summing that causes the convergence.

To free ourselves from this "paradox," the key is that for every case where we have N tosses that landed heads, we also have a corresponding case where we have N tosses that landed tails. From the perspective of "sum of the number of heads," this matters. In the case where we discuss "the coin has landed heads up 10 times in a row," it does not, because stating that it has landed heads up 10 times precludes us from considering the case where it landed tails up 10 times. The 10 tails case doesn't have any effect on our discussion of the next coin flip because it simply didn't happen. We aren't interested in it.
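If you'd like to check the "next flip" case by brute force, here is a minimal sketch, assuming a fair coin so that every sequence is equally likely. The function name `next_flip_after_streak` is just something I made up for illustration. It lists every sequence of 11 flips, throws away the ones that did not start with 10 heads (they "didn't happen"), and looks at what is left.

```python
from fractions import Fraction
from itertools import product


def next_flip_after_streak(streak_len=10):
    """P(next flip is heads | the first streak_len flips were all heads).

    Assumes a fair coin, so every sequence is equally likely and
    counting sequences is enough.
    """
    total_len = streak_len + 1
    # Keep only the sequences consistent with what we were told happened.
    matching = [
        seq
        for seq in product("HT", repeat=total_len)
        if all(flip == "H" for flip in seq[:streak_len])
    ]
    # Among those, count the ones whose final flip is heads.
    heads_next = sum(1 for seq in matching if seq[-1] == "H")
    return Fraction(heads_next, len(matching))


print(next_flip_after_streak(10))  # 1/2
```

Only two sequences survive the filter (ten heads then H, ten heads then T), which is why the streak tells you nothing about the next flip.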

It's a bit easier to visualize the non-paradox if, instead of counting the number of heads and tails, we assign heads and tails numeric values (such as +1 and -1) and take the *average*. Most humans find it easy to intuit that the average of a sample will approach the average of the random variable as N gets large.
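A quick simulation can make this concrete. This is only a sketch under my own assumptions: a fair coin, heads = +1 and tails = -1, and a hypothetical helper `sample_average` with a fixed seed so the run is repeatable.

```python
import random


def sample_average(n, seed=0):
    """Average of n simulated fair-coin flips scored +1 (heads) / -1 (tails)."""
    rng = random.Random(seed)  # fixed seed is an arbitrary choice for repeatability
    return sum(rng.choice((1, -1)) for _ in range(n)) / n


# The expected value of a single flip is 0; watch the sample average drift toward it.
for n in (10, 100, 10_000, 100_000):
    print(n, sample_average(n))
```

The individual flips never "correct" for earlier ones; the average settles down only because each new flip carries less and less weight as N grows.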

This visualization can be done in many ways. One way is to look at all the different sequences of heads and tails that can occur. Clearly each sequence occurs with equal probability (with a fair coin). However, when you put these into "bins" based on how many heads you see, you find that there are many more sequences with an "average" number of heads than those which have extraordinary numbers of heads. This causes us to see average numbers more often than extraordinary numbers.

To give a concrete example, consider the strings of length 3: 0 heads = 1 string ({T, T, T}), 1 head = 3 strings ({H, T, T}, {T, H, T}, {T, T, H}), 2 heads = 3 strings ({H, H, T}, {H, T, H}, {T, H, H}), 3 heads = 1 string ({H, H, H}). There are 8 total strings, each occurring with probability 1/8. Thus, by addition, the probability of 0 heads = 1/8, 1 head = 3/8, 2 heads = 3/8, and 3 heads = 1/8.
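The same binning can be done mechanically for any length. Here is a rough sketch, again assuming a fair coin; `heads_distribution` is a name I picked for illustration. It enumerates every string of length n and counts how many land in each "number of heads" bin.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def heads_distribution(n):
    """Map each possible head count to its probability for n fair flips."""
    # Bin all 2**n equally likely strings by how many heads they contain.
    counts = Counter(seq.count("H") for seq in product("HT", repeat=n))
    total = 2 ** n
    return {k: Fraction(counts[k], total) for k in range(n + 1)}


for k, p in heads_distribution(3).items():
    print(k, p)
# 0 1/8
# 1 3/8
# 2 3/8
# 3 1/8
```

Try a larger n (say 10) and you will see the middle bins dominate even more, which is the effect described above.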
