It seems like this is just a case of semantic ambiguity in English. In the first statement, Seth Yalcin seems to have implicitly thought of "if ... then" as expressing a conditional probability, i.e. the claim that a randomly chosen marble is "likely" red (where "likely" can be defined in terms of any desired probability threshold, say >50%) given that we already know it was observed to be big. When an "if ... then" construction is used in the verbal description of modus tollens, by contrast, it's supposed to refer only to material implication.
Suppose instead we try to interpret the "if ... then" only as material implication, i.e. for some marble m we are asserting that "big(m) -> likelyred(m)", where the "big" predicate refers to what's found after checking its size, and the "likelyred" predicate refers to the fact that a rational observer would assign a >50% unconditional probability to the event that the marble will be found to be red, prior to observing any of its actual features, including its size. Here the problem arises that for any marble m that happens to be big, big(m) would be true, but likelyred(m) would be false, since the unconditional probability that a marble is red is 40/100. And according to the truth table for material implication, P -> Q is false when statement P is true but statement Q is false. So if we assume the "if ... then" in P1) is supposed to refer to material implication, and we use the above translation of the "likelyred" predicate in terms of unconditional probabilities, then P1) would simply be false for any marble m that happens to be big. The fact that you can then use modus tollens to get a false conclusion is hardly an argument against modus tollens if you're starting from a false premise.
On the other hand, suppose we stick with the above translation of "likelyred", but the marble m we have chosen is not actually big. In that case "big(m) -> likelyred(m)" would be true, since the truth table for material implication says that P -> Q is true when statements P and Q are both individually false. However, in that case it is in fact guaranteed to be true both that P2) "likelyred(m) is false" and that P3) "big(m) is false", so in this case modus tollens would lead you from true premises to a true conclusion.
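The two cases above can be checked mechanically. Here is a minimal Python sketch, assuming the 40/100 unconditional probability of red and the >50% threshold from the text; the function and variable names (`implies`, `likelyred`, `check`) are my own illustrative choices, not anything from Yalcin's paper.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Material implication: false only when p is true and q is false."""
    return (not p) or q


# Unconditional probability that a marble is red (the 40/100 figure above).
P_RED = 40 / 100
THRESHOLD = 0.5  # "likely" read as a probability above 50%


def likelyred() -> bool:
    """Unconditional reading: does not depend on which marble was drawn."""
    return P_RED > THRESHOLD


def check(big: bool) -> dict:
    """Truth values of the two premises and the conclusion for one marble."""
    return {
        "P1": implies(big, likelyred()),  # big(m) -> likelyred(m)
        "P2": not likelyred(),            # likelyred(m) is false
        "conclusion": not big,            # big(m) is false
    }


big_case = check(True)     # {'P1': False, 'P2': True, 'conclusion': False}
small_case = check(False)  # {'P1': True, 'P2': True, 'conclusion': True}

# Modus tollens itself is valid: no truth assignment makes both
# premises true and the conclusion false.
modus_tollens_valid = all(
    not p
    for p, q in product([True, False], repeat=2)
    if implies(p, q) and not q
)
```

For a big marble the false conclusion comes with a false P1, and for a small marble everything is true, so in neither case do true premises lead to a false conclusion.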
If we wanted to capture some idea of conditional probability, we could invent a new predicate "conditionallylikelyredgivenbig" that could be conceptually described as "the marble is big, and upon learning that information, a rational observer who had not yet observed its color would assign a >50% conditional probability to the event of it being found to be red". In that case, if we have a marble m for which big(m) is true, then conditionallylikelyredgivenbig(m) is also true. On the other hand, if we have a marble m for which big(m) is false, then conditionallylikelyredgivenbig(m) is also false. These are the only two combinations that can happen for any of the marbles, and since the truth table for material implication says that P -> Q is true if both P and Q are true and if both P and Q are false, P1) big(m) -> conditionallylikelyredgivenbig(m) would be true for any choice of m.
But if we use this translation scheme, then P2) should be translated as "conditionallylikelyredgivenbig(m) is false". Since conditionallylikelyredgivenbig(m) was defined above to include the claim that the marble is big, it is false whenever the marble is not big, i.e. P2) is true when the marble is not big. In that case the conclusion P3), translated as "big(m) is false", is guaranteed to be true as well, so modus tollens operating on two true premises has given us a true conclusion. On the other hand, if the marble is big, that means P2) is false, and again it's no strike against modus tollens if one of your two starting premises is false and you use modus tollens to get a false conclusion.
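The same check can be run for the conditional predicate. This is only a sketch: the value 30/40 for the probability of red given big is an assumption I'm plugging in for illustration (the argument only needs it to exceed 50%), and the names `cond_likely_red_given_big` and `check` are hypothetical.

```python
def implies(p: bool, q: bool) -> bool:
    """Material implication: false only when p is true and q is false."""
    return (not p) or q


# ASSUMPTION: illustrative conditional probability of red given big.
# Any value above the 50% threshold gives the same truth values below.
P_RED_GIVEN_BIG = 30 / 40
THRESHOLD = 0.5


def cond_likely_red_given_big(big: bool) -> bool:
    """True iff the marble is big AND red is >50% likely given that it is big."""
    return big and P_RED_GIVEN_BIG > THRESHOLD


def check(big: bool) -> dict:
    """Truth values of the two premises and the conclusion for one marble."""
    q = cond_likely_red_given_big(big)
    return {
        "P1": implies(big, q),  # big(m) -> conditionallylikelyredgivenbig(m)
        "P2": not q,            # conditionallylikelyredgivenbig(m) is false
        "conclusion": not big,  # big(m) is false
    }


big_case = check(True)     # {'P1': True, 'P2': False, 'conclusion': False}
small_case = check(False)  # {'P1': True, 'P2': True, 'conclusion': True}
```

P1 now holds for every marble, but the false conclusion in the big case is paired with a false P2, so modus tollens is again never applied to two true premises with a false result.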