Why is the probability that at least one hypothesis out of $k$ is consistent with $m$ training examples at most $k(1- \epsilon)^m$?


My question is actually about the addition of probabilities. I am reading about computational learning theory in Tom Mitchell's machine learning book.

In chapter 7, when proving the upper bound on the probability that the version space is not $\epsilon$-exhausted (theorem 7.1), the book says that the probability that at least one of the $k$ hypotheses in the hypothesis space $H$ is consistent with the $m$ training examples is at most $k(1- \epsilon)^m$.

I understand that the probability of a single hypothesis $h$ being consistent with $m$ training examples is $(1-\epsilon)^m$. However, why is it valid to add the probabilities over the $k$ hypotheses? And couldn't the resulting probability be greater than 1 in that case?

calveeen

Posted 2020-04-29T03:06:50.460

Reputation: 909

What is the opposite of "at least 1"? – nbro – 2020-04-29T04:08:19.670

That's equal to $1 - P(\text{none are consistent})$? And $P(\text{none of the } k \text{ hypotheses are consistent}) = \epsilon^{km}$? So that's $1 - \epsilon^{km}$ – calveeen – 2020-04-29T04:14:09.983

Answers


Let $A$ and $B$ be two events. In general, the probability that either $A$ or $B$ occurs is given by the inclusion-exclusion rule

$$ P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B) $$

If $A$ and $B$ are disjoint, i.e. they cannot happen at the same time, then $P(A \text{ and } B) = 0$, so the above formula becomes

$$ P(A \text{ or } B) = P(A) + P(B) $$
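
For intuition, here is a small worked example of my own (not from the book): roll a fair die, let $A$ be the event "the outcome is even" and $B$ the event "the outcome is at least 3". Then

$$ P(A \text{ or } B) = \frac{1}{2} + \frac{2}{3} - \frac{1}{3} = \frac{5}{6}, $$

even though the naive sum $P(A) + P(B) = \frac{7}{6}$ already exceeds 1. The sum of probabilities is always an upper bound, but it need not itself be a probability.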

If the probability of one arbitrary hypothesis being consistent with $m$ training examples is $(1-\epsilon)^m$, then, given the rule above and assuming that the $k$ consistency events are disjoint (i.e. at most one hypothesis can be consistent with the $m$ training examples at a time), the probability that one or more (i.e. at least one) of the hypotheses is consistent with the training examples is the sum of the individual probabilities, i.e. $k (1-\epsilon)^m$.
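
If you want to check this numerically, here is a minimal Monte Carlo sketch (my own illustration, not from Mitchell's book). It assumes every hypothesis has true error exactly $\epsilon$ and, purely for simplicity, that the $k$ consistency events are independent; the bound itself requires neither assumption.

```python
import random

k, m, epsilon = 5, 10, 0.2    # illustrative values, chosen arbitrarily
trials = 100_000

at_least_one = 0
for _ in range(trials):
    # Each hypothesis is consistent with all m examples with
    # probability (1 - epsilon)^m; sample the k events independently.
    if any(random.random() < (1 - epsilon) ** m for _ in range(k)):
        at_least_one += 1

empirical = at_least_one / trials
bound = k * (1 - epsilon) ** m    # the quantity from theorem 7.1
print(f"empirical P(at least one consistent) ~ {empirical:.4f}")
print(f"union bound k(1 - epsilon)^m         = {bound:.4f}")
```

Under independence the exact value is $1 - (1 - (1-\epsilon)^m)^k \approx 0.43$ here, comfortably below the bound $5 \cdot 0.8^{10} \approx 0.54$.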

This sum can indeed be bigger than one, namely when $k(1-\epsilon)^m > 1$; in that case it is still a valid, though trivial, upper bound. If more than one hypothesis can be consistent with the $m$ training examples at the same time, the events overlap and you have to subtract the probability that both are consistent, so the true probability is smaller than the sum.
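
More generally, Boole's inequality (the union bound) says that for any events $A_1, \dots, A_k$, disjoint or not,

$$ P(A_1 \text{ or } A_2 \text{ or } \dots \text{ or } A_k) \leq P(A_1) + P(A_2) + \dots + P(A_k), $$

which is why $k(1-\epsilon)^m$ is always a valid upper bound, even when it exceeds 1.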

See e.g. the notes General Probability, I: Rules of Probability for more details about the union rule and other rules of probability.

nbro

Posted 2020-04-29T03:06:50.460

Reputation: 19 783

Is $k(1 - \epsilon)^m$ a lower bound on the probability that exactly one hypothesis out of $k$ is consistent? – calveeen – 2020-04-29T04:41:39.133

@calveeen It cannot be a lower bound. The probability that one given hypothesis out of the $k$ is consistent (according to you and your book) is $(1 - \epsilon)^m$, which is at most $k(1 - \epsilon)^m$, because $k \geq 1$. – nbro – 2020-04-29T12:12:32.753