EECS 126 - Probability and Random Processes - J. Walrand
CONDITIONAL PROBABILITY AND INDEPENDENCE
· Conditional Probability: Remark, Bayes’ Rule
· Independence: Example 1, Example 2, General Definition, Subtlety
Assume that we know that the outcome is in B ⊂ Ω. Given that information, what is the probability that the outcome is in A ⊂ Ω? This probability is written P[A | B] and is read “the conditional probability of A given B,” or “the probability of A given B” for short.
For instance, one picks a card at random from a 52-card deck. One knows that the card is black. What is the probability that it is the ace of clubs? The sensible answer is that if the only thing that one knows is that the card is black, then that card is equally likely to be any one of the 26 black cards. Therefore, the probability that it is the ace of clubs is 1/26. Similarly, given that the card is black, the probability that it is an ace is 2/26, because there are 2 black aces (spades and clubs).
We can formulate that calculation as follows. Let A be the set of aces (4 cards) and B the set of black cards (26 cards). Then, P[A | B] = P(A ∩ B)/P(B) = {2/52}/{26/52} = 2/26. Indeed, for the outcome to be in A, given that it is in B, that outcome must be in A ∩ B. Also, given that the outcome is in B, the probabilities of the outcomes in B should be renormalized so that they add up to 1. To renormalize these probabilities, we divide them by P(B). This division does not modify the relative likelihood of the various outcomes in B.
More generally, we define the probability of A given B by
P[A | B] = P(A ∩ B)/P(B).
This definition makes sense if P(B) > 0. If P(B) = 0, we define P[A | B] = 0. This definition is somewhat arbitrary but it makes the formulas valid in all cases.
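The definition, including the convention for P(B) = 0, can be tried out on the card example. The following is a minimal Python sketch, assuming a uniform probability on a finite sample space; the helper names prob and cond_prob and the encoding of cards as (rank, suit) pairs are illustrative choices, not part of the notes.

```python
from fractions import Fraction

def prob(event, omega):
    """P(event) under the uniform probability on the finite set omega."""
    return Fraction(len(event & omega), len(omega))

def cond_prob(a, b, omega):
    """P[A | B] = P(A ∩ B)/P(B), with the convention P[A | B] = 0 if P(B) = 0."""
    p_b = prob(b, omega)
    if p_b == 0:
        return Fraction(0)
    return prob(a & b, omega) / p_b

# Hypothetical encoding of the 52-card deck as (rank, suit) pairs.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "spades", "hearts", "diamonds"]
deck = {(r, s) for r in ranks for s in suits}

aces = {card for card in deck if card[0] == "A"}
black = {card for card in deck if card[1] in ("clubs", "spades")}
ace_of_clubs = {("A", "clubs")}

print(cond_prob(aces, black, deck))          # 1/13, i.e. 2/26
print(cond_prob(ace_of_clubs, black, deck))  # 1/26
print(cond_prob(aces, set(), deck))          # 0, by the convention for P(B) = 0
```

Exact fractions are used instead of floating-point numbers so that the results can be compared directly with the hand calculation.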
Define P’(A) = P[A | B] for any event A. Then P’(.) is a new probability measure. In particular, the usual formulas apply. For instance, P’(A ∩ C) = P’[A | C]P’(C), i.e.,
P[A ∩ C | B] = P[A | B ∩ C]P[C | B],
which you can verify by using the definition of P[. | B]. After a while (meaning by the end of this week), you should be able to write expressions such as the one above by thinking of P’(.) as a new probability.
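The verification from the definition of P[. | B] takes only a few lines. Here is a short derivation in LaTeX, assuming P(B ∩ C) > 0 (which forces P(B) > 0):

```latex
\begin{align*}
P[A \mid B \cap C]\, P[C \mid B]
  &= \frac{P(A \cap B \cap C)}{P(B \cap C)} \cdot \frac{P(B \cap C)}{P(B)} \\
  &= \frac{P(A \cap B \cap C)}{P(B)} \\
  &= P[A \cap C \mid B].
\end{align*}
```

When P(B ∩ C) = 0, both sides of the identity are zero: the factor P[A | B ∩ C] is zero by the convention for conditioning on an event of probability zero, and P[A ∩ C | B] is zero because either P(B) = 0 or P(A ∩ B ∩ C) ≤ P(B ∩ C) = 0.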
Let B1 and B2 be disjoint events whose union is Ω. Let also A be another event. We can write
P[B1 | A] = P(B1 ∩ A)/P(A) = P[A | B1]P(B1)/P(A), and
P(A) = P(B1 ∩ A) + P(B2 ∩ A) = P[A | B1]P(B1) + P[A | B2]P(B2). Hence,
P[B1 | A] = P[A | B1]P(B1)/{P[A | B1]P(B1) + P[A | B2]P(B2)}.
This formula extends to a finite number of events Bn that partition Ω. The result is known as Bayes’ rule. Think of the Bn as possible “causes” of some effect A. You know the prior probabilities P(Bn) of the causes and also the probability that each cause provokes the effect A. The formula tells you how to calculate the probability that a given cause provoked the observed effect. Applications abound, as we will see in detection theory. For instance, your alarm can sound either if there is a burglar or if there is no burglar (false alarm). Given that the alarm sounds, what is the probability that it is a false alarm?
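To make the alarm question concrete, here is a minimal Python sketch of Bayes’ rule for a finite partition. The notes give no numbers for the alarm, so the prior (a burglar on 1 night in 1000) and the two alarm probabilities (0.95 with a burglar, 0.01 without) are purely hypothetical, as is the function name bayes_posterior.

```python
from fractions import Fraction

def bayes_posterior(priors, likelihoods):
    """Return P[Bn | A] for every cause Bn of a finite partition.

    priors[n] is P(Bn) and likelihoods[n] is P[A | Bn].
    """
    joint = {n: likelihoods[n] * priors[n] for n in priors}  # P(Bn ∩ A)
    p_a = sum(joint.values())                                # P(A), total probability
    if p_a == 0:
        # Convention from the notes: conditioning on a null event gives 0.
        return {n: Fraction(0) for n in priors}
    return {n: joint[n] / p_a for n in priors}

# Hypothetical numbers, chosen only for illustration.
priors = {"burglar": Fraction(1, 1000), "no burglar": Fraction(999, 1000)}
likelihoods = {"burglar": Fraction(95, 100), "no burglar": Fraction(1, 100)}

posterior = bayes_posterior(priors, likelihoods)
print(posterior["no burglar"])  # 999/1094
```

With these assumed numbers, a sounding alarm is a false alarm with probability 999/1094, or about 0.91: the false-alarm rate is small, but burglaries are much rarer still.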
It may happen that knowing that an event occurred does not change the probability of another event. In that case, we say that the events are independent. Let us look at an example first.
We roll two dice and we designate the pair of results by ω = (ω1, ω2). Then Ω has 36 points: Ω = {(ω1, ω2) | ω1 = 1, …, 6 and ω2 = 1, …, 6}. Each of these points has probability 1/36. Let A = {ω ∈ Ω | ω1 ∈ {1, 3, 4}} and B = {ω ∈ Ω | ω2 ∈ {3, 5}}. Assume that we know that the outcome is in B. What is the probability that it is in A?

Using the conditional probability formula, we find P[A | B] = P(A ∩ B)/P(B) = {6/36}/{12/36} = ½. Note also that P(A) = 18/36 = ½. Thus, in this example, P[A | B] = P(A).
The interpretation is that knowing the outcome of the second roll tells us nothing about the outcome of the first roll.
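The counts in the dice example can be reproduced by enumerating the 36 outcomes. This is a small Python sketch; the names omega, A, B and P mirror the notation of the notes but are otherwise arbitrary.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered pairs (w1, w2) of results of the two dice.
omega = set(product(range(1, 7), repeat=2))

A = {w for w in omega if w[0] in {1, 3, 4}}  # condition on the first roll
B = {w for w in omega if w[1] in {3, 5}}     # condition on the second roll

def P(event):
    """Uniform probability: each of the 36 outcomes has probability 1/36."""
    return Fraction(len(event), len(omega))

p_a_given_b = P(A & B) / P(B)
print(P(A), P(B), P(A & B), p_a_given_b)  # 1/2 1/3 1/6 1/2
```

The enumeration also confirms the product form P(A ∩ B) = P(A)P(B), which is equivalent to P[A | B] = P(A) whenever P(B) > 0.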
We pick two points independently and uniformly on [0, 1]. In this case, the outcome ω of the experiment (the pair of points chosen) belongs to the set Ω = [0, 1]². That point ω is picked uniformly in [0, 1]². Let A = [0.2, 0.5] × [0, 1] and B = [0, 1] × [0.2, 0.8]. The interpretation of A is that the first point is picked in [0.2, 0.5]; that of B is that the second point is picked in [0.2, 0.8]. Note that P(A) = 0.3 and P(B) = 0.6. Moreover, since A ∩ B = [0.2, 0.5] × [0.2, 0.8], one finds that P(A ∩ B) = 0.3 × 0.6 = P(A)P(B). Thus, A and B are independent events.
We can write all this by designating by (X, Y) the pair of points and we have that
P(X ∈ [0.2, 0.5] and Y ∈ [0.2, 0.8]) = P([0.2, 0.5] × [0.2, 0.8]) = 0.3 × 0.6 = P(A)P(B)
= P(X ∈ [0.2, 0.5])P(Y ∈ [0.2, 0.8]).
As you may suspect, we will say later that X and Y are independent.
[Note for purists: We introduce random variables X and Y early in the game, but again (X, Y) = ω in all our examples, so we do not have to worry about the properties of the mapping from ω to our (X, Y). I like to introduce (X, Y) early because independence seems rather abstract when defined on arbitrary events, but it becomes very intuitive on simple product spaces.]
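The unit-square example can also be checked numerically. The following Python sketch is a Monte Carlo estimate, not a proof: it draws X and Y with the standard library generator and compares the observed frequency of A ∩ B with the product of the observed frequencies of A and B. The function name, sample size and seed are arbitrary choices.

```python
import random

def estimate_square_example(n_samples=200_000, seed=0):
    """Estimate P(A), P(B) and P(A ∩ B) for the unit-square example by simulation."""
    rng = random.Random(seed)
    in_a = in_b = in_both = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()  # independent uniform draws
        a = 0.2 <= x <= 0.5                # the outcome is in A
        b = 0.2 <= y <= 0.8                # the outcome is in B
        in_a += a
        in_b += b
        in_both += a and b
    return in_a / n_samples, in_b / n_samples, in_both / n_samples

p_a, p_b, p_ab = estimate_square_example()
print(p_a, p_b, p_ab, p_a * p_b)
```

The three estimates should be close to 0.3, 0.6 and 0.18, with fluctuations of order 1/√n that shrink as the sample size grows.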
Generally, we say that a collection of events {Ai, i ∈ I} is mutually independent if for any finite subcollection one has
P(Ai ∩ Aj ∩ … ∩ Ak) = P(Ai)P(Aj) … P(Ak).
The definition seems innocuous, but one has to be a bit careful. For instance, look at the following example:

The sample space Ω has four points that each have a probability ¼. The events A, B, C are defined as shown. You can verify that A and B are independent. Indeed, P(A ∩ B) = ¼ = P(A)P(B). Similarly, A and C are independent and so are B and C. However, the events {A, B, C} are not mutually independent. Indeed, P(A ∩ B ∩ C) = 0 ≠ P(A)P(B)P(C) = 1/8.
The point of the example is the following. Knowing that A has occurred tells us something about the value of ω that nature has selected. This knowledge, by itself, is not sufficient to affect our estimate of the probability that C has occurred. The same is true if we know that B has occurred. However, if we know that both A and B have occurred, then we know that C cannot have occurred. Thus, it is not correct to think that “A does not tell us anything about C, B does not tell us anything about C, therefore A and B do not tell us anything about C.” I encourage you to think about this example carefully.
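The difference between pairwise and mutual independence can be checked mechanically. The figure defining A, B and C is not reproduced in this text, so the sets in the Python sketch below are an assumption: one choice on a four-point space that matches every probability quoted above. The helper name factorizes is also hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import prod

# Assumed events (the original figure is unavailable): four equally likely
# points, and three events that agree with the probabilities stated above.
omega = {1, 2, 3, 4}
events = {"A": {1, 2}, "B": {1, 3}, "C": {2, 3}}

def P(event):
    """Uniform probability on the four-point sample space."""
    return Fraction(len(event), len(omega))

def factorizes(names):
    """Check that P(intersection) equals the product of the probabilities."""
    intersection = set(omega)
    for name in names:
        intersection &= events[name]
    return P(intersection) == prod((P(events[n]) for n in names), start=Fraction(1))

pairwise = all(factorizes(pair) for pair in combinations(events, 2))
mutual = all(
    factorizes(sub)
    for k in range(2, len(events) + 1)
    for sub in combinations(events, k)
)
print(pairwise, mutual)  # True False
```

The check of mutual independence runs over every subcollection of two or more events, following the general definition; the only one that fails here is the full triple {A, B, C}.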
Jean Walrand – January 2000