EECS 126 - Probability and Random Processes - J. Walrand

PROBABILITY SPACE

·        Choosing at Random

·        Events

·        Examples

Choosing At Random

How do we model choosing something at random?  Examples will help us come up with a good definition.

First consider picking a card out of a 52-card deck.  We could say that the odds of picking any particular card are the same as those of picking any other card, assuming that the deck has been well shuffled.  We then decide to assign a “probability” of 1/52 to each card.  That probability represents the odds that a given card is picked.  One interpretation is that if we repeat the experiment “choosing a card from the deck” a large number N of times (replacing the card previously picked every time and re-shuffling the deck before the next selection), then a given card, say the ace of diamonds, is selected approximately N/52 times.  Note that this is only an interpretation.  There is nothing that tells us that this is indeed the case; moreover, if it is the case, then there is certainly nothing yet in our theory that allows us to expect that result.  Indeed, so far, we have simply assigned the number 1/52 to each card in the deck.  Our interpretation comes from what we expect from the physical experiment.  This remarkable “statistical regularity” of the physical experiment is a consequence of some deeper properties of the sequences of successive cards picked from a deck.  We will come back to these deeper properties when we study independence.
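To make the frequency interpretation concrete, here is a minimal Python simulation sketch; the card labels, the sample size N, and the seed are our own arbitrary choices, purely for illustration.  Repeated draws with replacement should select the ace of diamonds approximately N/52 times.

```python
import random

# Hypothetical labels for the 52 cards: (rank, suit) pairs.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = [(r, s) for r in ranks for s in suits]

random.seed(0)
N = 100_000
# Draw with replacement (each draw plays the role of a fresh shuffle)
# and count how often the ace of diamonds appears.
count = sum(1 for _ in range(N) if random.choice(deck) == ("A", "diamonds"))

print(count / N)  # empirically close to 1/52 ≈ 0.0192
```

The observed frequency is not a theorem yet; as the text says, the connection between this statistical regularity and the number 1/52 will come later, with independence and the law of large numbers.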

Second, consider the experiment of throwing a dart at a dartboard.  The likelihood of hitting a specific point on the board, measured with pinpoint accuracy, is essentially zero.  Accordingly, in contrast with the previous example, we cannot assign numbers to individual outcomes of the experiment.  The way to proceed is to assign numbers to sets of possible outcomes.  Thus, one can look at a subset of the dartboard and assign some probability that represents the odds that the dart will land in that set.  It is not simple to assign the numbers to all the sets in a way that these numbers really correspond to the odds of a given dart player.  Even if we forget about trying to model an actual player, it is not that simple to assign numbers to all the subsets of the dartboard.  At the very least, to be meaningful, the numbers assigned to the different subsets must obey some basic consistency rules.  For instance, if A and B are two subsets of the dartboard such that A ⊂ B, then the number P(A) assigned to A must be no larger than the number P(B) assigned to B.  Also, if A and B are disjoint, then P(A ∪ B) = P(A) + P(B).  Finally, P(Ω) = 1, if Ω designates the set of all possible outcomes (the dartboard, possibly extended to cover all bases).  This is the basic story: probability is defined on sets of possible outcomes and it is additive.  [However, it turns out that one more property is required: countable additivity (see below).]

Note that we can lump our two examples into one.  Indeed, the first case can be viewed as a particular case of the second where we would define P(A) = |A|/52, where A is any subset of the deck of cards and |A| is the number of cards in A.  This definition is certainly additive and it assigns the probability 1/52 to any one card.
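The definition P(A) = |A|/52 is easy to check mechanically.  The sketch below, with our own arbitrary card labels, verifies that it assigns 1/52 to a single card and that it is additive on disjoint sets:

```python
# Represent the 52 cards as (rank, suit) pairs; any hashable labels would do.
deck = {(r, s) for r in range(13) for s in ("C", "D", "H", "S")}

def P(A):
    # Uniform model on the deck: P(A) = |A| / 52.
    return len(A) / 52

one_card = {(0, "D")}
red = {c for c in deck if c[1] in ("D", "H")}
black = deck - red

print(P(one_card))  # 1/52 ≈ 0.0192
print(P(red))       # 26/52 = 0.5
# Additivity on the disjoint sets red and black:
print(P(red | black) == P(red) + P(black))  # True
```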

Some care is required when defining what we mean by a random choice.  See Bertrand’s paradox for an illustration of the possible confusion.  Another example of possible confusion, this time with statistics, is Simpson’s paradox.

Events

The sets of outcomes to which one assigns a probability are called events.  It is not necessary (and often not possible, as we may explain later) for every set of outcomes to be an event.

For instance, assume that we are only interested in whether the card that we pick is black or red.  In that case, it suffices to define P(A) = 0.5 = P(Aᶜ) where A is the set of all the black cards and Aᶜ is the complement of that set, i.e., the set of all the red cards.  Of course, we know that P(Ω) = 1 where Ω is the set of all the cards and P(∅) = 0, where ∅ is the empty set.  In this case, there are four events: ∅, Ω, A, Aᶜ.

More generally, if A and B are events, then we want Aᶜ, A ∩ B, and A ∪ B to be events also.  Indeed, if we want to define the probability that the outcome is in A and the probability that it is in B, it is reasonable to ask that we can also define the probability that the outcome is not in A, that it is in A and B, and that it is in A or in B (or in both).  By extension, set operations that are performed on a finite collection of events should always produce an event.  For instance, if A, B, C, D are events, then [(A \ B) ∪ C] ∩ D should also be an event.  We say that the set of events is closed under finite set operations.  [We explain below that we need to extend this property to countable operations.]  With these properties, it makes sense to write for disjoint events A and B that P(A ∪ B) = P(A) + P(B).  Indeed, A ∪ B is an event, so that P(A ∪ B) is defined.
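Python’s built-in set operations give a quick way to experiment with these closure properties.  In the sketch below, the sample space and the events A, B, C, D are arbitrary choices of ours:

```python
# A small sample space and four hypothetical events, as plain Python sets.
Omega = set(range(10))
A, B, C, D = {0, 1, 2, 3}, {2, 3, 4}, {5, 6}, {1, 5, 9}

# The combined event [(A \ B) ∪ C] ∩ D from the text is again a set of outcomes.
E = ((A - B) | C) & D
print(E)  # {1, 5}

# The complement relative to Omega is also an event.
print(Omega - A)  # {4, 5, 6, 7, 8, 9}
```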

You will notice that if we want A ⊂ Ω (with A ≠ Ω and A ≠ ∅) to be an event, then the smallest collection of events is necessarily {∅, Ω, A, Aᶜ}.

If you want to see why, generally and mostly for uncountable sample spaces, not every set of outcomes can be an event, check this note.

This topic is the first serious hurdle that you face when studying probability theory.  If you understand this section, you will considerably increase your appreciation of the theory.  Otherwise, many issues will remain obscure and fuzzy.

We want to be able to say that if the events An for n = 1, 2, …, are such that An ⊆ An+1 for all n and if A := ∪n An, then P(An) ↑ P(A) as n → ∞.  Why is this useful?  This property is the key to being able to approximate events.  The property specifies that the probability is continuous: if we approximate the events, then we also approximate their probability.

This strategy of “filling the gaps” by taking limits is central in mathematics.  You remember that real numbers are defined as limits of rational numbers.  Similarly, we will see that integrals are defined as limits of sums.  The key idea is that different approximations should give the same result.  For this to work, we need the continuity property above.

To be able to write the continuity property, we need to assume that A := ∪n An is an event whenever the events An for n = 1, 2, …, are such that An ⊆ An+1.  More generally, we need the set of events to be closed under countable set operations.

For instance, if we define P([0, x]) = x for x in [0, 1], then we can define P([0, a)) = a because if ε > 0 is small enough (say ε < a), then An := [0, a – ε/n] is such that An ⊆ An+1 and [0, a) = ∪n An.  We will discuss many more interesting examples.
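A quick numeric check of this example, with arbitrary values a = 0.6 and ε = 0.1 of our choosing: the probabilities P(An) = a – ε/n increase with n and approach P([0, a)) = a.

```python
# Under the length measure P([0, x]) = x on [0, 1], the events
# An = [0, a - eps/n] increase to [0, a), and P(An) increases to a.
a, eps = 0.6, 0.1
probs = [a - eps / n for n in range(1, 6)]
print(probs)  # increasing: 0.5, 0.55, 0.566..., 0.575, 0.58

# For large n, P(An) is already very close to the limit a.
print(a - eps / 10**6)  # within 10**-6 of a = 0.6
```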

You may wish to review the meaning of countability.

Examples

Throughout the course, we will make use of simple examples of probability space.  We review some of those here.

·        Choosing uniformly in {1, 2, …, N}

We say that we pick a value X uniformly in {1, 2, …, N} when the N values are equally likely to be selected.  In this case, the sample space Ω is Ω = {1, 2, …, N}.  For any subset A ⊂ Ω, one defines P(A) = |A|/N where |A| is the number of elements in A.  For instance, P({2, 5}) = 2/N.

Equivalently, we will write P(X ∈ A) = P(A).  Thus, the probability that the random value X is picked in the set {2, 5} is 2/N.

[Note for purists: This notation jumps somewhat ahead (we have not defined a random variable X).  However, in this case, X = v and we are simply defining P(v ∈ A) := P(A).]
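A small sketch of this probability space (with N = 10, our arbitrary choice), together with a frequency check of P(X ∈ {2, 5}) = 2/N:

```python
import random

N = 10
Omega = set(range(1, N + 1))

def P(A):
    # Uniform choice in {1, ..., N}: P(A) = |A| / N.
    return len(A & Omega) / N

print(P({2, 5}))  # 2/10 = 0.2

# Frequency check: draw X uniformly many times and count landings in {2, 5}.
random.seed(1)
draws = [random.randint(1, N) for _ in range(50_000)]
freq = sum(x in {2, 5} for x in draws) / len(draws)
print(freq)  # close to 0.2
```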

·        Choosing uniformly in [0, 1]

Here, Ω = [0, 1] and one has, for example, P([0, 0.3]) = 0.3 and P([0.2, 0.7]) = 0.5.  That is, P(A) is the “length” of the set A.  Thus, if X is picked uniformly in [0, 1], then one can write P(X ∈ [0.2, 0.7]) = 0.5.  We also write P(0.2 ≤ X ≤ 0.7) = 0.5.
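A short Monte Carlo check of the “length” interpretation; the sample size and seed below are arbitrary choices of ours:

```python
import random

random.seed(2)
N = 100_000
xs = [random.random() for _ in range(N)]

# P(0.2 <= X <= 0.7) should be close to the length 0.5 of [0.2, 0.7].
freq = sum(0.2 <= x <= 0.7 for x in xs) / N
print(freq)  # close to 0.5
```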

·        Choosing uniformly in [0, 1]²

Here, Ω = [0, 1]² and one has, for example, P([0.1, 0.4]×[0.2, 0.8]) = 0.3×0.6 = 0.18.  That is, P(A) is the “area” of the set A.  Thus, if (X, Y) is picked uniformly in [0, 1]², then one can write P((X, Y) ∈ [0.1, 0.4]×[0.2, 0.8]) = 0.18.  Similarly, in that case,

P(X + Y ≤ 1) = 0.5 and P(X² + Y² ≤ 1) = π/4.
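Both values can be checked by Monte Carlo: the first event is a triangle of area 1/2, the second a quarter disc of area π/4.  The sample size and seed below are arbitrary:

```python
import math
import random

random.seed(3)
N = 200_000
pts = [(random.random(), random.random()) for _ in range(N)]

# P(X + Y <= 1): the triangle below the line x + y = 1 has area 1/2.
p_triangle = sum(x + y <= 1 for x, y in pts) / N
# P(X^2 + Y^2 <= 1): the quarter of the unit disc inside [0,1]^2 has area pi/4.
p_disc = sum(x * x + y * y <= 1 for x, y in pts) / N

print(p_triangle)           # close to 0.5
print(p_disc, math.pi / 4)  # close to 0.785...
```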

Jean Walrand – January 2000