EECS 126 - Probability and Random Processes - J. Walrand


CONDITIONAL EXPECTATION

· Key Ideas
· Example 1: Discrete
· Example 2: Joint Density
· Example 3: Joint Density
· Example 4: Hybrid
· MMSE
· Two Pictures
· Lemma
· Smoothing
· Gambling System


 

Key Ideas

 

The conditional expectation tells us how to use the information in a random variable Y(w) to estimate another random variable X(w).  This conditional expectation is the best guess about X(w) given Y(w) if we want to minimize the mean squared error.  Of course, the value of this conditional expectation is a function of Y(w), so that it is also a random variable. We will learn to calculate the conditional expectation.

 


Example 1

 

Assume that the pair (X(w), Y(w)) is discrete and takes values in $\{x_1, \dots, x_m\} \times \{y_1, \dots, y_n\}$. This pair of random variables is defined by specifying $p(i,j) = P(X = x_i, Y = y_j)$ for $i = 1, \dots, m$ and $j = 1, \dots, n$. From this information, we can derive

$P(Y = y_j) = \sum_i P(X = x_i, Y = y_j) = \sum_i p(i,j)$. We can then calculate $P[X = x_i \mid Y = y_j] = P(X = x_i, Y = y_j)/P(Y = y_j)$ and we define

$$E[X \mid Y = y_j] = \sum_i x_i \, P[X = x_i \mid Y = y_j].$$

 

We then define $E[X|Y] = E[X|Y = y_j]$ when $Y = y_j$. In other words,

$$E[X|Y] = \sum_j E[X|Y = y_j]\, 1\{Y = y_j\}.$$

Note that E[X|Y] is a random variable.

For instance, your guess about the temperature in San Francisco certainly depends on the temperature you observe in Berkeley.  Since the latter is random, so is your guess about the former.

 

Although this definition is sensible, it is not obvious in what sense this is the best guess about X given $\{Y = y_j\}$. We discuss this below.
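The recipe of Example 1 can be checked numerically. The Python sketch below uses a small hypothetical joint pmf (the values in `xs`, `ys`, and `p` are made up for illustration, and every column of `p` is assumed to have positive mass). It computes $P(Y = y_j)$ and the values $E[X|Y = y_j]$, and, as a preview of the identity $E(E[X|Y]) = E(X)$ noted in Example 4, compares the average of $E[X|Y]$ with $E(X)$. Exact fractions avoid rounding noise.

```python
from fractions import Fraction as F

# Hypothetical example: p[i][j] = P(X = xs[i], Y = ys[j]); entries sum to 1.
xs = [0, 1, 2]
ys = [0, 1]
p = [
    [F(1, 8), F(1, 8)],
    [F(1, 8), F(1, 4)],
    [F(1, 4), F(1, 8)],
]


def marginal_y(p):
    """P(Y = y_j) = sum_i p(i, j), returned as a list indexed by j."""
    return [sum(row[j] for row in p) for j in range(len(p[0]))]


def cond_exp_given_y(xs, p):
    """List of E[X | Y = y_j] = sum_i x_i P[X = x_i | Y = y_j].

    Assumes P(Y = y_j) > 0 for every j.
    """
    py = marginal_y(p)
    return [
        sum(x * row[j] for x, row in zip(xs, p)) / py[j]
        for j in range(len(py))
    ]


g = cond_exp_given_y(xs, p)   # g[j] is the value of E[X|Y] when Y = ys[j]
py = marginal_y(p)

# E(E[X|Y]) = sum_j g[j] P(Y = y_j); E(X) = sum_i x_i P(X = x_i)
mean_of_g = sum(gj * pj for gj, pj in zip(g, py))
mean_of_x = sum(x * sum(row) for x, row in zip(xs, p))

print([str(v) for v in g])             # ['5/4', '1']
print(str(mean_of_g), str(mean_of_x))  # 9/8 9/8
```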


Example 2

 

Consider the case where (X, Y) have a joint density f(x,y) and marginal densities $f_X(x)$ and $f_Y(y)$. One can then define the conditional density of X given that Y = y as follows. We see that, for small ε and δ,

$$P[x < X \le x + \epsilon \mid y < Y \le y + \delta] \approx \frac{f(x,y)\,\epsilon\delta}{f_Y(y)\,\delta} = \frac{f(x,y)}{f_Y(y)}\,\epsilon =: f_{X|Y}(x|y)\,\epsilon.$$

As δ goes down to zero, we see that $f_{X|Y}(x|y) = f(x,y)/f_Y(y)$ is the conditional density of X given $\{Y = y\}$. We then define

 

$$E[X|Y=y] = \int x\, f_{X|Y}(x|y)\,dx.$$
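To see this formula at work, take the hypothetical joint density $f(x,y) = x + y$ on $[0,1]^2$ (an assumption chosen only for illustration; it integrates to 1). Then $f_Y(y) = 1/2 + y$ and $E[X|Y = y] = (1/3 + y/2)/(1/2 + y)$. The sketch below approximates both integrals with a midpoint rule and compares the result with this closed form.

```python
def f(x, y):
    """Hypothetical joint density f(x, y) = x + y on the unit square."""
    return x + y


def cond_exp_given_y(y, n=10_000):
    """Approximate E[X | Y = y] = ∫ x f(x, y) dx / ∫ f(x, y) dx over [0, 1].

    Midpoint rule with n subintervals; dividing by f_Y(y) turns f(x, y)
    into the conditional density f_{X|Y}(x|y).
    """
    h = 1.0 / n
    mids = [(k + 0.5) * h for k in range(n)]
    f_y = sum(f(x, y) for x in mids) * h        # marginal density f_Y(y)
    num = sum(x * f(x, y) for x in mids) * h    # ∫ x f(x, y) dx
    return num / f_y


def exact(y):
    """Closed form (1/3 + y/2) / (1/2 + y) for this particular density."""
    return (1 / 3 + y / 2) / (1 / 2 + y)


for y in (0.0, 0.5, 1.0):
    print(y, round(cond_exp_given_y(y), 6), round(exact(y), 6))
```

Since the integrands are polynomials of degree at most 2, the midpoint rule with 10,000 subintervals is far more accurate than the six printed digits.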


Example 3

 

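Let (X, Y) be a point picked uniformly in $[0, 1]^2$ and let Z = XY. Given X = x, Z = xY is uniform on [0, x], so that $f_{X,Z}(x,z) = 1/x$ for $0 \le z \le x \le 1$ and $f_Z(z) = \int_z^1 dx/x = -\ln(z)$. Using the formula for the conditional density, one finds that

$$f_{X|Z}(x|z) = -\frac{1}{x \ln(z)} \quad \text{for } 0 < z \le x \le 1,$$

so that $E[X|Z = z] = \int_z^1 x\, f_{X|Z}(x|z)\,dx = (z - 1)/\ln(z)$, i.e., $E[X|Z] = (Z - 1)/\ln(Z)$.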
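A quick Monte Carlo check of $E[X|Z] = (Z - 1)/\ln(Z)$: sample (X, Y) uniformly in the unit square, keep the samples whose product Z = XY falls in a narrow window around a chosen $z_0$, and average their X values. The window half-width, sample size, and seed are arbitrary choices, so the agreement is only approximate (to within a few thousandths).

```python
import math
import random


def exact_cond_mean(z):
    """E[X | Z = z] = (z - 1) / ln(z) for 0 < z < 1, as derived in Example 3."""
    return (z - 1) / math.log(z)


def mc_cond_mean(z0, half_width=0.01, n=200_000, seed=0):
    """Average of X over the samples with |XY - z0| < half_width."""
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if abs(x * y - z0) < half_width:
            total += x
            count += 1
    return total / count


for z0 in (0.1, 0.25, 0.5):
    print(z0, round(mc_cond_mean(z0), 3), round(exact_cond_mean(z0), 3))
```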

 


Example 4

 

The ideas of Examples 1 and 2 extend to hybrid cases.  For instance, consider the situation illustrated below:

 

The figure shows the joint distribution of (X, Y). With probability 0.4, (X, Y) = (0.75, 0.25). Otherwise (with probability 0.6), the pair (X, Y) is picked uniformly in the square $[0, 1]^2$. You see that E[X|Y = y] = 0.5 if y ≠ 0.25. Also, if Y = 0.25, then X = 0.75, so that

 

E[X|Y = 0.25] = 0.75.

 

Thus E[X|Y] = g(Y) where g(0.25) = 0.75 and g(y) = 0.5 for y ≠ 0.25.

 

In this case, E[X|Y] is a random variable such that

 

E[X|Y] = 0.5 w.p. 0.6 and E[X|Y] = 0.75 w.p. 0.4.

 

(Note that the expected value of E[X|Y] is 0.5 × 0.6 + 0.75 × 0.4 = 0.6 and you can observe that E(X) = 0.5 × 0.6 + 0.75 × 0.4 = 0.6. That is, E(E[X|Y]) = E(X), and we will learn that this is always the case.)
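The hybrid distribution of Example 4 is easy to simulate. The sketch below draws from the mixture, averages X over the samples where Y equals 0.25 exactly (the atom), averages X over the samples from the continuous part (about 0.5, since E[X|Y = y] = 0.5 for every y ≠ 0.25), and estimates E(X). The seed and sample size are arbitrary; the estimates from random samples are approximate.

```python
import random


def sample_pair(rng):
    """One draw of (X, Y) from the mixture described in Example 4."""
    if rng.random() < 0.4:
        return 0.75, 0.25                  # the atom, probability 0.4
    return rng.random(), rng.random()      # uniform on the unit square, probability 0.6


rng = random.Random(1)
pairs = [sample_pair(rng) for _ in range(100_000)]

atom = [x for x, y in pairs if y == 0.25]  # samples with Y = 0.25
rest = [x for x, y in pairs if y != 0.25]  # samples from the continuous part

print(len(atom) / len(pairs))                 # close to P(Y = 0.25) = 0.4
print(sum(atom) / len(atom))                  # exactly 0.75: X = 0.75 whenever Y = 0.25
print(sum(rest) / len(rest))                  # close to 0.5
print(sum(x for x, _ in pairs) / len(pairs))  # close to E(X) = 0.6
```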


MMSE

 

The examples that we have explored led us to define E[X|Y] as the expected value of X when it has its conditional distribution given the value of Y. In this section, we explain that E[X|Y] can be defined as the function g(Y) of the observed value Y that minimizes $E((X - g(Y))^2)$. That is, E[X|Y] is the best guess about X that is based on Y, where best means that it minimizes the mean squared error.

 

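One observation that helps appreciate this idea is that the value of a that minimizes $E((X - a)^2)$ is $a = E(X)$. Indeed,

$$\begin{aligned}
E((X - a)^2) &= E((X - E(X) + E(X) - a)^2) \\
&= E((X - E(X))^2) + (E(X) - a)^2 + 2E((X - E(X))(E(X) - a)) \\
&= E((X - E(X))^2) + (E(X) - a)^2 \ \ge\ E((X - E(X))^2),
\end{aligned}$$

where the cross term vanishes because E(X) − a is a constant and $E(X - E(X)) = 0$. Equality holds exactly when a = E(X).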
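The decomposition $E((X - a)^2) = E((X - E(X))^2) + (E(X) - a)^2$ can be checked exactly on a small discrete distribution. The values and probabilities below are made-up illustrations; the sketch verifies the identity for several values of a and confirms that, over a grid of candidate guesses, the smallest mean squared error occurs at a = E(X).

```python
from fractions import Fraction as F

# Hypothetical distribution: P(X = v) = prob for each (v, prob) pair.
dist = [(0, F(1, 4)), (1, F(1, 4)), (4, F(1, 2))]


def expect(fn):
    """E(fn(X)) for the discrete distribution above."""
    return sum(prob * fn(v) for v, prob in dist)


mean = expect(lambda v: v)                  # E(X)
var = expect(lambda v: (v - mean) ** 2)     # E((X - E(X))^2)


def mse(a):
    """E((X - a)^2)."""
    return expect(lambda v: (v - a) ** 2)


# The decomposition holds exactly for every a.
for a in (F(0), F(1), F(5, 2), mean, F(10)):
    assert mse(a) == var + (mean - a) ** 2

# Scanning a grid of candidate guesses 0, 1/4, ..., 5, the best one is E(X).
grid = [F(k, 4) for k in range(0, 21)]
best = min(grid, key=mse)
print(str(mean), str(var), str(best))       # 9/4 51/16 9/4
```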

 

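Now,

$$E((X - g(Y))^2) = \iint (x - g(y))^2 f(x,y)\,dx\,dy = \int \Big[\int (x - g(y))^2 f_{X|Y}(x|y)\,dx\Big] f_Y(y)\,dy.$$

For each fixed y, the inside integral is the mean squared error of the guess a = g(y) for a random variable with density $f_{X|Y}(\cdot|y)$. By the previous observation, it is minimized by choosing g(y) = E[X|Y=y], and making this choice for every y minimizes the whole expression.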
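The minimizing property can be seen numerically in the setting of Example 3, where $E[X|Z] = (Z - 1)/\ln(Z)$. The sketch below estimates by simulation the mean squared error of this estimator and of a few other functions of Z; the alternatives, sample size, and seed are arbitrary illustrations. Every candidate is evaluated on the same samples (same seed), so the comparison is paired, and the conditional expectation should come out smallest.

```python
import math
import random


def cond_mean(z):
    """E[X | Z = z] = (z - 1) / ln(z) from Example 3; its limit at z = 0 is 0."""
    return 0.0 if z == 0.0 else (z - 1) / math.log(z)


def mse(estimator, n=200_000, seed=2):
    """Monte Carlo estimate of E((X - estimator(Z))^2) with Z = XY."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        total += (x - estimator(x * y)) ** 2
    return total / n


candidates = {
    "E[X|Z]": cond_mean,
    "constant E(X) = 0.5": lambda z: 0.5,
    "sqrt(Z)": math.sqrt,
    "E[X|Z] + 0.05": lambda z: cond_mean(z) + 0.05,
}
results = {name: mse(g) for name, g in candidates.items()}
for name, value in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name:22s} {value:.4f}")
```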

 

This is the story when joint densities exist.

 

There is another way to approach the conditional expectation: start with the MMSE property as a definition. One can then show that such a g(Y) exists and derive its properties. We may do that in class, for the fun of it.


Two Pictures

The figure on the left shows that E[X|Y] is the average value of X on sets that correspond to a constant value of Y. The figure also highlights the fact that E[X|Y] is a random variable.

The figure on the right depicts random variables as points in some vector space. The figure shows that E[X|Y] is the function of Y that is closest to X. The metric in the space is $d(V, W) = (E((V - W)^2))^{1/2}$. To give you a concrete feel for this vector space, imagine that the sample space is $\Omega = \{\omega_1, \dots, \omega_N\}$ and that $p_k$ is the probability that ω is equal to $\omega_k$. In that case, the random variable X corresponds to the vector $(X(\omega_1)\sqrt{p_1}, \dots, X(\omega_N)\sqrt{p_N})$, and d(V, W) is the ordinary Euclidean distance between the vectors that correspond to V and W.
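The vector picture can be made concrete on a finite sample space. In the sketch below, the outcome probabilities and the values of X and Y are hypothetical. Each random variable V becomes the vector $(V(\omega_1)\sqrt{p_1}, \dots, V(\omega_N)\sqrt{p_N})$, so that the Euclidean distance between vectors equals d(V, W). The code compares the distance from X to E[X|Y] with the distance from X to a few other functions of Y.

```python
import math

# Hypothetical finite sample space: outcome k has probability probs[k],
# and X, Y take the listed values on that outcome.
probs = [0.1, 0.2, 0.3, 0.15, 0.25]
X = [1.0, 3.0, 2.0, 5.0, 0.0]
Y = ["a", "a", "b", "b", "b"]


def as_vector(values):
    """Map V to (V(w_1) sqrt(p_1), ..., V(w_N) sqrt(p_N))."""
    return [v * math.sqrt(p) for v, p in zip(values, probs)]


def dist(u, v):
    """Euclidean distance between two such vectors, i.e. sqrt(E((U - V)^2))."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cond_exp(X, Y):
    """Values of E[X|Y] on each outcome: the p-weighted mean of X over {Y = y}."""
    mean_on = {}
    for y in set(Y):
        mass = sum(p for p, yk in zip(probs, Y) if yk == y)
        mean_on[y] = sum(p * x for p, x, yk in zip(probs, X, Y) if yk == y) / mass
    return [mean_on[yk] for yk in Y]


g = cond_exp(X, Y)
d_best = dist(as_vector(X), as_vector(g))

# Any function of Y assigns one value to each level of Y; try a few others.
for alt in ({"a": 2.0, "b": 2.0}, {"a": 2.5, "b": 1.5}, {"a": 1.0, "b": 3.0}):
    h = [alt[yk] for yk in Y]
    assert dist(as_vector(X), as_vector(h)) >= d_best
print(g, d_best)
```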

 


Lemma

A random variable g(Y) is equal to E[X|Y] if and only if

$$E(g(Y)h(Y)) = E(X h(Y)) \quad \text{for any function } h(\cdot). \qquad (*)$$
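Property (*) is easy to test on a small example. On a finite sample space with hypothetical probabilities and values, the sketch below checks (*) for g(Y) = E[X|Y] against several test functions h (the indicators of the events {Y = c} and two arbitrary functions), and shows that a different function of Y fails (*) for at least one h. Exact fractions make the equalities exact.

```python
from fractions import Fraction as F

# Hypothetical finite sample space: outcome k has probability probs[k].
probs = [F(1, 10), F(1, 5), F(3, 10), F(3, 20), F(1, 4)]
X = [1, 3, 2, 5, 0]
Y = [0, 0, 1, 1, 2]


def E(values):
    """Expectation of a random variable given by its values on each outcome."""
    return sum(p * v for p, v in zip(probs, values))


def cond_exp(X, Y):
    """Values of E[X|Y] on each outcome."""
    means = {}
    for y in set(Y):
        mass = sum(p for p, yk in zip(probs, Y) if yk == y)
        means[y] = sum(p * x for p, x, yk in zip(probs, X, Y) if yk == y) / mass
    return [means[yk] for yk in Y]


def satisfies_star(g_vals, h_funcs):
    """Check E(g(Y) h(Y)) == E(X h(Y)) for every h in h_funcs."""
    for h in h_funcs:
        hY = [h(y) for y in Y]
        lhs = E([gv * hv for gv, hv in zip(g_vals, hY)])
        rhs = E([x * hv for x, hv in zip(X, hY)])
        if lhs != rhs:
            return False
    return True


h_list = [lambda y, c=c: 1 if y == c else 0 for c in set(Y)]  # indicators 1{Y = c}
h_list += [lambda y: y * y, lambda y: 7 - 3 * y]              # arbitrary functions

g = cond_exp(X, Y)
print(satisfies_star(g, h_list))                # True
wrong = [gk + (1 if yk == 2 else 0) for gk, yk in zip(g, Y)]  # shift g on {Y = 2}
print(satisfies_star(wrong, h_list))            # False
```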

 

 


Smoothing

 

The figure below shows why E[E[X|Y,Z]|Y] = E[X|Y]. 

In particular, if Y is a constant, then E(E[X|Z]) = E(X).

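Proof: We use the Lemma. Let g(Y) = E[X|Y]. We show that for any h(Y) one has E(g(Y)h(Y)) = E(E[X|Y, Z]h(Y)); by the Lemma, applied with E[X|Y, Z] in place of X, this means that g(Y) = E[E[X|Y, Z]|Y]. Now, E(E[X|Y, Z]h(Y)) = E(Xh(Y)) by (*) applied to the conditioning on (Y, Z), since h(Y) is also a function of (Y, Z). Also, E(g(Y)h(Y)) = E(E[X|Y]h(Y)) = E(Xh(Y)), by (*) again. This completes the proof.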
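The smoothing property can be verified exactly for discrete random variables. In the sketch below the joint pmf of (X, Y, Z) is a hypothetical example. The code computes the random variable E[X|Y, Z], conditions it on Y alone, and compares the result with E[X|Y] computed directly.

```python
from collections import defaultdict
from fractions import Fraction as F

# Hypothetical joint pmf: pmf[(x, y, z)] = P(X = x, Y = y, Z = z); sums to 1.
pmf = {
    (0, 0, 0): F(1, 8), (2, 0, 0): F(1, 8),
    (1, 0, 1): F(1, 4), (3, 0, 1): F(1, 16),
    (4, 1, 0): F(1, 16), (1, 1, 0): F(1, 8),
    (0, 1, 1): F(1, 8), (5, 1, 1): F(1, 8),
}


def cond_exp(values, pmf, key):
    """E[V | key(outcome)] as a dict mapping each key value to a conditional mean.

    `values` maps each outcome to V(outcome); `key` picks what we condition on.
    """
    num, den = defaultdict(F), defaultdict(F)
    for outcome, p in pmf.items():
        k = key(outcome)
        num[k] += p * values[outcome]
        den[k] += p
    return {k: num[k] / den[k] for k in den}


X = {o: o[0] for o in pmf}

# Step 1: the random variable E[X | Y, Z], as a value on each outcome.
e_yz = cond_exp(X, pmf, key=lambda o: (o[1], o[2]))
E_X_given_YZ = {o: e_yz[(o[1], o[2])] for o in pmf}

# Step 2: condition that random variable on Y alone.
lhs = cond_exp(E_X_given_YZ, pmf, key=lambda o: o[1])

# Direct computation of E[X | Y].
rhs = cond_exp(X, pmf, key=lambda o: o[1])

print(lhs == rhs, {y: str(v) for y, v in sorted(rhs.items())})
# True {0: '11/9', 1: '16/7'}
```

Conditioning on a constant key (for example `key=lambda o: None`) plays the role of a constant Y and recovers E(E[X|Y, Z]) = E(X).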


Gambling System

 

Conditional expectation provides a way to evaluate gambling systems. Let $\{X_n, n \ge 1\}$ be i.i.d. random variables with $P(X_n = -1) = P(X_n = 1) = 0.5$. The random variable $X_n$ represents your gain at the n-th game of roulette, playing black or red and assuming that there is no house advantage (neither zero nor double-zero). Say that you have played n times and observed $(X_1, X_2, \dots, X_n)$. You then calculate the amount $Y_n = h_n(X_1, X_2, \dots, X_n)$ that you gamble on the next game. You earn $Y_n X_{n+1}$ on that next game. After a number of such games, you have accumulated

$$Z = Y_0 X_1 + Y_1 X_2 + \cdots + Y_n X_{n+1}.$$

 

(Here, $Y_0$ is some arbitrary initial bet.) If the random variables $Y_n$ are bounded (which is not unreasonable since there may be a table limit), then you find that

 

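$$E(Y_n X_{n+1}) = E(E[Y_n X_{n+1} \mid X_1, \dots, X_n]) = E(Y_n E[X_{n+1} \mid X_1, \dots, X_n]) = E(Y_n \cdot 0) = 0.$$

The first equality is the smoothing property, the second holds because $Y_n$ is a function of $(X_1, \dots, X_n)$, and the third holds because $X_{n+1}$ is independent of $(X_1, \dots, X_n)$, so that $E[X_{n+1} \mid X_1, \dots, X_n] = E(X_{n+1}) = 0$.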

 

Consequently, E(Z) = 0.  This result shows the “impossibility” of a gambling system and guarantees that the casinos will be doing well.
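For a fixed number of games, the conclusion E(Z) = 0 can be confirmed by brute force: enumerate all $2^N$ equally likely sequences $(X_1, \dots, X_N)$ and apply a betting rule $Y_n = h_n(X_1, \dots, X_n)$. The rule below is a hypothetical "double after each loss, up to a table limit" system chosen only for illustration; the number of games, initial bet, and limit are arbitrary. The system ends ahead on many sequences, yet its expected gain is exactly zero.

```python
from fractions import Fraction as F
from itertools import product

N_GAMES = 10        # number of games played (hypothetical)
INITIAL_BET = 1     # Y_0 (hypothetical)
TABLE_LIMIT = 64    # bets are capped, so every Y_n is bounded (hypothetical)


def next_bet(history):
    """Hypothetical rule h_n: double the previous bet after a loss, else reset."""
    bet = INITIAL_BET
    for outcome in history:
        bet = min(2 * bet, TABLE_LIMIT) if outcome == -1 else INITIAL_BET
    return bet


def total_gain(xs):
    """Z = Y_0 X_1 + Y_1 X_2 + ... for one sequence xs = (X_1, ..., X_N)."""
    return sum(next_bet(xs[:n]) * xs[n] for n in range(len(xs)))


prob = F(1, 2 ** N_GAMES)   # each sequence has probability 2^-N
gains = [total_gain(xs) for xs in product((-1, 1), repeat=N_GAMES)]

expected_gain = sum(prob * z for z in gains)
p_win = sum(prob for z in gains if z > 0)
print(expected_gain, float(p_win))   # expected_gain prints as 0
```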

 



Jean Walrand – November 2003