EECS 126 - Probability and Random Processes - J. Walrand
ESTIMATION
The estimation problem is similar to the detection problem except that the unobserved random variable X does not take values in a finite set. That is, one observes Y and one must compute an estimate of X based on Y that is close to X on average.
Once again, one has Bayesian and non-Bayesian formulations. The non-Bayesian case typically uses MLE[X | Y] defined as in the discussion of Detection.
An estimator of X given Y is a function g(Y).
The estimator g(Y) is unbiased if E(g(Y)) = E(X): at least its mean is the same as that of X.
An unbiased estimator is efficient if var(g(Y)) is the smallest possible among the unbiased estimators of X. [We need the Cramér-Rao lower bound to explain what this minimum variance can be.]
If we make more and more observations, we look at the estimator $x_n$ of X given $(Y_1, \dots, Y_n)$: $x_n = g_n(Y_1, \dots, Y_n)$.
We say that $x_n$ is asymptotically unbiased if $\lim_{n \to \infty} E(x_n) = E(X)$; it is asymptotically efficient if its variance approaches the minimum possible variance as n grows.
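As an illustration of the difference between unbiased and asymptotically unbiased, here is a minimal simulation sketch. The model (X ~ N(1, 1), Y_i = X + W_i with independent N(0, 1) noise) and the estimator `shrunk_mean`, which divides the sum of the observations by n + 1, are assumptions made up for this example and do not come from the notes. Its mean is n/(n + 1) E(X), so it is biased for every n but asymptotically unbiased.

```python
import random


def shrunk_mean(observations):
    """Hypothetical estimator: sum of the observations divided by n + 1."""
    return sum(observations) / (len(observations) + 1)


def average_estimate(n, trials, rng):
    """Empirical E(x_n) over independent draws of X and (Y_1, ..., Y_n)."""
    total = 0.0
    for _ in range(trials):
        x = rng.gauss(1.0, 1.0)                           # X ~ N(1, 1), so E(X) = 1
        ys = [x + rng.gauss(0.0, 1.0) for _ in range(n)]  # Y_i = X + W_i
        total += shrunk_mean(ys)
    return total / trials


rng = random.Random(1)  # fixed seed so that the run is reproducible
averages = {n: average_estimate(n, 5000, rng) for n in (1, 10, 100)}
for n, avg in averages.items():
    print(f"n = {n:3d}: empirical E(x_n) = {avg:.3f}, exact value = {n / (n + 1):.3f}")
```

The empirical means should approach E(X) = 1 as n grows, while staying below it for small n.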
In the Bayesian formulation, one is given the joint distribution of (X, Y) and one must choose an estimate x of X based on Y to minimize E(c(X, x)), where c(X, x) is the cost of estimating X by x.
MMSE: If $c(X, x) = |X - x|^2$ and x = g(Y), where g(.) can be an arbitrary function, then the best choice is x = E[X | Y].
LLSE: If $c(X, x) = |X - x|^2$ and x = a + bY, then the best choice is x = E[X] + b(Y - E[Y]) where b = cov(X, Y)/var(Y).
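The two estimators can be compared numerically. The sketch below is only an illustration under an assumed model that is not in the notes: Y is uniform on [-1, 1] and X = Y^2 + W with independent N(0, 0.1^2) noise, so that E[X | Y] = Y^2. The helper names (`llse_coefficients`, `mse`) are hypothetical. Since cov(X, Y) = 0 here, the LLSE is essentially the constant E[X], and its error should be much larger than that of the MMSE.

```python
import random


def mean(values):
    return sum(values) / len(values)


def llse_coefficients(xs, ys):
    """Return (a, b) such that a + b*y is the empirical LLSE of x given y."""
    mx, my = mean(xs), mean(ys)
    cov_xy = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    var_y = mean([(y - my) ** 2 for y in ys])
    b = cov_xy / var_y   # b = cov(X, Y) / var(Y)
    a = mx - b * my      # so that a + bY = E[X] + b(Y - E[Y])
    return a, b


def mse(xs, estimates):
    """Empirical mean squared error E(|X - x|^2)."""
    return mean([(x - e) ** 2 for x, e in zip(xs, estimates)])


rng = random.Random(0)  # fixed seed so that the run is reproducible
n = 20000
ys = [rng.uniform(-1.0, 1.0) for _ in range(n)]
xs = [y * y + rng.gauss(0.0, 0.1) for y in ys]  # assumed model: E[X | Y] = Y^2

a, b = llse_coefficients(xs, ys)
mse_llse = mse(xs, [a + b * y for y in ys])
mse_mmse = mse(xs, [y * y for y in ys])         # uses g(Y) = E[X | Y] = Y^2

print(f"LLSE: a = {a:.3f}, b = {b:.3f}, mean squared error = {mse_llse:.4f}")
print(f"MMSE: mean squared error = {mse_mmse:.4f}")
```

In this model the MMSE error should be close to var(W) = 0.01, whereas the LLSE error should be close to var(Y^2) + var(W) = 4/45 + 0.01.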
The LLSE has the property that a linear estimate Z = a + bY is equal to LLSE[X | Y] if and only if E(Z) = E(X) and cov(X - Z, Y) = 0. The interpretation of the last condition is that the error is then uncorrelated with the observation, so that we cannot reduce it by adding a linear function of Y.
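As a short check, the two conditions can be solved for a linear estimate Z = a + bY, assuming var(Y) > 0. They give back the LLSE coefficients:

```latex
\begin{align*}
E(Z) = E(X) &\;\Longrightarrow\; a = E[X] - b\,E[Y], \\
\operatorname{cov}(X - Z, Y) = \operatorname{cov}(X, Y) - b\,\operatorname{var}(Y) = 0
  &\;\Longrightarrow\; b = \frac{\operatorname{cov}(X, Y)}{\operatorname{var}(Y)}, \\
\text{so that}\quad Z &= E[X] + b\,(Y - E[Y]).
\end{align*}
```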
These results extend to the multivariate cases.
There are many cases where one keeps on making observations. How do we update the estimate? For instance, how do we calculate x’ = LLSE [X | Y, Z] = bY + cZ if one knows x = LLSE [X | Y] = aY? The answer lies in the following observations (we assume all the random variables are zero-mean to simplify the notation):
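We want $X - (bY + cZ) \perp \{Y, Z\}$, that is, the error must be uncorrelated with both Y and Z. A picture shows that, writing L[. | .] for LLSE[. | .], L[X | Y, Z] = L[X | Y] + k(Z - L[Z | Y]).

We must choose k so that $X - L[X | Y, Z] \perp Z$.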
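The update can be checked numerically. The sketch below is an illustration under an assumed zero-mean model that is not in the notes; the helper `inner` and the variable names are hypothetical. It uses the fact that the condition on k is met by k = cov(X, Z - L[Z | Y]) / var(Z - L[Z | Y]), and it compares the recursive coefficients with those obtained by solving directly for b and c.

```python
import random


def inner(us, vs):
    """Empirical E[UV]; this is the covariance when the variables are zero-mean."""
    return sum(u * v for u, v in zip(us, vs)) / len(us)


rng = random.Random(2)  # fixed seed so that the run is reproducible
n = 5000
# Assumed zero-mean model, chosen only for illustration.
xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
ys = [x + rng.gauss(0.0, 1.0) for x in xs]
zs = [x + 0.5 * y + rng.gauss(0.0, 1.0) for x, y in zip(xs, ys)]

# Step 1: the estimate already known, L[X | Y] = aY.
a = inner(xs, ys) / inner(ys, ys)

# Step 2: the new information in Z, namely Z - L[Z | Y], uncorrelated with Y.
c_zy = inner(zs, ys) / inner(ys, ys)
innovation = [z - c_zy * y for z, y in zip(zs, ys)]

# Step 3: the gain k that makes the error uncorrelated with Z.
k = inner(xs, innovation) / inner(innovation, innovation)

# Updated estimate: aY + k(Z - c_zy Y) = (a - k c_zy) Y + k Z.
b_rec, c_rec = a - k * c_zy, k

# Direct solution of cov(X - bY - cZ, Y) = 0 and cov(X - bY - cZ, Z) = 0.
syy, syz, szz = inner(ys, ys), inner(ys, zs), inner(zs, zs)
sxy, sxz = inner(xs, ys), inner(xs, zs)
det = syy * szz - syz ** 2
b_dir = (sxy * szz - sxz * syz) / det
c_dir = (sxz * syy - sxy * syz) / det

print(f"recursive: b = {b_rec:.6f}, c = {c_rec:.6f}")
print(f"direct:    b = {b_dir:.6f}, c = {c_dir:.6f}")
```

The two pairs of coefficients should agree up to rounding error, because the recursive update is the same projection computed in two steps.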
These ideas lead to Kalman’s filter, a topic covered in EECS226A.
We will see in our discussion of the CLT how to quantify how reliable an estimator is. Roughly, the typical error decreases like $1/\sqrt{n}$ where n is the number of observations.
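A small simulation makes this rate visible. The sketch below assumes, purely for illustration, that the estimator is the sample mean of n i.i.d. N(2, 1) observations; the function name and the parameter values are hypothetical. Multiplying n by 4 should roughly halve the root-mean-square error.

```python
import math
import random


def rms_error_of_sample_mean(n, trials, rng, theta=2.0, sigma=1.0):
    """Root-mean-square error of the sample mean of n i.i.d. N(theta, sigma^2) draws."""
    total_sq = 0.0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(theta, sigma) for _ in range(n)) / n
        total_sq += (sample_mean - theta) ** 2
    return math.sqrt(total_sq / trials)


rng = random.Random(3)  # fixed seed so that the run is reproducible
rms = {n: rms_error_of_sample_mean(n, 2000, rng) for n in (25, 100, 400)}
for n, err in rms.items():
    print(f"n = {n:3d}: RMS error = {err:.4f}, sigma/sqrt(n) = {1.0 / math.sqrt(n):.4f}")
```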
To estimate $\theta$ given X, it may be enough to consider functions of T(X) instead of all the functions of X. This is certainly the case if the density of X given T(X) does not depend on $\theta$. Indeed, in that case, there is no useful information in X about $\theta$ that is not already in T(X). In this case, we say that T(X) is a sufficient statistic for (estimating) $\theta$. Equivalently, T(X) is a sufficient statistic for (estimating) $\theta$ if the density $f_X(x; \theta)$ of X given $\theta$ has the form
$f_X(x; \theta) = h(x)\, g(T(x); \theta)$.
For instance, if, given $\theta$, the random variables $X_1, \dots, X_n$ are i.i.d. $N(\theta, \sigma^2)$ with unknown mean $\theta$, then one can show that $X_1 + \dots + X_n$ is a sufficient statistic for $\theta$. Knowing a sufficient statistic enables us to "compress" the observations without loss of relevant information.
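For the Gaussian example, the factorization can be written out explicitly. Here $\theta$ is the unknown mean, and the variance $\sigma^2$ is assumed known, which the example leaves implicit:

```latex
\begin{align*}
f_X(x; \theta)
  &= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}
     \exp\!\left(-\frac{(x_i - \theta)^2}{2\sigma^2}\right) \\
  &= \underbrace{(2\pi\sigma^2)^{-n/2}
     \exp\!\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n} x_i^2\right)}_{h(x)}
     \;\underbrace{\exp\!\left(\frac{\theta\,T(x)}{\sigma^2}
     - \frac{n\theta^2}{2\sigma^2}\right)}_{g(T(x);\,\theta)},
  \qquad T(x) = x_1 + \cdots + x_n .
\end{align*}
```

The first factor does not involve $\theta$, and the second depends on x only through the sum T(x).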
Assume that T(X) is a sufficient statistic for (estimating) $\theta$. Then you can verify that MLE[$\theta$ | X], MAP[$\theta$ | X], HT[$\theta$ | X], and E[$\theta$ | X] are all functions of T(X). (By HT[$\theta$ | X] we designate the solution of the hypothesis testing problem.)
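For the MLE, this can be checked on the Gaussian example, where the unknown parameter is the mean of i.i.d. normal observations. The sketch below assumes a known standard deviation of 1 and uses two made-up samples with the same sum; the function names and the grid search are hypothetical choices for illustration. The two log-likelihoods should differ by a constant that does not depend on the parameter, so they are maximized at the same value.

```python
import math

SIGMA = 1.0  # assumed known standard deviation


def log_likelihood(xs, theta, sigma=SIGMA):
    """Log of the joint N(theta, sigma^2) density of the i.i.d. sample xs."""
    n = len(xs)
    return (-0.5 * n * math.log(2.0 * math.pi * sigma ** 2)
            - sum((x - theta) ** 2 for x in xs) / (2.0 * sigma ** 2))


def grid_mle(xs, grid):
    """Value of theta on the grid that maximizes the log-likelihood."""
    return max(grid, key=lambda theta: log_likelihood(xs, theta))


sample_a = [1.0, 2.0, 3.0, 6.0]  # T = 12
sample_b = [3.0, 3.0, 3.0, 3.0]  # a different sample with the same T = 12
grid = [i / 100.0 for i in range(0, 601)]

mle_a = grid_mle(sample_a, grid)
mle_b = grid_mle(sample_b, grid)
gaps = [log_likelihood(sample_a, t) - log_likelihood(sample_b, t)
        for t in (0.0, 1.5, 4.0)]

print("MLE from sample_a:", mle_a)
print("MLE from sample_b:", mle_b)
print("log-likelihood gaps at three values of theta:", gaps)
```

Both maximizers equal T/n = 3, and the gap between the two log-likelihoods is the same at every value tried, as the factorization predicts.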
Jean Walrand – November 2003