|
EECS
126 - Probability and Random Processes - J.
Walrand |
DETECTION AND
HYPOTHESIS TESTING
·
Bayesian
·
Maximum Likelihood
Estimation
The detection problem is roughly as follows. We want to guess which of finitely many possible causes produced an observed effect. For instance, you have a fever (observed effect); do you think you have the flu or a cold or the malaria? As another example, you observe some strange shape on an X-ray; is it a cancer or some infection of the tissues? A receiver gets a particular waveform; did the transmitter send the bit 0 or the bit 1? (Hypothesis testing is similar.) As you can see, these problems are prevalent in applications.
There are two basic formulations: either we know the prior probabilities of the possible causes (Bayesian) or we do not (non-Bayesian). When we do not, we can look for the maximum likelihood detection or we can formulate a hypothesis-testing problem.
Assume that X takes values in a finite set {0, …, M}. We know the conditional density or distribution of Y given {X = x}. We want to choose Z in {0, …, N} on the basis of Y to minimize E(c(X, Z)) where c(.) is nonnegative.
Since E(c(X, Z)) = E(E[c(X, Z)|Y]), we should choose Z = g(Y) where g(y) = arg min z E[c(X, z)| Y = y].
A particular example is when M = N and c(m, n) = 1{m ¹ n}. In that case, g(y) = arg max x P[X = x | Y = y], i.e.,
g(y) = MAP[X | Y = y], the maximum a posteriori estimate of X given {Y = y), the most likely value of X given {Y = y}.
The common criticism of this formulation is that in many cases the prior distribution is not known at all. For instance, consider designing a burglar alarm for your house. What prior probability should you use? You suspect a “garbage in, garbage out” effect here and you are correct.
Instead of choosing the value of X that is most likely given the observation, one can choose the value of X that makes the observation most likely. That is, one can choose arg max x P[Y = y | X = x]. This estimator is called the maximum likelihood estimator of X given {Y = y}, or MLE[X | Y = y].
Note that MLE[X | Y = y] = MAP[X | Y = y] when the prior distribution of X is uniform. Note also that MLE[X | Y = y] can be calculated without knowing the prior distribution of X. Finally, a deeper property of the MLE is that under weak assumptions it tends to be a good estimator (asymptotically efficient).
1. Simple Hypothesis
H0: X = 0 or H1: X = 1. Should one reject H0 on the basis of the observation Y?
One is given the distribution of the observation Y given X. The problem is to choose Z = g(Y) to minimize the probability of missed detection P[Z = 0 | X = 1] subject to a bound on the probability of false alarm: P[Z = 1|X= 0] £ a. (Think of {X = 1} = “burglar” and {X = 0} = “no burglar”.)
1.1. Discrete Case
Given X, Y has a known p.m.f. P{Y = y | X]. Let L(y) = P[Y = y|X=1]/ P[Y = y|X=0] (the likelihood ratio).
The solution is randomized:
Z = 1 if L(y) > b,
Z = 0 if L(y) < b, and
Z = 1 w.p. g and Z = 0 w.p.1 - g if L(y) = b.
The values of b and g are selected so that P[Z = 1|X= 0] = a.
1.2. Continuous Case
Given X, Y has a known p.d.f. fY|X[y|x]. Let L(y) = fY|X[y|1] / fY|X[y|0] (the likelihood ratio).
Then
Z = 1 if L(y) > b, and
Z = 0 if L(y) < b.
The values of b is selected so that P[Z = 1|X= 0] = a.
It may happen that L(y) is increasing in y. In such a case (assuming continuous), the solution becomes
Z = 1 if y > y0, and
Z = 0 if y < y0.
The values of y0 is selected so that P[Z = 1|X= 0] = a.
1.3. Useful Remark
It may happen that L(y) is increasing in y. In such a case (assuming continuous), the solution becomes
Z = 1 if y > y0, and
Z = 0 if y < y0.
The values of y0 is selected so that P[Z = 1|X= 0] = a.
1.4. Example 1: If X = k, Y is exponentially distributed with mean m(k), for k = 0, 1 where 0 < m(0) < m(1). Here, fY|X[y|x] = l(x)exp{- l(x)y} where l(x) = m(x)-1 for x = 0, 1. Thus Z = 1{Y > y0} where y0 is such that P[Z = 1|X= 0] = a. That is, exp{- l(0)y0} = a, or y0 = - ln(a)/l(0).
1.5. Example 2: If X = k, Y is Gaussian with mean m(k) and variance 1, for k = 0, 1 where 0 < m(0) < m(1). Here, fY|X[y|x] = Kexp{-(x - m(k))2/2} where K = 1/(2p)1/2. Accordingly, L(y) = B exp{x(m(1) - m(0))} where B = exp{(m(0)2 - m(1)2)/2}. Thus Z = 1{Y > y0} where y0 is such that P[Z = 1|X= 0] = a. That is, P(N(m(0), 1) > y0) = a.
1.6. Proof of the Neyman-Pearson Theorem (results 1.1 and 1.2)
Let Z be as specified by the result and let V be another random variable based on Y, possibly with randomization, and such that P[V = 1|X = 0] £ a. We want to show that P[Z = 0|X = 1] £ P[V = 0|X = 1]. Note that {b - L(Y)}{Z - V} £ 0, so that L(Y){Z - V} ³ b{Z - V}. Taking E[.|X = 0] and observing that E[WL|X=0] = E[W|X=1], we find P[Z = 1|X = 1] - P[V = 1|X = 1] ³ b{P[Z = 1|X = 0] - P[V = 1|X = 0]} ³ 0, which proves the result.
2. Composite Hypotheses
2.1. Example 1
Consider once again the examples in 1.4 and 1.5. Note that the optimal decision Z does not depend on the value of m(1). Consequently, the optimal decision would be the same if the two hypotheses were
H0: m = m(0)
H1: m > m(0).
The hypothesis H1 is called a composite hypothesis because it does not specify a unique value of the parameter to be tested.
2.2. Example 2
Consider once again the examples in 1.4 and 1.5 but with the hypotheses
H0: m £ m(0)
H1: m > m(0).
Both hypotheses H0 and H1 are composite. We claim that the optimal decision Z is still the same as in the original simple hypotheses case. To see that, observe that P[Z = 1| m] = P[Y > y0| m] £ P[Y < y0| m(0)] = a, so that our decision meets the condition that P[Z = 1| H0] £ a, and it minimizes P[Z=0| H1] subject to that requirement.
2.3. Example 3
Both 2.1 and 2.2 consider one-sided tests where the values of the parameter m under H1are all larger than those permitted under H0. What about a two-sided test with
H0: m = m(0)
H1: m ¹ m(0)?
More generally, one might consider a test with
H0: m Î A
H1: m Î B
where A and B are two disjoint sets.
In general, optimal tests for such situations do not exist and one resorts to approximations. We saw earlier that the optimal decisions for a simple hypothesis test is based on the value of the likelihood ratio L(y), which is the ratio of the densities of y under the two hypothesis X = 1 and X = 0, respectively. One might then try to extend this test by replacing L(y) by the ratio of the two densities under H1and H0, respectively. How do we define the density under the hypothesis H1 "m Î B"? One idea is to calculate the MLE of m given Y and H1, and similarly for H0. This approach works well under some situations. However, the details would carry us a bit to far. Interested students will find expositions of these methods in any good statistics book. Look for the keywords "likelihood ratio test, goodness-of-fit test".
More Examples of HT
1. Is a coin biased? You flip a coin 100 times and count the number Y of heads. You must decide whether the coin is fair (X = 0) of biased, say with P(H) = 0.6. The goal is to minimize the probability of missed detection P[fair|biased] subject to a false alarm probability P[biased|fair] £ b.
Solution: You can verify that Remark 1.3 holds. Thus, the best decision is Z = 1 (biased) if Y > y(0), Z = 0 (fair) if Y < y(0), and Z = 1 w.p. g if Y = y(0). You must choose y(0) and g so that P[Z = 1| fair] = b. Let us plot P[Y = y | fair]:

The figure illustrates P[Z = 1| fair] where "fair" = {X = 0}. For b = 0.001, one finds (using a calculator) y(0) = 66; for b = 0.01, y(0) = 63.
One finds also that P[Y >= 58 | fair] = 0.066 and P[Y >= 59 | fair] = 0.043; accordingly, if b = 0.05 one should decide Z = 1 w.p. 1 if Y >= 59 and Z = 1 w.p. 0.3 if Y = 58. Indeed, in that case, P[Z = 1 | fair ] = P[Y >= 59 | fair] + 0.3 P[Y = 58 | fair] = 0.043 + 0.3(0.066 - 0.043) = 0.05.
2. More examples coming soon.
Jean Walrand – October 2003 --- INDEX