Hypothesis Testing

Hypothesis Testing Summary

1. Tests on Means:

Formulating Hypotheses: Ho is what we want to heave (reject); Ha is what we want to prove.

Statistical Errors: 2 kinds of errors can be committed with Hypothesis tests:

Type I Error: Heaving Ho when it is true (finding the defendant guilty when he's innocent).
P(Type I Error) = $\alpha$, the sum of the tail probabilities.
Type II Error: Not heaving Ho when it is false (finding the defendant not guilty when he's guilty).
P(Type II Error) = $\beta$; to find it, use a "what if" scenario.

Significance Level $\alpha$: the sum of the tail probabilities; it defines the reject-Ho zone.

1-tail or 2-tail: 1-tail if the Ha statement has < or >, 2-tail if the Ha statement has $\ne$.
Critical Values: find $z_{\alpha/2}$ for 2-tail tests, and $z_{\alpha}$ for 1-tail tests.
p-value (observed significance level): probability of the tail(s) created by the test statistic; reject Ho if the p-value < $\alpha$.
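If you'd rather let software do the table look-ups, here is a minimal Python sketch of finding the critical values. It assumes Python 3.8+, whose standard-library `statistics.NormalDist` provides the standard normal distribution and its inverse CDF; the function name `critical_z` is our own, for illustration only.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def critical_z(alpha: float, tails: int) -> float:
    """Critical value of z: z_(alpha/2) for a 2-tail test, z_alpha for a 1-tail test."""
    tail_area = alpha / 2 if tails == 2 else alpha
    # inv_cdf returns the z-value that has the given area to its left
    return STANDARD_NORMAL.inv_cdf(1 - tail_area)


for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: 2-tail {critical_z(alpha, 2):.3f}, 1-tail {critical_z(alpha, 1):.3f}")
```

For $\alpha$ = 5% this prints 1.960 and 1.645; for $\alpha$ = 1% it prints 2.576 and 2.326, which the tables used in these notes round to 2.575 and 2.33.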

Types of Tests:

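large sample (n > 30) test on µ: use z for critical values, or compare the p-value with $\alpha$.
small sample ($n \le 30$) test on µ with $\sigma$ unknown: use t-values (n – 1 degrees of freedom) for both 1- and 2-tail tests.
$\sigma$ unknown: use the raw data and the formula $s = \sqrt{\sum (x - \bar{x})^2/(n - 1)}$ to find s, the sample standard deviation.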

2. Tests on Proportions:

Tests on a single population proportion.

Using confidence limits instead of z-values to test a proportion.

Formulae for Hypothesis Tests

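Tests on means: $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ (large sample, or $\sigma$ known); $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$ with n – 1 degrees of freedom (small sample, $\sigma$ unknown)
Tests on proportions: $z = \frac{\bar{p} - p}{\sqrt{p(1-p)/n}}$, where $\bar{p} = x/n$
C.I. limits for µ: $\mu \pm z_{\alpha/2}\,\sigma/\sqrt{n}$
C.I. limits for p: $p \pm z_{\alpha/2}\sqrt{p(1-p)/n}$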
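The test-statistic formulae for this section can be written as small Python functions. This is a sketch under our own naming (`z_mean`, `t_mean`, `z_proportion` and `ci_limits` are hypothetical helpers, not from any library); it uses only the standard `math` module. As in these notes, the standard error for a proportion uses the hypothesized p, and the confidence limits are centred on the hypothesized value.

```python
from math import sqrt


def z_mean(xbar: float, mu: float, sigma: float, n: int) -> float:
    """z for a test on a mean (large sample, or sigma known)."""
    return (xbar - mu) / (sigma / sqrt(n))


def t_mean(xbar: float, mu: float, s: float, n: int) -> float:
    """t for a small-sample test on a mean with sigma unknown; df = n - 1."""
    return (xbar - mu) / (s / sqrt(n))


def z_proportion(p_bar: float, p: float, n: int) -> float:
    """z for a test on a proportion; the standard error uses the hypothesized p."""
    return (p_bar - p) / sqrt(p * (1 - p) / n)


def ci_limits(centre: float, z_crit: float, std_error: float) -> tuple:
    """Confidence limits: centre plus or minus z_(alpha/2) times the standard error."""
    return centre - z_crit * std_error, centre + z_crit * std_error
```

For example, `z_mean(73.2, 72.4, 2.1, 35)` reproduces the oceanographer's z of about 2.25 later in these notes.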

Hypothesis Testing

Because the only constant in life is change -- and, to survive, businesses have to be aware of the changing trends in the economy, labour force, supply of raw materials and the demand for their products, etc. -- business analysts constantly test the validity and reliability of the data upon which their budgets and business decisions are based. Such tests are called Hypothesis Tests.

Law and Order: Hypothesis Testing Unit

Every hypothesis test begins with two statements labeled Ho, the null hypothesis; and Ha, the alternative hypothesis. Ho is like the origin or starting point – assumptions or values currently held, but seriously in question or in need of review. We run the hypothesis test to find out if things have changed to the point where we should reject or heave Ho. We need to know if the numbers currently in use support our old assumptions or indicate a need for change.

Let's compare a hypothesis test to a criminal trial because we run hypothesis tests when we SUSPECT that our data is out of date, just as we run criminal trials when we SUSPECT that a crime has been committed. In the trial, Ho states that the defendant is innocent (until proven guilty). Ha, the alternative hypothesis, states that the defendant is guilty. The criterion for the test is: if the weight of evidence is beyond a reasonable doubt, we will heave Ho. Until then, we can't even consider accepting Ha, that the defendant is guilty.

Alpha ($\alpha$), called the level of significance in a hypothesis test, plays the role of "beyond a reasonable doubt" in the criminal trial. It forms the criterion upon which we decide whether to heave Ho or not: it marks the confidence limits and/or critical values for the test. If the sample statistic lies beyond the critical values, we must heave Ho in the hypothesis test. In the criminal trial, we heave Ho when the weight of evidence lies beyond a reasonable doubt.

As with hypothesis tests, two types of errors can be committed in the criminal trial. A Type I error – rejecting Ho when it is true – happens when the jury finds the defendant guilty when s/he is innocent. The probability of committing a Type I error in the hypothesis test is $\alpha$, the probability of getting a test value in the reject zone (easy to remember: $\alpha$, the first Greek letter, goes with Type I). Steven Truscott, convicted as a teenager of a murder for which he was acquitted decades later, is a classic case of a Type I error.

A Type II error – not rejecting Ho when it is false – happens when the jury finds the defendant NOT GUILTY when s/he actually is guilty. In the hypothesis test, we commit a Type II error when our calculated sample statistics give us a test value of z between the critical values, so we cannot reject Ho even though, in reality, it is false. The probability of committing a Type II error in the hypothesis test is called Beta, $\beta$. It is the probability of getting a test value in the do-not-reject zone even though, unknown to us, our null hypothesis is false. The O.J. Simpson criminal trial is often cited as a classic Type II error in law.

One difference between the hypothesis test and the criminal trial is that critical values for hypothesis tests are published in math books, while those for criminal trials are found in law books. The other important difference is that it's easier to crunch and trust numbers than it is to crunch and trust humans and the evidence they present.

_____________________________

Hypothesis Tests Step by Step:

Step 1: Specify the population value of interest.
This is µ, the mean, or p, the population proportion, stated in the question.

Step 2: Formulate Ho the null hypothesis.
Ho should always have an equal sign & should be what we want to reject. (innocent)
Formulate Ha, the alternative hypothesis.
Ha should be what we're trying to prove if we have to reject Ho. (guilty)

Step 3: Specify the level of significance $\alpha$.
Given in the question. The probability of the tail(s).

Step 4: Specify the test criteria for rejecting Ho. Draw a Picture!!
use critical value(s) of z or t to define one-tail or
2-tail reject Ho zone(s).

Step 5: Calculate the sample statistic from the data. (add it to the picture)
use formula for z or t ,
calculate test values to compare with critical value(s).

Calculate the p-value from the sample statistic. (add it to the picture)
compare the p-value to $\alpha$. If p < $\alpha$, reject Ho.

Step 6: Make a decision. (based on the picture)
Reject or do not reject Ho.

Step 7: Draw a conclusion.
state what the test results indicate about the hypotheses.
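Steps 3 to 6 for a large-sample z-test on a mean can be sketched in Python as one function. This is an illustration under stated assumptions: $\sigma$ is known (or n > 30), Python 3.8+ `statistics.NormalDist` supplies the normal table, and the function name, the `tail` labels and the sample mean 44.8 in the usage line are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(xbar, mu0, sigma, n, alpha, tail):
    """Large-sample z-test on a mean.
    tail: 'lower' (Ha: mu < mu0), 'upper' (Ha: mu > mu0) or 'two' (Ha: mu != mu0)."""
    Z = NormalDist()
    # Step 4: critical value defining the reject-Ho zone(s)
    crit = Z.inv_cdf(1 - alpha / 2) if tail == "two" else Z.inv_cdf(1 - alpha)
    # Step 5: test statistic and its p-value
    z = (xbar - mu0) / (sigma / sqrt(n))
    if tail == "upper":
        p_value, reject = 1 - Z.cdf(z), z > crit
    elif tail == "lower":
        p_value, reject = Z.cdf(z), z < -crit
    else:
        p_value, reject = 2 * (1 - Z.cdf(abs(z))), abs(z) > crit
    # Step 6: the decision (the p-value rule p < alpha gives the same answer)
    return z, p_value, reject


# Hypothetical sample mean of 44.8 minutes against the airport claim of 42.5
z, p, reject = z_test_mean(xbar=44.8, mu0=42.5, sigma=7.6, n=50, alpha=0.05, tail="two")
print(f"z = {z:.2f}, p-value = {p:.4f}, reject Ho: {reject}")
```

Step 7, the conclusion in words, is still up to you.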

When to use z or t? Use the t-distribution on small samples ($n \le 30$) when $\sigma$ is unknown.
Otherwise ($n > 30$), use z-values.
If $\sigma$ is unknown, calculate and use s from the data.

One-Tail or Two?

1-tail test (left or right):
if Ha states µ or p < a, it is a lower-tail test (<);
if Ha states µ or p > a, it is an upper-tail test (>).
2-tail test:
if Ha states µ or p $\ne$ a, because µ or p could be > a or < a.

______________________________________

Example: The analyst at a company suspects that his telemarketers are making more than last month's average of 15 calls/hour, so he runs a small-sample t-test at the 5% level of significance on 16 of the telemarketers. Here are the results of the test calculations and his conclusion.

Solution: (the value for the test statistic was based on raw data not shown).

Hypothesis Test on Mean (small sample)
Value of interest: µ, mean # of calls/hour
Hypotheses: Ho: µ $\le$ 15, Ha: µ > 15
Level of significance: $\alpha$ = 5%
Criteria: if t > 1.753 (the t-value for one tail of 5% with n – 1 = 15 degrees of freedom), reject Ho
Test Statistic: $t = \frac{\bar{x} - 15}{s/\sqrt{16}} = 2$
Decision: Since 2 > 1.753, we must reject the null hypothesis Ho.
Conclusion: the test indicates that the telemarketers are making more than 15 calls/hour.

Here's the image of the test:

Note how he made Ha state what he was out to prove, i.e., the average is more than 15 calls/hour.
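The analyst's raw data are not shown, so the Python sketch below uses 16 made-up call rates purely to show the mechanics: compute $\bar{x}$ and s from the raw data, form t, and compare it with the table value 1.753. The Python standard library has no t-distribution, so the critical value is hard-coded from the t table (an assumption you would change for other sample sizes or levels).

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical calls/hour for 16 telemarketers (NOT the analyst's real data)
calls = [15, 17, 16, 14, 18, 16, 17, 15, 19, 16, 17, 18, 15, 16, 17, 18]
mu0 = 15            # Ho: mu <= 15, Ha: mu > 15
t_crit = 1.753      # t table: one tail of 5%, df = 16 - 1 = 15

n = len(calls)
xbar = mean(calls)
s = stdev(calls)    # sample standard deviation, divides by n - 1
t = (xbar - mu0) / (s / sqrt(n))
decision = "reject Ho" if t > t_crit else "do not reject Ho"
print(f"xbar = {xbar}, s = {s:.4f}, t = {t:.2f}: {decision}")
```

With these invented numbers t comes out much larger than the analyst's t = 2, but the decision rule is the same.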

__________________________________

Formulating Ho and Ha

Example: An appliance manufacturer is considering the purchase of a new machine for stamping out sheet metal parts. If µ0 is the average number of good parts stamped out per hour by his old machine and µ is the corresponding average for the new machine, the manufacturer wants to test the null hypothesis µ = µ0 against a suitable alternative. What should the alternative hypothesis be if:

(a) he doesn't want to buy the new machine unless it's faster than the old one;
(b) he wants to buy the new machine unless it proves slower than the old one.

Solution:

(a) the alternative hypothesis should be µ > µ0 so he will buy the new machine only if the null hypothesis must be rejected to indicate that the new machine is faster.
(b) the alternative hypothesis should be µ < µ0 so that he will buy the new machine unless he must reject the null hypothesis in favour of Ha, which indicates that the new machine is slower than the old one.

Example: The average drying time of a manufacturer's paint is 20 minutes. He is considering modifying the chemical composition of it in order to change the average drying time. He wants to test the null hypothesis µ = 20 minutes against a suitable alternative, where µ is the average drying time of the modified paint. What alternative hypothesis should he use if:

(a) he does not want to modify the paint unless it decreases the drying time?
(b) the modified paint is cheaper so he will do it unless it increases the drying time?

Solution:

(a) Ha should be µ < 20 so that he will modify the paint only if Ho must be rejected to indicate that the modification decreases the drying time.
(b) the alternative hypothesis should be µ > 20 so he will modify the paint unless Ho must be rejected to indicate that the modification increases the drying time.

________________________________________

Statistical Errors: Type I and Type II errors

Type I error:

Rejecting Ho when it is true -- deciding guilty when the defendant is innocent.
The probability of the "Reject-Ho-Zone" is $\alpha$, so P(Type I Error) = $\alpha$.

Type II error:

Not Rejecting Ho when it is false -- deciding not guilty when the defendant is guilty.
P(Type II Error) = $\beta$. Beta is not a constant: its value depends on the unknown value of the population parameter we're testing and on the level of significance we use. See part (c) of the next example for clarification.

Decision versus the truth of H0:
Do not reject H0: correct decision if H0 is true; Type II error if H0 is false.
Reject H0: Type I error if H0 is true; correct decision if H0 is false.

p-value or Observed Significance Level

Informally, the p-value is the probability in the tails defined by the calculated test statistic (not the critical values). In a 2-tail test, the p-value is the sum of the probabilities of the 2 tails so it is 2 times the probability of one of the tails.

Decisions Based on the p-value :

If p-value < $\alpha$, we reject Ho. We find the test statistic and use its p-value to make our decision. This approach is used often today because software provides the p-value, and it indicates the degree to which the data disagree with the null hypothesis Ho.
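Here is a minimal Python sketch of getting a p-value from a test statistic, assuming Python 3.8+ `statistics.NormalDist` for the normal CDF. The `tail` labels are our own convention for the three forms of Ha; the 2-tail case doubles one tail, as described above. The z of 1.87 in the usage lines is the value from the airport example that follows.

```python
from statistics import NormalDist


def p_value(z: float, tail: str) -> float:
    """p-value for a z test statistic.
    tail: 'lower' for Ha with <, 'upper' for Ha with >, 'two' for Ha with !=."""
    cdf = NormalDist().cdf
    if tail == "lower":
        return cdf(z)
    if tail == "upper":
        return 1 - cdf(z)
    return 2 * (1 - cdf(abs(z)))  # 2-tail: twice the area of one tail


alpha = 0.05
p = p_value(1.87, "two")
print(round(p, 4), "reject Ho" if p < alpha else "do not reject Ho")
```

Software gives a 2-tail p-value of about 0.0615 for z = 1.87; the table answer of 0.0614 differs only by rounding.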

Example: An airport's planning commission wants to check the claim that the average time cars spend in the short term parking is 42.5 minutes. They're not sure if this number is too high or too low. They know the standard deviation of this population is 7.6 minutes. They decide that if a sample of 50 parking stubs gives a mean parking time between 40.5 and 44.5 minutes, they will accept that the average time is 42.5 minutes, otherwise, they will reject this claim.

(a) What is the probability they will be committing a Type I error with these values?

Solution: We need the tail probability set up by the interval between 40.5 and 44.5 minutes, so we standardize these values.

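We find z for $\bar{x}$ = 40.5 and $\bar{x}$ = 44.5 with µ = 42.5 minutes, using the standard error $\sigma_{\bar{x}} = 7.6/\sqrt{50} \approx 1.07$:
$z = \frac{44.5 - 42.5}{1.07} \approx 1.87$ and $z = \frac{40.5 - 42.5}{1.07} \approx -1.87$
P(z > 1.87) = 0.5000 – 0.4693 = 0.0307
Since there are 2 tails, the reject zone probability = 2(0.0307) = 0.0614.
The probability they will commit a Type I error is 0.0614.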
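The same Type I error probability can be checked in Python by putting the sampling distribution of $\bar{x}$ directly into `statistics.NormalDist` (assuming Python 3.8+) and adding the two tail areas outside the planning commission's limits. This is a sketch, not the commission's method; the variable names are ours.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n = 42.5, 7.6, 50     # claimed mean, known population sd, sample size
lower, upper = 40.5, 44.5         # do-not-reject zone for the sample mean

se = sigma / sqrt(n)              # standard error of the mean (unrounded)
sampling = NormalDist(mu0, se)    # distribution of xbar if Ho is true
alpha = sampling.cdf(lower) + (1 - sampling.cdf(upper))  # both tails
print(round(alpha, 4))
```

With the unrounded standard error (about 1.0748) this gives about 0.063 rather than 0.0614; the difference comes only from rounding $\sigma_{\bar{x}}$ to 1.07 before looking up z.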

(b) If they had run the same test at the 5% level of significance using these values, what would they have decided? (heave Ho or not?)

Solution: Using the p-value, the rule is: if p-value < $\alpha$, we reject Ho. Since the p-value = 6.14% and $\alpha$ = 5%, we would not reject Ho.

(c) Now suppose the true mean of the parking times is µ = 45.5 minutes. Using the same test set-up (do not reject Ho if 40.5 < $\bar{x}$ < 44.5), find the probability of committing a Type II error -- accepting Ho when it is false.

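Solution: $\beta$ = P(40.5 < $\bar{x}$ < 44.5) when µ = 45.5. Standardizing with $\sigma_{\bar{x}} \approx 1.07$: $z = \frac{40.5 - 45.5}{1.07} \approx -4.67$ and $z = \frac{44.5 - 45.5}{1.07} \approx -0.93$. Since P(z < –4.67) ≈ 0, we find $\beta$ ≈ P(z < –0.93) = 0.5000 – 0.3238 = 0.1762.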
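A quick Python check of $\beta$ for the "what if" mean of 45.5 minutes: centre the sampling distribution of $\bar{x}$ on 45.5 and take the probability of the unchanged do-not-reject zone. This sketch assumes Python 3.8+ `statistics.NormalDist`; the names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

sigma, n = 7.6, 50
lower, upper = 40.5, 44.5          # do-not-reject zone from the original test set-up
mu_true = 45.5                     # "what if" value of the true mean

sampling = NormalDist(mu_true, sigma / sqrt(n))   # xbar distribution when mu = 45.5
beta = sampling.cdf(upper) - sampling.cdf(lower)  # P(xbar lands in do-not-reject zone)
power = 1 - beta
print(f"beta = {beta:.4f}, power = {power:.4f}")
```

The lower limit contributes almost nothing here, which is why the hand calculation can ignore P(z < –4.67).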

(d) Describe the consequences of committing a Type I error and a Type II error.

Solution: If we commit a Type I error, we would reject Ho, the claim that µ = 42.5 minutes, when in reality it is true: the mean time is actually equal to 42.5 minutes.

If we commit a Type II error, we would not reject Ho, the claim that µ = 42.5 minutes when in reality, it is false. The true mean is less or more than 42.5 minutes.

This illustrates why $\beta$ is not a constant like $\alpha$ is. For each choice of lower and upper values for the "do not reject zone", and for each "what if" value of the true mean, there will be a different value for $\beta$.
If we need to find a value for $\beta$, we use an approach like the planning commission did.

The power of a test is the probability of rejecting a false null hypothesis Ho,
at a given level of significance.

When we assume in Ha, the alternative hypothesis, that a specific "what if" value for the test parameter is true, the power of the test = $1 - \beta$ for that particular "what if" value.

In the preceding example, the "what if" value was 45.5 minutes, which gave us $\beta$ = 0.1762. The power of this test, then, is 1 – 0.1762 = 0.8238.

We calculate the power of a significance test in three steps:

1. Specify Ha with the alternative or "what if" value we're testing, and specify the significance level $\alpha$.

2. Find the value(s) of $\bar{x}$ that define the reject-Ho zone.

3. Calculate the probability of observing these values of $\bar{x}$, assuming the population parameter is equal to the "what if" value we chose.
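The three steps above can be sketched as a Python function for a 2-tail z-test on a mean with $\sigma$ known. It assumes Python 3.8+ `statistics.NormalDist`; the function name `power_two_tail` is hypothetical. Step 2 turns the critical z-values into $\bar{x}$ limits, and step 3 measures how much of the "what if" distribution falls outside them.

```python
from math import sqrt
from statistics import NormalDist


def power_two_tail(mu0, sigma, n, alpha, mu_what_if):
    """Power of a 2-tail z-test on a mean, following the three steps above."""
    se = sigma / sqrt(n)
    # Step 2: xbar values bounding the do-not-reject zone when Ho (mu = mu0) is true
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    lower, upper = mu0 - z_crit * se, mu0 + z_crit * se
    # Step 3: probability of xbar landing outside that zone when mu = mu_what_if
    what_if = NormalDist(mu_what_if, se)
    beta = what_if.cdf(upper) - what_if.cdf(lower)
    return 1 - beta


# Oceanographer's set-up below: Ho: mu = 72.4, sigma = 2.1, n = 35, alpha = 5%, "what if" mu = 74
print(round(power_two_tail(72.4, 2.1, 35, 0.05, 74), 4))
```

One sanity check on the design: if the "what if" value equals the Ho value, the "power" is just $\alpha$, the probability of the reject zone.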

Example: An oceanographer wants to test whether the average depth of the ocean in a certain area is 72.4 fathoms. She takes a random sample of 35 depth readings and she knows the standard deviation in the depth is 2.1 fathoms.

(a) If her sample readings give an average of 73.2 fathoms, use the p-value to decide whether she should reject the null hypothesis or not at the 5% level of significance.

Solution: (a)

The value of interest is the average ocean depth in a given area.

Ho: µ = 72.4
Ha: µ $\ne$ 72.4 (2-tail test)
Level of significance: $\alpha$ = 5%
Criteria: if p-value < 5%, reject Ho
Statistic: $z = \frac{73.2 - 72.4}{2.1/\sqrt{35}} = \frac{0.8}{0.355} \approx 2.25$
p-value: P(z > 2.25) = 0.5000 – 0.4878 = 0.0122, so the 2-tail p-value = 2(0.0122) = 0.0244 or 2.44%
Decision: since 2.44% < 5%, reject Ho
Conclusion: the data don't support the claim that the average depth in the area is 72.4 fathoms. Since the sample mean is above 72.4, they indicate that the average depth in the area is greater than 72.4 fathoms.
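The oceanographer's p-value calculation in part (a) can be reproduced in a few lines of Python, assuming Python 3.8+ `statistics.NormalDist` in place of the z table. This is only a check of the arithmetic, not a different method.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, xbar, alpha = 72.4, 2.1, 35, 73.2, 0.05

z = (xbar - mu0) / (sigma / sqrt(n))        # standardized sample mean
p = 2 * (1 - NormalDist().cdf(abs(z)))      # 2-tail p-value: twice one tail
decision = "reject Ho" if p < alpha else "do not reject Ho"
print(f"z = {z:.2f}, p-value = {p:.4f}: {decision}")
```

Without rounding z to 2.25 first, software gives a p-value of about 0.024, which leads to the same decision.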

(b) Use the same sample statistic and level of significance to run the test with the "what if" alternative hypothesis Ha changed to µ = 74 fathoms. Use z-values instead of the p-value to find Beta, the probability of committing a Type II error, and the power of the test for this value.

Solution: The value of interest is the average ocean depth in a given area. Using the data listed previously and z = 1.96 for the 5% level of significance, the "don't reject" zone is 72.4 ± 1.96(0.355), i.e., between 71.7 fathoms and 73.1 fathoms. If µ = 74, the z-value for 73.1 is (73.1 – 74)/0.355 ≈ –2.54 (the z-value for 71.7 is about –6.5, so the lower tail adds essentially nothing), which means that the probability of the "don't reject" zone is 0.5000 – 0.4945 = 0.0055.

So $\beta$ = 0.0055, and the power of the test is 1 – 0.0055 = 0.9945, or nearly 1.

__________________________________________________

Hypothesis Tests on a Population Proportion

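Requirement: If the sample size n is large enough so that

$np \ge 5$ and $n(1 - p) \ge 5$,

the sampling distribution of the sample proportion is approximately normal, with

population proportion = $p$,

sample proportion (p-bar) $\bar{p} = x/n$, where x is the number of successes in the sample, and

standard error (sigma p-bar) $\sigma_{\bar{p}} = \sqrt{p(1-p)/n}$.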

This means we can use z-values to define the "reject-Ho-zone(s)".
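Here is a minimal Python sketch of the requirement check and the standard error. The threshold of 5 in `normal_approx_ok` is an assumption matching the rule written above; some textbooks use 10 instead, so it is left as a parameter. Both function names are hypothetical.

```python
from math import sqrt


def normal_approx_ok(n: int, p: float, minimum: float = 5) -> bool:
    """Rule of thumb: n*p and n*(1 - p) must both be at least `minimum`."""
    return n * p >= minimum and n * (1 - p) >= minimum


def se_proportion(p: float, n: int) -> float:
    """Standard error of the sample proportion, sqrt(p(1 - p)/n)."""
    return sqrt(p * (1 - p) / n)


# Drug-company example that follows: n = 150, hypothesized p = 0.20
print(normal_approx_ok(150, 0.20), round(se_proportion(0.20, 150), 4))
```

For the drug example np = 30 and n(1 – p) = 120, so the normal approximation is comfortably justified.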

The only differences between the set-up for a test on a mean and a test on a proportion are:

1. the population parameter of interest is the population proportion p, not the population mean µ (and the sample statistic is $\bar{p}$ instead of $\bar{x}$).

2. the null and alternative hypotheses are statements about p.

Example: A drug company wants to test whether it is really true that 20% of the patients who take one of their products suffer side effects from the drug. In a random sample of
150 patients, 42 suffer the side effects.

(a) Conduct a 2-tail hypothesis test with alpha = 0.01.

Solution: (a) The test parameter is the proportion of patients who suffer side effects.

Ho: p = 0.20
Ha: p $\ne$ 0.20 (2-tail test)
Level of significance: $\alpha$ = 1%
Criteria: if |z| > 2.575, reject Ho
Statistic: $\bar{p} = 42/150 = 0.28$ and $\sigma_{\bar{p}} = \sqrt{0.20(0.80)/150} \approx 0.0327$, so $z = \frac{0.28 - 0.20}{0.0327} \approx 2.45$
Conclusion: 2.45 < 2.575, so we cannot reject the null hypothesis that 20% of the patients suffer side effects.

(b) Conduct a 1-tail hypothesis test with alpha = 0.01 where Ha is that the probability of side effects is greater than 20%.

Solution: The test parameter is the proportion of patients who suffer side effects.

Ho: p $\le$ 0.20
Ha: p > 0.20 (1-tail test)
Level of significance: $\alpha$ = 1%
Criteria: if z > 2.33, reject Ho
Statistic: z = 2.45 as before, and 2.45 > 2.33, the critical value for 1 tail of 1%.
Conclusion: we must reject the null hypothesis. The test indicates that more than 20% of the patients suffer side effects.

(c) Find the p-value for the 1-tail test in part (b).

Solution: We need P(z > 2.45) = 0.5000 – 0.4929 = 0.0071.
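All three parts of the drug-company example can be checked in Python. This sketch assumes Python 3.8+ `statistics.NormalDist`, which returns slightly more precise critical values (about 2.576 and 2.326) than the 2.575 and 2.33 read from the table; the decisions are unaffected.

```python
from math import sqrt
from statistics import NormalDist

p0, n, x = 0.20, 150, 42
p_bar = x / n                                   # sample proportion, 0.28
z = (p_bar - p0) / sqrt(p0 * (1 - p0) / n)      # standard error uses hypothesized p0
Z = NormalDist()

two_tail_crit = Z.inv_cdf(1 - 0.01 / 2)         # part (a): alpha = 1%, 2 tails
one_tail_crit = Z.inv_cdf(1 - 0.01)             # part (b): alpha = 1%, upper tail
p_one = 1 - Z.cdf(z)                            # part (c): upper-tail p-value

print(f"z = {z:.2f}")
print("(a)", "reject Ho" if abs(z) > two_tail_crit else "do not reject Ho")
print("(b)", "reject Ho" if z > one_tail_crit else "do not reject Ho")
print(f"(c) p-value = {p_one:.4f}")
```

Because the unrounded z is 2.449 rather than 2.45, software's p-value is a hair above the table value of 0.0071.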

Level of Significance versus Level of Confidence

The level of significance $\alpha$ in a 2-tail hypothesis test defines the reject-Ho zones. It is the sum of the probabilities included in the tails of the distribution. Therefore, the probability of the do-not-reject-Ho zone is $1 - \alpha$, the level of confidence for the interval around the hypothesized population mean or proportion. This means that the critical values of $\bar{p}$ can be calculated from the confidence interval limits: instead of finding critical values using z, we can use $p \pm z_{\alpha/2}\sqrt{p(1-p)/n}$.

________________________________________________

Example: In part (a) of the previous example we did a 2-tail test at the 1% level of significance on the null and alternative hypotheses p = 0.20 and p $\ne$ 0.20. Let's repeat the test using the confidence interval limits to make our decision about rejecting Ho.

Solution: to find the confidence interval limits for a proportion, we use:

$p \pm z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$

and since $\alpha = 0.01$, we'll set up a $1 - \alpha$ = 99% confidence interval about p = 0.20.

The confidence interval limits are 0.20 ± 2.575(0.0327) = 0.20 ± 0.0842.

If the sample proportion falls between 0.1158 and 0.2842, we cannot reject Ho.

Since the sample proportion, 0.28, is in this interval, we cannot reject Ho.
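The confidence-limit version of the test can be sketched in Python as well, assuming Python 3.8+ `statistics.NormalDist`. The limits are centred on the hypothesized p = 0.20, and Ho survives if $\bar{p}$ falls inside them. It is the same decision rule as |z| < $z_{\alpha/2}$, just expressed on the scale of $\bar{p}$.

```python
from math import sqrt
from statistics import NormalDist

p0, n, p_bar, alpha = 0.20, 150, 42 / 150, 0.01

se = sqrt(p0 * (1 - p0) / n)                    # standard error about the hypothesized p
z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # about 2.576 for a 99% interval
lower, upper = p0 - z_crit * se, p0 + z_crit * se
inside = lower < p_bar < upper
print(f"limits {lower:.4f} to {upper:.4f}:", "do not reject Ho" if inside else "reject Ho")
```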

_______________________________