Hypothesis Testing

Hypothesis Testing Summary

1. Tests on Means:

Formulating Hypotheses: Ho is what we want to heave, Ha what we want to prove.

Statistical Errors: 2 kinds of errors can be committed with Hypothesis tests:

Significance Level α: the sum of the tail probabilities -- defines the reject Ho zone.

Types of Tests:

2. Tests on Proportions:

Tests on a single population proportion.

Using confidence limits instead of z-values to test a proportion.

Formulae for Hypothesis Tests

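parameters              formulae
tests on means          z = (x-bar - µ) / (σ / √n)   or   t = (x-bar - µ) / (s / √n)
tests on proportions    z = (p-bar - p) / √( p(1 - p) / n )
c. i. limits for µ      µ ± z (σ / √n)
c. i. limits for p      p ± z √( p(1 - p) / n )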

Hypothesis Testing

Because the only constant in life is change -- and, to survive, businesses have to be aware of the changing trends in the economy, labour force, supply of raw materials and the demand for their products, etc. -- business analysts constantly test the validity and reliability of the data upon which their budgets and business decisions are based. Such tests are called Hypothesis Tests.

Law and Order: Hypothesis Testing Unit

Every hypothesis test begins with two statements labeled Ho, the null hypothesis; and Ha, the alternative hypothesis. Ho is like the origin or starting point – assumptions or values currently held, but seriously in question or in need of review. We run the hypothesis test to find out if things have changed to the point where we should reject or heave Ho. We need to know if the numbers currently in use support our old assumptions or indicate a need for change.

Let's compare a hypothesis test to a criminal trial because we run hypothesis tests when we SUSPECT that our data is out of date, just as we run criminal trials when we SUSPECT that a crime has been committed. In the trial, Ho states that the defendant is innocent (until proven guilty). Ha, the alternative hypothesis, states that the defendant is guilty. The criterion for the test is: if the weight of evidence is beyond a reasonable doubt, we will heave Ho. Until then, we can't even consider accepting Ha, that the defendant is guilty.

Alpha (α), called the level of significance in a hypothesis test, plays the role that "beyond a reasonable doubt" plays in the criminal trial. It forms the criterion on which we decide whether to heave Ho or not. It marks the confidence limits and/or critical values for the test. If the sample statistic lies beyond the critical values, we must heave Ho in the hypothesis test. In the criminal trial, we heave Ho when the weight of evidence lies beyond a reasonable doubt.

As with hypothesis tests, two types of errors can be committed in the criminal trial. A Type I error, rejecting Ho when it is true, happens when the jury finds the defendant guilty when s/he is innocent. The probability of committing a Type I error in the hypothesis test is α, the probability of getting a test value in the reject zone when Ho is true (easy to remember, since α is the first Greek letter and this is the Type I error). Steven Truscott, who was convicted of a murder and acquitted only decades later, is a classic case of a Type I error.

A Type II error, not rejecting Ho when it is false, happens when the jury finds the defendant NOT GUILTY when s/he actually is. In the hypothesis test, we commit a Type II error when our calculated sample statistics give us a test value of z between the critical values, so we cannot reject Ho even though in reality, it is false. The probability of committing a Type II error in the hypothesis test is called Beta, β. It is the probability of getting a test value in the do not reject zone, even though, unknown to us, our null hypothesis is false. The O.J. Simpson case is often cited as a classic Type II error in law.

The main difference between the hypothesis test and the criminal trial is that critical values for hypothesis tests are published in math books -- for the criminal trial, they're found in law books. The other important difference is: it's easier to crunch and trust numbers than it is to crunch and trust humans and the evidence they present.


Hypothesis Tests Step by Step:

Step 1: Specify the population value of interest.
This is µ the mean, or p, the population proportion stated in the question.

Step 2: Formulate Ho the null hypothesis.
Ho should always have an equal sign & should be what we want to reject. (innocent)
Formulate Ha, the alternative hypothesis.
Ha should be what we're trying to prove if we have to reject Ho. (guilty)

Step 3: Specify the level of significance α.
Given in the question. The probability of the tail(s).

Step 4: Specify the test criteria for rejecting Ho. Draw a Picture!!
use critical value(s) of z or t to define one-tail or
2-tail reject Ho zone(s).

Step 5: Calculate the sample statistic from the data. (add it to the picture)
use formula for z or t ,
calculate test values to compare with critical value(s).

Step 6: Make a decision. (based on the picture)
Reject or do not reject Ho.

Step 7: Draw a conclusion.
state what the test results indicate about the hypotheses.

When to use z or t? Use the t-distribution on small samples (n ≤ 30).
Otherwise (n > 30), use z-values.
If σ is unknown, calculate and use s from the data.

One-Tail or Two?

1-tail test (left or right):
if Ha states µ or p < a, it is a lower-tail test (<)
if Ha states µ or p > a, it is an upper-tail test (>)

2-tail test:
if Ha states µ or p ≠ a, because µ or p could be > a or < a.
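Here is a minimal Python sketch of how the one-tail or two-tail choice changes the critical z-value(s). It uses NormalDist from Python's standard statistics module in place of a z-table. The function name critical_z and the tail labels "lower", "upper" and "two" are assumptions made for this illustration, not part of the lesson.

```python
from statistics import NormalDist


def critical_z(alpha, tail):
    """Return the critical z-value(s) for a test at significance level alpha.

    tail is "lower", "upper" or "two" (labels invented for this sketch).
    """
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    if tail == "lower":
        return std_normal.inv_cdf(alpha)        # reject Ho if z is below this
    if tail == "upper":
        return std_normal.inv_cdf(1 - alpha)    # reject Ho if z is above this
    if tail == "two":
        z = std_normal.inv_cdf(1 - alpha / 2)   # alpha is split between the 2 tails
        return (-z, z)                          # reject Ho if z falls outside this pair
    raise ValueError("tail must be 'lower', 'upper' or 'two'")


print(critical_z(0.05, "upper"))
print(critical_z(0.05, "two"))
```

Note that the same α gives a critical value farther from zero in a 2-tail test, because only half of α sits in each tail.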


Example: The analyst at a company suspects that his telemarketers are making more than last month's average of 15 calls/hour, so he runs a small sample t-test at the 5% level of significance, on 16 of the telemarketers. Here are the results of the test calculations and his conclusion.

Solution: (the value for the test statistic was based on raw data not shown).

Hypothesis Test on Mean (small sample)
Value of interest: µ, mean # of calls/hour
Hypotheses: Ho: µ ≤ 15    Ha: µ > 15
Level of significance: α = 5 %
Criteria: if t > 1.753, reject Ho
Test Statistic: t = 2
Decision: Since 2 > 1.753, we must reject the null hypothesis Ho
Conclusion: the test indicates that the telemarketers are making more than 15 calls/hour.
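The decision above can be sketched in Python. Since the raw data are not shown, the sample mean (17) and sample standard deviation (4) below are hypothetical values, chosen only because they reproduce the stated test value t = 2 with n = 16. The critical value 1.753 is the one quoted in the example (upper tail, 5%, 15 degrees of freedom). The function name t_statistic is also just an illustration.

```python
import math


def t_statistic(x_bar, s, n, mu0):
    """Small-sample test value: t = (x-bar - mu0) / (s / sqrt(n))."""
    return (x_bar - mu0) / (s / math.sqrt(n))


# Critical t quoted in the example: upper tail, 5% level, 16 - 1 = 15 degrees of freedom
T_CRITICAL = 1.753

# HYPOTHETICAL summary values (the lesson does not show the raw data);
# they are picked only so that the result matches the stated t = 2.
t = t_statistic(x_bar=17.0, s=4.0, n=16, mu0=15.0)

reject_h0 = t > T_CRITICAL  # upper-tail test: reject Ho when t is beyond the critical value
print(t, reject_h0)
```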

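[Image of the test: t-curve with the reject Ho zone in the upper tail beyond t = 1.753; the test value t = 2 falls inside that zone.]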

Note how he made Ha state what he was out to prove; ie: the average is more than 15 calls/hour.


Formulating Ho and Ha

Example: An appliance manufacturer is considering the purchase of a new machine for stamping out sheet metal parts. If µ0 is the average number of good parts stamped out per hour by his old machine and µ is the corresponding average for the new machine, the manufacturer wants to test the null hypothesis µ = µ0 against a suitable alternative. What should the alternative hypothesis be if:


Example: The average drying time of a manufacturer's paint is 20 minutes. He is considering modifying the chemical composition of it in order to change the average drying time. He wants to test the null hypothesis µ = 20 minutes against a suitable alternative, where µ is the average drying time of the modified paint. What alternative hypothesis should he use if:



Statistical Errors: Type I and Type II errors

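Type I error: rejecting Ho when it is true. Its probability is α.

Type II error: not rejecting Ho when it is false. Its probability is β.

                      Truth of H0
Decision              H0 is true          H0 is false
Do not reject H0      Correct decision    Type II error
Reject H0             Type I error        Correct decision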
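To see that the probability of a Type I error really is the level of significance, here is a small simulation sketch in Python. It repeatedly draws samples from a population where Ho is true and counts how often a 2-tail z-test at the 5% level rejects Ho anyway. All the numbers (population mean 50, standard deviation 10, sample size 25, 4000 trials, the seed) are invented for this illustration.

```python
import math
import random
from statistics import NormalDist, fmean


def type_i_error_rate(trials=4000, n=25, mu0=50.0, sigma=10.0, alpha=0.05, seed=1):
    """Fraction of samples that reject a TRUE Ho in a 2-tail z-test."""
    rng = random.Random(seed)                      # fixed seed so the run is repeatable
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # 2-tail critical value
    rejections = 0
    for _ in range(trials):
        # Ho is true here: the samples really do come from a population with mean mu0
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / math.sqrt(n))
        if abs(z) > z_crit:
            rejections += 1                        # a Type I error
    return rejections / trials


rate = type_i_error_rate()
print(rate)
```

The printed rate should land close to 0.05, because every rejection in this set-up is, by construction, a Type I error.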

p-value or Observed Significance Level

Informally, the p-value is the probability in the tails defined by the calculated test statistic (not the critical values). In a 2-tail test, the p-value is the sum of the probabilities of the 2 tails so it is 2 times the probability of one of the tails.

Decisions Based on the p-value :

If p-value < α, we reject Ho. We find the test statistic and use its p-value to make our decision. This approach is used often today because software provides the p-value and it indicates the degree to which the data disagree with the null hypothesis Ho.
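Here is a minimal Python sketch of the p-value rule for a z test value, using NormalDist from the standard library instead of a z-table. The function names and the tail labels are assumptions made for this illustration, and the test value z = 2.1 is a made-up number.

```python
from statistics import NormalDist


def p_value_from_z(z, tail):
    """Tail probability defined by the calculated test value z.

    tail is "lower", "upper" or "two" (labels invented for this sketch).
    """
    std_normal = NormalDist()
    if tail == "lower":
        return std_normal.cdf(z)
    if tail == "upper":
        return 1 - std_normal.cdf(z)
    if tail == "two":
        return 2 * (1 - std_normal.cdf(abs(z)))  # 2 times the probability of one tail
    raise ValueError("tail must be 'lower', 'upper' or 'two'")


def reject_h0(p_value, alpha):
    """Decision rule: reject Ho when the p-value is below alpha."""
    return p_value < alpha


p = p_value_from_z(2.1, "two")   # hypothetical test value
print(p, reject_h0(p, 0.05))
```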

Example: An airport's planning commission wants to check the claim that the average time cars spend in the short term parking is 42.5 minutes. They're not sure if this number is too high or too low. They know the standard deviation of this population is 7.6 minutes. They decide that if a sample of 50 parking stubs gives a mean parking time between 40.5 and 44.5 minutes, they will accept that the average time is 42.5 minutes, otherwise, they will reject this claim.

Solution: We need the tail probability set up by the interval between 40.5 and 44.5 minutes, so we standardize these values.

Solution: Using p-value, the rule is: if p-value < α, we reject Ho. Since p-value = 6.14% and α = 5%, we would not reject Ho.

Solution: Here's the picture now: Note: P( z < – 4) = 0 so we find P (z < – 0.93).

Solution: If we commit a Type I error, we would reject Ho, the claim that µ = 42.5 minutes, when in reality it is true: the mean time is actually equal to 42.5 minutes.

This illustrates why β is not a constant like α is. For each choice of lower and upper values for the "do not reject zone", there will be a different value for β.
If we need to find a value for β, we use an approach like the planning commission did.

The power of a test is the probability of rejecting a false null hypothesis Ho,
at a given level of significance.

When we assume in Ha, the alternative hypothesis, that a specific "what if" value for the test parameter is true, the power of the test = 1 - β for that particular "what if" value.

In the preceding example, the "what if" value was 45.5 minutes which gave us β = 0.1762. The power of this test then is 1 - 0.1762 = 0.8238
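The airport numbers can be checked with a short Python sketch. It takes the values from the example (claimed mean 42.5, standard deviation 7.6, n = 50, "do not reject" zone from 40.5 to 44.5, "what if" mean 45.5) and uses NormalDist from the standard library. Because it skips the rounding of a z-table lookup, its results can differ from the hand-worked figures in the last decimal places. The variable names are just for this illustration.

```python
import math
from statistics import NormalDist

# Values taken from the airport parking example
mu0, sigma, n = 42.5, 7.6, 50
lower, upper = 40.5, 44.5      # the commission's "do not reject" zone
what_if_mu = 45.5              # the "what if" value for the true mean

std_error = sigma / math.sqrt(n)   # standard error of the sample mean

# Tail probability when Ho is true: chance that x-bar lands outside the zone
under_h0 = NormalDist(mu0, std_error)
tail_probability = under_h0.cdf(lower) + (1 - under_h0.cdf(upper))

# Beta: chance that x-bar lands inside the zone when the mean is really 45.5
under_what_if = NormalDist(what_if_mu, std_error)
beta = under_what_if.cdf(upper) - under_what_if.cdf(lower)

power = 1 - beta
print(tail_probability, beta, power)
```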

We calculate the power of a significance test in three steps:

1. Specify Ha with the alternative or "what if" value we're testing,
Specify the significance level α.

2. Find the value(s) of the sample statistic (x-bar or p-bar) that define the reject Ho zone.

3. Calculate the probability of observing these values for the sample statistic, assuming the population parameter is equal to the "what if" value we chose.

Example: An oceanographer wants to test whether the average depth of the ocean in a certain area is 72.4 fathoms. She takes a random sample of 35 depth readings and she knows the standard deviation in the depth is 2.1 fathoms.

(a) If her sample readings give an average of 73.2 fathoms, use the p-value to decide whether she should reject the null hypothesis or not at the 5% level of significance.

Solution: (a)

The value of interest is the average ocean depth in a given area.

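Ho: µ = 72.4    Ha: µ ≠ 72.4
Level of significance: α = 5 %    Criteria: if p-value < 5%, reject Ho
Test statistic: z = (73.2 - 72.4) / (2.1 / √35) = 0.8 / 0.355 = 2.25
p-value: P(z > 2.25) = 0.5000 - 0.4878 = 0.0122 for one tail, so for the 2 tails:

2(0.0122) = 0.0244 or 2.44%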

Decision: since 2.44% < 5%, reject Ho
Conclusion: the data doesn't support the claim that the average depth in the area is 72.4 fathoms. It indicates that the average depth in the area is greater than 72.4 fathoms.
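Part (a) can be reproduced with a short Python sketch: compute z from the sample mean, turn it into a 2-tail p-value, and compare with α. The function name two_tail_z_test is an assumption for this illustration. The result is unrounded, so the p-value may differ slightly from the table-based 2.44%.

```python
import math
from statistics import NormalDist


def two_tail_z_test(x_bar, mu0, sigma, n, alpha):
    """Return (z, p_value, reject_h0) for a 2-tail z-test on a mean."""
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # sum of the 2 tail probabilities
    return z, p_value, p_value < alpha


# Values from the oceanographer example, part (a)
z, p_value, reject = two_tail_z_test(x_bar=73.2, mu0=72.4, sigma=2.1, n=35, alpha=0.05)
print(z, p_value, reject)
```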

(b) Use the same sample statistic and level of significance to run the test with the alternative hypothesis Ha changed to µ = 74 fathoms. Use z-values instead of p-value to find Beta, the probability of committing a Type II error and the power of this test value.

Solution: The value of interest is the average ocean depth in a given area. Using the data listed previously and z = 1.96 for the 5% level of significance establishes the "don't reject" zone between 71.7 fathoms and 73.1 fathoms. If the true mean is 74 fathoms, the z-value for 73.1 is (73.1 - 74) / 0.355 = -2.54, which means that the probability of landing in the "don't reject" zone is 0.5000 - 0.4945 = 0.0055. So Beta, β = 0.0055, and the power of the test for this "what if" value is 1 - 0.0055 = 0.9945.
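Here is a sketch of the three-step power calculation in Python, applied to part (b). It builds the "don't reject" zone around the Ho value, then finds the probability of landing in that zone if the "what if" mean is the true one. The function name power_two_tail is an assumption for this illustration, and the results are unrounded, so they can differ a little from the table-based figures.

```python
import math
from statistics import NormalDist


def power_two_tail(mu0, what_if_mu, sigma, n, alpha):
    """Return (lower, upper, beta, power) for a 2-tail z-test on a mean."""
    std_error = sigma / math.sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)

    # Step 2: limits of the "don't reject" zone, centred on the Ho value
    lower = mu0 - z_crit * std_error
    upper = mu0 + z_crit * std_error

    # Step 3: probability of landing in that zone if the "what if" mean is true
    under_what_if = NormalDist(what_if_mu, std_error)
    beta = under_what_if.cdf(upper) - under_what_if.cdf(lower)
    return lower, upper, beta, 1 - beta


# Values from the oceanographer example, part (b)
lower, upper, beta, power = power_two_tail(mu0=72.4, what_if_mu=74.0, sigma=2.1, n=35, alpha=0.05)
print(lower, upper, beta, power)
```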


Hypothesis Tests on a Population Proportion

Requirement: If the sample size n is large enough so that np ≥ 5 and n(1 - p) ≥ 5,

the sampling distribution of the sample proportion is approximately normal, with

population proportion = p,

sample proportion (p-bar) = x / n, where x is the number of successes in the sample, and

standard error (sigma p-bar) = √( p(1 - p) / n )

This means we can use z-values to define the "reject-Ho-zone(s)".

The only differences between the set-up for a test on a mean and a test on a proportion are:

1. the population parameter of interest is the population proportion p, not the population mean µ.

2. the null and alternative hypotheses are statements about p.

Example: A drug company wants to test whether it is really true that 20% of the patients who take one of their products suffer side effects from the drug. In a random sample of
150 patients, 42 suffer the side effects.

Solution: (a) The test parameter is the proportion of patients who suffer side effects.

Ho: p = 0.20    Ha: p ≠ 0.20 (2-tail test)
Level of significance: α = 1 %    Criteria: if | z | > 2.575, reject Ho
Statistic: p-bar = 42/150 = 0.28, so z = (0.28 - 0.20) / √( 0.20(0.80) / 150 ) = 0.08 / 0.0327 = 2.45
Conclusion: 2.45 < 2.575, so we cannot reject the null hypothesis that 20% of the patients suffer side effects.

Solution: The test parameter is the proportion of patients who suffer side effects.

Ho: p ≤ 0.20    Ha: p > 0.20 (1-tail test)
Level of significance: α = 1 %    Criteria: if z > 2.33, reject Ho
Statistic: z = 2.45 as before, and 2.45 > 2.33, the critical value for 1-tail of 1%.
Conclusion: we must reject the null hypothesis. The test indicates that more than 20% of the patients suffer side effects.

Solution: We need P( z > 2.45) = 0.5000 - 0.4929 = 0.0071.
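The drug company test can be sketched in Python as follows. It computes the z test value for a proportion from the sample counts, then makes the 2-tail and 1-tail decisions at the 1% level. The function name proportion_z is an assumption for this illustration. Everything is computed without intermediate rounding, so the printed z can differ from a hand-rounded figure; the two decisions are what matter.

```python
import math
from statistics import NormalDist


def proportion_z(successes, n, p0):
    """Return (p_bar, z) for a test on a single population proportion."""
    p_bar = successes / n
    std_error = math.sqrt(p0 * (1 - p0) / n)  # standard error under Ho
    return p_bar, (p_bar - p0) / std_error


# Values from the drug company example
n, p0, alpha = 150, 0.20, 0.01

# Large-sample check: both counts should be comfortably large for the normal approximation
print(n * p0, n * (1 - p0))

p_bar, z = proportion_z(successes=42, n=n, p0=p0)

std_normal = NormalDist()
reject_two_tail = abs(z) > std_normal.inv_cdf(1 - alpha / 2)  # Ha: p is not 0.20
reject_upper_tail = z > std_normal.inv_cdf(1 - alpha)         # Ha: p > 0.20
upper_tail_p_value = 1 - std_normal.cdf(z)

print(p_bar, z, reject_two_tail, reject_upper_tail, upper_tail_p_value)
```

The same sample leads to different decisions in the two tests because the 1-tail test puts the whole 1% in the upper tail, which moves the critical value closer to zero.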

Level of Significance versus Level of Confidence

The level of significance α in a 2-tail hypothesis test defines the reject Ho zones. It is the sum of the probabilities included in the tails of the distribution. Therefore, the probability of the do not reject Ho zone is 1 - α, or the level of confidence for the interval around the population mean or proportion. This means that the critical values of p can be calculated from the confidence interval limits. Instead of finding critical values using z, we can use the limits p ± z √( p(1 - p) / n ).


Example: In part (a) of the previous example we did a 2-tail test at the 1% level of significance on the null and alternate hypotheses that p = 0.20 or p ≠ 0.20. Let's repeat the test using the confidence interval limits to make our decision about rejecting Ho.

Solution: to find the confidence interval limits for a proportion, we use: p ± z √( p(1 - p) / n )

The confidence interval limits are 0.20 ± 2.575(0.0327) = 0.20 ± 0.0841.

If the sample proportion falls between 0.1159 and 0.2841, we cannot reject Ho.

Since the sample proportion, 0.28, is in this interval, we cannot reject Ho.
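Here is a minimal Python sketch of the confidence-limit approach: build the "do not reject" interval around the Ho value of p and check whether the sample proportion falls inside it. The function name do_not_reject_zone is an assumption for this illustration, and the limits are computed without intermediate rounding, so they can differ slightly from hand-rounded ones.

```python
import math
from statistics import NormalDist


def do_not_reject_zone(p0, n, alpha):
    """Limits p0 ± z * sqrt(p0(1 - p0)/n) for a 2-tail test on a proportion."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z_crit * math.sqrt(p0 * (1 - p0) / n)
    return p0 - margin, p0 + margin


# Values from the drug company example
lower, upper = do_not_reject_zone(p0=0.20, n=150, alpha=0.01)
p_bar = 42 / 150

cannot_reject = lower <= p_bar <= upper  # inside the interval: do not reject Ho
print(lower, upper, cannot_reject)
```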




(all content of the MathRoom Lessons © Tammy the Tutor; 2004 - ).