Hypothesis Testing

Hypothesis Testing Summary

1. Tests on Means:

Formulating Hypotheses: Ho is what we want to heave, Ha what we want to prove.

Statistical Errors: 2 kinds of errors can be committed with Hypothesis tests:

Type I Error:

Heaving Ho when it is true. (finding guilty when he's innocent)
P(Type I Error) = α, the sum of the tail probabilities.

Type II error:

Not heaving Ho when it is false. (accepting innocent when he's guilty).
P(Type II Error) = β; to find it, use a "what if" scenario.

Significance Level α: the sum of the tail probabilities -- defines the reject Ho zone.

1-tail or 2-tail:

1-tail if Ha statement has < or >, 2-tail if Ha statement has ≠.

Critical Values:

find $z_{\alpha/2}$ (or $t_{\alpha/2}$) for 2-tail tests, and $z_{\alpha}$ (or $t_{\alpha}$) for 1-tail tests.

p-value (observed significance level)

Types of Tests:

large sample (n > 30) test on µ, use z for critical values or compare p-value with alpha.

small sample test on µ, use t-values for both 1 and 2 tail tests.

σ unknown: use raw data and formula to find "s", the sample standard deviation.

2. Tests on Proportions:

Tests on a single population proportion.

Using confidence limits instead of z-values to test a proportion.

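Formulae for Hypothesis Tests

	parameters	formulae
tests on means	µ	$z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ or $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$
tests on proportions	p	$z = \dfrac{\bar{p} - p}{\sqrt{p(1-p)/n}}$
c. i. limits for µ	µ	$\mu \pm z\,\sigma/\sqrt{n}$
c. i. limits for p	p	$p \pm z\sqrt{p(1-p)/n}$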

Hypothesis Testing

Because the only constant in life is change -- and, to survive, businesses have to be aware of the changing trends in the economy, labour force, supply of raw materials and the demand for their products, etc. -- business analysts constantly test the validity and reliability of the data upon which their budgets and business decisions are based. Such tests are called Hypothesis Tests.

Law and Order: Hypothesis Testing Unit

Every hypothesis test begins with two statements labeled Ho, the null hypothesis; and Ha, the alternative hypothesis. Ho is like the origin or starting point – assumptions or values currently held, but seriously in question or in need of review. We run the hypothesis test to find out if things have changed to the point where we should reject or heave Ho. We need to know if the numbers currently in use support our old assumptions or indicate a need for change.

Let's compare a hypothesis test to a criminal trial because we run hypothesis tests when we SUSPECT that our data is out of date, just as we run criminal trials when we SUSPECT that a crime has been committed. In the trial, Ho states that the defendant is innocent (until proven guilty). Ha, the alternative hypothesis, states that the defendant is guilty. The criterion for the test is: if the weight of evidence is beyond a reasonable doubt – we will heave Ho. Until then, we can't even consider accepting Ha – that the defendant is guilty.

Alpha (α), called the level of significance in a hypothesis test, corresponds to the weight of evidence beyond a reasonable doubt in the criminal trial. It forms the criterion upon which we decide whether to heave Ho or not. It marks the confidence limits and/or critical values for the test. If the sample statistic lies beyond the critical values, we must heave Ho in the hypothesis test. In the criminal trial, we heave Ho when the weight of evidence lies beyond a reasonable doubt.

As with hypothesis tests, two types of errors can be committed in the criminal trial. A Type I error – rejecting Ho when it is true – happens when the jury finds the defendant guilty when s/he is innocent. The probability of committing a Type I error in the hypothesis test is α, the probability of getting a test value in the reject zone even though Ho is true (easy to remember, since α is the first Greek letter and this is Type I). Steven Truscott, who was convicted in 1959 as a teenager for a crime he didn't commit and was not acquitted until 2007, is a classic case of a Type I error.

A Type II error – not rejecting Ho when it is false – happens when the jury finds the defendant NOT GUILTY when s/he actually is guilty. In the hypothesis test, we commit a Type II error when our calculated sample statistics give us a test value of z between the critical values, so we cannot reject Ho even though, in reality, it is false. The probability of committing a Type II error in the hypothesis test is called Beta, β. It is the probability of getting a test value in the do not reject zone, even though, unknown to us, our null hypothesis is false. The O.J. Simpson case is a classic Type II Error in law.

There are two differences between the hypothesis test and the criminal trial. First, critical values for hypothesis tests are published in math books; for criminal trials, they're found in law books. Second, it's easier to crunch and trust numbers than it is to crunch and trust humans and the evidence they present.

_____________________________

Hypothesis Tests Step by Step:

Step 1: Specify the population value of interest.
This is µ the mean, or p, the population proportion stated in the question.

Step 2: Formulate Ho the null hypothesis.
Ho should always have an equal sign & should be what we want to reject. (innocent)
Formulate Ha, the alternative hypothesis.
Ha should be what we're trying to prove if we have to reject Ho. (guilty)

Step 3: Specify the level of significance α.
Given in the question. The probability of the tail(s).

Step 4: Specify the test criteria for rejecting Ho. Draw a Picture!!
use critical value(s) of z or t to define one-tail or 2-tail reject Ho zone(s).

Step 5: Calculate the sample statistic from the data. (add it to the picture)
use formula for z or t , calculate test values to compare with critical value(s).

Or calculate the p-value from the sample statistic. (add it to the picture)
compare p-value to α. If p < α, reject Ho.

Step 6: Make a decision. (based on the picture)
Reject or do not reject Ho.

Step 7: Draw a conclusion.
state what the test results indicate about the hypotheses.

When to use z or t? Use the t-distribution on small samples (n ≤ 30).
Otherwise (n > 30), use z-values.
If σ is unknown, calculate and use s from data.

One-Tail or Two?

1-tail test (left or right):
if Ha states µ or p < a, it is a lower-tail test (<)
if Ha states µ or p > a, it is an upper-tail test (>)

2-tail test:
if Ha states µ or p ≠ a, it is a 2-tail test, because µ or p could be > a or < a.
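As a rough illustration of how the tail choice sets the reject Ho zone, here is a minimal Python sketch for z tests only. The function names and the tail labels "lower", "upper" and "two" are assumptions made up for this example; the critical values come from the standard normal distribution in Python's standard library rather than from a printed table.

```python
from statistics import NormalDist

def critical_z(alpha, tail):
    """Return the critical z value(s) for significance level alpha.

    tail is "lower" (Ha uses <), "upper" (Ha uses >) or "two" (Ha uses ≠).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "lower":
        return (std_normal.inv_cdf(alpha),)
    if tail == "upper":
        return (std_normal.inv_cdf(1 - alpha),)
    if tail == "two":
        # alpha is split equally between the two tails
        z = std_normal.inv_cdf(1 - alpha / 2)
        return (-z, z)
    raise ValueError("tail must be 'lower', 'upper' or 'two'")

def in_reject_zone(z, alpha, tail):
    """True if the test value z falls in the reject Ho zone."""
    crit = critical_z(alpha, tail)
    if tail == "lower":
        return z < crit[0]
    if tail == "upper":
        return z > crit[0]
    return z < crit[0] or z > crit[1]

print(critical_z(0.05, "two"))
print(in_reject_zone(2.0, 0.05, "upper"))
```

The same test value can land in the reject zone for a 1-tail test and outside it for a 2-tail test, because the 2-tail test puts only half of α in each tail.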

______________________________________

Example: The analyst at a company suspects that his telemarketers are making more than last month's average of 15 calls/hour, so he runs a small sample t-test at the 5% level of significance, on 16 of the telemarketers. Here are the results of the test calculations and his conclusion.

Solution: (the value for the test statistic was based on raw data not shown).

Hypothesis Test on Mean (small sample)
Value of interest: µ, mean # of calls/hour
Hypotheses: Ho: µ ≤ 15	Ha: µ > 15
Level of significance: α = 5%	Criteria: if t > 1.753, reject Ho
Test Statistic: t = 2	Decision: Since 2 > 1.753, we must reject the null hypothesis Ho
Conclusion: the test indicates that the telemarketers are making more than 15 calls/hour.

[Image: t-distribution curve with the reject Ho zone to the right of t = 1.753; the test statistic t = 2 falls inside it.]

Note how he made Ha state what he was out to prove, i.e., that the average is more than 15 calls/hour.
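The raw data behind the analyst's test statistic are not shown, so the following Python sketch uses a made-up sample of 16 call counts purely to illustrate how a small sample t value is computed from raw data. The data, the function name and the variable names are all assumptions for this example; the critical value 1.753 is the one quoted in the example above.

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(sample, mu0):
    """t = (x-bar - mu0) / (s / sqrt(n)), with s computed from the raw data."""
    n = len(sample)
    s = stdev(sample)  # sample standard deviation (divides by n - 1)
    return (mean(sample) - mu0) / (s / sqrt(n))

# Hypothetical calls/hour for 16 telemarketers (illustrative only,
# NOT the data used in the example above).
calls = [14, 15, 16, 17, 18, 15, 16, 17, 13, 19, 16, 15, 17, 16, 18, 14]

T_CRITICAL = 1.753  # from a t-table: 15 degrees of freedom, 5% upper tail
t = t_statistic(calls, mu0=15)
reject_ho = t > T_CRITICAL

print(f"t = {t:.3f}, reject Ho: {reject_ho}")
```

The critical value is typed in from a table because the Python standard library has no t-distribution; a statistics package would be needed to compute it directly.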

__________________________________

Formulating Ho and Ha

Example: An appliance manufacturer is considering the purchase of a new machine for stamping out sheet metal parts. If µ₀ is the average number of good parts stamped out per hour by his old machine and µ is the corresponding average for the new machine, the manufacturer wants to test the null hypothesis µ = µ₀ against a suitable alternative. What should the alternative hypothesis be if:

(a) he doesn't want to buy the new machine unless it's faster than the old one ;

(b) he wants to buy the new machine unless it proves slower than the old one.

Solution:

(a) the alternative hypothesis should be µ > µ₀ so he will buy the new machine only if the null hypothesis must be rejected to indicate that the new machine is faster.

(b) the alternative hypothesis should be µ < µ₀ so that he will buy the new machine unless he must reject the null hypothesis, which would indicate that the new machine is slower than the old one.

Example: The average drying time of a manufacturer's paint is 20 minutes. He is considering modifying the chemical composition of it in order to change the average drying time. He wants to test the null hypothesis µ = 20 minutes against a suitable alternative, where µ is the average drying time of the modified paint. What alternative hypothesis should he use if:

(a) he does not want to modify the paint unless it decreases the drying time;

(b) the modified paint is cheaper, so he will do it unless it increases the drying time.

Solution:

(a) Ha should be µ < 20 so that he will modify the paint only if Ho must be rejected to indicate that the modification decreases the drying time.

(b) the alternative hypothesis should be µ > 20 so he will modify the paint unless Ho must be rejected to indicate that the modification increases the drying time.

________________________________________

Statistical Errors: Type I and Type II errors

Type I error: Rejecting Ho when it is true (finding guilty when innocent).

P(Type I Error) = α

Type II error: Not rejecting Ho when it is false (finding not guilty when guilty).

P(Type II Error) = β. Beta is not a constant: it depends on the true value of the population parameter and the level of significance.

	*Truth of H₀*
*Decision*	H₀ is true	H₀ is false
Do not reject H₀	Correct decision	Type II error
Reject H₀	Type I error	Correct decision
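To see the "Reject H₀ when H₀ is true" cell of the table in action, here is a small simulation sketch in Python. It repeatedly draws samples from a population where Ho really is true and counts how often a 2-tail z test rejects it anyway; that fraction should come out close to α. The population values (mean 100, standard deviation 15), the sample size and the function name are assumptions chosen only for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist

def type_one_error_rate(mu0, sigma, n, alpha, trials, seed=0):
    """Fraction of trials in which a true Ho (mu = mu0) is rejected."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    std_error = sigma / sqrt(n)
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population where Ho is actually true.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (sum(sample) / n - mu0) / std_error
        if abs(z) > z_crit:
            rejections += 1  # a Type I error
    return rejections / trials

rate = type_one_error_rate(mu0=100, sigma=15, n=36, alpha=0.05, trials=5000)
print(f"observed Type I error rate: {rate:.3f}")
```

Simulating β the same way would require choosing a specific false value for the mean, which is why β is not a single fixed number.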

p-value or Observed Significance Level

Informally, the p-value is the probability in the tails defined by the calculated test statistic (not the critical values). In a 2-tail test, the p-value is the sum of the probabilities of the 2 tails so it is 2 times the probability of one of the tails.

Decisions Based on the p-value :

If p-value < α, we reject Ho. We find the test statistic and use its p-value to make our decision. This approach is used often today because software provides the p-value and it indicates the degree to which the data disagree with the null hypothesis Ho.
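As a sketch of what software does here, the following Python function turns a z test value into a p-value using the standard normal distribution from the standard library. The function name and the tail labels are assumptions for this example, and the z value 2.25 is just an illustrative input.

```python
from statistics import NormalDist

def p_value(z, tail):
    """p-value for a z test value; tail is "lower", "upper" or "two"."""
    std_normal = NormalDist()
    if tail == "lower":
        return std_normal.cdf(z)        # area to the left of z
    if tail == "upper":
        return 1 - std_normal.cdf(z)    # area to the right of z
    if tail == "two":
        # twice the area beyond |z|, one piece in each tail
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("tail must be 'lower', 'upper' or 'two'")

alpha = 0.05
p = p_value(2.25, "two")
print(f"p-value = {p:.4f}, reject Ho: {p < alpha}")
```

Because the 2-tail p-value is double the 1-tail value, a result can be significant in a 1-tail test and not in a 2-tail test at the same α.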

Example: An airport's planning commission wants to check the claim that the average time cars spend in the short term parking is 42.5 minutes. They're not sure if this number is too high or too low. They know the standard deviation of this population is 7.6 minutes. They decide that if a sample of 50 parking stubs gives a mean parking time between 40.5 and 44.5 minutes, they will accept that the average time is 42.5 minutes, otherwise, they will reject this claim.

(a) What is the probability they will be committing a Type I error with these values?

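Solution: We need the tail probability set up by the interval between 40.5 and 44.5 minutes, so we standardize these values.

We find z for $\bar{x}$ = 40.5 and $\bar{x}$ = 44.5 with µ = 42.5 minutes, using the standard error $\sigma/\sqrt{n} = 7.6/\sqrt{50} \approx 1.07$:

$z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{40.5 - 42.5}{1.07} \approx -1.87$ and $z = \dfrac{44.5 - 42.5}{1.07} \approx 1.87$

From the table, P(z > 1.87) = 0.5000 – 0.4693 = 0.0307, and since there are 2 tails, the reject zone probability = 2(0.0307) = 0.0614.

The probability they will commit a Type I error is α = 0.0614 or 6.14%.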

(b) If they had run the same test at the 5% level of significance using these values, what would they have decided? (heave Ho or not?)

Solution: Using p-value, the rule is: if p-value < α, we reject Ho. Since p-value = 6.14% and α = 5%, we would not reject Ho.
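The same Type I error probability can be checked in a few lines of Python. This is only a sketch with variable names chosen for the example; it models the sample mean as normal with mean 42.5 and standard error 7.6/√50. Because nothing is rounded along the way, it gives a value of about 0.063, slightly higher than the table-based 0.0614, and the decision at the 5% level is unchanged.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n = 42.5, 7.6, 50     # claimed mean, population sd, sample size
low, high = 40.5, 44.5            # limits of the "do not reject" zone

# Sampling distribution of the sample mean, assuming Ho is true.
x_bar_dist = NormalDist(mu0, sigma / sqrt(n))

# Type I error: the sample mean lands outside the limits although Ho is true.
alpha = x_bar_dist.cdf(low) + (1 - x_bar_dist.cdf(high))
reject_at_5_percent = alpha < 0.05

print(f"P(Type I error) = {alpha:.4f}")
print(f"reject Ho at the 5% level: {reject_at_5_percent}")
```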

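(c) What is the probability they will commit a Type II error if the mean parking time is actually 45.5 minutes (the "what if" value)?

Solution: With µ = 45.5, the upper limit 44.5 has z = (44.5 – 45.5)/1.07 ≈ – 0.93, and the lower limit 40.5 lies more than 4 standard errors below 45.5. Note: P( z < – 4) = 0 so we find P (z < – 0.93) = 0.5000 – 0.3238 = 0.1762. So β = 0.1762.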

(d) Describe the consequences of committing a Type I and a Type II error.

Solution: If we commit a Type I error, we would reject Ho, the claim that µ = 42.5 minutes, when in reality it is true. The mean time is actually equal to 42.5 minutes.

If we commit a Type II error, we would not reject Ho when in reality it is false. The mean time is actually not equal to 42.5 minutes.

This illustrates why β is not a constant like α is. For each choice of lower and upper values for the "do not reject zone", there will be a different value for β.
If we need to find a value for β, we use an approach like the one the planning commission used.

The power of a test is the probability of rejecting a false null hypothesis Ho,
at a given level of significance.

When we assume in Ha, the alternative hypothesis, that a specific "what if" value for the test parameter is true, the power of the test = 1 – β for that particular "what if" value.

In the preceding example, the "what if" value was 45.5 minutes which gave us β = 0.1762. The power of this test then is 1 – 0.1762 = 0.8238.

We calculate the power of a significance test in three steps:

1. Specify Ha with the alternative or "what if" value we're testing.
Specify the significance level α.

2. Find the value(s) of $\bar{x}$ that define the reject Ho zone.

3. Calculate the probability of observing these values for $\bar{x}$, assuming the population parameter is equal to the "what if" value we chose.
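The three steps can be sketched in Python as follows. The function names are assumptions for this example and the sketch covers only a 2-tail z test on a mean. It is applied to the parking example, where the limits 40.5 and 44.5 were chosen directly and the "what if" mean is 45.5 minutes.

```python
from math import sqrt
from statistics import NormalDist

def do_not_reject_limits(mu0, sigma, n, alpha):
    """Step 2: limits of x-bar for a 2-tail z test of Ho: mu = mu0."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / sqrt(n)
    return mu0 - margin, mu0 + margin

def beta_and_power(low, high, mu_what_if, sigma, n):
    """Step 3: P(low < x-bar < high) when the true mean is mu_what_if."""
    x_bar_dist = NormalDist(mu_what_if, sigma / sqrt(n))
    beta = x_bar_dist.cdf(high) - x_bar_dist.cdf(low)
    return beta, 1 - beta

# Parking example: the commission fixed the limits itself, so step 2 is skipped.
beta, power = beta_and_power(40.5, 44.5, mu_what_if=45.5, sigma=7.6, n=50)
print(f"beta = {beta:.4f}, power = {power:.4f}")
```

When the limits are not given, `do_not_reject_limits` supplies them from α, and the result is passed straight to `beta_and_power`.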

Example: An oceanographer wants to test whether the average depth of the ocean in a certain area is 72.4 fathoms. She takes a random sample of 35 depth readings and she knows the standard deviation in the depth is 2.1 fathoms.

(a) If her sample readings give an average of 73.2 fathoms, use the p-value to decide whether she should reject the null hypothesis or not at the 5% level of significance.

Solution: (a)

The value of interest is the average ocean depth in a given area.

Ho: µ = 72.4	Ha: µ ≠ 72.4
Level of significance: α = 5%	Criteria: if p-value < 5%, reject Ho
Statistic: $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{73.2 - 72.4}{2.1/\sqrt{35}} \approx 2.25$
p-value: P(z > 2.25) = 0.5000 – 0.4878 = 0.0122, so 2(0.0122) = 0.0244 or 2.44%	Decision: since 2.44% < 5%, reject Ho
Conclusion: the data don't support the claim that the average depth in the area is 72.4 fathoms. They indicate that the average depth in the area is greater than 72.4 fathoms.

(b) Use the same sample statistic and level of significance to run the test with the alternative hypothesis Ha changed to µ = 74 fathoms. Use z-values instead of p-value to find Beta, the probability of committing a Type II error and the power of this test value.

Solution: The value of interest is the average ocean depth in a given area. Using the data listed previously and z = 1.96 for the 5% level of significance establishes the "don't reject" zone between 71.7 fathoms and 73.1 fathoms, from 72.4 ± 1.96(2.1/√35). If µ is actually 74 fathoms, the z-value for 73.1 is – 2.54, and the lower limit 71.7 lies so far below 74 that its tail probability is essentially 0. This means that the probability of landing in the "don't reject" zone is P(z < – 2.54) = 0.5000 – 0.4945 = 0.0055.

So, β = 0.0055 and 1 – β, the power of the test, is 0.9945 or nearly 1.

__________________________________________________

Hypothesis Tests on a Population Proportion

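Requirement: If the sample size n is large enough (a common rule of thumb is that np and n(1 – p) are both at least 5), the sampling distribution of the sample proportion is approximately normal, with

population proportion = p,

sample proportion (p-bar) $\bar{p} = x/n$, where x is the number of sample items with the characteristic, and

standard error (sigma p-bar) $\sigma_{\bar{p}} = \sqrt{p(1-p)/n}$

This means we can use z-values to define the "reject-Ho-zone(s)".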

The only differences between the set-up for a test on a mean and a test on a proportion are:

1. the population parameter of interest is the population proportion p, not the population mean µ.

2. the null and alternative hypotheses are statements about p rather than µ.

Example: A drug company wants to test whether it is really true that 20% of the patients who take one of their products suffer side effects from the drug. In a random sample of 150 patients, 42 suffer the side effects.

a) Conduct a 2-tail hypothesis test with alpha = 0.01.

Solution: (a) The test parameter is the proportion of patients who suffer side effects.

Ho: p = 0.20	Ha: p ≠ 0.20 (2-tail test)
Level of significance: α = 1%	Criteria: if | z | > 2.575, reject Ho
Statistic: $\bar{p} = 42/150 = 0.28$, so $z = \dfrac{\bar{p} - p}{\sqrt{p(1-p)/n}} = \dfrac{0.28 - 0.20}{\sqrt{(0.20)(0.80)/150}} = \dfrac{0.08}{0.0327} \approx 2.45$
Conclusion: 2.45 < 2.575, so we cannot reject the null hypothesis that 20% of the patients suffer side effects.

(b) Repeat the test as a 1-tail test with Ha: p > 0.20.

Solution: The test parameter is the proportion of patients who suffer side effects.

Ho: p ≤ 0.20	Ha: p > 0.20 (1-tail test)
Level of significance: α = 1%	Criteria: if z > 2.33, reject Ho
Statistic: z = 2.45 as before, and 2.45 > 2.33, the critical value for a 1-tail test at 1%.
Conclusion: we must reject the null hypothesis. The test indicates that more than 20% of the patients suffer side effects.

(c) Find the p-value for the 1-tail test.

Solution: We need P( z > 2.45) = 0.5000 – 0.4929 = 0.0071.
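For readers who want to check the drug example with software, here is a minimal Python sketch of a z test on a single proportion. The function name is an assumption for this example. Run on the example's data (42 of 150 patients, hypothesized p = 0.20) with no intermediate rounding, it reports a z value of about 2.45, which lies below the 2-tail critical value 2.575 and above the 1-tail critical value 2.33.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_test(successes, n, p0):
    """Return (p_bar, standard error, z) for a test of Ho: p = p0."""
    p_bar = successes / n
    # The standard error uses the hypothesized p0, not the sample proportion.
    std_error = sqrt(p0 * (1 - p0) / n)
    z = (p_bar - p0) / std_error
    return p_bar, std_error, z

p_bar, std_error, z = proportion_z_test(42, 150, 0.20)

upper_tail_p = 1 - NormalDist().cdf(z)   # p-value for Ha: p > 0.20
two_tail_p = 2 * upper_tail_p            # p-value for Ha: p != 0.20

print(f"p-bar = {p_bar:.2f}, standard error = {std_error:.4f}, z = {z:.2f}")
print(f"1-tail p-value = {upper_tail_p:.4f}, 2-tail p-value = {two_tail_p:.4f}")
```

Comparing each p-value with α = 0.01 gives the same decisions as comparing z with the critical values.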

Level of Significance versus Level of Confidence

The level of significance α in a 2-tail hypothesis test defines the reject Ho zones. It is the sum of the probabilities included in the tails of the distribution. Therefore, the probability of the do not reject Ho zone is 1 – α, the level of confidence for the interval around the population mean or proportion. This means that the critical values of the sample statistic can be calculated from the confidence interval limits. Instead of finding critical values using z, we can use the interval limits $\mu \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ or $p \pm z_{\alpha/2}\sqrt{p(1-p)/n}$.

________________________________________________

Example: In part (a) of the previous example we did a 2-tail test at the 1% level of significance on the null and alternate hypotheses that p = 0.20 or p ≠ 0.20. Let's repeat the test using the confidence interval limits to make our decision about rejecting Ho.

Solution: to find the confidence interval limits for a proportion, we use:

$p \pm z_{\alpha/2}\sqrt{\dfrac{p(1-p)}{n}}$, with $\sqrt{(0.20)(0.80)/150} \approx 0.0327$

The confidence interval limits are 0.20 ± 2.575(0.0327) = 0.20 ± 0.0842.

If the sample proportion falls between 0.1158 and 0.2842, we cannot reject Ho.

Since the sample proportion, 0.28, is in this interval, we cannot reject Ho.
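The confidence-limit version of the decision can be sketched in Python like this. The function name is an assumption for this example. It builds the "do not reject" interval around the hypothesized proportion 0.20 and checks whether the sample proportion 42/150 falls inside it; with unrounded values the limits come out at roughly 0.116 and 0.284.

```python
from math import sqrt
from statistics import NormalDist

def do_not_reject_interval(p0, n, alpha):
    """Limits for the sample proportion in a 2-tail test of Ho: p = p0."""
    z = NormalDist().inv_cdf(1 - alpha / 2)      # about 2.576 when alpha = 0.01
    margin = z * sqrt(p0 * (1 - p0) / n)
    return p0 - margin, p0 + margin

low, high = do_not_reject_interval(p0=0.20, n=150, alpha=0.01)
p_bar = 42 / 150
reject_ho = not (low <= p_bar <= high)

print(f"do not reject zone: {low:.4f} to {high:.4f}")
print(f"sample proportion = {p_bar:.2f}, reject Ho: {reject_ho}")
```

This reaches the same decision as the z-value approach, since both compare the same distance from 0.20 against the same cut-off, just in different units.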

_______________________________
