Level 5 Statistics

**Statistics**

We use *English* letters for *sample* statistics and *Greek* letters for *population* stats.

So, when referring to a sample, use $\bar{x}$ and $s$; when referring to a population, use $\mu$ and $\sigma$.

The average or *mean* of a population is $\mu$ (*mu*, a Greek *m*), and

the *standard deviation* is $\sigma$ (small *sigma*, a Greek *s*).

The average of a sample is called *x-bar*, $\bar{x}$, because of the bar above it.

If asked for the *variance* of a population or a sample, it is $\sigma^2$ or $s^2$, respectively.

**Summation Notation**

An uppercase Greek S, called *Sigma* ($\Sigma$), indicates *summation*.

The *indices*, the small numbers written below and above the $\Sigma$, indicate the range of the summation.

**Example 1:**

The indices tell us where to start and end;

the term after the $\Sigma$ (for example $i$ or $i^2$) tells us what to do to each number we're summing.
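As a hypothetical illustration (not the original Example 1), here is a minimal Python sketch of what $\sum_{i=1}^{5} i^2$ means: start at the lower index 1, stop at the upper index 5, and square each number before adding. The helper name `sigma_sum` is made up for this sketch.

```python
def sigma_sum(term, start, end):
    """Add up term(i) for every whole number i from start to end, inclusive."""
    return sum(term(i) for i in range(start, end + 1))


# Sum of i for i = 1 to 5: 1 + 2 + 3 + 4 + 5
print(sigma_sum(lambda i: i, 1, 5))       # 15
# Sum of i^2 for i = 1 to 5: 1 + 4 + 9 + 16 + 25
print(sigma_sum(lambda i: i ** 2, 1, 5))  # 55
```

Changing the `term` function changes what happens to each number; changing `start` and `end` changes the range, just like the indices.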

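**mean deviation**: $\text{MD} = \dfrac{1}{n}\sum_{i=1}^{n} \lvert x_i - \bar{x} \rvert$, where $x_i$ are the data values, $\bar{x}$ is the mean, and $n$ is the number of data values.

As the words indicate, this is the mean of the deviations (distances) between

the data values and the mean of the data distribution. Notice the absolute value sign.

Since we're summing the deviations $x_i - \bar{x}$,

if we took their true values (including the sign), we'd always get 0: the deviations above the mean

exactly cancel those below it. This holds for any data set, not only symmetric ones, and it's why we take the absolute value.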
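A minimal Python sketch of the mean deviation, using a small hypothetical data set that is deliberately *not* symmetric. It shows that the signed deviations still add to 0, while the absolute deviations give a useful measure of spread. The function name `mean_deviation` is only illustrative.

```python
def mean_deviation(data):
    """Average distance |x_i - mean| between each data value and the mean."""
    n = len(data)
    mean = sum(data) / n
    return sum(abs(x - mean) for x in data) / n


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical, deliberately not symmetric
mean = sum(data) / len(data)     # 40 / 8 = 5.0

# Signed deviations cancel out: -3 - 1 - 1 - 1 + 0 + 0 + 2 + 4 = 0
print(sum(x - mean for x in data))  # 0.0
# Absolute deviations don't: (3 + 1 + 1 + 1 + 0 + 0 + 2 + 4) / 8 = 1.5
print(mean_deviation(data))         # 1.5
```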

Here's a normal distribution curve.

As we can see, the **standard deviation** measures the spread of the data about the mean.

In a normal distribution, almost all the data (about 99.7%) lies on the interval $(\mu - 3\sigma,\ \mu + 3\sigma)$.

We can use these endpoints as estimates of the min and max of a distribution.

In a normal distribution, half the data lies on each side of the mean.
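A short Python sketch of the min/max estimate $\mu \pm 3\sigma$, assuming the data are roughly normal. The mean 72 and standard deviation 9 are borrowed from Example 2 purely for illustration; `estimated_min_max` is a hypothetical helper name.

```python
def estimated_min_max(mu, sigma):
    """Rough (min, max) of roughly normal data: mu - 3*sigma and mu + 3*sigma."""
    return (mu - 3 * sigma, mu + 3 * sigma)


# Hypothetical class with mean 72 and standard deviation 9
print(estimated_min_max(72, 9))  # (45, 99)
```

This is only an estimate: real data can fall slightly outside the interval, and skewed data can be far from it.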

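**standard deviation**: $s = \sqrt{\dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}}$ for a sample; $\sigma = \sqrt{\dfrac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}}$ for a population.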

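**Note:** Never mind the formulas; use your TI-83 calculator. Enter the data in a list,

then press **Stat > Calc > Enter** (this selects 1-Var Stats) and type the name of the list where the data is stored (for example **L1**).

The output lists everything you need: the mean $\bar{x}$, the sample standard deviation Sx, the population standard deviation σx, the number of values n, and more.

Remember: Greek letters for a population, so for a population use σx.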
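If you don't have a TI-83 handy, Python's standard `statistics` module can reproduce the main 1-Var Stats numbers. This sketch uses a hypothetical data set; `statistics.stdev` divides by n − 1 (the calculator's Sx) and `statistics.pstdev` divides by n (the calculator's σx).

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set

n = len(data)                      # 8 values
mean = statistics.mean(data)       # x-bar (or mu for a population): 5
sx = statistics.stdev(data)        # sample s, divides by n - 1: about 2.14
sigma_x = statistics.pstdev(data)  # population sigma, divides by n: 2.0

print(n, mean, round(sx, 2), sigma_x)
```

Here the squared deviations add to 32, so $s = \sqrt{32/7}$ and $\sigma = \sqrt{32/8} = 2$; the sample value is always a bit larger because it divides by a smaller number.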

**Z-Score or Standard score**

The z-score measures how many standard deviations lie between the data value *x* and the mean $\mu$.

Look at the formula: $z = \dfrac{x - \bar{x}}{s}$ for a sample; $z = \dfrac{x - \mu}{\sigma}$ for a population.

The top is the distance between the *x*-value and the *mean*.

When we divide by $\sigma$, we find how many standard deviations fit into this distance.

If **z** is negative, the data value is below the mean.

If **z** is positive, the data value is above the mean.

Using these formulas, we can also solve for *x* and/or $\mu$.
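A minimal Python sketch of the z-score formula and its two rearrangements (solving for *x* and for the mean). The function names and the numbers (mean 50, standard deviation 10) are hypothetical.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def x_from_z(z, mean, sd):
    """z = (x - mean) / sd, solved for x."""
    return z * sd + mean


def mean_from_z(x, z, sd):
    """z = (x - mean) / sd, solved for the mean."""
    return x - z * sd


# Hypothetical values: mean 50, standard deviation 10
print(z_score(65, 50, 10))       # 1.5  (65 is 1.5 SDs above the mean)
print(x_from_z(-2, 50, 10))      # 30   (2 SDs below the mean)
print(mean_from_z(65, 1.5, 10))  # 50.0
```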

**Example 2:** Which is the better mark -- 89% in a class with mean = 72, st. dev. = 9 or

89% in a class with mean = 72, st. dev. = 8.5?

**Solution:** We find the **z** value for both. The first is $(89 - 72)/9 \approx 1.89$.

The second is $(89 - 72)/8.5 = 2$.

This means that the first mark is 1.89 standard deviations above the mean,

but the second mark is 2 standard deviations above the mean, so it is the better mark.
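The arithmetic for Example 2 can be checked with a few lines of Python; this is just the z-score formula applied to each class.

```python
mark, class_mean = 89, 72

z_first = (mark - class_mean) / 9     # 17 / 9
z_second = (mark - class_mean) / 8.5  # 17 / 8.5

better = "second" if z_second > z_first else "first"
print(round(z_first, 2), z_second, better)  # 1.89 2.0 second
```

The same mark counts for more in the class with the smaller spread, because it is further above the mean in standard-deviation units.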


**Example 3:** A biologist collects 20 samples of monster moths to record data

about their enormous wing span. She measures in meters and her data

indicate a mean wing span of 2.23 meters with a standard deviation of 0.42 meters.

a) What is the wing span of a monster moth with **z-score** = -0.23?

b) What is the z-score of a monster moth with a 3.07 meter wing span?

c) What is the wing span of a monster moth if it lies 2.79 standard deviations ($\sigma$'s) below the mean?

**Solution:**

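a) We know **z**, we want *x*:

Since $x = z\sigma + \mu$, $x = (-0.23)(0.42) + 2.23 \approx 2.13$ meters.

b) We know *x*, we want **z**: $z = \dfrac{x - \mu}{\sigma} = \dfrac{3.07 - 2.23}{0.42} = \dfrac{0.84}{0.42} = 2$.

c) 2.79 $\sigma$'s below the mean means the z-score is $-2.79$.

Since $x = z\sigma + \mu$, $x = (-2.79)(0.42) + 2.23 \approx 1.06$ meters.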
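A quick Python check of all three parts of Example 3, using $x = z\sigma + \mu$ to go from z to a wing span and $z = (x - \mu)/\sigma$ to go the other way.

```python
mu, sigma = 2.23, 0.42  # mean and standard deviation of the wing spans (m)

span_a = -0.23 * sigma + mu  # a) x = z*sigma + mu
z_b = (3.07 - mu) / sigma    # b) z = (x - mu) / sigma
span_c = -2.79 * sigma + mu  # c) 2.79 sigmas below the mean means z = -2.79

print(round(span_a, 2), round(z_b, 2), round(span_c, 2))  # 2.13 2.0 1.06
```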


**Bivariate Stats** (2 variables)

**Scatter Plots, Correlation, and Line of Regression**

To get the "**a**" and "**b**" for the line of regression **y = ax + b**,

enter the 2 lists of data values, then use 2-Var Stats and **LinReg(ax+b)** from the **Stat > Calc** menu on your calculator. If asked to predict the *y*-value for a new *x*-value, plug this *x*-value into the equation of the line.
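The calculator's LinReg(ax+b) fits a least-squares line. As a sketch of what it computes, the standard least-squares formulas are $a = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$ and $b = \bar{y} - a\bar{x}$. The function name `linreg` and the data below are hypothetical.

```python
def linreg(xs, ys):
    """Least-squares slope a and intercept b for the line y = ax + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    a = sxy / sxx
    b = y_bar - a * x_bar
    return a, b


xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]
a, b = linreg(xs, ys)
print(round(a, 2), round(b, 2))  # 0.6 2.2
print(round(a * 6 + b, 2))       # predicted y at x = 6: 5.8
```

The last line is the "plug in a new x-value" step: the prediction comes straight from the equation of the line.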

To find the **correlation coefficient r**, use either the rectangle formula or your calculator. Strong correlation comes from values of **r** close to ±1 (close to 1 for an increasing trend, close to -1 for a decreasing one).
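The rectangle method gives an estimate from the graph; the calculator reports the exact (Pearson) correlation coefficient. A minimal Python sketch of that exact value, $r = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$, on hypothetical data; `correlation` is an illustrative name.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient r, always between -1 and 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]  # hypothetical data
print(round(correlation(xs, [2, 4, 5, 4, 5]), 3))   # 0.775: moderate positive
print(round(correlation(xs, [10, 8, 6, 4, 2]), 3))  # -1.0: points exactly on a falling line
```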


**Hint:** We used to call the line of regression "*the line of best fit*". The correlation coefficient

**Example 4:**

**Solution:** The answer is C. B and D have negative slopes, so *r* can't be 0.53.

A has a steeper slope than C, so *r* is closer to 1 than to ½.

Recall that a line with slope = 1 makes a 45° angle with the x-axis.

If you have trouble seeing it, use the edge of your ruler to define the

line that seems to run right up or down the middle of the scatter plot.

(I've shown it in B.)


**Practice**

1) The data in the tables display the results of the math 536 exam in

Susan's and Andy's classes in different schools. Both students got the same mark

on the exam; however, Andy was accepted at Dawson whereas Susan was not.

Andy said his Z-score was 0.62. Susan felt that with her mark she should've been

accepted, so she did some calculation to justify her claim.

What statistic should she use to convince the admin at Dawson to reconsider?

| Class | Marks |
|---|---|
| Susan's Class Marks (25) | 56, 59, 60, 61, 62, 63, 65, 65, 66, 67, 68, 70, 70, 71, 71, 72, 73, 74, 75, 76, 79, 80, 81, 83, 85 |
| Andy's Class Marks (21) | 63, 65, 67, 69, 71, 71, 72, 74, 75, 75, 77, 77, 79, 79, 80, 81, 82, 84, 84, 85, 87 |


2) Find the values of *a*, *b*, *c* and *d*.

| $x_i$ | $\mu$ | $\sigma$ | $z$ |
|---|---|---|---|
| 12 | *a* | 2.5 | 1.6 |
| 30 | 26 | *b* | 1.3 |
| *c* | 12 | 1.2 | -1.6 |
| 22 | 25 | 3.5 | *d* |


3) Julie, Mark and Karen are in a class of 33 students.

Their teacher gave them this data about their final marks in math 536:

| Student | Mark | Z-score |
|---|---|---|
| Julie | 60 | -1.7 |
| Mark | 97 | 2 |
| Karen | 80 | ? |

Find Karen's Z-score.


**Solutions**

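1) For Susan's class, the mean $\mu \approx 70\%$ and the standard deviation $\sigma \approx 7.69\%$.

For Andy's class, $\mu \approx 76\%$ and $\sigma \approx 6.62\%$.

Since Andy's Z-score was 0.62, his mark was $x = 0.62(6.62) + 76 \approx 80\%$, so Susan also got 80%.

Susan's Z-score $= (80 - 70)/7.69 \approx 1.3$, more than twice Andy's Z-score.

So the statistic she should use is her Z-score: relative to her own class, her mark is much stronger than Andy's, and that is how to get Dawson to reconsider and admit her.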
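A Python check of Solution 1, treating each class as a population (so `statistics.pstdev`, matching the σ values used above). This is a verification sketch, not part of the original solution.

```python
import statistics

susan = [56, 59, 60, 61, 62, 63, 65, 65, 66, 67, 68, 70, 70, 71, 71, 72, 73,
         74, 75, 76, 79, 80, 81, 83, 85]
andy = [63, 65, 67, 69, 71, 71, 72, 74, 75, 75, 77, 77, 79, 79, 80, 81, 82,
        84, 84, 85, 87]

# Each class is treated as a population: pstdev divides by n
mu_s, sigma_s = statistics.mean(susan), statistics.pstdev(susan)
mu_a, sigma_a = statistics.mean(andy), statistics.pstdev(andy)

andy_mark = 0.62 * sigma_a + mu_a  # x = z*sigma + mu
susan_z = (80 - mu_s) / sigma_s    # z = (x - mu) / sigma

print(round(mu_s, 2), round(sigma_s, 2))       # 70.08 7.69
print(round(mu_a, 2), round(sigma_a, 2))       # 76.05 6.62
print(round(andy_mark, 1), round(susan_z, 2))  # 80.1 1.29
```

With the unrounded mean of 70.08, Susan's Z-score comes out at about 1.29 rather than 1.3; either way it is more than twice Andy's 0.62.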


2) *a* = 8, *b* ≈ 3.08, *c* = 10.08, *d* ≈ -0.86
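Each unknown in Practice 2 comes from rearranging $z = (x - \mu)/\sigma$ for the missing quantity. A short Python check:

```python
a = 12 - 1.6 * 2.5     # mu = x - z*sigma
b = (30 - 26) / 1.3    # sigma = (x - mu) / z
c = 12 + (-1.6) * 1.2  # x = z*sigma + mu
d = (22 - 25) / 3.5    # z = (x - mu) / sigma

print(round(a, 2), round(b, 2), round(c, 2), round(d, 2))  # 8.0 3.08 10.08 -0.86
```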


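3) Karen's Z-score is 0.3.

Use Julie's and Mark's data to write 2 equations in $\mu$ and $\sigma$: $60 = \mu - 1.7\sigma$ and $97 = \mu + 2\sigma$.

Subtracting the first from the second gives $37 = 3.7\sigma$, so $\sigma = 10$ and $\mu = 77$. Then Karen's Z-score is $(80 - 77)/10 = 0.3$.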
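The two-equation solve in Practice 3 can be sketched in Python: subtracting the equations eliminates $\mu$, which gives $\sigma$, and back-substitution gives $\mu$.

```python
# Julie: 60 = mu - 1.7*sigma ; Mark: 97 = mu + 2*sigma
z_julie, x_julie = -1.7, 60
z_mark, x_mark = 2, 97

# Subtracting Julie's equation from Mark's eliminates mu
sigma = (x_mark - x_julie) / (z_mark - z_julie)  # 37 / 3.7
mu = x_mark - z_mark * sigma

karen_z = (80 - mu) / sigma
print(round(mu, 2), round(sigma, 2), round(karen_z, 2))  # 77.0 10.0 0.3
```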
