536 stats

Level 5 Statistics

Statistics

**** NOTE: not everyone has the same fonts, so if you see "r" and/or "l" in formulas,
change "r" to sigma and change "l" to mu (µ)

We use English letters for sample statistics and Greek letters for population stats.

So, when referring to a sample, use x̄ (x-bar) and s; when referring to a population, use µ and σ.

The average or mean of a population is µ (mu, a Greek m);
the standard deviation is σ (small sigma -- a Greek s).

The average of a sample is called x-bar -- because of the bar above it.

If asked for the variance of either a population or a sample, it is σ² or s², respectively.

Summation Notation

An uppercase Greek S -- called Sigma, written Σ -- indicates summation.
The indices -- small sub and superscript numbers -- indicate the range of the summation.

Example 1:

The indices tell us the range of values; the i or i² tells us what to do to each value in the sum.
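As an illustration of how the indices and the summand work together, here are two sums written out term by term. The limits 1 and 4 are chosen only for this sketch; they are not taken from the original example.

```latex
\sum_{i=1}^{4} i = 1 + 2 + 3 + 4 = 10
\qquad
\sum_{i=1}^{4} i^{2} = 1^{2} + 2^{2} + 3^{2} + 4^{2} = 30
```

In both sums i runs from 1 to 4; only the operation applied to each i changes.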

mean deviation = ( Σ |x_i – x̄| ) / n

x_i are the data values, x̄ is the mean, n = # of data items
As the words indicate, this is the average of the distances between the data values and the mean of the data distribution. Notice the absolute value sign. If we summed the signed differences between the data values and the mean, we'd always get 0, because the positive and negative differences cancel exactly (that is what makes it the mean). That's why we take the absolute value.
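A minimal Python sketch of the mean deviation, assuming a small made-up data set (the numbers and the function name `mean_deviation` are for illustration only). It also shows that the signed differences from the mean add up to 0.

```python
def mean_deviation(values):
    """Average absolute distance of the data values from their mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


# Hypothetical data set, chosen only for illustration.
data = [2, 4, 6, 8, 10]
mean = sum(data) / len(data)

# Without the absolute value, the differences cancel out.
signed_sum = sum(x - mean for x in data)

print(signed_sum)            # 0.0
print(mean_deviation(data))  # 2.4
```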

Here's a normal distribution curve.

As we can see, the standard deviation measures the spread of the distribution.
In a normal distribution, nearly all of the data (about 99.7%) lies on the interval (µ – 3σ, µ + 3σ).
We can use these as estimates of the min and max of a distribution.
Half the data lies on either side of the mean.

standard deviation: s = √( Σ (x_i – x̄)² / (n – 1) ) for a sample;

σ = √( Σ (x_i – µ)² / n ) for a population.

Note: Never mind the formulas -- use your TI-83 calculator. Enter the data in a list,
then Stat > Calc > Enter (this selects 1-Var Stats) > L_x (the list where the data is stored).
It lists everything you need.
Remember -- Greek letters for population -- so use σ_x (not S_x) if your data is a population.
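If no calculator is at hand, the same two standard deviations can be sketched with Python's standard `statistics` module. The data set below is made up for illustration; `stdev` divides by n – 1 (sample) and `pstdev` divides by n (population).

```python
import statistics

# Hypothetical data set, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)     # x-bar for a sample, mu for a population
s = statistics.stdev(data)       # sample standard deviation (divides by n - 1)
sigma = statistics.pstdev(data)  # population standard deviation (divides by n)

print(mean)
print(round(s, 2))      # 2.14
print(round(sigma, 2))  # 2.0
```

The sample value is always a little larger than the population value for the same data, because it divides by n – 1 instead of n.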

Z-Score or Standard score

measures how many standard deviations lie between the data value x, and the mean µ.
Look at the formula.
The numerator measures the distance between the data value x and the mean.
Then we divide by σ to find how many standard deviations fit into this interval.

z = (x – x̄) / s for a sample; z = (x – µ) / σ for a population

If z is negative, the data value is below and left of the mean.
If z is positive, the data value is above and right of the mean.

These formulas can also be used to solve for x: x = z(s) + x̄ for a sample; x = z(σ) + µ for a population.

Example 2: Which is the better mark -- 89% in a class with mean = 72, st. dev. = 9 or
89% in a class with mean = 72, st. dev. = 8.5?

Solution: We find the z value for both. The first is (89 – 72)/ 9 = 1.89
The second is (89 – 72)/ 8.5 = 2
This means that the first mark is 1.89 standard deviations above the mean,
but the second mark is 2 standard deviations above the mean so it is the better mark.
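The comparison in Example 2 can be checked with a short Python sketch. The helper name `z_score` is an assumption made for this illustration.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations between x and the mean."""
    return (x - mean) / std_dev


z_first = z_score(89, 72, 9)
z_second = z_score(89, 72, 8.5)

print(round(z_first, 2))   # 1.89
print(round(z_second, 2))  # 2.0
print(z_second > z_first)  # True: the second mark is the better one
```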

Example 3: A biologist collects 20 samples of monster moths to record data about their enormous wing span. She measures in meters and her data indicate a mean wing span of
2.23 meters with a standard deviation of 0.42 meters.

a) What is the wing span of a monster moth with z-score = – 0.23?
b) What is the z-score of a monster moth with a 3.07 meter wing span?
c) What is the wing span of a monster moth if it lies 2.79 σ's below the mean?

Solution:

a) We know z, we want x:
Since x = z(σ) + µ, x = – 0.23(0.42) + 2.23 = 2.13 meters

b) We know x, we want z:
z = (x – µ) / σ = (3.07 – 2.23) / 0.42 = 2

c) 2.79 σ's below the mean is z-score = – 2.79
Since x = z(σ) + µ, x = (– 2.79)(0.42) + 2.23 = 1.06 meters
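All three parts of Example 3 can be reproduced with a small Python sketch. The function names `z_score` and `value_from_z` are assumptions made for this illustration; the numbers come from the example.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations between x and the mean."""
    return (x - mean) / std_dev


def value_from_z(z, mean, std_dev):
    """Data value that lies z standard deviations from the mean."""
    return z * std_dev + mean


mean, std_dev = 2.23, 0.42  # meters, from Example 3

part_a = value_from_z(-0.23, mean, std_dev)
part_b = z_score(3.07, mean, std_dev)
part_c = value_from_z(-2.79, mean, std_dev)

print(round(part_a, 2))  # 2.13
print(round(part_b, 2))  # 2.0
print(round(part_c, 2))  # 1.06
```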

Bivariate Stats (2 variables)

Scatter Plots, Correlation, and Line of Regression

To get the "a" and "b" for the line of regression y = ax + b,
enter the 2 lists of data values, then use 2-var stats and linreg from the Stats menu on your calculator. If asked to predict the y-value for a given x-value, plug this x-value into the equation for the line.
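For readers without a calculator, here is a minimal Python sketch of a least-squares line y = ax + b, assuming that is the fit the calculator's linear regression performs. The data and the function name `line_of_regression` are made up for illustration.

```python
def line_of_regression(xs, ys):
    """Return (a, b) for the least-squares line y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx            # slope
    b = mean_y - a * mean_x  # y-intercept
    return a, b


# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = line_of_regression(xs, ys)
predicted = a * 6 + b  # plug a new x-value into the line

print(round(a, 2), round(b, 2))  # 0.6 2.2
print(round(predicted, 2))       # 5.8
```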

To find the correlation coefficient r, use either the formula for the rectangle or your calculator. Strong correlation comes from values of r close to ±1.

Hint: We used to call the line of regression "the line of best fit". The correlation coefficient r is related to the slope of the line (it always has the same sign), so estimate the slope of the line from the scatter plot. As a rough guide, small slope (0.15, – 0.02) means low correlation, slope close to ±1 means strong correlation. Pay attention to the sign of the slope (positive or negative). If the line leans to the left, the slope is < 0.
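The exact value of r can also be computed directly. The sketch below uses the standard Pearson formula on made-up data (the function name `correlation` and the numbers are assumptions for illustration). It shows that a rising cloud of points gives r > 0 and the same cloud flipped downward gives r < 0.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient r of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys_rising = [2, 4, 5, 4, 5]
ys_falling = [-y for y in ys_rising]  # same points, flipped downward

r_rising = correlation(xs, ys_rising)
r_falling = correlation(xs, ys_falling)

print(round(r_rising, 2))   # 0.77
print(round(r_falling, 2))  # -0.77
```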

Example 4:

Solution: The answer is C -- B and D have negative slopes so r can't be 0.53
A has a steeper slope than C so r is closer to 1 than to ½.
Recall that a line with slope = 1 makes a 45° angle with the x-axis.
If you have trouble seeing it, use the edge of your ruler to define the line that seems to run right up or down the middle of the scatter plot. (It's shown in B.)

Practice

1) The data in the tables display the results of the 536 math exam in Susan's and Andy's classes in different schools. Both students got the same mark on the exam; however, Andy was accepted at Dawson whereas Susan was not. Andy said his Z-score was 0.62. Susan felt that with her mark she should've been accepted, so she did some calculations to justify her claim.
a) What statistic should she use to convince the admin at Dawson to reconsider?

b) Standardize her mark and decide whether she is right.

Susan's Class Marks (25):
56, 59, 60, 61, 62, 63, 65, 65,
66, 67, 68, 70, 70, 71, 71, 72, 73,
74, 75, 76, 79, 80, 81, 83, 85

Andy's Class Marks (21):
63, 65, 67, 69, 71, 71, 72,
74, 75, 75, 77, 77, 79, 79,
80, 81, 82, 84, 84, 85, 87

2) Find the value of a, b, c and d.

x_i    µ     σ     z
12     a     2.5   1.6
30     26    b     1.3
c      12    1.2   -1.6
22     25    3.5   d

3) Julie, Mark and Karen are in a class of 33 students. Their teacher gave them this data about their final marks in math:

Student   Mark   Z-score
Julie     60     – 1.7
Mark      97     2
Karen     80     ?

Find Karen's Z-score.

Solutions

1) a) Susan should find her z-score -- she should "normalize" her grade.

b) For Susan's class, the mean µ = 70%, the standard deviation σ = 7.69%.
For Andy's class, µ = 76%, the standard deviation σ = 6.62%.
Since Andy's Z-score was 0.62, he got 76 + 0.62(6.62) ≈ 80%, and therefore so did Susan.
Susan's Z-score = (80 – 70) / 7.69 = 1.3 -- more than twice Andy's Z-score.
This then is how to get Dawson to reconsider and admit her.
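The figures in this solution can be checked with Python's standard `statistics` module, using the class marks from the problem. Each class is treated as a population (`pstdev`), which matches the values 7.69 and 6.62 quoted above. The variable names are assumptions for this sketch, and the unrounded means (70.08 and 76.05) differ slightly from the rounded 70 and 76 used in the solution.

```python
import statistics

susan = [56, 59, 60, 61, 62, 63, 65, 65,
         66, 67, 68, 70, 70, 71, 71, 72, 73,
         74, 75, 76, 79, 80, 81, 83, 85]
andy = [63, 65, 67, 69, 71, 71, 72,
        74, 75, 75, 77, 77, 79, 79,
        80, 81, 82, 84, 84, 85, 87]

susan_mean = statistics.mean(susan)
susan_sigma = statistics.pstdev(susan)  # population standard deviation
andy_mean = statistics.mean(andy)
andy_sigma = statistics.pstdev(andy)

# Recover the shared mark from Andy's z-score of 0.62.
mark = round(andy_mean + 0.62 * andy_sigma)

# Standardize the same mark within Susan's class.
susan_z = (mark - susan_mean) / susan_sigma

print(round(susan_mean, 2), round(susan_sigma, 2))  # 70.08 7.69
print(round(andy_mean, 2), round(andy_sigma, 2))    # 76.05 6.62
print(mark)                                         # 80
print(round(susan_z, 1))                            # 1.3
```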

2) a = 8, b = 3.08, c = 10.08, d = – 0.86
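Each unknown comes from rearranging z = (x – µ) / σ for the missing quantity. A worked sketch, assuming the table columns are x_i, µ, σ and z in that order (the `aligned` environment assumes the amsmath package):

```latex
\begin{aligned}
a &= x - z\sigma = 12 - (1.6)(2.5) = 8 \\
b &= \frac{x - \mu}{z} = \frac{30 - 26}{1.3} \approx 3.08 \\
c &= \mu + z\sigma = 12 + (-1.6)(1.2) = 10.08 \\
d &= \frac{x - \mu}{\sigma} = \frac{22 - 25}{3.5} \approx -0.86
\end{aligned}
```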

3) Karen's Z-score is 0.3
Use 2 equations in µ and σ: 60 = µ – 1.7σ (Julie) and 97 = µ + 2σ (Mark).
Find µ = 77 and σ = 10.
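One way to carry out the elimination, written out step by step (a sketch using x = µ + zσ for Mark and Julie; the `aligned` environment and `\text` assume the amsmath package):

```latex
\begin{aligned}
97 &= \mu + 2\sigma \\
60 &= \mu - 1.7\sigma \\
97 - 60 &= 3.7\sigma \;\Rightarrow\; \sigma = 10 \\
\mu &= 97 - 2(10) = 77 \\
z_{\text{Karen}} &= \frac{80 - 77}{10} = 0.3
\end{aligned}
```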
