MEASURES OF POSITION

Note: TI-83 calculator notes are at the end of this lesson.

Quartiles, Quintiles and Percentiles

We use measures of position, or rank, when we need to compare data values within a single distribution or across several sample or population distributions. The idea is to section each distribution into sets that each contain a specified percentage of the data values. It then becomes a simple task to judge the relative strength and position of any particular data value.

The 3 most common measures of position are:

Quartiles: section the distribution into 4 approximately equal quarters (25%)

Quintiles: section the distribution into 5 approximately equal fifths (20%)

Percentiles: section the distribution into 100 approximately equal hundredths (1%)

Important Note: Quartiles and percentiles are arranged from the lowest or worst results to the highest or best; in other words, they are arranged in increasing (ascending) order. Quintiles are arranged the other way around: the best results are in the first quintile.

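[Diagram: quartiles, quintiles and percentiles of the same distribution, with the percentile scale along the bottom axis]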

Notice that the 4th quartile includes the 1st quintile as well as the 75th to 99th percentiles.

The median, Q2, is in the 3rd quintile;
Q1 is in the 4th quintile, and Q3 is in the 2nd quintile.

Finding Quartiles:

Note: repeated data values must be in the same quartile. This is one reason the quartiles aren't always equal; the other is that we don't always get a whole number when we divide the number of data values by 4. When the number of data values isn't a multiple of 4, or there are lots of repeats, we make the quartiles APPROXIMATELY EQUAL while keeping repeated values in the same quartile.

We always start by finding Q2, the median or middle value. First, we arrange the data in ascending order so we can find the middle number.

With an odd number of data values, there is a MIDDLE VALUE, so it is Q2, the median.
With an even number of data values, Q2 is the mean of the two middle values in the distribution.

Example: Find the median:

2 4 7 8 10 11 15 16 19

Since there are 9 (odd) data values, the median is the 5th one: Q2 = 10.

Example: Find the median:

4 7 8 10 11 15 16 19

Now we have 8 (even) data values, so the median is the mean of the 4th and 5th values: Q2 = (10 + 11)/2 = 10.5.

The median is the data value in position (n + 1) / 2, where n is the number of data values.

When n is odd, (n + 1) / 2 is an integer, so Q2 = the data value in that position.
When n is even, (n + 1) / 2 is halfway between 2 integers, so Q2 = the mean of the data values in those 2 positions.
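The position rule above translates directly into code. This is a minimal Python sketch, assuming a non-empty list of numbers; the function name `median_by_position` is hypothetical and not part of any library.

```python
def median_by_position(values):
    """Median using the (n + 1) / 2 position rule; positions count from 1."""
    ordered = sorted(values)            # arrange the data in ascending order
    n = len(ordered)
    pos = (n + 1) / 2                   # 1-based position of the median
    if pos.is_integer():                # n odd: there is a single middle value
        return ordered[int(pos) - 1]
    left = int(pos)                     # n even: pos lies halfway between left and left + 1
    return (ordered[left - 1] + ordered[left]) / 2


print(median_by_position([2, 4, 7, 8, 10, 11, 15, 16, 19]))  # 10
print(median_by_position([4, 7, 8, 10, 11, 15, 16, 19]))     # 10.5
```

The two calls reproduce the odd and even examples worked above.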

Q1 and Q3 are called hinges.
Q1 is the lower or left hinge and Q3 is the upper or right hinge.

Q1 is the median of the first half of the data distribution
and Q3 is the median of the second half. When n is odd, the median itself is left out of both halves.
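Here is a small Python sketch of the "median of each half" method, assuming at least two numeric values; the helper names `median` and `quartiles` are hypothetical. Calculators and software packages do not all use the same quartile convention (Python's own `statistics.quantiles`, for example, defaults to a different method), so other tools may give slightly different Q1 and Q3.

```python
def median(values):
    """Median of a non-empty list (mean of the two middle values when n is even)."""
    s = sorted(values)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 == 1 else (s[mid - 1] + s[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3): Q1 and Q3 are the medians of the lower and upper halves.

    When n is odd, the middle value is Q2 and is left out of both halves,
    as in this lesson's worked examples.
    """
    s = sorted(values)
    n = len(s)
    lower = s[: n // 2]           # values before the median position
    upper = s[(n + 1) // 2 :]     # values after the median position
    return median(lower), median(s), median(upper)


print(quartiles([2, 4, 7, 8, 10, 11, 15, 16, 19]))  # (5.5, 10, 15.5)
print(quartiles([4, 7, 8, 10, 11, 15, 16, 19]))     # (7.5, 10.5, 15.5)
```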

All these values can be obtained from the TI-83 calculator STAT functions.

(TI-83 calculator notes)

Quartile Data Display: Box and Whisker Plots

One of the simplest ways to display a distribution of data is a box-and-whisker plot. We use 5 data values for this display: the minimum, the maximum, and the 3 quartile values that section the data into 4 approximately equal groups or quarters (25% each). The "box", which stretches from Q1 to Q3, contains Q2, the median or middle value, and the central half of the data distribution. Q1 and Q3 are the medians of the first and second halves of the distribution, respectively.

Q3 – Q1 is called the interquartile range (IQR).
It tells us the spread or range of the central half of the distribution.

Example: Draw a simple box and whisker plot for this data:

3.9 4.1 4.2 4.3 4.3 4.4 4.4 4.4 4.4 4.5 4.5 4.6 4.7 4.8 4.9 5.0 5.1

There are 17 (odd number) data values, so the median is the ninth value: Q2 = 4.4

The 4.4 in the middle of the list is Q2, so we can't reuse it. The 2 halves of the data set are then:

3.9, 4.1, 4.2, 4.3, 4.3, 4.4, 4.4, 4.4 and 4.5, 4.5, 4.6, 4.7, 4.8, 4.9, 5.0, 5.1

With 8 values in the first half, the median is the mean of the middle 2 (the 4th and 5th): Q1 = (4.3 + 4.3)/2 = 4.3

The median of the second half is Q3 = (4.7 + 4.8)/2 = 4.75
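The five values a box-and-whisker plot needs can be computed in one step. This Python sketch assumes the same "median of each half" convention used in this lesson; `five_number_summary` is a hypothetical helper name.

```python
def median(values):
    s = sorted(values)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 == 1 else (s[mid - 1] + s[mid]) / 2


def five_number_summary(values):
    """(minimum, Q1, Q2, Q3, maximum); the median is excluded from both halves when n is odd."""
    s = sorted(values)
    n = len(s)
    q1 = median(s[: n // 2])
    q3 = median(s[(n + 1) // 2 :])
    return s[0], q1, median(s), q3, s[-1]


data = [3.9, 4.1, 4.2, 4.3, 4.3, 4.4, 4.4, 4.4, 4.4,
        4.5, 4.5, 4.6, 4.7, 4.8, 4.9, 5.0, 5.1]
low, q1, q2, q3, high = five_number_summary(data)
print(low, q1, q2, q3, high)        # 3.9 4.3 4.4 4.75 5.1
print("IQR =", round(q3 - q1, 2))   # IQR = 0.45
```

The `round` call only hides floating-point noise in the subtraction; the IQR itself is 4.75 – 4.3 = 0.45.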

Here then is a simple box and whisker plot of this data:

[Box-and-whisker plot: minimum 3.9, Q1 = 4.3, Q2 = 4.4, Q3 = 4.75, maximum 5.1]

Notice that the second ¼ of the data values lie between Q1 = 4.3 and Q2 = 4.4.

In addition to the line inside the box that marks the median, the box-and-whisker plot may include a cross or an "x" that marks the mean of the data.

The difference between the mean and the median gives a rough measure of the "skewness" of the distribution, that is, whether the data trail off further on one side than on the other. If the mean and median are close together, the distribution is roughly centered on them. If the mean is noticeably greater than the median, the distribution is skewed right; if it is noticeably less, the distribution is skewed left.

The mean of this data is 76.5 ÷ 17 = 4.5.
Since mean > median (4.5 > 4.4), this data is skewed slightly right.
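The mean-versus-median comparison can be checked with Python's standard `statistics` module. This is a sketch of the rule of thumb described above, not a formal skewness coefficient; `skew_hint` and its `tolerance` parameter are hypothetical (differences no bigger than `tolerance` are reported as roughly symmetric).

```python
from statistics import mean, median

data = [3.9, 4.1, 4.2, 4.3, 4.3, 4.4, 4.4, 4.4, 4.4,
        4.5, 4.5, 4.6, 4.7, 4.8, 4.9, 5.0, 5.1]


def skew_hint(values, tolerance=0.0):
    """Rule of thumb: mean above median suggests right skew, below suggests left skew."""
    gap = mean(values) - median(values)
    if gap > tolerance:
        return "skewed right"
    if gap < -tolerance:
        return "skewed left"
    return "roughly symmetric"


print(round(mean(data), 2), median(data))  # 4.5 4.4
print(skew_hint(data))                     # skewed right
```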

Outliers:

An outlier is a data value that lies unusually far from the central values. Any data point that lies more than 1½ times the length of the box (1.5 × IQR) below Q1 or above Q3 is considered an outlier. The lower bound is Q1 – (1.5 × IQR) and the upper bound is Q3 + (1.5 × IQR).

Example: Find the outliers, if any, for these 15 data values:

10.2, 14.1, 14.4, 14.4, 14.4, 14.5, 14.5, 14.6, 14.7, 14.7, 14.7, 14.9, 15.1, 15.9, 16.4

Solution:

First, we find the IQR. There are 15 data points, so the median is at position (15 + 1) ÷ 2 = 8. Then Q2 = 14.6. Now, there are 7 data points on either side of the median, so Q1 is the 4th value in the list and Q3 is the 12th:

Q1 = 14.4 and Q3 = 14.9. Then IQR = 14.9 – 14.4 = 0.5.

Outliers will be any points that lie below Q1 – 1.5 (IQR) or above Q3 + 1.5 (IQR)

1.5 × 0.5 = 0.75, therefore

Q1 – 1.5 (IQR) = 14.4 – 0.75 = 13.65 and Q3 + 1.5 (IQR)= 14.9 + 0.75 = 15.65.

Since 10.2 < 13.65 and both 15.9 and 16.4 are > 15.65,
the outliers are 10.2, 15.9, and 16.4.
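The 1.5 × IQR test above can be sketched in Python as follows, using the same "median of each half" quartiles as this lesson. The names `iqr_outliers` and `k` (the 1.5 multiplier) are hypothetical.

```python
def median(values):
    s = sorted(values)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 == 1 else (s[mid - 1] + s[mid]) / 2


def iqr_outliers(values, k=1.5):
    """Return (outliers, lower_bound, upper_bound) using Q1 - k*IQR and Q3 + k*IQR."""
    s = sorted(values)
    n = len(s)
    q1 = median(s[: n // 2])          # lower half (median excluded when n is odd)
    q3 = median(s[(n + 1) // 2 :])    # upper half
    iqr = q3 - q1
    lower_bound = q1 - k * iqr
    upper_bound = q3 + k * iqr
    outliers = [x for x in s if x < lower_bound or x > upper_bound]
    return outliers, lower_bound, upper_bound


data = [10.2, 14.1, 14.4, 14.4, 14.4, 14.5, 14.5, 14.6,
        14.7, 14.7, 14.7, 14.9, 15.1, 15.9, 16.4]
outliers, lo, hi = iqr_outliers(data)
print(outliers)                    # [10.2, 15.9, 16.4]
print(round(lo, 2), round(hi, 2))  # 13.65 15.65
```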

Notice that we have 2 outliers above the median and only one below. The count alone doesn't settle the direction of skew, though: the single low outlier, 10.2, lies so far from the rest that it pulls the mean (217.5 ÷ 15 = 14.5) slightly below the median (14.6). Notice also that the range of the distribution including the outliers is 16.4 – 10.2 = 6.2. However, the IQR = 0.5, and since this is the range of the middle half of the distribution, a compact distribution centered on its mean and median might be expected to have a range of roughly 2 × 0.5 = 1. Once we eliminate the outliers, the minimum is 14.1 and the maximum is 15.1, making the range 15.1 – 14.1 = 1, or 2 × IQR.
A box-and-whisker plot of this data will show 14.1 as the minimum and 15.1 as the maximum.

Notation:

Because quintiles and percentiles assign a RANK to a data value, we use R5 and R100 to indicate quintile and percentile values. To indicate that the quintile rank of some data item is 4, we write either R5/4 or R5 = 4. We use similar notation for percentiles.

Quintiles: (R5)

With quintiles, we section the distribution into 5 approximately equal fifths (20% each) that are ordered in reverse: the best results are in the first quintile. We must be careful when we order our data, since the best results for certain activities are the lowest values. For instance, with golf scores and race times the best results are the smallest values, whereas with sales figures or the number of students who pass their exams the best results are the biggest values. When we find quintiles, it is best to order the data from best to worst, which may mean descending or ascending numerical order depending on the type of data in the question.

As with quartiles, repeated values must share the same quintile, and since the number of data values isn't always divisible by 5, each quintile holds approximately 20% of the data.

Example: Section these 15 data values into quintiles:

10.2, 14.1, 14.4, 14.4, 14.4, 14.5, 14.5, 14.6, 14.7, 14.7, 14.7, 14.9, 15.1, 15.9, 16.4

Solution:

Since the data is arranged in ascending order and the largest values are treated as the best here, we'll start from the right end and work our way towards the left. We'd like to put 15 ÷ 5 = 3 data items in each quintile; however, the repeats make that impossible. Here's how this distribution is best sectioned into quintiles:

Quintile        Data values
1st Quintile    16.4, 15.9, 15.1
2nd Quintile    14.9, 14.7, 14.7, 14.7
3rd Quintile    14.6, 14.5, 14.5
4th Quintile    14.4, 14.4, 14.4
5th Quintile    14.1, 10.2

Notice that both the mean (14.5) and the median (14.6) fall in the 3rd quintile (the middle),
and the hinges Q1 = 14.4 and Q3 = 14.9 are in the 4th and 2nd quintiles, respectively.
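The text does not give a mechanical rule for deciding where each block of repeats goes, so the Python sketch below uses one hypothetical rule: each run of equal values is placed in the quintile that contains the middle position of that run. The names `quintiles` and `higher_is_better` are also hypothetical; pass `higher_is_better=False` for data such as golf scores or race times. With the example data, this rule reproduces the grouping chosen in the example.

```python
from itertools import groupby


def quintiles(values, higher_is_better=True):
    """Split values into 5 quintiles (1st = best), keeping repeated values together.

    Hypothetical rule: each run of equal values goes to the quintile containing
    the middle position of that run. It is one reasonable way to make the
    quintiles approximately equal, not a standard definition.
    """
    ordered = sorted(values, reverse=higher_is_better)   # best result first
    n = len(ordered)
    result = {q: [] for q in range(1, 6)}
    start = 0                                            # 0-based position where the run begins
    for _, run in groupby(ordered):
        run = list(run)
        # twice the run's middle position is 2*start + len(run) - 1; integer math avoids rounding
        q = (2 * start + len(run) - 1) * 5 // (2 * n) + 1
        result[q].extend(run)
        start += len(run)
    return result


data = [10.2, 14.1, 14.4, 14.4, 14.4, 14.5, 14.5, 14.6,
        14.7, 14.7, 14.7, 14.9, 15.1, 15.9, 16.4]
for q, members in quintiles(data).items():
    print(q, members)
```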

Percentiles: (R100)

The percentile rank tells us what percent of the data values are less than or equal to the item in question. Since percentiles section the distribution into hundredths, each percentile holds about 1% of the data. Percentiles are indicated in the diagram by the axis at the bottom.

If Harry's math mark puts him in the 78th percentile, we know he did as well as or better than 78% of his classmates.

Finding Percentile for a Data Value:

When we have the raw data of the distribution and we're asked to find the percentile rank for a specific item, here's how we proceed:

Example: Find the percentile for the data item in red.

Solution:

a) There are 20 data values in all, and 32 is the 12th of them, so R100 = 12/20 × 100 = 60, the 60th percentile.

b) Since we must count every data item less than or equal to x, we must also count the third 61: although the red 61 is the 11th item in the list, there are 12 values less than or equal to 61.
There are 18 items in total, so R100 = 12/18 × 100 ≈ 66.67, which rounds up to the 67th percentile.

When finding a percentile, round any decimal up to the next integer.
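The percentile-rank rule (count the values less than or equal to x, divide by the total, multiply by 100, round up) can be sketched in Python as below. `percentile_rank` is a hypothetical name, and the example reuses the 15 data values from the outlier example rather than the lists above.

```python
def percentile_rank(values, x):
    """R100 = (number of values <= x) / n * 100, rounded up to the next integer."""
    at_or_below = sum(1 for v in values if v <= x)  # every repeat of x counts
    n = len(values)
    return -(-100 * at_or_below // n)               # ceiling division, no float round-off


data = [10.2, 14.1, 14.4, 14.4, 14.4, 14.5, 14.5, 14.6,
        14.7, 14.7, 14.7, 14.9, 15.1, 15.9, 16.4]
print(percentile_rank(data, 14.7))  # 74  (11 of 15 values are <= 14.7; 73.33... rounds up)
```

Using integer ceiling division instead of `math.ceil` on a float avoids results such as 60.00000000000001 being rounded up to 61.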

Example: What is the percentile of a racer who placed:

a) 12th out of 25? b) 5th out of 40? c) 3rd out of 200?

Solution: A placing tells us how many racers did better than the one in question (one fewer than the place), so we calculate the number of racers who did no better, counting the racer in question.

a) 12th out of 25 means 11 racers were faster, so 25 – 11 = 14 racers were as fast as or slower than the 12th-place racer. So the percentile for 12th place = 14/25 × 100 = 56th.

b) 5th means 4 were better, so the percentile for 5th place = (40 – 4)/40 × 100 = 36/40 × 100 = 90th.

c) 3rd means 2 were better, so the percentile for 3rd place = (200 – 2)/200 × 100 = 198/200 × 100 = 99th.
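The placing calculation can be sketched in Python, assuming there are no ties; `placing_percentile` is a hypothetical name. It counts the racers who did no better and then rounds up like any other percentile.

```python
def placing_percentile(place, total):
    """Percentile for finishing in `place` (1 = best) out of `total`, assuming no ties.

    place - 1 racers did better, so total - (place - 1) racers did no better
    (the racer included); the result is rounded up like any other percentile.
    """
    no_better = total - (place - 1)
    return -(-100 * no_better // total)   # integer ceiling division


for place, total in [(12, 25), (5, 40), (3, 200)]:
    print(place, total, placing_percentile(place, total))
# 12 25 56
# 5 40 90
# 3 200 99
```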

Finding the Data Value, Given the Percentile:

When we have the raw data list and we want to know which of the items has a given percentile rank, we can find it if we know how many data values are less than or equal to the one in question. To do this, we solve the percentile formula for the number of data values less than or equal to x, and then we count until we find it.

Example: Find the item in these 20 data with percentile rank = 50

Solution: When we solve the percentile formula, R100 = (number of data values ≤ x) ÷ total × 100, for the fraction's numerator, we get:

number of data values ≤ x = R100 × total ÷ 100

percentile = 50, total = 20, so there are 50 × 20 ÷ 100 = 10 data items less than or equal to the one we want. The 10th item in the list is 25 and there are no other 25's, so that's x.
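The "solve for the numerator, then count" procedure can be sketched in Python. Because a given rank does not always land exactly on a data value (especially with repeats), this sketch assumes we want the smallest value with at least R100 × n ÷ 100 values at or below it; it compares unrounded percentages, so it need not exactly invert the round-up rule. `value_at_percentile` is a hypothetical name, and the list 1 to 20 is a made-up stand-in for the 20 data values, which are not reproduced here.

```python
def value_at_percentile(values, r100):
    """Smallest data value x with at least r100 * n / 100 values <= x (None if r100 > 100).

    Solves R100 = count / n * 100 for the count, then counts up the ordered
    list until enough values have been passed.
    """
    ordered = sorted(values)
    n = len(ordered)
    for x in ordered:
        count = sum(1 for v in ordered if v <= x)   # every repeat of x counts
        if 100 * count >= r100 * n:                  # integer comparison, no rounding
            return x
    return None


print(value_at_percentile(list(range(1, 21)), 50))  # 10  (50 × 20 ÷ 100 = 10 values <= x)
```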

Practice

1) The 15 candidates for a job were given an aptitude test marked out of 80. Here are the results:

50, 51, 52, 58, 59, 60, 63, 63, 64, 68, 69, 70, 75, 76, 77

a) The candidates in the first quintile were hired. What were their scores? How many were hired?
b) In which quintile is the score 64?
c) Construct the box and whisker plot for the distribution, and list the range, quartiles and the mean.
d) Between which quartiles are the data most concentrated?
e) Find the percentile rank for scores of 63 and 75.
f) What is the data value if R100 = 70?

2) The tables below display the final math marks for two classes of 20 students at TeachemGood High School. Trevor and Shaun both got 89%. Trevor is in Class A, Shaun is in Class B.

Marks for students in Class A (Trevor's class)
75 76 77 79 81 83 84 85 86 87
87 87 89 90 91 91 96 97 97 98

Marks for students in Class B (Shaun's class)
75 75 75 76 77 77 78 78 79 84
85 85 87 88 88 89 94 95 96 98

The parents' committee needs to decide who gets the awards for achievement in math, so they do some stats on the data. Use percentiles to decide which student's position in his class gives him the better chance of winning an award.

3) The data shows the marks (%) for 26 math students on their final exam.

49 54 57 58 58 60 61 61 63
66 69 70 71 75 79 79 82 85
86 87 88 91 91 93 94 99  

When they asked about their marks, their teacher said this: