Measures of Central Tendency
Whenever we have more than a few points of data, we need some ways to describe it. We can talk about the minimum (lowest number) or maximum (highest number) or range (distance between the minimum and maximum), but we usually want to find a way to describe the middle or the majority of the data. That's what measures of central tendency do: they use different methods to describe the middle of the data.
There are three common measures of central tendency: Mode, Median, and Mean (or Average). (See the lessons on Mode, Median, and Mean for details on how to find each of these measures.)
Why have three different measures of central tendency? Each measure has its own strengths and weaknesses, and each reacts differently to the shape of the data. So, using all three measures not only tells you more about the "middle" of your data, but the differences between the measures can tell you even more about your data.
Let's start with Mode. The mode of a dataset is the datapoint that shows up most often. In a large dataset with only a few distinct values, the mode can be helpful.
Let's say you were studying a day care with 35 kids in it and you wanted to know the typical age of the kids. If the ages were:
$2, 3, 3, 3, 2, 4, 3, 3, 3, 3, 4, 3, 3, 3, 3, 2, 3, 3, 2, 3, 2, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 3, 3$
The mode of this dataset is $3$.
It would be fair to say that the typical kid in this day care is 3 years old. In this case, the mode is a reasonable measure of central tendency.
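If you want to check a mode by computer, here is a minimal Python sketch. It uses the standard library's `collections.Counter` to tally how often each age appears. The name `ages` is our own, and the list is simply the day care data above typed in.

```python
from collections import Counter

# Day care ages from the example above (35 kids)
ages = [2, 3, 3, 3, 2, 4, 3, 3, 3, 3, 4, 3, 3, 3, 3, 2, 3, 3, 2, 3,
        2, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 3, 3]

# Count how many times each age appears
counts = Counter(ages)
print(counts)  # Counter({3: 26, 2: 6, 4: 3})

# The mode is the value with the highest count
mode_value, mode_count = counts.most_common(1)[0]
print(mode_value, mode_count)  # 3 26
```

The counts show why the mode works well here: 26 of the 35 kids share the modal age.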
On the other hand, if you asked a group of 6 kids how much money they had on them, mode might not yield as helpful an answer. Here is how much money the kids had in their wallets:
$1, 2, .50, 3, 20, 20$
The mode of this dataset is \$20.
However, is it accurate to say that most of the kids had \$20 on them? Not really. It just so happens that two kids had \$20 each, but most of the kids had much less, so the mode is probably not a great way to summarize this dataset.
How does Median describe these same datasets? The median of a dataset is the datapoint in the middle, or at the 50th percentile. To find the median, you line the numbers up from lowest to highest and identify the middle number (or, if there is an even number of datapoints, the average of the two middle numbers).
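A quick way to see how weak this mode is: count how many kids actually have the modal amount. The sketch below assumes Python 3.8 or later for `statistics.multimode`, which returns every value tied for the highest count. The name `money` is our own.

```python
from statistics import multimode

# Money (in dollars) the six kids had with them
money = [1, 2, 0.50, 3, 20, 20]

# multimode returns every value tied for the highest count
modes = multimode(money)
print(modes)  # [20]

# What fraction of the kids actually have the modal amount?
share = money.count(modes[0]) / len(money)
print(f"{share:.0%} of kids have ${modes[0]}")  # 33% of kids have $20
```

Only a third of the kids have the modal amount, compared with 26 out of 35 in the day care example.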
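The procedure just described (sort, then take the middle value or average the two middle values) can be sketched in Python as follows. The function name `median` is our own helper, written out by hand to mirror the steps. Python's `statistics.median` gives the same results.

```python
def median(values):
    """Median: sort the values, then take the middle one,
    or the average of the two middle ones if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one datapoint")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd count: the single middle value
    # even count: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([3, 1, 2]))                # 2
print(median([1, 2, 0.50, 3, 20, 20]))  # 2.5
```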
Let's look at our daycare data again:
$2, 3, 3, 3, 2, 4, 3, 3, 3, 3, 4, 3, 3, 3, 3, 2, 3, 3, 2, 3, 2, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 3, 3$
Once we put it in order, we get:
$2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, \underline{3}, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4$
With 35 datapoints, the middle one is the 18th (17 values on each side), so the median of the dataset is $3$.
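Sorting 35 numbers by hand is easy to get wrong, so here is a short Python check of the sorted list and its middle value. The variable names are our own. `statistics.median` is the standard library function.

```python
from statistics import median

ages = [2, 3, 3, 3, 2, 4, 3, 3, 3, 3, 4, 3, 3, 3, 3, 2, 3, 3, 2, 3,
        2, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 3, 3]

ordered = sorted(ages)
# With 35 values, the middle one is the 18th (17 values on each side)
middle_position = (len(ordered) + 1) // 2
print(middle_position, ordered[middle_position - 1])  # 18 3

# How many 2s, 3s, and 4s end up in the sorted list
print(ordered.count(2), ordered.count(3), ordered.count(4))  # 6 26 3

# The standard library agrees
print(median(ages))  # 3
```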
It would be fair to say that the typical kid in this day care is 3 years old. In this case, the median is also a reasonable measure of central tendency.
Let's look at our dataset of kids' money again:
$1, 2, .50, 3, 20, 20$
Let's put the datapoints in order from least to greatest:
$.50, 1, \underline{2}, \underline{3}, 20, 20$
The middle numbers are 2 and 3, so the median is \$2.50.
\$2.50 might not perfectly capture the amount of money that "most of the kids have" or "about what kids have in their pockets," but it's probably a more reasonable number than \$20.
How does Mean describe these same data sets? The mean of a dataset is found by adding the datapoints up and dividing by the number of datapoints.
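The definition translates almost word for word into code. Here is a minimal Python sketch. The function name `mean` is our own helper; Python's `statistics.mean` does the same job.

```python
def mean(values):
    """Mean: add up the datapoints, then divide by how many there are."""
    if len(values) == 0:
        raise ValueError("mean requires at least one datapoint")
    return sum(values) / len(values)

print(mean([2, 3, 4]))  # 3.0
```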
Let's look at our daycare data again:
$2, 3, 3, 3, 2, 4, 3, 3, 3, 3, 4, 3, 3, 3, 3, 2, 3, 3, 2, 3, 2, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 3, 3$
To find the mean or average:
$$\eqalign{\dfrac{\text{sum of datapoints}}{\text{number of datapoints}}&=\text{mean}\\\dfrac{\tiny{2+3+3+3+2+4+3+3+3+3+4+3+3+3+3+2+3+3+2+3+2+4+3+3+3+3+3+3+3+3+3+3+2+3+3}}{35}&=\text{mean}\\\dfrac{102}{35}&\approx 2.9}$$
We can see that the mean of the dataset is about $2.9$ (rounded to one decimal place).
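Adding 35 numbers by hand invites slips, so a quick computer check of the sum and the division is worthwhile. This is a minimal Python sketch. The variable names are our own.

```python
ages = [2, 3, 3, 3, 2, 4, 3, 3, 3, 3, 4, 3, 3, 3, 3, 2, 3, 3, 2, 3,
        2, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 3, 3]

total = sum(ages)  # sum of datapoints
count = len(ages)  # number of datapoints
print(total, count)             # 102 35
print(round(total / count, 1))  # 2.9
```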
For this dataset, 3 is probably a better answer than 2.9, but either one is fine. All three measures of central tendency work pretty well for this group.
Let's look at our dataset of kids' money again:
$1, 2, .50, 3, 20, 20$
To find the mean or average:
$$\eqalign{\dfrac{\text{sum of datapoints}}{\text{number of datapoints}}&=\text{mean}\\\dfrac{1+2+.5+3+20+20}{6}&=\text{mean}\\\dfrac{46.50}{6}&=7.75}$$
We can see that the mean of the dataset is \$7.75.
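The same two steps (sum, then divide by the count) for the kids' money are shown below as a short Python sketch. The name `money` is our own.

```python
money = [1, 2, 0.50, 3, 20, 20]

total = sum(money)         # sum of datapoints
print(total)               # 46.5
print(total / len(money))  # 7.75
```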
\$7.75 doesn't perfectly capture the amount of money that "most of the kids have" or "about what kids have in their pockets," but it's also probably a more reasonable number than \$20.
So, what do we learn by looking at different measures of central tendency? We learn that they can yield very different answers. We can also learn from those very different answers!
When the measures of central tendency are all very close (like in our day care example), the data are probably fairly symmetric and nothing is too wacky. If there were a 10-year-old in the class, the mean would have been pulled up above the mode and median. It isn't, so we know that the data are pretty consistent.
What our money example shows us is that when the measures are very different from one another, the dataset is lopsided. When your mean is much different from your median, there are probably outliers in the data (the \$20 kids in this case, though outliers can also be very low numbers). The mode is tricky to use. It can mean almost nothing (in many cases, it's pure luck when several datapoints are the same) or it can mean a lot (as with the three-year-olds in our day care class).
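To see the effect described above, here is a small Python experiment that adds one hypothetical 10-year-old to the day care data and recomputes all three measures. The extra child is invented purely for illustration. `mean`, `median`, and `multimode` come from Python's `statistics` module (Python 3.8 or later for `multimode`).

```python
from statistics import mean, median, multimode

ages = [2, 3, 3, 3, 2, 4, 3, 3, 3, 3, 4, 3, 3, 3, 3, 2, 3, 3, 2, 3,
        2, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 3, 3]
with_older_kid = ages + [10]  # hypothetical: one 10-year-old joins the class

print(multimode(ages), multimode(with_older_kid))            # [3] [3]
print(median(ages), median(with_older_kid))                  # 3 3.0
print(round(mean(ages), 2), round(mean(with_older_kid), 2))  # 2.91 3.11
```

One unusual child leaves the mode and median untouched but moves the mean.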
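One way to put the "mean much different from median" idea into numbers is to subtract the median from the mean. The helper name `mean_minus_median` is our own, and this is only a rough rule of thumb, not a formal test. There is no fixed cutoff for "much different," and the size of the gap matters more than its sign.

```python
from statistics import mean, median

ages = [2, 3, 3, 3, 2, 4, 3, 3, 3, 3, 4, 3, 3, 3, 3, 2, 3, 3, 2, 3,
        2, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 3, 3]
money = [1, 2, 0.50, 3, 20, 20]

def mean_minus_median(values):
    """Positive: the mean is pulled up (possible high outliers).
    Negative: the mean is pulled down (possible low outliers)."""
    return mean(values) - median(values)

print(round(mean_minus_median(ages), 2))   # -0.09
print(round(mean_minus_median(money), 2))  # 5.25
```

For the day care the gap is tiny. For the kids' money the mean sits more than \$5 above the median, which points to the high \$20 values.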
Let's do one more thing: take one of our \$20 kids and make him a \$500 kid (maybe he just had a birthday!):
$1, 2, .50, 3, 20, 500$
Mode: None. No two numbers are alike now (this is common when the datapoints can take many different values).
Median: \$2.50 (Note: the median tends to stay stable when only the highest or lowest numbers change.)
Mean: $\dfrac{1+2+.50+3+20+500}{6}=\dfrac{526.50}{6}=87.75$ (Here, to say that most kids have about 87 dollars is totally wrong. Note how the mean is the most sensitive to outliers!).
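The three results above can be reproduced with Python's `statistics` module, as in the sketch below (Python 3.8 or later for `multimode`). When every value appears exactly once, `multimode` returns all of them, which is another way of saying there is no useful mode.

```python
from statistics import mean, median, multimode

money = [1, 2, 0.50, 3, 20, 500]

print(multimode(money))  # [1, 2, 0.5, 3, 20, 500]  (every value ties)
print(median(money))     # 2.5
print(mean(money))       # 87.75
```

Compared with the original money data, the median did not move at all, while the mean jumped from 7.75 to 87.75.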
Overall, the lesson is that there are several measures of central tendency and they all tell us slightly different things about a dataset, so know how to use all of them and think about what they tell you!
Practice Problems:
Measures of Central Tendency
Measures of central tendency sometimes produce similar results and sometimes produce very different results. Think about the scenarios below and, without doing the math, explain how the mode, median, and mean for each of the datasets will differ:
- Your family is shopping for houses. You know that you want to live in Torrance and you look at 15 houses that range in price from \$350,000 to \$470,000. However, one day you see an open house in Bel Air and you have to go visit that house with the 2 pools and the private movie screening room. That house lists for \$4.5 million. How will the mode, median, and mean of the list prices of the houses you looked at differ? What measure captures the "normal" house price the best? The worst?
- Over the course of the semester you have taken 10 tests. You've gotten two perfect scores of 100. The rest of your scores have ranged from 85 to 99, spread out pretty evenly, but surprisingly, other than the 100s, you have never gotten the same score twice. How will the mode, median, and mean of your scores differ? What measure captures your "normal" score the best? The worst?
- All of your friends are pitching in to pay for a party. Of your group of 30 friends, almost all have contributed \$20. Two of your friends have good jobs, and they put in \$50. One friend is always a slacker and hasn't put in anything. How will the mode, median, and mean of the amount of money your friends pitched in differ? What measure captures the "normal" amount the best? The worst?
- You are shopping for coats. You are looking at two coats at Ross, each priced at \$39.99. There is a great coat at Forever 21 for \$27.99. Macy's has another coat for \$56.99. The GOOP website has a truly amazing coat for \$8,690. How will the mode, median, and mean of coat prices differ? What measure captures the "normal" price the best? The worst?
In the following scenarios, what could be causing the differences in measures of central tendency?
- A teacher is looking at the SAT scores for her class. She's puzzled by the results. The mode score is 2000, which is amazing. The median score is 1800, which is still pretty good. But, the mean score is 700. What could have happened to create these three different measures?
- A student is working hard in AP Calculus. He learns that his median test score is 80 and he's thrilled that he will get a B. But, then his grade is a D. Baffled, he goes to his teacher and his teacher says that he's sorry, but he bases grades on an average (or mean) and not a median. What could have caused these scores to be so different?
- In Mississippi, the mean household income is approximately \$20,000/year. The median household income is approximately \$38,000/year. What could explain the differences between these two numbers?
- In Orange County, California, the median household income is approximately \$34,000/year. The mean household income is approximately \$75,000/year. What could explain the differences between those two numbers?
- Thinking about income, would the mode be a good way to judge the "typical" household income in a state or county? Why or why not?
Answer Key:
- Your family is shopping for houses. You know that you want to live in Torrance and you look at 15 houses that range in price from \$350,000 to \$470,000. However, one day you see an open house in Bel Air and you have to go visit that house with the 2 pools and the private movie screening room. That house lists for \$4.5 million. How will the mode, median, and mean of the list prices of the houses you looked at differ? What measure captures the "normal" house price the best? The worst?