# SAT Stats: Sampling II

In the last post we talked about ways to construct a good sample (or critique a bad sample!).

There is one other way that SAT tests students about sampling: it asks students to draw conclusions about a population based on results from the sample.

The principle here is simple: if a sample is random and large enough, results from the sample should generalize to the population:

• If 25% of a random sample of 5,000 Americans say that they favor immigration reform, then we can conclude that about 25% of Americans favor immigration reform.
• If 90% of a random sample of 100 students at a high school say that they wish the school offered yoga, then we can conclude that roughly 90% of the students at that high school wish that the school offered yoga.
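To see why a large random sample tracks the population so closely, here is a minimal Python sketch. Everything in it is made up for illustration: the population of 100,000 people, the exact 25% "in favor" rate, and the helper name `sample_proportion` are all assumptions, not real survey data.

```python
import random


def sample_proportion(population, n, seed=0):
    """Draw a simple random sample of size n and return the share of 1s."""
    rng = random.Random(seed)           # fixed seed so the run is repeatable
    sample = rng.sample(population, n)  # sampling without replacement
    return sum(sample) / n


# Hypothetical population: 100,000 people, exactly 25% in favor (1 = in favor).
population = [1] * 25_000 + [0] * 75_000

true_share = sum(population) / len(population)
estimate = sample_proportion(population, 5_000)

print(f"population share: {true_share:.3f}")
print(f"sample estimate:  {estimate:.3f}")
```

The sample estimate will usually land within a percentage point or two of the true share, which is why the SAT lets you carry a result from a large random sample over to the population.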

There is just one tricky thing about sampling that we must remember:

One can only generalize to the population from which the sample was chosen.

So, if you sample sophomores, you can only talk about sophomores; you can't talk about juniors or about high school students in general, just sophomores. If you want to do an experiment to see if tutoring works, and you offer tutoring to high school students with low math scores, and then you find the tutoring works, all you can say (at best) is that tutoring helps high school students with low math scores raise their grades. You can't say it helps all students because you only tested the tutoring on high school students with low math scores.

This is why it's so important to define your population when you do research.  Once you sample from a population, your conclusions are limited to the population that you sampled from.

This is often fine. We might only care if tutoring works for students with low scores. We might not be interested in improving the grades of students who are already doing well. That's a perfectly reasonable choice for us to make. But once we define our population as "students with low math scores" and sample from that population, those are the only people we can talk about when we talk about results from the sample.
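The tutoring example can be sketched in Python, too. The numbers below are invented purely to illustrate the point: a hypothetical school where tutoring raises grades by about 10 points on average for students with low math scores and by about 1 point for everyone else. Nothing here comes from a real study.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical grade gains from tutoring (made-up numbers).
low_scorers = [rng.gauss(10, 3) for _ in range(2_000)]   # students with low math scores
high_scorers = [rng.gauss(1, 3) for _ in range(8_000)]   # everyone else
all_students = low_scorers + high_scorers


def mean(values):
    return sum(values) / len(values)


# The experiment only samples from students with low math scores.
sample = rng.sample(low_scorers, 200)

sample_mean = mean(sample)
low_mean = mean(low_scorers)
all_mean = mean(all_students)

print(f"average gain in the sample:        {sample_mean:.1f}")
print(f"average gain, low scorers only:    {low_mean:.1f}")
print(f"average gain, whole student body:  {all_mean:.1f}")
```

In this made-up school the sample average sits close to the average for low scorers but far from the average for the whole student body. The sample tells you about the group it was drawn from and nothing more.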

SAT really pushes students to pay attention to these details, and the difference between a right and a wrong answer is often just one word.

Here's an example of this kind of question from a released SAT:

A market researcher selected 200 people at random from a group of people who indicated that they liked a certain book. The 200 people were shown a movie based on the book and then asked whether they liked or disliked the movie. Of those surveyed, 95% said they disliked the movie. Which of the following inferences can appropriately be drawn from this survey result?

A. At least 95% of people who go see movies will dislike this movie.
B. At least 95% of people who read books will dislike this movie.
C. Most people who dislike this book will like this movie.
D. Most people who like this book will dislike this movie.

So, the population that they sampled from (randomly!) was people who liked a certain book, and that's the only group that they can talk about. A and B generalize to all moviegoers and all readers, and C talks about people who dislike the book, none of whom were sampled. That leaves D as the only possible answer (even though it is less specific than A and B, which cite the 95% statistic)!

It's pretty commonsensical. And it's good stuff to remember in life. It's important to know that, when that politician surveys the constituents who are on his email list and then reports that 90% of those surveyed think he's doing a great job, his population was only the people who had signed up for his email list in the first place. Even if he surveyed a random sample of those people, he can only generalize to the population of people who like him enough to sign up for his email list. It's the kind of detail that we sometimes forget when we're reading the news, but that can put a whole new spin on statistics and news stories.

So, it's a good thing that SAT is testing this stuff -- but rough for kids who haven't learned it. Fortunately, they just need the basics, and some practice, to make sure they are looking out for all of the SAT's research and sampling questions.