AP Statistics
12 min read • July 11, 2024
Jerry Kosoff
Practicing FRQs is a great way to prep for the AP exam! Review student responses for a Mixed Units FRQ and corresponding feedback from Fiveable teacher Jerry Kosoff.
The city council of a large city is considering raising a city tax to provide funding for road repairs. The council wishes to gauge citizen interest in the plan. The council mails a survey to a random sample of 1,000 city residents, of whom 450 reply. The survey asks “should we increase city taxes in order to provide additional funding to road repairs?” Of those who reply, 170 say yes; the other 280 say no.
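For reference, the 37.8% figure that comes up in the feedback below is just the sample proportion of "yes" answers among the people who replied. A quick sketch of that arithmetic, using only the counts from the prompt (the variable names are my own):

```python
# Counts taken from the prompt above.
mailed = 1000    # surveys mailed to the random sample
replied = 450    # surveys that came back
said_yes = 170   # "yes" answers among those who replied

response_rate = replied / mailed  # share of the sample that answered at all
p_hat = said_yes / replied        # sample proportion of "yes" among respondents

print(f"response rate = {response_rate:.1%}")
print(f"p-hat = {p_hat:.3f}")
```

Note that p-hat is computed out of the 450 who replied, not the 1,000 who were mailed a survey, which is exactly why nonresponse matters here.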
In #1, you give a plausible reason for bias (people travel a lot and would like to see the taxes increase), but do not explicitly describe the impact this has on the estimate they got (37.8%). Saying “it seems like most of the city residents would like to increase city taxes” is not the same as clearly stating that you believe the survey resulted in an overestimate of the true proportion. I know it seems harsh, but that’s how the rubrics go with describing bias: (1) explain HOW the bias comes to be, (2) explain WHY the bias happens, (3) give a specific DIRECTION of the bias (over/under estimate of the true ____). Your answer only does #1 and #2.
In part 2, your sentence looks great - except you said “95% of city residents” instead of “95% of the constructed intervals” or something similar. Read your sentence back, and I think you’ll see what I mean. That would ding your answer from full to partial credit unfortunately. You have something similar in part 3 - you say “true MEAN of city residents” instead of “true PROPORTION of city residents”. Mixing up mean & proportion there would also knock you down a scoring level, even though the rest of your answer is on point.
Finally, in part 4, you are correct in that we need the observations (not “samples”) within the sample to be independent, but that comes from the “10% condition” (that the sample size is less than 10% of the overall population size). The reason for the “at least 10 successes/failures” condition is to ensure an approximately normal sampling distribution of p-hat, from which we can calculate the confidence interval.
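To keep the two conditions straight, here is a minimal sketch that checks each one separately. It uses the counts from the prompt as an illustration, and the population size is a made-up number (the prompt only says "a large city"), so treat `ASSUMED_POPULATION` and the function name as assumptions.

```python
def check_conditions(successes, n, population_size):
    """Return (large_counts_ok, ten_percent_ok) for a one-proportion interval."""
    failures = n - successes
    # Large counts: at least 10 successes and 10 failures, so the sampling
    # distribution of p-hat is approximately normal.
    large_counts_ok = successes >= 10 and failures >= 10
    # 10% condition: sample is less than 10% of the population, so
    # observations can be treated as independent.
    ten_percent_ok = n < 0.10 * population_size
    return large_counts_ok, ten_percent_ok


ASSUMED_POPULATION = 100_000  # hypothetical; the prompt gives no number

large_counts_ok, ten_percent_ok = check_conditions(170, 450, ASSUMED_POPULATION)
print("large counts condition met:", large_counts_ok)
print("10% condition met:", ten_percent_ok)
```

Each check answers a different question: the first is about the shape of the sampling distribution, the second is about independence.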
In part #1, you would likely earn partial credit. When discussing bias on the AP exam, you typically have to do 3 things: (1) explain the source of the bias (“how” it happens), (2) explain the reason for that source existing (“why” it happens), and (3) explain the impact on the result (“what” happens). When reading your response, I see evidence for #1 and #3 - you mention “not everyone responded to the survey” (#1 - how) and that this will probably “underestimate the true proportion” (#3 - the impact). To my eyes, though, your response does not address why this happens and why the non-response will lead to an underestimate, which would imply that the 37.8% is lower than the true proportion if we actually asked everyone (and it would be maybe more like 50% or something like that). You would need to make an argument for why the people who responded to the survey are more likely to say no and thus produce an underestimate - perhaps they are strongly opposed to taxes of any kind, or the wording of the question made them feel like their money could be better spent elsewhere. Whatever you decide is the case, you should present and defend why it impacts the responses. For nonresponse to turn into nonresponse bias, the people who do participate must be more likely to answer a certain way than the people who don’t participate.
Additionally, while I’m not assuming this is the case, students often mistakenly believe that getting responses from fewer people than you expect automatically produces an underestimate. It does not. “Underestimate” specifically refers to the proportion/mean/whatever-statistic-is-being-measured being lower than would be reflected in the population. A small and biased sample can produce an overestimate just as easily as an underestimate - perhaps in this scenario we ask a small group of people who live near roads with lots of potholes what they think. They would be likely to support the city’s proposal more than others, and therefore produce an overestimate. [OK, thanks for coming to my TED Talk about bias. On to the next part…]
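If it helps to see the direction argument with numbers, here is a small sketch. Every number in it is hypothetical (a true proportion of 50% and made-up response rates); the point is only that the direction of the bias depends on who is more likely to respond, not on how many people respond.

```python
def expected_respondent_proportion(true_p, yes_response_rate, no_response_rate):
    """Expected share of 'yes' among respondents when the two groups
    respond at different rates. All inputs are hypothetical."""
    yes_responders = true_p * yes_response_rate
    no_responders = (1 - true_p) * no_response_rate
    return yes_responders / (yes_responders + no_responders)


TRUE_P = 0.50  # assumed true proportion of supporters, for illustration only

# Opponents twice as likely to reply: the survey underestimates support.
opponents_louder = expected_respondent_proportion(TRUE_P, 0.30, 0.60)
# Supporters twice as likely to reply: the survey overestimates support.
supporters_louder = expected_respondent_proportion(TRUE_P, 0.60, 0.30)
# Same low response rate for both groups: nonresponse, but no bias.
same_rates = expected_respondent_proportion(TRUE_P, 0.45, 0.45)

print(f"opponents reply more:  {opponents_louder:.3f}")
print(f"supporters reply more: {supporters_louder:.3f}")
print(f"equal response rates:  {same_rates:.3f}")
```

The third case has plenty of nonresponse but no bias, because the people who reply answer the same way, on average, as the people who do not.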
In part #2, we have a little bit of reviewing to do. In part (a), you correctly interpret what a 95% confidence interval is, but that is not the same as a confidence level. A confidence level represents a “long-run capture rate” that is then reflected in each specific confidence interval. You can check out an overview from a previous stream at this link - it’s time-stamped to the part you’d need. The correct answer in this case would sound something like “if we were to take many, many random samples of 300 city residents and ask them the question, about 95% of the confidence intervals we constructed would capture the correct value for p, the proportion of all city residents who would respond yes to the question.”
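The "long-run capture rate" idea can be sketched with a simulation. This is an illustration under assumptions, not part of the FRQ: the true proportion of 0.40 is invented (in real life we never know p), and the function name, seed, and number of samples are my own choices.

```python
import math
import random


def capture_rate(true_p, n, num_samples, z_star=1.96, seed=0):
    """Fraction of simulated 95% intervals that contain true_p."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    captured = 0
    for _ in range(num_samples):
        # Draw one random sample of n residents and count the "yes" answers.
        yes_count = sum(rng.random() < true_p for _ in range(n))
        p_hat = yes_count / n
        # One-proportion z interval: p-hat plus or minus z* times the standard error.
        margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= true_p <= p_hat + margin:
            captured += 1
    return captured / num_samples


# Hypothetical true proportion of 0.40, samples of 300 as in the answer above.
rate = capture_rate(true_p=0.40, n=300, num_samples=2000)
print(f"share of intervals that captured p: {rate:.3f}")
```

The printed share should land close to 0.95. That 95% describes the method across many samples, not the residents and not any single interval.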
For #2 part (b), you’ve also committed a relatively common error, in that while it is true that 50% is in fact in the interval, the presence of other, smaller values in the interval provides evidence against the claim that at least 50% of residents support the proposal. It’s just as plausible that 48.5%, or 49%, or 49.9% would say “yes”. And since all values within a confidence interval are considered “reasonable” values for p, we cannot say with confidence that the true population proportion is at least 50%. We could only say that if the entire interval is 50% or higher - for example, (0.512, 0.592).
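The decision rule in that last sentence is simple enough to write down as a check. In this sketch, the first interval is the example from the paragraph above; the second is a hypothetical interval I made up that contains 50%, since the actual interval from the FRQ is not reproduced here.

```python
def supports_at_least(interval, threshold):
    """True only when every plausible value in the interval is at or above
    the threshold, i.e. the lower bound itself clears it."""
    lower, upper = interval
    return lower >= threshold


example_interval = (0.512, 0.592)       # from the feedback above
hypothetical_interval = (0.460, 0.540)  # made up; contains 0.50

print(supports_at_least(example_interval, 0.50))
print(supports_at_least(hypothetical_interval, 0.50))
```

Only the lower bound matters for an "at least" claim: if it sits below 50%, then values like 48.5% or 49% are still plausible for p.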
In part (c), you give the correct rationale for the “large counts” condition - short and to the point! This would earn full credit.
For part 1, you would likely earn partial credit. When discussing bias on the AP exam, you typically have to do 3 things: (1) explain the source of the bias (“how” it happens), (2) explain the reason for that source existing (“why” it happens), and (3) explain the impact on the result (“what” happens). When reading your response, I see evidence for (1) and (2) when you describe “only 450 of those 100 people replied” and cite the possibility that those who replied “might have different opinions about the plan.” From there, you should “take a stand,” so to speak, and make a conjecture as to how those people would differ in their opinions from the general population, and whether that will produce an overestimate (more likely to say yes) or underestimate (less likely to say yes) of the true proportion. You can actually justify either direction here, as long as your explanation is clear.
Your response in part 2a is strong, and shows a clear understanding of what a confidence level represents. In part 2b, you give the correct answer (“no”) with a correct reason (“50% is contained in the interval”), but you lose me a bit with your last sentence. In theory, we could get any proportion in a sample of 300 residents just by random chance alone. A more direct statement may be something like “this means that 50% is a plausible value for the proportion of all adults who would say yes, and we therefore do not have statistical evidence that the proportion is greater.”
Your response in part 3 is on the money - the approximately normal sampling distribution is why we check that condition!
In part (a), you clearly identify the possibility of non-response bias. However, when discussing bias on the exam, it’s important to pick a direction of the bias - you’re correct that we may end up with an over-representation of strongly opinionated people, but you should “pick a side” as to how those people will land (either more or less in favor of the proposal than the general public), resulting in either an under- or overestimate of the true parameter. In most cases, you can defend either side, as long as you give a reasonable possibility. In parts (b) and (c), you’ve provided correct answers with appropriate context.
Great work! All three parts are complete. In part (a), you named the source of bias, explained how it might impact people’s responses, and connected that to the proportion we were trying to estimate. In part (b), you correctly interpret both parts, and in part (c) you give the correct reason for checking that condition.
When discussing bias like in part (a), you need to take it a step further and explicitly comment on whether you think the sample results in this case are too high (an overestimate of p) or too low (an underestimate of p). In a case like this where it’s not obvious, it’s OK to “pick a side” and just go with it: for example, “citizens who are concerned about tax increases may be more likely to respond and say no, producing an underestimate of the proportion of all citizens who would support the proposal.”
In part 2, your confidence level interpretation is well done (though I think you should say “95% of samples of 161 produce confidence intervals that…”), and you reach the correct conclusion in part b. Part 3 has the correct rationale for the 10 successes/failures condition.