11.2 Standardized Testing and High-Stakes Assessment
3 min read • August 7, 2024
Standardized tests are a big deal in education. They're used for everything from college admissions to measuring school performance. But they're not without controversy. High-stakes tests can have major consequences for students, teachers, and schools.
Test quality is super important. Good tests need to be reliable (give consistent results) and valid (measure what they're supposed to). But there are concerns about bias, teaching to the test, and the negative effects of test anxiety on students.
Types of Standardized Tests and Scores
Standardized Tests and Scoring Methods
Standardized tests are assessments administered and scored in a consistent manner for all test-takers
Ensures results are comparable across individuals and groups
Commonly used for educational admissions, placement, and accountability purposes
High-stakes testing refers to standardized tests whose results carry significant consequences for test-takers, teachers, or schools
Can determine grade promotion, graduation, teacher evaluations, or school funding
Norm-referenced scores compare an individual's performance to a reference group or norm group
Percentile ranks indicate the percentage of individuals in the norm group who scored at or below a given score (a percentile rank of 65 means the student scored as high as or higher than 65% of the norm group)
Standard scores convert raw scores to a common scale based on the norm group's mean and standard deviation (IQ scores, SAT scores)
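Both kinds of norm-referenced score can be computed directly from a norm group's raw scores. The Python sketch below is illustrative only: the function names and the tiny norm group are hypothetical, the percentile rank follows the "at or below" definition given here (some publishers instead count only half of tied scores), and the norm group is treated as a population when computing its standard deviation. The rescaling uses an IQ-style scale with mean 100 and standard deviation 15; operational tests derive norms from large representative samples and publish their own conversion tables.

```python
from statistics import mean, pstdev

# Hypothetical helpers; real tests use published norm tables instead.

def percentile_rank(score, norm_scores):
    """Percentage of the norm group scoring at or below `score`."""
    at_or_below = sum(1 for s in norm_scores if s <= score)
    return 100 * at_or_below / len(norm_scores)

def z_score(raw, norm_scores):
    """Distance from the norm-group mean in (population) SD units."""
    return (raw - mean(norm_scores)) / pstdev(norm_scores)

def standard_score(raw, norm_scores, new_mean=100, new_sd=15):
    """Rescale a z-score to a common scale (default: IQ-style, mean 100, SD 15)."""
    return new_mean + new_sd * z_score(raw, norm_scores)

norm_group = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical raw scores (mean 5, SD 2)
print(percentile_rank(5, norm_group))  # 75.0
print(standard_score(9, norm_group))   # 130.0
```

A raw score of 9 sits two standard deviations above this norm group's mean, so it maps to 130 on the IQ-style scale regardless of how many raw points the test has.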
Test Quality and Fairness
Reliability and Validity
Reliability is the consistency and stability of test scores across different occasions, forms, or raters
Test-retest reliability assesses consistency of scores over time by administering the same test twice
Parallel-forms (alternate-forms) reliability evaluates consistency between different versions of a test
Inter-rater reliability measures agreement between multiple raters or scorers
Validity is the extent to which a test measures what it claims to measure and supports appropriate score interpretations and uses
Content validity ensures test items adequately represent the domain of interest
Criterion-related validity compares test scores to other relevant criteria or outcomes (e.g., college GPA)
Construct validity evaluates how well the test measures the underlying theoretical construct (e.g., intelligence)
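Reliability and criterion-related validity are usually summarized as coefficients. A common approach, sketched below in Python under simplifying assumptions, is to correlate two sets of scores from the same students (test-retest or parallel-forms reliability, or a validity coefficient against an outcome such as college GPA) and, when two raters assign categorical scores, to report Cohen's kappa, which corrects raw agreement for chance. These are only two of several indices in use (intraclass correlations and Cronbach's alpha are others), and the function names and sample data are hypothetical.

```python
from collections import Counter
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two paired lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters' categorical ratings, corrected for chance."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: products of the raters' category counts, summed, over n^2.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    # Undefined (division by zero) if expected agreement is exactly 1.
    return (observed - expected) / (1 - expected)

# Hypothetical data: five students tested twice (test-retest).
first = [1, 2, 3, 4, 5]
second = [2, 1, 4, 3, 5]
print(round(pearson_r(first, second), 2))  # 0.8

# Hypothetical data: two raters scoring six essays pass/fail.
rater_a = ["pass", "pass", "fail", "pass", "fail", "pass"]
rater_b = ["pass", "fail", "fail", "pass", "fail", "pass"]
print(round(cohens_kappa(rater_a, rater_b), 2))  # 0.67
```

The raters agree on 5 of 6 essays (83%), but because their pass/fail proportions alone would produce 50% agreement by chance, kappa is a more modest 0.67.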
Bias, Accommodations, and Item Analysis
Test bias occurs when a test systematically disadvantages certain subgroups due to factors unrelated to the construct being measured
Can result from inappropriate content, language, or cultural references that favor some groups over others
Differential item functioning (DIF) analysis identifies items that perform differently across subgroups matched on overall ability
Accommodations are changes to test administration that remove construct-irrelevant barriers for students with disabilities or English language learners
Extra time, separate setting, read-aloud
Modifications are changes to the test itself that alter what the test measures
Simplified language, reduced answer choices
Item analysis examines performance on individual test questions to evaluate their quality and functioning
Difficulty, discrimination, distractor analysis
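DIF analysis compares two groups' performance on a single item after matching students on overall ability, often using total test score as the matching variable. The Python sketch below uses the Mantel-Haenszel procedure, one widely used DIF method (logistic regression and IRT-based methods are alternatives, and the text above does not specify one). The record format, the "ref"/"focal" labels, and the data are hypothetical. A common odds ratio near 1 suggests little DIF; a ratio above 1 means reference-group students answer correctly more often than focal-group students with the same total score. The second value, -2.35 ln(alpha), is the ETS delta-scale transformation, on which negative values indicate an item favoring the reference group.

```python
from collections import defaultdict
from math import log

def mantel_haenszel_dif(records):
    """
    records: (group, total_score, item_correct) tuples, where group is "ref" or
    "focal" and item_correct is True/False for the item being studied.
    Returns (MH common odds ratio, MH D-DIF on the ETS delta scale).
    """
    # One 2x2 table per total-score stratum:
    # [ref right, ref wrong, focal right, focal wrong]
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for group, total, correct in records:
        idx = (0 if group == "ref" else 2) + (0 if correct else 1)
        strata[total][idx] += 1

    num = den = 0.0
    for a, b, c, d in strata.values():
        n = a + b + c + d
        num += a * d / n  # ref right x focal wrong
        den += b * c / n  # ref wrong x focal right
    # den == 0 raises ZeroDivisionError; alpha == 0 makes log() raise ValueError.
    alpha = num / den
    return alpha, -2.35 * log(alpha)

# Hypothetical data: in both score strata the reference group does better.
records = []
for score in (1, 2):
    records += [("ref", score, True)] * 3 + [("ref", score, False)]
    records += [("focal", score, True)] * 2 + [("focal", score, False)] * 2

alpha, d_dif = mantel_haenszel_dif(records)
print(alpha)            # 3.0
print(round(d_dif, 2))  # -2.58
```

Stratifying by total score is what separates DIF from a simple group difference: an item is flagged only when equally able students from the two groups perform differently on it.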
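The three item-analysis statistics can be computed from a matrix of student responses and an answer key. In the Python sketch below, difficulty is the proportion of students answering correctly (higher means easier), discrimination is the upper group's proportion correct minus the lower group's, and distractor analysis tallies which options each group chose. Forming upper and lower groups from the top and bottom 27% of total scores is a common convention rather than something the text above specifies; the function name and sample data are hypothetical, and point-biserial correlations are a frequent alternative measure of discrimination.

```python
from collections import Counter

def item_analysis(responses, key, group_fraction=0.27):
    """
    responses: one list of chosen options per student, e.g. ["A", "C", "B"].
    key: the correct option for each item.
    Returns one dict per item: difficulty, discrimination, option counts.
    """
    totals = [sum(r == k for r, k in zip(resp, key)) for resp in responses]
    # Rank students by total score; ties at a group boundary follow input order.
    order = sorted(range(len(responses)), key=lambda i: totals[i])
    g = max(1, round(group_fraction * len(responses)))
    lower, upper = order[:g], order[-g:]

    results = []
    for j, correct in enumerate(key):
        answers = [resp[j] for resp in responses]
        p = sum(a == correct for a in answers) / len(answers)
        p_upper = sum(responses[i][j] == correct for i in upper) / g
        p_lower = sum(responses[i][j] == correct for i in lower) / g
        results.append({
            "item": j + 1,
            "difficulty": p,
            "discrimination": p_upper - p_lower,
            # Distractor analysis: which options each group chose.
            "upper_choices": Counter(responses[i][j] for i in upper),
            "lower_choices": Counter(responses[i][j] for i in lower),
        })
    return results

key = ["A", "B", "C"]
responses = [
    ["A", "B", "C"],
    ["A", "B", "D"],
    ["A", "D", "D"],
    ["B", "D", "A"],
]
# With only four students, split them in half rather than using 27%.
for row in item_analysis(responses, key, group_fraction=0.5):
    print(row["item"], row["difficulty"], row["discrimination"])
# 1 0.75 0.5
# 2 0.5 1.0
# 3 0.25 0.5
```

In this made-up data, item 3 is the hardest, and option D draws students from both groups, which is the kind of pattern distractor analysis is meant to surface for review.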
Issues and Concerns
Negative Consequences of High-Stakes Testing
Teaching to the test narrows the curriculum and instruction to focus on tested content at the expense of other important skills and subjects
Drill and practice on test-like items
Neglect of non-tested subjects (arts, physical education)
Test anxiety can impair performance and disproportionately impact some students
Excessive worry, nervousness, and physiological arousal during testing
More common among female students, minority students, and lower-achieving students
Overemphasis on test scores can lead to unintended consequences, including attempts to game the system
Cheating, exclusion of low-performing students, artificial score inflation