Box plots are powerful tools for visualizing data distributions. They show the five-number summary, helping you quickly grasp the center, spread, and shape of your data. Understanding how to build and read box plots is key to spotting trends and outliers.

Mastering box plots opens up a world of data insights. You'll be able to compare datasets, identify skewness, and spot unusual values at a glance. This skill is crucial for making informed decisions and communicating findings effectively in various fields.

Box plot components

Five-number summary and interquartile range (IQR)

  • A box plot visually represents the five-number summary of a dataset (minimum value, first quartile (Q1), median, third quartile (Q3), and maximum value)
  • The box spans the interquartile range (IQR), the range between Q1 and Q3, containing the middle 50% of the data
  • Q1 is the value below which 25% of the data falls, while Q3 is the value above which 25% of the data lies
  • The IQR is calculated by subtracting Q1 from Q3: $IQR = Q3 - Q1$
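
As a minimal sketch of the summary described above, the following Python computes the five numbers and the IQR with the standard library. The function name `five_number_summary` and the sample data are made up for illustration. The quartiles use the `inclusive` method of `statistics.quantiles`, which is an assumption: textbooks and software follow several quartile conventions, so results can differ slightly for the same data.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    values = sorted(data)
    # Quartile conventions vary; "inclusive" is one common choice (assumed here).
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return values[0], q1, median, q3, values[-1]

sample = [7, 1, 12, 4, 15, 3, 10, 6, 8]  # hypothetical data
minimum, q1, median, q3, maximum = five_number_summary(sample)
iqr = q3 - q1  # IQR = Q3 - Q1
print(minimum, q1, median, q3, maximum, iqr)
```

For this sample the quartiles land on the data values 4, 7, and 10, so the IQR is 6.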

Median, whiskers, and outliers

  • The median, the middle value when the dataset is arranged in ascending or descending order, is represented by a line inside the box
  • Whiskers extend from the box to the smallest and largest data values that lie within 1.5 times the IQR of the box, that is, between the fences $Q1 - 1.5 \times IQR$ and $Q3 + 1.5 \times IQR$
  • Data points outside the whiskers' range are considered outliers and are plotted as individual points
  • Outliers are data points significantly different from the rest of the data (unusually high or low values compared to the majority of the dataset)
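
The 1.5 times IQR rule above can be sketched in a few lines of Python. The function name `whiskers_and_outliers`, the returned dictionary keys, and the data are hypothetical, and the quartile method (`inclusive`) is an assumption rather than the only valid convention.

```python
import statistics

def whiskers_and_outliers(data):
    """Return the fences, whisker ends, and outliers under the 1.5 x IQR rule."""
    values = sorted(data)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme data values still inside the fences.
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "fences": (lower_fence, upper_fence),
        "whiskers": (inside[0], inside[-1]),
        "outliers": outliers,
    }

result = whiskers_and_outliers([1, 3, 4, 6, 7, 8, 10, 12, 30])  # hypothetical data
print(result)
```

Here Q1 is 4 and Q3 is 10, so the fences sit at -5 and 19. The whiskers end at the data values 1 and 12, not at the fences themselves, and 30 is flagged as an outlier.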

Constructing box plots

Calculating the five-number summary and IQR

  • Arrange the data in ascending order
  • Determine the minimum value, Q1, median, Q3, and maximum value
  • Calculate the IQR by subtracting Q1 from Q3
  • Identify any data points more than 1.5 times the IQR below Q1 or above Q3 as potential outliers

Drawing the box plot

  • Draw a vertical or horizontal axis, depending on the desired orientation, scaled to cover the minimum and maximum values
  • Draw a box with the bottom edge at Q1 and the top edge at Q3 (left and right edges for a horizontal plot), keeping the width consistent when comparing multiple box plots
  • Mark the median inside the box with a line
  • Draw whiskers from the box to the smallest and largest data values that lie within 1.5 times the IQR of Q1 and Q3
  • Plot any outliers as individual points beyond the whiskers
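
To show how the drawing steps above map values to positions, here is a text-only sketch that renders a horizontal box plot on one line. In practice a plotting library would do this; the function `ascii_box_plot`, its `width` parameter, and the characters used are all invented for illustration, and the quartile method is again an assumption.

```python
import statistics

def ascii_box_plot(data, width=30):
    """Render a horizontal box plot as a single line of text (illustrative only)."""
    values = sorted(data)
    lo, hi = values[0], values[-1]
    if lo == hi:
        raise ValueError("need at least two distinct values to scale the axis")
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = [v for v in values if v < lower_fence or v > upper_fence]

    def col(v):
        # Map a data value to a column index on the text axis.
        return round((v - lo) / (hi - lo) * (width - 1))

    row = [" "] * width
    for c in range(col(inside[0]), col(inside[-1]) + 1):
        row[c] = "-"                    # whiskers
    row[col(inside[0])] = "|"           # lower whisker end
    row[col(inside[-1])] = "|"          # upper whisker end
    for c in range(col(q1), col(q3) + 1):
        row[c] = "="                    # box spanning the IQR
    row[col(q1)], row[col(q3)] = "[", "]"
    row[col(median)] = "M"              # median line
    for v in outliers:
        row[col(v)] = "o"               # outliers as individual points
    return "".join(row)

plot = ascii_box_plot([1, 3, 4, 6, 7, 8, 10, 12, 30])  # hypothetical data
print(plot)
```

With this data and width, each unit on the axis is one character: the box runs from 4 to 10 with the median at 7, the whiskers end at 1 and 12, and the value 30 appears as a lone `o` at the far right.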

Interpreting box plot distributions

Shape and symmetry

  • Examine the position of the median within the box and the lengths of the whiskers to infer the shape of the distribution
  • A symmetric distribution has a box with the median line close to the center and whiskers of approximately equal length on both sides (as in a bell-shaped curve)
  • A skewed distribution has a box with the median line closer to one end and one whisker longer than the other (right-skewed: longer upper whisker, left-skewed: longer lower whisker)
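
One rough way to turn the rule of thumb above into a check is to measure how far the median sits from the center of the box. The sketch below uses the quartile (Bowley) skewness coefficient, $((Q3 - M) - (M - Q1)) / IQR$; the function name `describe_shape` and the `tolerance` cutoff of 0.1 are arbitrary choices for illustration. It looks only at the box, so whisker lengths would still need to be compared by eye or in the same way.

```python
import statistics

def describe_shape(data, tolerance=0.1):
    """Classify a distribution as symmetric or skewed from its quartiles."""
    q1, median, q3 = statistics.quantiles(sorted(data), n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        return "undetermined"  # the box has no length, so position is meaningless
    # Positive when the median sits nearer Q1 (longer upper half of the box).
    skew = ((q3 - median) - (median - q1)) / iqr
    if skew > tolerance:
        return "right-skewed"
    if skew < -tolerance:
        return "left-skewed"
    return "roughly symmetric"

# Hypothetical datasets
print(describe_shape([1, 2, 3, 4, 5, 6, 7, 8, 9]))
print(describe_shape([1, 2, 2, 3, 3, 4, 6, 9, 15]))
print(describe_shape([1, 7, 10, 12, 13, 14, 14, 15, 15]))
```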

Center and spread

  • The median line inside the box represents the center or typical value of the distribution
  • The spread of the distribution is indicated by the length of the box (IQR) and the total range between the minimum and maximum values, including outliers
  • A larger IQR or total range suggests a wider spread of data, while a smaller IQR or total range indicates a narrower spread
  • Comparing box plots of different datasets can reveal differences in center and spread (median values, IQRs, and overall ranges)

Outliers and their impact

Identifying and investigating outliers

  • Outliers are data points that fall more than 1.5 times the IQR below Q1 or above Q3
  • Investigate the nature and cause of outliers to determine if they are genuine data points (rare or unusual cases) or the result of errors (measurement errors or data entry mistakes)
  • Consider whether outliers should be included, excluded, or treated separately in the analysis based on their origin and the research question

Impact on statistical measures and analysis

  • Outliers can significantly affect statistical measures: they pull the mean toward the extreme values and inflate the standard deviation
  • When outliers are present, consider using robust statistical methods less sensitive to outliers (median or trimmed mean) instead of the mean
  • Report the presence of outliers and their potential impact on the results for transparency and accurate interpretation of the data
  • Example: In a dataset of house prices, a few luxury mansions (outliers) can greatly increase the mean price, making it less representative of the typical house price in the area
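
The house price example can be checked with a few lines of Python. The prices below are invented for illustration (in thousands of dollars) and do not come from any real dataset.

```python
import statistics

# Hypothetical house prices in thousands of dollars; the last two are mansions.
prices = [250, 260, 275, 280, 290, 300, 310, 2500, 4000]
typical = prices[:-2]  # the same list without the two mansions

mean_all = statistics.mean(prices)
median_all = statistics.median(prices)
mean_typical = statistics.mean(typical)
median_typical = statistics.median(typical)

print(f"with mansions:    mean={mean_all:.1f}, median={median_all}")
print(f"without mansions: mean={mean_typical:.1f}, median={median_typical}")
```

Adding the two mansions more than triples the mean, while the median moves only from 280 to 290, which is why the median is the more representative summary here.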
© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.

