
2.4 Fitting Linear Models to Data

3 min read • June 24, 2024

Scatter plots help us visualize relationships between variables. We can see if they're connected positively, negatively, or not at all. They also show us if the relationship is linear or nonlinear, which is crucial for understanding data patterns.

Linear regression finds the best-fitting straight line through our data points. This line helps us make predictions and understand how changes in one variable affect another. We can also measure how well our model fits the data using tools like the coefficient of determination ($R^2$).

Scatter Plots and Linear Relationships

Scatter plots for variable relationships

  • Graphical representation of data points on a coordinate plane
    • Each point represents a pair of values for two variables (x and y)
    • Independent variable plotted on x-axis, dependent variable plotted on y-axis
  • Visualize relationship between two variables
    • Positive correlation: As x increases, y tends to increase
    • Negative correlation: As x increases, y tends to decrease
    • No correlation: No apparent relationship between x and y
  • Assess strength of correlation visually
    • Strong correlation: Data points closely follow clear pattern
    • Weak correlation: Data points more scattered and deviate from pattern
  • Identify potential outliers that may affect the overall relationship
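The direction of a relationship can also be checked numerically rather than by eye. The following is a minimal sketch, assuming the sign of the sample covariance is used as the indicator; the function name `direction` and the sample data are hypothetical and chosen only for illustration.

```python
def direction(xs, ys):
    """Classify the direction of a relationship by the sign of the covariance sum."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Positive products dominate when x and y tend to move together
    co = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if co > 0:
        return "positive"
    if co < 0:
        return "negative"
    return "none"

# Hypothetical data sets
rising = direction([1, 2, 3, 4], [2, 3, 5, 8])
falling = direction([1, 2, 3, 4], [9, 7, 4, 1])
flat = direction([1, 2, 3, 4], [5, 5, 5, 5])
```

With real, noisy data the sum is almost never exactly zero, so this sketch only indicates direction; it says nothing about how strong the relationship is.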

Linear vs nonlinear relationships

  • Linear relationships:
    • Data points in scatter plot appear to follow straight line
    • Change in y proportional to change in x
    • Example: Relationship between distance traveled and time at constant speed
  • Nonlinear relationships:
    • Data points in scatter plot do not follow straight line pattern
    • Change in y not proportional to change in x
    • Examples:
      • Exponential: Relationship between population size and time under unrestricted growth
      • Quadratic: Relationship between height of thrown object and time
      • Logarithmic: Relationship between perceived loudness and actual intensity of sound
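One way to see the difference between the two kinds of relationship is to compare the slope between neighboring points: it stays constant for a linear relationship and changes for a nonlinear one. The sketch below assumes exact, noise-free data with strictly increasing x-values; the helper names, the speed of 60, and the height formula are hypothetical illustrations.

```python
def consecutive_slopes(xs, ys):
    """Slope between each pair of neighboring points (xs strictly increasing)."""
    points = list(zip(xs, ys))
    return [(y2 - y1) / (x2 - x1) for (x1, y1), (x2, y2) in zip(points, points[1:])]

def looks_linear(xs, ys, tol=1e-9):
    """True when every neighboring slope is the same, up to a small tolerance."""
    slopes = consecutive_slopes(xs, ys)
    return max(slopes) - min(slopes) <= tol

times = [0, 1, 2, 3, 4]
distance = [60 * t for t in times]           # constant speed: linear
height = [20 * t - 5 * t**2 for t in times]  # thrown object: quadratic

distance_is_linear = looks_linear(times, distance)
height_is_linear = looks_linear(times, height)
height_slopes = consecutive_slopes(times, height)
```

For measured data the slopes will vary slightly even when the underlying relationship is linear, so a scatter plot remains the more practical check.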

Linear Regression and Predictions

Line of best fit interpretation

  • Line of best fit (or least squares regression line) minimizes sum of squared vertical distances between line and data points
  • Equation of line of best fit given by $y = mx + b$
    • $m$: slope of line, represents change in y per unit change in x
    • $b$: y-intercept, represents value of y when x is zero
  • Calculate slope ($m$) and y-intercept ($b$) using formulas:
    • $m = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$
    • $b = \bar{y} - m\bar{x}$
      • $\bar{x}$ and $\bar{y}$: means of x and y values
      • $x_i$ and $y_i$: individual data points
      • $n$: number of data points
  • Use line of best fit to make predictions about dependent variable (y) for given value of independent variable (x)
  • Residuals represent the difference between observed and predicted y-values
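The slope and intercept formulas above translate directly into a few lines of code. This is a minimal sketch using only the standard library; the function name `fit_line` and the five data points are hypothetical and not taken from the guide.

```python
def fit_line(xs, ys):
    """Return (m, b) for the least squares line y = m*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator and denominator of the slope formula
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    b = y_bar - m * x_bar
    return m, b

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

m, b = fit_line(xs, ys)
# Observed minus predicted y-value for each point
residuals = [y - (m * x + b) for x, y in zip(xs, ys)]
```

For this data the means are 3 and 4, the slope works out to 6/10 = 0.6, and the intercept to 4 - 0.6(3) = 2.2. The differences between observed and predicted values for a least squares line sum to zero, up to floating-point rounding.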

Linear models for predictions

  • Make prediction using linear model:
    1. Determine equation of line of best fit ($y = mx + b$)
    2. Substitute given x-value into equation to calculate predicted y-value
  • Assess accuracy of linear model using coefficient of determination ($R^2$)
    • $R^2$: proportion of variance in dependent variable explained by linear model
    • $R^2$ ranges from 0 to 1, values closer to 1 indicate better fit
    • $R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$
      • $y_i$: actual y-value for given x-value
      • $\hat{y}_i$: predicted y-value for given x-value
      • $\bar{y}$: mean of y-values
  • Limitations of linear models:
    • May not be appropriate for nonlinear relationships
    • Extrapolation (making predictions outside range of observed data) can lead to inaccurate results
    • Interpolation (making predictions within the range of observed data) is generally more reliable
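The prediction steps and the $R^2$ formula above can be sketched together. The code below assumes a hypothetical five-point data set and hypothetical helper names (`fit_line`, `r_squared`, `predict`); the flag returned by `predict` simply marks whether the requested x-value lies outside the observed range.

```python
def fit_line(xs, ys):
    """Least squares slope and intercept."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    m = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return m, y_bar - m * x_bar

def r_squared(xs, ys, m, b):
    """1 minus (residual sum of squares / total sum of squares)."""
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

def predict(x, m, b, xs):
    """Return the predicted y and whether x lies outside the observed x-range."""
    outside_range = x < min(xs) or x > max(xs)
    return m * x + b, outside_range

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

m, b = fit_line(xs, ys)
r2 = r_squared(xs, ys, m, b)
inside_pred, inside_flag = predict(3.5, m, b, xs)
outside_pred, outside_flag = predict(10, m, b, xs)
```

Here the residual sum of squares is 2.4 and the total sum of squares is 6, so $R^2 = 1 - 2.4/6 = 0.6$. The prediction at x = 10 is still computed, but the flag signals that it falls outside the observed data and should be treated with caution.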

Measures of Model Fit and Correlation

  • Standard error of the estimate: Measures the average deviation of observed y-values from the predicted y-values
  • Correlation coefficient ($r$): Measures the strength and direction of the linear relationship between two variables
  • Both measures provide additional insight into the accuracy and reliability of the linear model
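Both measures can be computed from the same sums used to fit the line. The sketch below is an assumption about how they are typically defined: the first as the standard error of the estimate with an $n - 2$ divisor, the second as the Pearson correlation coefficient. The function names and data are hypothetical.

```python
import math

def standard_error(xs, ys, m, b):
    """sqrt(residual sum of squares / (n - 2)); the n - 2 divisor is a common convention."""
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(ss_res / (len(xs) - 2))

def correlation(xs, ys):
    """Pearson correlation coefficient r, between -1 and 1."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data and its least squares line y = 0.6x + 2.2
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
m, b = 0.6, 2.2

s = standard_error(xs, ys, m, b)
r = correlation(xs, ys)
```

For a simple linear model with one predictor, squaring $r$ gives the same value as $R^2$ (0.6 for this data), which is a useful consistency check.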
© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.

