
2.1 Simple Linear Regression: Theory and Implementation

3 min read · August 7, 2024

Simple linear regression is the foundation of regression analysis. It explores the relationship between two variables, using one to predict the other. This method helps us understand how changes in one variable are associated with changes in another, forming the basis for more complex regression techniques.

In this section, we'll dive into the theory behind simple linear regression and learn how to implement it. We'll cover key concepts such as slope, intercept, least squares, and model fitting, setting the stage for more advanced regression methods in later chapters.

Linear Regression Fundamentals

Linear Relationship and Variables

  • Linear regression models the relationship between a dependent variable and one or more independent variables; simple linear regression uses exactly one independent variable
    • Assumes a linear relationship exists between the variables
    • Dependent variable (response variable) is the variable being predicted or explained by the model
    • Independent variable (predictor variable) is used to predict or explain the dependent variable
  • Dependent and independent variables are represented on a scatter plot
    • Dependent variable is plotted on the y-axis
    • Independent variable is plotted on the x-axis
    • Each data point represents an observation of both variables
  • Goal of linear regression is to find the best-fitting line through the data points
    • Line minimizes the sum of squared differences between observed values and predicted values
    • Differences between observed and predicted values are called residuals
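
To make residuals concrete, here is a minimal Python sketch that scores a candidate line by its residuals. The dataset and the helper names `residuals` and `sum_squared_residuals` are made up for illustration; the line is written as y = m*x + b, and "better" here means a smaller sum of squared residuals.

```python
# Hypothetical toy dataset: each (x, y) pair is one observation
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]


def residuals(xs, ys, m, b):
    """Residual = observed y minus the y predicted by the line y = m*x + b."""
    return [y - (m * x + b) for x, y in zip(xs, ys)]


def sum_squared_residuals(xs, ys, m, b):
    """Total squared residual; smaller means the line sits closer to the points."""
    return sum(r * r for r in residuals(xs, ys, m, b))


# Compare two candidate lines through the same data
print(residuals(xs, ys, m=1.0, b=1.0))              # [0.0, 1.0, 1.0, -1.0, -1.0]
print(sum_squared_residuals(xs, ys, m=1.0, b=1.0))  # 4.0
print(sum_squared_residuals(xs, ys, m=0.6, b=2.2))  # about 2.4, so this line fits better
```

Positive residuals mean the point lies above the line and negative residuals mean it lies below; squaring them before summing keeps the two from cancelling.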

Slope and Intercept

  • Slope represents the change in the dependent variable for a one-unit increase in the independent variable
    • Indicates the steepness of the line
    • Positive slope means the dependent variable increases as the independent variable increases
    • Negative slope means the dependent variable decreases as the independent variable increases
  • Intercept represents the value of the dependent variable when the independent variable is zero
    • Point where the line crosses the y-axis
  • Slope and intercept define the linear equation: $y = mx + b$
    • $y$ is the dependent variable
    • $m$ is the slope
    • $x$ is the independent variable
    • $b$ is the intercept
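
The interpretations of slope and intercept can be checked directly by evaluating $y = mx + b$. This is a small sketch with hypothetical coefficients; `predict` is an illustrative helper, not part of any library.

```python
def predict(x, m, b):
    """Evaluate the linear equation y = m*x + b at a given x."""
    return m * x + b


# Hypothetical coefficients chosen for illustration
m, b = 0.6, 2.2

# Intercept: the predicted y when x is zero
print(predict(0.0, m, b))  # 2.2

# Slope: moving x up by one unit changes the prediction by m (up to float rounding)
print(predict(4.0, m, b) - predict(3.0, m, b))

# A negative slope makes y fall as x rises
print(predict(1.0, -2.0, 10.0), predict(2.0, -2.0, 10.0))  # 8.0 6.0
```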

Model Fitting and Evaluation

Least Squares Method and Correlation Coefficient

  • Least squares method is used to find the best-fitting line by minimizing the sum of squared residuals
    • Residuals are the differences between observed values and predicted values
    • Squaring the residuals ensures that positive and negative residuals do not cancel each other out
    • The resulting line is the closest fit to the data points in the least squares sense
  • Correlation coefficient ($r$) measures the strength and direction of the linear relationship between two variables
    • Ranges from -1 to +1
    • Positive correlation indicates a positive relationship (as one variable increases, the other increases)
    • Negative correlation indicates a negative relationship (as one variable increases, the other decreases)
    • Correlation coefficient of 0 indicates no linear relationship between the variables (a nonlinear relationship may still exist)
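
For a single predictor, minimizing the sum of squared residuals has a closed-form solution: $m = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$ and $b = \bar{y} - m\bar{x}$. The correlation coefficient reuses the same centered sums: $r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2}}$. The Python sketch below computes both from scratch on a made-up dataset; the function names are hypothetical. On Python 3.10 or later, the standard library's `statistics.linear_regression` and `statistics.correlation` are a convenient cross-check.

```python
import math


def _centered_sums(xs, ys):
    """Return (x_mean, y_mean, S_xy, S_xx, S_yy) for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    s_yy = sum((y - y_mean) ** 2 for y in ys)
    return x_mean, y_mean, s_xy, s_xx, s_yy


def fit_least_squares(xs, ys):
    """Slope m and intercept b that minimize the sum of squared residuals."""
    x_mean, y_mean, s_xy, s_xx, _ = _centered_sums(xs, ys)
    if s_xx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    m = s_xy / s_xx
    b = y_mean - m * x_mean  # the fitted line passes through (x_mean, y_mean)
    return m, b


def pearson_r(xs, ys):
    """Pearson correlation coefficient, always between -1 and +1."""
    _, _, s_xy, s_xx, s_yy = _centered_sums(xs, ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical toy dataset
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
print(fit_least_squares(xs, ys))        # approximately (0.6, 2.2)
print(pearson_r(xs, ys))                # approximately 0.775 (positive relationship)
print(pearson_r(xs, [-y for y in ys]))  # approximately -0.775 (negative relationship)
```

Note that the slope and $r$ always share the same sign, because both have $\sum_i (x_i - \bar{x})(y_i - \bar{y})$ in the numerator and a positive denominator.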

R-squared and Residuals

  • R-squared (coefficient of determination) measures the proportion of variance in the dependent variable explained by the independent variable(s)
    • Ranges from 0 to 1 for a least squares fit that includes an intercept
    • Higher R-squared values indicate a better fit of the model to the data
    • R-squared of 1 means the model explains all the variability in the dependent variable
    • R-squared of 0 means the model does not explain any variability in the dependent variable
  • Residuals are the differences between the observed values and the predicted values from the model
    • Used to assess the goodness of fit and identify outliers or unusual observations
    • For valid inference, residuals should be approximately normally distributed with constant variance (homoscedasticity)
    • Plotting residuals against predicted values can help identify patterns or issues with the model
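
R-squared can be computed as $R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}$, where $SS_{\text{res}}$ is the sum of squared residuals and $SS_{\text{tot}} = \sum_i (y_i - \bar{y})^2$ is the total variation in $y$. The sketch below assumes a small made-up dataset whose least squares estimates work out to $m = 0.6$ and $b = 2.2$, and it also lists the residual-versus-predicted pairs that a residual plot would display. For simple linear regression fitted by least squares, $R^2$ equals $r^2$.

```python
# Hypothetical toy dataset; m and b are its least squares estimates
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
m, b = 0.6, 2.2


def r_squared(xs, ys, m, b):
    """Proportion of the variance in y explained by the line y = m*x + b."""
    y_mean = sum(ys) / len(ys)
    predicted = [m * x + b for x in xs]
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))  # unexplained variation
    ss_tot = sum((y - y_mean) ** 2 for y in ys)                 # total variation
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when y is constant")
    return 1.0 - ss_res / ss_tot


print(r_squared(xs, ys, m, b))  # approximately 0.6

# Residual-vs-predicted pairs: the points you would draw on a residual plot.
# A healthy plot shows no trend and a roughly even vertical spread.
for x, y in zip(xs, ys):
    predicted = m * x + b
    print(f"predicted={predicted:.2f}  residual={y - predicted:+.2f}")
```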

Model Application

Prediction and Interpretation

  • Once a linear regression model is fitted, it can be used to make predictions for new observations
    • Plug the value of the independent variable into the linear equation to predict the dependent variable
    • Predictions assume the relationship observed in the data also holds for the new observations, so extrapolating far beyond the observed range of the independent variable is risky
  • Interpreting the coefficients (slope and intercept) provides insights into the relationship between variables
    • Slope indicates the change in the dependent variable for a one-unit change in the independent variable
    • Intercept represents the predicted value of the dependent variable when the independent variable is zero
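  • Confidence intervals and prediction intervals can be calculated to quantify the uncertainty in the estimates and predictions
    • Confidence intervals give a range of plausible values for the population parameters (slope and intercept) or for the mean response at a given value of the independent variable
    • Prediction intervals give a range for an individual new observation; they are wider than confidence intervals for the mean because they also account for the scatter of individual points around the line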
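
The sketch below fits the line, makes a point prediction at a new value `x_new`, and builds the standard t-based intervals from ordinary least squares theory, assuming normally distributed residuals with constant variance. With residual standard error $s = \sqrt{SS_{\text{res}}/(n-2)}$ and $S_{xx} = \sum_i (x_i - \bar{x})^2$, the slope interval is $m \pm t^{*} s/\sqrt{S_{xx}}$, the intercept interval is $b \pm t^{*} s\sqrt{1/n + \bar{x}^2/S_{xx}}$, the interval for the mean response at $x_0$ is $\hat{y}_0 \pm t^{*} s\sqrt{1/n + (x_0 - \bar{x})^2/S_{xx}}$, and the prediction interval adds 1 under that square root. The critical value `t_crit` is an input here: it must come from a t table or a library such as `scipy.stats.t.ppf(0.975, n - 2)` for 95% coverage. The value 3.182 used below is the two-sided 95% value for 3 degrees of freedom; the dataset and function name are hypothetical.

```python
import math


def fit_with_intervals(xs, ys, x_new, t_crit):
    """OLS fit plus t-based intervals; t_crit is supplied by the caller."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three paired observations")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    m = s_xy / s_xx
    b = y_mean - m * x_mean

    # Residual standard error uses n - 2 degrees of freedom (two estimated parameters)
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(ss_res / (n - 2))

    se_m = s / math.sqrt(s_xx)
    se_b = s * math.sqrt(1 / n + x_mean ** 2 / s_xx)

    y_hat = m * x_new + b
    h = 1 / n + (x_new - x_mean) ** 2 / s_xx  # grows as x_new moves away from x_mean
    se_mean = s * math.sqrt(h)                # uncertainty in the mean response
    se_pred = s * math.sqrt(1 + h)            # adds the scatter of a single new point

    def interval(center, se):
        return (center - t_crit * se, center + t_crit * se)

    return {
        "slope": m,
        "intercept": b,
        "prediction": y_hat,
        "slope_ci": interval(m, se_m),
        "intercept_ci": interval(b, se_b),
        "mean_response_ci": interval(y_hat, se_mean),
        "prediction_interval": interval(y_hat, se_pred),
    }


# Hypothetical toy dataset (n = 5, so df = 3)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
result = fit_with_intervals(xs, ys, x_new=3.5, t_crit=3.182)
for name, value in result.items():
    print(name, value)
```

The prediction interval is always wider than the mean-response interval, and both widen as `x_new` moves away from the mean of the observed x values, which is one reason extrapolation is risky.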
© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.

