Regression

  • Regression
    Analysis using correlation to make predictions
  • In this lesson

    1. Learn how to assess the relationship between a dependent variable and one or more explanatory variables
    2. Learn how to predict a person's score on the criterion variable from their scores on one or more explanatory variables
    3. Learn how to use confidence limits when analyzing data with multiple regression
  • Agenda
    • 10.1 An Introduction to the linear model (regression)
    • 10.2 Bias in linear models
    • 10.3 Generalizing the model
    • 10.4 Sample Size and the linear model
    • 10.5 Fitting a Linear Model: The General Procedure
    • 10.6 Assumptions of regression analysis
    • 10.7 Simple linear regression
    • 10.8 Multiple regression
    • 10.9 Reporting Linear Regression
  • Explanatory and criterion variables

    • Explanatory (predictor, independent) variable
    • Criterion (outcome, dependent) variable
  • The linear model with one predictor

    • Criterion variable (Y)
    • Explanatory variable (X)
  • Linear Regression
    A method by which we fit a straight line to the data
  • Regression line
    The line of best fit
  • As x increases by 1

    y increases by 10
  • Regression Equation
    y = a + bx
  • Linear Equations

    • Y = bX + a
    • $Y_i = (\beta_1 X_i + \beta_0) + \varepsilon_i$
  • Linear relationship between X and Y
    • Slope (𝛽1 or b) - gradient of the line
    • Intercept (𝛽0 or a) - the point at which the line crosses the vertical (y) axis, i.e. the value of y when x = 0
  • Regression Equation

    • Shows how y changes as a result of x changing
    • The steeper the slope, the more y changes as a result of x
  • What is someone's predicted score on y when their score on x = 20? Assume a = 5 and b = 2.
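    Worked through: substituting into y = a + bx gives y = 5 + 2(20) = 45. The same arithmetic as a minimal Python sketch (the function name `predict` is illustrative, not from the source):

```python
def predict(a: float, b: float, x: float) -> float:
    """Predicted score on y from the regression equation y = a + bx."""
    return a + b * x


# Values from the card: intercept a = 5, slope b = 2, score on x = 20
print(predict(a=5, b=2, x=20))  # 5 + 2 * 20 -> 45
```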
  • As x increases
    y decreases
  • As x increases by 1

    y decreases by 3
  • Predict the score of a person who watched 3.5 hours of TV per night, given y = 18 - 3x
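    Worked through: y = 18 - 3(3.5) = 18 - 10.5 = 7.5. Here b = -3, so each extra hour of TV lowers the predicted score by 3. A minimal Python sketch (`predict` is an illustrative name):

```python
def predict(a: float, b: float, x: float) -> float:
    """Predicted score from y = a + bx; a negative b means y falls as x rises."""
    return a + b * x


hours_of_tv = 3.5  # score on x from the card
# y = 18 - 3x is y = a + bx with a = 18 and b = -3
print(predict(a=18, b=-3, x=hours_of_tv))  # 18 - 3 * 3.5 -> 7.5
```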
  • For every one-unit increase in x
    y increases by 5
  • Non-Perfect Relationships

    • Draw the line in the best place possible: where the dots, taken together, lie as close to the line as possible → line of best fit
  • How do you know the values of a and b?
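    They are estimated from the data by least squares. For one predictor this has a standard closed-form answer (the textbook result, stated here rather than taken from the card), with \bar{X} and \bar{Y} the sample means:

```latex
b = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2},
\qquad
a = \bar{Y} - b\,\bar{X}
```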
  • The linear model with several predictors

    • Notice the second predictor (X2) and the associated parameter (b2)
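    The equation itself is not reproduced in this text. Written in the same form as the one-predictor model, the conventional two-predictor version is below; each b_j is the change in Y for a one-unit change in X_j with the other predictor held constant.

```latex
Y_i = (b_0 + b_1 X_{1i} + b_2 X_{2i}) + \varepsilon_i
```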
  • What do we do in regression?

    1. We estimate the model
    2. To determine how well a line fits the data points, the first step is to define mathematically the distance between the line and each data point
    3. We could assess the fit of a model by looking at the deviations between the model and data collected
  • Residuals
    The differences between what the model predicts and the observed data: each actual score minus its predicted score.
  • Residual sum of squares (SSR)
    A gauge of how well a linear model fits the data
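    A minimal Python sketch of residuals and SSR, assuming a small hypothetical data set and the line y = 2.2 + 0.6x (which happens to be the least-squares line for these points); `residuals` and `ssr` are illustrative names:

```python
def residuals(xs, ys, a, b):
    """Observed score minus predicted score (a + b*x) for each case."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def ssr(xs, ys, a, b):
    """Residual sum of squares: the squared residuals added up."""
    return sum(e ** 2 for e in residuals(xs, ys, a, b))


xs = [1, 2, 3, 4, 5]  # hypothetical predictor scores
ys = [2, 4, 5, 4, 5]  # hypothetical outcome scores
print([round(e, 2) for e in residuals(xs, ys, a=2.2, b=0.6)])  # [-0.8, 0.6, 1.0, -0.6, -0.2]
print(round(ssr(xs, ys, a=2.2, b=0.6), 2))  # 2.4
```

    Squaring keeps positive and negative residuals from cancelling out, so a smaller SSR means the line sits closer to the data overall.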
  • Estimating the model: Methods of Least Squares
    1. The best-fitting line is the one that has the smallest total squared error
    2. This line is called the least-squared-error solution
    3. For each value of X in the data, this equation determines the point on the line that gives the best prediction of Y
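    The least-squares solution for one predictor can be sketched in Python as below. The data are hypothetical and `fit_least_squares` is an illustrative name; the final loop checks that nudging a or b away from the solution only increases the total squared error.

```python
def fit_least_squares(xs, ys):
    """Return (a, b) that minimize the sum of squared residuals for y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def ssr(xs, ys, a, b):
    """Total squared error of the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]
a, b = fit_least_squares(xs, ys)
print(round(a, 2), round(b, 2))  # 2.2 0.6

best = ssr(xs, ys, a, b)
# Moving either parameter away from the least-squares solution increases the error
for da, db in [(0.1, 0), (-0.1, 0), (0, 0.05), (0, -0.05)]:
    assert ssr(xs, ys, a + da, b + db) > best
```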
  • Standard error of estimate

    The standard distance between the predicted Y values on the regression line and the actual Y values in the data
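    A sketch of the usual computation, assuming one predictor: the square root of SSR divided by its degrees of freedom, n - 2 (one degree of freedom is lost for each estimated parameter, a and b). Data and function name are hypothetical:

```python
import math


def standard_error_of_estimate(xs, ys):
    """sqrt(SS_residual / df) for a one-predictor least-squares line, df = n - 2."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    ss_residual = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(ss_residual / (n - 2))


xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]
print(round(standard_error_of_estimate(xs, ys), 3))  # sqrt(2.4 / 3) -> 0.894
```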
  • Assessing the goodness of fit: sums of squares, R and R²
    1. SSR tells us how much error there is in a model, but it does not tell us whether using the model is better than nothing
    2. We need to compare the model against a baseline to see whether it "improves" how well we can predict the outcome
    3. We fit the baseline model, which uses the mean of the outcome as the predicted score for every case
    4. Then we fit the best model and calculate its error, SSR
    5. If the model is good, it should have significantly less error than the baseline model
  • SST (Total sum of squares)

    The sum of squared differences between the observed outcome scores and their mean; represents how good the mean is as a model of the observed outcome scores
  • SSR (residual sum of squares)
    Can be used to calculate how much better the linear model is than the baseline model of "no relationship"
  • SSM (model sum of squares)
    The improvement due to the model (SST - SSR). If the value is large, the linear model is very different from using the mean to predict the outcome variable
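  • R²
    The proportion of improvement due to the model: R² = SSM/SST, the proportion of the total variation in the outcome that the model explains (often multiplied by 100 and expressed as a percentage)
  • F-test
    Based upon the ratio of the improvement due to the model (SSM) to the error in the model (SSR), each divided by its degrees of freedom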
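    The baseline-versus-model procedure and the quantities above can be sketched for one predictor, assuming hypothetical data; k is the number of predictors, and F is the model mean square (SSM/k) over the residual mean square (SSR/(n - k - 1)):

```python
def goodness_of_fit(xs, ys):
    """Compare a one-predictor least-squares line against the mean-only baseline."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar

    sst = sum((y - y_bar) ** 2 for y in ys)                    # error of the baseline (the mean)
    ssr = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # error left after the model
    ssm = sst - ssr                                            # improvement due to the model
    r_squared = ssm / sst

    k = 1                                # number of predictors
    ms_model = ssm / k                   # df for the model = k
    ms_residual = ssr / (n - k - 1)      # df for the residuals = n - k - 1
    f_ratio = ms_model / ms_residual
    return sst, ssr, ssm, r_squared, f_ratio


xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]
sst, ssr, ssm, r_squared, f_ratio = goodness_of_fit(xs, ys)
print(f"SST={sst:.2f} SSR={ssr:.2f} SSM={ssm:.2f} R2={r_squared:.2f} F={f_ratio:.2f}")
# SST=6.00 SSR=2.40 SSM=3.60 R2=0.60 F=4.50
```

    With these data R² = 0.6, so the model accounts for 60% of the variation in y; R is its square root (about 0.77).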
  • Bias in Linear Models

    • Is the model influenced by a small number of cases?
    • Does the model generalize to other samples?
  • Outliers
    Cases that differ substantially from the main trend in the data
  • Standardized residuals

    Residuals converted to z-scores (mean of 0, sd of 1)
  • Studentized residuals
    Unstandardized residual divided by an estimate of its standard deviation that varies point by point
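    A sketch of both residual types for one predictor, under one common convention: standardized residuals divide every residual by the same standard error of estimate s, while studentized residuals divide by s·sqrt(1 - h_i), which changes from case to case with the leverage h_i. Data and names are hypothetical:

```python
import math


def residual_diagnostics(xs, ys):
    """Standardized and studentized residuals for a one-predictor least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar

    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))  # standard error of estimate
    # Leverage of each case in simple regression: h_i = 1/n + (x_i - x_bar)^2 / Sxx
    leverage = [1 / n + (x - x_bar) ** 2 / sxx for x in xs]

    standardized = [e / s for e in residuals]           # same divisor for every case
    studentized = [e / (s * math.sqrt(1 - h))           # divisor varies point by point
                   for e, h in zip(residuals, leverage)]
    return standardized, studentized


xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]
standardized, studentized = residual_diagnostics(xs, ys)
print([round(z, 2) for z in standardized])
print([round(t, 2) for t in studentized])
```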
  • Adjusted predicted value
    The predicted value of the outcome for a case from a model fitted with that case excluded
  • Deleted Residual

    The difference between the adjusted predicted value and the original observed value
  • Studentized Deleted Residual

    The deleted residual divided by its standard error
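    The three deletion statistics above can be sketched by literally refitting the model once per excluded case, which is fine for small data. Assumes one predictor; data and helper names such as `fit_line` are hypothetical. As a check, each deleted residual should equal the ordinary residual divided by (1 - h_i).

```python
import math


def fit_line(xs, ys):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b * x_bar, b


def deletion_diagnostics(xs, ys):
    """Per case: (adjusted predicted value, deleted residual, studentized deleted residual)."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    results = []
    for i in range(n):
        # Refit the model with case i excluded
        other_x = xs[:i] + xs[i + 1:]
        other_y = ys[:i] + ys[i + 1:]
        a_i, b_i = fit_line(other_x, other_y)
        adjusted = a_i + b_i * xs[i]       # adjusted predicted value
        deleted = ys[i] - adjusted         # deleted residual
        # Standard error of estimate of the reduced model (n - 1 cases, 2 parameters)
        ss_i = sum((y - (a_i + b_i * x)) ** 2 for x, y in zip(other_x, other_y))
        s_i = math.sqrt(ss_i / (n - 1 - 2))
        h = 1 / n + (xs[i] - x_bar) ** 2 / sxx   # leverage of case i in the full model
        se_deleted = s_i / math.sqrt(1 - h)      # standard error of the deleted residual
        results.append((adjusted, deleted, deleted / se_deleted))
    return results


xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]
results = deletion_diagnostics(xs, ys)
for x, (adjusted, deleted, t) in zip(xs, results):
    print(f"x={x}: adjusted={adjusted:.2f} deleted={deleted:.2f} studentized deleted={t:.2f}")
```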
  • Cook's Distance

    A measure of the overall influence of a case on the model
  • Leverage (hat values)

    Gauges the influence of the observed value of the outcome variable over the predicted values
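    A sketch of both measures for one predictor. Cook's distance is computed here from its deletion definition: the total squared change in all fitted values when case i is dropped, divided by p times the residual mean square (p = number of estimated parameters). Leverage uses h_i = 1/n + (x_i - x̄)²/Sxx. Data and the `fit_line` helper are hypothetical:

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b * x_bar, b


def cooks_distance_and_leverage(xs, ys):
    """Cook's distance (deletion definition) and leverage for each case, one predictor."""
    n = len(xs)
    p = 2  # estimated parameters: intercept and slope
    a, b = fit_line(xs, ys)
    fitted = [a + b * x for x in xs]
    # Residual mean square of the full model
    s_squared = sum((y - f) ** 2 for y, f in zip(ys, fitted)) / (n - p)

    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    leverage = [1 / n + (x - x_bar) ** 2 / sxx for x in xs]

    cooks = []
    for i in range(n):
        a_i, b_i = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        # Total squared shift in every fitted value when case i is dropped
        shift = sum((f - (a_i + b_i * x)) ** 2 for x, f in zip(xs, fitted))
        cooks.append(shift / (p * s_squared))
    return cooks, leverage


xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]
cooks, leverage = cooks_distance_and_leverage(xs, ys)
for x, d, h in zip(xs, cooks, leverage):
    print(f"x={x}: Cook's distance={d:.3f} leverage={h:.2f}")
```

    Leverage values sum to p, so the average leverage is p/n (here 2/5 = 0.4); the end points x = 1 and x = 5 have the highest leverage, 0.6. A commonly cited rule of thumb flags Cook's distance above 1, which only the case at x = 1 exceeds in this made-up example.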