Regression Analysis

    • Regression
      Analysis using correlation to make predictions
    • Explanatory and criterion variables
      • Explanatory (predictor, independent) variable
      • Criterion (outcome, dependent) variable
    • The linear model with one predictor

      • Criterion variable (Y)
      • Explanatory variable (X)
    • Linear Regression
      A method by which we fit a straight line to the data
    • Regression line
      The line of best fit
    • As x increases by 1

      y increases by 10
    • Regression Equation

      y = a + bx
    • Linear Equations
      • Y = bX + a
      • Yi = (β1Xi + β0) + εi
    • Linear relationship between X and Y
      • Slope (β1 or b) - the gradient of the line
      • Intercept (β0 or a) - the point at which the line crosses the vertical axis of the graph
    • Regression Equation

      • Shows how y changes as a result of x changing
      • The steeper the slope, the more y changes as a result of x
    • As x increases
      y decreases
    • Intercept
      The point at which the line crosses the y-axis
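The slope and intercept cards above can be illustrated with a tiny sketch. The function name `predict` and the values a = 5, b = 10 are hypothetical, chosen only so that a one-unit increase in x raises the prediction by 10.

```python
def predict(x, a, b):
    """Predicted y from the regression equation y = a + b*x."""
    return a + b * x

a, b = 5, 10  # hypothetical intercept and slope, not taken from real data

at_zero = predict(0, a, b)                    # the intercept: where the line crosses the y-axis
change = predict(3, a, b) - predict(2, a, b)  # the slope: change in y per unit of x
```

With a negative b the same function would show y decreasing as x increases.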
    • Which regression line gives the better prediction?
    • The linear model with several predictors

      • Adds a second predictor (X2) and its associated parameter (b2): Yi = (b0 + b1X1i + b2X2i) + εi
    • What do we do in regression?
      1. Estimate the model
      2. Determine how well a line fits the data points by defining the distance between the line and each data point
    • Deviations
      The vertical distances between what the model predicted and each data point that was observed
    • Residuals
      The differences between what the model predicts and the observed data
    • Residual sum of squares (SSR)

      A gauge of how well a linear model fits the data
    • Estimating the model: Method of Least Squares
      The best-fitting line is the one that has the smallest total squared error
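A minimal sketch of least squares for one predictor, assuming the standard closed-form solution (b = Sxy / Sxx, a = mean(y) - b * mean(x)). The function name `fit_line` and the five data points are made up for illustration.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of cross-products and sum of squares of x around its mean
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept
    return a, b

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = fit_line(xs, ys)

def total_squared_error(a, b):
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
```

Any other line, for example one with a slightly different slope, gives a larger total squared error on the same data.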
    • Standard error of estimate

      The standard distance between the predicted Y values on the regression line and the actual Y values in the data
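As a rough sketch, the standard error of estimate for one predictor is commonly computed as the square root of SSR / (n - 2). The data are hypothetical, and a = 2.2, b = 0.6 is the least-squares line for those five points.

```python
from math import sqrt

def standard_error_of_estimate(xs, ys, a, b):
    """sqrt(SSR / (n - 2)) for the line y = a + b*x (one predictor)."""
    ssr = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return sqrt(ssr / (len(xs) - 2))

# Hypothetical data and its least-squares line
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
see = standard_error_of_estimate(xs, ys, a=2.2, b=0.6)
```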
    • SST (Total sum of squares)

      Represents how good the mean is as a model of the observed outcome scores
    • SSR (residual sum of squares)
      The error remaining when the linear model is fitted; compared with SST, it shows how much better the linear model is than the baseline model of "no relationship"
    • SSM (model sum of squares)
      If the value is large, the linear model is very different from using the mean to predict the outcome variable
    • R²
      The proportion of variance in the outcome explained by the model (SSM / SST), often expressed as a percentage
    • F-test
      Based upon the ratio of the improvement due to the model (SSM) and the error in the model (SSR)
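The sums of squares on the cards above fit together as SST = SSM + SSR. The sketch below assumes the usual formulas R² = SSM / SST and F = (SSM / k) / (SSR / (n - k - 1)), with k predictors. The data are hypothetical, and a = 2.2, b = 0.6 is their least-squares line.

```python
from statistics import mean

# Hypothetical data and its least-squares line
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6
k = 1  # number of predictors

y_bar = mean(ys)
predicted = [a + b * x for x in xs]

sst = sum((y - y_bar) ** 2 for y in ys)               # error of the mean as a model
ssr = sum((y - p) ** 2 for y, p in zip(ys, predicted))  # error of the linear model
ssm = sst - ssr                                        # improvement due to the model

r_squared = ssm / sst
f_ratio = (ssm / k) / (ssr / (len(ys) - k - 1))
```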
    • Outliers
      Cases that differ substantially from the main trend in the data
    • Standardized residuals

      Residuals converted to z-scores (mean of 0, sd of 1)
    • Studentized residuals
      Unstandardized residual divided by an estimate of its standard deviation that varies point by point
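A sketch of the two residual types for a one-predictor model, under common textbook definitions: a standardized residual divides by the overall residual standard deviation s, while a studentized residual divides by s * sqrt(1 - h), where h is the case's leverage. The data are hypothetical.

```python
from math import sqrt
from statistics import mean

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

# Least-squares fit
x_bar, y_bar = mean(xs), mean(ys)
sxx = sum((x - x_bar) ** 2 for x in xs)
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
a = y_bar - b * x_bar

residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
s = sqrt(sum(e ** 2 for e in residuals) / (n - 2))      # residual standard deviation
leverage = [1 / n + (x - x_bar) ** 2 / sxx for x in xs]  # hat values, one predictor

standardized = [e / s for e in residuals]
# The divisor changes from case to case because leverage differs
studentized = [e / (s * sqrt(1 - h)) for e, h in zip(residuals, leverage)]
```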
    • Adjusted predicted value

      The predicted value of the outcome for a case if it is removed/excluded
    • Deleted Residual

      The difference between the adjusted predicted value and the original observed value
    • Studentized Deleted Residual

      Deleted residual divided by its standard error
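The three "deleted" quantities above can be sketched by literally refitting the model without one case. The helper names `fit_line` and `deleted_diagnostics` and the data are hypothetical. The standard error is assumed to be the usual prediction standard error for a new observation from the refitted one-predictor model.

```python
from math import sqrt
from statistics import mean

def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - b * x_bar, b

def deleted_diagnostics(xs, ys, i):
    """Refit without case i; return (adjusted predicted value,
    deleted residual, studentized deleted residual)."""
    rest_x = xs[:i] + xs[i + 1:]
    rest_y = ys[:i] + ys[i + 1:]
    a, b = fit_line(rest_x, rest_y)

    adjusted_predicted = a + b * xs[i]
    deleted_residual = ys[i] - adjusted_predicted

    # Standard error of predicting the excluded case from the refitted model
    m = len(rest_x)
    x_bar = mean(rest_x)
    sxx = sum((x - x_bar) ** 2 for x in rest_x)
    ssr = sum((y - (a + b * x)) ** 2 for x, y in zip(rest_x, rest_y))
    s = sqrt(ssr / (m - 2))
    std_error = s * sqrt(1 + 1 / m + (xs[i] - x_bar) ** 2 / sxx)

    return adjusted_predicted, deleted_residual, deleted_residual / std_error

# Hypothetical data; examine the first case
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
adjusted, deleted, t_deleted = deleted_diagnostics(xs, ys, 0)
```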
    • Cook's Distance

      A measure of the overall influence of a case on the model
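A sketch of Cook's distance for a one-predictor model, assuming the common formula D = e² * h / (p * MSE * (1 - h)²), where e is the residual, h the leverage (hat value) and p the number of estimated parameters. The data are hypothetical.

```python
from statistics import mean

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
p = 2  # estimated parameters: intercept and slope

# Least-squares fit
x_bar, y_bar = mean(xs), mean(ys)
sxx = sum((x - x_bar) ** 2 for x in xs)
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
a = y_bar - b * x_bar

residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
mse = sum(e ** 2 for e in residuals) / (n - p)
leverage = [1 / n + (x - x_bar) ** 2 / sxx for x in xs]

cooks = [e ** 2 * h / (p * mse * (1 - h) ** 2)
         for e, h in zip(residuals, leverage)]
most_influential = max(range(n), key=cooks.__getitem__)  # index of the largest value
```

Here the first case combines a sizeable residual with high leverage, so it has the largest influence on the fitted line.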
    • Leverage (hat values)

      Gauges the influence of the observed value of the outcome variable over the predicted values
    • Mahalanobis Distance
      Measures the distance of cases from the mean(s) of the predictor variable(s)
    • Leverage (hat values)

      • If there are no influential cases, all leverage values should be close to the average value, (k + 1)/n, where k is the number of predictors and n is the sample size
      • Investigate cases with values greater than twice or three times the average
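The leverage rule of thumb can be sketched for one predictor, where the hat value is h = 1/n + (x - mean(x))² / Sxx and the average leverage is (k + 1)/n. The data, including the deliberately extreme last x value, are hypothetical.

```python
from statistics import mean

# Hypothetical predictor values; the last case is far from the others
xs = [1, 2, 3, 4, 20]
n = len(xs)
k = 1  # number of predictors

x_bar = mean(xs)
sxx = sum((x - x_bar) ** 2 for x in xs)
leverage = [1 / n + (x - x_bar) ** 2 / sxx for x in xs]

average = (k + 1) / n
# Indices of cases with more than twice the average leverage
flagged = [i for i, h in enumerate(leverage) if h > 2 * average]
```

The hat values always sum to k + 1, which is why their average is (k + 1)/n.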
    • Mahalanobis Distance

      Measures the distance of cases from the mean(s) of the predictor variable(s); the values follow a chi-square distribution with degrees of freedom equal to the number of predictors
    • Mahalanobis Distance

      • Cut-off points are established by looking for the critical value for the desired alpha level
      • For larger samples (e.g. N = 500) with 5 predictors, values > 25 are a major concern
      • For smaller samples (e.g. N = 100) with fewer predictors (e.g. 3), values > 15 are problematic
      • For very small samples (e.g. N = 30) with 2 predictors, values > 11 should be examined
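A sketch of how Mahalanobis distances might be computed for two predictors. It assumes the squared form of the distance, which is the one compared against chi-square cut-offs, and inverts the 2 x 2 sample covariance matrix by hand. The function name `mahalanobis_squared` and the data are hypothetical.

```python
from statistics import mean

def mahalanobis_squared(x1, x2):
    """Squared Mahalanobis distance of each case from the predictor means."""
    n = len(x1)
    m1, m2 = mean(x1), mean(x2)
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]

    # Sample variances and covariance of the two predictors
    s11 = sum(u * u for u in d1) / (n - 1)
    s22 = sum(v * v for v in d2) / (n - 1)
    s12 = sum(u * v for u, v in zip(d1, d2)) / (n - 1)
    det = s11 * s22 - s12 ** 2

    # d' S^-1 d, written out for the 2 x 2 case
    return [(s22 * u * u - 2 * s12 * u * v + s11 * v * v) / det
            for u, v in zip(d1, d2)]

# Hypothetical scores on two predictors
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
distances = mahalanobis_squared(x1, x2)
```

Each value would then be compared with the cut-off chosen for the sample size and number of predictors.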