Regression Analysis

  • Regression
    Analysis using correlation to make predictions
  • Explanatory and criterion variables
    • Explanatory (predictor, independent) variable
    • Criterion (outcome, dependent) variable
  • The linear model with one predictor

    • Criterion variable (Y)
    • Explanatory variable (X)
  • Linear Regression
    A method by which we fit a straight line to the data
  • Regression line
    The line of best fit
  • As x increases by 1

    y increases by 10
  • Regression Equation

    y = a + bx
  • Linear Equations
    • Y = bX + a
    • Yi = (𝛽1Xi + 𝛽0) + εi
  • Linear relationship between X and Y
    • Slope (𝛽1 or b) - gradient of the line
    • Intercept (𝛽0 or a) - the point at which the line crosses the vertical axis of the graph
  • Regression Equation

    • Shows how y changes as a result of x changing
    • The steeper the slope, the more y changes as a result of x
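A minimal sketch of how the equation y = a + bx behaves as x changes. The function name `predict` and the intercept and slope values are hypothetical, chosen only for illustration.

```python
def predict(a, b, x):
    """Return the predicted y for a given x under y = a + b*x."""
    return a + b * x

# Hypothetical intercept and slope (illustration only).
a, b = 2.0, 10.0

# Raising x by 1 changes the prediction by exactly the slope b.
change = predict(a, b, 4) - predict(a, b, 3)

# With a steeper (hypothetical) slope, the same step in x moves y further.
steeper_change = predict(a, 25.0, 4) - predict(a, 25.0, 3)
```

With a negative slope the same one-unit step in x would lower the prediction instead.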
  • As x increases
    y decreases
  • Intercept
    The point at which the line crosses the y-axis
  • Which regression line gives the better prediction?
  • The linear model with several predictors

    • Second predictor (X2) and the associated parameter (b2)
  • What do we do in regression?
    1. Estimate the model
    2. Determine how well a line fits the data points by defining the distance between the line and each data point
  • Deviations
    The vertical distances between what the model predicted and each data point that was observed
  • Residuals
    The differences between what the model predicts and the observed data
  • Residual sum of squares (SSR)

    A gauge of how well a linear model fits the data
  • Estimating the model: Method of Least Squares
    The best-fitting line is the one that has the smallest total squared error
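The least-squares idea can be sketched for one predictor as follows. The closed-form slope (sum of cross-products over sum of squared x deviations) is the standard result for simple regression; the function names and the small data set are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def squared_error(xs, ys, a, b):
    """Total squared vertical distance between the line and the data."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data (illustration only).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
best = squared_error(xs, ys, a, b)
# Any other line, e.g. one with a shifted intercept, has a larger error.
other = squared_error(xs, ys, a + 0.5, b)
```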
  • Standard error of estimate

    The standard distance between the predicted Y values on the regression line and the actual Y values in the data
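A sketch of the standard error of estimate, assuming the usual convention for one predictor of dividing the residual sum of squares by n - 2 degrees of freedom. The data are hypothetical, and the intercept and slope are the least-squares values for those data.

```python
import math

def standard_error_of_estimate(xs, ys, a, b):
    """Square root of SSR / (n - 2) for the line y = a + b*x."""
    ssr = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(ssr / (len(xs) - 2))

# Hypothetical data and its least-squares line (a = 2.2, b = 0.6).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
see = standard_error_of_estimate(xs, ys, 2.2, 0.6)
```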
  • SST (Total sum of squares)

    Represents how good the mean is as a model of the observed outcome scores
  • SSR (residual sum of squares)
    Can be used to calculate how much better the linear model is than the baseline model of "no relationship"
  • SSM (model sum of squares)
    If the value is large, the linear model is very different from using the mean to predict the outcome variable
  • R²
    The proportion of variation in the outcome accounted for by the model (SSM / SST), often expressed as a percentage
  • F-test
    Based upon the ratio of the improvement due to the model (SSM) and the error in the model (SSR)
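The three sums of squares and the statistics built from them can be sketched together. This assumes one predictor (k = 1), R² computed as SSM / SST, and the ratio of improvement to error computed on mean squares, as is conventional for the F ratio. Data and names are hypothetical.

```python
def sums_of_squares(xs, ys):
    """Return (SST, SSR, SSM) for a least-squares line through the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    a = mean_y - b * mean_x
    predicted = [a + b * x for x in xs]
    sst = sum((y - mean_y) ** 2 for y in ys)                 # error of the mean as a model
    ssr = sum((y - p) ** 2 for y, p in zip(ys, predicted))   # error of the line
    ssm = sst - ssr                                          # improvement due to the line
    return sst, ssr, ssm

# Hypothetical data (illustration only).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

sst, ssr, ssm = sums_of_squares(xs, ys)
n, k = len(xs), 1
r_squared = ssm / sst
f_ratio = (ssm / k) / (ssr / (n - k - 1))
```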
  • Outliers
    Cases that differ substantially from the main trend in the data
  • Standardized residuals

    Residuals converted to z-scores (mean of 0, sd of 1)
  • Studentized residuals
    Unstandardized residual divided by an estimate of its standard deviation that varies point by point
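A sketch contrasting the two kinds of residual for one predictor. It assumes the standardized residual divides by a single overall estimate s, while the studentized residual divides by s * sqrt(1 - h), where h is the case's leverage. Data and function name are hypothetical.

```python
import math

def residual_diagnostics(xs, ys):
    """Return (standardized, studentized) residuals for a fitted line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    a = mean_y - b * mean_x
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    # One overall estimate of the residual standard deviation.
    s = math.sqrt(sum(e * e for e in residuals) / (n - 2))
    # Leverage of each case in simple regression.
    leverage = [1 / n + (x - mean_x) ** 2 / sxx for x in xs]
    standardized = [e / s for e in residuals]
    # The divisor varies point by point through the leverage.
    studentized = [e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverage)]
    return standardized, studentized

# Hypothetical data (illustration only).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
standardized, studentized = residual_diagnostics(xs, ys)
```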
  • Adjusted predicted value

    The predicted value of the outcome for a case if it is removed/excluded
  • Deleted Residual

    The difference between the adjusted predicted value and the original observed value
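The adjusted predicted value and deleted residual can be sketched by refitting the line with one case left out. The sign convention (observed minus adjusted predicted) is an assumption, as are the data and function names.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - b * mean_x, b

def deleted_residual(xs, ys, i):
    """Return (adjusted predicted value, deleted residual) for case i."""
    # Fit the model on every case except case i.
    rest_x = xs[:i] + xs[i + 1:]
    rest_y = ys[:i] + ys[i + 1:]
    a, b = fit_line(rest_x, rest_y)
    adjusted = a + b * xs[i]
    return adjusted, ys[i] - adjusted

# Hypothetical data (illustration only).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
adjusted, deleted = deleted_residual(xs, ys, 2)
```

If a case has little influence, its adjusted predicted value stays close to the prediction from the full model.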
  • Studentized Deleted Residual

    Deleted residual divided by its standard error
  • Cook's Distance

    A measure of the overall influence of a case on the model
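A sketch of Cook's distance for one predictor, using the common closed form D = e² * h / (p * MSE * (1 - h)²), where e is the residual, h the leverage, and p the number of estimated parameters (2 here: intercept and slope). Data and function name are hypothetical.

```python
def cooks_distance(xs, ys):
    """Cook's distance for each case in a simple linear regression."""
    n, p = len(xs), 2  # p counts the intercept and the slope
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    a = mean_y - b * mean_x
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    mse = sum(e * e for e in residuals) / (n - p)
    leverage = [1 / n + (x - mean_x) ** 2 / sxx for x in xs]
    return [e * e * h / (p * mse * (1 - h) ** 2)
            for e, h in zip(residuals, leverage)]

# Hypothetical data (illustration only).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
cooks = cooks_distance(xs, ys)
```

Cases combining a large residual with high leverage receive the largest values.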
  • Leverage (hat values)

    Gauges the influence of the observed value of the outcome variable over the predicted values
  • Mahalanobis Distance
    Measures the distance of cases from the mean(s) of the predictor variable(s)
  • Leverage (hat values)

    • If there are no influential cases, all leverage values should be close to the average value, (k + 1)/n, where k is the number of predictors and n the number of cases
    • Investigate cases with values greater than twice or three times the average
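A sketch of the leverage check for one predictor, assuming the average leverage is (k + 1)/n with k = 1 and using the "twice the average" rule. The data, with one extreme x value, are made up for illustration.

```python
def leverage_values(xs):
    """Hat values for a simple regression on the predictor xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return [1 / n + (x - mean_x) ** 2 / sxx for x in xs]

# Hypothetical predictor scores; the last case is far from the others.
xs = [1, 2, 3, 4, 20]
hats = leverage_values(xs)

k = 1                          # number of predictors
average = (k + 1) / len(xs)    # average leverage
# Indices of cases whose leverage exceeds twice the average.
flagged = [i for i, h in enumerate(hats) if h > 2 * average]
```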
  • Mahalanobis Distance

    Measures the distance of cases from the mean(s) of the predictor variable(s); the values follow a chi-square distribution with degrees of freedom equal to the number of predictors
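A sketch of the squared Mahalanobis distance for two predictors, written without external libraries by inverting the 2 x 2 sample covariance matrix by hand. The data and names are hypothetical, and the sample (n - 1) covariance is assumed.

```python
def mahalanobis_sq(x1, x2):
    """Squared Mahalanobis distance of each case from the predictor means."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    # Sample variances and covariance (n - 1 denominator).
    s11 = sum((a - m1) ** 2 for a in x1) / (n - 1)
    s22 = sum((b - m2) ** 2 for b in x2) / (n - 1)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2)) / (n - 1)
    det = s11 * s22 - s12 ** 2
    distances = []
    for a, b in zip(x1, x2):
        d1, d2 = a - m1, b - m2
        # d' S^-1 d, with the 2 x 2 inverse written out explicitly.
        distances.append((s22 * d1 * d1 - 2 * s12 * d1 * d2 + s11 * d2 * d2) / det)
    return distances

# Hypothetical scores on two predictors; the last case is extreme on both.
x1 = [1, 2, 3, 4, 10]
x2 = [2, 1, 4, 3, 9]
d_sq = mahalanobis_sq(x1, x2)
most_distant = d_sq.index(max(d_sq))
```

The resulting values would then be compared against a chi-square critical value for the chosen alpha level.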
  • Mahalanobis Distance

    • Cut-off points are established by looking for the critical value for the desired alpha level
    • For larger samples (e.g. N=500) with 5 predictors, values > 25 are a major concern
    • For smaller samples (e.g. N=100) and fewer predictors (e.g. 3), values > 15 are problematic
    • For very small samples (e.g. N=30) with 2 predictors, values > 11 should be examined