the ultimate psy101 reviewer (not rlly Bismillah)
Regression Analysis
Created by xena
Cards (72)
Regression
Analysis using correlation to make predictions
Explanatory and criterion variables
Explanatory (predictor, independent) variable; criterion (outcome, dependent) variable
The linear model with one predictor
Criterion variable (Y); explanatory variable (X)
Linear Regression
A method by which we fit a straight line to the data
Regression line
The line of best fit
As x increases by 1, y increases by 10
Regression Equation
y = a + bx
Linear Equations
Y = bX + a
Yi = (β1Xi + β0) + εi
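The equation y = a + bx can be sketched in code. In this minimal Python example the slope of 10 echoes the "as x increases by 1, y increases by 10" card, while the intercept of 2 is an assumed value for illustration:

```python
# Linear model y = a + bx, with an assumed intercept of 2 and the
# slope of 10 used on the cards.
a = 2.0   # intercept (beta_0): the value of y when x = 0
b = 10.0  # slope (beta_1): the change in y per 1-unit change in x

def predict(x):
    """Predicted y for a given x under the linear model."""
    return a + b * x

print(predict(0))  # 2.0  -> the intercept
print(predict(1))  # 12.0
print(predict(2))  # 22.0 -> each step in x adds the slope (10) to y
```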
Linear relationship between X and Y
Slope (β1 or b): the gradient of the line
Intercept (β0 or a): the point at which the line crosses the vertical axis of the graph
Regression Equation
Shows how y changes as a result of x changing; the steeper the slope, the more y changes as a result of x
As x increases, y decreases
Intercept
The point at which the line crosses the y-axis
Which regression line gives the better prediction?
The linear model with several predictors
Second predictor (X2) and the associated parameter (b2)
What do we do in regression?
1. Estimate the model
2. Determine how well a line fits the data points by defining the distance between the line and each data point
Deviations
The vertical distances between what the model predicts and each observed data point
Residuals
The differences between what the model predicts and the observed data
Residual sum of squares (SSR)
A gauge of how well a linear model fits the data
Estimating the model: Method of Least Squares
The best-fitting line is the one that has the smallest total squared error
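The least-squares line can be found by hand with the standard closed-form formulas (slope b = SPxy / SSx, intercept a = mean of y minus b times the mean of x). A sketch, where the data points are made up for illustration:

```python
# Least-squares fit by hand: the best-fitting line is the one that
# minimises the total squared error. The data are made up.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.3, 5.9, 8.2, 9.8]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Slope: sum of cross-products of deviations over sum of squares of x
sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
ss_x = sum((x - mean_x) ** 2 for x in xs)
b = sp_xy / ss_x          # ~1.93 for these data
a = mean_y - b * mean_x   # the line passes through (mean_x, mean_y)

print(f"y = {a:.2f} + {b:.2f}x")
```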
Standard error of estimate
The standard distance between the predicted Y values on the regression line and the actual Y values in the data
SST (total sum of squares)
Represents how good the mean is as a model of the observed outcome scores
SSR (residual sum of squares)
Can be used to calculate how much better the linear model is than the baseline model of "no relationship"
SSM (model sum of squares)
If the value is large, the linear model is very different from using the mean to predict the outcome variable
R2
The proportion of improvement due to the model, expressed as a percentage
F-test
Based upon the ratio of improvement due to the model (SSM) to the error in the model (SSR)
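SST, SSR, SSM, R2 and the F-ratio can all be computed directly once a line is fitted. A sketch assuming the toy data and fitted coefficients below (a = 0.27, b = 1.93 are illustrative values, not from the cards):

```python
# Partitioning the sums of squares for a fitted line. The data and
# the fitted coefficients are illustrative assumptions.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.3, 5.9, 8.2, 9.8]
a, b = 0.27, 1.93                # least-squares fit for these data

n = len(ys)
mean_y = sum(ys) / n
preds = [a + b * x for x in xs]

sst = sum((y - mean_y) ** 2 for y in ys)            # mean as the model
ssr = sum((y - p) ** 2 for y, p in zip(ys, preds))  # error in the line
ssm = sst - ssr                  # improvement over the mean model
r2 = ssm / sst                   # proportion of improvement (x100 for %)
k = 1                            # number of predictors
f = (ssm / k) / (ssr / (n - k - 1))
se_est = (ssr / (n - 2)) ** 0.5  # standard error of estimate

print(f"R^2 = {r2:.3f}, F = {f:.1f}")
```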
Outliers
Cases that differ substantially from the main trend in the data
Standardized residuals
Residuals converted to z-scores (mean of 0, SD of 1)
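Converting residuals to z-scores is a two-step standardization (subtract the mean, divide by the standard deviation). A sketch, where the residual values are made up for illustration:

```python
# Standardized residuals: convert raw residuals to z-scores so they
# have mean 0 and SD 1. The residuals below are illustrative values.
residuals = [-0.10, 0.17, -0.16, 0.21, -0.12]

n = len(residuals)
mean_r = sum(residuals) / n
sd_r = (sum((r - mean_r) ** 2 for r in residuals) / n) ** 0.5

z_scores = [(r - mean_r) / sd_r for r in residuals]
# Values beyond about +/-3 would flag a case as a potential outlier.
```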
Studentized residuals
The unstandardized residual divided by an estimate of its standard deviation that varies point by point
Adjusted predicted value
The predicted value of the outcome for a case if it is removed/excluded
Deleted residual
The difference between the adjusted predicted value and the original observed value
Studentized deleted residual
The deleted residual divided by its standard error
Cook's Distance
A measure of the overall influence of a case on the model
Leverage (hat values)
Gauges the influence of the observed value of the outcome variable over the predicted values
Mahalanobis Distance
Measures the distance of cases from the mean(s) of the predictor variable(s)
Leverage (hat values)
If there are no influential cases, all leverage values should be close to the average value; investigate cases with values greater than two or three times the average
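For simple regression the hat values have a closed form, h_i = 1/n + (x_i - mean)^2 / SSx, and they always average to (k + 1)/n. A sketch with made-up data in which one x value sits far from the rest:

```python
# Leverage (hat values) for one predictor, via the closed form
# h_i = 1/n + (x_i - mean_x)^2 / SS_x. Data are made up; the last
# x value is deliberately far from the others.
xs = [1.0, 2.0, 3.0, 4.0, 20.0]

n = len(xs)
mean_x = sum(xs) / n
ss_x = sum((x - mean_x) ** 2 for x in xs)

hats = [1 / n + (x - mean_x) ** 2 / ss_x for x in xs]
avg = sum(hats) / n      # always (k + 1)/n; here (1 + 1)/5 = 0.4
flagged = [x for x, h in zip(xs, hats) if h > 2 * avg]
print(flagged)           # [20.0] -> exceeds twice the average leverage
```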
Mahalanobis Distance
Measures the distance of cases from the mean(s) of the predictor variable(s); the values have a chi-square distribution
Mahalanobis Distance
Cut-off points are established by looking up the chi-square critical value for the desired alpha level:
For larger samples (e.g. N = 500) with 5 predictors, values > 25 are a major concern
For smaller samples (e.g. N = 100) and fewer predictors (e.g. 3), values > 15 are problematic
For very small samples (e.g. N = 30) with 2 predictors, values > 11 should be examined
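With a single predictor the Mahalanobis distance reduces to a squared z-score, (x - mean)^2 / variance, compared against a chi-square critical value with df = number of predictors. A sketch with made-up data and an illustrative alpha level:

```python
# Mahalanobis distance for one predictor: (x - mean)^2 / variance,
# which follows a chi-square distribution with df = 1. Data are made up.
xs = [1.0, 2.0, 3.0, 4.0, 20.0]

n = len(xs)
mean_x = sum(xs) / n
var_x = sum((x - mean_x) ** 2 for x in xs) / (n - 1)

d2 = [(x - mean_x) ** 2 / var_x for x in xs]
worst = max(zip(xs, d2), key=lambda pair: pair[1])
print(worst)  # the case furthest from the predictor mean (x = 20.0)

# Compare against the chi-square critical value for the chosen alpha,
# e.g. 10.83 for df = 1, alpha = .001. In a sample this tiny no D^2
# can exceed (n - 1)^2 / n, so realistic checks need a larger n.
```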