We learnt that Pearson's correlation coefficient r is the ratio of the covariance of x and y to the product of their standard deviations (1):
r = cov(x,y) / (sx sy)
It turns out that the simple regression coefficient b1 can be written as the covariance divided by the variance of x (2):
b1 = cov(x,y) / sx2
Combining the two expressions
Rewriting the first to get an expression for the covariance (1):
cov(x,y) = r sx sy
Now substituting this expression for the covariance into the second equation for b1 (2):
b1 = (r sx sy) / sx2 = r (sy / sx)
So the slope b1 is the correlation coefficient r multiplied by the ratio of the standard deviations of y and x. If sx and sy are equal (for example when both variables are standardised as z-scores), r is itself the regression slope.
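As a quick numerical check, here is a minimal Python sketch of this identity; the study-time and grade values are made up for illustration and are not from the notes' dataset.

```python
import numpy as np

# Hypothetical study-time (hours) and grade values, purely for illustration
x = np.array([2.0, 4.0, 5.0, 7.0, 9.0, 10.0])       # study time
y = np.array([52.0, 60.0, 61.0, 70.0, 75.0, 80.0])  # grades

# Pearson's r: covariance divided by the product of the standard deviations (1)
cov_xy = np.cov(x, y, ddof=1)[0, 1]
sx, sy = x.std(ddof=1), y.std(ddof=1)
r = cov_xy / (sx * sy)

# Slope: covariance divided by the variance of x (2)
b1 = cov_xy / x.var(ddof=1)

# The identity derived above: b1 equals r multiplied by sy / sx
print(b1, r * sy / sx)   # the two numbers agree
```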
Sum of Squared Errors (SSE)
To make inferences from a regression we need an appropriate estimate of variability. For this we use the residual variance (error variance) of the model.
This is how much variability in the DV is not explained by the model.
For each data point, we can calculate a residual ei
ei = yi - ŷi = yi - (b1xi + b0)
where ŷi is the value predicted by the model.
Summing the squares of these residuals gives the sum of squared errors SSE:
SSE = Σ ei2 = Σ (yi - ŷi)2
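A minimal sketch of the residuals and SSE in Python, using the same made-up study-time and grade values as above:

```python
import numpy as np

# Same hypothetical study-time / grade data as above
x = np.array([2.0, 4.0, 5.0, 7.0, 9.0, 10.0])
y = np.array([52.0, 60.0, 61.0, 70.0, 75.0, 80.0])

b1, b0 = np.polyfit(x, y, 1)   # least-squares slope and intercept
y_pred = b1 * x + b0           # model predictions
residuals = y - y_pred         # ei = yi - ŷi
sse = np.sum(residuals ** 2)   # SSE: squared error left over by the model
print(sse)
```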
The Mean Squared Error (MSE)
From the sum of squared errors (SSE) we can calculate the Mean Squared Error by dividing by the degrees of freedom:
MSE = SSE / (N - p)
where N is the number of data points and p is the number of estimated parameters (here p = 2, for b0 and b1).
Standard Error from the MSE
The standard error of the slope is obtained from the MSE and the spread of x:
SEb = √( MSE / Σ (xi - x̄)2 )
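Continuing the same illustrative example, a sketch of how the MSE and the slope's standard error could be computed (the data are still the made-up values from above):

```python
import numpy as np

# Same hypothetical data; refit so the snippet runs on its own
x = np.array([2.0, 4.0, 5.0, 7.0, 9.0, 10.0])
y = np.array([52.0, 60.0, 61.0, 70.0, 75.0, 80.0])
b1, b0 = np.polyfit(x, y, 1)

residuals = y - (b1 * x + b0)
sse = np.sum(residuals ** 2)

# MSE = SSE / (N - p), with p = 2 estimated parameters (b0 and b1)
n, p = len(x), 2
mse = sse / (n - p)

# Standard error of the slope from the MSE and the spread of x
se_b1 = np.sqrt(mse / np.sum((x - x.mean()) ** 2))
print(mse, se_b1)
```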
A statistical test for the regression parameters
From the parameter estimates (b1 and b0) and their SEs, we can compute the t-statistic
Remember that the t-value has the difference between the measured and expected values in the numerator and the corresponding SE in the denominator
In this specific case, we have
tN-p = (b1 - bexp)/SEb , where bexp, the expected value of b1 under the null hypothesis, is zero
tN-p = (b1 - 0)/SEb
tN-p = b1/SEb
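A sketch of this t-test on the slope, again with the made-up data; the scipy cross-check at the end is just one convenient way to verify the hand computation:

```python
import numpy as np
from scipy import stats

# Same hypothetical data; slope, MSE and SE computed as above
x = np.array([2.0, 4.0, 5.0, 7.0, 9.0, 10.0])
y = np.array([52.0, 60.0, 61.0, 70.0, 75.0, 80.0])
b1, b0 = np.polyfit(x, y, 1)

n, p = len(x), 2
mse = np.sum((y - (b1 * x + b0)) ** 2) / (n - p)
se_b1 = np.sqrt(mse / np.sum((x - x.mean()) ** 2))

# t-statistic for H0: b1 = 0, with N - p degrees of freedom
t_stat = b1 / se_b1
p_value = 2 * stats.t.sf(abs(t_stat), df=n - p)

# Cross-check against scipy's built-in simple regression
res = stats.linregress(x, y)
print(t_stat, p_value)
print(res.slope / res.stderr, res.pvalue)   # should agree with the values above
```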
Visualising the sources of variance
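No figure is reproduced here, but a minimal matplotlib sketch of the idea (the data, the fitted line, the residuals as vertical segments, and the mean of y as the baseline for the total variance) might look like this:

```python
import numpy as np
import matplotlib.pyplot as plt

# Same hypothetical study-time / grade data
x = np.array([2.0, 4.0, 5.0, 7.0, 9.0, 10.0])
y = np.array([52.0, 60.0, 61.0, 70.0, 75.0, 80.0])
b1, b0 = np.polyfit(x, y, 1)
y_pred = b1 * x + b0

plt.scatter(x, y, label="observed grades")                 # the data
plt.plot(x, y_pred, color="black", label="fitted line")    # model predictions
plt.vlines(x, y_pred, y, color="red", label="residuals")   # unexplained (error) variance
plt.axhline(y.mean(), linestyle="--", label="mean of y")   # baseline for the total variance
plt.xlabel("study time (hours)")
plt.ylabel("grade")
plt.legend()
plt.show()
```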
Quantifying goodness of fit
As we considered when doing correlations, we can ask how much of the variability in y is accounted for, in this case by the model. This is done with the coefficient of determination R2. For a simple regression with one x variable, R2 = r2.
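A short sketch showing this equality on the made-up data, computing R2 from the sums of squares (1 - SSE/SST) and comparing it with r2:

```python
import numpy as np
from scipy import stats

# Same hypothetical study-time / grade data
x = np.array([2.0, 4.0, 5.0, 7.0, 9.0, 10.0])
y = np.array([52.0, 60.0, 61.0, 70.0, 75.0, 80.0])

r = stats.pearsonr(x, y)[0]              # Pearson's correlation coefficient
b1, b0 = np.polyfit(x, y, 1)

sse = np.sum((y - (b1 * x + b0)) ** 2)   # variance left unexplained by the model
sst = np.sum((y - y.mean()) ** 2)        # total variance of the DV
r_squared = 1 - sse / sst

print(r_squared, r ** 2)   # identical for a simple regression with one predictor
```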
R2 is the proportion of variance in grades accounted for by study time. Going back to the sum of squared errors: