Note that the course pack provided to you in any form is intended only for your use in connection with the course that you are enrolled in. It is not for distribution or sale. Permission should be obtained from your instructor for any use other than for what it is intended.
At the end of this unit, the student will be able to:
Perform correlation analysis
Build linear models using simple or multiple linear regression analysis
Perform diagnostic checking on the adopted linear model
Perform remedial measures if necessary
Interpret results of linear regression analysis
Correlation analysis
Used to determine whether two measurements X and Y, taken from the same sample or population, are associated with (related to or dependent on) each other
Pearson Product-Moment Correlation Coefficient
A measure of the strength of the linear relationship existing between two variables, X and Y, that is independent of their respective scales of measurement
Assumptions for Pearson Product-Moment Correlation Coefficient
Both variables are measured at the interval or ratio level
There should be no significant outliers
The variables should be approximately normally distributed
Correlation coefficient (ρ)
Takes on values between -1 and 1, inclusive
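A positive ρ means the line that best describes the relationship slopes upward to the right (Y tends to increase as X increases); a negative ρ means it slopes downward to the right (Y tends to decrease as X increases)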
When ρ is 1 or -1, there is a perfect linear relationship between X and Y
A ρ close to 1 or -1 indicates a strong linear relationship, but does not necessarily imply X causes Y or Y causes X
If ρ = 0 then there is no linear correlation between X and Y, but there may still be a non-linear association
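The last point can be illustrated with a small sketch. It works with the sample correlation rather than ρ itself, and the data and the helper name `sample_correlation` are made up for illustration: y = x² is completely determined by x, yet its linear correlation with x over a symmetric range is zero.

```python
import math

def sample_correlation(x, y):
    """Sample Pearson correlation from deviations about the means."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]              # hypothetical values, symmetric about 0
y_quadratic = [v ** 2 for v in x]  # depends on x, but not linearly
y_linear = [3 * v + 1 for v in x]  # exact straight-line relationship

# The cross-deviations of x and x^2 cancel, so the numerator is zero.
r_quadratic = sample_correlation(x, y_quadratic)
# Every point lies on one upward-sloping line, so r is 1 (up to rounding).
r_linear = sample_correlation(x, y_linear)
print(r_quadratic, r_linear)
```

A scatterplot of the quadratic data would show a clear U-shaped pattern, which is why a plot should accompany any correlation coefficient.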
Pearson product moment coefficient of correlation (r)
Used to estimate ρ based on a random sample
-1 ≤ r ≤ 1
Verbal description of strength of correlation, by absolute value of r: 0.00 to 0.25 none or weak; 0.26 to 0.50 moderately weak; 0.51 to 0.75 moderately strong; 0.76 to 1.00 strong to perfect
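As a rough sketch, the verbal scale above can be written as a lookup on |r|. The function name `describe_strength` is hypothetical. The scale as given leaves small gaps between bands (for example between 0.25 and 0.26), so this sketch assumes each upper bound is an inclusive cut point.

```python
def describe_strength(r):
    """Return the verbal label for a correlation coefficient r.

    Assumption: band upper bounds (0.25, 0.50, 0.75) are inclusive,
    since the scale does not say where values such as 0.255 belong.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1, inclusive")
    size = abs(r)  # the sign gives direction, not strength
    if size <= 0.25:
        return "no/weak"
    if size <= 0.50:
        return "moderately weak"
    if size <= 0.75:
        return "moderately strong"
    return "strong to perfect"

print(describe_strength(-0.62))
```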
Scatterplots with approximate values of r
r ≈ 0, r ≈ ±0.5, r ≈ ±1
Computing Pearson correlation coefficient (r)
1. Calculate n, Σxi, Σyi, Σxiyi, Σxi^2, Σyi^2
2. Plug into formula: r = [n(Σxiyi) - (Σxi)(Σyi)] / √[(n(Σxi^2) - (Σxi)^2)(n(Σyi^2) - (Σyi)^2)]
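The two steps above can be sketched in code as follows. The data are invented for illustration and the function name `pearson_r` is an assumption, not something from the course material.

```python
import math

def pearson_r(x, y):
    """Pearson r from the raw-sums (computational) formula."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and have at least 2 points")
    n = len(x)
    # Step 1: the required sums
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    # Step 2: plug the sums into the formula
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
# Sums: n=5, Σx=15, Σy=20, Σxy=66, Σx²=55, Σy²=86
# r = (330 - 300) / sqrt(50 * 30) = 30 / sqrt(1500), about 0.775
r = pearson_r(x, y)
print(round(r, 4))
```

Note that the denominator is zero when either variable is constant; in that case r is undefined and this sketch raises a `ZeroDivisionError`.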
Testing hypothesis about correlation coefficient
Ho: ρ = ρ0
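Ha: ρ < ρ0, ρ > ρ0, or ρ ≠ ρ0
Test statistic: t = (r - ρ0)√(n-2) / √(1-r^2), with n-2 degrees of freedom. The t distribution applies under Ho when ρ0 = 0, which is the usual case; for ρ0 ≠ 0, a test based on Fisher's z-transformation is used instead
Critical region: t < -tα(n-2) for Ha: ρ < ρ0; t > tα(n-2) for Ha: ρ > ρ0; |t| > tα/2(n-2) for Ha: ρ ≠ ρ0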
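A minimal sketch of this test for the common case ρ0 = 0, where the statistic reduces to t = r√(n-2) / √(1-r^2). The function names and the sample values (r = 0.80, n = 10) are hypothetical. The critical value is read from a t table and passed in by hand; 2.306 is the tabulated t0.025 for 8 degrees of freedom.

```python
import math

def correlation_t(r, n):
    """t statistic for Ho: rho = 0, with n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("need at least 3 pairs")
    if abs(r) >= 1:
        raise ValueError("t is undefined when |r| = 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

def reject_null(t, critical_value, alternative="two-sided"):
    """Apply the critical region for the chosen alternative.

    critical_value must be the positive table value: t_alpha(n-2) for a
    one-sided test, t_{alpha/2}(n-2) for a two-sided test.
    """
    if alternative == "less":
        return t < -critical_value
    if alternative == "greater":
        return t > critical_value
    if alternative == "two-sided":
        return abs(t) > critical_value
    raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")

# Hypothetical sample result: r = 0.80 from n = 10 pairs, alpha = 0.05
t = correlation_t(0.80, 10)        # 0.8 * sqrt(8) / 0.6, about 3.77
decision = reject_null(t, 2.306)   # two-sided test, df = 8
print(round(t, 3), decision)
```

Here |t| exceeds the critical value, so Ho: ρ = 0 would be rejected at the 5% level for these made-up numbers.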
Simple linear regression
Predicting a quantitative variable Y based on a single predictor variable X, assuming an approximately linear relationship
General equation of a straight line
y = β0 + β1x, where β0 is the y-intercept and β1 is the slope
Deterministic model
Linear model y = β0 + β1x where a value of x determines the value of y with no error
Probabilistic model
Linear model y = β0 + β1x + ε where ε is a random error and the observed y varies randomly around the mean E(y|X=x) = β0 + β1x
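The difference between the two models can be seen in a short simulation. This is only a sketch: the parameter values are invented, and drawing ε from a normal distribution with mean 0 is an assumption made here for illustration.

```python
import random

# Hypothetical parameter values, chosen only for illustration
BETA0, BETA1, SIGMA = 2.0, 0.5, 1.0

def mean_response(x):
    """Deterministic part: E(y | X = x) = beta0 + beta1 * x."""
    return BETA0 + BETA1 * x

def observe(x, rng):
    """Probabilistic model: the mean response plus a random error."""
    epsilon = rng.gauss(0.0, SIGMA)  # assumed normal, mean 0
    return mean_response(x) + epsilon

rng = random.Random(42)  # fixed seed so the run is repeatable
xs = [rng.uniform(0, 10) for _ in range(1000)]
errors = [observe(x, rng) - mean_response(x) for x in xs]
mean_error = sum(errors) / len(errors)

# The same x always gives the same deterministic y, while observed
# values scatter around that mean with errors that average near zero.
print(mean_response(4), round(mean_error, 3))
```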
Simple linear regression model
Y = β0 + β1X + ε, where Y is the response variable, X is the explanatory/predictor variable, ε is the random error, β0 is the y-intercept, and β1 is the slope
It is important to check the model assumptions first, before interpreting the results of the linear regression analysis
When extending the simple linear regression to multiple independent variables, multicollinearity or correlation among these predictors should be checked
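One simple way to screen for multicollinearity is to compute the pairwise correlations among the predictors and flag the large ones. The sketch below does this with made-up data. The 0.8 cutoff is a common rule of thumb rather than a value from this course, and the names `flag_correlated_pairs`, `x1`, `x2`, `x3` are hypothetical. Pairwise screening can miss collinearity that involves three or more predictors at once.

```python
import math
from itertools import combinations

def pearson_r(x, y):
    """Sample Pearson correlation from deviations about the means."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def flag_correlated_pairs(predictors, threshold=0.8):
    """Return (name_a, name_b, r) for predictor pairs with |r| > threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(predictors.items(), 2):
        r = pearson_r(a, b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, r))
    return flagged

# Hypothetical predictors: x2 is roughly twice x1, x3 is unrelated
predictors = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [2.1, 3.9, 6.2, 8.1, 9.8, 12.2],
    "x3": [5, 3, 6, 2, 7, 4],
}
flagged = flag_correlated_pairs(predictors)
for name_a, name_b, r in flagged:
    print(f"{name_a} and {name_b} are highly correlated: r = {r:.3f}")
```

When a pair is flagged, one common remedy is to drop or combine one of the two predictors before fitting the multiple regression.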
Example: Investigating the relationship between GPI and starting salary