A measure of the degree of linear relationship between two variables. The emphasis is on the degree to which a linear model may describe the relationship between the variables
Types of Correlation
Residuals
Line of Best Fit
Correlation
Residual
The difference between the observed value and the predicted value (observed minus predicted). A positive residual is when the observed value is higher than the predicted value. A negative residual is when the observed value is lower than the predicted value. A residual is zero if the observed value is equal to the predicted value
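The sign convention can be sketched in a few lines of Python (the observed and predicted values below are made up for illustration):

```python
# Hypothetical observed and model-predicted values (assumed data).
observed = [4.0, 7.0, 5.0]
predicted = [5.0, 6.0, 5.0]

# Residual = observed - predicted
residuals = [o - p for o, p in zip(observed, predicted)]

# residuals == [-1.0, 1.0, 0.0]:
# -1.0 -> negative residual (observed below predicted)
#  1.0 -> positive residual (observed above predicted)
#  0.0 -> zero residual (observed equals predicted)
```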
Line of Best Fit
The line which gives rise to the smallest sum of squared residuals. Often referred to as the "least squares" method. It is used to predict one value from another
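A minimal sketch of the least-squares computation using the standard closed-form formulas for slope and intercept; the data are invented for illustration:

```python
# Assumed example data for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Least-squares slope and intercept (closed-form solution).
slope = (sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
         / sum((xi - x_mean) ** 2 for xi in x))
intercept = y_mean - slope * x_mean

# The sum of squared residuals that this particular line minimises.
ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
```

Any other straight line through these points would give a larger `ss_res`, which is what "least squares" means.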
Correlation Coefficient
A value between plus and minus one. The sign (+, -) defines the direction of the relationship between the two variables as positive or negative
Correlation does NOT imply causation, but tells us whether there is a relationship or association between the variables. A perfect correlation is ±1
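A hand-rolled Pearson coefficient (the formula is standard; the data are invented) shows the ±1 bounds and how the sign reflects direction:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    cov = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    sx = sqrt(sum((a - x_mean) ** 2 for a in x))
    sy = sqrt(sum((b - y_mean) ** 2 for b in y))
    return cov / (sx * sy)

# Perfectly linear data gives a perfect correlation of +1 or -1.
r_perfect_pos = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])   # +1
r_perfect_neg = pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2])   # -1
```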
The more time I spend revising, the more I remember
Memory, Time Spent Revising, Positive Correlation
The more time I spend practicing using SPSS, the fewer mistakes I will make
Time Spent Practicing using SPSS, Number of Mistakes, Negative Correlation
Types of Correlation
Curvilinear Relationships
Curvilinear Relationships
The relationship between the variables is not linear, but the scatter plot still shows a clear pattern in the data. Correlation analyses can still be conducted on data with a curvilinear relationship if it is monotonic (consistently rising or consistently falling)
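One way to see why monotonic matters: on curvilinear but strictly increasing data, a rank-based coefficient (Spearman's ρ, computed here with the textbook d² formula, which is valid when there are no ties) is still a perfect 1. The data are made up:

```python
def ranks(values):
    """Rank positions 1..n (assumes no tied values)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

x = [1, 2, 3, 4, 5]
y = [xi ** 3 for xi in x]          # curvilinear but strictly increasing

# Spearman's rho via 1 - 6 * sum(d^2) / (n * (n^2 - 1)), no-ties case.
n = len(x)
d_sq = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
rho = 1 - 6 * d_sq / (n * (n ** 2 - 1))   # 1.0: perfect monotonic association
```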
Parametric Assumptions
Homoscedasticity: The assumption that the errors of prediction, for any given predicted value, have equal variances. If unequal variances are present, the data is heteroscedastic and a non-parametric test is needed
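An informal way (not a formal test) to spot heteroscedasticity is to compare the spread of residuals at low versus high predicted values; the data and the idea of a variance ratio here are assumptions for illustration only:

```python
from statistics import pvariance

# Assumed predicted values (already sorted) and their residuals.
predicted = [1, 2, 3, 4, 5, 6, 7, 8]
residuals = [0.1, -0.1, 0.2, -0.2, 1.0, -1.0, 2.0, -2.0]

half = len(residuals) // 2
low_var = pvariance(residuals[:half])    # spread at low predicted values
high_var = pvariance(residuals[half:])   # spread at high predicted values

# A large ratio hints at heteroscedasticity (a fan-shaped residual plot).
ratio = high_var / low_var
```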
Scatterplots are a way to explore data, providing a pictorial representation of the relationship between variables and helping to identify outliers
The p value indicates whether a result is statistically significant. A p value less than 0.05 suggests a less than 5% chance of rejecting the null hypothesis by mistake
The conventional cut-offs for reporting significance are: 0.05 (less than 5% chance of error), 0.01 (less than 1% chance of error), 0.001 (less than 0.1% chance of error). The exact p value or conventional cut-offs are reported for statistical significance
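The conventional cut-offs can be encoded directly; the function below is simply a sketch of that reporting convention:

```python
def significance_label(p):
    """Map a p value to the conventional reporting cut-off."""
    if p < 0.001:
        return "p < .001"
    if p < 0.01:
        return "p < .01"
    if p < 0.05:
        return "p < .05"
    return "not significant"
```

For example, `significance_label(0.03)` returns `"p < .05"`, while `significance_label(0.2)` returns `"not significant"`.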
Correlation coefficients (r or ρ) tell us about the strength of the relationship between our variables
Cohen (1988, pp. 79-81) suggests the following guidelines: small r = .10 to .29, medium r = .30 to .49, large r = .50 to 1.0. These values are independent of positive/negative values (which indicate direction, not strength)
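Cohen's guidelines can be sketched as a lookup on the absolute value of r (the wording of the fall-through label for |r| below .10 is my own placeholder, not Cohen's):

```python
def cohen_strength(r):
    """Classify correlation strength per Cohen (1988), ignoring sign."""
    magnitude = abs(r)          # direction does not affect strength
    if magnitude >= 0.50:
        return "large"
    if magnitude >= 0.30:
        return "medium"
    if magnitude >= 0.10:
        return "small"
    return "below Cohen's 'small' threshold"
```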
If we square our correlation coefficient (r), we obtain the coefficient of determination (r²): the proportion of variability in one variable that is accounted for by the other variable
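For example, with an assumed correlation of r = .70:

```python
r = 0.70                         # assumed correlation for illustration
r_squared = r ** 2               # coefficient of determination, ~0.49
percent_shared = r_squared * 100
# r = .70 -> r^2 = .49, i.e. roughly 49% of the variability in one
# variable is accounted for by the other
```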
Levels of perceived stress
Example taken from Pallant (2020, p. 142)
Linear regression is the next step up after correlation
Dependent variable
The variable we want to predict
Independent variable
The variable used to predict the value of the dependent variable
Uses of Regression
Identifying the strength of the effect that the independent variable(s) have on a dependent variable
Uses of Regression
What is the strength of relationship/effect between dose (IV) and side effects (DV)?
What is the strength of relationship/effect between marketing spending (IV) and sales (DV)?
What is the strength of relationship/effect between age (IV) and income (DV)?
Simple linear regression
The process of predicting one variable by assuming a straight-line relationship between this variable and another variable
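Given a fitted line, prediction is just substitution into the straight-line equation; the coefficients here are assumed, not taken from real data:

```python
# Assumed coefficients from a previously fitted least-squares line.
intercept = 2.2
slope = 0.6

def predict(x):
    """Predict the dependent variable from the independent variable."""
    return intercept + slope * x

prediction = predict(10)   # 2.2 + 0.6 * 10 = 8.2
```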
Parametric Assumptions: Homoscedasticity - assumption that the errors of prediction, for any given predicted value, have equal variances
When analysing markets, a range of assumptions are made about the rationality of economic agents involved in the transactions
The Wealth of Nations was written
1776
Rational
(in classical economic theory) economic agents are able to consider the outcome of their choices and recognise the net benefits of each one
Marginal utility
The additional utility (satisfaction) gained from the consumption of an additional product
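A worked example of marginal utility (the total-utility figures are invented): each entry is the extra satisfaction from consuming one more unit.

```python
# Assumed total utility after consuming 0, 1, 2, 3, 4 units of a product.
total_utility = [0, 10, 18, 24, 28]

# Marginal utility = change in total utility per additional unit.
marginal_utility = [total_utility[i + 1] - total_utility[i]
                    for i in range(len(total_utility) - 1)]
# [10, 8, 6, 4] -> each extra unit adds less satisfaction
# (diminishing marginal utility)
```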
R
Correlation coefficient in the bivariate case
R Square
Proportion of variation accounted for by the regression model
Adjusted R Square
Adjusts the R^2 value to better estimate the value in the wider population, not just the sample data, correcting for sample size and the number of predictors
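The usual adjustment formula (standard, though the numbers fed in below are made up) penalises R² as predictors (k) are added relative to the sample size (n):

```python
def adjusted_r_square(r_square, n, k):
    """Adjusted R^2 for sample size n and k predictors."""
    return 1 - (1 - r_square) * (n - 1) / (n - k - 1)

# Assumed values: R^2 = .49 from n = 50 cases with one predictor.
adj = adjusted_r_square(0.49, n=50, k=1)   # slightly below 0.49
```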
Standard error of the estimate
Places confidence limits on the predicted value
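In the bivariate case the standard error of the estimate is the square root of the residual variance with n − 2 degrees of freedom; the residual values here are assumed for illustration:

```python
from math import sqrt

# Assumed residuals from a fitted two-variable regression.
residuals = [-0.8, 0.6, 1.0, -0.6, -0.2]

n = len(residuals)
ss_res = sum(r ** 2 for r in residuals)
# n - 2 degrees of freedom: two parameters (slope, intercept) were estimated.
std_error_estimate = sqrt(ss_res / (n - 2))
```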
The ANOVA table tells us whether our overall regression model is significant
When we move on to more complex regression models with several predictors, it is the Coefficients table (rather than the ANOVA table) that tells us the relative importance of each predictor variable
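The F statistic behind that ANOVA table compares the variance explained by the model with the residual variance; the sums of squares below are assumed values for illustration:

```python
# Assumed sums of squares from a simple (one-predictor) regression, n = 5.
ss_total = 10.0
ss_residual = 2.4
n, k = 5, 1                     # sample size, number of predictors

ss_regression = ss_total - ss_residual
f_statistic = (ss_regression / k) / (ss_residual / (n - k - 1))
# F = 7.6 / 0.8 = 9.5; this F (with its p value) tests whether the
# overall regression model is significant
```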