Correlation and regression week 6

Cards (37)

  • Correlation
    A measure of the degree of linear relationship between two variables. The emphasis is on the degree to which a linear model may describe the relationship between the variables
  • Types of Correlation
    • Residuals
    • Line of Best Fit
    • Correlation
  • Residual
    The difference between the observed value and the predicted value (observed minus predicted). A positive residual means the observed value is higher than the predicted value; a negative residual means the observed value is lower than the predicted value. A residual is zero when the observed value equals the predicted value
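A minimal sketch of the sign convention (the data values are invented for illustration):

```python
import numpy as np

observed = np.array([4.0, 7.0, 5.5])   # values actually measured
predicted = np.array([5.0, 6.0, 5.5])  # values a model predicts

residuals = observed - predicted       # residual = observed - predicted
print(residuals)                       # [-1.  1.  0.] -> negative, positive, zero
```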
  • Line of Best Fit
    The line that minimises the sum of the squared residuals, often referred to as the "least squares" method. It is used to predict one variable from another
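A short sketch of the least-squares idea (invented data); `np.polyfit` with degree 1 returns the slope and intercept that minimise the sum of squared residuals:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

slope, intercept = np.polyfit(x, y, 1)  # degree-1 (straight line) least-squares fit
predicted = slope * x + intercept

sse = np.sum((y - predicted) ** 2)      # the quantity the line of best fit minimises
print(f"y = {slope:.2f}x + {intercept:.2f}, SSE = {sse:.3f}")
```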
  • Correlation Coefficient
    A value between −1 and +1. The sign (+, −) defines the direction of the relationship between the two variables as positive or negative
  • Correlation does NOT imply causation, but it tells us whether there is a relationship or association between the variables. A perfect correlation is ±1
  • The more time I spend revising, the more I remember
    • Memory, Time Spent Revising, Positive Correlation
  • The more time I spend practicing using SPSS, the fewer mistakes I will make
    • Time Spent Practicing using SPSS, Number of Mistakes, Negative Correlation
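A sketch of computing the coefficient for a revision-style example like the one above (the numbers are made up), using `scipy.stats.pearsonr`, which returns both r and its p value:

```python
from scipy.stats import pearsonr

hours_revising = [1, 2, 3, 4, 5, 6]
items_recalled = [12, 15, 17, 22, 24, 27]

r, p = pearsonr(hours_revising, items_recalled)
print(f"r = {r:.2f}, p = {p:.3f}")  # r close to +1 -> strong positive correlation
```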
  • Types of Correlation
    • Curvilinear Relationships
  • Curvilinear Relationships
    A linear relationship is not revealed, but the scatter plot shows a pattern in the data. Correlation analyses can still be conducted on data with a curvilinear relationship provided the relationship is monotonic (e.g. using Spearman's rho)
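A rank-based coefficient such as Spearman's rho handles a monotonic but non-linear pattern; a sketch with made-up data:

```python
from scipy.stats import spearmanr

x = [1, 2, 3, 4, 5, 6]
y = [1, 8, 27, 64, 125, 216]  # y = x**3: curvilinear but strictly increasing

rho, p = spearmanr(x, y)
print(rho)  # 1.0 -> the monotonic relationship is perfect in rank terms
```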
  • Parametric Assumptions
    Homoscedasticity: The assumption that the errors of prediction, for any given predicted value, have equal variances. If unequal variances are present, the data is heteroscedastic and a non-parametric test is needed
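One informal check (a sketch, not a formal test; the data are simulated): plot residuals against predicted values and look for a fan shape, which would suggest heteroscedasticity:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 2 * x + rng.normal(0, 1, 100)   # constant error spread: homoscedastic

slope, intercept = np.polyfit(x, y, 1)
predicted = slope * x + intercept
residuals = y - predicted

plt.scatter(predicted, residuals)
plt.axhline(0, color="grey")
plt.xlabel("Predicted value")
plt.ylabel("Residual")
plt.show()                          # an even band around 0 supports the assumption
```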
  • Scatterplots are a way to explore data, providing a pictorial representation of the relationship between variables and helping to identify outliers
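A minimal matplotlib scatterplot sketch (the variable names and data are assumptions for illustration):

```python
import matplotlib.pyplot as plt

hours_revising = [1, 2, 3, 4, 5, 6]
items_recalled = [12, 15, 17, 22, 24, 27]

plt.scatter(hours_revising, items_recalled)
plt.xlabel("Time spent revising (hours)")
plt.ylabel("Items recalled")
plt.title("Exploring the relationship and spotting outliers")
plt.show()
```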
  • The p value indicates whether a result is statistically significant. A p value less than 0.05 suggests a less than 5% chance of rejecting the null hypothesis by mistake (a Type I error)
  • The conventional cut-offs for reporting significance are: 0.05 (less than 5% chance of error), 0.01 (less than 1% chance of error) and 0.001 (less than 0.1% chance of error). Either the exact p value or these conventional cut-offs are reported for statistical significance
  • Correlation coefficients (r or ρ) tell us about the strength of the relationship between our variables
  • Cohen (1988, pp. 79-81) suggests the following guidelines: small r = .10 to .29, medium r = .30 to .49, large r = .50 to 1.0. These values are independent of positive/negative values (which indicate direction, not strength)
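These cut-offs translate directly into a small helper (a sketch; the function name is my own):

```python
def cohen_strength(r: float) -> str:
    """Classify a correlation's strength using Cohen's (1988) guidelines."""
    magnitude = abs(r)           # the sign shows direction, not strength
    if magnitude >= 0.50:
        return "large"
    if magnitude >= 0.30:
        return "medium"
    if magnitude >= 0.10:
        return "small"
    return "negligible"

print(cohen_strength(-0.42))     # "medium": the minus sign is ignored
```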
  • If we square our correlation coefficient (r), we can calculate the amount of variability one variable accounts for in the other variable
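For example, r = .50 gives r² = .25, so one variable accounts for 25% of the variability in the other. As a one-line check:

```python
r = 0.50
print(f"{r**2:.0%} of the variability is accounted for")  # 25%
```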
  • Levels of perceived stress
    • Example taken from Pallant (2020, p. 142)
  • Linear regression is the next step up after correlation
  • Dependent variable
    The variable we want to predict
  • Independent variable
    The variable used to predict the value of the dependent variable
  • Uses of Regression
    • Identifying the strength of the effect that the independent variable(s) have on a dependent variable
    • What is the strength of the relationship/effect between dose (IV) and side effects (DV)?
    • What is the strength of the relationship/effect between marketing spending (IV) and sales (DV)?
    • What is the strength of the relationship/effect between age (IV) and income (DV)?
  • Simple linear regression
    The process of predicting one variable by assuming a straight-line relationship between this variable and another variable
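A sketch of simple linear regression with `scipy.stats.linregress`, using an invented dose/side-effects example like the one in the list above:

```python
from scipy.stats import linregress

dose = [10, 20, 30, 40, 50]               # IV
side_effects = [1.2, 1.9, 3.1, 3.8, 5.2]  # DV

result = linregress(dose, side_effects)
print(f"slope = {result.slope:.3f}, intercept = {result.intercept:.3f}")
print(f"r = {result.rvalue:.2f}, p = {result.pvalue:.4f}")

# predict the DV for a new IV value using the straight-line model
predicted = result.intercept + result.slope * 60
```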
  • When analysing markets, a range of assumptions are made about the rationality of economic agents involved in the transactions
  • The Wealth of Nations was written
    1776
  • Rational
    (in classical economic theory) economic agents are able to consider the outcome of their choices and recognise the net benefits of each one
  • Marginal utility
    The additional utility (satisfaction) gained from the consumption of an additional product
  • R
    Correlation coefficient in the bivariate case
  • R Square
    Proportion of variation accounted for by the regression model
  • Adjusted R Square
    Adjusts the R² value to provide a better estimate of the population value, rather than one based only on the sample data
  • Standard error of the estimate
    Places confidence limits on the predicted value
  • The ANOVA table tells us whether our overall regression model is significant
  • When we move on to more complex regression models, the coefficients table will tell us the relative importance of each predictor variable
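A sketch tying these outputs together with statsmodels (the data are simulated; SPSS's Model Summary and ANOVA tables report the same quantities):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 40)
y = 3 * x + rng.normal(0, 2, 40)

model = sm.OLS(y, sm.add_constant(x)).fit()  # add_constant supplies the intercept

print(model.rsquared)             # R Square: proportion of variation accounted for
print(model.rsquared_adj)         # Adjusted R Square
print(np.sqrt(model.mse_resid))   # standard error of the estimate
print(model.f_pvalue)             # ANOVA: is the overall model significant?
```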