C4 - Correlation

Cards (12)

  • Bivariate data is data which has pairs of values for two variables.
  • You can represent bivariate data on a scatter graph.
  • The variable that is controlled is called the independent or explanatory variable (x-axis).
  • The variable that is measured is called the dependent or response variable (y-axis).
  • Correlation describes the nature of the linear relationship between two variables:
    • strong negative
    • weak negative
    • strong positive
    • weak positive
    • no linear correlation
  • For negatively correlated variables, as one variable increases the other tends to decrease.
    For positively correlated variables, as one variable increases the other also tends to increase.
  • Two variables have a causal relationship if a change in one variable causes a change in the other.
  • When a scatter graph shows correlation, you can draw a line of best fit. This is a linear model that approximates the relationship between the variables. One commonly used type of line of best fit is the least squares regression line. This is the straight line that minimises the sum of the squares of the vertical distances of each data point from the line.
  • The regression line of y on x is written in the form y = a + bx
  • The coefficient b tells you the change in y for each unit change in x:
    • if the data is positively correlated, b will be positive
    • if the data is negatively correlated, b will be negative
  • If you know a value of the independent variable from a bivariate data set, you can use the regression line to make a prediction or estimate of the corresponding value of the dependent variable.
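  • You should only use the regression line of y on x to make predictions of the dependent variable for values of the independent variable that are within the range of the given data (interpolation). Predictions outside that range (extrapolation) are unreliable.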
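
The regression cards above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the cards do not give a formula for a and b, so the sketch uses the standard least squares results b = Sxy / Sxx and a = ȳ − b x̄. The function names (regression_line, predict) and the data set (hours revised against test mark) are invented for the example.

```python
from statistics import mean


def regression_line(xs, ys):
    """Return (a, b) for the least squares regression line y = a + bx."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sxy and Sxx: sums of products and squares of deviations from the means
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # change in y for each unit change in x
    a = y_bar - b * x_bar    # the line passes through (x_bar, y_bar)
    return a, b


def predict(a, b, x, xs):
    """Estimate y at x, refusing to extrapolate beyond the observed x values."""
    if not min(xs) <= x <= max(xs):
        raise ValueError("x is outside the range of the data (extrapolation)")
    return a + b * x


# Hypothetical data: hours revised (independent) and test mark (dependent)
hours = [1, 2, 3, 4, 5]
marks = [52, 55, 61, 64, 68]

a, b = regression_line(hours, marks)
print(f"y = {a:.1f} + {b:.1f}x")                 # b > 0, so positive correlation
print(round(predict(a, b, 3.5, hours), 2))       # interpolation: x is within 1..5
```

For this data b is positive, matching the positive correlation in the points. Calling predict(a, b, 10, hours) raises a ValueError, because 10 hours lies outside the observed range of 1 to 5 hours.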