Cards (15)

  • What is the line of best fit defined by?
    y = ax + b
  • What are the fitted values?
    The prediction for y based on the observed value of x
  • What are the residuals?
    The vertical difference from the regression line to the recorded data point
  • When can the outcome be misleading?
    When data doesn't have a linear relationship or contains anomalies
  • When is a line of best fit not useful?
    Typically, the more complex a graph is, the less useful a line of best fit is.
  • What are the 5 steps in linear regression
    1. Plot the data
    2. Consider assumptions
    3. Fit the regression
    4. Diagnostic plots
    5. Plot the results with uncertainties
  • Step 1 in more detail
    When considering regression studies, it's important to consider third variables that may impact bout explanatory and response variables.
  • Step 2 in more detail
    Linearity of expected value, constant variance, independence, normally distributed residuals
  • What is the correlation coefficient?
    R, it is between -1 and 1
    1 means a perfect correlation with a positive gradient and -1 is a negative gradient
  • If y = 2x, what is the correlation between x and y?
    1
  • if y = -0.1, what is the correlation between x and y?
    -1
  • if R=0 what does that mean?
    This means that there is no correlation and no linear relationship between x and y
  • what does R^2 mean?
    This is the fraction of variance explained. If R^2 = 0.8, then 80% of the variance in y is explained by variance in x
  • When is one way analysis of variance (ANOVA) useful?
    When two or more levels in categorical explanatory variables.
    When there are two categories, ANOVA contains an f-test analogous to the two sample t-test (it is not identical).
  • What are the assumptions we make in ANOVA analysis?
    The validity of our data depends on the assumptions we make about our data
    • normally distributed residuals
    • independent errors (if errors are correlated)
    • random sampling - closely related to independent errors
    • homogeneity if variance (i.e residuals could all be sampled from same normal distribution)