Week 5

Created by

shannon reilly

Cards (15)

What is the line of best fit defined by?
y = ax + b, where a is the gradient (slope) and b is the intercept
What are the fitted values?
The prediction for y based on the observed value of x
What are the residuals?
The vertical difference between each recorded data point and the regression line (observed y minus fitted y)
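The three cards above (line of best fit, fitted values, residuals) can be sketched in a few lines. This is a minimal illustration, assuming ordinary least squares and made-up data; `fit_line` is a hypothetical helper name, not something from the course material.

```python
def fit_line(xs, ys):
    """Least-squares estimates of a (gradient) and b (intercept) in y = ax + b."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of cross-products and sum of squares about the means
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    a = sxy / sxx
    b = y_mean - a * x_mean
    return a, b

# Made-up example data (assumption, for illustration only)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(xs, ys)

# Fitted values: the prediction for y at each observed x
fitted = [a * x + b for x in xs]

# Residuals: observed y minus fitted y (vertical distance to the line)
residuals = [y - f for y, f in zip(ys, fitted)]

print(f"a = {a:.3f}, b = {b:.3f}")
print("residuals:", [round(r, 3) for r in residuals])
```

With an intercept in the model, the least-squares residuals sum to zero (up to rounding), which is a quick sanity check on a fit.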
When can the outcome be misleading?
When the data do not have a linear relationship or contain anomalies (outliers)
When is a line of best fit not useful?
Typically, the more complex a graph is, the less useful a line of best fit is.
What are the 5 steps in linear regression?
Plot the data
Consider assumptions
Fit the regression
Diagnostic plots
Plot the results with uncertainties
Step 1 in more detail
When considering regression studies, it's important to consider third variables that may impact both the explanatory and response variables.
Step 2 in more detail
Linearity of expected value, constant variance, independence, normally distributed residuals
What is the correlation coefficient?
R, a value between -1 and 1
R = 1 means a perfect linear correlation with a positive gradient; R = -1 means a perfect linear correlation with a negative gradient
If y = 2x, what is the correlation between x and y?
1
If y = -0.1x, what is the correlation between x and y?
-1
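The two answers above can be checked numerically. The sketch below computes the Pearson correlation coefficient from its definition for an exact linear relationship with a positive gradient (y = 2x) and one with a negative gradient (y = -0.1x); `pearson_r` is a hypothetical helper and the x values are arbitrary.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient R between xs and ys."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [0.0, 1.0, 2.0, 3.0, 4.0]  # arbitrary x values (assumption)

r_pos = pearson_r(xs, [2 * x for x in xs])     # y = 2x
r_neg = pearson_r(xs, [-0.1 * x for x in xs])  # y = -0.1x

print(round(r_pos, 6), round(r_neg, 6))
```

Only the sign of the gradient matters here: the size of the gradient (2 versus 0.1) does not change R when the points lie exactly on a line.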
If R = 0, what does that mean?
This means that there is no linear relationship between x and y. A non-linear relationship may still exist.
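As a small illustration of why R = 0 only rules out a linear relationship, the sketch below uses y = x^2 on x values that are symmetric about zero. The data are made up and `pearson_r` is a hypothetical helper.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient R between xs and ys."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# y is completely determined by x, but the relationship is a curve, not a line
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x ** 2 for x in xs]

r_curve = pearson_r(xs, ys)
print(r_curve)
```

This is one reason plotting the data comes before fitting anything: the plot shows the curve even though R does not.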
What does R^2 mean?
This is the fraction of variance explained. If R^2 = 0.8, then 80% of the variance in y is explained by the linear relationship with x
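"Fraction of variance explained" can be written as R^2 = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations of y from its mean. The sketch below, on made-up data, computes it that way and compares it with the square of the correlation coefficient; for a straight-line fit with an intercept the two agree.

```python
import math

# Made-up example data (assumption, for illustration only)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
x_mean = sum(xs) / n
y_mean = sum(ys) / n
sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
sxx = sum((x - x_mean) ** 2 for x in xs)
syy = sum((y - y_mean) ** 2 for y in ys)

# Least-squares line y = ax + b
a = sxy / sxx
b = y_mean - a * x_mean

# Variation left over after the fit, and total variation in y
ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
ss_tot = syy

r_squared = 1 - ss_res / ss_tot
r = sxy / math.sqrt(sxx * syy)

print(f"R^2 = {r_squared:.4f}, R squared directly = {r ** 2:.4f}")
```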
When is one-way analysis of variance (ANOVA) useful?
When a categorical explanatory variable has two or more levels.
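When there are two categories, ANOVA contains an F-test analogous to the two-sample t-test. It is equivalent to the equal-variance (pooled) t-test, with F = t^2, but it is not identical to the unequal-variance (Welch) t-test.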
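The link between the two-group F-test and the two-sample t-test can be checked by computing both statistics by hand. This is a sketch on made-up data, assuming the pooled (equal-variance) form of the t statistic; `one_way_f` and `pooled_t` are hypothetical helper names.

```python
import math

def mean(values):
    return sum(values) / len(values)

def one_way_f(groups):
    """F statistic for one-way ANOVA: between-group over within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k = len(groups)      # number of levels
    n = len(all_values)  # total number of observations
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def pooled_t(g1, g2):
    """Two-sample t statistic using a pooled variance estimate."""
    n1, n2 = len(g1), len(g2)
    ss1 = sum((v - mean(g1)) ** 2 for v in g1)
    ss2 = sum((v - mean(g2)) ** 2 for v in g2)
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    return (mean(g1) - mean(g2)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# Made-up measurements for two levels of a categorical variable (assumption)
group_a = [4.1, 5.0, 4.8, 5.3]
group_b = [6.2, 5.9, 6.8, 6.5, 6.1]

f_stat = one_way_f([group_a, group_b])
t_stat = pooled_t(group_a, group_b)

print(f"F = {f_stat:.4f}, t^2 = {t_stat ** 2:.4f}")
```

With three or more levels there is no single t statistic to compare against, which is where the F-test becomes the natural tool.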
What are the assumptions we make in ANOVA analysis?
The validity of our conclusions depends on the assumptions we make about our data
normally distributed residuals
independent errors (i.e. errors are not correlated)
random sampling - closely related to independent errors
homogeneity of variance (i.e. residuals could all be sampled from the same normal distribution)