Save
How to be Scientifically lit
Statistics and Data Visualisation
Week 5
Save
Share
Learn
Content
Leaderboard
Share
Learn
Created by
shannon reilly
Visit profile
Cards (15)
What is the line of best fit defined by?
y =
ax + b
What are the fitted values?
The
prediction
for y based on the
observed
value of x
What are the residuals?
The vertical difference from the
regression
line to the recorded data point
When can the outcome be misleading?
When data doesn't have a
linear relationship
or contains
anomalies
When is a line of best fit not useful?
Typically, the more
complex
a graph is, the less useful a line of best fit is.
What are the 5 steps in linear regression
Plot the data
Consider
assumptions
Fit the regression
Diagnostic plots
Plot the results with
uncertainties
Step 1 in more detail
When considering
regression studies
, it's important to consider
third variables
that may impact bout
explanatory
and
response variables
.
Step 2 in more detail
Linearity of
expected value
, constant
variance
,
independence
, normally distributed residuals
What is the correlation coefficient?
R
, it is between -
1
and 1
1 means a
perfect correlation
with a
positive gradient
and -1 is a
negative gradient
If y = 2x, what is the correlation between x and y?
1
if y = -0.1, what is the correlation between x and y?
-1
if R=0 what does that mean?
This means that there is no
correlation
and no
linear relationship
between x and y
what does R^2 mean?
This is the fraction of
variance
explained
. If R^2 = 0.8, then
80%
of the variance in y is explained by variance in x
When is one way analysis of variance (ANOVA) useful?
When two or more levels in categorical
explanatory
variables.
When there are two
categories
, ANOVA contains an
f-test
analogous to the
two sample t-test
(it is not identical).
What are the assumptions we make in ANOVA analysis?
The validity of our data depends on the assumptions we make about our data
normally distributed
residuals
independent
errors
(if errors are correlated)
random sampling - closely related to independent errors
homogeneity
if
variance
(i.e residuals could all be sampled from same normal distribution)