What is a correlation?
A statistical relationship between two variables that may or may not be causal
What is a positive correlation?
As one variable increases, the other also increases (both variables move in the same direction)
What is a negative correlation?
As one variable increases, the other decreases (the variables move in opposite directions)
What does it mean for two variables to co-vary?
When one variable increases or decreases as another variable increases or decreases
What is covariance?
Similar to the concept of variance, but it is based on the product of the deviations of two variables from their means
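A minimal sketch of the sample covariance calculation in Python (the function name and data are illustrative, not from the source):

def covariance(x, y):
    # Average product of paired deviations from the means (sample version, n - 1)
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)

print(covariance([1, 2, 3, 4], [2, 4, 6, 8]))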
What is the correlation coefficient?
The standard measure of the relationship between two variables
What is the Pearson correlation coefficient?
The correlation coefficient (r) given by the description above; it is the most common one and ranges from +1 (perfect positive relationship) to -1 (perfect negative relationship), where a perfect relationship means every point falls on a line.
What other definition does the PCC have?
This is a linear estimator of the direction and strength of the relationship.
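A short sketch of Pearson's r in plain Python (the helper name and data values are illustrative):

import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)   # r rescales covariance by the spread of each variable

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # ~1.0: the points fall exactly on a line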
What is the Spearman Rank Correlation (rho, ρ, or rs)?
Nonparametric correlation based on ranks of values, used when normality cannot be assumed or with ordinal data
What is the assumption of Pearson Correlation Coefficient?
Continuous data
Pearson Correlation Coefficient is an estimator for what?
A linear relationship
The significance test for a correlation is based on what distribution?
The t-distribution
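A worked sketch of the standard t-statistic for testing a correlation (scipy is assumed to be available; r and n below are made-up illustrative values):

import math
from scipy import stats

r, n = 0.65, 20                                  # illustrative correlation and sample size
t = r * math.sqrt(n - 2) / math.sqrt(1 - r**2)   # t with n - 2 degrees of freedom
p = 2 * stats.t.sf(abs(t), df=n - 2)             # two-tailed p-value from the t-distribution
print(t, p)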
A non-parametric correlation based on ranks of values, used when normality can’t be assumed, or with ordinal data, is known as
Spearman Rank Correlation
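A brief sketch contrasting Pearson and Spearman with scipy (scipy is an assumed dependency; the data are illustrative and monotonic but not linear):

from scipy import stats

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]            # monotonic but curved (y = x**2)

print(stats.pearsonr(x, y))      # r < 1: measures only the linear part of the relationship
print(stats.spearmanr(x, y))     # rho = 1: the ranks match perfectly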
Strength levels of correlations - >0.8
Very strong
Strength levels of correlations - 0.6-0.8
Strong
Strength levels of correlations - 0.4-0.6
Moderate
Strength levels of correlations - 0.2-0.4
Weak
Strength levels of correlations - <0.2
Very weak
Regression examines the relationship between a (Blank) variable and (Blank) variable(s)
Dependent & Independent
The property of Ordinary Least Squares (OLS) that makes it "BLUE" breaks down into what?
Best
Linear
Unbiased
Estimator
In OLS "BLUE" The B stands for what
B = Best: it is the most efficient estimator; the regression line has the least sample-to-sample error variation of any estimator
In OLS "BLUE" The L stands for what
L = Linear: OLS estimates a straight line
In OLS "BLUE" The U stands for what
U = Unbiased: the mean of the estimates across repeated samples equals the true population parameter value
In OLS "BLUE" The E stands for what
E = Estimator: it estimates population values of Y given sample data for Y and X
What is the formula for linear regression
ŷ = α + βx
What do the different values in the linear regression formula stand for? ŷ = α + βx
ŷ = the predicted values of y
α = the intercept of the line; the value of y when x is zero
β = the slope of the line (change in y / change in x)
x = the observed value of x
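A minimal sketch of fitting ŷ = α + βx by ordinary least squares in plain Python (the data values are illustrative):

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Slope = covariance(x, y) / variance(x); the intercept places the line through the means
beta = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum((xi - mean_x) ** 2 for xi in x)
alpha = mean_y - beta * mean_x

y_hat = [alpha + beta * xi for xi in x]   # predicted values of y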
True or false - In linear regression assumptions, the residuals are dependent which means the value of one error is affected by the value of another.
False; the residuals are INDEPENDENT, meaning the value of one error is not affected by the value of another
Residuals are the difference between the (Blank) values of y and the (Blank) values of y.
Predicted & Observed
Regression lines are not designed to exactly (Blank) every observed value of the (Blank) variable.
Predict & Dependent
True or false - Normality can be examined with a histogram and/or Shapiro-Wilk test.
True
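A sketch of checking residual normality with a histogram and the Shapiro-Wilk test (scipy and matplotlib are assumed to be installed; the residuals are made up for illustration):

from scipy import stats
import matplotlib.pyplot as plt

residuals = [0.2, -0.4, 0.1, 0.5, -0.3, -0.1, 0.0, 0.3, -0.2, 0.1]

stat, p = stats.shapiro(residuals)   # large p suggests no evidence against normality
print(stat, p)

plt.hist(residuals, bins=5)          # visual check of the residual distribution
plt.show()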
Root Mean Square Error is the (Blank Blank) of the error term, and is the square root of the Mean Square Residual (or Error)
Standard deviation
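A short sketch of RMSE as the square root of the Mean Square Residual (observed and predicted values are illustrative, assuming a one-predictor regression):

import math

observed  = [2.1, 3.9, 6.2, 8.1, 9.8]
predicted = [2.0, 4.0, 6.0, 8.0, 10.0]

ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
ms_residual = ss_residual / (len(observed) - 2)   # residual df = n - 2 (intercept + one slope)
rmse = math.sqrt(ms_residual)                     # standard deviation of the error term
print(rmse)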
What is linear regression?
Fits a straight line through the data to describe the effect of x on y and how well x predicts y
Can the linear regression equation accommodate a bivariate regression?
Yes - one dependent and one independent variable
What is an error in terms of linear regression?
The prediction of y has an error associated with every observed value of y, i.e., the difference between the predicted and observed value for every observation
What does a residuals vs. fitted values plot show?
It shows the residuals of each observation on the y-axis against the predicted (fitted) values of the dependent variable for each observation on the x-axis
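A minimal sketch of a residuals vs. fitted values plot with matplotlib (matplotlib is assumed available; the fitted values and residuals are illustrative):

import matplotlib.pyplot as plt

fitted    = [2.0, 4.0, 6.0, 8.0, 10.0]    # predicted values of y (x-axis)
residuals = [0.1, -0.1, 0.2, 0.1, -0.2]   # observed minus predicted (y-axis)

plt.scatter(fitted, residuals)
plt.axhline(0)                            # residuals should scatter randomly around zero
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.show()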
What are the degrees of freedom in regression?
Total = # of observations - 1
Model = # of estimated parameters - 1 (the intercept counts as a parameter)
Residual = total df - model df
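A quick worked example (the numbers are illustrative): with 30 observations, an intercept, and 2 predictors, total df = 30 - 1 = 29; model df = 3 parameters - 1 = 2; residual df = 29 - 2 = 27.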
What is the coefficient of determination (r² or R²)?
The proportion of variation in the dependent variable explained by the regression model
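A minimal sketch of R² as the proportion of variation explained (observed and predicted values are illustrative):

observed  = [2.1, 3.9, 6.2, 8.1, 9.8]
predicted = [2.0, 4.0, 6.0, 8.0, 10.0]

mean_y = sum(observed) / len(observed)
ss_total    = sum((y - mean_y) ** 2 for y in observed)                       # total variation in y
ss_residual = sum((y - yhat) ** 2 for y, yhat in zip(observed, predicted))   # unexplained variation

r_squared = 1 - ss_residual / ss_total   # proportion of variation explained by the model
print(r_squared)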
What is collinearity?
Correlation between independent variables such that they express a linear relationship in a regression model
What added assumption does multiple regression have regarding multicollinearity?
No multicollinearity: the multiple regression assumption that the independent variables are not highly correlated with each other
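A simple sketch of one way to screen for collinearity, inspecting pairwise correlations among the independent variables (pandas is assumed available; the column names and data are made up, with x2 nearly a copy of x1):

import pandas as pd

X = pd.DataFrame({
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 4, 6, 8, 11],   # nearly a linear function of x1
    "x3": [5, 3, 6, 2, 7],
})

print(X.corr())   # large off-diagonal values (e.g., x1 vs x2) flag potential multicollinearity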
What is a dummy variable?
A binary variable (0/1) created to represent a nominal category in a regression model
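A short sketch of creating dummy variables from a nominal variable with pandas (pandas is assumed available; the column and category names are made up):

import pandas as pd

df = pd.DataFrame({"region": ["north", "south", "south", "east"]})

# One 0/1 column per category; drop_first avoids perfect collinearity with the intercept
dummies = pd.get_dummies(df["region"], drop_first=True, dtype=int)
print(dummies)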