Explains the correlation between two attributes or variables
Represents how closely the two variables are connected
Simple Linear Regression
Single regressor variable or predictor variable x and a dependent or response variable Y
Random errors corresponding to different observations are assumed to be uncorrelated random variables
Regression model may be thought of as an empirical model
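As a sketch of the model described above, the simple linear regression equation is commonly written as follows. The symbols \beta_0 (intercept), \beta_1 (slope), \epsilon (random error), and \sigma^2 (error variance) are standard textbook notation assumed here; they do not appear in the notes themselves.

```latex
Y = \beta_0 + \beta_1 x + \epsilon,
\qquad E(\epsilon) = 0,
\qquad \operatorname{Var}(\epsilon) = \sigma^2
```

Under this form, the expected response at a given x is \beta_0 + \beta_1 x, and the errors attached to different observations are assumed to be uncorrelated.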
Correlation vs. Causation
Correlation indicates a statistical relationship between two variables
Causation is a cause-and-effect relationship between variables
What does it mean when errors are normally distributed?
It means that the distribution of the errors, or residuals, follows a normal distribution
Deterministic Relationship
A relationship in which the model predicts a variable perfectly, with no random error
Scatter plots
Present the relationship between two variables in a data set
Represent data points on a two-dimensional plane or on a Cartesian system
Independent variable or attribute on the X-axis, dependent variable on the Y-axis
Also known as scatter graphs or scatter diagrams
Effective in revealing the joint variability of x and y or the nature of relationship between them
Types of Correlation
Positive Correlation
Negative Correlation
No Correlation
Regression Analysis
Collection of statistical tools used to model and explore relationships between variables that are related in a non-deterministic manner
Used when the relationship between variables is not deterministic
Correlation Does Not Imply Causation
Errors being normally distributed means
The distribution of errors follows a normal distribution
Discrepancies between observed values and values predicted by a statistical model are symmetrically distributed around the mean
Most errors cluster near the mean with fewer errors occurring further away in both positive and negative directions
Simplifies calculations and allows for the application of many statistical tests
Method of Least Squares
Criterion for estimating the regression coefficients
Used to estimate the parameters of a system by minimizing the sum of the squares of the differences between the observed values and the fitted or predicted values from the system
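A minimal sketch of the least-squares criterion for a straight-line fit, in plain Python. The function name `least_squares_fit` and the sample data are hypothetical and chosen only for illustration. The closed-form slope and intercept used here are the values that minimize the sum of squared differences between observed and fitted values.

```python
def least_squares_fit(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-products and sum of squares about the means
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sum_squared_errors(xs, ys, intercept, slope):
    """Sum of squared differences between observed and fitted values."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Illustrative data only (not taken from the notes)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = least_squares_fit(xs, ys)
sse_fit = sum_squared_errors(xs, ys, b0, b1)
# Any other line should give a larger sum of squared errors
sse_other = sum_squared_errors(xs, ys, b0 + 0.5, b1)
print(f"intercept={b0:.4f}, slope={b1:.4f}, SSE={sse_fit:.4f}")
```

Comparing `sse_fit` with `sse_other` shows the criterion at work: shifting the fitted line away from the least-squares estimates increases the sum of squared errors.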
Normal distribution in statistical models
Simplifies calculations and allows for the application of many statistical tests and procedures that rely on the assumption of normality
Possible Interpretations of ρ
When ρ is equal to zero, there is no linear correlation
When ρ = 1, there is a perfect, positive, linear relationship
When ρ = -1, there is a perfect, negative, linear relationship
When ρ is between 0 and 1 in absolute value, it reflects the relative strength of the linear relationship
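The interpretations above can be illustrated with the sample correlation coefficient r, which estimates ρ. This is a small sketch in plain Python; the function name `pearson_r` and the data sets are hypothetical and constructed so that each case is easy to check by hand.

```python
import math


def pearson_r(xs, ys):
    """Sample (Pearson product-moment) correlation coefficient of xs and ys."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


xs = [1, 2, 3, 4, 5]

# Points exactly on a rising line: perfect positive linear relationship
r_pos = pearson_r(xs, [2 * x + 1 for x in xs])

# Points exactly on a falling line: perfect negative linear relationship
r_neg = pearson_r(xs, [-3 * x + 4 for x in xs])

# y = x^2 on a symmetric range: a clear relationship, but not a linear one
r_zero = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

print(r_pos, r_neg, r_zero)
```

The last case is a reminder that a correlation of zero rules out a linear association only; the variables can still be related in a nonlinear way.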
Correlation
The degree of linear association between two random variables X and Y
Coefficient of Determination
Denoted by r^2, a descriptive measure of the strength of the regression relationship, a measure of how well the regression line fits the data
r^2 is often used to judge the adequacy of a regression model. Its value indicates that the model accounts for 100 × r^2 percent of the variability in the data
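As a rough sketch of how r^2 can be computed for a simple linear regression, the example below fits a least-squares line and compares the residual variability with the total variability. The function name `r_squared` and the data are hypothetical, and the formula r^2 = 1 - SSE/SST is the usual textbook definition, assumed here rather than taken from the notes.

```python
def r_squared(xs, ys):
    """Coefficient of determination for a least-squares straight-line fit."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    # SSE: variability left unexplained by the fitted line
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # SST: total variability of the response about its mean
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1.0 - sse / sst


# Illustrative data only (not taken from the notes)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

r2 = r_squared(xs, ys)
print(f"The model accounts for about {100 * r2:.1f}% of the variability in y")
```

For a straight-line fit with one predictor, this value equals the square of the sample correlation coefficient r.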
Correlation coefficient
Denoted by ρ, the population (true) correlation coefficient, which is estimated by r, the sample correlation coefficient (also called the Pearson product-moment correlation coefficient)
Errors in a statistical model
Cluster symmetrically around the mean, with most errors near the mean and fewer errors further away in both positive and negative directions
Good linear fit
Defined by how well it represents the relationship between the independent variable and the dependent variable
Ordinarily, we do not use r^2 for inference about ρ^2
Sample Correlation Coefficient
The estimate of ρ, also referred to as the Pearson product-moment correlation coefficient