EDA-Lesson 5

Cards (21)

  • Scatter Plot
    • Shows the correlation between two attributes or variables
    • Shows how closely the two variables are related
  • Simple Linear Regression
    • Involves a single regressor (predictor) variable x and a dependent (response) variable Y
    • Random errors corresponding to different observations are assumed to be uncorrelated random variables
    • Regression model may be thought of as an empirical model
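The simple linear regression card can be written as a single model equation. The symbols β0 (intercept), β1 (slope), ε (random error) and σ² do not appear in the cards; they follow common textbook notation. The zero-mean and constant-variance conditions are the usual companion assumptions, not something the cards state.

```latex
Y = \beta_0 + \beta_1 x + \epsilon,
\qquad E(\epsilon) = 0,
\qquad \operatorname{Var}(\epsilon) = \sigma^2
```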
  • Correlation vs. Causation
    • Correlation indicates a statistical relationship between two variables
    • Causation is a cause-and-effect relationship between variables
  • What does it mean when errors are normally distributed?
    • It means that the distribution of the errors, or residuals, follows a normal distribution
  • Deterministic Relationship
    • A relationship in which a model predicts a variable perfectly, with no random error
  • Scatter plots
    • Present the relationship between two variables in a data set
    • Represent data points on a two-dimensional plane or on a Cartesian system
    • Independent variable or attribute on the X-axis, dependent variable on the Y-axis
    • Also known as scatter graphs or scatter diagrams
    • Effective in revealing the joint variability of x and y or the nature of the relationship between them
  • Types of Correlation
    • Positive Correlation
    • Negative Correlation
    • No Correlation
  • Regression Analysis
    • Collection of statistical tools used to model and explore relationships between variables that are related in a non-deterministic manner
    • Used when the relationship between variables is not deterministic
  • Correlation Does Not Imply Causation
  • Errors being normally distributed means
    • The distribution of errors follows a normal distribution
    • Discrepancies between observed values and values predicted by a statistical model are symmetrically distributed around the mean
    • Most errors cluster near the mean with fewer errors occurring further away in both positive and negative directions
    • Simplifies calculations and allows for the application of many statistical tests
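As a rough illustration of these points, the sketch below simulates data from a hypothetical line with normal errors, fits a line by least squares, and checks that the residuals are centred on zero with most of them close to the mean. The true model (intercept 2, slope 0.5, error standard deviation 1), the sample size, and all variable names are assumptions made up for the example.

```python
import random
import statistics

random.seed(0)  # fixed seed so the simulation is repeatable

# Hypothetical true model: y = 2 + 0.5 * x + error, with error ~ Normal(0, 1)
xs = [random.uniform(0.0, 10.0) for _ in range(2000)]
ys = [2.0 + 0.5 * x + random.gauss(0.0, 1.0) for x in xs]

# Fit a straight line by least squares
x_bar = statistics.fmean(xs)
y_bar = statistics.fmean(ys)
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
intercept = y_bar - slope * x_bar

# Residuals: observed value minus fitted value
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

mean_resid = statistics.fmean(residuals)
sd_resid = statistics.pstdev(residuals)

# Share of residuals within one standard deviation of their mean;
# for a normal distribution this is roughly 68%
share_within_1sd = (sum(abs(e - mean_resid) <= sd_resid for e in residuals)
                    / len(residuals))

print(f"mean residual: {mean_resid:.6f}")
print(f"share within 1 sd: {share_within_1sd:.3f}")
```

A share well away from about 0.68, or a clearly lopsided spread of residuals, would be a hint that the normality assumption deserves a closer look.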
  • Method of Least Squares
    • Criterion for estimating the regression coefficients
    • Used to estimate the parameters of a system by minimizing the sum of the squares of the differences between the observed values and the fitted or predicted values from the system
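A minimal sketch of the least squares criterion for a straight line, using the standard closed-form estimates. The function name `least_squares_fit` and the small data set are hypothetical, chosen only for illustration.

```python
def least_squares_fit(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of cross-products and squares about the means
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up observations lying close to a line
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = least_squares_fit(xs, ys)

# Sum of squared differences between observed and fitted values
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

print(b0, b1)  # approximately 0.05 and 1.99
print(sse)
```

Any other choice of intercept and slope gives a larger sum of squared differences on the same data, which is what makes these the least squares estimates.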
  • Normal distribution in statistical models
    • Simplifies calculations and allows for the application of many statistical tests and procedures that rely on the assumption of normality
  • Possible Interpretations of ρ
    • When ρ = 0, there is no linear correlation (a nonlinear relationship may still exist)
    • When ρ = 1, there is a perfect, positive, linear relationship
    • When ρ = -1, there is a perfect, negative, linear relationship
    • When |ρ| is strictly between 0 and 1, its magnitude reflects the relative strength of the linear relationship
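The three boundary cases can be seen on small made-up data sets. The sketch computes the sample coefficient r (the estimate of ρ); the function name `pearson_r` and the data are hypothetical. The third data set is deliberately symmetric so that a clear curved relationship still yields a linear correlation of zero.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


x = [1, 2, 3, 4, 5]

r_pos = pearson_r(x, [2, 4, 6, 8, 10])   # y rises exactly linearly with x
r_neg = pearson_r(x, [10, 8, 6, 4, 2])   # y falls exactly linearly with x
r_zero = pearson_r(x, [1, 4, 9, 4, 1])   # symmetric curve, no linear trend

print(r_pos, r_neg, r_zero)
```

Up to floating-point rounding, the three values come out as 1, -1 and 0.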
  • Correlation
    The degree of linear association between two random variables X and Y
  • Coefficient of Determination
    Denoted by r^2, a descriptive measure of the strength of the regression relationship, a measure of how well the regression line fits the data
  • r^2 is often used to judge the adequacy of a regression model. Its value indicates that the model accounts for 100 × r^2 percent of the variability in the data
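A minimal sketch of how r^2 can be computed as the share of total variability explained by a fitted line, assuming a least squares fit with an intercept. The data and the names `sst` (total sum of squares) and `sse` (error sum of squares) are illustrative assumptions, not taken from the cards.

```python
# Made-up observations lying close to a line
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least squares slope and intercept
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
intercept = y_bar - slope * x_bar

# Total variability of y about its mean
sst = sum((y - y_bar) ** 2 for y in ys)
# Variability left unexplained by the fitted line
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

r_squared = 1.0 - sse / sst

print(f"The line accounts for {100 * r_squared:.1f}% of the variability in y")
```

For this data the line accounts for roughly 99.7% of the variability, which reflects how closely the made-up points follow a straight line.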
  • Correlation coefficient
    Denoted by ρ, the population (true) correlation coefficient, which is estimated by r, the sample correlation coefficient (also called the Pearson product-moment correlation coefficient)
  • Errors in a statistical model
    • Cluster symmetrically around the mean, with most errors near the mean and fewer errors further away in both positive and negative directions
  • Good linear fit
    • Judged by how well the fitted line represents the relationship between the independent variable and the dependent variable
  • Ordinarily, we do not use r^2 for inference about ρ^2
  • Sample Correlation Coefficient
    The estimate of ρ, also referred to as the Pearson product-moment correlation coefficient
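For reference, the usual formula for the sample correlation coefficient is given below. The shorthand S_xy, S_xx and S_yy for the sums of cross-products and squares about the means is a common textbook convention and is assumed here; the cards do not define it.

```latex
\begin{aligned}
r &= \frac{S_{xy}}{\sqrt{S_{xx}\, S_{yy}}}, \\
S_{xy} &= \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}), \qquad
S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad
S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2
\end{aligned}
```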