Test One

Cards (52)

  • Response Variable
    Measures an outcome of a study
  • Explanatory Variable
    May explain or influence changes in a response variable
  • Scatterplot
    Shows the relationship between two quantitative variables measured on the same individuals
  • Correlation (r)

    Measures the strength and direction of the linear relationship between two quantitative variables
  • Correlation makes no distinction between explanatory and response variables. It doesn’t matter which variable is explanatory and which is response. In fact, if we switch the variables the value of correlation would be exactly the same. (Fact for Correlation)

  • Correlation requires that both variables be quantitative. The formula would not make any sense if one or both of the variables is a categorical variable because we would not be able to find the mean of that variable or the standard deviation. (Fact for Correlation)

  • The value of correlation (r) does not change if we change the units of measurement of X, Y , or both variables. The reason is because the formula standardizes each observation, so the units won’t matter. Thus r has no units of measurement. (Fact for Correlation)

  • A positive value of correlation (r) indicates that the data have a positive association. A negative value of (r) indicates that the data have a negative association. (So the sign of r tells us the direction of the scatterplot.) (Fact for Correlation)

  • Correlation (r) is always a number between -1 and 1. Values of r near 0 indicate a very weak linear relationship between the variables. Values of r near 1 and -1 indicate a strong linear relationship between the variables. r =1 or r =−1 indicates that the dots in the scatterplot fall exactly along a straight line. (So the value of r tells us the strength of the linear relationship.) Note how the sign of r changes depending on the direction of each scatterplot and how the value of r changes depending on the strength of the linear relationship in each scatterplot. (Fact for Correlation)

  • Correlation measures the strength of only the linear relationship between two variables. (Correlation tells us nothing about the strength of a curved relationship or a relationship with any form other than linear) (Fact for Correlation)

  • Correlation is not resistant: it is strongly affected by a few outlying observations. (Fact for Correlation)

  • Correlation Explanation
    Suppose we have data on variables X and Y for n individuals. Remember we have an x-value and a y-value for each individual. We will label the values so that the values for the first individual are x1 and y1, the values for the second individual are x2 and y2, and so on. Label the mean of the x-values with x̄ and the standard deviation of the x-values with sx. Label the mean and standard deviation of the y-values with ȳ and sy.
  • To find correlation
    Notice in the formula that we are standardizing (in other words, finding a z-score for) each of the observations, multiplying corresponding z-scores together, and then finding the average of the products of those z-scores (dividing by n - 1).
  • Formula
    r = [((x1 - x̄)/sx)((y1 - ȳ)/sy) + ((x2 - x̄)/sx)((y2 - ȳ)/sy) + ... + ((xn - x̄)/sx)((yn - ȳ)/sy)] / (n - 1)
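
The formula can be translated line by line into Python. This is only a sketch with made-up numbers; in practice `statistics.correlation` (Python 3.10+) or `numpy.corrcoef` computes the same thing:

```python
def correlation_r(xs, ys):
    """Pearson correlation r: sum of products of z-scores, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_x = (sum((x - x_bar) ** 2 for x in xs) / (n - 1)) ** 0.5
    s_y = (sum((y - y_bar) ** 2 for y in ys) / (n - 1)) ** 0.5
    # standardize each observation, multiply corresponding z-scores,
    # then divide the sum of the products by n - 1
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

x = [1, 2, 3, 4, 5]   # made-up data
y = [2, 4, 5, 4, 5]
print(round(correlation_r(x, y), 4))   # → 0.7746
# switching the variables gives exactly the same value (symmetry fact above)
print(round(correlation_r(y, x), 4))   # → 0.7746
```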
  • Regression Line
    A line that describes how a response variable Y changes as an explanatory variable X changes. We often use a _____ line to predict the value of Y for a given value of X.
  • Least Squares Regression Line
    The line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible
  • x̄
    The mean of the X values
  • ȳ
    The mean of the Y values
  • Sx
    The standard deviation of X values
  • Sy
    The standard deviation of Y values
  • r
    Correlation
  • The least-squares regression line equation is

    Y(Hat) = a + bx
  • The slope (b) equation is

    b = r(sy/sx)
  • The intercept (a) formula is 

    a = ȳ - b(x̄)
  • Y(Hat) is 

    the predicted Y value
  • The distinction between explanatory and response variables is essential in regression. If you look at the formulas for both slope and intercept of the line you can see that switching X and Y will completely change the values obtained by those formulas. (Fact for Least-Squares Regression)

  • There is a close connection between the correlation and the slope of the least-squares regression line. The formula for the slope of the least-squares line makes this clear. In particular the sign of the correlation will determine the sign of the slope (since standard deviations are always nonnegative) (Fact for Least-Squares Regression)

  • The graph of the least-squares regression line always passes through the point (x̄, ȳ). This follows directly from the equations (plug x̄ in for x, substitute the given formulas for a and b, and simplify to get ȳ). (Fact for Least-Squares Regression)

  • The square of the correlation (r2) is the fraction (or proportion) of the variation in values of the response variable that is explained by the linear relationship with the explanatory variable. (Fact for Least-Squares Regression)

  • Residuals
    The difference between an observed (or actual) value of the response variable and the value predicted by the least-squares regression line.
  • The residual equation is

    residual = observed Y - predicted Y = Y - Y(Hat)
  • The sum (and so the mean as well) of all of these residuals for the least-squares regression line is always
    zero.
  • Influential Observation
    An observation is influential for a statistical calculation if removing it would greatly change the result of the calculation. Points that are outliers in the x-direction of a scatterplot (much further to the right or to the left than the other dots in the plot) are often influential for the least-squares regression line.
  • Correlation and (linear) regression describe only linear relationships between variables. Correlation and the least-squares regression line are not resistant. Outliers can be influential to both calculations. Always plot the data to be sure it is roughly linear and to detect outliers, in particular outliers that might be influential. (Caution for Correlation and Regression)

  • Avoid extrapolation. Extrapolation is the use of a regression line to make predictions for values that are far outside the range of values that were used to find the line. Such predictions are often not accurate. For example, if we wanted to predict the number of manatees killed in a year when the number of boats registered is 5 million, the data we plotted in Chapter 4 would not be good for that, since the highest number of boats in that data is 719,000. (Caution for Correlation and Regression)

  • Correlations based on averages (or means) are usually too high when applied to individuals. So, for example, I would not want to predict how one student will do on the Final Exam from their score on Test 1 based on a correlation that comes from the mean scores for an entire class on Test 1 and the Final from all of the other semesters I have taught the course. That correlation would probably be too high. (Caution for Correlation and Regression)

  • Lurking variables are variables that we do not measure, but they may explain the relationship between the variables we do measure. Correlation and regression can be misleading if we ignore important lurking variables. (Caution for Correlation and Regression)

  • Association (Correlation) does NOT Imply Causation. An association (or correlation) between two variables, even if it is very strong, does not imply that changes in one variable cause changes in the other variable. (Caution for Correlation and Regression)

  • Population
    The entire group of individuals that we want information about (or from).
  • Sample
    Part of the population that we actually examine in order to gather information.