STATS Module 3


  • Numerical descriptive measures
    Numerically describe the main characteristics of a data set
  • Numerical summary measures
    • Identify the center and spread of a distribution
    • Identify many important features of a distribution
  • Types of data
    • Ungrouped data
    • Grouped data
  • Measures of center
    • Mean
    • Median
    • Mode
  • Measures of dispersion
    • Range
    • Variance & standard deviation
    • Coefficient of variation
  • If we want to know the income of a "typical" family (given by the center of the distribution), the spread of the distribution of incomes, or the relative position of a family with a particular income, the numerical summary measures can provide more detailed information
  • Measure of center
    Gives the center of a histogram or a frequency distribution curve
  • Measures of center
    • Mean
    • Median
    • Weighted mean
    • Mode
  • Mean for ungrouped data
    Obtained by dividing the sum of all values by the number of values in the data set
  • The mean
    Affected by extreme values (outliers)
  • The median is the value that divides a data set that has been ranked in increasing order into two equal halves
  • Median
    • If the data set has an odd number of values, the median is the middle value
    • If the data set has an even number of values, the median is the average of the two middle values
  • Median
    Less sensitive than the mean to extreme values
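  • Illustrative sketch: the mean and median cards above can be checked with Python's standard `statistics` module. The sample values here are made up for the example (they are not from the module); one value is then replaced by an extreme one to show how each measure reacts.

```python
from statistics import mean, median

# Hypothetical sample (illustrative values, not from the course material)
values = [20, 22, 25, 27, 31]

# Same sample with the largest value replaced by an outlier
with_outlier = [20, 22, 25, 27, 200]

# Even number of values: the median is the average of the two middle values
even_count = [20, 22, 25, 27]

print(mean(values), median(values))              # 25 25
print(mean(with_outlier), median(with_outlier))  # 58.8 25
print(median(even_count))                        # 23.5
```

    In this sample the outlier pulls the mean from 25 to 58.8, while the median stays at 25.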
  • Mode
    The value that occurs with the highest frequency in a data set
  • Mode
    A data set may have no mode, one mode, or more than one mode
  • Mode
    Can be calculated for both quantitative and qualitative data
  • Data set
    • 23
    • 36
    • 14
    • 23
    • 47
    • 32
    • 8
    • 14
    • 26
    • 31
    • 18
    • 28
  • Find the mode for these data
  • The ages of 10 randomly selected students from a class are 21, 19, 27, 22, 29, 19, 25, 21, 22 and 30 years, respectively
  • This data set has three modes: 19, 21 and 22. Each of these three values occurs with a (highest) frequency of 2
  • Find the mode
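  • Illustrative sketch: one way to find the mode of the two data sets above is to count how often each value occurs. The helper name `modes` is hypothetical. It assumes a non-empty data set and follows the convention that a data set has no mode when every value occurs exactly once.

```python
from collections import Counter

def modes(data):
    """Return every value tied for the highest frequency, or [] if no value repeats."""
    counts = Counter(data)
    highest = max(counts.values())
    if highest == 1:
        # Every value occurs once, so the data set has no mode
        return []
    return sorted(value for value, count in counts.items() if count == highest)

data_set = [23, 36, 14, 23, 47, 32, 8, 14, 26, 31, 18, 28]
ages = [21, 19, 27, 22, 29, 19, 25, 21, 22, 30]

print(modes(data_set))   # [14, 23]
print(modes(ages))       # [19, 21, 22]
print(modes([1, 2, 3]))  # []
```

    For the 12-value data set, 14 and 23 each occur twice and every other value occurs once, so that data set has two modes.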
  • Mode
    One advantage is that it can be calculated for both quantitative and qualitative data, whereas the mean and median can be calculated for only quantitative data
  • The statuses of five students who are members of the student senate at a college are senior, sophomore, senior, junior, and senior, respectively
  • Senior occurs more frequently than the other categories, so it is the mode for this data set
  • We cannot calculate the mean and median for this data set
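  • Illustrative sketch: because the mode only counts occurrences, it also works on category labels. The snippet below uses `statistics.multimode` from the Python standard library (available since Python 3.8) on the student senate example. The variable name `statuses` is chosen for the example.

```python
from statistics import multimode

# Class standing of the five student senate members
statuses = ["senior", "sophomore", "senior", "junior", "senior"]

# multimode returns a list of the most frequent value(s)
print(multimode(statuses))  # ['senior']
```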
  • Weighted mean
    When different values of a data set occur with different frequencies, that is, when each value is assigned a different weight, we calculate the weighted mean to find the center of the data set
  • To calculate the weighted mean
    1. Denote the variable by x and the weights by w
    2. Add all the weights and denote this sum by ∑w
    3. Multiply each value of x by the corresponding value of w
    4. The sum of the resulting products gives ∑xw
    5. Dividing ∑xw by ∑w gives the weighted mean
  • Laura bought gas for her car four times during June 2018
  • She bought 10 gallons at a price of $2.60 a gallon, 13 gallons at a price of $2.80 a gallon, 8 gallons at a price of $2.70 a gallon, and 15 gallons at a price of $2.75 a gallon
  • What is the average price that Laura paid for gas during June 2018?
  • The variable is the price of gas per gallon, and we will denote it by x
  • The weights are the number of gallons bought each time, and we will denote these weights by w
  • We list the values of x and w in Table 3.3, and find ∑w
  • Then we multiply each value of x by the corresponding value of w and obtain ∑xw by adding the resulting values
  • Finally, we divide ∑xw by ∑w to find the weighted mean
  • Laura paid an average of $2.72 a gallon for the gas she bought in June 2018
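  • Illustrative sketch: the weighted-mean steps for Laura's gas purchases can be reproduced in a few lines of Python. The variable names are chosen for the example; the prices and gallons are the ones given above.

```python
# Price per gallon (x) and gallons bought (w) for Laura's four purchases
prices = [2.60, 2.80, 2.70, 2.75]
gallons = [10, 13, 8, 15]

# Sum of the weights
sum_w = sum(gallons)

# Multiply each x by its w, then add the products
sum_xw = sum(x * w for x, w in zip(prices, gallons))

# Weighted mean = sum of xw divided by sum of w
weighted_mean = sum_xw / sum_w

print(sum_w)                    # 46
print(round(sum_xw, 2))         # 125.25
print(round(weighted_mean, 2))  # 2.72
```

    A plain (unweighted) average of the four prices would give about $2.71, because it ignores how many gallons were bought at each price.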
  • Relationships among the mean, median, and mode
    • For a symmetric histogram and frequency distribution curve with one peak, the values of the mean, median, and mode are identical, and they lie at the center of the distribution
    • For a histogram and a frequency distribution curve skewed to the right, the value of the mean is the largest, that of the mode is the smallest, and the value of the median lies between these two
    • If a histogram and a frequency distribution curve are skewed to the left, the value of the mean is the smallest and that of the mode is the largest, with the value of the median lying between these two
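  • Illustrative sketch: the ordering of the three measures under skew can be seen on a small made-up sample. The values below are hypothetical, chosen so that most observations are small with a long tail to the right. The left-skewed sample is simply its mirror image.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed sample: most values are small, with a long right tail
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]

# Mirror image of the sample above, giving a long left tail
left_skewed = [16 - x for x in right_skewed]

# Skewed right: mode < median < mean
print(mode(right_skewed), median(right_skewed), mean(right_skewed))  # 2 3.0 4.6

# Skewed left: mean < median < mode
print(mean(left_skewed), median(left_skewed), mode(left_skewed))     # 11.4 13.0 14
```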
  • Measures of variation
    Give information on the spread (variability, or dispersion) of the data values
  • Range
    Difference between the largest and the smallest values
  • The range can be misleading as it does not account for how the data are distributed and is sensitive to outliers
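  • Illustrative sketch: the two weaknesses of the range can be shown with made-up samples. The helper name `value_range` and all values below are hypothetical.

```python
def value_range(data):
    """Range = largest value minus smallest value."""
    return max(data) - min(data)

# Hypothetical samples with the same range but very different spreads
clustered = [10, 50, 50, 50, 50, 90]
spread_out = [10, 20, 40, 60, 80, 90]
print(value_range(clustered), value_range(spread_out))  # 80 80

# A single outlier changes the range sharply
print(value_range([20, 22, 25, 27, 31]))   # 11
print(value_range([20, 22, 25, 27, 200]))  # 180
```

    Both samples in the first comparison have a range of 80, even though one is tightly clustered around 50 and the other is spread evenly between the extremes.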