Numerically describe the main characteristics of a data set
Numerical summary measures
Identify the center and spread of a distribution
Identify many important features of a distribution
Types of data
Ungrouped data
Grouped data
Measures of center
Mean
Median
Mode
Measures of dispersion
Range
Variance & standard deviation
Coefficient of variation
If we want to know the income of a "typical" family (given by the center of the distribution), the spread of the distribution of incomes, or the relative position of a family with a particular income, the numerical summary measures can provide more detailed information
Measure of center
Gives the center of a histogram or a frequency distribution curve
Measures of center
Mean
Median
Weighted mean
Mode
Mean for ungrouped data
Obtained by dividing the sum of all values by the number of values in the data set
The mean
Affected by extreme values (outliers)
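The definition above, and the effect of an outlier on the mean, can be sketched in a few lines of Python. The income figures are hypothetical values chosen for illustration and do not come from the source.

```python
def mean(values):
    """Sum of all values divided by the number of values."""
    return sum(values) / len(values)

# Hypothetical family incomes in thousands of dollars (illustrative only).
incomes = [40, 45, 50, 55, 60]
incomes_with_outlier = incomes + [500]  # one extreme value added

mean_without = mean(incomes)               # 250 / 5
mean_with = mean(incomes_with_outlier)     # 750 / 6

print(mean_without)  # 50.0
print(mean_with)     # 125.0
```

A single extreme value moves the mean from 50 to 125, even though five of the six values are 60 or less.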
The median is the value that divides a data set that has been ranked in increasing order into two equal halves
Median
If the data set has an odd number of values, the median is the middle value
If the data set has an even number of values, the median is the average of the two middle values
Less sensitive than the mean to extreme values
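A minimal sketch of the odd and even cases, assuming the data are given as a plain Python list. The numbers are hypothetical and serve only to illustrate the rule.

```python
def median(values):
    """Middle value of the ranked data (average of the two middle values if n is even)."""
    ranked = sorted(values)          # rank in increasing order
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:                   # odd number of values: the middle value
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2   # even: average of the two middle values

# Hypothetical data (illustrative only).
print(median([7, 3, 9, 1, 5]))               # 5
print(median([7, 3, 9, 1]))                  # 5.0
print(median([40, 45, 50, 55, 60]))          # 50
print(median([40, 45, 50, 55, 60, 500]))     # 52.5
```

Adding the extreme value 500 moves the median only from 50 to 52.5, which illustrates why the median is less sensitive to outliers than the mean.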
Mode
The value that occurs with the highest frequency in a data set
A data set may have no mode, one mode, or more than one mode
Can be calculated for both quantitative and qualitative data
Data set: 23, 36, 14, 23, 47, 32, 8, 14, 26, 31, 18, 28
Find the mode for these data
The ages of 10 randomly selected students from a class are 21, 19, 27, 22, 29, 19, 25, 21, 22 and 30 years, respectively
Find the mode
This data set has three modes: 19, 21 and 22. Each of these three values occurs with a (highest) frequency of 2
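Both mode exercises above can be checked with a short frequency count. This is a sketch, not a prescribed method: the helper name modes is hypothetical, it assumes a non-empty list, and it follows the convention that a data set in which no value repeats has no mode.

```python
from collections import Counter

def modes(values):
    """Return all values with the highest frequency; empty list if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:                 # every value occurs once: no mode
        return []
    return sorted(v for v, c in counts.items() if c == highest)

data_set = [23, 36, 14, 23, 47, 32, 8, 14, 26, 31, 18, 28]
ages = [21, 19, 27, 22, 29, 19, 25, 21, 22, 30]

print(modes(data_set))   # [14, 23]
print(modes(ages))       # [19, 21, 22]
print(modes([1, 2, 3]))  # []
```

The twelve-value data set has two modes (14 and 23, each occurring twice), and the ages have three modes, matching the answer stated above.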
Mode
One advantage of the mode is that it can be calculated for both quantitative and qualitative data, whereas the mean and median can be calculated only for quantitative data
The class statuses of five students who are members of the student senate at a college are senior, sophomore, senior, junior, and senior, respectively
Senior occurs more frequently than the other categories, so it is the mode for this data set
We cannot calculate the mean and median for this data set
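The same frequency count works on category labels, which is why the mode is available for qualitative data. A minimal sketch using the student senate example, with the statuses stored as strings:

```python
from collections import Counter

statuses = ["senior", "sophomore", "senior", "junior", "senior"]

status_counts = Counter(statuses)
# most_common(1) returns a list holding the single most frequent (value, count) pair.
mode_status, mode_count = status_counts.most_common(1)[0]

print(mode_status, mode_count)  # senior 3
```

Summing or ranking these labels has no numerical meaning, so the mean and median are not defined here.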
Weighted mean
When different values of a data set occur with different frequencies, that is, when each value of a data set is assigned a different weight, we calculate the weighted mean to find the center of the data set
To calculate the weighted mean
1. Denote the variable by x and the weights by w
2. Add all the weights and denote this sum by ∑w
3. Multiply each value of x by the corresponding value of w
4. The sum of the resulting products gives ∑xw
5. Dividing ∑xw by ∑w gives the weighted mean
Laura bought gas for her car four times during June 2018
She bought 10 gallons at a price of $2.60 a gallon, 13 gallons at a price of $2.80 a gallon, 8 gallons at a price of $2.70 a gallon, and 15 gallons at a price of $2.75 a gallon
What is the average price that Laura paid for gas during June 2018?
The variable is the price of gas per gallon, and we will denote it by x
The weights are the number of gallons bought each time, and we will denote these weights by w
We list the values of x and w in Table 3.3, and find ∑w
Then we multiply each value of x by the corresponding value of w and obtain ∑xw by adding the resulting values
Finally, we divide ∑xw by ∑w to find the weighted mean
Laura paid an average of $2.72 a gallon for the gas she bought in June 2018
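The five steps can be traced on Laura's purchases. This sketch uses only the prices and gallons given above; the variable names are illustrative.

```python
# x: price per gallon in dollars; w: gallons bought on each visit
prices = [2.60, 2.80, 2.70, 2.75]
gallons = [10, 13, 8, 15]

sum_w = sum(gallons)                                  # step 2: sum of the weights
products = [x * w for x, w in zip(prices, gallons)]   # step 3: each x times its w
sum_xw = sum(products)                                # step 4: sum of the products
weighted_mean = sum_xw / sum_w                        # step 5: divide by sum of weights

for x, w, xw in zip(prices, gallons, products):
    print(f"x = {x:.2f}  w = {w:2d}  xw = {xw:.2f}")
print(f"sum of w = {sum_w}, sum of xw = {sum_xw:.2f}")
print(f"weighted mean = {weighted_mean:.2f}")
```

The products are 26.00, 36.40, 21.60 and 41.25, so the sum of xw is 125.25 and the sum of w is 46, giving 125.25 / 46, or about $2.72 a gallon. The unweighted average of the four prices would be about $2.71, because it ignores how many gallons were bought at each price.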
Relationships among the mean, median, and mode
For a symmetric histogram and frequency distribution curve with one peak, the values of the mean, median, and mode are identical, and they lie at the center of the distribution
For a histogram and a frequency distribution curve skewed to the right, the value of the mean is the largest, that of the mode is the smallest, and the value of the median lies between these two
If a histogram and a frequency distribution curve are skewed to the left, the value of the mean is the smallest and that of the mode is the largest, with the value of the median lying between these two
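These orderings can be checked on small data sets with Python's standard statistics module. The two data sets below are hypothetical, built so that one has a long right tail and the other is its mirror image. The ordering is a typical pattern for single-peaked distributions, not a guarantee for every possible data set.

```python
import statistics

# Hypothetical right-skewed data: most values are small, with a long tail to the right.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]
# Mirror image (16 - x): long tail to the left.
left_skewed = [16 - x for x in right_skewed]

for name, data in [("right-skewed", right_skewed), ("left-skewed", left_skewed)]:
    print(name,
          "mode =", statistics.mode(data),
          "median =", statistics.median(data),
          "mean =", statistics.mean(data))
```

For the right-skewed data the mode (2) is smallest and the mean (4.6) is largest, with the median (3) between them. For the left-skewed data the order reverses: mean 11.4, median 13, mode 14.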
Measures of variation
Give information on the spread (also called the variability or dispersion) of the data values
Range
Difference between the largest and the smallest values
The range can be misleading as it does not account for how the data are distributed and is sensitive to outliers
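Both weaknesses of the range can be seen in a short sketch. All three data sets are hypothetical and chosen only to illustrate the point.

```python
def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

# Two hypothetical data sets with the same range but very different distributions.
clustered = [0, 50, 50, 50, 100]
even_spread = [0, 25, 50, 75, 100]
print(value_range(clustered), value_range(even_spread))   # 100 100

# One outlier changes the range dramatically.
scores = [22, 23, 24, 25, 26]
print(value_range(scores))          # 4
print(value_range(scores + [90]))   # 68
```

Because the range depends only on the two most extreme values, it cannot distinguish the first two data sets, and a single outlier raises it from 4 to 68.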