C3 - Representations of Data

Cards (10)

  • An outlier is an extreme value that lies outside the overall pattern of the data.
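  • A common definition of an outlier, for a stated constant k (often 1.5), is any value that is:
    • either greater than Q3 + k(Q3 - Q1)
    • or less than Q1 - k(Q3 - Q1), where Q1 and Q3 are the lower and upper quartiles and Q3 - Q1 is the interquartile range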
  • Some outliers are legitimate values that could still be correct. However, an outlier should be removed from the data when it is clearly an error and keeping it would be misleading. Such data values are known as anomalies.
  • The process of removing anomalies from a data set is known as cleaning the data.
  • Anomalies can be the result of experimental or recording error, or could be data values which are not relevant to the investigation.
  • If you are given data in a grouped frequency table, you cannot find the exact values of the median and quartiles, because the individual data values are not known. Instead, you can draw a cumulative frequency diagram and use it to estimate the median, quartiles and percentiles.
  • Grouped continuous data can be represented in a histogram. In a histogram, the area of each bar is proportional to the frequency of its class. This allows you to use a histogram to represent grouped data with unequal class intervals.
  • frequency density = frequency / class width
  • When comparing data sets, you can comment on:
    • a measure of location
    • a measure of spread
  • Joining the middle of the top of each bar in a histogram with equal class widths forms a frequency polygon.
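
The outlier rule and the frequency density formula from the cards above can be checked with a short Python sketch. This is only an illustration under stated assumptions: the function names, the default k = 1.5 and the sample data are all invented here, and the quartiles are passed in directly because the cards do not fix a method for calculating them.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the (lower, upper) limits Q1 - k(Q3 - Q1) and Q3 + k(Q3 - Q1).

    k = 1.5 is an assumed default; use whatever value the question states.
    """
    iqr = q3 - q1  # interquartile range
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, q1, q3, k=1.5):
    """Return the values lying strictly outside the two limits."""
    lower, upper = outlier_fences(q1, q3, k)
    return [v for v in values if v < lower or v > upper]


def frequency_densities(classes):
    """Return frequency / class width for each class.

    classes is a list of (lower_bound, upper_bound, frequency) tuples.
    """
    return [freq / (upper - lower) for lower, upper, freq in classes]


if __name__ == "__main__":
    # Hypothetical data, made up for illustration only.
    values = [2, 15, 17, 18, 20, 21, 22, 24, 41]
    print(find_outliers(values, q1=16, q3=23))  # limits are 5.5 and 33.5 -> [2, 41]

    # Unequal class widths: 10, 5 and 15.
    classes = [(0, 10, 5), (10, 15, 10), (15, 30, 6)]
    print(frequency_densities(classes))  # -> [0.5, 2.0, 0.4]
```

In the second example the class 10 to 15 has the tallest bar even though its frequency is only 10, because its width is the smallest. Whether a flagged value is a genuine anomaly that should be cleaned from the data still has to be judged in context.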