Lesson 2: Describing and Exploring Data

Cards (19)

  • is a way of organizing data into a logical order, showing how often each value (or range of values) occurs.
    Frequency distribution
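A minimal Python sketch of building a frequency distribution, assuming the data are a plain list of discrete scores. The scores and the helper name `frequency_distribution` are made up for illustration; it relies only on the standard-library `collections.Counter`.

```python
from collections import Counter

def frequency_distribution(values):
    """Return (value, frequency) pairs in ascending order of value."""
    counts = Counter(values)        # tally how often each value occurs
    return sorted(counts.items())   # put the table into a logical order

scores = [3, 1, 2, 3, 3, 2, 5, 1, 3]  # hypothetical data
for value, freq in frequency_distribution(scores):
    print(value, freq)
```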
  • the simplest type of graph, in which a rectangle or bar is erected above each value of X; it is appropriate when the values of X come from a discrete rather than a continuous scale.
    Bar graph
  • a graphical display of data using bars of different heights. It is similar to a bar graph, but a histogram groups numbers into ranges (bins). The height of each bar shows how many values fall into each range.
    Histogram
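The grouping step behind a histogram can be sketched in Python as counting values per equal-width bin. This assumes half-open bins `[start, start + width)` and simply skips values outside the bins; the helper `histogram_counts` and the ages are hypothetical, and the `#` bars are a text stand-in for a drawn chart.

```python
def histogram_counts(values, start, width, n_bins):
    """Count values in each half-open bin [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        i = int((v - start) // width)  # index of the bin this value falls into
        if 0 <= i < n_bins:            # ignore values outside the chosen bins
            counts[i] += 1
    return counts

ages = [21, 23, 25, 31, 34, 38, 39, 42, 47, 55]  # hypothetical data
counts = histogram_counts(ages, start=20, width=10, n_bins=4)
for i, c in enumerate(counts):
    lo = 20 + i * 10
    print(f"[{lo}, {lo + 10}): {'#' * c}")  # bar height = count in the range
```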
  • is a graph constructed by using lines to join points plotted above the midpoint of each interval, or bin; the height of each point represents the frequency of that bin. It can be created from the histogram or by calculating the midpoints of the bins from the frequency distribution table.
    Frequency polygon
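The points of a frequency polygon can be computed from the bin layout of a frequency table, as the card describes. This Python sketch assumes equal-width bins and, optionally, follows a common textbook convention of closing the polygon with zero-frequency points one bin beyond each end; `polygon_points` is a hypothetical helper.

```python
def polygon_points(start, width, counts, close=True):
    """Return (midpoint, frequency) points for equal-width bins."""
    points = [(start + (i + 0.5) * width, c) for i, c in enumerate(counts)]
    if close:
        # Common convention: anchor both ends at frequency 0, one bin outside the data
        left = (start - 0.5 * width, 0)
        right = (start + (len(counts) + 0.5) * width, 0)
        points = [left] + points + [right]
    return points

# Bins [20, 30), [30, 40), [40, 50), [50, 60) with hypothetical counts
print(polygon_points(20, 10, [3, 4, 2, 1]))
```

Joining these points with straight lines gives the polygon.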
  • a summary measure that attempts to describe a whole set of data with a single value that represents the middle or centre of its distribution.
    measure of central tendency
  • three main measures of central tendency:
    mode, median, and mean.
  • the most commonly occurring value in a distribution.
    mode
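A short Python sketch using the standard-library `statistics.multimode` (Python 3.8+), which returns every most frequent value, since a distribution can have more than one mode. The data lists are made up.

```python
from statistics import multimode

data = [2, 3, 3, 5, 7, 7, 7, 9]      # hypothetical data
print(multimode(data))               # 7 occurs most often
print(multimode([1, 1, 2, 2, 3]))    # two modes: a bimodal distribution
```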
  • the middle value in a distribution when the values are arranged in ascending or descending order; with an even number of values, it is the average of the two middle values.
    median
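The median can be computed by sorting and taking the middle value, averaging the two middle values when the count is even. A hand-written Python sketch (the function name `median` is ours; the standard library's `statistics.median` gives the same results):

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                         # odd n: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2    # even n: average the middle pair

print(median([7, 1, 5]))      # sorted: 1, 5, 7
print(median([7, 1, 5, 3]))   # sorted: 1, 3, 5, 7
```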
  • the sum of the values of all observations in a dataset divided by the number of observations. This is also known as the arithmetic average.
    mean
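The arithmetic mean as a Python sketch: sum the observations, then divide by how many there are. The function name and data are illustrative only.

```python
def mean(values):
    """Arithmetic average: sum of all observations divided by their number."""
    if not values:
        raise ValueError("mean of empty data")
    return sum(values) / len(values)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data; the sum is 40 over 8 values
print(mean(data))
```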
  • the mode, median and mean all coincide in the middle of the distribution (assuming a single peak).
    Symmetrical Distribution
  • the median is often the preferred measure of central tendency, because the mean is pulled toward the longer tail and therefore does not usually sit in the middle of the distribution.
    skewed distribution
  • it is common for the mean to be ‘pulled’ toward the right tail of the distribution. Although there are exceptions to this rule, generally, most of the values, including the median value, tend to be less than the mean value.
    positively skewed
  • it is common for the mean to be ‘pulled’ toward the left tail of the distribution. Although there are exceptions to this rule, generally, most of the values, including the median value, tend to be greater than the mean value.
    negatively skewed
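A small Python illustration of the two skew cards, using made-up data in which a single extreme value creates a long tail. It compares `statistics.mean` and `statistics.median`: the mean is pulled toward the tail, while the median barely moves.

```python
from statistics import mean, median

right_tail = [20, 22, 25, 26, 28, 30, 150]  # hypothetical: one very large value
left_tail = [10, 70, 72, 75, 76, 78, 80]    # hypothetical: one very small value

for name, data in [("positively skewed", right_tail), ("negatively skewed", left_tail)]:
    print(f"{name}: mean={mean(data):.2f}, median={median(data)}")
```

Here the right-tailed data have a mean well above the median, and the left-tailed data have a mean below it, matching the general pattern on the cards (real data can be exceptions).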
  • is the extent to which a distribution is stretched or squeezed.
    Dispersion
  • is a measure of distance, namely the distance from the lowest to the highest score.
    range
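The range is the highest score minus the lowest score. A one-line Python sketch; the helper is named `value_range` (a hypothetical name) so it does not shadow Python's built-in `range`.

```python
def value_range(values):
    """Distance from the lowest to the highest score."""
    return max(values) - min(values)

print(value_range([12, 7, 3, 18, 9]))  # hypothetical data: 18 - 3
```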
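  • is a measure of variability based on dividing a data set into quartiles: it is the difference between the third quartile (Q3) and the first quartile (Q1), so it covers the middle 50% of the data.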
    interquartile range
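A Python sketch of the interquartile range using the standard-library `statistics.quantiles` (Python 3.8+), which with `n=4` returns the three cut points Q1, Q2 and Q3. Note the assumption: it uses the function's default `'exclusive'` method, and textbooks and software use several quartile conventions, so other tools may report slightly different quartiles for the same data. The data and the helper `iqr` are illustrative.

```python
from statistics import quantiles

def iqr(values):
    """Interquartile range Q3 - Q1, using statistics.quantiles' default method."""
    q1, _, q3 = quantiles(values, n=4)  # cut points Q1, Q2 (median), Q3
    return q3 - q1

data = [1, 3, 4, 5, 7, 8, 10, 12, 13, 15, 20]  # hypothetical data, 11 values
print(quantiles(data, n=4))
print(iqr(data))
```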
  • a measure of where the values in a set begin and end: the difference between the highest and lowest values.
    range
  • is the average of the squared differences from the mean. For a sample, the sum of squared differences is usually divided by n - 1 instead of n.
    variance
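Variance as a Python sketch: find the mean, square each deviation from it, then average. The `sample` flag, an illustrative parameter of this hypothetical helper, switches the divisor from n (population) to n - 1 (sample), matching the standard library's `statistics.pvariance` and `statistics.variance` respectively.

```python
def variance(values, sample=False):
    """Population variance (divide by n) or sample variance (divide by n - 1)."""
    n = len(values)
    if n < (2 if sample else 1):
        raise ValueError("not enough data points")
    mu = sum(values) / n
    squared_diffs = [(x - mu) ** 2 for x in values]  # squared distance from the mean
    return sum(squared_diffs) / (n - 1 if sample else n)

data = [2, 4, 4, 4, 5, 5, 7, 9]       # hypothetical data; mean 5, squared diffs sum to 32
print(variance(data))                 # 32 / 8
print(variance(data, sample=True))    # 32 / 7, about 4.571
```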
  • (s or σ) is defined as the positive square root of the variance. The population standard deviation is written σ; for a sample, it is symbolized as s (with a subscript identifying the variable if necessary) or, occasionally, as SD.
    standard deviation
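A Python sketch showing the standard deviation as the square root of the variance, computed both for the population (σ, divide by n) and for a sample (s, divide by n - 1), and checked against the standard library's `statistics.pstdev` and `statistics.stdev`. The data are made up.

```python
import math
from statistics import pstdev, stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
n = len(data)
mu = sum(data) / n
ss = sum((x - mu) ** 2 for x in data)  # sum of squared deviations from the mean

sigma = math.sqrt(ss / n)       # population SD: square root of the population variance
s = math.sqrt(ss / (n - 1))     # sample SD: square root of the sample variance, about 2.138

print(sigma, pstdev(data))
print(s, stdev(data))
```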