Data presentation

    Cards (54)

    • Bivariate data

      Data which has pairs of values for two variables
    • Scatter diagram

      • Represents bivariate data
    • Independent or explanatory variable

      Something which the researcher can control, usually plotted on the x-axis
    • Dependent or response variable

      Measured by the researcher, usually plotted on the y-axis
    • Correlation
      Describes the nature of linear relationships between two variables
    • Negative correlation

      One variable decreases when the other increases
    • Positive correlation

      One variable increases when the other increases
    • Causal relationship
      A change in one variable causes a change in the other
    • Example 1: Study of a city
      • Distance from city centre (km)
      • Population density (people/hectare)
    • As distance from the centre increases

      The population density decreases
    • Regression line
      Line of best fit which minimises the sum of the squares of the vertical distances of the data points from the line
    • Regression line equation

      y = a + bx
    • Coefficient b

      Tells you the change in y for each unit change in x
    • For positively correlated data, b is positive
    • For negatively correlated data, b is negative
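As a rough illustration of the regression cards above, the sketch below fits y = a + bx by least squares using the standard results b = S_xy / S_xx and a = y_bar - b * x_bar. The function name `fit_regression_line` and the distance/density figures are made up for this example; they are not data from Example 1.

```python
def fit_regression_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # S_xy and S_xx are the usual summary statistics for regression.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # change in y per unit change in x
    a = y_bar - b * x_bar    # the line passes through (x_bar, y_bar)
    return a, b


# Hypothetical, made-up data: distance (km) and density (people/hectare).
distance = [1, 2, 3, 4, 5]
density = [50, 42, 33, 27, 18]

a, b = fit_regression_line(distance, density)
print(f"y = {a:.1f} + ({b:.1f})x")
print("negative correlation" if b < 0 else "positive correlation")
```

With these made-up values b comes out negative, which matches the card: negatively correlated data gives a negative b.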
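    • Interpolation means estimating a value within the range of the data given; the regression line should only be used for estimates within this range
    • Extrapolation (estimating outside the range of the data) gives a much less reliable estimate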
    • Example 2: Daily mean windspeed and daily maximum gust

      • Regression line equation: g = 7.23 + 1.82w
    • The daily maximum gust is expected to increase by approximately 1.8 knots when the daily mean windspeed increases by 1 knot
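The regression equation in Example 2 can be checked with a few lines of code. This is only a sketch: the function name `predicted_gust` is made up here, and the value w = 16 knots is the one used in the deck's prediction card.

```python
def predicted_gust(w):
    """Daily maximum gust g (knots) predicted from daily mean windspeed w (knots)."""
    return 7.23 + 1.82 * w


# Prediction for a daily mean windspeed of 16 knots.
g16 = predicted_gust(16)
print(round(g16, 2))  # 36.35

# Increasing w by 1 knot changes the prediction by the coefficient b = 1.82.
slope_effect = predicted_gust(11) - predicted_gust(10)
print(round(slope_effect, 2))  # 1.82
```

The prediction is only as reliable as the data range behind the line; the deck does not state that range, so whether 16 knots counts as interpolation cannot be confirmed from the cards alone.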
    • Predicted daily maximum gust when daily mean windspeed is 16 knots is 36.35 knots
    • Measure of central tendency

      Describes the centre of the data
    • Measures of central tendency

      • Mode or modal class
      • Median
      • Mean
    • Mode or modal class

      The value or class which occurs most often
    • Median
      The middle value when the data values are put in order
    • Mean

      Calculated using Σx/n
    • Calculating mean from frequency table

      Σxf/Σf
    • The mean uses every value in the data, so it reflects the whole data set, but it is affected by extreme values
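A short sketch of the three measures of central tendency, using Python's standard `statistics` module. The data values and the frequency table are made up for illustration. The last part shows the point about extreme values: one outlier moves the mean a long way but barely moves the median.

```python
from statistics import mean, median, mode

# Made-up raw data.
values = [2, 3, 3, 4, 5, 5, 5, 7]
raw_mean = mean(values)      # sum(x) / n
raw_median = median(values)  # middle value of the ordered data
raw_mode = mode(values)      # most frequent value

# The same data as a frequency table {x: f}; mean = sum(x*f) / sum(f).
freq = {2: 1, 3: 2, 4: 1, 5: 3, 7: 1}
table_mean = sum(x * f for x, f in freq.items()) / sum(freq.values())

# Add one extreme value and compare the mean with the median.
with_outlier = values + [70]
outlier_mean = mean(with_outlier)
outlier_median = median(with_outlier)

print(raw_mean, raw_median, raw_mode)  # 4.25 4.5 5
print(table_mean)                      # 4.25
print(round(outlier_mean, 2), outlier_median)
```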
    • Calculating mean, median and modal class for continuous data in grouped frequency table

      Find the midpoint of each class interval and use it as x when estimating the mean
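The midpoint method can be sketched as follows. The grouped table is made up, and each class is written as a hypothetical (lower, upper, frequency) tuple. Picking the modal class by highest frequency assumes equal class widths, which holds for this table.

```python
# Made-up grouped frequency table: (lower bound, upper bound, frequency).
classes = [(0, 10, 4), (10, 20, 9), (20, 30, 5), (30, 40, 2)]

# Midpoint of each class stands in for every value in that class.
midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]
freqs = [f for _, _, f in classes]

sum_f = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
estimated_mean = sum_fx / sum_f  # an estimate, since the raw values are unknown

# Modal class: the class with the highest frequency (equal widths assumed).
modal_lower, modal_upper, _ = max(classes, key=lambda c: c[2])

print(midpoints)                   # [5.0, 15.0, 25.0, 35.0]
print(estimated_mean)              # 17.5
print((modal_lower, modal_upper))  # (10, 20)
```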
    • Median (Q2)

      Splits the data into two equal halves (50%)
    • Lower quartile (Q1)

      One quarter of the way through the dataset
    • Upper quartile (Q3)

      Three quarters of the way through the dataset
    • Percentiles
      Split the data set into 100 parts
    • Finding lower and upper quartiles for discrete data

      1. Divide n by 4 (lower quartile) or find 3/4 of n (upper quartile)
      2. If the result is a whole number, the quartile is the midpoint between that data point and the one above it. If it is not, round up and take that data point.
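The quartile rule for discrete data can be written as a small function. This is a sketch of the rule on the card, not a general-purpose routine: the name `quartile` and both data sets are made up, and other textbooks and libraries use slightly different quartile conventions.

```python
import math


def quartile(values, k):
    """Lower (k=1) or upper (k=3) quartile using the n/4 and 3n/4 rule."""
    data = sorted(values)
    n = len(data)
    if (k * n) % 4 == 0:
        # Whole number: midpoint of this data point and the one above it.
        pos = k * n // 4
        return (data[pos - 1] + data[pos]) / 2
    # Not a whole number: round up and take that data point.
    pos = math.ceil(k * n / 4)
    return data[pos - 1]


# n = 8, so n/4 = 2 and 3n/4 = 6 are whole numbers.
data_a = [1, 3, 4, 6, 7, 9, 12, 15]
q1_a, q3_a = quartile(data_a, 1), quartile(data_a, 3)

# n = 10, so n/4 = 2.5 and 3n/4 = 7.5 are rounded up to 3 and 8.
data_b = [2, 4, 5, 7, 8, 10, 11, 13, 14, 20]
q1_b, q3_b = quartile(data_b, 1), quartile(data_b, 3)

print(q1_a, q3_a, q3_a - q1_a)  # 3.5 10.5 7.0
print(q1_b, q3_b, q3_b - q1_b)  # 5 13 8
```

The last number on each line is the interquartile range, Q3 - Q1.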
    • Calculating standard deviation from grouped frequency table
      1. Find Σfx^2, Σfx and Σf
      2. Use formula σ^2 = Σfx^2/Σf - (Σfx/Σf)^2
      3. Take the square root of the variance to find the standard deviation
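The three steps can be followed line by line in code. The grouped table below is made up, with class midpoints used as x, so the result is an estimate of the standard deviation.

```python
import math

# Made-up grouped frequency table: (lower bound, upper bound, frequency).
classes = [(0, 10, 4), (10, 20, 9), (20, 30, 5), (30, 40, 2)]

# Step 1: find the sums, using class midpoints as x.
sum_f = sum(f for _, _, f in classes)
sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
sum_fx2 = sum(f * ((lo + hi) / 2) ** 2 for lo, hi, f in classes)

# Step 2: variance = (sum of f*x^2) / (sum of f) - mean^2.
variance = sum_fx2 / sum_f - (sum_fx / sum_f) ** 2

# Step 3: standard deviation is the square root of the variance.
std_dev = math.sqrt(variance)

print(sum_f, sum_fx, sum_fx2)  # 20 350.0 7700.0
print(variance)                # 78.75
print(round(std_dev, 2))       # 8.87
```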
    • Coding
      Each value in the data can be coded to give a new set of values, which is easier to work with
    • Coding using formula y = (x-a)/b

      Mean of coded data: y_bar = (x_bar-a)/b
      Standard deviation of coded data: σ_y = σ_x/b (for b > 0)
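A quick numerical check of the coding results, assuming made-up data and made-up coding constants a = 100 and b = 10. `statistics.pstdev` is the population standard deviation (dividing by n), which matches the σ used in this deck.

```python
from statistics import mean, pstdev

# Made-up data and coding constants.
x = [110, 120, 130, 140, 150]
a, b = 100, 10

# Code each value with y = (x - a) / b.
y = [(xi - a) / b for xi in x]

x_bar, sigma_x = mean(x), pstdev(x)
y_bar, sigma_y = mean(y), pstdev(y)

print(y)                           # [1.0, 2.0, 3.0, 4.0, 5.0]
print(y_bar, (x_bar - a) / b)      # both 3.0
print(round(sigma_y, 4), round(sigma_x / b, 4))
```

Subtracting a shifts the mean but leaves the spread unchanged, which is why a does not appear in the standard deviation formula.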
    • Estimating medians, quartiles and percentiles from grouped frequency table

      Assume data values are evenly distributed within each class
      Q1 = (n/4)th data value
      Q2 = (n/2)th data value
      Q3 = (3n/4)th data value
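Assuming values are evenly spread within each class, the (n/4)th, (n/2)th and (3n/4)th values can be estimated by linear interpolation. The sketch below uses a made-up grouped table and a hypothetical helper `estimate_value`; it is one way to set out the calculation, not the only one.

```python
def estimate_value(classes, position):
    """Estimate the value at a given position (e.g. n/2) in grouped data.

    classes: ordered list of (lower bound, upper bound, frequency).
    """
    cumulative = 0
    for lower, upper, freq in classes:
        if freq > 0 and cumulative + freq >= position:
            # Fraction of the way through this class, scaled by class width.
            fraction = (position - cumulative) / freq
            return lower + fraction * (upper - lower)
        cumulative += freq
    raise ValueError("position is beyond the end of the data")


# Made-up grouped frequency table.
classes = [(0, 10, 4), (10, 20, 9), (20, 30, 5), (30, 40, 2)]
n = sum(f for _, _, f in classes)

q1 = estimate_value(classes, n / 4)
q2 = estimate_value(classes, n / 2)
q3 = estimate_value(classes, 3 * n / 4)

print(round(q1, 2), round(q2, 2), round(q3, 2))  # 11.11 16.67 24.0
```

A percentile works the same way: for the 90th percentile, pass `0.9 * n` as the position.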
    • Measures of spread

      • Range
      • Interquartile range (IQR)
      • Interpercentile range
    • Variance (σ^2)

      Shows how spread out the data is
    • Calculating variance
      1. σ^2 = Σ(x-x_bar)^2/n
      2. σ^2 = Σx^2/n - (Σx/n)^2
      3. σ^2 = Σf(x-x_bar)^2/Σf = Σfx^2/Σf - (Σfx/Σf)^2
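The three variance formulas give the same answer, which can be checked on a small made-up data set and its frequency table.

```python
# Made-up data and the equivalent frequency table {x: f}.
x = [2, 4, 4, 4, 5, 5, 7, 9]
freq = {2: 1, 4: 3, 5: 2, 7: 1, 9: 1}

n = len(x)
x_bar = sum(x) / n

# Formula 1: mean of the squared deviations from the mean.
var_1 = sum((xi - x_bar) ** 2 for xi in x) / n

# Formula 2: mean of the squares minus the square of the mean.
var_2 = sum(xi ** 2 for xi in x) / n - (sum(x) / n) ** 2

# Formula 3: the frequency-table version of formula 2.
sum_f = sum(freq.values())
sum_fx = sum(f * xi for xi, f in freq.items())
sum_fx2 = sum(f * xi ** 2 for xi, f in freq.items())
var_3 = sum_fx2 / sum_f - (sum_fx / sum_f) ** 2

print(var_1, var_2, var_3)  # 4.0 4.0 4.0
```

Formula 2 is usually quicker by hand because it avoids subtracting the mean from every value.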