Statistics

Cards (75)

  • What are descriptive statistics?
    Methods that organize, summarize, and present data in a meaningful way. They do not make predictions or generalizations; they only describe the data at hand.
    Uses numbers, tables, and graphs
  • What are inferential statistics?
    Methods that analyze a sample to draw conclusions about a larger population.
    • Uses probability theory to make predictions
    • Generalizes findings from a sample to a population
    • Involves hypothesis testing (is the effect real or due to chance?), e.g. t-test, chi-square test
  • Give an example of an inferential statistic?
    • A study of 100 patients finds that a new drug lowers blood pressure.
    • Inference: The drug will likely lower blood pressure in the general population.
    • Statistical test: A t-test shows a p-value < 0.05, meaning the effect is statistically significant.
  • Give an example of descriptive statistics?
    • The average age of students in a class is 22 years.
    • The highest test score is 98, and the lowest is 50.
    • A histogram shows that most students scored between 70 and 90.
  • What are the types of descriptive statistics?
    Measures of central tendency (Where is the centre of the data?)
    • Mean (Average)
    • Median (Middle value)
    • Mode (Most frequent value)
    Measures of dispersion (Spread) (How variable is the data?)
    • Range (Max – Min)
    • Variance (Average squared deviation from the mean)
    • Standard Deviation (SD) (Spread of data around the mean; the square root of the variance)
    Discrete data (e.g. counting number of tabs) or categorical data (e.g. age group, ethnicity, race) are summarized with:
    • Tables
    • Graphs
  • What is a blob diagram?
    A visual representation of the data, drawn before calculating anything.
    For example, take the data set: 58.2, 61.0, 56.6, 61.5, 53.8, 56.9.
    Draw an axis that runs from just below the smallest value (e.g. 50) to just above the largest value (e.g. 62), then make a mark for each data point.
    The diagram shows that the data are spread out along the axis, with no obvious clustering.
  • What is a stem-and-leaf plot?
    A way to display data that shows how often values occur, so you can see the shape of the distribution.
    To draw one, list the stems (here, the whole-number parts) in a column, with the lowest at the top and the highest at the bottom. Then go through the data points one at a time and write each leaf (here, the decimal digit) next to its stem, separated by commas. For example, for 5.3 you go to stem 5 and write 3 next to it; for 7.1 you go to stem 7 and write 1, and so on.
    Then redraw the plot with the leaves on each stem in ascending order.
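    The drawing steps can be sketched in code. This is a minimal illustration, assuming values with one decimal place; the function name stem_and_leaf and the data set (apart from 5.3 and 7.1, which the card mentions) are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group one-decimal values by whole-number stem, with leaves in ascending order."""
    plot = defaultdict(list)
    for v in values:
        tenths = round(v * 10)            # 5.3 -> 53
        stem, leaf = divmod(tenths, 10)   # 53 -> stem 5, leaf 3
        plot[stem].append(leaf)
    # Sort stems (lowest at the top) and the leaves on each stem
    return {stem: sorted(leaves) for stem, leaves in sorted(plot.items())}

# Hypothetical data set; only 5.3 and 7.1 come from the card
data = [5.3, 7.1, 5.8, 6.2, 7.4, 6.0]
plot = stem_and_leaf(data)
for stem, leaves in plot.items():
    print(stem, "|", ", ".join(str(leaf) for leaf in leaves))
```

    Sorting the leaves inside the function corresponds to the final "redraw in ascending order" step.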
  • How do you calculate the median?
    Put the values in ascending order and find the middle value. With an even number of values, take the mean of the two middle values.
  • How do you find the mode?
    Find the most common value (90 in this example)
  • How do you find the range?
    The difference between the largest and smallest values
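    As a quick check of the three definitions, here is a small sketch using Python's standard statistics module. The copper values are the six values used elsewhere in this deck; the scores list is hypothetical, made up only so that 90 is the most common value.

```python
import statistics

copper = [58.2, 61.0, 56.6, 61.5, 53.8, 56.9]

# Median: six values, so it is the mean of the two middle ones (56.9 and 58.2)
median = statistics.median(copper)

# Range: largest value minus smallest value (61.5 - 53.8)
data_range = max(copper) - min(copper)

# Mode: most common value (hypothetical scores, not from the deck)
scores = [70, 90, 85, 90, 60]
mode = statistics.mode(scores)

print(round(median, 2), round(data_range, 1), mode)
```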
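  • What are quantiles?
    Values that split the ordered data into equal-sized parts; the quartiles (q = 0.25, 0.50, 0.75) split it into quarters.
    The position i of a quantile in the ordered data is:
    i = q × (n + 1)
    n = number of data values
    q = the quantile (0.25, 0.50, 0.75)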
  • Calculate the 1st & 3rd quartile for this data set?
    Put the data in ascending order: 53.8, 56.6, 56.9, 58.2, 61.0, 61.5
    i = q × (n + 1)
    1st quartile: 0.25 × (6 + 1) = 1.75
    This means the 1st quartile lies between the 1st and 2nd values (53.8 and 56.6), 0.75 of the way from the 1st to the 2nd.
    Find the difference: 56.6 − 53.8 = 2.8
    Multiply by the fractional part of the position (the 0.75 in 1.75): 2.8 × 0.75 = 2.1
    Add this to the lower value: 53.8 + 2.1 = 55.9
    3rd quartile: 0.75 × (6 + 1) = 5.25
    This lies between the 5th and 6th values (61.0 and 61.5), 0.25 of the way from the 5th to the 6th.
    61.5 − 61.0 = 0.5
    0.5 × 0.25 = 0.125
    61.0 + 0.125 = 61.125
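    The same steps can be written as a short function. This is a sketch of the i = q × (n + 1) rule with linear interpolation; the function name quantile is illustrative, and the handling of positions that fall outside the data is an assumption, since the cards do not cover that case.

```python
import math

def quantile(values, q):
    """Quantile via the position rule i = q * (n + 1), interpolating between neighbours."""
    ordered = sorted(values)
    n = len(ordered)
    i = q * (n + 1)          # 1-based position in the ordered data
    lower = math.floor(i)    # rank of the value just below that position
    frac = i - lower         # fractional part: how far to move towards the next value
    # Assumption: positions before the first or after the last value return the end value
    if lower < 1:
        return ordered[0]
    if lower >= n:
        return ordered[-1]
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])

copper = [58.2, 61.0, 56.6, 61.5, 53.8, 56.9]
q1 = quantile(copper, 0.25)
q3 = quantile(copper, 0.75)
print(round(q1, 3), round(q3, 3))  # 55.9 61.125
```

    Other software may use a different position rule, so quartiles from a spreadsheet or another library can differ slightly from these.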
  • Why are box plots used?
    To help indicate whether a distribution is skewed and whether there are any unusual observations (outliers).
    Data skewed to the right (positive skew): most values sit at the lower end, with a long tail of higher values that pulls the mean above the median.
    Data skewed to the left (negative skew): most values sit at the higher end, with a long tail of lower values that pulls the mean below the median.
  • What is standard deviation?
    A measure of the spread of the data around the mean
  • Calculate the SD for the six copper values (58.2 61.0 56.6 61.5 53.8 56.9)?
    s = √[(∑x² − (∑x)²/n) / (n − 1)]
    ∑x² = 20226.1 (the sum of the squares of the 6 values)
    ∑x = 348 (the sum of all the values)
    Square that: 348² = 121104
    Divide by n, the number of values (6): 121104 / 6 = 20184
    Subtract: 20226.1 − 20184 = 42.1
    Divide by n − 1: 42.1 / (6 − 1) = 8.42 (the variance)
    s = √8.42 = 2.90
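    The calculation can be reproduced step by step in code. This sketch follows the same sums as the card; the variable names are illustrative.

```python
import math

copper = [58.2, 61.0, 56.6, 61.5, 53.8, 56.9]
n = len(copper)

sum_x = sum(copper)                    # sum of the values, about 348
sum_x2 = sum(x * x for x in copper)    # sum of the squared values, about 20226.1

# Sample variance: (sum of squares - (sum)^2 / n) / (n - 1)
variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)
s = math.sqrt(variance)

print(round(variance, 2), round(s, 2))  # 8.42 2.9
```

    The standard library function statistics.stdev(copper) gives the same sample SD and is the simpler choice in practice.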
  • Define accuracy?
    How close a value is to the true value
  • Define precision?
    How close repeated measurements are to each other.
    It refers to the variability of the data
    High variability means less precise
  • Describe this data?
    The dotted line represents the true value.
    Smith: his data are distributed around the true concentration with small variation, so good precision and accuracy.
    Jones: his data are centred on the true value but more spread out, so less precise than Smith but similarly accurate.
    Brown: his data are precise but not accurate, as they are not near the true value. A consistent shift to the right like this suggests a systematic error, for example an instrument that was not calibrated correctly.
    Lee: his values are very close to the true value, so accurate and precise, but he has an outlier.
  • How do you calculate standard error?
    SE = s / √n
    s = standard deviation
    n = number of data points
  • Calculate the standard error when n = 60, s = 2.54
    2.54 / √60 = 0.328
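    A one-line function is enough to check this. The name standard_error is illustrative; the inputs are the ones from the card, reading the 60 as the number of data points.

```python
import math

def standard_error(s, n):
    """Standard error of the mean: standard deviation divided by the square root of n."""
    return s / math.sqrt(n)

se = standard_error(2.54, 60)
print(round(se, 3))  # 0.328
```

    Because n sits under a square root, quadrupling the number of data points only halves the standard error.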
  • What are the 3 ways you can describe data plotted on a graph?
    Whether it's linear or non-linear
    Positive correlation (data going upwards) or negative correlation (data going downwards)
    Strong/weak: how close the data are to the line
  • Describe these data points?
    1 linear, positive, relatively weak
    2 linear, 0 (horizontal), strong
    3 linear, negative, strong
    4 non-linear, positive, weak
    5 non-linear, negative, strong, with gap
    6 non-linear, 0, weak
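  • Why is absorbance on the y-axis and conc on the x-axis?
    Concentration is known (the independent variable), so it goes on the x-axis
    Absorbance is unknown until measured (the dependent variable), so it goes on the y-axis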
  • How do you calculate Sxx, Syy & Sxy?
    Sxx = ∑x² − (∑x)²/n: the sum of the squares of the x values, minus the square of their total divided by the number of data points.
    Syy = ∑y² − (∑y)²/n: the same thing for the y values.
    Sxy = ∑xy − (∑x)(∑y)/n: the sum of each x value multiplied by its paired y value, minus the total of the x values multiplied by the total of the y values, divided by the number of data points.
  • How do you calculate the line of best fit?
    The line is y = a + bx
    b = slope: Sxy / Sxx
    a = y-intercept: ȳ − b x̄
    ȳ = mean of the y values
    x̄ = mean of the x values
    RSS = Syy − b² Sxx
    RSS (the residual sum of squares) tells you how well the line fits the data; a smaller RSS means a better fit.
  • Calculate the slope, intercept, and RSS?
    b = 1.058
    a = 0.030
    RSS = 0.0990
    The RSS is close to 0, so the line fits the data well
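    These answers can be reproduced with a short sketch. It assumes the card refers to the concentration data listed in this deck (x = 5, 10, 15, 20, 25 and y = 5.4, 10.4, 16.1, 21.1, 26.5); those values give the same slope, intercept, and RSS. Variable names are illustrative.

```python
# Assumed data set: true concentrations (x) and observed values (y) from this deck
x = [5, 10, 15, 20, 25]
y = [5.4, 10.4, 16.1, 21.1, 26.5]
n = len(x)

# Corrected sums of squares and products
sxx = sum(v * v for v in x) - sum(x) ** 2 / n
syy = sum(v * v for v in y) - sum(y) ** 2 / n
sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n

b = sxy / sxx                      # slope
a = sum(y) / n - b * sum(x) / n    # intercept: mean of y minus slope times mean of x
rss = syy - b ** 2 * sxx           # residual sum of squares

print(round(b, 3), round(a, 3), round(rss, 3))  # 1.058 0.03 0.099
```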
  • What do observed, calculated and residual values mean?
    • Observed values (y): the actual measured values:
    • 5.4, 10.4, 16.1, 21.1, 26.5
    • True values (x): the "true" or actual concentration values:
    • 5, 10, 15, 20, 25
    • Known values, so plotted on the x-axis
    • Calculated values: the y values predicted by the line of best fit (a + bx) at each x
    • Residual value: the difference between the observed and calculated values. A residual close to zero means the observed value is close to the predicted value.
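    To make the three terms concrete, the sketch below computes the calculated values and residuals for the data in this card. It assumes the fitted line uses the slope and intercept quoted elsewhere in the deck (b = 1.058, a = 0.030); variable names are illustrative.

```python
# Intercept and slope as quoted in the deck (assumed to belong to this data set)
a, b = 0.030, 1.058

x = [5, 10, 15, 20, 25]             # true concentrations
y = [5.4, 10.4, 16.1, 21.1, 26.5]   # observed values

calculated = [a + b * xi for xi in x]                   # values predicted by the line
residuals = [yi - ci for yi, ci in zip(y, calculated)]  # observed minus calculated

for xi, yi, ci, ri in zip(x, y, calculated, residuals):
    print(f"x={xi:>2}  observed={yi:5.1f}  calculated={ci:6.2f}  residual={ri:+.2f}")

# Squaring and summing the residuals gives the residual sum of squares
rss = sum(r * r for r in residuals)
print(round(rss, 3))  # 0.099
```

    The residuals are small and change sign from point to point, which is what you would expect when a straight line describes the data well.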
  • What is the normal curve?
    A symmetric, bell-shaped curve that describes how values of a variable are distributed.
    It's also unimodal (a single peak)
    Left side: represents the data points that fall below the mean
    Right side: represents the data points that fall above the mean
    Middle: most of the data points are located near the mean
  • What do μ & σ mean?
    • μ (mu): represents the mean, the central value of the distribution. It tells you the location of the peak of the curve.
    • σ (sigma): represents the standard deviation, which sets the width of the curve. The x-axis is usually marked in steps of one SD either side of the mean, so with an SD of 1 the x-axis goes up in 1s.
  • What does the X value represent?
    A specific value in the distribution that you're analysing.
  • What is the Z value?
    It represents the number of standard deviations a particular X value is away from the mean
  • What is the equation to calculate the Z value?
    Z = (X − μ) / σ
    Z = number of SDs that you're away from the mean
    X = specific value you're measuring
    μ = mean
    σ = SD
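    A short sketch of the calculation, using the mean of 70 and SD of 10 from the normal distribution question later in this deck. The function name z_score is illustrative.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

z_high = z_score(90, 70, 10)   # 90 is 2 SD above a mean of 70
z_low = z_score(50, 70, 10)    # 50 is 2 SD below a mean of 70
print(z_high, z_low)  # 2.0 -2.0
```

    The sign tells you which side of the mean the value is on; a value equal to the mean has Z = 0.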
  • What is the probability density function?
    f(X) = (1 / (σ√(2π))) × e^(−(X − μ)² / (2σ²))
    It gives the probability density (relative likelihood) at a specific value X; probabilities come from areas under the curve.
    The first part of the equation, 1 / (σ√(2π)), is a constant that ensures the area under the curve equals 1 (100%).
    The second part of the equation, the exponential (e to a negative power), gets smaller the further the value X is from the mean.
    • f(X) is the probability density for a specific value X
    • μ is the mean
    • σ is the standard deviation
    • X is the value at which you're calculating the probability density
    • e is Euler's number (approximately 2.71828)
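    The two parts of the equation can be seen separately in code. This is a sketch assuming the standard normal density formula, f(X) = (1 / (σ√(2π))) × e^(−(X − μ)² / (2σ²)); the mean of 5.5 and SD of 0.5 are borrowed from the height example in this deck, and the function name normal_pdf is illustrative.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal probability density at x."""
    constant = 1 / (sigma * math.sqrt(2 * math.pi))    # scales the total area to 1
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)     # more negative further from the mean
    return constant * math.exp(exponent)

mu, sigma = 5.5, 0.5
peak = normal_pdf(mu, mu, sigma)              # density is highest at the mean
one_sd = normal_pdf(mu + sigma, mu, sigma)    # lower one SD away

# Crude numerical check that the area under the curve is about 1:
# add up thin rectangles from 6 SD below the mean to 6 SD above it
step = 0.001
start = mu - 6 * sigma
area = sum(normal_pdf(start + k * step, mu, sigma) * step for k in range(6000))

print(round(peak, 4), round(one_sd, 4), round(area, 4))
```

    The standard library's NormalDist(mu, sigma).pdf(x) returns the same density and can be used to cross-check the hand-written function.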
  • Both μ and x̄ represent the mean; how do you know which symbol to use?
    x̄ is used for the mean of a sample
    μ is used for the mean of a population
    The same goes for SD:
    use s for the sample SD
    use σ (sigma) for the population SD
  • If you have a large or small SD what does that mean for the curve?
    If you have a large SD the data are more spread out: the curve is wider and flatter
    If you have a small SD there's less spread: the curve is narrower and taller
  • What does the total area under the curve equal?
    It must always be 1 (100%)
  • What is the 68-95-99.7% rule?
    For a normal distribution, about 68% of values lie within 1 SD of the mean, about 95% within 2 SD, and about 99.7% within 3 SD.
    For instance, if heights have a mean of 5.5 ft and an SD of 0.5 ft, the middle of the x-axis is 5.5 and the rest of the axis goes up/down in 0.5 intervals.
    If you go 1 SD either side of the mean (5.0-6.0), the area = 68%. This means that 68% of the population is between 5 and 6 ft tall.
    If you go 2 SD either side of the mean (4.5-6.5), the area = 95%, so 95% have a height between 4.5 and 6.5 ft.
    If you go 3 SD either side (4.0-7.0), the area = 99.7%.
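    The three percentages can be checked against the normal curve itself. This sketch uses NormalDist from Python's standard statistics module with the height example above (mean 5.5, SD 0.5); the variable names are illustrative.

```python
from statistics import NormalDist

heights = NormalDist(mu=5.5, sigma=0.5)

areas = {}
for k in (1, 2, 3):
    low = heights.mean - k * heights.stdev
    high = heights.mean + k * heights.stdev
    # Area between low and high = cumulative area up to high minus up to low
    areas[k] = heights.cdf(high) - heights.cdf(low)
    print(f"within {k} SD ({low:.1f} to {high:.1f}): {areas[k]:.1%}")
```

    The exact areas are roughly 68.3%, 95.4%, and 99.7%, so the rule is a rounded approximation.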
  • The normal distribution below has a SD of 10. Approximately what area is contained between 70 and 90?
    The mean is 70, so 90 is 2 SD above the mean.
    Within 2 SD of the mean = 95%, but that covers 50-90, as 50 is 2 SD below the mean (70). The question asks only for 70-90.
    Because the curve is symmetric, you halve it: 95% / 2 = 47.5%
  • For the normal distribution below, approximately what area is contained between -2 and 1?
    0 = mean value (taking the SD as 1).
    Within 1 SD = 68%
    Within 2 SD = 95%
    Between 0 and 1 = 68% / 2 = 34%
    Between −2 and 0 = 95% / 2 = 47.5%
    Total area between −2 and 1: 34% + 47.5% = 81.5%
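    Both worked answers can be compared with the exact areas. This is a sketch using NormalDist from the standard library; the helper name area_between is illustrative, and the second question is assumed to use a mean of 0 and an SD of 1.

```python
from statistics import NormalDist

def area_between(low, high, mu, sigma):
    """Area under the normal curve between low and high."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(high) - dist.cdf(low)

area_70_90 = area_between(70, 90, mu=70, sigma=10)   # rule of thumb: 47.5%
area_m2_1 = area_between(-2, 1, mu=0, sigma=1)       # rule of thumb: 81.5%

print(f"{area_70_90:.1%}  {area_m2_1:.1%}")
```

    The exact values come out at about 47.7% and 81.9%, slightly above the rule-of-thumb answers because 68% and 95% are rounded down.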