Statistics

Created by

sarah

Cards (75)

What are descriptive statistics?
They organize, summarize, and present data in a meaningful way. They do not make predictions or generalizations; they just describe the data at hand.
Uses numbers, tables, and graphs
What are inferential statistics?
They analyze a sample to draw conclusions about a larger population.
Uses probability theory to make predictions
Generalizes findings from a sample to a population
Involves hypothesis testing (is the effect real or due to chance?), e.g. t-test, chi-square test
Give an example of inferential statistics?
A study of 100 patients finds that a new drug lowers blood pressure.
Inference: The drug will likely lower blood pressure in the general population.
Statistical test: A t-test shows a p-value < 0.05, meaning the effect is statistically significant.
Give an example of descriptive statistics?
The average age of students in a class is 22 years.
The highest test score is 98, and the lowest is 50.
A histogram shows that most students scored between 70 and 90.
What are the types of descriptive statistics?
Measures of Central Tendency (Where is the centre of the data?)
Mean (Average)
Median (Middle value)
Mode (Most frequent value)
Measures of Dispersion (Spread) (How variable is the data?)
Range (Max – Min)
Variance (Average squared deviation from the mean)
Standard Deviation (SD) (Spread of data around the mean)
Discrete data (e.g. counting the number of tabs) or categorical data (e.g. age group, ethnicity, race) can be presented with:
Tables
Graphs
What's a blob diagram?
A visual representation of the data, drawn before calculating anything.
For example, take the data set: 58.2, 61.0, 56.6, 61.5, 53.8, 56.9.
Draw an axis that starts just below the smallest value (at 50) and ends just above the largest value (at 62), then make a mark on it for each data point.
The diagram shows that the data are dispersed, with no clustering.
What's a stem and leaf plot?
A way to display the data that keeps every value visible while showing how often values occur, so you can see the shape of the distribution.
For example, draw a column of stems with the lowest number at the top and the highest at the bottom. Then go through the data points one at a time and add each to the plot: for 5.3, go to stem 5 and write 3 next to it, followed by a comma; for 7.1, go to stem 7 and write 1 next to it, and so on.
Then redraw the plot with the leaves on each stem in ascending order.
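The plotting steps for a stem and leaf plot can be sketched in Python. This is only an illustration: the function name `stem_and_leaf` is made up, the data are hypothetical apart from 5.3 and 7.1, and the sketch assumes non-negative values with one decimal place.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return {stem: sorted leaves} for non-negative values with one decimal place."""
    plot = defaultdict(list)
    for value in values:
        tenths = round(value * 10)        # 5.3 -> 53
        stem, leaf = divmod(tenths, 10)   # 53 -> stem 5, leaf 3
        plot[stem].append(leaf)
    # Redraw with stems and leaves in ascending order
    return {stem: sorted(plot[stem]) for stem in sorted(plot)}

# Hypothetical data; only 5.3 and 7.1 come from the card
data = [5.3, 7.1, 5.9, 6.4, 7.8, 5.0, 6.4]
for stem, leaves in stem_and_leaf(data).items():
    print(stem, "|", ", ".join(str(leaf) for leaf in leaves))
```

Each printed row is one stem followed by its leaves, so longer rows mark the ranges where values occur most often.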
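How do you calculate the median?
Put the values in ascending order and find the middle value. With an even number of values, take the mean of the two middle values.
How do you find the mode?
Find the most common value (90 in the example for this card)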
How do you find the range?
The difference between the largest and smallest value
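As a quick check of the median, mode and range, here is a minimal sketch using Python's standard `statistics` module on the six values from the blob diagram card. The `scores` list is hypothetical and is there only to show a mode, since the six values contain no repeats.

```python
import statistics

copper = [58.2, 61.0, 56.6, 61.5, 53.8, 56.9]

ordered = sorted(copper)                  # ascending order
median = statistics.median(copper)        # mean of the two middle values when n is even
data_range = max(copper) - min(copper)    # largest minus smallest

# Hypothetical test scores, used only to illustrate the mode (not from the cards)
scores = [70, 90, 85, 90, 50]
mode = statistics.mode(scores)

print(ordered)
print(round(median, 2), round(data_range, 1), mode)
```

With six values the median falls between the 3rd and 4th ordered values (56.9 and 58.2).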
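What are quantiles?
Values that split the ordered data into equal-sized parts (e.g. quartiles split it into four). The position i of a quantile in the ordered data is:
i = q(n+1)
n = number of data values
q = the quantile (0.25, 0.50, 0.75)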
Calculate the 1st & 3rd quartile for this data set?
Put the data in ascending order: 53.8, 56.6, 56.9, 58.2, 61.0, 61.5
i = q(n+1)
1st quartile: 0.25 x (6+1) = 1.75
This means the 1st quartile lies between the 1st and 2nd values: 53.8 & 56.6.
Find the difference: 56.6 - 53.8 = 2.8
Multiply by 0.75, the fractional part of 1.75: 2.8 x 0.75 = 2.1
Add this to the lower value: 53.8 + 2.1 = 55.9
3rd quartile: 0.75 x (6+1) = 5.25
This means the 3rd quartile lies between the 5th and 6th values: 61.0 & 61.5.
Find the difference: 61.5 - 61.0 = 0.5
Multiply by 0.25, the fractional part of 5.25: 0.5 x 0.25 = 0.125
Add this to the lower value: 61.0 + 0.125 = 61.125
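The quartile working can be sketched as a small Python function. The name `quantile_position_method` is made up for this example; it assumes the i = q(n+1) position rule with linear interpolation between the two neighbouring values, as in the card.

```python
import math

def quantile_position_method(values, q):
    """Quantile using the position i = q(n + 1) with linear interpolation."""
    ordered = sorted(values)
    n = len(ordered)
    i = q * (n + 1)            # 1-based position in the ordered data
    lower = math.floor(i)      # whole part: index of the value just below
    fraction = i - lower       # fractional part: how far towards the next value
    if lower < 1:              # position falls before the first value
        return ordered[0]
    if lower >= n:             # position falls at or beyond the last value
        return ordered[-1]
    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)

copper = [58.2, 61.0, 56.6, 61.5, 53.8, 56.9]
q1 = quantile_position_method(copper, 0.25)
q3 = quantile_position_method(copper, 0.75)
print(round(q1, 3), round(q3, 3))
```

This should reproduce the 55.9 and 61.125 worked out by hand. Python's built-in `statistics.quantiles(copper, n=4)` uses the same (n+1) positioning by default, so it can serve as a cross-check.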
Why are box plots used?
To help indicate whether a distribution is skewed and whether there are any unusual observations.
Data skewed to the right: most values sit at the lower end, with a long tail of a few high values that pulls the mean above the median
Data skewed to the left: most values sit at the upper end, with a long tail of a few low values that pulls the mean below the median
What's standard deviation?
A measure of the spread of the data around the mean
Calculate the SD for the six copper values (58.2, 61.0, 56.6, 61.5, 53.8, 56.9)?
∑x² = 20226.1 - this is the sum of the squares of each of the 6 values.
∑x = 348 - this is the sum of all the values.
Square that sum: 348² = 121104
Divide by n, the number of values (6): 121104 / 6 = 20184
Subtract this from ∑x²: 20226.1 - 20184 = 42.1
Divide by n - 1 to get the variance: 42.1 / (6-1) = 8.42
Take the square root: s = √8.42 = 2.90
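The same SD calculation can be checked step by step in Python. This sketch follows the shortcut used on the card (∑x² minus (∑x)²/n, then divide by n - 1); the variable names are chosen for the example.

```python
import math

copper = [58.2, 61.0, 56.6, 61.5, 53.8, 56.9]
n = len(copper)

sum_x = sum(copper)                           # sum of the values
sum_x_squared = sum(x ** 2 for x in copper)   # sum of the squared values

# Sum of squared deviations via the shortcut: sum(x^2) - (sum(x))^2 / n
sum_sq_dev = sum_x_squared - sum_x ** 2 / n
variance = sum_sq_dev / (n - 1)               # sample variance
sd = math.sqrt(variance)                      # sample standard deviation

print(round(sum_sq_dev, 1), round(variance, 2), round(sd, 2))
```

The standard library's `statistics.stdev(copper)` computes the same sample SD and can be used to confirm the result.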
Define accuracy?
How close a value is to the true value
Define precision?
How close repeated measurements are to each other.
It refers to the variability of the data
High variability means less precise
Describe this data?
The dotted line represents the true value.
Smith: his data are distributed around the true concentration with small variation, so good precision and accuracy.
Jones: his data are centred on the true value but more spread out, so less precise but with the same accuracy as Smith.
Brown: his data are precise but not accurate, as they are not near the true value. This suggests the machine was not calibrated correctly, which would explain the shift to the right.
Lee: his values are very close to the true value, so they are accurate and precise, but he has an outlier.
How do you calculate standard error?
SE = s / √n
s = standard deviation
n = number of data points
Calculate the standard error when n = 60, s = 2.54
2.54 / √60 = 0.328
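A minimal Python sketch of this calculation, assuming the usual formula for the standard error of the mean (SD divided by the square root of the number of data points) and treating 60 as the number of data points, as the working does. The function name is made up for the example.

```python
import math

def standard_error(s, n):
    """Standard error of the mean: s / sqrt(n)."""
    return s / math.sqrt(n)

se = standard_error(2.54, 60)
print(round(se, 3))
```

Because n sits under a square root, quadrupling the number of data points only halves the standard error.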
What are the 3 ways you can describe data plotted on a graph?
Whether it's linear or non-linear
Positive correlation (data going upwards) or negative correlation (data going downwards)
Strong or weak (how close the data points are to the line)
Describe these data points?
1 linear, positive, relatively weak
2 linear, 0 (horizontal), strong
3 linear, negative, strong
4 non-linear, positive, weak
5 non-linear, negative, strong, with gap
6 non-linear, 0, weak
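Why is absorbance on the y-axis and concentration on the x-axis?
Concentration is the known (independent) variable, so it goes on the x-axis
Absorbance is the measured (dependent) variable, so it goes on the y-axis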
How do you calculate Sxx, Syy & Sxy?
Sxx = ∑x² - (∑x)²/n: the sum of the squares of each x value, minus the square of the total of the x values divided by the number of data points.
Syy = ∑y² - (∑y)²/n: the same thing but for the y values.
Sxy = ∑xy - (∑x)(∑y)/n: the sum of each x value multiplied by its y value, minus the total of the x values multiplied by the total of the y values, divided by the number of data points.
How do you calculate the line of best fit?
The line is y = a + bx
b = slope: b = Sxy/Sxx
a = y-intercept: a = ȳ - b x̄
ȳ = mean of the y values
x̄ = mean of the x values
RSS = Syy - b²Sxx
RSS (residual sum of squares) tells you how well the line fits the data; a smaller RSS means a better fit.
Calculate the slope, intercept, and RSS?
b = 1.058
a = 0.030
RSS = 0.0990
The RSS is close to 0, so the line fits the data well
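The slope, intercept and RSS can be reproduced in Python. This card does not show its data, so the sketch assumes the x and y values listed on the observed/true values card; with that assumption the results match the answers given here.

```python
# Assumed data: the values from the observed/true values card
x = [5, 10, 15, 20, 25]
y = [5.4, 10.4, 16.1, 21.1, 26.5]
n = len(x)

sxx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
syy = sum(yi ** 2 for yi in y) - sum(y) ** 2 / n
sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n

b = sxy / sxx                      # slope
a = sum(y) / n - b * sum(x) / n    # intercept: mean(y) - b * mean(x)
rss = syy - b ** 2 * sxx           # residual sum of squares

print(round(b, 3), round(a, 3), round(rss, 4))
```

The intermediate sums work out to Sxx = 250 and Sxy = 264.5, which gives the slope 264.5 / 250 = 1.058.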
What do the observed, calculated and residual values mean?
Observed values (y): the actual measured values:
5.4, 10.4, 16.1, 21.1, 26.5
True values (x): the "true" or actual concentration values:
5, 10, 15, 20, 25
These are known, so they are plotted on the x-axis
Calculated values: the y values predicted by the line of best fit (a + bx) for each x
Residual value: the difference between the observed value and the calculated value. A residual close to zero means the observed value is close to the predicted value.
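To make the three terms concrete, this sketch computes a calculated (predicted) value and a residual for each point. It assumes the residual is observed minus calculated, and it takes the slope 1.058 and intercept 0.030 from the worked card.

```python
x = [5, 10, 15, 20, 25]              # true (known) values
y = [5.4, 10.4, 16.1, 21.1, 26.5]    # observed values
a, b = 0.030, 1.058                  # intercept and slope from the worked card

calculated = [a + b * xi for xi in x]                    # predicted by the line
residuals = [yi - ci for yi, ci in zip(y, calculated)]   # observed - calculated
rss = sum(r ** 2 for r in residuals)                     # residual sum of squares

for xi, yi, ci, ri in zip(x, y, calculated, residuals):
    print(xi, yi, round(ci, 2), round(ri, 2))
print(round(rss, 4))
```

Squaring and summing the residuals gives the same RSS as the Syy - b²Sxx shortcut.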
What's the normal curve?
A symmetric, bell-shaped curve that describes how the values of a variable are distributed.
It's also unimodal (a single peak)
Left side: represents the data points that fall below the mean
Right side: represents the data points that fall above the mean
Middle: most of the data points are located near the mean
What do μ & σ mean?
μ (mu): represents the mean, the central value of the distribution. It tells you where the peak of the curve is located.
σ (sigma): represents the standard deviation. On the x-axis it sets the interval between tick marks either side of the mean: with an SD of 1, the x-axis goes up in steps of 1.
What does the X value represent?
A specific value in the distribution that you're analysing.
What's the Z value?
It represents the number of standard deviations a particular X value is away from the mean
What's the equation to calculate the Z value?
Z = (X - μ) / σ
Z = number of SDs that you're away from the mean
X = specific value you're measuring
μ = mean
σ = SD
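A minimal Python sketch of the Z value, assuming the standard formula (X minus the mean, divided by the SD). The mean of 5.5 and SD of 0.5 are borrowed from the 68-95-99.7% card; the X values are made up for illustration.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

# Mean 5.5 and SD 0.5 come from the 68-95-99.7% card; 6.5 and 4.5 are made-up X values
z_above = z_score(6.5, 5.5, 0.5)
z_below = z_score(4.5, 5.5, 0.5)
print(z_above, z_below)
```

A positive Z means the value is above the mean and a negative Z means it is below.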
What's the probability density function?
f(X) = [1 / (σ√(2π))] × e^(-(X - μ)² / (2σ²))
It tells you the probability density (relative likelihood) at a specific value X; probabilities come from areas under the curve.
The first part of the equation, 1 / (σ√(2π)), is a constant that ensures the area under the curve equals 1 (100%).
The second part of the equation, the power of e, depends on how far the value X is from the mean: the farther away, the smaller f(X).
f(X) is the probability density for a specific value X
μ is the mean
σ is the standard deviation
X is the value at which you're calculating the probability density
e is Euler's number (approximately 2.71828)
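The two parts of the density function can be written out separately in Python. This is a sketch assuming the standard normal density formula; the function name and the example parameters (mean 5.5, SD 0.5) are illustrative and not taken from this card.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal probability density at x for mean mu and standard deviation sigma."""
    constant = 1 / (sigma * math.sqrt(2 * math.pi))   # makes the total area equal 1
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)    # 0 at the mean, more negative farther away
    return constant * math.exp(exponent)

# Illustrative parameters, not from the card
at_mean = normal_pdf(5.5, 5.5, 0.5)
one_sd_away = normal_pdf(6.0, 5.5, 0.5)
print(round(at_mean, 4), round(one_sd_away, 4))

# Cross-check against the standard library's implementation
library_value = NormalDist(5.5, 0.5).pdf(6.0)
print(math.isclose(one_sd_away, library_value))
```

The density is highest at the mean and drops as X moves away from it, which gives the bell shape.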
Both μ and x̄ represent the mean, so how do you know which symbol to use?
x̄ is used for the mean of a sample
μ is used for the mean of a population
The same goes for SD:
s is used for the sample SD
σ (sigma) is used for the population SD
If you have a large or small SD, what does that mean for the curve?
If you have a large SD, the data are more spread out: the curve is wider and flatter
If you have a small SD, there's less spread: the curve is narrower and taller
What does the total area under the curve equal?
It must always be 1 (100%)
What's the 68-95-99.7% rule?
In a normal distribution, about 68% of values lie within 1 SD of the mean, about 95% within 2 SD and about 99.7% within 3 SD.
For instance, with heights that have a mean of 5.5 ft and an SD of 0.5 ft, the middle value on the plot is 5.5 and the rest of the x-axis goes up/down in 0.5 intervals.
If you go 1 SD either side of the mean (5.0-6.0), the area = 68%. This means that 68% of the population is between 5 and 6 ft tall.
If you go 2 SD either side of the mean (4.5-6.5), the area = 95%: 95% have a height between 4.5 and 6.5 ft.
If you go 3 SD either side of the mean (4.0-7.0), the area = 99.7%.
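The rule can be checked numerically with `NormalDist` from Python's standard library. The helper name `area_within` is made up for this sketch; the mean of 5.5 and SD of 0.5 are the values from the card.

```python
from statistics import NormalDist

def area_within(k, mu=0.0, sigma=1.0):
    """Area under the normal curve within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    percent = area_within(k, mu=5.5, sigma=0.5) * 100
    print(k, "SD:", round(percent, 1), "%")
```

The exact areas are about 68.3%, 95.4% and 99.7%, and they are the same for any mean and SD, which is why the rule works as a rule of thumb.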
The normal distribution below has a SD of 10. Approximately what area is contained between 70 and 90?
The mean is 70, so 90 is 2 SD above the mean.
2 SD = 95%, but that 95% covers 50-90, as 50 is 2 SD below the mean (70). The question only asks for 70-90.
Because the curve is symmetric, divide by 2: 95% / 2 = 47.5%
For the normal distribution below, approximately what area is contained between -2 and 1?
0 = mean value.
1 SD = 68%
2 SD = 95%
Between 0 and 1 = 68% / 2 = 34%
Between -2 and 0 = 95% / 2 = 47.5%
Total area between -2 and 1: 34% + 47.5% = 81.5%
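Both area questions can be checked against exact areas from the normal curve. In this sketch the helper name `area_between` is made up, and the second distribution is assumed to have a mean of 0 and an SD of 1, as the working implies.

```python
from statistics import NormalDist

def area_between(low, high, mu, sigma):
    """Area under the normal curve between low and high."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(high) - dist.cdf(low)

# Mean 70, SD 10: area between 70 and 90 (rule-of-thumb answer: 47.5%)
first = area_between(70, 90, mu=70, sigma=10)

# Assumed mean 0, SD 1: area between -2 and 1 (rule-of-thumb answer: 81.5%)
second = area_between(-2, 1, mu=0, sigma=1)

print(round(first * 100, 1), round(second * 100, 1))
```

The exact areas come out slightly higher than the rule-of-thumb answers because 68% and 95% are rounded figures.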