+Less time consuming and less expensive than a census
+fewer people have to respond
+less to process
-less accurate
-may not be large enough to consider/give information about small groups within the population.
Sampling Unit
an individual unit of a population
Sampling Frame
Sampling units are individually named or numbered to form a list
3 Types of Random sampling
Simple Random sampling
Systematic sampling
Stratified sampling
Process of Simple Random sampling?
Create a sampling frame for the population and use its size as the range of an RNG (e.g. 1 to 100 for a frame of 100 units).
Generate a number and select the corresponding unit.
If a number is repeated, ignore it and repick.
Repeat until desired sample size is achieved.
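The steps above can be sketched in Python. This is only an illustration: the function name, the frame of 100 numbered units and the seed are made up for the example.

```python
import random

def simple_random_sample(frame, sample_size, seed=None):
    """Pick sample_size distinct units from a sampling frame (a list)."""
    if sample_size > len(frame):
        raise ValueError("sample size cannot exceed the size of the sampling frame")
    rng = random.Random(seed)
    chosen_positions = []
    while len(chosen_positions) < sample_size:
        position = rng.randint(1, len(frame))  # random number from 1 to N inclusive
        if position in chosen_positions:
            continue  # repeated number: ignore it and pick again
        chosen_positions.append(position)
    # Units are numbered from 1, Python lists from 0
    return [frame[p - 1] for p in chosen_positions]

# Hypothetical sampling frame of 100 numbered units
frame = [f"unit {i}" for i in range(1, 101)]
sample = simple_random_sample(frame, 10, seed=1)
print(sample)
```

In practice random.sample(frame, 10) gives a sample without repeats in one call; the loop is written out here only to mirror the steps.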
Simple Random sampling: Advantages and Disadvantages
+Free of bias
+Cheap and easy for small populations
+Each sampling unit has an equal chance of selection.
-Not suitable for large populations
-Sampling a large population this way can be time consuming and expensive
-Sampling frame required.
Process of Systematic sampling?
Create a sampling frame
Calculate a regular interval from the required sample size, e.g. sample size 20 from a population of 100: 100/20 = 5, so pick every 5th unit.
Use an RNG to choose the first unit at random from within the first interval (e.g. a number from 1 to 5), then take every 5th unit after it.
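A rough Python sketch of systematic sampling, using the 20-from-100 example. It assumes the random start is taken from the first interval so that a full sample fits inside the frame; the function name and the numbered frame are made up for the example.

```python
import random

def systematic_sample(frame, sample_size, seed=None):
    """Take every k-th unit from the frame after a random start."""
    if not 0 < sample_size <= len(frame):
        raise ValueError("sample size must be between 1 and the size of the frame")
    interval = len(frame) // sample_size       # e.g. 100 // 20 = 5
    rng = random.Random(seed)
    start = rng.randint(1, interval)           # assumed: start within the first interval
    positions = range(start, start + interval * sample_size, interval)
    # Units are numbered from 1, Python lists from 0
    return [frame[p - 1] for p in positions]

frame = list(range(1, 101))                    # hypothetical frame: units numbered 1 to 100
sample = systematic_sample(frame, 20, seed=3)
print(sample)
```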
Systematic sampling: Advantages and Disadvantages
+simple and quick to use
+suitable for large samples and large populations
-sampling frame is needed
-can introduce bias if the sampling frame is not in random order (e.g. a repeating pattern in the list lines up with the interval)
Process of Stratified sampling?
Divide the population into mutually exclusive strata (groups)
Ensure the proportion sampled from each stratum is the same.
Create a sampling frame for each stratum and, using the size of that stratum as the range for an RNG, take a simple random sample from each stratum until its required sample size is reached.
Stratified sampling equation
number to be taken from stratum = (number in stratum / number in population) × desired sample size
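As a worked illustration of the equation, here is a small Python sketch. The two year groups, their sizes and the sample size of 50 are invented for the example, and rounding to the nearest whole unit is an assumption.

```python
import random

def stratified_sample(strata, sample_size, seed=None):
    """strata maps a stratum name to the list of units in that stratum."""
    rng = random.Random(seed)
    population = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        # (number in stratum / number in population) x desired sample size
        number_to_take = round(len(units) * sample_size / population)
        # Simple random sample (no repeats) within the stratum
        sample[name] = rng.sample(units, number_to_take)
    return sample

# Hypothetical population: 120 students in Year 12, 80 in Year 13
strata = {
    "Year 12": [f"Y12-{i}" for i in range(1, 121)],
    "Year 13": [f"Y13-{i}" for i in range(1, 81)],
}
sample = stratified_sample(strata, 50, seed=2)
print({name: len(units) for name, units in sample.items()})
```

Here 120/200 × 50 = 30 and 80/200 × 50 = 20. When the calculation does not give whole numbers, the rounded values may not add up exactly to the desired sample size and need adjusting by hand.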
Stratified sampling: Advantages and Disadvantages
+sample accurately reflects population
+guarantees proportional representation of groups within a population
-population must be clearly classified into distinct, mutually exclusive strata.
-selection within each stratum has the same disadvantages as simple random sampling.
2 types of non-random sampling
quota sampling
opportunity sampling
Process of Quota sampling?
divide the population into groups based on relevant quotas e.g. age, income, gender
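Decide how many units are needed from each group (its quota) so that the sample reflects the group's proportion of the population.
Recruit sampling units until each quota has been reached.
If a person refuses to be interviewed, or the quota they fit is already full, ignore them and move on until all quotas are met.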
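The recruiting rule can be sketched in Python as below. The list of passers-by, the two age groups and the quotas are all hypothetical.

```python
def quota_sample(people, quotas):
    """people: (name, group, agrees) tuples in the order they are approached.
    quotas: dict mapping each group to the number of units required."""
    recruited = {group: [] for group in quotas}
    for name, group, agrees in people:
        if all(len(recruited[g]) >= quotas[g] for g in quotas):
            break      # every quota has been met
        if not agrees:
            continue   # refusal: ignore and move on
        if group not in quotas or len(recruited[group]) >= quotas[group]:
            continue   # the quota this person fits is already full
        recruited[group].append(name)
    return recruited

# Hypothetical people, in the order the interviewer meets them
passers_by = [
    ("A", "under 30", True),
    ("B", "30 and over", False),
    ("C", "under 30", True),
    ("D", "under 30", True),
    ("E", "30 and over", True),
    ("F", "30 and over", True),
]
result = quota_sample(passers_by, {"under 30": 2, "30 and over": 1})
print(result)
```

Nothing here is random: who ends up in the sample depends on who the interviewer happens to meet, which is where the bias can come from.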
Quota sampling : Advantages and Disadvantages
+Allows a small sample to still be representative of a population
+No sampling frame required
+quick, easy, inexpensive
+easy comparison between strata
-non random sampling can introduce bias
-population must be divided into groups, which can be costly or inaccurate
-increasing scope of study = more groups = more time and money
Process of Opportunity sampling?
Choose the criteria for the sample.
Only choose people who fit the criteria.
Continue asking people who fit these criteria until the desired sample size is achieved.
Opportunity sampling: Advantages and Disadvantages
+Easy to carry out (you pick people who are available at the time)
+Inexpensive
-Unlikely to provide a representative sample
-Highly dependent on individual researcher.
Measures of Central Tendency
Median
Mean
Mode
Measures of Spread
Range
Standard Deviation
IQR
Variance
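A quick way to see all of these measures on one data set is Python's statistics module. The data set is made up. Two assumptions to note: the variance and standard deviation use the divisor n (the whole data set is treated as the population), and the quartiles use the module's "inclusive" method, which may differ slightly from the rule used in a given course.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set

# Measures of central tendency
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Measures of spread
data_range = max(data) - min(data)
variance = statistics.pvariance(data)   # divisor n (assumption, see above)
std_dev = statistics.pstdev(data)       # square root of the variance
q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                           # depends on the quartile convention

print(mean, median, mode)
print(data_range, variance, std_dev, iqr)
```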
Quantitative data definition?
Data made up of numerical observations, i.e. anything you can count or measure.
Qualitative data definition?
Non-numerical data represented by descriptions or labels, e.g. colours.
Stratified sampling equation?
number to sample = (stratum size / population size) × overall sample size
Continuous data definition?
Data that can take any value in a given range.
e.g. temperature, time (any value within the range is possible)
Discrete data definition?
Data that can only take on specific values in a given range.
e.g. shoe size, number of pages in a book
For coded data, what happens to standard deviation when added or subtracted?
Nothing, it stays the same.
For coded data, what happens to standard deviation when multiplied or divided?
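It is multiplied or divided by the same number (ignoring any minus sign, since standard deviation is never negative).
This is because standard deviation measures spread, not position: adding a constant shifts every value by the same amount, but multiplying stretches or shrinks the gaps between values.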
For coded data, why is the average affected by both adding/subtracting and multiplying/dividing?
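The average (mean) is a measure of central tendency, i.e. of position rather than spread, so whatever is done to every value is done to the mean too: if 100 is added to every value, the mean also increases by 100; if every value is doubled, the mean doubles.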
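The effect of coding can be checked numerically. In the sketch below the data set, the shift of 100 and the scale factor of 3 are arbitrary choices, and the standard deviation uses the divisor n.

```python
import statistics

data = [10, 20, 30, 40, 50]            # hypothetical data set
shifted = [x + 100 for x in data]      # coding: add 100 to every value
scaled = [3 * x for x in data]         # coding: multiply every value by 3

for label, values in [("original", data), ("+100", shifted), ("x3", scaled)]:
    # mean moves with both codings; standard deviation only with the scaling
    print(label, statistics.mean(values), statistics.pstdev(values))
```

The mean goes from 30 to 130 and to 90, while the standard deviation is unchanged by the shift and three times as large after the scaling.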
Outlier equation?
any value that is less than Q1 - k(Q3 - Q1)
any value that is greater than Q3 + k(Q3 - Q1), where k is a constant given with the data (commonly 1.5)
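A minimal Python sketch of the outlier test. The default k = 1.5 is a common choice rather than a fixed rule, and the quartile values in the example are made up.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the lower and upper boundaries beyond which a value is an outlier."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def is_outlier(value, q1, q3, k=1.5):
    lower, upper = outlier_fences(q1, q3, k)
    return value < lower or value > upper

# Hypothetical quartiles: Q1 = 20, Q3 = 30, so IQR = 10
print(outlier_fences(20, 30))   # boundaries at 20 - 15 and 30 + 15
print(is_outlier(46, 20, 30))   # above the upper boundary
print(is_outlier(45, 20, 30))   # exactly on the boundary, so not an outlier
```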
Frequency density formula for histograms?
Frequency density = frequency/class width
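For example, with a made-up grouped frequency table (the class boundaries and frequencies below are invented), the frequency densities can be worked out like this:

```python
# Hypothetical grouped data: (lower boundary, upper boundary, frequency)
classes = [(0, 10, 5), (10, 15, 10), (15, 20, 12), (20, 40, 8)]

densities = []
for lower, upper, frequency in classes:
    class_width = upper - lower
    densities.append(frequency / class_width)  # frequency density = frequency / class width

print(densities)
```

The widest class (20 to 40) has the smallest bar height even though its frequency is not the lowest, because its 8 values are spread over a width of 20.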
What is cleaning of data?
Removing anomalies (e.g. outliers or errors) from a data set.
What are box plots used for?
Representing quartiles, maximum and minimum values and outliers.
What type of data should cumulative frequency diagrams be used for?
Data in a grouped frequency table.
You use the cumulative frequency diagram to help find estimates for quartiles and percentiles etc.
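Reading a value off a cumulative frequency diagram amounts to linear interpolation between plotted points. The sketch below assumes the points are joined with straight lines; the class boundaries and cumulative frequencies are made up.

```python
def estimate_from_cumulative(boundaries, cumulative, target):
    """Estimate the data value at which the cumulative frequency reaches target.

    boundaries[i] is a class boundary and cumulative[i] is the cumulative
    frequency up to that boundary (so cumulative[0] is 0).
    """
    for i in range(1, len(boundaries)):
        if cumulative[i] >= target:
            low_x, high_x = boundaries[i - 1], boundaries[i]
            low_cf, high_cf = cumulative[i - 1], cumulative[i]
            if high_cf == low_cf:
                return low_x  # empty class: no interpolation possible
            # Move along the class in proportion to how far target is into it
            return low_x + (target - low_cf) / (high_cf - low_cf) * (high_x - low_x)
    raise ValueError("target is larger than the total frequency")

# Hypothetical grouped data with total frequency 40
boundaries = [0, 10, 20, 30, 40]
cumulative = [0, 5, 20, 35, 40]

total = cumulative[-1]
q1 = estimate_from_cumulative(boundaries, cumulative, total / 4)
median = estimate_from_cumulative(boundaries, cumulative, total / 2)
q3 = estimate_from_cumulative(boundaries, cumulative, 3 * total / 4)
print(q1, median, q3)
```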
What type of data should histograms be used for?
Grouped continuous data.
Area of each bar is proportional to frequency.
Gives a rough picture of the shape and spread of the data.
5 ways to describe correlation?
strong negative correlation
weak negative correlation
no linear correlation
strong positive correlation
weak positive correlation
Mutually exclusive definition?
Events which cannot occur at the same time. No outcomes in common.