A selection of observations taken from a subset of the population which is used to find out information about the population as a whole
Sampling frame
A list in which the sampling units of a population are individually named or numbered
Qualitative data
Variables or data associated with non-numerical observations
Quantitative data
Variables or data associated with numerical observations
Continuous variable
A variable that can take any value in a given range
Discrete variable
A variable that can take only specific values in a given range
Name the types of random sampling
Simple random sampling
Systematic sampling
Stratified sampling
Name the types of non-random sampling
Opportunity sampling
Quota sampling
Interpolation
Making an estimate of the value of 'y' within the range of given data
Extrapolation
Making an estimate of the value of 'y' outside the range of given data
What are the conditions for which X can be modelled as a binomial distribution B(n,p)
Fixed number of trials, n
Two possible outcomes (success or failure)
Fixed probability of success, p
Trials are independent of each other
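When all four conditions hold, P(X = r) can be worked out from n and p alone. The sketch below is one way to do that in Python; the coin-toss values (n = 10, p = 0.5) are assumed purely for illustration and are not from the notes.

```python
from math import comb

def binomial_pmf(r: int, n: int, p: float) -> float:
    """P(X = r) for X ~ B(n, p): choose which r trials succeed, then multiply probabilities."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

# Illustrative values (assumed): 10 tosses of a fair coin, success = heads.
n, p = 10, 0.5
probabilities = [binomial_pmf(r, n, p) for r in range(n + 1)]

print(binomial_pmf(3, n, p))   # P(X = 3) = 120/1024 = 0.1171875
print(sum(probabilities))      # the probabilities of all outcomes sum to 1
```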
Bivariate data
Data which has pairs of values for two variables
What is the addition rule for probability?
P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
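The formula above can be checked by counting outcomes. This is a minimal sketch using one roll of a fair die; the events A and B are assumed examples chosen only to show the overlap being subtracted once.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # one roll of a fair die (assumed example)
A = {2, 4, 6}                       # event A: the roll is even
B = {4, 5, 6}                       # event B: the roll is greater than 3

def prob(event: set) -> Fraction:
    """Probability of an event when every outcome is equally likely."""
    return Fraction(len(event), len(sample_space))

lhs = prob(A | B)                        # P(A or B), counted directly
rhs = prob(A) + prob(B) - prob(A & B)    # P(A) + P(B) - P(A and B)
print(lhs, rhs)                          # both are 2/3
```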
Independent events
When one event has no effect on the probability of the other
Mutually exclusive events
When two events cannot occur at the same time, i.e. they have no outcomes in common
Random variable
A variable whose value depends on the outcome of a random event. Random variables are denoted by capital letters, e.g. X, Y, A, B
Sample space
The range of values that a random variable can take
Discrete uniform distribution
When all outcomes are equally likely, e.g. the score on a fair six-sided die
Null hypothesis
H0, is the hypothesis that we assume to be correct
Alternative hypothesis
H1, is the hypothesis that tells you about the parameter if your assumption is shown to be wrong
Critical region
The range of values of a test statistic "X" that would lead you to reject the null hypothesis
Critical values
The boundary values of the critical region
Acceptance region
The range of values of the test statistic for which we accept (do not reject) the null hypothesis
Under what circumstance do we reject the null hypothesis?
If the p-value is lower than the significance level
Under what circumstance do we accept the null hypothesis?
If the p-value is greater than the significance level
Actual significance level
The probability of incorrectly rejecting the null hypothesis; calculated by adding the probabilities within the critical region
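The critical region and actual significance level can be found by adding up tail probabilities. The sketch below does this for a one-tailed (lower tail) binomial test; the function name and the values n = 20, p = 0.3 and a 5% significance level are assumptions made for illustration.

```python
from math import comb

def binomial_pmf(r: int, n: int, p: float) -> float:
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

def lower_tail_critical_region(n: int, p: float, alpha: float):
    """Find the largest c with P(X <= c) <= alpha under H0: X ~ B(n, p).

    Returns (critical value, actual significance level),
    or (None, 0.0) if no value of X falls in the critical region.
    """
    cumulative = 0.0
    critical_value, actual_level = None, 0.0
    for r in range(n + 1):
        cumulative += binomial_pmf(r, n, p)
        if cumulative > alpha:
            break                      # adding r would exceed the significance level
        critical_value, actual_level = r, cumulative
    return critical_value, actual_level

# Assumed example: H0: p = 0.3, H1: p < 0.3, n = 20, 5% significance level.
c, level = lower_tail_critical_region(n=20, p=0.3, alpha=0.05)
print(c, round(level, 4))   # critical region is X <= 2, actual level about 0.0355
```

The actual significance level (about 3.55% here) is below the nominal 5% because X is discrete, so the tail probabilities jump past 5% between X = 2 and X = 3.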
Product moment correlation coefficient (PMCC)
Describes the linear correlation between two variables; takes values between -1 and 1
What are the meanings of different PMCC values?
r= 1; perfect positive correlation
r= -1; perfect negative correlation
r= 0; no linear correlation
|r| of about 0.8 or more; strong correlation (a rule of thumb)
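To see where a value of r comes from, it can be computed from the summary statistics Sxy, Sxx and Syy. This is a minimal sketch; the data values are made up for illustration.

```python
from math import sqrt

def pmcc(xs, ys):
    """Product moment correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Made-up bivariate data (assumed for illustration).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pmcc(xs, ys)
print(round(r, 4))   # 0.7746, a fairly strong positive correlation
```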
What is meant by P(B|A)?
The probability that B occurs given that A has already occurred
For independent events what is P(A|B) and P(B|A)?
P(A|B) = P(A|B') = P(A)
P(B|A) = P(B|A') = P(B)
Under what conditions can you approximate binomial as normal?
If n is large
If p is close to 0.5
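How good the approximation is can be checked numerically. The sketch below assumes the standard approximation X ~ B(n, p) ≈ N(np, np(1 - p)) with a continuity correction, neither of which is spelled out in these notes; n = 100 and p = 0.5 are illustrative values.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_cdf(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(k + 1))

# Assumed example: n is large and p is close to 0.5.
n, p = 100, 0.5
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))   # N(50, 25)

exact = binomial_cdf(45, n, p)
approximate = approx.cdf(45.5)   # continuity correction: P(X <= 45) becomes P(Y < 45.5)
print(round(exact, 4), round(approximate, 4))
```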
Characteristics of a normal distribution
Parameters: μ, the mean, and σ², the variance
Symmetrical about the mean, so mean = median = mode
Total area under curve= 1
Points of inflection at x = μ ± σ
Has P(X=a)=0 for any a; true for any continuous distribution
Bell-shaped curve with asymptotes at either end
Stratified sampling
Divide the population into homogeneous groups (strata) and randomly select samples from each group
Calculate the number of people/items you need from each stratum using the formula: number sampled in stratum = (number in stratum / number in population) × overall sample size
Allocate each person a unique number
Carry out a simple random sample for each stratum
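The steps above can be sketched in Python as follows. The function name, the year-group strata and the sample size of 20 are all assumptions made for illustration.

```python
import random

def stratified_sample(strata: dict, sample_size: int, seed=None) -> dict:
    """Take a simple random sample from each stratum, in proportion to its size."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # number sampled = (number in stratum / number in population) x overall sample size
        k = round(len(members) / population_size * sample_size)
        sample[name] = rng.sample(members, k)   # sampling without replacement
    return sample

# Assumed population: 120 students in Year 12 and 80 in Year 13.
strata = {
    "Year 12": [f"Y12-{i}" for i in range(1, 121)],
    "Year 13": [f"Y13-{i}" for i in range(1, 81)],
}
chosen = stratified_sample(strata, sample_size=20, seed=1)
print({name: len(members) for name, members in chosen.items()})
# {'Year 12': 12, 'Year 13': 8}
```

When the proportions do not give whole numbers, rounding each stratum separately can make the total differ slightly from the intended sample size, so the counts may need adjusting by hand.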
Simple random sampling
A sample of size "n" where every sample of size "n" has an equal chance of being selected
Form a sampling frame (obtain a list of items/people)
Assign each item/person a unique number
Using a random number generator, select n of these
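As a rough sketch of these three steps, Python's `random.sample` can play the role of the random number generator. The 50-person sampling frame and the sample size of 5 are assumed values.

```python
import random

# Step 1: form a sampling frame (an assumed list of 50 people).
sampling_frame = [f"person-{i}" for i in range(1, 51)]

# Step 2: assign each person a unique number, starting from 1.
numbered = dict(enumerate(sampling_frame, start=1))

# Step 3: use a random number generator to select n of the numbers without repeats.
rng = random.Random(42)        # seeded only so the example is repeatable
n = 5
chosen_numbers = rng.sample(sorted(numbered), n)
sample = [numbered[k] for k in chosen_numbers]
print(sample)
```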
Systematic sampling
The required elements are chosen at regular intervals from an ordered list
Form a sampling frame/ calculate how many people or items needed in the sample
Allocate each person/ item a unique number
Randomly select a starting number from the first interval (e.g. 1-10) using a random number generator
Then select every 10th person/item (for example) from that starting point until the required number have been selected
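The procedure above might look like this in Python. The function name and the population of 100 numbered items are assumptions; the interval is taken as population size divided by sample size, rounded down.

```python
import random

def systematic_sample(ordered_items: list, sample_size: int, seed=None) -> list:
    """Pick a random start within the first interval, then take every k-th item."""
    rng = random.Random(seed)
    interval = len(ordered_items) // sample_size   # k: the gap between selected items
    start = rng.randrange(interval)                # 0-based position in the first interval
    return [ordered_items[start + i * interval] for i in range(sample_size)]

# Assumed ordered list of 100 items numbered 1 to 100; a sample of 10 gives an interval of 10.
items = list(range(1, 101))
sample = systematic_sample(items, sample_size=10, seed=0)
print(sample)   # ten items, each 10 places after the previous one
```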
Opportunity sampling
Consists of taking the sample from people who are available at the time the study is carried out and who fit the criteria you are looking for
Quota sampling
An interviewer or researcher selects a sample that reflects the characteristics of the whole population