statistics


  • primary data - data you collect yourself
    secondary data - data collected by someone else
  • discrete data - counted data
    continuous data - measured data
  • quantitative - numerical
    qualitative - words
  • census - entire population
    sample - part of population
  • census
    advantage - limited/no bias
    disadvantage - time consuming, expensive and difficult to collect
  • Sample
    advantage - cheaper, less time consuming, easier
    disadvantage - could be biased
  • simple random sampling - every member of the population has an equal chance of being selected
  • Non-random sampling - the sample selection is based on factors other than random chance, so it is more prone to bias
  • Non-random sampling types are opportunity sampling and quota sampling; cluster, systematic and stratified sampling all involve a random selection step
  • Cluster sampling
    The population is split into smaller groups called clusters. One or more clusters are chosen at random and the sample is everyone in those clusters
  • Opportunity sampling advantages
    • Easy to do
    • Quick
    • Cheap
  • Cluster sampling advantages
    • Representative of the population if clusters are representative of the population
    • Time and cost efficient
  • Opportunity sampling disadvantage
    • Can be biased depending on where you choose to stand and the time of day
  • Cluster sampling disadvantage
    • High sampling error
    • Complex
  • Opportunity sampling
    A sampling type where you pick a location to stand and ask people at that location
  • Quota sampling
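    The population is divided into groups by a characteristic (e.g. age, gender) and you select people of each type, without random selection, until a set quota for each group is filled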
  • Systematic sampling
    The first member is chosen at random from the sampling frame and the rest of the sample is chosen at regular intervals (every kth member)
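
A minimal Python sketch of systematic sampling, assuming the sampling frame is an ordered list. The function name `systematic_sample` and the interval rule k = population size // sample size are illustrative assumptions, not taken from the cards.

```python
import random

def systematic_sample(frame, n, rng=random):
    """Pick a random start, then take every k-th member of the frame."""
    k = len(frame) // n              # sampling interval (assumed rule: N // n)
    if k < 1:
        raise ValueError("sample size is larger than the sampling frame")
    start = rng.randrange(k)         # random starting position in the first interval
    return [frame[start + i * k] for i in range(n)]

# Hypothetical frame of 100 numbered members, sample of 10
print(systematic_sample(list(range(1, 101)), 10))
```

Only the starting point is random; every later member is fixed by the interval, which is why a pattern in the list can introduce bias.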
  • Stratified sampling
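    The population is split into groups (strata) and a random sample is taken from each, so that each stratum's share of the sample matches its proportion of the entire population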
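
One way to work out how many to take from each group in a stratified sample is proportional allocation. The sketch below assumes the stratum sizes are known; the function name and the school figures are hypothetical.

```python
def stratified_allocation(strata_sizes, sample_size):
    """Number to sample from each stratum, in proportion to its size."""
    total = sum(strata_sizes.values())
    return {name: round(sample_size * size / total)
            for name, size in strata_sizes.items()}

# Hypothetical school: 120 students in Year 12, 80 in Year 13, sample of 50
print(stratified_allocation({"Year 12": 120, "Year 13": 80}, 50))
# {'Year 12': 30, 'Year 13': 20}
```

With less convenient numbers the rounded values may not add up to the sample size exactly, so they may need adjusting by hand.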
  • Quota Advantages
    quick and easy
    cheap
    representative of target population
  • Quota disadvantages
    large potential for bias
    Not generalisable to population
  • Systematic advantages
    Easy to do
    Easy to use for a large population
  • Systematic disadvantages
    Potential for bias
  • How to conduct a random sample
    Number the population in a list
    Randomly select n members using a random number generator
    Ignore repeats; continue until you have n unique numbers
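
The three steps above can be sketched in Python. This is an illustration only: the function name `simple_random_sample` is hypothetical and Python's `random` module stands in for the random number generator.

```python
import random

def simple_random_sample(population, n, rng=random):
    """Number the population, draw random numbers, ignore repeats."""
    if n > len(population):
        raise ValueError("cannot pick more unique members than the population has")
    numbered = dict(enumerate(population, start=1))   # step 1: number the list
    chosen = []
    while len(chosen) < n:                            # continue until n unique numbers
        number = rng.randint(1, len(numbered))        # step 2: random number generator
        if number not in chosen:                      # step 3: ignore repeats
            chosen.append(number)
    return [numbered[i] for i in chosen]

# Hypothetical population of 30 students, sample of 5
students = [f"student {i}" for i in range(1, 31)]
print(simple_random_sample(students, 5))
```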
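  • When getting the midpoint of an age class from a table you have to add 0.5 to the midpoint, because ages are rounded down (e.g. the class 10-19 covers 10 up to 20, so the midpoint is 15, not 14.5)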
  • For discrete data
    LQ=(n+1)/4
    Median=(n+1)/2
    UQ=3(n+1)/4
  • For continuous
    LQ=n/4
    Median=n/2
    UQ=3n/4
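
A small sketch of the two sets of position rules, reading the discrete rules as (n+1)/4, (n+1)/2 and 3(n+1)/4. The function name and the values of n are hypothetical; the results are positions in the ordered data, not the quartile values themselves.

```python
def quartile_positions(n, discrete=True):
    """Positions of LQ, median and UQ among n ordered values."""
    m = n + 1 if discrete else n     # discrete data uses n+1, continuous uses n
    return {"LQ": m / 4, "Median": m / 2, "UQ": 3 * m / 4}

print(quartile_positions(11))                  # {'LQ': 3.0, 'Median': 6.0, 'UQ': 9.0}
print(quartile_positions(80, discrete=False))  # {'LQ': 20.0, 'Median': 40.0, 'UQ': 60.0}
```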
  • Cumulative frequency graph
    Plot the cumulative frequency against the END POINTS
    Start from a cumulative frequency of 0 at the lower bound of the first class
    Join all points with a smooth curve
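
To illustrate which points get plotted, here is a short sketch that builds the (end point, cumulative frequency) pairs from a grouped table. The function name and the table values are made up for the example.

```python
from itertools import accumulate

def cumulative_frequency_points(lower_bound, upper_bounds, frequencies):
    """Points to plot: start at 0, then running total against each class end point."""
    running_totals = list(accumulate(frequencies))
    return [(lower_bound, 0)] + list(zip(upper_bounds, running_totals))

# Hypothetical classes 0-10, 10-20, 20-30 with frequencies 4, 6, 5
print(cumulative_frequency_points(0, [10, 20, 30], [4, 6, 5]))
# [(0, 0), (10, 4), (20, 10), (30, 15)]
```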
  • Histogram
    Frequency=frequency density x class width
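
Rearranging the formula gives frequency density = frequency ÷ class width. A minimal sketch with a hypothetical class:

```python
def frequency_density(frequency, lower, upper):
    """Bar height on a histogram: frequency divided by class width."""
    return frequency / (upper - lower)

# Hypothetical class 10 <= x < 30 containing 40 values
fd = frequency_density(40, 10, 30)
print(fd)                 # 2.0
print(fd * (30 - 10))     # 40.0, back to the frequency (the area of the bar)
```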
  • Box plots
    No skew: median is in the middle of the box, mode=median=mean
    Positive skew: median closer to LQ, mode<median<mean
    Negative skew: median closer to UQ, mode>median>mean
  • When comparing data from box plots you need to comment on the median, the spread (IQR or range) and the skewness, and also add context
  • Any value more than 1.5 × IQR below the LQ or above the UQ is an outlier
    You still need to plot any outliers on the box plot, marked with a cross (x) or a dot (.)
    Anything more than 2 standard deviations away from the mean is also considered an outlier.
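
Both outlier rules can be checked with a few lines of Python. The data set and the quartile values below are hypothetical (the quartiles are supplied rather than calculated), and the sketch assumes the population standard deviation is the one intended.

```python
from statistics import mean, pstdev

def iqr_outliers(values, lq, uq):
    """Values more than 1.5 x IQR below the LQ or above the UQ."""
    iqr = uq - lq
    low, high = lq - 1.5 * iqr, uq + 1.5 * iqr
    return [v for v in values if v < low or v > high]

def sd_outliers(values, k=2):
    """Values more than k standard deviations away from the mean."""
    m, s = mean(values), pstdev(values)
    return [v for v in values if abs(v - m) > k * s]

data = [2, 14, 15, 17, 18, 20, 21, 40]        # made-up data
print(iqr_outliers(data, lq=14.5, uq=20.5))   # [2, 40]
print(sd_outliers(data))                      # [40]
```

The two rules do not always agree: here 2 is an outlier by the IQR rule but not by the standard deviation rule.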
  • Regression line
    In the form y=a+bx
    The coefficient b (the gradient) tells you the change in y for each unit increase in x; a is the value of y when x=0
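
A quick numerical check of that interpretation, using a made-up line y = 3 + 2x (the values of a and b are assumptions for illustration):

```python
def predict(a, b, x):
    """Value of y on the regression line y = a + bx."""
    return a + b * x

a, b = 3, 2                                   # hypothetical intercept and gradient
print(predict(a, b, 0))                       # 3, the value of a
print(predict(a, b, 5) - predict(a, b, 4))    # 2, the value of b
```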
  • Mutually exclusive events are two events that can't happen at the same time, so P(A and B)=0
  • A∪B
    Everything in A or B (or both)
  • A∩B
    The intersection of A and B: everything in both A and B
  • A'
    Everything not in A
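
Python's built-in sets mirror this notation, which can help when checking a Venn diagram. The sets below are made up for illustration.

```python
universe = set(range(1, 11))      # hypothetical sample space: 1 to 10
A = {2, 4, 6, 8, 10}
B = {1, 2, 3, 4}
C = {5, 7}

union = A | B                     # everything in A or B (or both)
intersection = A & B              # everything in both A and B
complement_A = universe - A       # everything not in A

print(sorted(union))              # [1, 2, 3, 4, 6, 8, 10]
print(sorted(intersection))       # [2, 4]
print(sorted(complement_A))       # [1, 3, 5, 7, 9]
print(A & C == set())             # True: A and C are mutually exclusive
```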
  • Discrete - typically integer values
    E.g. shoe size, number of students
  • Continuous - typically fractions or decimals (d.p.)
    E.g. foot length, height, weight, time, temperature, age
  • Binomial Distribution is used when:
    An experiment is repeated a fixed number of times
    There are only 2 outcomes for each trial (success/fail)
    The trials are independent of each other
    The probability of success is the same each time
  • Use Binomial PD for the probability of exactly one value, e.g. P(X=5)
    Use Binomial CD for a cumulative probability, e.g. P(X≤5)
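
To show the difference between the two, here is a sketch of both calculations from the binomial formula. The function names `binomial_pd` and `binomial_cd` are chosen here for illustration, and X ~ B(10, 0.5) is a hypothetical example.

```python
from math import comb

def binomial_pd(x, n, p):
    """P(X = x): probability of exactly x successes in n trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binomial_cd(x, n, p):
    """P(X <= x): add up P(X = 0), P(X = 1), ..., P(X = x)."""
    return sum(binomial_pd(k, n, p) for k in range(x + 1))

# Hypothetical X ~ B(10, 0.5)
print(round(binomial_pd(5, 10, 0.5), 4))   # 0.2461
print(round(binomial_cd(5, 10, 0.5), 4))   # 0.623
```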