s1 w5 Confidence intervals

Cards (92)

  • Confidence intervals
    Statistical method for estimating a population parameter from a sample
  • Topics covered
    • Review and Preview
    • Sampling
    • The Central Limit Theorem
    • Estimating a Population Proportion
    • Estimating a Population Mean
    • Estimating a Population Standard Deviation or Variance
  • Sampling frame
    List of subjects in the population from which the sample is taken
  • Simple random sample
    A sample in which each possible sample of that size has the same chance of being selected
  • Selecting a simple random sample
    1. Number the subjects in the sampling frame
    2. Generate a set of those numbers randomly
    3. Sample the subjects whose numbers were generated
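The three selection steps above can be sketched with Python's standard `random` module. This is a minimal illustration that assumes the sampling frame is a Python list. The helper name `simple_random_sample` and the frame of 100 subjects are hypothetical.

```python
import random

def simple_random_sample(sampling_frame, n, seed=None):
    """Draw a simple random sample of size n without replacement (hypothetical helper).

    Step 1: number the subjects (their positions 0..N-1 in the frame).
    Step 2: randomly generate n distinct numbers from that range.
    Step 3: sample the subjects whose numbers were generated.
    """
    rng = random.Random(seed)
    numbers = rng.sample(range(len(sampling_frame)), n)  # every set of n numbers is equally likely
    return [sampling_frame[i] for i in numbers]

frame = [f"subject_{k}" for k in range(1, 101)]  # hypothetical frame of 100 subjects
sample = simple_random_sample(frame, 10, seed=1)
print(sample)
```

Because `random.sample` picks each possible set of n numbers with equal chance, the result matches the definition of a simple random sample.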
  • Sampling bias
    • Results from the sample are not representative of the population
    • Undercoverage - having a sampling frame that lacks representation from parts of the population
  • Nonresponse bias
    • Sampled subjects cannot be reached, refuse to participate, or fail to answer some questions
  • Response bias
    • The subject gives an incorrect response, or the way the interviewer asks the questions (or the wording of a printed question) is confusing or misleading
  • Convenience sample
    A type of survey sample that is easy and relatively cheap to obtain but unlikely to be representative of the population
  • Volunteer sample
    The most common type of convenience sample: subjects choose to participate, and volunteers tend not to be representative of the entire population
  • A simple random sample of 100 people is better than a volunteer sample of thousands of people
  • Steps for conducting a sample survey
    1. Identify the population of all subjects of interest
    2. Construct a sampling frame
    3. Use a random sampling design to select n subjects
    4. Be cautious about sampling bias, response bias, and nonresponse bias
  • Random sampling methods
    • Simple random sampling
    • Cluster random sampling
    • Stratified random sampling
  • Cluster random sampling
    • Divide the population into a large number of clusters, select a simple random sample of the clusters, and use the subjects in those clusters as the sample
    • Preferable if a reliable sampling frame is not available or the cost of selecting a simple random sample is excessive
    • Usually requires a larger sample size than a simple random sample for the same level of precision
    • Selecting a small number of clusters might result in a more homogeneous sample than the population
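Cluster random sampling can be sketched as a simple random sample of cluster labels followed by taking everyone in the chosen clusters. This is an illustrative sketch only: the helper `cluster_sample` and the population of 20 city blocks with 5 households each are hypothetical.

```python
import random

def cluster_sample(clusters, num_clusters, seed=None):
    """Cluster random sampling (hypothetical helper): take a simple random
    sample of clusters, then keep every subject in the chosen clusters."""
    rng = random.Random(seed)
    chosen = rng.sample(list(clusters), num_clusters)  # SRS of cluster labels
    return {label: clusters[label] for label in chosen}

# Hypothetical population: 20 city blocks, each with 5 households
blocks = {f"block_{b}": [f"house_{b}_{h}" for h in range(5)] for b in range(20)}
picked = cluster_sample(blocks, 3, seed=7)
sample = [house for houses in picked.values() for house in houses]  # all subjects in chosen clusters
print(sorted(picked), len(sample))
```

Note that only a list of clusters is needed up front, not a frame of every household, which is why this design helps when a reliable sampling frame is unavailable.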
  • Stratified random sampling
    • Divide the population into separate groups (strata) based on some attribute, then select a simple random sample from each stratum
    • Advantage: you can include enough subjects from each group you want to evaluate
    • Disadvantage: you must have a sampling frame and know the stratum of each subject
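Stratified random sampling can be sketched by grouping the frame by stratum and then drawing a simple random sample inside each group. This sketch assumes equal allocation (the same number per stratum); proportional allocation is a common alternative. The helper `stratified_sample` and the frame of 100 students by class year are hypothetical.

```python
import random

def stratified_sample(population, stratum_of, n_per_stratum, seed=None):
    """Stratified random sampling (hypothetical helper): group subjects by
    stratum, then draw a simple random sample of n_per_stratum from each."""
    rng = random.Random(seed)
    strata = {}
    for subject in population:
        # Requires knowing the stratum of every subject in the frame
        strata.setdefault(stratum_of(subject), []).append(subject)
    return {name: rng.sample(members, n_per_stratum) for name, members in strata.items()}

# Hypothetical frame: (student_id, class_year) pairs, 25 students per year
students = [(i, year) for i, year in enumerate(["first", "second", "third", "fourth"] * 25)]
result = stratified_sample(students, stratum_of=lambda s: s[1], n_per_stratum=5, seed=3)
for year, chosen in result.items():
    print(year, [sid for sid, _ in chosen])
```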
  • Central Limit Theorem
    Tells us that for a population with any distribution, the distribution of the sample means approaches a normal distribution as the sample size increases
  • If the original population is normally distributed, then for any sample size n, the sample means will be normally distributed
  • Mean of the sample means
    Equal to the population mean μ
  • Standard deviation of the sample means
    Equal to σ/√n, where σ is the population standard deviation
  • For samples of size n larger than 30, the distribution of the sample means can be approximated reasonably well by a normal distribution
  • As the sample size increases, the sampling distribution of sample means approaches a normal distribution
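A short simulation can illustrate these Central Limit Theorem facts. The sketch below assumes a hypothetical, strongly skewed population (exponential with mean 1 and standard deviation 1) and draws 5000 samples for each of n = 1, 5, and 50. The mean of the sample means should stay near μ = 1, their standard deviation should track σ/√n, and their skewness should shrink toward 0 (the value for a normal distribution) as n grows.

```python
import random
import statistics

def skewness(xs):
    """Sample skewness: average cubed z score (0 for a symmetric distribution)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)

rng = random.Random(42)
sigma = 1.0  # exponential(rate=1) population: mean 1, standard deviation 1 (hypothetical choice)
results = {}
for n in (1, 5, 50):
    # 5000 sample means, each computed from n draws of the skewed population
    means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(5000)]
    results[n] = {
        "mean": statistics.fmean(means),
        "sd": statistics.stdev(means),
        "skew": skewness(means),
    }
    print(f"n={n:2d}  mean={results[n]['mean']:.3f}  sd={results[n]['sd']:.3f}  "
          f"sigma/sqrt(n)={sigma / n ** 0.5:.3f}  skew={results[n]['skew']:.2f}")
```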
  • Suppose an elevator has a maximum capacity of 16 passengers with a total weight of 2500 lb. Assuming a worst-case scenario in which all the passengers are male, what is the probability that the elevator is overloaded? Assume male weights are normally distributed with a mean of 182.9 lb and a standard deviation of 40.8 lb.
  • As we proceed from n = 1 to n = 50
    The distribution of sample means is approaching the shape of a normal distribution
  • Elevator capacity
    Maximum capacity of 16 passengers with a total weight of 2500 lb
  • Male weights
    Follow a normal distribution with a mean of 182.9 lb and a standard deviation of 40.8 lb
  • If the elevator is filled to capacity with all males, there is a very good chance (a probability of about 0.9955) that the safe weight capacity of 2500 lb will be exceeded
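The elevator calculation can be worked through in a few lines. The key step is that 16 passengers exceed 2500 lb exactly when their mean weight exceeds 2500/16 = 156.25 lb, and because the weights are assumed normal, the sample mean is normal with standard deviation σ/√n = 40.8/4 = 10.2 lb. This sketch uses Python's standard `statistics.NormalDist` for the normal CDF.

```python
import math
from statistics import NormalDist

mu, sigma = 182.9, 40.8   # male weights (lb), from the elevator problem
n, capacity = 16, 2500    # passengers and total weight limit (lb)

mean_limit = capacity / n           # overloaded iff the sample mean exceeds 156.25 lb
se = sigma / math.sqrt(n)           # standard deviation of sample means: 10.2 lb
z = (mean_limit - mu) / se          # z score of the limit, about -2.61
p_overload = 1 - NormalDist().cdf(z)  # P(sample mean > 156.25 lb)
print(f"z = {z:.2f}, P(overload) = {p_overload:.4f}")
```

The resulting probability is roughly 0.9955, which is why a full load of men is very likely to exceed the limit.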
  • Finite population correction factor
    When sampling without replacement and the sample size n is greater than 5% of the finite population of size N, adjust the standard deviation of sample means by multiplying it by the finite population correction factor √((N − n)/(N − 1))
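The 5% rule and the correction factor √((N − n)/(N − 1)) can be sketched as a small function. The helper name `sd_of_sample_means` and the numbers σ = 10, n = 50, N = 400 are hypothetical, chosen only to show the rule in action.

```python
import math

def sd_of_sample_means(sigma, n, N=None):
    """σ/√n, multiplied by the finite population correction factor
    √((N − n)/(N − 1)) when sampling without replacement and n > 0.05·N.
    (Hypothetical helper; pass N=None for an effectively infinite population.)"""
    se = sigma / math.sqrt(n)
    if N is not None and n > 0.05 * N:
        se *= math.sqrt((N - n) / (N - 1))
    return se

# Hypothetical numbers: σ = 10, samples of n = 50 from a population of N = 400
uncorrected = sd_of_sample_means(10, 50)
corrected = sd_of_sample_means(10, 50, N=400)  # 50 > 5% of 400 = 20, so the correction applies
print(uncorrected, corrected)
```

Since (N − n)/(N − 1) is less than 1, the correction always shrinks the standard deviation: sampling a large share of a finite population leaves less room for the sample mean to vary.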
  • Point estimate
    A single value (or point) used to approximate a population parameter
  • Sample proportion
    p̂ = x/n (x successes in a sample of size n), the best point estimate of the population proportion p
  • Confidence interval
    A range (or an interval) of values used to estimate the true value of a population parameter
  • Confidence level
    The probability 1-α (often expressed as the equivalent percentage value) that the confidence interval actually does contain the population parameter, assuming that the estimation process is repeated a large number of times
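  • The correct interpretation of a confidence interval is that we are X% confident that the interval contains the true value of the population parameter; the X% describes the success rate of the method over many samples, not the probability that one particular interval contains the parameter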
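The "success rate of the method" interpretation can be checked by simulation: build many 95% confidence intervals for a proportion from repeated samples and count how often they capture the true value. This sketch assumes a hypothetical true proportion p = 0.5, samples of n = 100, and 2000 repetitions, and uses the interval p̂ ± zα/2·√(p̂q̂/n). Because that interval relies on a normal approximation, the observed rate lands near, not exactly at, 95%.

```python
import math
import random
from statistics import NormalDist

rng = random.Random(2024)
p_true, n, reps = 0.5, 100, 2000     # hypothetical true proportion and design
z = NormalDist().inv_cdf(0.975)       # critical value for 95% confidence

hits = 0
for _ in range(reps):
    x = sum(rng.random() < p_true for _ in range(n))   # successes in one simulated sample
    p_hat = x / n
    e = z * math.sqrt(p_hat * (1 - p_hat) / n)          # margin of error for this sample
    if p_hat - e < p_true < p_hat + e:                  # did this interval capture p?
        hits += 1

coverage = hits / reps
print(f"{coverage:.3f} of the {reps} intervals contained p = {p_true}")
```

Each individual interval either contains p or it does not; the 95% only describes the long-run fraction of intervals that do.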
  • Critical value
    The number on the borderline separating sample statistics that are likely to occur from those that are unlikely to occur
  • The z score separating an area of α/2 in the right tail of the standard normal distribution is commonly denoted by zα/2 and is referred to as a critical value
  • Critical Values
    • 90% confidence level, zα/2 = 1.645
    • 95% confidence level, zα/2 = 1.96
    • 99% confidence level, zα/2 = 2.576
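The critical values in the table come from the inverse of the standard normal CDF: zα/2 is the z score with area 1 − α/2 to its left. This sketch uses Python's standard `statistics.NormalDist.inv_cdf`; the helper name `critical_value` is hypothetical. The exact 99% value is about 2.5758, which rounds to 2.576 (some textbook tables list 2.575).

```python
from statistics import NormalDist

def critical_value(confidence):
    """z_{α/2}: the z score with area α/2 to its right under the standard normal curve."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {critical_value(level):.3f}")
```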
  • Margin of error
    The maximum likely difference (with probability 1-α) between the observed proportion and the true value of the population proportion; for a proportion, E = zα/2 · √(p̂q̂/n), where q̂ = 1 − p̂
  • Steps to find the margin of error and construct a confidence interval
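    1. Verify assumptions (simple random sample; conditions for a binomial distribution are satisfied; at least 5 successes and at least 5 failures)
    2. Find the critical value zα/2
    3. Evaluate the margin of error E
    4. Find the confidence interval limits p̂ − E and p̂ + E
    5. Round the confidence interval limits (commonly to three significant digits)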
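The five steps for a proportion can be sketched as one function using E = zα/2 · √(p̂q̂/n). This is a minimal illustration, not a complete procedure: the helper `proportion_ci` is hypothetical, it only checks the at-least-5-successes-and-failures condition (whether the data form a simple random sample cannot be checked in code), and it rounds to three significant digits following a common textbook convention. The inputs x = 850, n = 1000 are hypothetical.

```python
import math
from statistics import NormalDist

def proportion_ci(x, n, confidence=0.95):
    """Confidence interval for a population proportion p from x successes in n trials."""
    p_hat = x / n
    q_hat = 1 - p_hat
    # Step 1: verify assumptions (only the count condition is checked here)
    if x < 5 or n - x < 5:
        raise ValueError("need at least 5 successes and at least 5 failures")
    # Step 2: critical value z_{α/2}
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Step 3: margin of error E = z_{α/2} * sqrt(p̂q̂/n)
    e = z * math.sqrt(p_hat * q_hat / n)
    # Steps 4-5: limits p̂ - E and p̂ + E, rounded to three significant digits
    to_3_sig = lambda v: float(f"{v:.3g}")
    return to_3_sig(p_hat - e), to_3_sig(p_hat + e)

lower, upper = proportion_ci(850, 1000)  # hypothetical survey: 850 of 1000 said yes
print(f"{lower} < p < {upper}")
```

If the whole interval lies above some benchmark value, the data support the claim that p exceeds that benchmark, which is the reasoning behind conclusions like the one on the next card.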
  • Based on the 95% confidence interval, we can conclude that more than 75% of adults know what Twitter is, because the entire interval lies above 0.75