s1 w5 Confidence intervals

Created by

Aaliyah Bocus

Cards (92)

Confidence intervals
Statistical method for estimating a population parameter from a sample
Topics covered
Review and Preview
Sampling
The Central Limit Theorem
Estimating a Population Proportion
Estimating a Population Mean
Estimating a Population Standard Deviation or Variance
Sampling frame
List of subjects in the population from which the sample is taken
Simple random sample
A sample in which each possible sample of that size has the same chance of being selected
Selecting a simple random sample
1. Number the subjects in the sampling frame
2. Generate a set of those numbers randomly
3. Sample the subjects whose numbers were generated
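The three steps above can be sketched in Python. This is a minimal illustration only: the function name `simple_random_sample`, the seed, and the 500-subject frame are hypothetical and not part of the deck.

```python
import random

def simple_random_sample(sampling_frame, n, seed=None):
    """Draw n subjects without replacement so every subset of size n is equally likely."""
    rng = random.Random(seed)
    # Step 1: number the subjects (list positions serve as the numbers).
    numbers = range(len(sampling_frame))
    # Step 2: generate n distinct numbers at random.
    chosen = rng.sample(numbers, n)
    # Step 3: sample the subjects whose numbers were generated.
    return [sampling_frame[i] for i in chosen]

# Hypothetical sampling frame of 500 subjects, for illustration only.
frame = [f"subject_{i}" for i in range(1, 501)]
sample = simple_random_sample(frame, n=10, seed=1)
print(sample)
```

Fixing the seed only makes the draw reproducible; in a real survey the seed would normally be left unset.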
Sampling bias
Results from the sample are not representative of the population
Undercoverage - having a sampling frame that lacks representation from parts of the population
Nonresponse bias
Sampled subjects cannot be reached, refuse to participate, or fail to answer some questions
Response bias
A subject gives an incorrect response, or the way the interviewer asks a question (or the wording of a printed question) is confusing or misleading
Convenience sample
A type of survey sample that is easy to obtain relatively cheaply, but unlikely to be representative of the population
Volunteer sample
Most common type of convenience sample, where subjects volunteer for the sample, but volunteers do not tend to be representative of the entire population
A simple random sample of 100 people is better than a volunteer sample of thousands of people
Steps for conducting a sample survey
1. Identify the population of all subjects of interest
2. Construct a sampling frame
3. Use a random sampling design to select n subjects
4. Be cautious about sampling bias, response bias, and nonresponse bias
Random sampling methods
Simple random sampling
Cluster random sampling
Stratified random sampling
Cluster random sampling
Divide the population into a large number of clusters, select a simple random sample of the clusters, and use the subjects in those clusters as the sample
Preferable if a reliable sampling frame is not available or the cost of selecting a simple random sample is excessive
Usually requires a larger sample size for the same level of precision
Selecting a small number of clusters might result in a sample that is more homogeneous than the population
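A rough sketch of cluster random sampling follows. The clusters (city blocks with five households each) and the function name `cluster_sample` are made up for illustration; the only point is that randomness is applied to whole clusters, not to individual subjects.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Pick a simple random sample of cluster ids, then keep every subject in them."""
    rng = random.Random(seed)
    chosen_ids = rng.sample(sorted(clusters), n_clusters)
    sample = [subject for cid in chosen_ids for subject in clusters[cid]]
    return chosen_ids, sample

# Hypothetical population: 40 city blocks, 5 households per block.
clusters = {
    f"block_{b}": [f"block_{b}_house_{h}" for h in range(1, 6)]
    for b in range(1, 41)
}
chosen_ids, sample = cluster_sample(clusters, n_clusters=4, seed=1)
print(chosen_ids)
print(len(sample))  # 4 clusters x 5 households
```

Note that no list of individual households is needed up front, only a list of blocks, which is why this design helps when a reliable sampling frame is unavailable.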
Stratified random sampling
Divide the population into separate groups (strata) based on some attribute, then select a simple random sample from each stratum
Advantage is you can include enough subjects from each group you want to evaluate
Disadvantage is you must have a sampling frame and know the stratum of each subject
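Stratified random sampling can be sketched in the same spirit. The student frame, the two strata, and the helper `stratified_sample` are assumptions chosen for the example. The code needs to know each subject's stratum, which mirrors the disadvantage noted above.

```python
import random

def stratified_sample(frame, stratum_of, n_per_stratum, seed=None):
    """Group subjects by stratum, then take a simple random sample within each group."""
    rng = random.Random(seed)
    strata = {}
    for subject in frame:
        strata.setdefault(stratum_of(subject), []).append(subject)
    return {
        name: rng.sample(members, n_per_stratum)
        for name, members in strata.items()
    }

# Hypothetical frame: 200 students, every fourth one a graduate student.
frame = [(f"student_{i}", "undergrad" if i % 4 else "grad") for i in range(1, 201)]
sample = stratified_sample(frame, stratum_of=lambda s: s[1], n_per_stratum=5, seed=1)
for stratum, members in sample.items():
    print(stratum, [name for name, _ in members])
```

Sampling the same number from each stratum guarantees enough subjects from the smaller group, at the cost of needing weights if the strata are later combined into one population estimate.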
Central Limit Theorem
Tells us that for a population with any distribution, the distribution of the sample means approaches a normal distribution as the sample size increases
If the original population is normally distributed, then for any sample size n, the sample means will be normally distributed
Mean of the sample means
Equal to the population mean μ
Standard deviation of the sample means
Equal to σ/√n, where σ is the population standard deviation
For samples of size n larger than 30, the distribution of the sample means can be approximated reasonably well by a normal distribution
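A small simulation can make these statements concrete. The sketch below assumes a skewed (exponential) population with μ = σ = 1, a sample size of n = 36, and 5000 repeated samples; all of these are illustrative choices, not values from the deck.

```python
import math
import random
import statistics

rng = random.Random(0)
n = 36               # sample size (larger than 30)
num_samples = 5000   # number of repeated samples
mu = sigma = 1.0     # exponential population with rate 1 has mean 1 and sd 1

sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
expected_sd = sigma / math.sqrt(n)

print(f"mean of sample means: {mean_of_means:.3f} (population mean {mu})")
print(f"sd of sample means:   {sd_of_means:.3f} (sigma/sqrt(n) = {expected_sd:.3f})")
```

Even though the population is far from normal, the simulated mean of the sample means should land close to μ and their standard deviation close to σ/√n.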
As the sample size increases, the sampling distribution of sample means approaches a normal distribution
Suppose an elevator has a maximum capacity of 16 passengers with a total weight of 2500 lb. Assuming a worst case scenario in which the passengers are all male, what are the chances the elevator is overloaded? Assume male weights follow a normal distribution with a mean of 182.9 lb and a standard deviation of 40.8 lb.
As we proceed from n = 1 to n = 50
The distribution of sample means is approaching the shape of a normal distribution
Elevator capacity
Maximum capacity of 16 passengers with a total weight of 2500 lb
Male weights
Follow a normal distribution with a mean of 182.9 lb and a standard deviation of 40.8 lb
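If the elevator is filled to capacity with 16 males, the total exceeds 2500 lb whenever their mean weight exceeds 2500/16 = 156.25 lb. With a standard deviation of sample means of 40.8/√16 = 10.2 lb, that probability is about 0.9955, so there is a very good chance the safe weight capacity of 2500 lb will be exceeded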
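The elevator calculation can be checked with a few lines of Python. This is a sketch using the numbers given in the example; the variable names are arbitrary, and it relies on the sample means being normally distributed because the population of male weights is assumed normal.

```python
import math
from statistics import NormalDist

mu, sigma = 182.9, 40.8   # male weights in lb, from the example
n = 16                    # passengers at maximum capacity
max_total = 2500          # safe total weight in lb

mean_limit = max_total / n           # 156.25 lb per passenger
std_error = sigma / math.sqrt(n)     # standard deviation of sample means
z = (mean_limit - mu) / std_error    # z score of the per-passenger limit

# Probability that the mean weight of 16 males exceeds the limit.
p_overload = 1 - NormalDist(mu, std_error).cdf(mean_limit)
print(f"z = {z:.2f}, P(overload) = {p_overload:.4f}")
```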
Finite population correction factor
When sampling without replacement and the sample size n is greater than 5% of the finite population of size N, adjust the standard deviation of sample means by multiplying it by the finite population correction factor √((N − n)/(N − 1))
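As a sketch, the correction can be folded into a helper that applies the standard factor √((N − n)/(N − 1)) only when n exceeds 5% of N. The function name and the population sizes in the usage lines are hypothetical.

```python
import math

def sd_of_sample_means(sigma, n, N=None):
    """Return sigma/sqrt(n), corrected when sampling more than 5% of a finite population."""
    se = sigma / math.sqrt(n)
    if N is not None and n > 0.05 * N:
        # Finite population correction factor.
        se *= math.sqrt((N - n) / (N - 1))
    return se

print(sd_of_sample_means(40.8, 16))           # no finite population given
print(sd_of_sample_means(40.8, 16, N=1000))   # 16 is under 5% of 1000: no correction
print(sd_of_sample_means(40.8, 16, N=200))    # 16 is over 5% of 200: corrected
```

The correction always shrinks the standard deviation, since sampling a large share of the population without replacement leaves less room for the sample mean to vary.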
Point estimate
A single value (or point) used to approximate a population parameter
Sample proportion
The best point estimate of the population proportion
Confidence interval
A range (or an interval) of values used to estimate the true value of a population parameter
Confidence level
The probability 1-α (often expressed as the equivalent percentage value) that the confidence interval actually does contain the population parameter, assuming that the estimation process is repeated a large number of times
The correct interpretation of a confidence interval is that we are X% confident that the interval contains the true value of the population parameter
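The "repeated a large number of times" reading of the confidence level can be illustrated by simulation. The sketch below assumes a true proportion of 0.30, samples of size 200, and 2000 repetitions, all hypothetical, and uses the usual normal-approximation interval for a proportion.

```python
import math
import random
from statistics import NormalDist

rng = random.Random(42)
true_p = 0.30        # hypothetical population proportion
n = 200              # sample size per repetition
repetitions = 2000
z = NormalDist().inv_cdf(0.975)   # critical value for 95% confidence

covered = 0
for _ in range(repetitions):
    successes = sum(rng.random() < true_p for _ in range(n))
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    # Count this interval if it contains the true proportion.
    if p_hat - margin <= true_p <= p_hat + margin:
        covered += 1

coverage = covered / repetitions
print(f"share of intervals containing the true proportion: {coverage:.3f}")
```

The share should come out near 0.95. Any single interval either contains the true proportion or does not; the 95% describes the long-run success rate of the procedure.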
Critical value
The number on the borderline separating sample statistics that are likely to occur from those that are unlikely to occur
The z score separating the right-tail region of area α/2 is commonly denoted by zα/2 and is referred to as a critical value
Critical Values
90% confidence level, zα/2 = 1.645
95% confidence level, zα/2 = 1.96
99% confidence level, zα/2 = 2.575 (more precisely 2.576)
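These critical values can be reproduced from the standard normal distribution. In the sketch below the helper name `z_critical` is made up; `NormalDist().inv_cdf` comes from Python's standard library. The unrounded 99% value is closer to 2.576 than to the 2.575 that many textbook tables list.

```python
from statistics import NormalDist

def z_critical(confidence_level):
    """Return the z score that leaves an area of alpha/2 in the right tail."""
    alpha = 1 - confidence_level
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence level: z = {z_critical(level):.3f}")
```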
Margin of error
The maximum likely difference (with probability 1-α) between the observed proportion and the true value of the population proportion
Steps to find the margin of error and construct a confidence interval
1. Verify assumptions
2. Find the critical value zα/2
3. Evaluate the margin of error E = zα/2 · √(p̂q̂/n), where p̂ is the sample proportion and q̂ = 1 − p̂
4. Find the confidence interval limits p̂ − E and p̂ + E
5. Round the confidence interval limits
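The five steps can be sketched for a population proportion as follows. The survey numbers (290 successes out of 500) are hypothetical and are not the data behind any card in this deck. The assumption check (at least 5 successes and 5 failures) and the margin of error formula E = zα/2 · √(p̂(1 − p̂)/n) are the usual normal-approximation conventions, assumed here.

```python
import math
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence_level=0.95):
    """Return (p_hat, margin_of_error, lower, upper) using the normal approximation."""
    failures = n - successes
    # Step 1: verify assumptions (at least 5 successes and 5 failures).
    if successes < 5 or failures < 5:
        raise ValueError("normal approximation is not appropriate")
    # Step 2: find the critical value.
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    # Step 3: evaluate the margin of error.
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    # Steps 4 and 5: find the limits and round them to three decimals.
    return p_hat, margin, round(p_hat - margin, 3), round(p_hat + margin, 3)

# Hypothetical survey: 290 of 500 respondents answered "yes".
p_hat, margin, lower, upper = proportion_confidence_interval(290, 500)
print(f"p_hat = {p_hat:.3f}, E = {margin:.3f}, interval = ({lower}, {upper})")
```

A claim such as "more than half of the population would answer yes" is supported at this confidence level only when the whole interval lies above 0.5.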
Based on the 95% confidence interval, we can conclude that more than 75% of adults know what Twitter is