advance stats

Cards (58)

  • Random variables can be either discrete or continuous.
  • Continuous random variables can assume all values between any two given values of the variable. Many continuous variables have distributions that are bell-shaped and are called approximately normally distributed variables.
  • The shape and position of the normal distribution curve depend on two parameters: the mean and the standard deviation.
  • Each normally distributed variable has its own normal distribution curve, which depends on the values of the variable’s mean and standard deviation.
  • The larger the standard deviation, the more dispersed or spread out the distribution becomes.
  • Properties of the Normal Distribution: 1. The normal distribution curve is bell-shaped. 2. The mean, median, and mode are equal and located at the center of the distribution. 3. The normal distribution curve is unimodal. 4. The curve is symmetrical about the mean. 5. The curve is continuous. 6. The curve never touches the x-axis. 7. The total area under the normal distribution curve is equal to 1, or 100%.
  • Because each normally distributed variable has its own mean and standard deviation, the shape and location of these curves will vary. Therefore, a different table of areas under the curve would be needed for each variable.
  • The standard normal distribution is a normal distribution with the mean of 0 and a standard deviation of 1.
  • A four-decimal-place number in the table gives the area under the standard normal curve between 0 and a specified value z.
  • Because of the importance of areas under the standard normal curve, tables of those areas have been constructed.
  • Although the z values can be negative, areas must be positive.
  • As illustrated in the previous section, the area under the standard normal curve can be determined using the z table. However, in reality, a continuous random variable may have a normal distribution with the value of the mean and standard deviation different from 0 and 1, respectively. The normal distribution must be converted to a standard normal distribution. This procedure is called standardizing a normal distribution.
  • The normal distribution curve can be used as a probability distribution curve for normally distributed variables. The area under the curve corresponds to a probability.
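The cards on standardizing and on areas as probabilities can be illustrated with a short sketch. It assumes the usual standardizing formula z = (x − mean) / standard deviation, and it uses Python's built-in `statistics.NormalDist` in place of a printed z table. The function names and the example numbers (mean 100, standard deviation 15) are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def z_score(x, mean, std_dev):
    """Standardize x: the number of standard deviations x lies from the mean."""
    return (x - mean) / std_dev


def area_between(low, high, mean=0.0, std_dev=1.0):
    """Area under the normal curve between low and high (a probability)."""
    std_normal = NormalDist(0.0, 1.0)  # standard normal: mean 0, std dev 1
    z_low = z_score(low, mean, std_dev)
    z_high = z_score(high, mean, std_dev)
    return std_normal.cdf(z_high) - std_normal.cdf(z_low)


# Hypothetical variable with mean 100 and standard deviation 15.
z = z_score(130, 100, 15)
p = area_between(100, 130, 100, 15)  # same as the area between z = 0 and z = 2
print(round(z, 2), round(p, 4))      # prints: 2.0 0.4772
```

The printed area, 0.4772, is the four-decimal value a z table lists for z = 2.00.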
  • Hypothesis testing is a decision-making process for evaluating claims about a population.
  • There are two specific statistical tests for hypothesis testing on means: the z test and the t test.
  • Every hypothesis test begins with the statement of a hypothesis.
  • A statistical hypothesis is an inference about a population parameter. This inference may or may not be true.
  • The only certain way of finding the truth or falsity of a hypothesis is by examining the entire population. Because this is not always feasible, a sample is instead examined for the purpose of drawing conclusions.
  • The null hypothesis, symbolized as H0, states that there is no difference between a parameter and a specific value. The alternative hypothesis, symbolized as Ha, states a specific difference between a parameter and a specific value.
  • To state the hypotheses correctly, the researcher must correctly translate the claim into mathematical symbols. There are three possible sets of statistical hypotheses.
  • H0: parameter = specific value; Ha: parameter ≠ specific value. This is a two-tailed test.
  • H0: parameter = specific value; Ha: parameter < specific value. This is a left-tailed test.
  • H0: parameter = specific value; Ha: parameter > specific value. This is a right-tailed test.
  • In hypothesis testing, there are four possible outcomes, as shown in the table. In reality, the null hypothesis may or may not be true. The decision to reject or not to reject it is made on the basis of the data obtained from a sample of the population.
  • A type I error occurs if one rejects the null hypothesis when it is true. A type II error occurs if one does not reject the null hypothesis when it is false.
  • The level of significance is the maximum probability of committing a type I error.
  • This probability is symbolized by 𝛼 (Greek letter alpha). That is, P(type I error) = 𝛼. The probability of a type II error is symbolized by 𝛽 (Greek letter beta). That is, P(type II error) = 𝛽. In most hypothesis-testing situations, however, 𝛽 cannot be computed.
  • Generally, statisticians agree on using three conventional significance levels: 0.10, 0.05, and 0.01. At these levels, the probability of rejecting a null hypothesis that is actually true (a type I error) is 10%, 5%, or 1%, so a true null hypothesis is correctly retained 90%, 95%, or 99% of the time, depending on which level of significance is used.
  • When 𝛼 = 0.05, there is a 5% chance of rejecting a true null hypothesis.
  • In a hypothesis-testing situation, the researcher decides what level of significance to use. It does not have to be the levels mentioned above. It can be any level, depending on the seriousness of the type I error.
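The meaning of the significance level can be checked with a small simulation. This is only a sketch under stated assumptions: a hypothetical population with mean 50 and known standard deviation 10, samples of size 25, and a two-tailed z test. Because the null hypothesis is true by construction, every rejection is a type I error, and the rejection rate should come out close to 𝛼.

```python
import random
from math import sqrt
from statistics import NormalDist


def type_one_error_rate(alpha=0.05, n=25, trials=20000, seed=1):
    """Estimate how often a two-tailed z test rejects a TRUE null hypothesis."""
    rng = random.Random(seed)
    mu0, sigma = 50.0, 10.0  # hypothetical population; H0: mean = 50 is true
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # positive critical value
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        sample_mean = sum(sample) / n
        z = (sample_mean - mu0) / (sigma / sqrt(n))  # z test value
        if abs(z) > z_crit:  # test value falls in a rejection region
            rejections += 1
    return rejections / trials


print(type_one_error_rate(alpha=0.05))  # expected to be close to 0.05
```

The exact rate varies with the seed and the number of trials, so no specific output is shown.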
  • The critical value determines the critical and the noncritical regions. The critical region or the rejection region is the range of values of the test value that indicates that there is a significant difference and that the null hypothesis should be rejected.
  • The noncritical or nonrejection region is the range of values of the test value that indicates that the difference was probably due to chance and that the null hypothesis should not be rejected.
  • The rejection region can be located on both sides with the nonrejection region in the middle or it can be on the left side or the right side of the nonrejection region.
  • A test with two rejection regions is called a two-tailed test.
  • A one-tailed test indicates that the null hypothesis should be rejected when the test value is in the critical region on one side of the parameter.
  • A one-tailed test is either right-tailed when the inequality in the alternative hypothesis is greater than (>) or left-tailed when the inequality is less than (<).
  • If the test is two-tailed, there are two critical values, one negative and one positive. If the test is left-tailed, the critical value is negative. If the test is right-tailed, the critical value is positive.
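The cards on critical values and rejection regions can be sketched for the z test as follows. The helper names `critical_z` and `in_rejection_region` are hypothetical. The sketch assumes the critical values are taken from the standard normal distribution through `statistics.NormalDist.inv_cdf` instead of a printed table.

```python
from statistics import NormalDist


def critical_z(alpha, tail):
    """Return the critical z value(s) for a 'left', 'right', or 'two' tailed test."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "left":
        return (std_normal.inv_cdf(alpha),)        # one negative value
    if tail == "right":
        return (std_normal.inv_cdf(1 - alpha),)    # one positive value
    if tail == "two":
        z = std_normal.inv_cdf(1 - alpha / 2)      # alpha is split across both tails
        return (-z, z)
    raise ValueError("tail must be 'left', 'right', or 'two'")


def in_rejection_region(test_value, alpha, tail):
    """True if the test value falls in the critical (rejection) region."""
    crit = critical_z(alpha, tail)
    if tail == "left":
        return test_value < crit[0]
    if tail == "right":
        return test_value > crit[0]
    return test_value < crit[0] or test_value > crit[1]


for tail in ("left", "right", "two"):
    print(tail, [round(c, 3) for c in critical_z(0.05, tail)])
```

At 𝛼 = 0.05 this gives roughly −1.645 for a left-tailed test, +1.645 for a right-tailed test, and ±1.96 for a two-tailed test.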
  • When the population standard deviation is unknown and the sample size is less than 30, the z test is inappropriate for testing hypotheses involving means. A different test, called the t test, is used.
  • T test: a statistical test for the mean of a population, used when the population is normally distributed or approximately normally distributed, 𝜎 is unknown, and 𝑛 < 30.
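A minimal sketch of the t test value follows. It assumes the standard one-sample formula t = (sample mean − hypothesized mean) / (s / √n), where s is the sample standard deviation, with n − 1 degrees of freedom. The function name and the sample data are hypothetical. The critical value would still be looked up in a t table for the chosen 𝛼 and degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic(sample, hypothesized_mean):
    """Return (t test value, degrees of freedom) for a one-sample t test."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    s = stdev(sample)  # sample standard deviation (divides by n - 1)
    t = (mean(sample) - hypothesized_mean) / (s / sqrt(n))
    return t, n - 1


# Hypothetical small sample (n < 30) tested against a claimed mean of 12.
sample = [12.1, 11.8, 12.4, 12.0, 11.7, 12.3]
t, df = t_statistic(sample, 12.0)
print(f"t = {t:.3f}, degrees of freedom = {df}")
```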
  • Another area of statistics involves determining whether a relationship exists between two or more numerical (quantitative) variables. The statistical method used is correlation.
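The card above does not say which correlation measure is meant. As an assumption, the sketch below uses the Pearson correlation coefficient r, a common choice for two quantitative variables. The function name and the data are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical data: hours studied and exam scores.
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 70, 74]
print(round(pearson_r(hours, scores), 3))
```

A value of r near +1 or −1 suggests a strong linear relationship, and a value near 0 suggests little or no linear relationship.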