STAT603 CH4: Basic Statistical Inference


    • Descriptive statistics
      The collection, organisation, summarisation and presentation of data
    • Inferential statistics
      Using samples to draw conclusions about a population and expressing the results in the language of probability
    • Population
      All items with a characteristic of interest (size N)
    • Census
      A study of all items in the population
    • Sample
      A subset of the population (size n)
    • Parameter
      A descriptive measure computed from a population
    • Statistic
      A descriptive measure computed from a sample
    • Statistical Inference
      • Hypothesis Tests
      • Estimation
    • Point Estimate
      A single value that estimates the parameter
    • Interval Estimate
      A range of values that estimate the parameter, associated with some chance that the parameter lies in this interval
    • A sampling distribution arises when repeated samples of the same size are drawn from a particular population (distribution) and a statistic (numerical measure of description of sample data, e.g. a mean, variance or proportion) is calculated for each sample
    • The interest is then focused on the probability distribution (called the sampling distribution) of the statistic
    • Sampling distributions arise in the context of statistical inference i.e. when statements are made about a population on the basis of random samples drawn from it
    • The mean and variance of the sampling distribution of the sample mean (X-bar) are: E(X-bar) = μ and Var(X-bar) = σ^2/n
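
A minimal simulation sketch of the two results above, E(X-bar) = μ and Var(X-bar) = σ^2/n. The population (normal, μ = 50, σ = 10), the sample size and the number of repetitions are illustrative assumptions, not values from the deck.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

MU, SIGMA = 50.0, 10.0   # assumed population mean and standard deviation
N = 25                   # size of each sample
REPS = 20_000            # number of repeated samples

# Draw REPS samples of size N and record the mean of each one.
sample_means = [
    statistics.fmean(random.gauss(MU, SIGMA) for _ in range(N))
    for _ in range(REPS)
]

# Mean and variance of the simulated sampling distribution of X-bar.
mean_of_means = statistics.fmean(sample_means)
var_of_means = statistics.pvariance(sample_means)

print(f"mean of sample means:     {mean_of_means:.3f} (theory: {MU})")
print(f"variance of sample means: {var_of_means:.3f} (theory: {SIGMA**2 / N})")
```

The simulated values should land close to μ = 50 and σ^2/n = 4, and get closer as REPS grows.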
    • Central Limit Theorem (CLT)

      If X1, X2, ..., Xn are a random sample of size n drawn from a population (with any distribution) with a population mean μ and variance σ^2, then for a sufficiently large n, the mean of the sample (X-bar) will be approximately normally distributed with a mean μ and a variance σ^2/n
    • How large n must be depends on the distribution of the population: for a normal population, X-bar is normally distributed for any value of n; for an 'almost' normal distribution, n larger than 30 is the usual rule of thumb; if the distribution is substantially different from normal, a much larger value of n will be needed for the CLT approximation to hold
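
One way to see the CLT at work is to sample from a clearly non-normal population and watch the sample means become more symmetric as n grows. The sketch below assumes an exponential population with mean 1 (strongly right-skewed) and uses skewness as a rough measure of departure from normality; both choices are illustrative, and `skewness` and `sample_means` are helper names introduced here.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

def skewness(values):
    """Third standardised moment; 0 for a perfectly symmetric distribution."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - m) / sd) ** 3 for v in values)

def sample_means(n, reps=5_000):
    """Means of `reps` samples of size n from an exponential population with mean 1."""
    return [
        statistics.fmean(random.expovariate(1.0) for _ in range(n))
        for _ in range(reps)
    ]

# Skewness of the simulated sampling distribution for increasing sample sizes.
skew_by_n = {n: skewness(sample_means(n)) for n in (2, 30, 200)}
for n, sk in skew_by_n.items():
    print(f"n = {n:>3}: skewness of sample means = {sk:.2f}")
```

The skewness should shrink towards 0 as n increases, which is the behaviour the CLT describes for a skewed population.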
    • Many statistical inference methods (hypothesis tests, confidence intervals, statistical models) are built on the normal distribution, hence such methods require normality (an assumption is that the underlying population is normal)
    • When the assumption of normality is not met, these methods may not be accurate, and other methods such as non-parametric methods or machine learning methods that do not require normality can be considered
    • Interval estimate
      A range of values that estimate a parameter, associated with a percentage of confidence that the range will contain the parameter
    • An interval estimate is more appropriate and useful than a point estimate, since a point estimate can differ each time depending on the sample obtained
    • Point estimate
      A single value that estimates a parameter
    • Interval estimate
      A range of values from L (lower value) to U (upper value) that estimate a parameter
    • Confidence interval
      A range of values from L (lower value) to U (upper value) that estimates a population parameter θ with (1-α)100% confidence
    • Confidence interval example
      • Mean service time of 1.637 minutes to 4.009 minutes
    • Population parameter θ
      Can be μ, σ^2 or p
    • Determining confidence interval for population mean μ (population variance σ^2 known)
      1. Point estimate: x-bar
      2. Error: E = Z(σ/√n), where Z is the standard normal value that leaves α/2 in each tail
      3. Interval estimate: (x-bar - E, x-bar + E)
    • A narrower confidence interval is more informative
    • Factors affecting width of confidence interval for μ
      • Z (based on α)
      • Standard error (σ/√n)
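
The effect of the two factors can be sketched numerically. The helper `ci_width` below is a name introduced for illustration, and σ = 5 together with the sample sizes 30 and 120 are assumed values; the width is taken as 2 × Z × σ/√n.

```python
from math import sqrt
from statistics import NormalDist

def ci_width(sigma, n, confidence):
    """Width of the z-based interval for the mean: 2 * Z * sigma / sqrt(n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # standard normal critical value
    return 2 * z * sigma / sqrt(n)

SIGMA = 5  # assumed known population standard deviation
for confidence in (0.90, 0.95, 0.99):
    for n in (30, 120):
        width = ci_width(SIGMA, n, confidence)
        print(f"{confidence:.0%} confidence, n = {n:>3}: width = {width:.2f}")
```

Higher confidence gives a larger Z and a wider interval, while quadrupling n halves the standard error and therefore the width.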
    • Calculating 95% and 99% confidence intervals
      • Given: σ = 5, n = 30, x-bar = 498.5
      • 95% CI: (496.71, 500.29)
      • 99% CI: (496.15, 500.85)
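
The two intervals on this card can be reproduced with the Python standard library. This is a sketch only: `z_interval` is a helper name introduced here, and the critical values come from `statistics.NormalDist` rather than a printed Z table.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence):
    """Confidence interval for the mean when the population sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # e.g. about 1.96 for 95%
    error = z * sigma / sqrt(n)              # E = Z * sigma / sqrt(n)
    return x_bar - error, x_bar + error

# Values from the card: sigma = 5, n = 30, x-bar = 498.5
ci_95 = z_interval(498.5, 5, 30, 0.95)
ci_99 = z_interval(498.5, 5, 30, 0.99)

print(f"95% CI: ({ci_95[0]:.2f}, {ci_95[1]:.2f})")
print(f"99% CI: ({ci_99[0]:.2f}, {ci_99[1]:.2f})")
```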
    • Confidence interval for μ (σ^2 unknown)
      x-bar ± t(n-1)(s/√n)
    • t distribution
      Bell-shaped, symmetric around μ=0, σ^2>1, approaches normal distribution as degrees of freedom increase
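
A small sketch of the t-based interval, using only the standard library. The ten measurements are hypothetical, made up purely for illustration, and the critical value 2.262 (two-sided 95%, 9 degrees of freedom) is read from a standard t table because the standard library has no t quantile function.

```python
from math import sqrt
from statistics import fmean, stdev

# Hypothetical measurements, made up purely for illustration.
sample = [10.2, 9.8, 10.5, 10.1, 9.9, 10.3, 10.0, 9.7, 10.4, 10.1]

n = len(sample)
x_bar = fmean(sample)
s = stdev(sample)  # sample standard deviation (divides by n - 1)

# Two-sided 95% critical value of t with n - 1 = 9 degrees of freedom,
# taken from a standard t table.
T_CRIT_95_DF9 = 2.262

error = T_CRIT_95_DF9 * s / sqrt(n)
lower, upper = x_bar - error, x_bar + error

print(f"x-bar = {x_bar:.2f}, s = {s:.3f}")
print(f"95% CI for mu: ({lower:.3f}, {upper:.3f})")
```

The t value 2.262 is larger than the corresponding Z value of about 1.96, so the interval is wider than a z-based one would be; the gap narrows as the degrees of freedom increase.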
    • Most statistical software reports confidence intervals based on t-values
    • Interpreting confidence interval
      We are (1-α)100% confident that the population mean μ lies between L and U
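
One common way to read the confidence level is as a long-run coverage rate: if many samples were drawn and an interval built from each, about (1-α)100% of those intervals would contain μ. The sketch below checks this by simulation under assumed values (normal population with μ = 100, σ = 15, n = 40, 95% confidence); none of these numbers come from the deck.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(1)  # fixed seed so the sketch is reproducible

MU, SIGMA, N = 100.0, 15.0, 40   # assumed population and sample size
CONFIDENCE = 0.95
REPS = 2_000                     # number of repeated samples

z = NormalDist().inv_cdf(1 - (1 - CONFIDENCE) / 2)
error = z * SIGMA / sqrt(N)      # same E for every sample, since sigma is known

# Count how many of the REPS intervals (x-bar - E, x-bar + E) contain MU.
covered = 0
for _ in range(REPS):
    x_bar = fmean(random.gauss(MU, SIGMA) for _ in range(N))
    if x_bar - error <= MU <= x_bar + error:
        covered += 1

coverage = covered / REPS
print(f"{coverage:.1%} of the {REPS} intervals contain mu")
```

The printed proportion should be close to 95%, although any single interval either contains μ or does not.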
    • Parameter
      The specific value (or range of values) of a population characteristic that is known/assumed
    • Statistical hypothesis
      An assertion (claim) made about the value(s) of a population parameter
    • The conclusion about the truth of a claim is not stated with absolute certainty, but rather in terms of the language of probability
    • Claims to be tested
      • A supermarket receives complaints that the mean content of "1 kilogram" sugar bags is less than 1 kilogram
      • An electrical firm claims the average lifetime of their light bulbs is more than 780 hours
      • A construction company believes the average compressive strength of its concrete is at the required level of 4000 psi
    • Null hypothesis (H0)
      A statement concerning the exact value of the population parameter of interest (θ) from the claim that is made
    • Alternative hypothesis (H1)

      A statement concerning the possible range of values of the population parameter θ that is believed to be true if H0 is not true
    • Null and alternative hypotheses for the examples
      • Example 1: H0: μ = 1, H1: μ < 1
      • Example 2: H0: μ = 780, H1: μ > 780
      • Example 3: H0: μ = 4000, H1: μ ≠ 4000