L2

Cards (123)

  • We have seen data types, descriptive statistics, correlations, probability theory, and probability distributions. These are very useful for understanding data and information!
  • Science and engineering work through hypotheses and questions, and need sound analyses and accurate information!
  • Hypothesis
    A proposition made as a basis for reasoning, without any assumption of its truth
  • We often need to assess whether an outcome is credible, and how credible it is
  • For example: If I tell you that there is a safety measure working 9 out of 10 times with a standard deviation of 1.5, would you rely on this measure?
  • We often need to test if a hypothesis is likely to be true or not
  • Or, how much would you rely on this measure?
  • Washington, S., M. G. Karlaftis, and F. L. Mannering. Statistical and Econometric Methods for Transportation Data Analysis, 2nd ed. Chapman & Hall/CRC Press, Boca Raton, Florida, 2010.
  • We need techniques and methods to formulate, test, and make informed decisions regarding engineering questions and hypotheses
  • Confidence intervals
    • A CI is a range of values within which we are fairly sure the true value lies
  • Hypothesis testing
    • HT is a way for you to test the results of a survey or experiment to see if you have meaningful results
  • Cross-validation
    • CV is used for assessing how the results of a statistical analysis will generalize to an independent data set
  • How much confidence can you put in your estimate? In practice, we only have samples from which to calculate statistics such as the mean and other parameters
  • Confidence Interval (CI) is used to make interval estimates with a lower and upper boundary within which an unknown parameter will lie with a prespecified level of confidence
  • An interval calculated using sample data contains the true population parameter with some level of confidence
  • Confidence Intervals (CIs) can be constructed with any level of confidence, such as 60%, 90%, or 95%
  • For a given sample, the wider a CI, the more confident the researcher is that it contains the population parameter
  • Lower value is the lower confidence limit (LCL) and upper value is the upper confidence limit (UCL)
  • Types of confidence intervals
    • Confidence Interval for mean (μ) with known variance (σ²)
    • Confidence Interval for mean (μ) with unknown variance (σ²)
    • Confidence Interval for a population proportion
    • Confidence Interval for a population variance
  • A wider CI allows for larger variability in the estimate, but it also means the estimate carries more uncertainty (and is therefore less precise). It is a trade-off.
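The trade-off between confidence level and interval width can be made concrete with a small sketch. It assumes a normal-based (z) interval for a mean with known standard deviation; the function name `z_half_width` and the inputs `sigma = 5.5` and `n = 80` are illustrative choices, not part of the cards above.

```python
from math import sqrt
from statistics import NormalDist


def z_half_width(level: float, sigma: float, n: int) -> float:
    """Half-width of a two-sided z-interval for a mean at the given confidence level."""
    alpha = 1.0 - level
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    return z * sigma / sqrt(n)


# Hypothetical inputs, for illustration only.
sigma, n = 5.5, 80
for level in (0.60, 0.90, 0.95):
    print(f"{level:.0%} CI half-width: {z_half_width(level, sigma, n):.3f}")
```

With the data held fixed, the half-width grows as the confidence level rises, which is the trade-off described above.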
  • Central Limit Theorem (CLT) states that, for a sufficiently large random sample of size n drawn from any population with mean μ and standard deviation σ, the sample mean X̄ is approximately normally distributed with mean μ and standard deviation σ/√n
  • When a sufficiently large random sample of size n is drawn from any population with mean μ and standard deviation σ, the sample mean X̄ is approximately normally distributed with mean μ and standard deviation σ/√n
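A quick simulation can illustrate the CLT statement. This is a minimal sketch using only the Python standard library; the exponential population, the sample size `n = 50`, and the number of repetitions `trials = 2000` are assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(0)  # fixed seed so the run is repeatable

n = 50        # sample size (hypothetical)
trials = 2000  # number of repeated samples (hypothetical)
mu = 1.0      # an Exponential(rate=1) population has mean 1 ...
sigma = 1.0   # ... and standard deviation 1, and is clearly non-normal

# Draw many samples and record the mean of each one.
sample_means = [
    mean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(trials)
]

print(f"mean of sample means:  {mean(sample_means):.4f} (theory: {mu:.4f})")
print(f"stdev of sample means: {stdev(sample_means):.4f} (theory: {sigma / sqrt(n):.4f})")
```

The spread of the sample means should come out close to σ/√n even though the underlying population is skewed.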
  • CI can be calculated using CLT
  • For example, let’s say we are interested in the probability that an estimate X lies between two values A and B; the notation is: Pr(A < X < B) = P
  • Let’s say P = 0.95, meaning that there is a 95% chance that X is between the values A and B
  • CONFIDENCE INTERVAL – For μ with known σ²
  • So what are the values A and B?
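The card does not show the resulting expressions, so the following is a sketch of the standard z-interval, assuming that is the intended result. Here n is the sample size, 1 − α is the confidence level, and z_{α/2} is the standard normal critical value (symbols introduced for this derivation).

```latex
% By the CLT, the standardized sample mean is approximately standard normal:
Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)

% Choose z_{\alpha/2} so that the central probability equals the confidence level:
\Pr\left(-z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) = 1 - \alpha

% Rearranging for \mu gives the two limits:
\Pr\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha

% So the lower and upper limits are:
A = \bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}}, \qquad
B = \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}
```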
  • CONFIDENCE INTERVAL – For μ with unknown σ²
  • In practice: We do not know population variance
  • Replace Z with t
  • There is a distribution that can be used when we do not know variance: Student’s t-distribution
  • Student’s t-distribution
    A distribution used when population variance is unknown
  • Replace Z with t and σ with s
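Carrying out that substitution gives the usual t-interval, sketched here under the assumption that the sample comes from an approximately normal population. The symbols are introduced for this sketch: s is the sample standard deviation, n is the sample size, and t_{α/2, n−1} is the critical value of Student's t-distribution with n − 1 degrees of freedom.

```latex
\bar{X} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}
\;<\; \mu \;<\;
\bar{X} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}
```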
  • Confidence Interval for μ with known and unknown σ²
  • An example: A 90% Confidence Interval is desired for the mean vehicular speed on roads. Sample size n = 80, sample mean X̄ = 60. What are the UCL and LCL?
  • Say: The population standard deviation is σ = 5.5
  • For known σ: Calculate Confidence Interval using Z-table
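The known-σ case of this example can be checked numerically. The sketch below assumes the standard z-interval X̄ ± z·σ/√n and uses `statistics.NormalDist` from the Python standard library in place of a printed Z-table; the function name `z_confidence_interval` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(x_bar: float, sigma: float, n: int, level: float):
    """Return (LCL, UCL) for the mean when the population sigma is known."""
    alpha = 1.0 - level
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # about 1.645 for a 90% interval
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


# Values from the example: n = 80, sample mean 60, sigma = 5.5, 90% confidence.
lcl, ucl = z_confidence_interval(x_bar=60.0, sigma=5.5, n=80, level=0.90)
print(f"LCL = {lcl:.2f}, UCL = {ucl:.2f}")  # LCL = 58.99, UCL = 61.01
```

Reading z = 1.645 from a Z-table gives the same limits to two decimal places.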