We have seen data types, descriptive statistics, correlations, probability theory, and probability distributions. These are very useful for understanding data and information!
Science and engineering work through hypotheses and questions, and need sound analyses and accurate information!
Hypothesis
A proposition made as a basis for reasoning, without any assumption of its truth
We often need to assess whether an outcome is credible, and how credible it is
For example: if I told you that a safety measure works 9 out of 10 times with a standard deviation of 1.5, would you rely on this measure?
Or, how much would you rely on this measure?
We often need to test whether a hypothesis is likely to be true
Washington, S., M. G. Karlaftis, and F. L. Mannering (2010). Statistical and Econometric Methods for Transportation Data Analysis, 2nd ed. Chapman & Hall/CRC Press, Boca Raton, Florida.
We need techniques and methods to formulate, test, and make informed decisions regarding engineering questions and hypotheses
Confidence intervals
A CI is a range of values within which we are fairly sure the true value lies
Hypothesis testing
An HT is a way to test the results of a survey or experiment to see whether they are meaningful
Cross-validation
CV is used for assessing how the results of a statistical analysis will generalize to an independent data set
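To make the cross-validation idea concrete, here is a minimal sketch of one common variant, k-fold CV, applied to the simplest possible "model": predicting a new observation with the mean of the training data. The data are split into k folds; each fold is held out once while the mean is computed from the remaining folds, and the squared prediction errors on the held-out points are averaged. The function name and the use of the mean as the model are illustrative assumptions, not part of the slides.

```python
import random
from statistics import mean


def kfold_mse_of_mean(data, k=5, seed=0):
    """k-fold CV estimate of the squared error made by predicting with the training mean.

    Hypothetical helper for illustration only.
    """
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)          # random assignment of points to folds
    folds = [idx[i::k] for i in range(k)]     # k roughly equal folds
    errors = []
    for fold in folds:
        held_out = set(fold)
        train = [data[i] for i in idx if i not in held_out]
        prediction = mean(train)              # "fit" only on the training folds
        errors.extend((data[i] - prediction) ** 2 for i in fold)
    return mean(errors)


# With k equal to the sample size this is leave-one-out CV
print(kfold_mse_of_mean([1, 2, 3, 4, 5], k=5))  # 3.125
```

Because every prediction is made without the point being predicted, the averaged error estimates how the fitted mean would perform on independent data, which is exactly what CV is meant to assess.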
How much confidence do you place in your estimate? In practice, we use samples to calculate statistics such as the mean and other parameters
A confidence interval (CI) is an interval estimate, with a lower and an upper boundary, within which an unknown parameter is expected to lie with a prespecified level of confidence
An interval calculated using sample data contains the true population parameter with some level of confidence
Confidence intervals (CIs) can be constructed at any level of confidence, such as 90%, 95%, or 60%
The wider a CI, the more confident the researcher is that it contains the population parameter
The lower value is the lower confidence limit (LCL) and the upper value is the upper confidence limit (UCL)
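The phrase "contains the true population parameter with some level of confidence" refers to repeated sampling: if we drew many samples and built a 95% CI from each, about 95% of those intervals would contain the true mean. The sketch below checks this by simulation. It assumes a normal population with known σ and uses the normal-based interval $\bar{X} \pm z\,\sigma/\sqrt{n}$; the function name and the parameter values (μ = 50, σ = 10, n = 40) are illustrative choices, not from the slides.

```python
import random
from statistics import NormalDist, mean


def coverage_rate(mu=50.0, sigma=10.0, n=40, confidence=0.95, trials=2000, seed=1):
    """Fraction of simulated CIs (sigma known) that contain the true mean mu.

    Illustrative helper; all parameter values are arbitrary choices.
    """
    rng = random.Random(seed)
    # Two-sided critical value, e.g. about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = mean(sample)
        lcl, ucl = x_bar - half_width, x_bar + half_width
        if lcl < mu < ucl:
            hits += 1
    return hits / trials


print(coverage_rate())  # close to, but not exactly, 0.95
```

Any single interval either contains μ or it does not; the 95% describes the long-run success rate of the procedure.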
Types of confidence intervals
Confidence Interval for mean (μ) with known variance (σ²)
Confidence Interval for mean (μ) with unknown variance (σ²)
Confidence Interval for a population proportion
Confidence Interval for a population variance
A wider CI allows for more variability in the estimate, but it also means the estimate is less precise (more uncertain); there is a trade-off between confidence and precision
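To see the trade-off numerically, the sketch below computes the half-width $z\,\sigma/\sqrt{n}$ of a normal-based CI (σ assumed known) at several confidence levels, including the 60%, 90%, and 95% levels mentioned above. σ = 10 and n = 25 are arbitrary illustrative values, and the helper name is hypothetical.

```python
from statistics import NormalDist


def ci_half_width(sigma, n, confidence):
    """Half-width of a two-sided normal-based CI for the mean (sigma known)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sigma / n ** 0.5


# With sigma = 10 and n = 25 the standard error is 10 / 5 = 2
for level in (0.60, 0.90, 0.95, 0.99):
    print(f"{level:.0%} CI: x_bar ± {ci_half_width(10.0, 25, level):.3f}")
```

Higher confidence buys a wider interval: for these inputs the half-width is roughly 1.68 at 60%, 3.92 at 95%, and 5.15 at 99%.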
The Central Limit Theorem (CLT) states that for a sufficiently large random sample of size $n$ drawn from any population with mean $\mu$ and standard deviation $\sigma$, the sample mean $\bar{X}$ is approximately normally distributed with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$
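One way to see the CLT at work is to draw many samples from a clearly non-normal population and look at the distribution of their means. The sketch below assumes an exponential population with rate 0.5 (so μ = σ = 2) and samples of size n = 50; these values and the helper name are illustrative. The standard deviation of the simulated means should come out close to $\sigma/\sqrt{n} = 2/\sqrt{50} \approx 0.283$.

```python
import random
from statistics import mean, stdev


def simulate_sample_means(n=50, reps=2000, rate=0.5, seed=7):
    """Means of `reps` samples of size n from an exponential population.

    For rate = 0.5 the population mean and standard deviation are both 2.
    """
    rng = random.Random(seed)
    return [mean(rng.expovariate(rate) for _ in range(n)) for _ in range(reps)]


means = simulate_sample_means()
print(f"mean of sample means: {mean(means):.3f}")   # near mu = 2
print(f"sd of sample means:   {stdev(means):.3f}")  # near 2 / sqrt(50), about 0.283
```

Although individual exponential values are strongly skewed, a histogram of `means` would look close to a normal curve centred on 2.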
A CI can be calculated using the CLT
For example, suppose we are interested in the probability that an estimate $X$ lies between two values $A$ and $B$; the notation is $\Pr(A < X < B) = P$
If $P = 0.95$, there is a 95% chance that $X$ lies between $A$ and $B$
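For a concrete case, suppose $X$ follows a standard normal distribution and we choose $A$ and $B$ symmetric about the mean. That equal-tails choice is an assumption: other pairs $A$, $B$ can also give $P = 0.95$. The bounds then come from the inverse CDF. A minimal sketch with Python's standard `statistics.NormalDist`; the helper name is illustrative.

```python
from statistics import NormalDist


def central_interval(p, mu=0.0, sigma=1.0):
    """Equal-tailed bounds (A, B) with Pr(A < X < B) = p for X ~ N(mu, sigma^2)."""
    dist = NormalDist(mu, sigma)
    tail = (1 - p) / 2          # probability left in each tail
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)


a, b = central_interval(0.95)
print(round(a, 2), round(b, 2))                    # -1.96 1.96
print(NormalDist().cdf(b) - NormalDist().cdf(a))   # approximately 0.95
```

These ±1.96 values are the familiar 95% critical values read from a Z-table.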
CONFIDENCE INTERVAL – For μ, known σ²
So what are the values A and B?
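A standard derivation, assuming σ is known and $n$ is large enough for the CLT to apply (or the population is normal), is to standardize $\bar{X}$, bound it between the critical values $\pm z_{\alpha/2}$, and solve for μ. Here the confidence level $1-\alpha$ plays the role of $P$, and $z_{\alpha/2}$ (notation introduced for this sketch) is the standard normal value with area $\alpha/2$ above it, e.g. 1.96 for 95%.

```latex
\begin{aligned}
Z &= \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \;\approx\; N(0, 1) \\
\Pr\!\left(-z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} < z_{\alpha/2}\right) &= 1 - \alpha \\
\Pr\!\left(\bar{X} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right) &= 1 - \alpha \\
\text{LCL} = \bar{X} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}, \qquad
\text{UCL} &= \bar{X} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}
\end{aligned}
```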
CONFIDENCE INTERVAL – For μ, unknown σ²
In practice, we do not know the population variance
When the variance is unknown, we use Student’s t-distribution instead: replace Z with t
Student’s t-distribution
A distribution used when population variance is unknown
Replace Z with t and σ with the sample standard deviation s
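Written out, the resulting interval for μ uses the sample standard deviation $s$ and a t critical value with $n - 1$ degrees of freedom (the standard result for a single sample mean; the notation $t_{\alpha/2,\,n-1}$ is introduced for this sketch):

```latex
\bar{X} - t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}} \;<\; \mu \;<\; \bar{X} + t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}},
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2}
```

As $n$ grows, the t-distribution approaches the standard normal, so for large samples the t-based and Z-based intervals are nearly identical.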
Confidence Interval for μ with known and unknown σ²
An example: a 90% confidence interval is desired for the mean vehicle speed on roads. The sample size is $n = 80$ and the sample mean is $\bar{X} = 60$. What are the LCL and UCL?
Suppose the population standard deviation is known: σ = 5.5
For known σ, calculate the confidence interval using the Z-table
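The 90% example above can be worked through in a few lines. This sketch uses `statistics.NormalDist().inv_cdf` in place of a printed Z-table (the table value is about 1.645 for 90%); the helper name `z_confidence_interval` is hypothetical. The example does not state speed units, so none are assumed.

```python
from statistics import NormalDist


def z_confidence_interval(x_bar, sigma, n, confidence):
    """Two-sided CI for the mean when the population sigma is known (hypothetical helper)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.645 for 90%
    margin = z * sigma / n ** 0.5                    # z * standard error
    return x_bar - margin, x_bar + margin


lcl, ucl = z_confidence_interval(x_bar=60, sigma=5.5, n=80, confidence=0.90)
print(f"LCL = {lcl:.2f}, UCL = {ucl:.2f}")  # LCL = 58.99, UCL = 61.01
```

Both limits sit about 1.01 from the sample mean, since $1.645 \times 5.5/\sqrt{80} \approx 1.01$.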