Data: Characteristics or information, usually numerical, collected through observation
Population: Collection of all items of interest to a study
Sample: Subset of the population
DESCRIPTIVE STATISTICS
Alternative descriptive statistics
Advantages and disadvantages of alternative descriptive statistics
Comparing and contrasting different data using descriptive statistics
Advantages and disadvantages of alternative methods of displaying data
CORRELATION VS. CAUSATION
Recognizing the difference between correlation and causation
Employing correlation to evaluate the relationship between two variables
DATA – What exactly is it?
Etymology of Data: from Latin (verb “dare”, to give); plural of datum, something given or admitted, especially as a basis for reasoning or inference
Definition of Data: Characteristics or information, usually numerical, that are collected through observation
Context of Data: Data is the physical representation of information in a manner suitable for communication, interpretation, or processing by human beings or by automatic means
DATA – What types are there?
Data Types
Quantitative (Numerical): Discrete, Continuous
Qualitative (Categorical): Nominal, Ordinal
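A minimal Python sketch of the nominal/ordinal distinction, assuming standard-library `enum` types as stand-ins: a nominal variable supports only equality checks, while an ordinal variable can also be ranked. The class names, categories and numbers below are hypothetical illustrations, not part of the original material.

```python
from enum import Enum, IntEnum


class Weather(Enum):
    """Nominal (hypothetical example): categories with no natural order."""
    RAINY = "rainy"
    SUNNY = "sunny"


class CrashSeverity(IntEnum):
    """Ordinal (hypothetical example): categories with a meaningful order."""
    MINOR = 1
    SERIOUS = 2
    FATAL = 3


# Quantitative examples (hypothetical values):
flights_today = 98_000      # discrete: a count, only whole numbers are possible
vehicle_miles = 12_345.67   # continuous: any value within a range is possible

# Ordinal categories can be ranked...
print(CrashSeverity.MINOR < CrashSeverity.FATAL)  # True

# ...but nominal categories can only be compared for equality.
try:
    Weather.RAINY < Weather.SUNNY
except TypeError:
    print("nominal categories have no order")
```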
Examples of Data
Covid-19 cases and fatalities per day
Total flights per day worldwide
Total vehicle-miles travelled on major roadways
Number of rainy and sunny days per country
Traffic crashes around the Netherlands
Population vs. Sample
Population (N): A population is the collection of all items of interest to a study. It refers not only to people or other living creatures but also to objects, events, procedures, or observations
Sample (n): A sample is a subset of the population. An investigation is often restricted to one or more samples drawn from the population
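The slides do not specify how a sample is drawn. A minimal sketch, assuming simple random sampling without replacement: each of the N population items can appear in the sample at most once, and the sample size n must not exceed N. The helper `draw_sample` and the population of crash IDs are hypothetical.

```python
import random


def draw_sample(population, n, seed=None):
    """Simple random sample of size n, drawn without replacement (hypothetical helper).

    population: a sequence holding all N items of interest.
    seed: optional, only used to make the illustration reproducible.
    """
    N = len(population)
    if not 0 < n <= N:
        raise ValueError(f"sample size must satisfy 0 < n <= N (got n={n}, N={N})")
    rng = random.Random(seed)
    return rng.sample(population, n)


# Hypothetical population: IDs of N = 1000 recorded traffic crashes
population = list(range(1, 1001))
sample = draw_sample(population, n=50, seed=7)
print(len(population), len(sample))  # 1000 50
```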
Data Analysis methods
Descriptive Statistics
Statistical Inference I
Descriptive Statistics: methods and techniques for summarizing and interpreting data
Descriptive statistics are used to get to know a “population” by estimating single summary values, such as the mean
Descriptive Statistics components
Mean
Median
Mode
Skewness & Kurtosis
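Skewness and kurtosis are only named here. A minimal sketch, assuming the common moment-based definitions: skewness $m_3 / m_2^{3/2}$ and excess kurtosis $m_4 / m_2^2 - 3$, where $m_k$ is the k-th central moment with divisor n. Statistical packages also offer bias-corrected variants that give slightly different values; the data below are hypothetical.

```python
def central_moment(xs, k):
    """k-th central moment with divisor n: (1/n) * sum((x - mean)**k)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** k for x in xs) / n


def skewness(xs):
    """Moment-based skewness: 0 for symmetric data, > 0 for a long right tail."""
    m2 = central_moment(xs, 2)
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return central_moment(xs, 3) / m2 ** 1.5


def excess_kurtosis(xs):
    """Moment-based kurtosis minus 3, so that a normal distribution scores 0."""
    m2 = central_moment(xs, 2)
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    return central_moment(xs, 4) / m2 ** 2 - 3


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 2, 3, 4, 8, 20]  # hypothetical data with a long right tail
print(skewness(symmetric))          # 0.0
print(skewness(right_skewed) > 0)   # True
```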
Observations can be ordered from smallest to largest magnitude. Ordering allows the boundaries of the data to be defined and supports comparisons of the relative position of specific observations
A percentile is defined as the value below which P% of the values in the sample lie
The median (the 50th percentile) constitutes a measure of the centrality of the observations, or central tendency
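Several conventions exist for computing percentiles from a finite sample. A minimal sketch, assuming linear interpolation between the ordered observations (the convention NumPy's `percentile` uses by default); for small samples it only approximates the "value below which P% lie" wording. The helper `percentile` and the data are hypothetical. The example also shows that the 50th percentile coincides with the median.

```python
import math
import statistics


def percentile(data, p):
    """P-th percentile (0 <= p <= 100) by linear interpolation between order statistics."""
    if not data:
        raise ValueError("data must be non-empty")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    xs = sorted(data)                 # order observations from smallest to largest
    pos = (len(xs) - 1) * p / 100     # fractional (0-based) rank of the percentile
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return xs[lo] + (xs[hi] - xs[lo]) * frac


data = [7, 1, 3, 9, 5, 11]  # hypothetical observations
print(percentile(data, 25))                            # 3.5
print(percentile(data, 50), statistics.median(data))   # 6.0 6.0
print(percentile(data, 75))                            # 8.5
```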
Central Tendency measures
Mean
Median
Mode
Percentiles are useful for understanding the relative standing of data points
Descriptive Statistics
Central Tendency – Mean, Median, Mode
Percentiles
Measures of the relative positions of points, with the middle (50th) percentile, or median, representing the centrality of the observations (central tendency)
Arithmetic mean
The most popular and useful measure of central tendency; computed from a sample it is called the sample mean, an estimate of the expectation
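Sample mean of X, for observations x_1, x_2, ..., x_n (an estimate of the expectation of X):
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
Population mean, over all N items:
$\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$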
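The sample and population means use the same formula; only the set being averaged differs (n sample items versus all N population items). A minimal sketch with hypothetical data; the sample here is simply the first four items for illustration, whereas in practice it would be drawn at random.

```python
def mean(xs):
    """Arithmetic mean: sum of the values divided by their count."""
    if not xs:
        raise ValueError("mean of an empty collection is undefined")
    return sum(xs) / len(xs)


population = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical: all N = 8 items of interest
sample = population[:4]                 # a subset of n = 4 items

mu = mean(population)   # population mean: 40 / 8 = 5.0
x_bar = mean(sample)    # sample mean: 14 / 4 = 3.5
print(mu, x_bar)        # 5.0 3.5
```

The gap between `x_bar` and `mu` illustrates that a sample mean is only an estimate of the population mean.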
Mode: the value that occurs most frequently, i.e., the most commonly occurring outcome. Most useful for count (frequency) data, and the only one of the three measures that can be applied to nominal data
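Because the mode only counts occurrences, it works for categorical values as well as numbers, and a dataset can have more than one mode. A minimal sketch using `collections.Counter`; the helper `modes` and the weather data are hypothetical. The standard library's `statistics.multimode` (Python 3.8+) does the same job.

```python
from collections import Counter


def modes(values):
    """All values sharing the highest frequency (more than one if multimodal)."""
    if not values:
        raise ValueError("mode of an empty collection is undefined")
    counts = Counter(values)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]


weather = ["sunny", "rainy", "rainy", "cloudy", "rainy", "sunny"]  # hypothetical nominal data
print(modes(weather))           # ['rainy']
print(modes([1, 1, 2, 2, 3]))   # [1, 2] (bimodal)
```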
Advantages and Disadvantages of mean, median, and mode
When data are symmetric and unimodal, the mode, median, and mean are approximately equal
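To see when the three measures agree, a minimal sketch comparing a symmetric, unimodal dataset with a right-skewed one using Python's `statistics` module; both datasets are hypothetical. In the skewed case the long right tail pulls the mean above the median, and the median above the mode.

```python
import statistics


def centre(xs):
    """Return (mean, median, mode) of a list of numbers."""
    return statistics.mean(xs), statistics.median(xs), statistics.mode(xs)


symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]       # hypothetical symmetric, unimodal data
right_skewed = [1, 1, 1, 2, 2, 3, 4, 8, 20]   # hypothetical data with a long right tail

print(centre(symmetric))      # all three measures equal 3
print(centre(right_skewed))   # mode 1 < median 2 < mean (about 4.67)
```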
If the data are qualitative (categorical): not the mean or median, but the mode
If the data are quantitative → mean
Variability – Variance, Standard Deviation: Used to describe and quantify the spread or dispersion around the centre (e.g., the mean)
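The slides name variance and standard deviation without formulas. A minimal sketch, assuming the standard definitions: population variance $\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$, sample variance $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$, and the standard deviation as the square root of the variance. The helpers `variance` and `std_dev` are hypothetical and are checked against `statistics.pvariance` and `statistics.variance`.

```python
import math
import statistics


def variance(xs, sample=True):
    """Average squared deviation from the mean.

    sample=True divides by n - 1 (sample variance s^2);
    sample=False divides by N (population variance sigma^2).
    """
    n = len(xs)
    if n < (2 if sample else 1):
        raise ValueError("not enough observations")
    mean = sum(xs) / n
    squared_deviations = sum((x - mean) ** 2 for x in xs)
    return squared_deviations / (n - 1 if sample else n)


def std_dev(xs, sample=True):
    """Standard deviation: square root of the variance, in the data's own units."""
    return math.sqrt(variance(xs, sample))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations, mean 5
print(variance(data, sample=False), std_dev(data, sample=False))  # 4.0 2.0
print(math.isclose(variance(data), statistics.variance(data)))    # True
```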
The average (expected value) of a sample alone is not sufficient to understand the data: two samples can share the same mean but have very different spreads
Can we measure variability by the difference between the minimum and maximum observations (the range)?
Or, what about the difference between the 25th and 75th percentiles (the interquartile range)?
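Both candidate measures can be computed directly. A minimal sketch with hypothetical data: two samples share the same mean but differ in spread, and a third adds one extreme value. The range reacts strongly to that single value while the interquartile range does not. The quartiles use `statistics.quantiles` with `method="inclusive"` (linear interpolation), which is one of several percentile conventions; `value_range` and `iqr` are hypothetical helpers.

```python
import statistics


def value_range(xs):
    """Range: maximum minus minimum observation."""
    return max(xs) - min(xs)


def iqr(xs):
    """Interquartile range: 75th minus 25th percentile (linear interpolation)."""
    q1, _, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return q3 - q1


narrow = [3, 4, 5, 6, 7]          # hypothetical data, mean 5
wide = [1, 3, 5, 7, 9]            # hypothetical data, also mean 5
wide_outlier = [1, 3, 5, 7, 90]   # `wide` with one extreme value

for name, xs in [("narrow", narrow), ("wide", wide), ("wide_outlier", wide_outlier)]:
    print(name, statistics.mean(xs), value_range(xs), iqr(xs))
```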