L1 part 1

Cards (86)

  • DATA, POPULATION, & SAMPLE
    • Data: Characteristics or information, usually numerical, collected through observation
    • Population: Collection of all items of interest to a study
    • Sample: Subset of the population
  • DESCRIPTIVE STATISTICS
    • Alternative descriptive statistics
    • Advantages and disadvantages of alternative descriptive statistics
    • Comparing and contrasting different data using descriptive statistics
    • Advantages and disadvantages of alternative methods of displaying data
  • CORRELATION VS. CAUSATION
    • Recognizing the difference between correlation and causation
    • Employing correlation to evaluate the relationship between two variables
  • DATA – What exactly is it?
  • Etymology of Data: from the Latin verb “dare” (to give); plural of “datum”, something given or admitted, especially as a basis for reasoning or inference
  • Definition of Data: Characteristics or information, usually numerical, that are collected through observation
  • Context of Data: Data is the physical representation of information in a manner suitable for communication, interpretation, or processing by human beings or by automatic means
  • DATA – What types are there?
  • Data Types
    • Quantitative (Numerical)
      • Discrete
      • Continuous
    • Qualitative (Categorical)
      • Nominal
      • Ordinal
  • Examples of Data
    • Covid-19 cases and fatalities per day
    • Total flights per day worldwide
    • Total vehicle-miles travelled in major roadways
    • Number of rainy and sunny days per country
    • Traffic crashes around the Netherlands
  • Population vs. Sample
    • Population: Collection of all items of interest to a study
    • Sample: Subset of the population
  • Population (N): A population is the collection of all items of interest to a study. It does not only refer to people or animate creatures but also objects, events, procedures, or observations
  • Sample (n): A sample is a subset of the population. An investigation is often restricted to one or more samples drawn from the population
  • Data Analysis methods
    • Descriptive Statistics
    • Statistical Inference I
  • Descriptive Statistics: methods and techniques for summarizing and interpreting data
  • Descriptive Statistics are used to get to know a “population” by summarizing it with single estimated values (e.g., the mean)
  • Descriptive Statistics components
    • Mean
    • Median
    • Mode
    • Skewness & Kurtosis
  • Observations can be ordered from smallest to largest magnitude. This ordering allows the boundaries of the data to be defined and supports comparisons of the relative position of specific observations
  • A percentile is defined as the value below which P% of the values in the sample lie
  • Percentile calculation formula: percentile = (R / n) * 100, where R is the rank of the observation in the ordered sample and n is the sample size (here n = 16)
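  • What are the 25th, 50th, and 75th percentiles?

    The values below which 1/4, 1/2, and 3/4 of the observations lie (the first quartile, the median, and the third quartile)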
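The rank-based percentile formula can be sketched in Python. This is only a minimal illustration, assuming R is the 1-based rank of an observation in the sorted sample; the 16 data values and the helper name `percentile_rank` are hypothetical and do not come from the original cards.

```python
def percentile_rank(sample, value):
    """Return (R / n) * 100, where R is the 1-based rank of `value`
    in the sorted sample and n is the sample size."""
    ordered = sorted(sample)
    # Rank of the first occurrence; assumes `value` is present in the sample.
    rank = ordered.index(value) + 1
    return rank / len(ordered) * 100


# Hypothetical sample with n = 16 observations (not from the source).
data = [12, 7, 3, 9, 15, 21, 4, 18, 6, 10, 14, 2, 8, 20, 11, 5]

rank_of_5 = percentile_rank(data, 5)    # 4th smallest of 16
rank_of_9 = percentile_rank(data, 9)    # 8th smallest of 16
rank_of_14 = percentile_rank(data, 14)  # 12th smallest of 16
```

With this sample, the 4th, 8th, and 12th ordered observations sit at the 25th, 50th, and 75th percentiles. Other percentile conventions (for example, interpolating between neighbouring observations) exist and can give slightly different values.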
  • Percentiles
    Measures of the relative positions of points
  • The median constitutes a measure of the centrality of the observations, or central tendency
  • Central Tendency measures
    • Mean
    • Median
    • Mode
  • Percentiles are useful for understanding the relative standing of data points
  • Descriptive Statistics
    • Central Tendency – Mean, Median, Mode
  • Percentiles
    Measures of the relative positions of points, with the middle percentile or "median" representing the centrality of the observations or central tendency
  • Arithmetic mean
    The most popular and useful measure of central tendency, also known as the sample mean or expectation
  • Sample mean or expectation of X
    x̄ = 1/n * Σ(xi), for observations x1, x2, ..., xn
  • Population mean
    μ = 1/N * Σ(xi)
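As a rough sketch of the difference between the population mean and the sample mean, the snippet below applies the same "sum divided by count" formula to a full population of size N and to a subset of size n. The numbers, the choice of subset, and the function name `arithmetic_mean` are hypothetical and only serve as an illustration.

```python
def arithmetic_mean(values):
    """Sum of the values divided by how many there are."""
    return sum(values) / len(values)


# Hypothetical population with N = 6 items (not from the source).
population = [4, 8, 15, 16, 23, 42]

# Hypothetical sample with n = 3 items drawn from that population.
sample = [8, 16, 42]

mu = arithmetic_mean(population)   # population mean, uses all N items
x_bar = arithmetic_mean(sample)    # sample mean, uses only the n sampled items
```

Here the population mean is 18.0 while the sample mean is 22.0, which shows that a sample mean is an estimate and generally differs from the population mean.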
  • Mode: The value that occurs most frequently, or the most commonly occurring outcome. Only applicable to count data
  • Advantages and Disadvantages of mean, median, and mode
  • When data are symmetric and unimodal, the mode, median, and mean are approximately equal
  • If the data are qualitative: no mean or median, but mode
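The behaviour of the three measures can be checked with Python's standard `statistics` module. The sketch below is illustrative only: all three data sets are hypothetical, chosen to show a symmetric unimodal case, a skewed case, and a qualitative case.

```python
import statistics

# Hypothetical symmetric, unimodal data: mean, median, and mode coincide.
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
sym_mean = statistics.mean(symmetric)
sym_median = statistics.median(symmetric)
sym_mode = statistics.mode(symmetric)

# Hypothetical skewed data: one large value pulls the mean upward,
# while the median stays near the bulk of the observations.
skewed = [1, 2, 2, 3, 20]
skew_mean = statistics.mean(skewed)
skew_median = statistics.median(skewed)

# Hypothetical qualitative (categorical) data: only the mode is computed here,
# because a mean of labels has no numerical meaning.
transport = ["car", "bike", "car", "bus", "car", "bike"]
transport_mode = statistics.mode(transport)
```

In the skewed case the mean (5.6) is well above the median (2), a common reason to prefer the median when outliers are present.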
  • Variability – Variance, Standard Deviation: Used to describe and quantify the spread or dispersion around the centre (e.g., mean)
  • The average or expected value of a sample is not sufficient on its own to understand the data
  • Can we measure variability by the difference between the minimum and maximum observations (the range)?
  • Or what about the difference between the 25th and 75th percentiles (the interquartile range)?
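The candidate measures of variability raised here (range, difference between the 25th and 75th percentiles, variance, and standard deviation) can be compared with a short sketch using Python's standard `statistics` module. The data values are hypothetical, and `statistics.quantiles` uses its default interpolation method, which is only one of several quartile conventions.

```python
import statistics

# Hypothetical observations (not from the source).
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: difference between the maximum and minimum observation.
data_range = max(data) - min(data)

# Interquartile range: difference between the 75th and 25th percentiles.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Variance and standard deviation, treating the data as a full population
# (divide by N) ...
pop_variance = statistics.pvariance(data)
pop_std = statistics.pstdev(data)

# ... or as a sample from a larger population (divide by n - 1).
sample_variance = statistics.variance(data)
sample_std = statistics.stdev(data)
```

The range depends only on the two most extreme observations, so it is sensitive to outliers; the interquartile range ignores the tails, and the variance and standard deviation use every observation's distance from the mean.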