Stat


  • Statistics refers to a set of mathematical procedures that deal with the collection, organization, presentation, analysis, and interpretation of data in order to support decision-making
  • Data is the set of individual values associated with a variable
  • Variable is a characteristic of an item or individual that can change or take different values
  • Descriptive procedures involve collecting, presenting, and describing data
  • Inferential procedures involve drawing conclusions and/or making decisions concerning a population based only on sample data
  • Goal of statistics: Convert data into meaningful information
  • Levels of Measurement:
    • Nominal Scale: Classifies data into distinct categories
    • Ordinal Scale: Classifies data into distinct categories with ranking implied
    • Interval Scale: Ordered scale where the difference between measurements is meaningful but lacks a true zero point
    • Ratio Scale: Ordered scale where the difference between measurements is meaningful and has a true zero point
  • Sources of Data:
    • Primary Sources: Data collected from surveys, experiments, or observations
    • Secondary Sources: Data analyzed by someone other than the data collector, such as census data or data from print journals
  • Sampling Frame is a listing of items that make up the population, such as population lists, directories, or maps
  • Probability Sampling involves random selection, allowing for strong statistical inferences about the whole group
    • Simple Random Sampling: Every possible sample of a given size has an equal chance of being selected
    • Stratified Random Sampling: Divides the population into subgroups and ensures every subgroup is properly represented in the sample
    • Systematic Random Sampling: Selects every kth individual after randomly selecting the first individual
    • Cluster Sampling: Divides the population into clusters and randomly selects entire clusters
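The four probability sampling methods above can be sketched in a few lines of Python. This is a minimal illustration, not a production sampler: the frame of 100 unit IDs, the odd/even strata, the five clusters of 20 units, and all function names are assumptions made up for the example.

```python
import random

def simple_random_sample(population, n, rng):
    # Every possible subset of size n is equally likely to be chosen.
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    # Random start among the first k units, then every kth unit after it.
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(population, stratum_of, fraction, rng):
    # Group units by stratum, then draw the same fraction from each stratum.
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    # Randomly pick whole clusters and keep every unit inside them.
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

rng = random.Random(0)            # fixed seed so the sketch is repeatable
population = list(range(1, 101))  # hypothetical frame of 100 unit IDs

srs = simple_random_sample(population, 10, rng)
systematic = systematic_sample(population, 10, rng)
stratified = stratified_sample(population, lambda u: u % 2, 0.1, rng)
clusters = {c: population[c * 20:(c + 1) * 20] for c in range(5)}
clustered = cluster_sample(clusters, 2, rng)
```

Note the structural difference: stratified sampling draws some units from every subgroup, while cluster sampling keeps all units from only a few randomly chosen groups.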
  • Non-probability Sampling involves non-random selection based on convenience or other criteria
    • Convenience Sampling: Sample selected based on ease, cost, or convenience
    • Voluntary Response Sampling: Participants volunteer themselves, leading to self-selection bias
    • Snowball Sampling: Participants recruit other participants
    • Purposive Sampling: Researcher selects a sample based on expertise and specific criteria
    • Quota Sampling: Non-random selection of a predetermined number or proportion of units called a quota
  • Evaluating Survey Worthiness:
    • Consider the purpose of the survey
    • Check if the survey is based on a probability sample
    • Watch out for coverage error, nonresponse error, measurement error, and sampling error
  • Data Cleaning is Necessary:
    • Identify and address irregularities in the data, such as typographical errors, missing values, and outliers
    • Recode variables to supplement or replace the original variable
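The cleaning steps above might look like the following sketch. The weights are invented for illustration, and the 1.5 × IQR rule for flagging outliers is one common convention assumed here, not a method prescribed by these notes.

```python
import statistics

# Hypothetical raw weights in pounds; None marks a missing value and
# 1200 stands in for a likely typographical error.
raw = [118, 121, None, 119, 1200, 122, 117, 120]

# Step 1: set aside missing values.
observed = [w for w in raw if w is not None]

# Step 2: flag outliers with the 1.5 * IQR rule (assumed convention).
q1, _, q3 = statistics.quantiles(observed, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [w for w in observed if w < low or w > high]
cleaned = [w for w in observed if low <= w <= high]

# Step 3: recode the numeric variable into a categorical one that can
# supplement the original.
def recode(weight, cutoff=120):
    return "at or above 120" if weight >= cutoff else "below 120"

weight_group = [recode(w) for w in cleaned]
```

Flagged values are kept in `outliers` rather than silently discarded, so they can be checked against the original records before deciding whether to correct or drop them.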
  • Things to Consider in Potential Sources of Data:
    • Structured data follows an organizing principle, while unstructured data does not
    • Electronic data formatting and encoding should be considered
  • Techniques for Inferential Procedures:
    • Estimation: e.g., estimate the population mean weight using the sample mean weight
    • Hypothesis Testing: e.g., use sample evidence to test the claim that the population mean weight is 120 pounds
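Both techniques can be illustrated with the weight example. The sketch below uses a made-up sample of eight weights; only the claimed mean of 120 pounds comes from the notes. It computes a point estimate and a one-sample t statistic, which is one standard way to test such a claim when the population standard deviation is unknown.

```python
import math
import statistics

# Hypothetical sample of weights in pounds (illustrative values only).
sample = [118, 123, 121, 117, 125, 122, 119, 124]
claimed_mean = 120  # the claim under test

n = len(sample)
x_bar = statistics.mean(sample)    # estimation: point estimate of the population mean
s = statistics.stdev(sample)       # sample standard deviation (n - 1 divisor)
standard_error = s / math.sqrt(n)

# Hypothesis testing: how many standard errors x_bar lies from the claim.
t_stat = (x_bar - claimed_mean) / standard_error
```

To finish the test, `t_stat` would be compared with a critical value from the t distribution with `n - 1` degrees of freedom, taken from a table or a statistics library.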
  • Populations and Samples
    • A population is the entire collection of things under consideration; the listing of its items is referred to as the frame
    • The sampling unit is each object or individual in the frame
    • A parameter is a summary measure computed to describe a characteristic of the population
    • A sample is a subset of the population selected for analysis
    • A statistic is a summary measure computed to describe a characteristic of the sample drawn from the population
  • Why Sample?
    • Less time consuming than a census
    • Less costly to administer than a census
    • It is possible to obtain statistical results of a sufficiently high precision based on samples
  • Strive for representative samples to reflect the population of interest accurately!
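A small simulation can show the difference between a parameter and a statistic, and why a modest sample can stand in for a census. Everything here is assumed for illustration: the population is simulated rather than real, and its size (10,000), mean (120), spread (15), and the sample size (200) are arbitrary choices.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population of 10,000 weights (simulated, not real data).
population = [rng.gauss(120, 15) for _ in range(10_000)]
parameter = statistics.mean(population)  # describes the whole population

sample = rng.sample(population, 200)     # simple random sample of 2% of the units
statistic = statistics.mean(sample)      # describes only the sample

# Gap between the sample estimate and the population value.
sampling_error = statistic - parameter
```

In practice the parameter is unknown, which is why the statistic is used to estimate it; the simulation only makes the gap visible because the whole population is available here.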