EDA-Lesson 2-3-4

  • Statistics deals with the collection, presentation, analysis, and use of data to make decisions, solve problems, and design products and processes
  • Descriptive Statistics (DS) involves describing the characteristics and properties of a group of persons, places, or things based on easily verifiable facts
  • Inferential Statistics (IS) draws inferences about a population based on data gathered from samples using techniques of Descriptive Statistics
  • Sample is a subset of the population selected for study; a numerical measure computed from a sample is called a statistic
  • Variables are the characteristics or attributes being studied, which can take different values across observations
  • Qualitative Variables take non-numeric values that describe categories or qualities
  • Population refers to the totality of all observations from which the data set is acquired; a numerical measure that describes a population is called a parameter
  • Discrete Data are countable quantities that take distinct, separate values, like the number of individuals
  • Independent Variable is the factor that is deliberately changed, or that varies naturally, and whose effect on the outcome is being studied
  • Dependent Variable is the outcome observed or measured in response to changes in the independent variable
  • Controlled Variable is kept constant so that external factors do not influence the dependent variable
  • Quantitative Variables are countable or measurable quantities
  • Continuous Data are measurable quantities that can take any value within an interval, like height and weight
  • Scales of Measurement:
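    • Nominal: Numbers or names serve only as category labels, with no order or magnitude
    • Ordinal: Numbers designate the rank order of data, but differences between ranks are not necessarily equal
    • Interval: Equal differences between consecutive values, so addition and subtraction are meaningful; the zero point is arbitrary
    • Ratio: All basic mathematical operations can be performed because the zero point is non-arbitrary (a true zero)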
  • Sampling is the process of taking samples from the population
  • Probability Sampling gives every member of the population a known, nonzero chance of selection by choosing units at random, which reduces selection bias
    • Simple Random Sampling
    • Systematic Sampling
    • Stratified Sampling
    • Cluster Sampling
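The four probability sampling methods listed above can be sketched in a few lines. This is only an illustration under stated assumptions: the population of 100 numbered units, the two strata, the ten blocks, and every function name are hypothetical and do not come from the lesson.

```python
import random

def simple_random_sample(population, n, seed=0):
    """Every unit has the same chance of being drawn."""
    return random.Random(seed).sample(population, n)

def systematic_sample(population, n, seed=0):
    """Pick a random start, then take every k-th unit (assumes n <= len(population))."""
    k = len(population) // n                      # sampling interval
    start = random.Random(seed).randrange(k)      # random start within the first interval
    return [population[start + i * k] for i in range(n)]

def stratified_sample(strata, fraction, seed=0):
    """Draw the same fraction at random from each stratum."""
    rng = random.Random(seed)
    sample = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample

def cluster_sample(clusters, n_clusters, seed=0):
    """Choose whole clusters at random and keep every unit inside them."""
    chosen = random.Random(seed).sample(list(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

# Hypothetical population: units numbered 1..100.
population = list(range(1, 101))
strata = {"first_half": population[:50], "second_half": population[50:]}
clusters = {f"block_{i}": population[i * 10:(i + 1) * 10] for i in range(10)}

srs = simple_random_sample(population, 10)
sys_s = systematic_sample(population, 10)
strat = stratified_sample(strata, 0.1)
clus = cluster_sample(clusters, 2)
```

Fixing the seed makes the draws repeatable for study purposes; a real survey would normally not reuse the same seed for every method.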
  • Non-Probability Sampling does not select at random, so some individuals have an unknown or zero chance of being selected
    • Convenience Sampling
    • Quota Sampling
    • Purposive Sampling
  • Types of Data Presentation:
    • Textual Form
    • Tabular Form
    • Graphical Form
  • Univariate Analysis includes:
    • Measure of Central Tendency
    • Measure of Position
    • Measure of Variation
    • Measure of Shape
  • Mean is the most widely used measure of central tendency for interval and ratio data and can be arithmetic, geometric, harmonic, trimmed, or root mean square
  • Median is the midpoint of values when ordered from smallest to largest and is largely unaffected by extreme values
  • Mode is the most frequently occurring value, used for nominal data and polls
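The measures of central tendency above can be compared on one small data set. The sketch below uses Python's standard `statistics` module; the data values and the `trimmed_mean` helper are made up for illustration, and trimming one value from each end is just one possible choice.

```python
import math
import statistics

# Hypothetical data with one extreme value (40).
data = [2, 3, 3, 5, 7, 10, 40]

arithmetic = statistics.mean(data)
geometric = statistics.geometric_mean(data)
harmonic = statistics.harmonic_mean(data)
rms = math.sqrt(statistics.mean([x * x for x in data]))  # root mean square

def trimmed_mean(values, k):
    """Arithmetic mean after dropping the k smallest and k largest values."""
    ordered = sorted(values)
    return statistics.mean(ordered[k:len(ordered) - k])

trimmed = trimmed_mean(data, 1)

median = statistics.median(data)
mode = statistics.mode(data)

# Make the extreme value far larger: the median stays put, the mean does not.
with_outlier = data[:-1] + [4000]
median_outlier = statistics.median(with_outlier)
mean_outlier = statistics.mean(with_outlier)

print(arithmetic, geometric, harmonic, rms, trimmed, median, mode)
```

For positive data the harmonic mean is never larger than the geometric mean, which is never larger than the arithmetic mean, which is never larger than the root mean square.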
  • Quantiles are cut points that divide ordered data, or the cumulative distribution function of a random variable, into intervals of equal probability; they include Quartiles (4 parts), Deciles (10 parts), and Percentiles (100 parts)
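Quartiles, deciles, and percentiles differ only in the number of equal parts. A minimal sketch with `statistics.quantiles` follows, assuming a hypothetical data set of the integers 1 to 99. Interpolation conventions vary between textbooks and software, so other tools may return slightly different cut points for the same data.

```python
import statistics

# Hypothetical data: the integers 1..99, already in order.
data = list(range(1, 100))

# n is the number of equal parts; the result holds the n - 1 cut points.
quartiles = statistics.quantiles(data, n=4)      # 3 cut points
deciles = statistics.quantiles(data, n=10)       # 9 cut points
percentiles = statistics.quantiles(data, n=100)  # 99 cut points

# The second quartile, fifth decile, and 50th percentile are all the median.
median = statistics.median(data)

print(quartiles)
print(deciles[4], percentiles[49], median)
```

The default method of `statistics.quantiles` is `"exclusive"`; passing `method="inclusive"` treats the data as the whole population rather than a sample from it.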
  • Measures of Variation include Range, Mean Absolute Deviation, Variance, Standard Deviation, and Coefficient of Variation
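Each measure of variation named above can be computed directly. The sketch below assumes a small made-up data set and uses the population formulas for the mean absolute deviation and coefficient of variation; the lesson does not say whether population or sample formulas are intended, so both variances are shown.

```python
import statistics

# Hypothetical data set chosen so the results are round numbers.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)

# Range: largest value minus smallest value.
value_range = max(data) - min(data)

# Mean absolute deviation: average distance from the mean.
mad = sum(abs(x - mean) for x in data) / len(data)

# Variance: average squared distance from the mean.
pop_variance = statistics.pvariance(data)   # divides by N
sample_variance = statistics.variance(data)  # divides by N - 1

# Standard deviation: square root of the variance.
pop_sd = statistics.pstdev(data)

# Coefficient of variation: standard deviation relative to the mean.
cv = pop_sd / mean

print(value_range, mad, pop_variance, sample_variance, pop_sd, cv)
```

The coefficient of variation is unitless, which makes it useful for comparing spread across data sets measured on different scales, provided the mean is not zero.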