LESSON 6

Cards (14)

  • Exploratory Data Analysis
    The critical process of performing an INITIAL INVESTIGATION on data to discover patterns, spot anomalies, test hypotheses, and check assumptions with the help of summary statistics and graphical representations
  • Data value
    is a piece of INFORMATION, such as a number or a date
  • Data Variable
    is a characteristic that you can MEASURE, such as weight or income
  • Distribution
    The __ of a dataset is how the dataset is SPREAD OUT. You can visualize a dataset's distribution by observing its shape on a graph
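    A minimal sketch of looking at a distribution's shape without a plotting library: values are counted into equal-width bins and printed as a text histogram. The `delays` values and the `histogram` helper are made up for illustration and do not come from any real dataset.

```python
from collections import Counter

# Hypothetical values (e.g. delays in minutes), made up for illustration.
delays = [2, 5, 7, 11, 12, 14, 18, 21, 25, 48]

def histogram(values, bin_width):
    """Count how many values fall into each bin of width `bin_width`."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

bins = histogram(delays, 10)

# Print one row per non-empty bin; the bar length is the count in that bin.
for start, count in bins.items():
    print(f"{start:>3}-{start + 9:<3} {'#' * count}")
```

    Most values pile up in the low bins and one sits far to the right, which is the kind of shape a histogram makes visible.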
  • Outlier
    is a data value that is SIGNIFICANTLY DIFFERENT from the rest of a dataset, typically much higher or much lower
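    One common convention for flagging outliers, assumed here and not stated on the card, is the 1.5 × IQR rule: a value counts as an outlier if it lies more than 1.5 interquartile ranges below the first quartile or above the third. The sketch below uses made-up values and a hypothetical `find_outliers` helper.

```python
from statistics import quantiles

# Hypothetical values, made up for illustration.
delays = [2, 5, 7, 11, 12, 14, 18, 21, 25, 48]

def find_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _median, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

outliers = find_outliers(delays)
print(outliers)
```

    The threshold `k=1.5` is only a rule of thumb; whether a flagged value is an error or a genuine extreme still needs a look at the data.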
  • Data model
    method of ORGANIZING data and relationships between values in a dataset
  • The hflights Dataset
    includes data on all flights that departed Houston, TX in 2011
  • Categorical data
    Data that fits into categories (e.g. Gender, Country)
  • Quantitative data
    NUMERICAL DATA that represents a measurable quantity (e.g. age, sales, population)
  • Converting to Factors
    1. Factor variables are categorical variables that can be either numeric or string variables
    2. Convert Origin, DayOfWeek, Month to factors
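    Factors are a feature of R, so the sketch below is only a rough Python analogue of the idea, not the R API: each distinct value becomes a "level", and every observation is stored as an integer code pointing at its level. The `to_factor` helper and the sample values are hypothetical; only the column names Origin and DayOfWeek come from the card.

```python
def to_factor(values):
    """Return (levels, codes): the sorted distinct values and, for each
    observation, the index of its level. Codes here are 0-based."""
    levels = sorted(set(values))
    index = {level: i for i, level in enumerate(levels)}
    codes = [index[v] for v in values]
    return levels, codes

# Illustrative values only, not read from the real dataset.
origin = ["IAH", "HOU", "IAH", "IAH", "HOU"]   # string variable
day_of_week = [6, 7, 1, 6]                     # numeric variable

origin_levels, origin_codes = to_factor(origin)
dow_levels, dow_codes = to_factor(day_of_week)

print(origin_levels, origin_codes)
print(dow_levels, dow_codes)
```

    The same helper works for both the string column and the numeric one, which mirrors the point that a factor can be built from either kind of variable.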
  • Univariate analysis
    Analysis of a SINGLE VARIABLE; it describes that variable on its own and does not examine cause-effect relationships
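    As a small illustration, univariate analysis of one numeric variable often starts with a handful of summary statistics. The values and the `summarize` helper below are made up for the example.

```python
from statistics import mean, median, stdev

# Hypothetical values for a single numeric variable.
delays = [2, 5, 7, 11, 12, 14, 18, 21, 25, 48]

def summarize(values):
    """Collect basic summary statistics for one variable."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),  # sample standard deviation
    }

summary = summarize(delays)
for name, value in summary.items():
    print(f"{name:>6}: {value}")
```

    Here the mean is pulled above the median by the single large value, a hint that the variable is skewed.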
  • Bivariate analysis
    ANALYSIS OF TWO VARIABLES to determine the relationship between them, with one variable dependent and the other independent
  • Types of bivariate data analysis
    1. Numerical and Numerical
    2. Categorical and Categorical
    3. Numerical and Categorical
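    The three pairings above are typically approached with different tools. The sketch below picks one common choice for each, as an assumption rather than the only option: Pearson correlation for numerical and numerical, a contingency table for categorical and categorical, and per-group means for numerical and categorical. All data and helper names are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two numeric variables."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def crosstab(a, b):
    """Contingency table: count of each (category_a, category_b) pair."""
    return Counter(zip(a, b))

def group_means(categories, values):
    """Mean of a numeric variable within each category."""
    groups = defaultdict(list)
    for c, v in zip(categories, values):
        groups[c].append(v)
    return {c: mean(vs) for c, vs in groups.items()}

# Made-up illustrative data.
distance = [100, 200, 300, 400]
air_time = [20, 40, 60, 80]
origin = ["IAH", "HOU", "IAH", "HOU"]
delayed = ["yes", "no", "yes", "yes"]

r = pearson(distance, air_time)          # 1. numerical and numerical
table = crosstab(origin, delayed)        # 2. categorical and categorical
means = group_means(origin, air_time)    # 3. numerical and categorical

print(r)
print(dict(table))
print(means)
```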
  • Factor variables
    are categorical variables that can be either numeric or string variables