Topic 6.3.1

Cards (7)

  • Data wrangling

    Also known as data munging or data preparation; the process of refining, organizing, and enhancing raw data so that it is better suited to analysis
  • Essential steps in data wrangling
    • Structuring
    • Cleaning
    • Enriching
    • Validating
    • Generating output
  • Structuring
    1. Arrange the raw data into a format suitable for analysis
    2. Convert data types
    3. Reorder columns
    4. Address missing values
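
A minimal sketch of the structuring steps above, using only the Python standard library so it runs anywhere (in practice a library such as pandas is often used instead). The sample records, the `COLUMN_ORDER` layout, and the `structure` helper are all hypothetical and chosen for illustration.

```python
# Hypothetical raw records: every value arrives as a string,
# and the columns are in no particular order.
raw_rows = [
    {"price": "19.99", "id": "1", "name": "widget"},
    {"price": "", "id": "2", "name": "gadget"},  # price is missing
]

COLUMN_ORDER = ["id", "name", "price"]  # assumed target layout


def structure(row):
    """Convert types, mark missing values as None, and reorder columns."""
    typed = {
        "id": int(row["id"]),
        "name": row["name"],
        # An empty string is treated as a missing value.
        "price": float(row["price"]) if row["price"] else None,
    }
    # Dicts preserve insertion order, so this fixes the column order.
    return {col: typed[col] for col in COLUMN_ORDER}


structured = [structure(r) for r in raw_rows]
print(structured)
```

Marking a missing value as `None` rather than guessing a value keeps this step purely structural; deciding how to fill it can be left to cleaning.
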
  • Cleaning
    1. Identify and address errors, inconsistencies, or outliers within the dataset
    2. Remove duplicate records
    3. Rectify typos
    4. Manage missing or incomplete information
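
The cleaning steps above might look roughly like this in standard-library Python. The sample rows, the `TYPO_FIXES` correction table, the choice of `id` as the duplicate key, and filling missing ages with the median are all assumptions made for the sketch, not the only reasonable choices.

```python
from statistics import median

# Hypothetical records with a duplicate, a typo, and a missing value.
rows = [
    {"id": 1, "city": "Lodnon", "age": 34},
    {"id": 1, "city": "Lodnon", "age": 34},   # duplicate record
    {"id": 2, "city": "Paris", "age": None},  # missing age
    {"id": 3, "city": "paris ", "age": 41},   # inconsistent case and whitespace
]

TYPO_FIXES = {"lodnon": "London"}  # hypothetical correction table


def clean(rows):
    seen_ids, cleaned = set(), []
    for row in rows:
        # Remove duplicates, assuming "id" uniquely identifies a record.
        if row["id"] in seen_ids:
            continue
        seen_ids.add(row["id"])
        # Normalize whitespace and case, then apply known typo corrections.
        city = row["city"].strip().lower()
        city = TYPO_FIXES.get(city, city.title())
        cleaned.append({**row, "city": city})
    # Fill missing ages with the median of the known ages (one possible policy).
    fill = median(r["age"] for r in cleaned if r["age"] is not None)
    for r in cleaned:
        if r["age"] is None:
            r["age"] = fill
    return cleaned


cleaned = clean(rows)
print(cleaned)
```
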
  • Enriching
    1. Enhance the dataset by adding relevant information from other sources
    2. Merge datasets
    3. Extract additional features
    4. Incorporate external data to provide more context and depth to the analysis
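
As a rough illustration of enriching, the sketch below merges a hypothetical external lookup table into a set of orders and derives one additional feature. The field names, the `countries` table, and the `enrich` helper are invented for the example.

```python
# Hypothetical primary dataset.
orders = [
    {"order_id": 1, "country_code": "FR", "quantity": 2, "unit_price": 10.0},
    {"order_id": 2, "country_code": "JP", "quantity": 1, "unit_price": 25.0},
    {"order_id": 3, "country_code": "XX", "quantity": 4, "unit_price": 5.0},
]

# Hypothetical external source mapping country codes to regions.
countries = {"FR": "Europe", "JP": "Asia"}


def enrich(order):
    enriched = dict(order)  # copy so the original record is untouched
    # Merge: behaves like a left join, leaving None where no match exists.
    enriched["region"] = countries.get(order["country_code"])
    # Extract an additional feature from existing columns.
    enriched["total"] = order["quantity"] * order["unit_price"]
    return enriched


enriched_orders = [enrich(o) for o in orders]
print(enriched_orders)
```

Keeping unmatched rows with a `None` region, instead of dropping them, makes the gaps visible to a later validation pass.
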
  • Validating
    1. Ensure that the data adheres to specific rules, standards, or expectations
    2. Identify any remaining errors or inconsistencies
    3. Check for outliers
    4. Verify data integrity
    5. Ensure that values fall within expected ranges
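
One way to sketch the validation checks above is a function that collects rule violations instead of stopping at the first one. The rules here (ages between 0 and 120, unique `id`, an `@` in the email) and the sample rows are assumptions for illustration; real rules depend on the dataset.

```python
# Hypothetical records, two of which break the assumed rules.
rows = [
    {"id": 1, "age": 34, "email": "a@example.com"},
    {"id": 2, "age": -5, "email": "b@example.com"},
    {"id": 2, "age": 51, "email": "not-an-email"},
]


def validate(rows, min_age=0, max_age=120):
    """Return a list of (row_index, problem) tuples; empty means all checks passed."""
    problems = []
    seen_ids = set()
    for i, row in enumerate(rows):
        # Integrity: ids are expected to be unique.
        if row["id"] in seen_ids:
            problems.append((i, "duplicate id"))
        seen_ids.add(row["id"])
        # Range: values must fall within expected bounds.
        if not (min_age <= row["age"] <= max_age):
            problems.append((i, "age out of range"))
        # Format: a deliberately crude email check.
        if "@" not in row["email"]:
            problems.append((i, "malformed email"))
    return problems


problems = validate(rows)
print(problems)
```
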
  • Generating output
    1. Prepare the data for analysis or presentation
    2. Format the data into a specific structure
    3. Export it to a particular file format
    4. Integrate it into a database for further use
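
The output steps above could be sketched as follows, exporting to CSV and loading into a database with the standard `csv` and `sqlite3` modules. The sample rows and the `products` table are hypothetical, and an in-memory buffer and in-memory database stand in for a real file and a real database.

```python
import csv
import io
import sqlite3

# Hypothetical wrangled data ready for output.
rows = [
    {"id": 1, "name": "widget", "price": 19.99},
    {"id": 2, "name": "gadget", "price": 4.5},
]
fields = ["id", "name", "price"]

# Export to CSV. A StringIO buffer is used here; a real run would
# typically use open("some_file.csv", "w", newline="") instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)

# Integrate into a database (in-memory SQLite for the sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.executemany("INSERT INTO products VALUES (:id, :name, :price)", rows)
conn.commit()
row_count = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
print(row_count)
```
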