PANDAS

Cards (34)

  • Pandas
    Python programming library used for working with data sets
  • Pandas
    • Has functions for analyzing, cleaning, exploring, and manipulating data
  • Panel Data
    Reference to the name "Pandas"
  • Python Data Analysis
    Reference to the name "Pandas"
  • Pandas was created by Wes McKinney in
    2008
  • Pandas
    • Allows us to analyze big data and make conclusions based on statistical theories
    • Can clean messy data sets, and make them readable and relevant
  • Relevant data is very important in data science
  • Pandas
    • Gives you answers about the data like: Is there a correlation between two or more columns? What is average value? Max value? Min value?
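
A minimal sketch of how those questions might be answered in code. The workout data and the column names Duration and Calories are invented for illustration.

```python
import pandas as pd

# Hypothetical workout data, made up for this example
df = pd.DataFrame({
    "Duration": [30, 45, 60, 90],
    "Calories": [200, 300, 400, 600],
})

correlation = df.corr()            # pairwise correlation between columns
average = df["Calories"].mean()    # average value
highest = df["Calories"].max()     # max value
lowest = df["Calories"].min()      # min value

print(correlation)
print(average, highest, lowest)
```
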
  • Pip
    The most popular package manager for Python, first released in 2008. It is the standard tool for installing Python packages and their dependencies
  • Alias
    Pandas is usually imported under the pd alias
  • Series
    A Pandas Series is like a column in a table, a one-dimensional array holding data of any type
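
A short sketch of creating a Series from a list, using the conventional pd alias. The values are arbitrary sample data.

```python
import pandas as pd  # pd is the usual alias for pandas

a = [1, 7, 2]            # arbitrary sample values
myvar = pd.Series(a)     # one-dimensional, like a single table column

print(myvar)
```
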
  • Labels
    If nothing else is specified, the values are labeled with their index number. This label can be used to access a specified value
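
As an illustration, assuming a Series built from a plain list, the default labels are 0, 1, 2 and can be used to read a value:

```python
import pandas as pd

s = pd.Series([1, 7, 2])   # no labels given, so 0, 1, 2 are used

first = s[0]               # access the value labeled 0
print(first)
```
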
  • Key/Value Objects as Series
    You can also use a key/value object, like a dictionary, when creating a Series
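
A small sketch with a made-up dictionary: the keys become the labels of the Series, and the index argument can select only some of them.

```python
import pandas as pd

# Hypothetical data: calories burned per day
calories = {"day1": 420, "day2": 380, "day3": 390}

s = pd.Series(calories)                              # keys become labels
subset = pd.Series(calories, index=["day1", "day2"]) # keep only these keys

print(s["day2"])
print(subset)
```
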
  • DataFrames
    Data sets in Pandas are usually two-dimensional tables called DataFrames. A Series is like a column; a DataFrame is the whole table
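
To make the column/table relationship concrete, here is a sketch with invented column names. Selecting one column of the DataFrame gives back a Series.

```python
import pandas as pd

# Hypothetical data set with two columns
data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
}

df = pd.DataFrame(data)      # the whole table
column = df["calories"]      # a single column is a Series

print(df)
print(type(column))
```
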
  • Locate Row
    Pandas uses the loc attribute to return one or more specified row(s)
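
A sketch of loc on a small, invented DataFrame. A single label returns the row as a Series, while a list of labels returns a DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
})

row = df.loc[0]          # one row, returned as a Series
rows = df.loc[[0, 1]]    # several rows, returned as a DataFrame

print(row)
print(rows)
```
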
  • Named Indexes
    With the index argument, you can name your own indexes
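
For example, assuming the labels day1 to day3 (chosen only for illustration), the index argument names the rows, and loc can then look a row up by name:

```python
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
}

# The row labels are hypothetical names picked for this sketch
df = pd.DataFrame(data, index=["day1", "day2", "day3"])

day2 = df.loc["day2"]    # look up a row by its named index
print(day2)
```
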
  • Load Files Into a DataFrame
    If your data sets are stored in a file, Pandas can load them into a DataFrame
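
A sketch of loading CSV data with pd.read_csv(). To keep the example self-contained, an in-memory text buffer with made-up contents stands in for a file on disk; with a real file you would pass its path instead.

```python
import io
import pandas as pd

# Made-up CSV content standing in for a real file such as "data.csv"
csv_text = "Duration,Pulse,Calories\n60,110,409.1\n45,117,282.4\n"

df = pd.read_csv(io.StringIO(csv_text))   # with a real file: pd.read_csv(path)

print(df)
```
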
  • tail()
    The tail() method returns a specified number of last rows. The tail() method returns the last 5 rows if a number is not specified
  • head()
    The head() method returns a specified number of first rows. The head() method returns the first 5 rows if a number is not specified
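
A quick sketch of head() and tail() on a ten-row DataFrame invented for the purpose:

```python
import pandas as pd

df = pd.DataFrame({"n": range(1, 11)})   # rows holding 1 through 10

first_three = df.head(3)    # first 3 rows
first_default = df.head()   # first 5 rows when no number is given
last_default = df.tail()    # last 5 rows when no number is given

print(first_three)
print(last_default)
```
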
  • max_rows
    The number of rows displayed when a DataFrame is printed is defined in the Pandas option settings. You can check your system's maximum rows with the pd.options.display.max_rows statement
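
A sketch of reading the setting and changing it temporarily. The value 10 is arbitrary, and pd.option_context is used here only so the change does not persist.

```python
import pandas as pd

current = pd.options.display.max_rows    # the current display limit
print(current)

# Temporarily change the limit; the old value is restored afterwards
with pd.option_context("display.max_rows", 10):
    inside = pd.options.display.max_rows

after = pd.options.display.max_rows
```
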
  • percentiles
    The describe() method returns description of the data in the DataFrame, including count, mean, std, min, 25%, 50%, 75%, max
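
As an illustration on a single invented numeric column, describe() produces one row per statistic listed above:

```python
import pandas as pd

df = pd.DataFrame({"Calories": [200, 300, 400, 600]})   # made-up values

summary = df.describe()   # count, mean, std, min, 25%, 50%, 75%, max

print(summary)
```
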
  • Null Values
    The info() method tells us how many non-null values are present in each column
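
A sketch with one deliberately empty cell. info() normally prints straight to the screen; here its report is captured in a text buffer (an optional step) so it can also be inspected as a string.

```python
import io
import pandas as pd

# Hypothetical data with one missing Calories value
df = pd.DataFrame({
    "Duration": [60, 45, 30],
    "Calories": [409.1, None, 195.0],
})

buffer = io.StringIO()
df.info(buf=buffer)          # df.info() alone would print the same report
report = buffer.getvalue()

print(report)
```
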
  • Data Cleaning
    • Fixing bad data in your data set, such as empty cells, data in wrong format, wrong data, duplicates
  • Remove Rows
    One way to deal with empty cells is to remove rows that contain empty cells
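
One way this might look, using dropna() on invented data. By default dropna() returns a new DataFrame and leaves the original unchanged.

```python
import pandas as pd

# Hypothetical data with one empty cell in the second row
df = pd.DataFrame({
    "Duration": [60, 45, 30],
    "Calories": [409.1, None, 195.0],
})

new_df = df.dropna()   # drop every row that contains an empty cell

print(new_df)
```
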
  • fillna()
    The fillna() method allows us to replace empty cells with a value
  • Replace Empty Values
    Replace empty cells with a new value instead of deleting entire rows
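
A sketch of fillna() across a whole DataFrame. The data and the replacement value 130 are arbitrary choices for illustration.

```python
import pandas as pd

# Hypothetical data with empty cells in both columns
df = pd.DataFrame({
    "Pulse": [110, None, 103],
    "Calories": [409.1, 282.4, None],
})

filled = df.fillna(130)   # every empty cell becomes 130, no rows are lost

print(filled)
```
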
  • Replace Only For Specified Columns
    To replace empty values in only one column, specify the column name for the DataFrame
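
A sketch of filling a single column, again with invented data and an arbitrary value of 130. The other column keeps its empty cell.

```python
import pandas as pd

df = pd.DataFrame({
    "Pulse": [110, None, 103],
    "Calories": [409.1, 282.4, None],
})

# Fill empty cells in the Calories column only
df["Calories"] = df["Calories"].fillna(130)

print(df)
```
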
  • Replace Using Mean, Median, or Mode
    Replace empty cells with the mean, median or mode value of the column
  • Mean
    The average value (the sum of all values divided by number of values)
  • Median
    The value in the middle, after you have sorted all values ascending
  • Mode
    The value that appears most frequently
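
A worked sketch of the three statistics on made-up values, followed by filling an empty cell with the mean. Pandas skips empty cells when computing them, and mode() returns a Series because there can be ties, so its first entry is taken here.

```python
import pandas as pd

calories = pd.Series([100, 120, 120, 160, None])   # one empty cell

mean_value = calories.mean()        # (100 + 120 + 120 + 160) / 4
median_value = calories.median()    # middle of 100, 120, 120, 160
mode_value = calories.mode()[0]     # most frequent value

filled = calories.fillna(mean_value)   # replace the empty cell with the mean

print(mean_value, median_value, mode_value)
print(filled)
```
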
  • Convert Into a Correct Format
    Fix cells with data of wrong format by converting all cells in the columns into the same format
  • coerce
    With errors='coerce' in a conversion function, invalid parsing will be set as NaN
  • ignore
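    With errors='ignore' in a conversion function, invalid parsing will return the input unchanged (this option is deprecated in recent Pandas versions)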
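
A sketch of the coerce option, assuming these cards refer to the errors argument of converters such as pd.to_numeric() and pd.to_datetime(). The sample strings are invented. For dates, the missing-value marker is NaT rather than NaN.

```python
import pandas as pd

# Hypothetical columns containing one invalid entry each
raw_numbers = pd.Series(["1", "2.5", "abc"])
raw_dates = pd.Series(["2020-12-01", "not a date"])

numbers = pd.to_numeric(raw_numbers, errors="coerce")   # "abc" becomes NaN
dates = pd.to_datetime(raw_dates, errors="coerce")      # invalid date becomes NaT

print(numbers)
print(dates)
```

Rows that could not be converted can then be removed or filled like any other empty cell.
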