Python programming library used for working with data sets
Pandas
Has functions for analyzing, cleaning, exploring, and manipulating data
Panel Data
One of the two terms the name "Pandas" refers to
Python Data Analysis
The other term the name "Pandas" refers to
Pandas was created by Wes McKinney
2008
Pandas
Allows us to analyze big data and draw conclusions based on statistical theories
Can clean messy data sets and make them readable and relevant
Relevant data is very important in data science
Pandas
Gives you answers about the data, such as: Is there a correlation between two or more columns? What is the average value? The maximum value? The minimum value?
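A minimal sketch of how those questions might be asked in code. The column names and numbers are made up for illustration and do not come from any real data set.

```python
import pandas as pd

# Hypothetical workout data, invented for this example.
df = pd.DataFrame({
    "duration": [30, 45, 60],
    "calories": [200, 300, 400],
})

# Correlation between two columns (Pearson by default).
correlation = df["duration"].corr(df["calories"])

# Average, maximum, and minimum of one column.
average = df["calories"].mean()
highest = df["calories"].max()
lowest = df["calories"].min()

print(correlation, average, highest, lowest)
```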
Pip
The most popular package manager for Python, first released in 2008. It is the standard tool for installing Python packages and their dependencies
Alias
Pandas is usually imported under the pd alias
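A short sketch of the alias in use, assuming Pandas is already installed. The variable name is illustrative only.

```python
import pandas as pd  # "pd" is the conventional alias for the package

# After the import, the alias stands in for the full package name.
version = pd.__version__
print(version)
```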
Series
A Pandas Series is like a column in a table: a one-dimensional array holding data of any type
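As a rough illustration, a Series can be built from a plain Python list. The values here are invented.

```python
import pandas as pd

# Hypothetical values, used only for illustration.
calories = pd.Series([420, 380, 390])

print(calories)       # values shown next to their labels
print(calories.ndim)  # a Series has exactly one dimension
```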
Labels
If nothing else is specified, the values are labeled with their index number, starting at 0. This label can be used to access a specified value
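A small sketch of label-based access with the default labels. The values are made up.

```python
import pandas as pd

# Hypothetical values; no labels are given, so Pandas uses 0, 1, 2.
values = pd.Series([1, 7, 2])

first = values[0]   # the value stored under label 0
second = values[1]  # the value stored under label 1

print(first, second)
```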
Key/Value Objects as Series
You can also use a key/value object, like a dictionary, when creating a Series
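For example, assuming a small dictionary of made-up values, the keys become the labels of the Series:

```python
import pandas as pd

# Hypothetical data: the dictionary keys become the Series labels.
calories = {"day1": 420, "day2": 380, "day3": 390}
series = pd.Series(calories)

print(series)
print(series["day2"])  # look up a value by its key
```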
DataFrames
Data sets in Pandas are usually multi-dimensional tables called DataFrames. A Series is like a column; a DataFrame is the whole table
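A minimal sketch of the relationship, using invented column names and values: each column of a DataFrame is itself a Series.

```python
import pandas as pd

# Hypothetical data set with two columns.
data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
}
df = pd.DataFrame(data)

print(df)                    # the whole table
print(type(df["calories"]))  # a single column is a Series
```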
Locate Row
Pandas uses the loc attribute to return one or more specified row(s)
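A brief sketch of loc on a made-up DataFrame. A single label returns one row as a Series, while a list of labels returns a DataFrame.

```python
import pandas as pd

# Hypothetical data set.
df = pd.DataFrame({
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
})

row = df.loc[0]        # one row, returned as a Series
rows = df.loc[[0, 1]]  # several rows, returned as a DataFrame

print(row)
print(rows)
```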
Named Indexes
With the index argument, you can name your own indexes
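A possible illustration, with invented labels and values. Once rows are named, loc accepts those names.

```python
import pandas as pd

# Hypothetical data set with custom row labels.
data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
}
df = pd.DataFrame(data, index=["day1", "day2", "day3"])

day2 = df.loc["day2"]  # look up a row by its named index
print(day2)
```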
Load Files Into a DataFrame
If your data sets are stored in a file, Pandas can load them into a DataFrame
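A sketch of loading CSV data with read_csv(). To keep the example self-contained it reads from an in-memory string instead of a real file; with a file on disk you would pass its path instead. The column names and values are made up.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = "Duration,Pulse,Calories\n60,110,409.1\n45,117,282.4\n"

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df)
```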
tail()
The tail() method returns a specified number of rows, counted from the end. If no number is specified, it returns the last 5 rows
head()
The head() method returns a specified number of rows, counted from the top. If no number is specified, it returns the first 5 rows
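A quick sketch of head() and tail() on a made-up 20-row DataFrame:

```python
import pandas as pd

# Hypothetical data set with 20 rows numbered 0 to 19.
df = pd.DataFrame({"n": range(20)})

first_three = df.head(3)  # first 3 rows
first_five = df.head()    # no argument: first 5 rows
last_five = df.tail()     # no argument: last 5 rows

print(first_three)
print(last_five)
```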
max_rows
The number of rows shown when a DataFrame is printed is defined in the Pandas option settings. You can check your system's maximum with pd.options.display.max_rows
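A sketch of reading the setting and changing it temporarily. The use of pd.option_context() and the value 10 are additions for illustration and are not part of the original card.

```python
import pandas as pd

# Read the current display limit.
current_limit = pd.options.display.max_rows
print(current_limit)

# Change the limit only inside this block (10 is an arbitrary example value).
with pd.option_context("display.max_rows", 10):
    limit_inside = pd.options.display.max_rows

print(limit_inside)
```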
percentiles
The describe() method returns a description of the data in the DataFrame, including count, mean, std, min, the 25%, 50%, and 75% percentiles, and max
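A minimal example on an invented single-column DataFrame, showing where the percentile rows appear in the result:

```python
import pandas as pd

# Hypothetical pulse readings.
df = pd.DataFrame({"pulse": [100, 110, 120, 130]})

summary = df.describe()
print(summary)

# Each statistic is a row label in the result.
print(summary.loc["50%", "pulse"])
```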
Null Values
The info() method tells us how many non-null values are present in each column
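A sketch with made-up data containing empty cells. info() prints its report, so the example also uses count(), which is not mentioned in the original, to get the same non-null counts as values.

```python
import pandas as pd

# Hypothetical data set; None marks an empty cell.
df = pd.DataFrame({
    "pulse": [100, None, 120],
    "calories": [300.0, 250.0, None],
})

df.info()  # prints the non-null count and dtype of each column

# count() returns the non-null counts as a Series.
non_null = df.count()
print(non_null)
```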
Data Cleaning
Fixing bad data in your data set, such as empty cells, data in the wrong format, wrong data, and duplicates
Remove Rows
One way to deal with empty cells is to remove rows that contain empty cells
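The original does not name a method for this. A common choice in Pandas is dropna(), sketched here on invented data. By default it returns a new DataFrame and leaves the original unchanged.

```python
import pandas as pd

# Hypothetical data set; None marks an empty cell.
df = pd.DataFrame({
    "pulse": [100, None, 120],
    "calories": [300.0, 250.0, 310.0],
})

# Drop every row that contains at least one empty cell.
cleaned = df.dropna()

print(cleaned)
```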
fillna()
The fillna() method allows us to replace empty cells with a value
Replace Empty Values
Replace empty cells with a new value instead of deleting entire rows
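A minimal sketch of fillna() on invented data. The replacement value 130 is arbitrary.

```python
import pandas as pd

# Hypothetical data set; None marks an empty cell.
df = pd.DataFrame({
    "pulse": [100, None, 120],
    "calories": [300.0, None, 310.0],
})

# Replace every empty cell with 130 (an arbitrary example value).
filled = df.fillna(130)

print(filled)
```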
Replace Only For Specified Columns
To replace empty values in only one column, specify the column name for the DataFrame
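One way to do this, sketched on invented data, is to pass fillna() a dictionary that maps a column name to its replacement value. Columns that are not listed are left untouched.

```python
import pandas as pd

# Hypothetical data set; None marks an empty cell.
df = pd.DataFrame({
    "pulse": [100, None, 120],
    "calories": [300.0, None, 310.0],
})

# Fill only the "calories" column; 130 is an arbitrary example value.
filled = df.fillna({"calories": 130})

print(filled)
```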
Replace Using Mean, Median, or Mode
Replace empty cells with the mean, median or mode value of the column
Mean
The average value (the sum of all values divided by number of values)
Median
The value in the middle, after you have sorted all values in ascending order
Mode
The value that appears most frequently
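The three statistics above can be sketched on a small invented column, followed by one way of using the mean to fill the empty cell. Note that mode() returns a Series, because several values can tie for most frequent.

```python
import pandas as pd

# Hypothetical column; None marks an empty cell.
df = pd.DataFrame({"calories": [300.0, 300.0, None, 360.0, 400.0]})

# Empty cells are skipped when computing each statistic.
mean_value = df["calories"].mean()      # (300 + 300 + 360 + 400) / 4
median_value = df["calories"].median()  # middle of 300, 300, 360, 400
mode_value = df["calories"].mode()[0]   # first of the most frequent values

# Fill the empty cell with the mean (median or mode work the same way).
filled = df["calories"].fillna(mean_value)

print(mean_value, median_value, mode_value)
print(filled)
```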
Convert Into a Correct Format
Fix cells with data in the wrong format by converting all cells in the column into the same format
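As one possible illustration, a column of date strings can be converted with pd.to_datetime(). The column name and dates are invented, and the original does not say which conversion it has in mind.

```python
import pandas as pd

# Hypothetical column holding dates as plain strings.
df = pd.DataFrame({"date": ["2020/12/01", "2020/12/02", "2020/12/03"]})

# Convert the whole column to a proper datetime type.
df["date"] = pd.to_datetime(df["date"])

print(df.dtypes)
print(df)
```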