The describe() method can compute several summary measures for all numerical data in a dataframe
The groupby() method allows you to group data together according to a particular variable
For inferential statistical tests, we'll use the SciPy library
Pearson's correlation is used to evaluate the linear relationship between two sets of continuous values
The r value from Pearson's correlation ranges from -1 to 1 and indicates the direction and strength of the correlation
The p value from Pearson's correlation is the probability of observing a correlation at least this strong if there were no true linear relationship; p < 0.05 is the conventional threshold for calling the result statistically significant
Independent samples t-test is used to compare two unrelated datasets
Paired t-test is used to compare two related datasets from the same group
The t value from a t-test indicates the size of the difference between the two datasets relative to the variability in the data; the further t is from 0, the larger the difference
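The two t-tests above can be sketched as follows. This is a minimal illustration, not a full analysis: the variable names and all of the numbers are made up for the example, and stats.ttest_ind() and stats.ttest_rel() are called with their default settings.

```python
from scipy import stats

# Hypothetical scores from two unrelated groups of participants
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3]
group_b = [6.2, 6.8, 6.1, 7.0, 6.5, 6.9]

# Independent samples t-test: compares two unrelated datasets
t_ind, p_ind = stats.ttest_ind(group_a, group_b)
print(f"independent: t = {t_ind:.2f}, p = {p_ind:.4f}")

# Hypothetical measurements from the same six participants at two times
before = [72, 75, 78, 71, 80, 77]
after = [70, 72, 75, 70, 76, 74]

# Paired t-test: compares two related datasets from the same group
t_rel, p_rel = stats.ttest_rel(before, after)
print(f"paired: t = {t_rel:.2f}, p = {p_rel:.4f}")
```

The sign of t reflects the order of the arguments: it is negative when the first dataset has the lower mean and positive when it has the higher mean.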
The value_counts() method is a summary method that shows the number of observations for each unique value in a specific column.
Syntax: movies_df['fandango'].value_counts()
Quantile method syntax: movies_df['fandango'].quantile([.25, .75]). This returns the 25th and 75th percentiles for the ratings in the fandango column.
Describe method: a single command that computes descriptive statistics for all numerical columns. Syntax: movies_df.describe()
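The three summary methods above can be tried together on a small dataframe. The data below is invented for illustration (the real movies_df is not shown in these notes), and the film column is a hypothetical addition to show that describe() skips non-numerical columns by default.

```python
import pandas as pd

# Hypothetical stand-in for movies_df; the values are made up
movies_df = pd.DataFrame({
    "film": ["A", "B", "C", "D", "E"],
    "fandango": [4.5, 3.0, 4.5, 5.0, 3.0],
})

# Number of observations for each unique rating
counts = movies_df["fandango"].value_counts()
print(counts)

# 25th and 75th percentiles of the ratings
quartiles = movies_df["fandango"].quantile([.25, .75])
print(quartiles)

# Descriptive statistics for the numerical columns only
summary = movies_df.describe()
print(summary)
```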
Use the groupby() method to group the data according to the drug column, followed by the mean() method to average the values within each group. This will average across the different time periods. Syntax: drugs_df.groupby('drug').mean(numeric_only = True)
Use the groupby() method to group the data according to the time_period column, once again using the mean() method. Syntax: drugs_df.groupby('time_period').mean(numeric_only = True)
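As a rough sketch of both groupings, assume a small drugs_df like the one below. Only the drug and time_period column names come from these notes; the score column and every value are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for drugs_df; the values are made up
drugs_df = pd.DataFrame({
    "drug": ["A", "A", "B", "B"],
    "time_period": ["before", "after", "before", "after"],
    "score": [10.0, 14.0, 20.0, 22.0],
})

# One row per drug, averaged across the time periods
by_drug = drugs_df.groupby("drug").mean(numeric_only=True)
print(by_drug)

# One row per time period, averaged across the drugs
by_period = drugs_df.groupby("time_period").mean(numeric_only=True)
print(by_period)
```

Passing numeric_only=True keeps the text columns out of the averaging step, which would otherwise fail on them in recent pandas versions.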
Syntax for standard deviation: df_name['col_name'].std()
Number of unique values for a given column: df_name['col_name'].nunique()
Number of observations for a given column: df_name['col_name'].count()
To group by more than one column, provide a list: df_name.groupby(['col_1', 'col_2']).method_name(numeric_only = True)
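The four patterns above might look like this in practice. The dataframe, its column names (site, group, score), and its values are all hypothetical; one score is deliberately missing to show that these methods skip missing values by default.

```python
import pandas as pd

# Hypothetical data; one score is missing on purpose
df = pd.DataFrame({
    "site": ["north", "north", "north", "south", "south", "south"],
    "group": ["ctrl", "ctrl", "trt", "ctrl", "trt", "trt"],
    "score": [2.0, 4.0, 6.0, 8.0, None, 10.0],
})

spread = df["score"].std()        # sample standard deviation, missing value skipped
n_unique = df["score"].nunique()  # number of distinct non-missing values
n_obs = df["score"].count()       # number of non-missing observations
print(spread, n_unique, n_obs)

# Group by two columns at once by passing a list
grouped = df.groupby(["site", "group"]).mean(numeric_only=True)
print(grouped)
```

Because of the missing value, count() reports one fewer observation than the number of rows in the dataframe.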
Syntax to import stats module from SciPy: from scipy import stats
To use a function within stats, you do not need to include the word scipy. You only need to write stats.func_name()
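For instance, Pearson's correlation can be run through the imported stats module as sketched below. The two lists and their names are invented for illustration; stats.pearsonr() returns the r value and the p value, which can be unpacked into two variables.

```python
from scipy import stats

# Hypothetical paired measurements; the values are made up
hours_studied = [1, 2, 3, 4, 5, 6]
exam_score = [52, 55, 61, 64, 70, 71]

# Called as stats.pearsonr(), not scipy.stats.pearsonr()
r, p = stats.pearsonr(hours_studied, exam_score)
print(f"r = {r:.3f}, p = {p:.4f}")
```

Here r is positive because the scores rise with the hours studied; a falling trend would give a negative r.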