ENGR 120 Quiz 7 (9a Statistics)

Created by

Thistle Dristle

Subdecks (3)

Cards (53)

The describe() method can compute several summary measures for all numerical data in a dataframe
The groupby() method allows you to group data together according to a particular variable
For inferential statistical tests, we'll use the SciPy library
Pearson's correlation is used to evaluate the linear relationship between two sets of continuous values
The r value from Pearson's correlation ranges from -1 to 1 and indicates the direction and strength of the correlation
The p value from Pearson's correlation indicates whether the result is statistically significant, with p < 0.05 being the standard threshold
Independent samples t-test is used to compare two unrelated datasets
Paired t-test is used to compare two related datasets from the same group
The t value from a t-test indicates the size of the difference between the two datasets relative to the variation within them
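A minimal sketch of the two t-tests described above. The cards do not name the SciPy functions; stats.ttest_ind and stats.ttest_rel are the ones assumed here, and all of the numbers are made up for illustration.

```python
from scipy import stats

# Hypothetical measurements from two unrelated groups.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0]
group_b = [6.9, 7.2, 6.5, 7.8, 7.1]

# Independent samples t-test: compares two unrelated datasets.
t_ind, p_ind = stats.ttest_ind(group_a, group_b)

# Hypothetical before/after measurements taken on the same five subjects.
before = [72, 75, 68, 80, 77]
after = [70, 71, 66, 75, 74]

# Paired t-test: compares two related datasets from the same group.
t_rel, p_rel = stats.ttest_rel(before, after)

print("independent:", t_ind, p_ind)
print("paired:", t_rel, p_rel)
```

The sign of t follows the order of the arguments: it is negative when the first dataset has the smaller mean.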
Summary method that shows the number of observations for each unique value in a specific column.
Syntax: movies_df['fandango'].value_counts()
Quantile method: returns the 25th and 75th percentiles of the ratings in the fandango column. Syntax: movies_df['fandango'].quantile([.25, .75])
Describe method: a single command to compute descriptive statistics for numerical data. Syntax: movies_df.describe()
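The three summary methods above can be tried together on a small dataframe. This is only a sketch: the film names and ratings are invented stand-ins for the real movies_df.

```python
import pandas as pd

# Hypothetical data standing in for the deck's movies_df.
movies_df = pd.DataFrame({
    "film": ["A", "B", "C", "D", "E"],
    "fandango": [3.0, 3.5, 4.0, 4.5, 4.5],
})

# Number of observations for each unique rating.
rating_counts = movies_df["fandango"].value_counts()

# 25th and 75th percentiles of the ratings.
quartiles = movies_df["fandango"].quantile([0.25, 0.75])

# Count, mean, std, min, quartiles, and max for the numeric columns only.
summary = movies_df.describe()

print(rating_counts)
print(quartiles)
print(summary)
```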
Use the groupby() method to group the data according to the drug column, then the mean() method to average within each group. This will average across the different time periods. Syntax: drugs_df.groupby('drug').mean(numeric_only = True)
Use the groupby() method to group the data according to the time_period column, once again using the mean() method. Syntax: drugs_df.groupby('time_period').mean(numeric_only = True)
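A short sketch of both groupings. The drug and time_period column names come from the cards; the score column and every value in it are assumptions made for the example.

```python
import pandas as pd

# Hypothetical data standing in for drugs_df; "score" is an assumed column.
drugs_df = pd.DataFrame({
    "drug": ["A", "A", "B", "B"],
    "time_period": ["before", "after", "before", "after"],
    "score": [10.0, 14.0, 20.0, 30.0],
})

# One row per drug, averaged across the time periods.
by_drug = drugs_df.groupby("drug").mean(numeric_only=True)

# One row per time period, averaged across the drugs.
by_period = drugs_df.groupby("time_period").mean(numeric_only=True)

print(by_drug)
print(by_period)
```

Passing numeric_only=True keeps mean() from trying to average the text columns.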
Syntax for standard deviation: df_name['col_name'].std()
Number of unique values for a given column: df_name['col_name'].nunique()
Number of observations for a given column: df_name['col_name'].count()
To group by more than one column, provide a list: df_name.groupby(['col_1', 'col_2']).method_name(numeric_only = True)
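The four commands above, sketched on an invented dataframe. The site, shift, and output column names are hypothetical, and one value is left missing on purpose to show that count() only counts non-missing observations.

```python
import pandas as pd

# Hypothetical dataframe; column names and values are made up.
df = pd.DataFrame({
    "site": ["north", "north", "south", "south", "south"],
    "shift": ["day", "night", "day", "day", "night"],
    "output": [2.0, 4.0, 6.0, 8.0, None],
})

spread = df["output"].std()        # sample standard deviation
n_sites = df["site"].nunique()     # number of distinct sites
n_obs = df["output"].count()       # non-missing observations only

# Grouping by two columns gives one row per (site, shift) pair.
by_site_shift = df.groupby(["site", "shift"]).mean(numeric_only=True)

print(spread, n_sites, n_obs)
print(by_site_shift)
```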
Syntax to import stats module from SciPy: from scipy import stats
To use a function within stats, you do not need to include the word scipy. You only need to write stats.func_name()
Pearson correlation syntax: stats.pearsonr(dataframe['column1'], dataframe['column2'])
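A minimal sketch of the Pearson correlation call, including the import. The two columns hold made-up values that rise together, so r should come out close to 1.

```python
import pandas as pd
from scipy import stats

# Hypothetical data: column2 rises almost linearly with column1.
dataframe = pd.DataFrame({
    "column1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "column2": [2.1, 3.9, 6.2, 8.1, 9.8],
})

# pearsonr returns the r value and the p value, in that order.
r_value, p_value = stats.pearsonr(dataframe["column1"], dataframe["column2"])

print("r =", r_value)
print("p =", p_value)
```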
Read in a file with pandas (after import pandas as pd): dataframe = pd.read_csv(file_path)
Random selection of 5 rows: dataframe.sample(n = 5)
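A sketch of reading CSV data and sampling rows. To keep it runnable without a file on disk, io.StringIO stands in for a real file path; the CSV contents are invented, and random_state is an optional extra that makes the random draw repeatable.

```python
import io
import pandas as pd

# Hypothetical CSV text; io.StringIO stands in for a real file path.
csv_text = "team,score\n" + "\n".join(f"team_{i},{i * 10}" for i in range(8))
dataframe = pd.read_csv(io.StringIO(csv_text))

# Five rows chosen at random, without repeating any row.
random_rows = dataframe.sample(n=5, random_state=0)

print(random_rows)
```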
Teams with more than 300 respondents or fewer than 50 Independents: nfl_df[(nfl_df.tot_respondents > 300) | (nfl_df.independent < 50)]
Percent Democrat: teams for which more than 25% (i.e., 0.25) of respondents are Democrats
nfl_df[(nfl_df.democrat / nfl_df.tot_respondents) > 0.25]
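Both filters can be checked on a tiny stand-in for nfl_df. The column names follow the cards; the team names and counts are invented.

```python
import pandas as pd

# Hypothetical survey counts standing in for nfl_df.
nfl_df = pd.DataFrame({
    "team": ["Team A", "Team B", "Team C", "Team D"],
    "tot_respondents": [400, 200, 250, 100],
    "independent": [120, 40, 90, 60],
    "democrat": [80, 60, 50, 30],
})

# | means "or": a row is kept if either condition is true.
# Each condition needs its own parentheses.
large_or_few_ind = nfl_df[(nfl_df.tot_respondents > 300) | (nfl_df.independent < 50)]

# Share of Democrats per team, compared against 0.25.
mostly_dem = nfl_df[(nfl_df.democrat / nfl_df.tot_respondents) > 0.25]

print(large_or_few_ind)
print(mostly_dem)
```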
Sort the dataframe by Independents, in descending order, such that the change to the dataframe is permanent.
nfl_df.sort_values(by = 'independent', ascending = False, inplace = True)
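A small sketch of the permanent sort, again with invented values. With inplace = True the method changes nfl_df itself and returns None, so the result is not assigned to a new variable.

```python
import pandas as pd

# Hypothetical data standing in for nfl_df.
nfl_df = pd.DataFrame({
    "team": ["Team A", "Team B", "Team C"],
    "independent": [40, 120, 90],
})

# Sort from most to fewest Independents, modifying nfl_df directly.
returned = nfl_df.sort_values(by="independent", ascending=False, inplace=True)

print(nfl_df)
print(returned)
```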


Subdecks (3)

10a incorporating data into chatbots

8c Pandas III

9b Data Visualization
