Final

Cards (30)

  • Random
    Each entity of the population has an equal chance of getting chosen
  • Systematic random
    A random starting point is chosen, then units are selected at fixed intervals
  • Stratified
    Population is divided into groups (strata), and sampling occurs within each stratum
  • Kruskal-Wallis test
    Non-parametric test that compares an ordinal or non-normally distributed variable across more than two groups
  • Outcome variables
    Dependent variables (the plausible results); if the results are plausible, the independent variables are not important
  • Machine Learning
    Subdiscipline of AI; the study of computer algorithms that improve through experience and the use of data
  • Supervised learning
    Builds a mathematical model from data that contains both independent variables (inputs) and dependent variables (known outputs)
  • Unsupervised learning
    Finds structure in data that contains only inputs
  • Decision trees
    Non-parametric supervised learning method used for classification and regression
  • Classification trees
    Decision trees where the target variable can take a discrete set of values
  • Regression trees
    Decision trees where the target variable can take continuous values
  • Ensemble models
    Machine learning algorithms that combine the predictions of many models, here an ensemble of decision trees
  • Partial dependence plot
    Plot of the relationship between the predicted values and one or two explanatory variables, averaged over the remaining variables
  • Factor analysis
    Collapses the columns of a data set into a smaller number of new variables that are linear combinations of the originals
  • Dimensionality reduction
    Process of reducing the number of attributes in the data while keeping as much of the original information as possible
  • Ordination
    Any operation on a data matrix that reduces its dimensionality, e.g. a species composition matrix with the abundance of each species in its own column
  • Cluster analysis
    Collapses data row-wise by grouping observations that are similar to one another
  • Partitioning
    The number of groups is specified in advance, and the algorithm divides the sample into that number of groups
  • Hierarchical
    The method determines the best number of groups itself; groups are made distinct based on the distances between observations
  • Eigen analysis
    Finds uncorrelated linear combinations of the original variables; each new variable (axis) explains a share of the variance in the data
  • Direct gradient
    Uses environmental variables to determine environmental gradients
  • Logistic regression
    Used when the dependent variable is binary
  • Log Pseudo-likelihood
    Measures how well the model fits the data; only meaningful and comparable between models within each species
  • Model Likelihood Ratio Chi-Square test + P-value
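    Tests whether the model fits better than a model with no predictors; the P-value is the probability of obtaining a Chi-square statistic at least this large if the predictors had no effect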
  • Pseudo R-Squared
    Logistic regression does not have an R-squared as in OLS regression; a pseudo R-squared is an approximate substitute and is not the proportion of variance explained
  • Odds ratio regression coefficient
    Exponentiated regression coefficient; the multiplicative change in the odds (ratio of Probability of Presence to Probability of Absence) for a one-unit increase in the predictor
  • Non-Parametric tests
    Distribution-free tests; do not assume a particular underlying distribution (such as the normal)
  • Count data
    Discrete and bounded at zero (no negatives); often overdispersed relative to the Poisson distribution
  • Over dispersion
    Data with variance higher than expected under the assumed model (e.g. variance greater than the mean for a Poisson model)
  • Generalized Linear Models
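    Umbrella framework for a wide range of regressions (e.g. linear, logistic, Poisson), where a link function relates the mean of the response to a linear predictor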
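
Worked example (not one of the cards): the three sampling cards (Random, Systematic random, Stratified) can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library. The function names, the population of 100 plot IDs, and the "low"/"high" strata are assumptions made up for the example.

```python
import random
from collections import defaultdict


def simple_random_sample(population, n, rng):
    """Random: every unit has an equal chance of being chosen."""
    return rng.sample(population, n)


def systematic_sample(population, n, rng):
    """Systematic random: random start, then every k-th unit."""
    k = len(population) // n          # fixed interval (assumes n <= len(population))
    start = rng.randrange(k)          # random starting point in the first interval
    return [population[start + i * k] for i in range(n)]


def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Stratified: split into strata, then sample randomly within each."""
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)
    return {name: rng.sample(units, n_per_stratum)
            for name, units in strata.items()}


rng = random.Random(0)                # seeded so the draw is repeatable
plots = list(range(100))              # hypothetical population: 100 plot IDs

srs = simple_random_sample(plots, 10, rng)
sys_s = systematic_sample(plots, 10, rng)
strat = stratified_sample(plots, lambda p: "low" if p < 50 else "high", 5, rng)
```

In the systematic sample, consecutive units are always the same interval apart (10 here), while the stratified sample guarantees that both strata are represented, which simple random sampling does not.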