statbio

Cards (18)

  • Several genes contribute to the final phenotype of a given trait, with each gene playing a small role in the expression of the trait, allowing for significant variation from person to person
  • Polygenic traits are controlled by more than one gene, producing a spectrum of phenotypes: the upper end shows the trait fully expressed, the lower end shows it not expressed, and the middle of the range is the most probable, with environmental factors also influencing where an individual falls
  • Factors like sampling method, personal bias, limitations in equipment, and the use of estimates can limit our ability to observe and analyze genetic traits
  • Types of biases in experiments:
    • Design Bias: occurs in the initial phase of the experiment and in creating the data collection process
    • Selection Bias: arises during the collection of samples and is due to the non-randomized selection of samples
    • Procedural Bias: related to the methodology of the study and creating the procedure of the experiment
    • Reporting Bias: happens during the reporting phase and involves not including all data and results that were collected
    • Data Collection Bias: occurs while data are gathered or recorded, for example when negative results are left out and only positive results are recorded
  • Types of data:
    • Measurements (Ratio): have a true zero, can be continuous (e.g., mass) or discrete (e.g., number of pills); parametric tests apply when the data are normally distributed
    • Measurements (Interval): no true zero, includes data like temperature in °C and test scores
    • Ranks (Ordinal Data): values are placed in order, like ranking test scores, which discards the size of the gaps between them; analyzed with non-parametric tests
    • Frequencies (Nominal Data): counts of observations in labeled categories, like hair color or names
  • Forming the hypothesis:
    • Biological Null Hypothesis: identify the variables (biological aspect) and assume no relationship between them
    • Statistical Null Hypothesis: restates the biological null in terms of the specific data in the given dataset, again assuming no relationship (or no difference) between the variables
  • Visualization, test statistics, significance, & inference:
    • Visualization: shows the distribution of the data and the relationship between variables using histograms, bar charts, and box-whisker plots
    • Test Statistics: measure the size of an effect or difference relative to the variability in the data, calculated by a statistical test (e.g., Chi-Square)
    • Significance: probability of seeing an effect at least this large by chance if the Null hypothesis is true, judged against a critical value or reported as a p-value
    • Inference: the conclusion drawn about the population from the sample after the experiment
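As a rough illustration of how a test statistic and a critical value fit together, here is a minimal Chi-Square goodness-of-fit sketch in Python. The counts, the 3:1 expected ratio, and the names `chi_square` and `CRITICAL_VALUE` are assumptions made up for this example; the critical value is the standard table entry for 1 degree of freedom at a 0.05 significance level.

```python
# Hypothetical counts: phenotypes in 100 offspring compared with a 3:1 expectation.
observed = [70, 30]
expected = [75, 25]


def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


statistic = chi_square(observed, expected)
degrees_of_freedom = len(observed) - 1

# Critical value for df = 1 at alpha = 0.05, from a standard Chi-Square table.
CRITICAL_VALUE = 3.841

# The Null hypothesis (no difference from the expected ratio) is rejected
# only if the statistic exceeds the critical value.
reject_null = statistic > CRITICAL_VALUE

print(f"chi-square = {statistic:.3f}, df = {degrees_of_freedom}, reject null: {reject_null}")
```

With these made-up counts the statistic stays below the critical value, so the Null hypothesis is not rejected.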
  • Confounding variables:
    • Affect both the dependent and independent variables, may introduce errors and biases
    • A confounder is correlated with the independent variable (x-axis) and also influences the dependent variable (y-axis), so it can create or hide an apparent relationship
  • Types of errors:
    • Type 1 Error: when the Null hypothesis is wrongly rejected although it is true (a false positive)
    • Type 2 Error: when the Null hypothesis is not rejected although it is false (a false negative)
  • Dealing with the problem of variability:
    • Variability is inherent among organisms and biases and errors can be encountered during observation
    • Steps to obtain useful quantitative information about the population despite variation:
      • Examine data distribution
      • Understand why variation occurred
      • Describe how to quantify the variation
  • The probability density function is a graph of how likely each measurement is, with the x-axis displaying possible measurements and the y-axis showing the probability density; for continuous data, the probability of a range of measurements is the area under the curve over that range
  • A distribution observed in a sample is often irregular rather than a smooth curve; the probability of a measurement occurring varies depending on the measurement, typically with higher probability near the mean
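To make the difference between a density value and a probability concrete, here is a small sketch using `statistics.NormalDist` from the Python standard library. The trait (height), its mean of 170 cm, and its standard deviation of 10 cm are assumptions chosen only for illustration.

```python
from statistics import NormalDist

# Hypothetical trait: height with mean 170 cm and standard deviation 10 cm.
heights = NormalDist(mu=170, sigma=10)

# Density (the y-axis value) at the mean and two standard deviations above it.
density_at_mean = heights.pdf(170)
density_in_tail = heights.pdf(190)

# Probability is an area under the curve: the chance of a height
# between 160 cm and 180 cm (within one standard deviation of the mean).
prob_within_one_sd = heights.cdf(180) - heights.cdf(160)

print(f"density at 170 cm: {density_at_mean:.4f}")
print(f"density at 190 cm: {density_in_tail:.4f}")
print(f"P(160 <= height <= 180): {prob_within_one_sd:.3f}")
```

The density is highest at the mean and falls off in the tails, while the probability of falling in a range comes from the area between its two ends.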
  • A table showing the distribution of news articles by length, ordered from shortest to longest, revealing that most articles are short, so the distribution is right-skewed (a long tail of longer articles)
  • Presenting descriptive statistics: if the data are normally distributed, use error bars; confidence limits mark the range expected to contain the population mean at a chosen confidence level (e.g., 95%)
  • Parameters like sample mean, variance, standard deviation, degrees of freedom, standard error, and confidence limits are used to quantify variation and summarize normally distributed data in descriptive statistics
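The parameters listed above can be computed with the Python standard library, as in this minimal sketch. The sample values are invented for illustration, and `T_CRITICAL` is the two-tailed 95% value for 9 degrees of freedom taken from a standard t table; a different sample size or confidence level would need a different value.

```python
import math
import statistics

# Hypothetical sample of 10 measurements (e.g., leaf lengths in mm).
sample = [12.1, 11.8, 12.5, 13.0, 12.2, 11.9, 12.7, 12.4, 12.0, 12.6]

n = len(sample)
mean = statistics.mean(sample)
variance = statistics.variance(sample)  # sample variance, divides by n - 1
std_dev = statistics.stdev(sample)      # square root of the sample variance
degrees_of_freedom = n - 1
std_error = std_dev / math.sqrt(n)      # standard error of the mean

# Two-tailed 95% critical t value for df = 9, from a standard t table.
T_CRITICAL = 2.262
lower_limit = mean - T_CRITICAL * std_error
upper_limit = mean + T_CRITICAL * std_error

print(f"mean = {mean:.2f}, variance = {variance:.3f}, sd = {std_dev:.3f}")
print(f"df = {degrees_of_freedom}, se = {std_error:.3f}")
print(f"95% confidence limits: {lower_limit:.2f} to {upper_limit:.2f}")
```

Because the standard error divides by the square root of the sample size, the confidence limits narrow as more measurements are collected.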
  • A box plot is used to present descriptive statistics when data is not normally distributed, displaying interquartile ranges to visualize variability and distribution of non-normal data
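The numbers a box plot is drawn from can be sketched as follows. The sample is made up (loosely modeled on skewed lengths), and `statistics.quantiles` is used with its default method; other quartile conventions can give slightly different cut points.

```python
import statistics

# Hypothetical right-skewed sample (e.g., article lengths in hundreds of words).
sample = [2, 3, 3, 4, 4, 5, 5, 6, 8, 12, 25]

# With n=4, statistics.quantiles returns the three quartile cut points.
q1, median, q3 = statistics.quantiles(sample, n=4)

# The interquartile range spans the middle half of the data (the box).
iqr = q3 - q1

print(f"Q1 = {q1}, median = {median}, Q3 = {q3}, IQR = {iqr}")
print(f"mean = {statistics.mean(sample):.2f}")
```

In a skewed sample like this one, the mean is pulled toward the long tail, which is why the median and interquartile range are the preferred summary.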
  • The sample mean is only an estimate of the population mean and rarely matches it exactly. Larger sample sizes are preferable because the standard error shrinks as the sample size grows, although smaller samples are more manageable
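  • The mean, median, and mode of a data set have the same value only when the distribution is symmetric with a single peak (e.g., a normal distribution); in skewed data they differ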
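A quick way to see when the three measures agree is to compare a symmetric sample with a skewed one. Both data sets below are invented for illustration.

```python
import statistics

# Hypothetical symmetric sample with a single peak at 3.
symmetric = [1, 2, 3, 3, 3, 4, 5]

# Hypothetical right-skewed sample with one very large value.
skewed = [1, 1, 1, 2, 3, 4, 30]

for name, data in (("symmetric", symmetric), ("skewed", skewed)):
    mean = statistics.mean(data)
    median = statistics.median(data)
    mode = statistics.mode(data)
    print(f"{name}: mean = {mean}, median = {median}, mode = {mode}")
```

In the symmetric sample all three measures coincide, while in the skewed sample the single large value pulls the mean well above the median and the mode.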