Several genes contribute to the final phenotype of a given trait, with each gene playing a small role in the expression of the trait, allowing for significant variation from person to person
Polygenic traits are controlled by more than one gene, leading to a continuous spectrum of phenotypes: at the upper end the trait is fully expressed, at the lower end it is barely expressed, and intermediate phenotypes are the most common because many different gene combinations, together with environmental factors, produce them
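The spectrum described above can be illustrated with a small simulation. This is a minimal sketch under loud assumptions: 10 hypothetical genes, each adding 0 or 1 to a trait score with equal chance, and no environmental effect. The names simulate_trait, N_GENES and N_PEOPLE are made up for illustration.

```python
import random
from collections import Counter

def simulate_trait(n_genes, rng):
    """Sum the contributions of n_genes genes; each adds 0 or 1 with equal chance."""
    return sum(rng.randint(0, 1) for _ in range(n_genes))

rng = random.Random(42)  # fixed seed so the run is repeatable
N_GENES = 10             # hypothetical number of contributing genes
N_PEOPLE = 10_000        # hypothetical number of individuals

scores = [simulate_trait(N_GENES, rng) for _ in range(N_PEOPLE)]
counts = Counter(scores)

# Text histogram: one '#' per 50 individuals with that trait score
for score in range(N_GENES + 1):
    print(f"{score:2d} {'#' * (counts[score] // 50)}")
```

Intermediate scores should dominate the printout, because far more gene combinations add up to a middle score than to either extreme.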
Factors like sampling method, personal bias, limitations in equipment, and the use of estimates can limit our ability to observe and analyze genetic traits
Types of biases in experiments:
Design Bias: occurs in the initial phase of the experiment and in creating the data collection process
Selection Bias: arises during the collection of samples and is due to the non-randomized selection of samples
Procedural Bias: related to the methodology of the study and creating the procedure of the experiment
Reporting Bias: happens during the reporting phase and involves not including all data and results that were collected, such as excluding negative results and only including positive results
Data Collection Bias: occurs while the data are being gathered and involves systematic errors in how measurements are taken or recorded
Types of data:
Measurements (Ratio): have a true zero, so ratios between values are meaningful; can be continuous (e.g., mass) or discrete (e.g., number of pills); parametric tests can be used when the data are normally distributed
Measurements (Intervals): equal spacing between values but no true zero, includes data like temperature in degrees Celsius and test scores
Ranks (Ordinal Data): values are placed in order, which discards the size of the gaps between them, like ranking test scores; analyzed with non-parametric tests
Frequencies (Nominal Data): counts of observations in labeled categories, like hair color and names
Forming the hypothesis:
Biological Null Hypothesis: identify the variables (biological aspect) and assume no relationship between them
Statistical Null Hypothesis: restates the biological null in terms of the specific data in the given dataset, again assuming no relationship between the variables
Visualization, test statistics, significance, & inference:
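Visualization: shows the distribution of the data and differences between groups using histograms, bar charts, and box-whisker plots; correlation between two variables is shown with a scatter plot
Test Statistics: measure the size of an effect or difference relative to the variability in the data, calculated by a statistical test (e.g., Chi-Square)
Significance: probability of obtaining an effect at least as large as the one observed by chance if the Null hypothesis is true, judged against a critical value or reported as a p-value
Inference: the conclusion drawn about the population from the sample data after the experiment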
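The test statistic and significance steps can be sketched with a Chi-Square goodness-of-fit calculation. The counts are invented for illustration (a hypothetical cross with an expected 3:1 ratio), and the erfc shortcut for the p-value is valid only for 1 degree of freedom.

```python
import math

# Hypothetical observed counts; the null hypothesis expects a 3:1 ratio
observed = [290, 110]
expected_ratio = [3, 1]

total = sum(observed)
expected = [total * r / sum(expected_ratio) for r in expected_ratio]

# Chi-square statistic: sum of (observed - expected)^2 / expected
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Two categories give 1 degree of freedom. For 1 degree of freedom only,
# the p-value can be written with the complementary error function.
p_value = math.erfc(math.sqrt(chi_square / 2))

CRITICAL_VALUE = 3.841  # chi-square critical value for df = 1, alpha = 0.05
reject_null = chi_square > CRITICAL_VALUE

print(f"chi-square = {chi_square:.3f}, p = {p_value:.3f}, reject null: {reject_null}")
```

Comparing the statistic with the critical value and comparing the p-value with 0.05 are two routes to the same decision; here neither would reject the 3:1 null.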
Confounding variables:
Affect both the dependent and independent variables, and may introduce errors and biases
The independent variable (x-axis) is correlated with the dependent variable (y-axis); a confounding variable linked to both can create or distort this correlation and may introduce errors
Types of errors:
Type 1 Error: when the Null hypothesis is rejected although it is true (false positive)
Type 2 Error: when the Null hypothesis is not rejected although it is false (false negative)
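Both error types can be estimated by simulating many experiments. This is a rough sketch, assuming a two-sided z-test with known standard deviation 1, samples of 30, a cutoff of 1.96 (alpha near 0.05), and a hypothetical true effect of 0.3 for the Type 2 case. The helper names rejects_null and rejection_rate are made up for illustration.

```python
import random
import statistics

def rejects_null(sample, null_mean=0.0, sigma=1.0, z_crit=1.96):
    """Two-sided z-test with known sigma; True if the null hypothesis is rejected."""
    z = (statistics.fmean(sample) - null_mean) / (sigma / len(sample) ** 0.5)
    return abs(z) > z_crit

def rejection_rate(true_mean, rng, n=30, trials=5000):
    """Fraction of simulated experiments in which the null (mean = 0) is rejected."""
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        hits += rejects_null(sample)
    return hits / trials

rng = random.Random(1)  # fixed seed so the run is repeatable

# Null is true: every rejection is a Type 1 error
type1_rate = rejection_rate(true_mean=0.0, rng=rng)
# Null is false: every failure to reject is a Type 2 error
type2_rate = 1 - rejection_rate(true_mean=0.3, rng=rng)

print(f"Type 1 rate ~ {type1_rate:.3f}, Type 2 rate ~ {type2_rate:.3f}")
```

The Type 1 rate should sit close to the chosen alpha, while the Type 2 rate depends on the effect size and the sample size.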
Dealing with the problem of variability:
Variability is inherent among organisms, and biases and errors can also be introduced during observation
Steps to obtain useful quantitative information about the population despite variation:
Examine data distribution
Understand why variation occurred
Describe how to quantify the variation
The probability density function is a graph showing how likely each measurement is, with the x-axis displaying possible measurements and the y-axis showing the probability density; for continuous data, the probability of a measurement falling within a range is the area under the curve over that range
When estimated from sample data, the probability density function is irregular, not a smooth curve; it indicates that the probability of a measurement occurring varies depending on the measurement, with higher probability near the mean
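The link between a density curve and probability can be checked numerically. A minimal sketch, assuming a hypothetical trait (height in cm) that follows a normal distribution with mean 170 and standard deviation 10; these numbers are illustrative only.

```python
from statistics import NormalDist

# Hypothetical trait: height in cm, assumed normal with mean 170 and SD 10
height = NormalDist(mu=170, sigma=10)

density_at_mean = height.pdf(170)  # height of the curve, not a probability
density_in_tail = height.pdf(200)  # far from the mean, so much lower

# Probability of a measurement between 160 and 180 = area under the curve
p_within_1sd = height.cdf(180) - height.cdf(160)

print(f"density at mean = {density_at_mean:.4f}, in tail = {density_in_tail:.6f}")
print(f"P(160 <= height <= 180) = {p_within_1sd:.4f}")
```

The density is highest at the mean, and roughly 68% of measurements fall within one standard deviation of it.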
Example of a skewed distribution: in a table of news articles ordered by length from shortest to longest, most articles are short and only a few are long, so the distribution is skewed, with the bulk of the data at the shorter lengths
Descriptive statistics for normally distributed data can be presented using error bars; confidence limits are the upper and lower bounds of the range expected to contain the population mean at a chosen confidence level (e.g., 95%), and can be drawn as the error bars
Parameters like sample mean, variance, standard deviation, degrees of freedom, standard error, and confidence limits are used to quantify variation and summarize normally distributed data in descriptive statistics
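These parameters can be computed in a few lines. The sample values are invented for illustration, and the confidence limits use the normal approximation (z near 1.96), which is an assumption; for a sample this small, a t critical value with n - 1 degrees of freedom would give slightly wider limits.

```python
import statistics
from statistics import NormalDist

# Hypothetical sample of 10 measurements (e.g. leaf lengths in mm)
sample = [12.1, 11.8, 12.6, 13.0, 12.4, 11.9, 12.8, 12.2, 12.5, 12.7]

n = len(sample)
mean = statistics.fmean(sample)
variance = statistics.variance(sample)  # sample variance, divides by n - 1
std_dev = statistics.stdev(sample)      # square root of the sample variance
deg_freedom = n - 1
std_error = std_dev / n ** 0.5          # standard error of the mean

# Approximate 95% confidence limits (normal approximation, see note above)
z = NormalDist().inv_cdf(0.975)
lower, upper = mean - z * std_error, mean + z * std_error

print(f"mean={mean:.2f} variance={variance:.3f} SD={std_dev:.3f} "
      f"df={deg_freedom} SE={std_error:.3f} 95% CL=({lower:.2f}, {upper:.2f})")
```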
A box plot is used to present descriptive statistics when data are not normally distributed, displaying the median and interquartile range to visualize the variability and distribution of non-normal data
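The numbers a box plot draws can be computed directly. A small sketch with invented, right-skewed data; statistics.quantiles uses its default "exclusive" method here, and other quartile conventions can give slightly different cut points.

```python
import statistics

# Hypothetical right-skewed sample (e.g. number of offspring per individual)
data = [1, 1, 2, 2, 2, 3, 3, 4, 5, 7, 9, 15]

q1, median, q3 = statistics.quantiles(data, n=4)  # quartile cut points
iqr = q3 - q1                                      # interquartile range (the box)

print(f"min={min(data)} Q1={q1} median={median} Q3={q3} max={max(data)} IQR={iqr}")
```

The box spans Q1 to Q3 with a line at the median, so a skewed sample shows up as a median sitting off-center and one whisker longer than the other.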
The sample mean is only an estimate of the population mean and varies from sample to sample. Hence, larger sample sizes are preferable because they give a more precise estimate, even though smaller samples are more manageable
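How closely a sample mean tracks the population mean at different sample sizes can be checked with a short simulation. A minimal sketch, assuming a hypothetical normally distributed population with mean 50 and standard deviation 10; spread_of_sample_means is an illustrative helper, not a library function.

```python
import random
import statistics

rng = random.Random(7)         # fixed seed so the run is repeatable
POP_MEAN, POP_SD = 50.0, 10.0  # hypothetical population parameters

def spread_of_sample_means(n, trials=2000):
    """Standard deviation of the sample mean across repeated samples of size n."""
    means = []
    for _ in range(trials):
        sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)

spread_small = spread_of_sample_means(5)    # theory: 10 / sqrt(5), about 4.47
spread_large = spread_of_sample_means(100)  # theory: 10 / sqrt(100) = 1.0

print(f"n=5: spread ~ {spread_small:.2f}   n=100: spread ~ {spread_large:.2f}")
```

Sample means from small samples scatter widely around the population mean, while those from large samples cluster tightly around it.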
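The mean, median, and mode of a data set have the same value only when the distribution is symmetric with a single peak (e.g., a normal distribution); in skewed data they differ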
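Whether the three measures coincide can be checked on two small data sets. Both lists are invented for illustration: one symmetric, one right-skewed.

```python
import statistics

symmetric = [2, 3, 3, 4, 4, 4, 5, 5, 6]  # hypothetical symmetric sample
skewed = [1, 1, 1, 2, 2, 3, 4, 8, 14]    # hypothetical right-skewed sample

for name, data in (("symmetric", symmetric), ("skewed", skewed)):
    mean = statistics.fmean(data)
    median = statistics.median(data)
    mode = statistics.mode(data)
    print(f"{name}: mean={mean} median={median} mode={mode}")
```

In the symmetric sample all three measures agree; in the skewed sample the long right tail pulls the mean above the median, which in turn sits above the mode.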