BIOSTATISTICS

Subdecks (6)

Cards (271)

  • Linear Regression
    A model in which variation in one variable (y) is considered to be a consequence of the other (predictor) variable (x)
  • Linear Regression
    • Determines the expected value of a dependent variable (y) for the given values of one or more independent variables (x)
    • Quantifies the mean difference in the outcome (y) associated with a one-unit difference in x
  • Applicable when
    The relationship between variable X and variable Y can be described by a straight line (identified through computation of the Pearson correlation coefficient during correlation analysis)
  • Multiple Linear Regression
    Effects of two or more independent (X) variables on one dependent (Y) variable are simultaneously considered
  • Simple Linear Regression

    Involves one independent variable and one dependent variable only
  • Dependent variable must be quantitative
  • Assumptions of Linear Regression
    • Linearity
    • Independence
    • Normality
    • Homoscedasticity
  • Linearity
    Means of the subpopulations of Y all lie on the same straight line
  • Independence
    Value of Y at one value of X does not depend on, and is not affected by, the value of Y at another value of X
  • Normality
    For any fixed value of X, Y has a normal distribution
  • Homoscedasticity
    Variance of Y is the same for any value of X
  • Formulas of Linear Regression

    • y = mx + b
    • y = β0 + β1x1 + ε
  • y
    Dependent variable
  • x
    Independent variable
  • m
    Slope of the line; Δy/Δx
  • b
    Y-intercept
  • β1
    Slope coefficient
  • β0
    Y-intercept
  • ε
    Random error
  • b (y-intercept)

    Value of Y when X is 0
  • m (slope)
    • Slope of the regression line
    • Determines the change in the mean of Y per one-unit change in X
    • Represents the increase/decrease in the mean of Y associated with a one-unit increase/decrease in X
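  A minimal sketch (with illustrative slope and intercept values, not taken from the deck) of how the slope translates into a change in the predicted mean of Y:

    ```python
    # Hypothetical fitted line y = m*x + b; the numbers below are made up
    m = 0.44   # slope: change in the mean of Y per one-unit change in X
    b = 17.8   # y-intercept: predicted value of Y when X = 0

    def predict(x):
        """Predicted mean of Y at a given value of X."""
        return m * x + b

    # A one-unit increase in X changes the prediction by exactly the slope m
    delta = predict(51) - predict(50)
    ```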
  • Simple Linear Regression
    Used to model the relationship between one independent (X) and one dependent (Y) variable; should only be used if the variables have a linear relationship
  • Fitted using
    Least squares; regularized variants: Lasso (L1) and Ridge regression (L2)
  • Coefficient of Determination (r²)
    • Percentage reduction in the variance of variable Y due to variable X
    • Value of 1: all observations fall on the regression line
  • Both the independent and dependent variables should be quantitative, measured on an interval or ratio scale
  • The outliers should be removed
  • The residuals should be approximately normally distributed
  • Scedasticity
    • Homoscedasticity: the variance of the residual errors is the same across all values of the predictor
    • Heteroscedasticity: the variance of the residual errors is not the same across all values of the predictor
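  A crude way to compare residual spread at low vs. high values of the predictor (the data and fitted line here are invented; in practice one inspects a residual-vs-fitted plot or runs a formal test such as Breusch-Pagan):

    ```python
    # Made-up sample and a line assumed to be already fitted to it
    xs = [1, 2, 3, 4, 5, 6, 7, 8]
    ys = [2.2, 3.9, 6.1, 8.2, 9.8, 12.1, 14.0, 16.2]
    m, b = 2.0, 0.0

    # Residual = observed Y minus the value predicted by the line
    residuals = [y - (m * x + b) for x, y in zip(xs, ys)]
    low = residuals[:4]    # residuals at small X
    high = residuals[4:]   # residuals at large X

    def spread(r):
        """Variance of a list of residuals."""
        mean = sum(r) / len(r)
        return sum((e - mean) ** 2 for e in r) / len(r)

    # Similar spreads suggest homoscedasticity; a large ratio
    # suggests heteroscedasticity
    ratio = spread(high) / spread(low)
    ```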
  • Steps on Simple Linear Regression
    1. Determine whether or not assumptions underlying a linear relationship are met in the data available for analysis (Linearity, Independence, Normality, Homoscedasticity)
    2. Obtain the equation for the line (y = mx + b) that best fits the sample data
    3. Evaluate the equation to obtain some idea of the strength of the relationship and usefulness of the equation in predicting and estimating. Compute for the Coefficient of determination.
    4. If data appear to conform satisfactorily to the linear model, use equation obtained from the sample data to predict and estimate
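  The four steps above can be sketched end-to-end in plain Python; the sample data is invented for illustration:

    ```python
    import math

    # Hypothetical (x, y) sample data, roughly linear by construction
    xs = [1.0, 2.0, 3.0, 4.0, 5.0]
    ys = [2.1, 3.9, 6.2, 7.8, 10.1]

    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))

    # Step 2: equation of the best-fitting line, y = m x + b
    m = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    b = sy / n - m * (sx / n)

    # Step 3: strength of the relationship via r and r²
    r = (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    r2 = r ** 2

    # Step 4: use the equation to predict/estimate at a new X
    y_at_6 = m * 6 + b
    ```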
  • Formula of the Slope (m)

    m = [n(∑xy) - (∑x)(∑y)] / [n(∑x²) - (∑x)²]
  • Formula of the y-intercept (b)
    b = ȳ - m·x̄ (where x̄ and ȳ are the means of X and Y)
  • Coefficient of determination (r²)

    r² = [(n(∑xy) - (∑x)(∑y)) / √[(n(∑x²) - (∑x)²)(n(∑y²) - (∑y)²)]]²
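  The three formulas can be written directly as functions of the running sums (a sketch; the parameter names `sx`, `sxx`, etc. are my own shorthand for ∑x, ∑x², and so on):

    ```python
    import math

    def slope(n, sx, sy, sxx, sxy):
        """m = [n(sum xy) - (sum x)(sum y)] / [n(sum x^2) - (sum x)^2]"""
        return (n * sxy - sx * sy) / (n * sxx - sx ** 2)

    def intercept(n, sx, sy, m):
        """b = mean(y) - m * mean(x)"""
        return sy / n - m * (sx / n)

    def r_squared(n, sx, sy, sxx, syy, sxy):
        """Square of the Pearson correlation coefficient r."""
        num = n * sxy - sx * sy
        den = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
        return (num / den) ** 2
    ```

  For points lying exactly on a line (for instance (1, 3), (2, 5), (3, 7) on y = 2x + 1), `r_squared` returns 1, matching the card above about a value of 1.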
  • Relationship between mental ability scores (Y) and % Recommended Dietary Allowance (RDA) for Calorie (X) of Grade 6 pupils
  • Components of the formula
    • X (independent values)
    • Y (dependent values)
    • x̄ (mean of the X values)
    • ȳ (mean of the Y values)
    • X² (square of the X values)
    • ∑X² (sum of the squares of the X values)
    • Y² (square of the Y values)
    • ∑Y² (sum of the squares of the Y values)
    • XY (product of the X & Y values)
    • ∑XY (sum of the products of the X & Y values)
  • Tabulation of the results
  • Computation of the Pearson correlation coefficient
  • Data (pupils 6-10; earlier rows truncated, with two orphaned entries from the preceding row)

    Pupil | X | Y | X² | Y² | XY
    … | … | … | … | 1444 | 1687.2
    6 | 60.7 | 71 | 3684.49 | 5041 | 4309.7
    7 | 73.1 | 48 | 5343.61 | 2304 | 3508.8
    8 | 98.8 | 69 | 9761.44 | 4761 | 6817.2
    9 | 52.6 | 30 | 2766.76 | 900 | 1578.0
    10 | 85.8 | 59 | 7361.64 | 3481 | 5062.2
  • Total (∑): ∑X = 737.5, ∑Y = 502, ∑X² = 57,250.93, ∑Y² = 27,936, ∑XY = 38,278.9
  • Means
    x̄ = 737.5 / 10 = 73.75
    ȳ = 502 / 10 = 50.2
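  As a quick cross-check, plugging the tabulated totals into the slope, intercept, and r² formulas (the results below are computed here, not quoted from the deck):

    ```python
    import math

    # Totals from the worked example (n = 10 pupils)
    n = 10
    sx, sy = 737.5, 502
    sxx, syy, sxy = 57250.93, 27936, 38278.9

    m = (n * sxy - sx * sy) / (n * sxx - sx ** 2)  # slope
    b = sy / n - m * (sx / n)                      # y-intercept: mean(y) - m*mean(x)
    r = (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

    # m is about 0.44, b about 17.8, and r**2 about 0.20: a weak
    # linear relationship between %RDA for Calorie and mental ability score
    ```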