A set of statistical techniques for assessing the relationship between one dependent variable (DV) and one or more independent variables (IVs)
Why Use Regression?
Prediction: fitting a predictive model to an observed dataset, then using that model to make predictions about an outcome from a new set of explanatory variables
Explanation: fit a model to explain the relationships between a set of variables
Explanatory variable (x)
Exposure, predictor, independent variable
Dependent variable (y)
Outcome, response
Univariate (aka simple) linear regression
Single explanatory variable
Multivariate (multiple) regression
Multiple explanatory variables
Linear Regression Characteristics
Dependent variable must be continuous
Explanatory variables can be continuous or categorical
Purpose of regression
Quantitative Description and Explanation of Relationships
Estimating (Predicting) Unknown Values of the Dependent Variable
Factors Affecting Regression
Linear Regression Models
A specific type of data modeling, where a straight line is fit "neatly in the middle" of the data to explain the relationship between the variables
Linear Regression Model
1. y = b0 + b1x
2. y is the predicted value (the criterion)
3. b1 is the slope of the line
4. b0 is the y-intercept (the value of y when x = 0)
5. x represents the value of the predictor
Linear Regression Models
Indicate a "trend" in the data based on a regression line
Ideal for making predictions: once we have a line, we can make predictions for the Y value (the criterion) for each X value (the predictor)
Commonly used for prediction and trend description
How to draw the regression line
Least squares (LS) method: The regression line is the "best fit line" that minimizes the sum of the squared deviations between each point and the line
Ordinary Least Squares (OLS)
A foundational method for fitting a regression line
Principle: Minimize the sum of squared differences between observed values and those predicted by the line
OLS Process
1. Calculate residuals (differences between observed and predicted values)
2. Square residuals and sum them up
3. Choose line parameters (slope, intercept) that minimize this sum
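A minimal sketch of this process in NumPy, using made-up example data; the closed-form slope and intercept below are the values that minimize the residual sum of squares.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Closed-form OLS estimates: slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x)
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

# Steps 1-2: residuals and their sum of squares (the quantity OLS minimizes)
residuals = y - (intercept + slope * x)
ss_residuals = np.sum(residuals ** 2)

print(f"intercept={intercept:.3f}, slope={slope:.3f}, SSR={ss_residuals:.3f}")
```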
Simple Linear Regression Formula
y = β0 + β1X + ϵ
Y is the outcome variable
X is the predictor variable
β0 is the intercept (the value of Y when X = 0)
β1 is the slope (the change in Y for a one-unit increase in X)
ϵ is the error term (the difference between the observed and predicted values of Y)
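A hedged sketch of fitting this formula with statsmodels on synthetic data (the data-generating values 1.5 and 2.0 are illustrative assumptions, not from the notes); the fitted parameters estimate β0 and β1, and the residuals estimate ϵ.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=100)
y = 1.5 + 2.0 * X + rng.normal(scale=0.5, size=100)  # true β0 = 1.5, β1 = 2.0, ϵ ~ N(0, 0.5²)

X_design = sm.add_constant(X)        # adds the intercept column (β0)
model = sm.OLS(y, X_design).fit()

print(model.params)                  # [β0 estimate, β1 estimate]
print(model.resid[:5])               # estimated error terms (residuals)
```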
Standardized vs. Unstandardized Regression Coefficients
Interpreting Standardized vs. Unstandardized Regression Coefficients in Simple Regression
Unstandardized Coefficients (b): Direct interpretation in original units
Standardized Coefficients (β): Interpretation in terms of standard deviations
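An illustrative sketch (hypothetical data) of the difference: standardizing X and Y to z-scores turns the slope into a standardized coefficient, which in simple regression equals the Pearson correlation.

```python
import numpy as np

x = np.array([10.0, 12.0, 15.0, 18.0, 20.0, 25.0])
y = np.array([3.0, 3.5, 4.2, 5.1, 5.4, 6.8])

# Unstandardized slope b: change in y (original units) per one-unit increase in x
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

# Standardized slope β: change in y (in SDs) per one-SD increase in x
zx = (x - x.mean()) / x.std(ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)
beta = np.sum(zx * zy) / np.sum(zx ** 2)

print(f"b = {b:.3f} (original units), beta = {beta:.3f} (SD units)")
```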
R-squared (Coefficient of Determination)
Represents the proportion of variance in the dependent variable (Y) explained by the independent variable (X)
Ranges from 0 to 1
Higher R-squared values indicate a stronger relationship between the variables
Adjusted R-squared
Takes into account the number of predictors in the model
Adjusts for the inclusion of additional predictors that may not improve the model's explanatory power
More useful in multiple regression, but can also be used in simple regression for comparison purposes
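A small sketch (hypothetical y and predictions) of computing both quantities by hand, assuming n observations and p predictors.

```python
import numpy as np

y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
y_hat = np.array([2.0, 4.0, 6.0, 8.0, 10.0])   # predictions from a fitted model
n, p = len(y), 1                                # n observations, p predictors

ss_res = np.sum((y - y_hat) ** 2)               # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)            # total sum of squares

r2 = 1 - ss_res / ss_tot
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(f"R-squared = {r2:.3f}, adjusted R-squared = {adj_r2:.3f}")
```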
Assumptions of Linear Regression
Linearity: There is a linear relationship between the independent variables and the dependent variable
Independence: Observations are independent of each other
Homoscedasticity: The variance of the error terms is constant across all levels of the independent variables
Normality: The error terms are normally distributed
No multicollinearity: In multiple regression, the independent variables are not highly correlated with each other
Violations of linear regression assumptions can lead to inaccurate or biased estimates
It is important to check and address these assumptions when performing linear regression analysis
Linearity
The relationship between the independent variables and the dependent variable is linear
Check with scatterplots or residual plots
Address violations with data transformations or non-linear models
Independence
Observations are independent of each other
Often assumed in random samples or experiments
Check with the Durbin-Watson test for time series data
Address violations with alternative models (e.g., time series models)
Homoscedasticity
The variance of the error terms is constant across all levels of the independent variables
Check with residual plots
Address violations with weighted least squares, data transformations, or robust regression methods
Normality
The error terms are normally distributed
Check with histograms, Q-Q plots, or normality tests (e.g., Shapiro-Wilk test)
Address violations with data transformations or robust regression methods
No multicollinearity
In multiple regression, the independent variables are not highly correlated with each other
Check with correlation coefficients or the Variance Inflation Factor (VIF)
Address violations by removing or combining highly correlated variables, or using dimensionality reduction techniques (e.g., PCA)
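A hedged sketch of running several of these checks in one place, assuming statsmodels and scipy are available; the data are synthetic and the thresholds follow common rules of thumb.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.outliers_influence import variance_inflation_factor
from scipy import stats

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = 1.0 + 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(size=200)

X_design = sm.add_constant(X)
model = sm.OLS(y, X_design).fit()

# Normality of residuals: Shapiro-Wilk test
print("Shapiro-Wilk:", stats.shapiro(model.resid))

# Independence (time-ordered data): Durbin-Watson statistic (values near 2 suggest no autocorrelation)
print("Durbin-Watson:", durbin_watson(model.resid))

# Multicollinearity: VIF for each predictor column (excluding the constant)
for i in range(1, X_design.shape[1]):
    print(f"VIF, predictor {i}:", variance_inflation_factor(X_design, i))

# Linearity / homoscedasticity: inspect residuals vs. fitted values
# (e.g., plot model.fittedvalues against model.resid and look for curvature or fanning)
```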
Simple linear regression
Models the relationship between a single predictor and a response variable
Multiple regression
Extends simple linear regression to include multiple predictors, allowing for a more comprehensive understanding of relationships and improving prediction accuracy
Benefits of multiple regression
Assess the impact of multiple factors on a response variable
Control for confounding variables
Build more robust and accurate models
Simple linear regression model
y = β0 + β1X + ϵ
Multiple regression model
y = β0 + β1X1 + β2X2 + · · · + βnXn + ϵ
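An illustrative sketch of fitting a two-predictor version of this model with statsmodels on synthetic data; the coefficients used to generate y are assumptions for the example.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
X1 = rng.normal(size=150)
X2 = rng.normal(size=150)
y = 2.0 + 1.5 * X1 - 0.8 * X2 + rng.normal(size=150)

X_design = sm.add_constant(np.column_stack([X1, X2]))
model = sm.OLS(y, X_design).fit()

print(model.params)       # [β0, β1, β2] estimates
print(model.summary())    # coefficients, standard errors, R-squared, etc.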
Control variables
Variables included in the model to account for potential confounding factors
Implementing control in regression using Frisch-Waugh-Lovell Theorem
1. Regress the IV (X) on the control variables (Z): obtain the residuals
2. Regress the DV (Y) on the control variables (Z): obtain the residuals
3. Regress the residuals of Y on the residuals of X: the slope of this regression is the effect of X on Y, controlling for Z
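A minimal sketch of the Frisch-Waugh-Lovell procedure on synthetic data (the data-generating coefficients are illustrative); the partialled-out slope should match the coefficient on X from the full multiple regression.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
Z = rng.normal(size=(300, 2))                                # control variables
X = Z @ np.array([0.6, -0.4]) + rng.normal(size=300)        # IV, correlated with Z
y = 2.0 + 1.5 * X + Z @ np.array([0.7, 0.2]) + rng.normal(size=300)

# Step 1: regress X on Z, keep the residuals
x_resid = sm.OLS(X, sm.add_constant(Z)).fit().resid
# Step 2: regress Y on Z, keep the residuals
y_resid = sm.OLS(y, sm.add_constant(Z)).fit().resid
# Step 3: regress the Y-residuals on the X-residuals; the slope is the effect of X controlling for Z
fwl_slope = sm.OLS(y_resid, sm.add_constant(x_resid)).fit().params[1]

full_slope = sm.OLS(y, sm.add_constant(np.column_stack([X, Z]))).fit().params[1]
print(f"FWL slope: {fwl_slope:.3f}, full-model slope: {full_slope:.3f}")  # should agree
```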
Regression coefficients
β0 (Intercept): Expected value of y when all predictors are zero
β1, β2, . . . , βn (Coefficients): Expected change in y for a one-unit increase in the corresponding predictor, holding all other predictors constant
Holding predictors constant is an important conceptual framework for understanding the unique contribution of each predictor in multiple regression
Adding control blocks in multiple regression
1. Identify primary predictors of interest and potential control variables
2. Group control variables into meaningful blocks
3. Add control blocks sequentially to the regression model and evaluate changes in primary predictor coefficients
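A hedged sketch of this sequential (hierarchical) approach using the statsmodels formula API; the variable names and data-generating values are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 200
age = rng.normal(size=n)
income = rng.normal(size=n)
predictor = 0.5 * age + rng.normal(size=n)                       # predictor correlated with a control
outcome = 0.8 * predictor + 0.6 * age + 0.3 * income + rng.normal(size=n)
df = pd.DataFrame({"outcome": outcome, "predictor": predictor, "age": age, "income": income})

# Block 0: primary predictor only
m0 = smf.ols("outcome ~ predictor", data=df).fit()
# Block 1: add the demographic control block
m1 = smf.ols("outcome ~ predictor + age + income", data=df).fit()

# Compare the primary predictor's coefficient and R-squared across blocks
print(m0.params["predictor"], m0.rsquared)
print(m1.params["predictor"], m1.rsquared)
```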
Benefits of control blocks in multiple regression
Enhanced understanding of the relationships between predictors and the dependent variable
Identification of potential confounding factors
Systematic approach to adding control variables in the model
Improved model interpretability
Additive models assume the effects of predictors are independent, while interaction models allow the effects of predictors to depend on each other
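An illustrative sketch of the contrast using the statsmodels formula API on synthetic data; the interaction coefficient 0.8 used to generate y is an assumption for the example.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
df = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.normal(size=200)})
df["y"] = 1.0 + 0.5 * df.x1 + 0.3 * df.x2 + 0.8 * df.x1 * df.x2 + rng.normal(size=200)

additive    = smf.ols("y ~ x1 + x2", data=df).fit()   # effects of x1 and x2 assumed independent
interaction = smf.ols("y ~ x1 * x2", data=df).fit()   # x1 * x2 expands to x1 + x2 + x1:x2

print(interaction.params["x1:x2"])   # estimated interaction effect
```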