Linear regression and statistical inference

Cards (30)

  • Frisch-Waugh-Lovell theorem: the coefficient on X1 can be recovered in a two-stage process - regress X1 on X2 and keep the residual Xtilde1 (the part of X1 uncorrelated with X2), then regress Y on Xtilde1 (see the FWL sketch after this list)
  • Perfect multicollinearity arises when one regressor is an exact linear function of the other regressor(s)
  • If you have m mutually exclusive and collectively exhaustive categories, include only m-1 membership dummies - including all m alongside an intercept creates perfect multicollinearity (the dummy variable trap)
  • Standard error of regression (SER) is the estimated standard deviation of u
  • SER uses the divisor n-k-1 (a degrees-of-freedom correction) rather than n, which corrects the downward bias of the residual variance estimate; large-sample properties are unaffected
  • TSS = ESS + SSR
  • R^2 = ESS/TSS = 1 - SSR/TSS
  • R^2 may increase even if the added regressor is irrelevant
  • adjusted R^2 = 1 - [(n-1)/(n-k-1)] x SSR/TSS (see the fit-statistics sketch after this list)
  • Least Squares Assumptions: u satisfies mean independence/orthogonality (E[u|X] = 0), (Yi, X1i, ..., Xki) are i.i.d., Y and the X's have finite fourth moments, and there is no perfect multicollinearity
  • having as little variability around Beta1 as possible = efficiency
  • under homoskedastic, normally distributed errors, OLS has the smallest variance amongst all unbiased estimators
  • Under the LSA, OLS is consistent
  • Homoskedastic: conditional variance of u does not depend on the regressors
  • there is rarely a good reason to assume homoskedasticity
  • under the null, the t-stat has (approximately, in large samples) a standard normal distribution
  • the t-stat diverges under the alternative hypothesis, so large absolute values of the t-stat are evidence against the null
  • type 1 error: reject the null when it is true
  • type 2 error: fail to reject the null when it is false
  • size of a test = type 1 error rate
  • power of a test: ability to detect a false null = 1 - type 2 error rate
  • p-value: the smallest significance level at which we would have rejected the null, on the basis of the sample OR the probability, under the null, of obtaining a value of the test statistic 'at least as unfavourable to the null' as the test statistic calculated
  • 95% of 95% confidence intervals will contain the true value of the parameter
  • use the F-statistic to test multiple hypotheses at once
  • Idea of the F test: estimate regression model with null imposed (restricted model) and without (unrestricted model) and measure how much worse the restricted model 'fits' the data
  • F = [(SSR[restricted] - SSR[unrestricted])/q] / [SSR[unrestricted]/(n-k-1)], where q is the number of restrictions under the null (see the worked F-test example after this list)
  • Can use an F-statistic to test the null of linearity against an rth degree polynomial
  • Linear-log model: 'a 1% increase in X changes Y by about 0.01 x Beta1'
  • Log-linear model: 'a 1-unit increase in X changes Y by about 100 x Beta1 %'
  • Log-Log model: Beta1 is the elasticity of Y w.r.t. X (the three specifications are written out after this list)
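A minimal numerical sketch of the FWL card above, using only NumPy. The simulated data, the variable names (x1, x2, y) and the helper ols are illustrative assumptions, not part of the cards; the point is that the two-stage route reproduces the full-regression coefficient on X1.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

# Simulated data: x1 and x2 are correlated, y depends on both.
x2 = rng.normal(size=n)
x1 = 0.5 * x2 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)

def ols(X, y):
    """OLS coefficients of y on X (X already includes any constant)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

const = np.ones(n)

# Full regression: y on (1, x1, x2).
beta_full = ols(np.column_stack([const, x1, x2]), y)

# FWL two-stage route:
# 1) regress x1 on (1, x2) and keep the residual x1_tilde
gamma = ols(np.column_stack([const, x2]), x1)
x1_tilde = x1 - np.column_stack([const, x2]) @ gamma
# 2) regress y on the residual; the slope equals the coefficient on x1 above
beta_fwl = ols(x1_tilde.reshape(-1, 1), y)

print(beta_full[1], beta_fwl[0])  # the two estimates coincide
```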
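A self-contained fit-statistics sketch tying together the TSS/ESS/SSR, R^2, adjusted R^2 and SER cards. The data-generating process and names (X, y, k) are assumptions made for illustration; the formulas follow the cards.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 500, 2                          # sample size, number of regressors (excl. constant)
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 2.0, -1.5]) + rng.normal(size=n)

beta = np.linalg.lstsq(X, y, rcond=None)[0]
u_hat = y - X @ beta                   # residuals

SSR = np.sum(u_hat ** 2)
TSS = np.sum((y - y.mean()) ** 2)
ESS = TSS - SSR                        # TSS = ESS + SSR

r2 = ESS / TSS                         # = 1 - SSR/TSS
adj_r2 = 1 - (n - 1) / (n - k - 1) * SSR / TSS
ser = np.sqrt(SSR / (n - k - 1))       # SER: estimated std. dev. of u, divisor n-k-1

print(r2, adj_r2, ser)
```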
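A worked example of the (homoskedasticity-only) F-test idea: fit the restricted model with the null imposed and the unrestricted model, then compare SSRs. The simulated data, the null beta2 = beta3 = 0, and the helper ssr are illustrative assumptions; scipy is assumed available for the p-value.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 500
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + 2.0 * x1 + rng.normal(size=n)      # true beta2 = beta3 = 0

def ssr(X, y):
    """Sum of squared residuals from OLS of y on X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    return np.sum((y - X @ beta) ** 2)

const = np.ones(n)
X_unrestricted = np.column_stack([const, x1, x2, x3])
X_restricted = np.column_stack([const, x1])  # null imposed: beta2 = beta3 = 0

k = 3                                        # regressors in the unrestricted model
q = 2                                        # number of restrictions under the null

ssr_r, ssr_u = ssr(X_restricted, y), ssr(X_unrestricted, y)
F = ((ssr_r - ssr_u) / q) / (ssr_u / (n - k - 1))
p_value = stats.f.sf(F, q, n - k - 1)        # how much worse does the restricted fit get?
print(F, p_value)
```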
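The three logarithmic specifications from the last cards, written out. The 'about' qualifiers reflect that the percentage interpretations are approximations valid for small changes.

```latex
\begin{aligned}
\text{linear-log:}\quad & Y_i = \beta_0 + \beta_1 \ln(X_i) + u_i,
  && \text{a 1\% increase in } X \text{ changes } Y \text{ by about } 0.01\,\beta_1 \\
\text{log-linear:}\quad & \ln(Y_i) = \beta_0 + \beta_1 X_i + u_i,
  && \text{a one-unit increase in } X \text{ changes } Y \text{ by about } 100\,\beta_1\% \\
\text{log-log:}\quad & \ln(Y_i) = \beta_0 + \beta_1 \ln(X_i) + u_i,
  && \beta_1 \text{ is the elasticity of } Y \text{ with respect to } X
\end{aligned}
```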