Linear regression and statistical inference

    • Frisch-Waugh-Lovell theorem: the coefficient on X1 can be recovered in a two-stage process: regress X1 on X2 to obtain the residual Xtilde1 (the part of X1 uncorrelated with X2), then regress Y on Xtilde1 (numerical check in the sketches after this deck)
    • Perfect multicollinearity arises when one regressor is perfectly explained by the other regressor(s)
    • If you have m mutually exclusive and collectively exhaustive categories, include only m-1 membership dummies; including all m alongside the intercept creates perfect multicollinearity (rank check in the sketches after this deck)
    • Standard error of the regression (SER) is the estimated standard deviation of u
    • SER divides the SSR by n-k-1 rather than n to correct the downward bias from estimating k+1 coefficients; large-sample properties are unaffected
    • TSS = ESS + SSR
    • R^2 = ESS/TSS = 1 - SSR/TSS
    • R^2 may increase even if the added regressor is irrelevant
    • adjusted R^2 = 1 - [(n-1)/(n-k-1)] × SSR/TSS (computed alongside SER and R^2 in the sketches after this deck)
    • Least Squares Assumptions: u satisfies mean independence, E(u | X) = 0 (or at least orthogonality, E(Xu) = 0); (Yi, X1i, ..., Xki) are i.i.d. draws; Y and X have finite fourth moments; no perfect multicollinearity
    • having as little sampling variability around Beta1 as possible = efficiency
    • under normally distributed errors, OLS has the smallest variance amongst all unbiased estimators
    • Under the LSA, OLS is consistent
    • Homoskedastic: conditional variance of u does not depend on the regressors
    • there is rarely a good reason to assume homoskedasticity
    • under the null, the t-stat has a standard normal distribution (in large samples)
    • t-stat diverges under the alternative hypothesis - large values of |t-stat| are evidence against the null
    • type 1 error: reject the null when it is true
    • type 2 error: fail to reject the null when it is false
    • size of a test = type 1 error rate
    • power of a test: ability to detect a false null = 1 - type 2 error rate
    • p-value: the smallest significance level at which we would have rejected the null on the basis of the sample OR the probability, under the null, of obtaining a value of the test statistic 'at least as unfavourable to the null' as the one actually calculated (computed in the sketches after this deck)
    • 95% of 95% confidence intervals will contain the true value of the parameter
    • use the F-statistic to test multiple hypotheses at once
    • Idea of the F test: estimate regression model with null imposed (restricted model) and without (unrestricted model) and measure how much worse the restricted model 'fits' the data
    • F = [(SSR[rs] - SSR[un])/q] / [SSR[un]/(n-k-1)], where q is the number of restrictions under the null (worked example in the sketches after this deck)
    • Can use an F-statistic to test the null of linearity against an rth degree polynomial
    • Linear-log model: 'a 1% increase in X has a 0.01 × Beta1 effect on Y'
    • Log-linear model: 'a one-unit increase in X increases Y by 100 × Beta1 %'
    • Log-Log model: Beta1 is the elasticity of Y w.r.t. X (all three derived in the sketches after this deck)
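
Worked sketches

A quick numerical check of the Frisch-Waugh-Lovell card, in plain numpy. The data-generating process (the coefficients 0.5, 2.0, -1.0 and the seed) is invented purely for illustration; the point is that the coefficient on X1 from the full regression equals the coefficient from regressing Y on the residualised Xtilde1.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000
x2 = rng.normal(size=n)
x1 = 0.5 * x2 + rng.normal(size=n)            # X1 correlated with X2
y = 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

def ols(X, y):
    # OLS coefficients via least squares
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
beta_full = ols(np.column_stack([ones, x1, x2]), y)   # Y on (1, X1, X2)

# Stage 1: residualise X1 on (1, X2); Xtilde1 is the part of X1
# uncorrelated with X2
Z = np.column_stack([ones, x2])
x1_tilde = x1 - Z @ ols(Z, x1)

# Stage 2: regress Y on Xtilde1 alone
beta_fwl = ols(x1_tilde[:, None], y)

print(beta_full[1], beta_fwl[0])   # equal up to floating-point error
```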
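The dummy-variable cards as a rank check (the category labels a/b/c are hypothetical): with an intercept and m = 3 exhaustive categories, m-1 dummies keep the design matrix full rank, while including all m makes the dummies sum to the intercept column.

```python
import numpy as np

cat = np.array(["a", "b", "c", "a", "b", "c"])
d_a = (cat == "a").astype(float)
d_b = (cat == "b").astype(float)
d_c = (cat == "c").astype(float)

ones = np.ones(len(cat))
X_ok = np.column_stack([ones, d_b, d_c])          # "a" is the omitted base category
X_trap = np.column_stack([ones, d_a, d_b, d_c])   # d_a + d_b + d_c = intercept

print(np.linalg.matrix_rank(X_ok))    # 3: full column rank
print(np.linalg.matrix_rank(X_trap))  # 3 < 4 columns: perfect multicollinearity
```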
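The fit-measure cards computed by hand on simulated data. The formulas (TSS = ESS + SSR, R^2 = 1 - SSR/TSS, the adjusted R^2 penalty, SER with the n-k-1 divisor) are from the deck; the data and coefficients are made up.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 2
X = rng.normal(size=(n, k))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=n)    # second regressor is irrelevant

Xc = np.column_stack([np.ones(n), X])
beta = np.linalg.lstsq(Xc, y, rcond=None)[0]
u_hat = y - Xc @ beta

SSR = u_hat @ u_hat
TSS = ((y - y.mean()) ** 2).sum()
ESS = TSS - SSR

R2 = ESS / TSS                                   # = 1 - SSR/TSS
adj_R2 = 1 - (n - 1) / (n - k - 1) * SSR / TSS   # penalises the extra regressor
SER = np.sqrt(SSR / (n - k - 1))                 # estimated std. dev. of u
print(R2, adj_R2, SER)
```

Dropping the irrelevant regressor and re-running shows the point of the 'R^2 may increase even if the added regressor is irrelevant' card: R^2 never falls when a regressor is added, while adj_R2 can.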
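A t-test of H0: Beta1 = 0 under the deck's large-sample framework. Since the deck warns there is rarely a good reason to assume homoskedasticity, this sketch uses one common robust variance estimator (HC1); the data-generating process, the slope 0.3, and the heteroskedasticity pattern are all invented.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 500
x = rng.normal(size=n)
u = (1 + 0.5 * np.abs(x)) * rng.normal(size=n)   # heteroskedastic errors
y = 1.0 + 0.3 * x + u

X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
u_hat = y - X @ beta

# Heteroskedasticity-robust (HC1) sandwich variance estimator
XtX_inv = np.linalg.inv(X.T @ X)
meat = X.T @ (X * (u_hat ** 2)[:, None])
V = n / (n - X.shape[1]) * XtX_inv @ meat @ XtX_inv
se = np.sqrt(np.diag(V))

t_stat = beta[1] / se[1]                     # H0: Beta1 = 0
p_value = 2 * stats.norm.sf(abs(t_stat))     # two-sided, standard normal under H0
ci_95 = (beta[1] - 1.96 * se[1], beta[1] + 1.96 * se[1])
print(t_stat, p_value, ci_95)
```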
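The F-statistic card, computed exactly as the restricted-vs-unrestricted recipe describes. The data are made up so that the null (Beta2 = Beta3 = 0, i.e. q = 2 restrictions) is true; note this SSR-based form of the F-statistic assumes homoskedasticity.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 300
X = rng.normal(size=(n, 3))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=n)   # H0: Beta2 = Beta3 = 0 holds

def ssr(Xmat, y):
    beta = np.linalg.lstsq(Xmat, y, rcond=None)[0]
    u_hat = y - Xmat @ beta
    return u_hat @ u_hat

ones = np.ones((n, 1))
SSR_un = ssr(np.hstack([ones, X]), y)          # unrestricted: all 3 regressors
SSR_rs = ssr(np.hstack([ones, X[:, :1]]), y)   # restricted: null imposed

q, k = 2, 3                                    # q restrictions, k regressors
F = ((SSR_rs - SSR_un) / q) / (SSR_un / (n - k - 1))
p_value = stats.f.sf(F, q, n - k - 1)          # compare to F(q, n-k-1)
print(F, p_value)
```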
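Where the three log-model interpretations come from - a sketch of the standard small-change approximation ln(X + dX) - ln(X) ≈ dX/X, not part of the deck itself:

```latex
\begin{align*}
\text{linear-log: } Y &= \beta_0 + \beta_1 \ln X
  & \Delta Y &\approx \beta_1 \tfrac{\Delta X}{X} = 0.01\,\beta_1
  \text{ for a 1\% rise in } X \\
\text{log-linear: } \ln Y &= \beta_0 + \beta_1 X
  & \tfrac{\Delta Y}{Y} &\approx \beta_1 \Delta X = \beta_1
  \;(\text{i.e. } 100\,\beta_1\%) \text{ for } \Delta X = 1 \\
\text{log-log: } \ln Y &= \beta_0 + \beta_1 \ln X
  & \beta_1 &= \tfrac{\Delta Y / Y}{\Delta X / X}
  \text{ (elasticity of } Y \text{ w.r.t. } X)
\end{align*}
```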