multiple regression - Q&A 1

Hi again on these first days of December!

As promised last time, several questions about the multiple linear regression described in my previous post still need answers. Let me start with:

How to determine whether there is a relationship between the response and the predictors?

To check that, we will use the F-statistic with the null hypothesis H0: β1 = β2 = … = βp = 0 (p being the number of predictors). The alternative hypothesis is that at least one of the coefficients is non-zero.

Hope you remember TSS, used in the R² statistic. The formula for F is: F = [(TSS-RSS)/p] / [RSS/(n-p-1)], where (!) p is the number of predictors and n is the number of observations in our sample. When do we reject the null hypothesis? If H0 is true, both the numerator and the denominator estimate the error variance, so F should be close to 1; if H0 is false, the numerator tends to be larger and F exceeds 1. When n is large, an F-statistic that is just a little larger than 1 might still provide evidence against H0. In contrast, a larger F-statistic is needed to reject H0 if n is small. As with the previously described statistic, we can also look at the p-value of the F-statistic and decide on that basis.
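To make the formula concrete, here is a minimal sketch in Python with NumPy. The data is made up for illustration, and the helper name f_statistic is my own, not something from a library. It fits a least-squares model with an intercept, computes TSS and RSS, and plugs them into the formula above, once for a response that really depends on the predictors and once for pure noise.

```python
import numpy as np

def f_statistic(X, y):
    """Overall F-statistic for a least-squares fit of y on X (intercept added here)."""
    n, p = X.shape                                # n observations, p predictors
    A = np.column_stack([np.ones(n), X])          # design matrix with intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # ordinary least squares fit
    rss = np.sum((y - A @ beta) ** 2)             # residual sum of squares
    tss = np.sum((y - y.mean()) ** 2)             # total sum of squares
    return ((tss - rss) / p) / (rss / (n - p - 1))

# Simulated data, assumptions for illustration only
rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))
y_related = 1.0 + X @ np.array([2.0, 0.0, -1.0]) + rng.normal(size=n)
y_noise = rng.normal(size=n)                      # no relationship with X at all

f_related = f_statistic(X, y_related)
f_noise = f_statistic(X, y_noise)
print("F with a real relationship:", f_related)
print("F with pure noise:", f_noise)
```

For the related response, F should come out far above 1, while for pure noise it should stay in the neighbourhood of 1. If SciPy is available, the matching p-value can be obtained with scipy.stats.f.sf(F, p, n - p - 1).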

In general, F-tests compare two variance estimates. In our case these are the variation explained by the model (per predictor) and the residual variation left in the data. When the errors are normally distributed and H0 is true, the F-statistic follows an F-distribution with p and n-p-1 degrees of freedom.

Okay, but we might want to check whether just a subset of the coefficients equals zero, that is, whether that subset of variables matters in our model. In that case we consider the null hypothesis for a subset of q coefficients, with the variables chosen for exclusion placed at the end: H0: βp-q+1 = βp-q+2 = … = βp = 0

Then we fit a second model that uses all the variables except those last q. Call the RSS of that model RSS0; the F-statistic is then given by: F = [(RSS0-RSS)/q] / [RSS/(n-p-1)].
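The subset test can be sketched the same way. Again, the data is simulated and the helper names rss and partial_f are hypothetical, chosen just for this example. The idea is to fit the full model and the reduced model (without the last q columns) and compare their residual sums of squares.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of a least-squares fit of y on X plus an intercept."""
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])          # design matrix with intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # ordinary least squares fit
    return float(np.sum((y - A @ beta) ** 2))

def partial_f(X, y, q):
    """F-statistic for H0: the coefficients of the last q columns of X are all zero."""
    n, p = X.shape
    rss_full = rss(X, y)                          # RSS of the model with all p predictors
    rss0 = rss(X[:, : p - q], y)                  # RSS0: model without the last q predictors
    return ((rss0 - rss_full) / q) / (rss_full / (n - p - 1))

# Simulated data, assumptions for illustration only
rng = np.random.default_rng(1)
n, p = 200, 4
X = rng.normal(size=(n, p))
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=n)  # last two predictors unused
y2 = y + 3.0 * X[:, 3]                                        # now the last predictor matters

f_irrelevant = partial_f(X, y, q=2)
f_relevant = partial_f(X, y2, q=2)
print("F when the last two predictors are irrelevant:", f_irrelevant)
print("F when one of them matters:", f_relevant)
```

Note that with q = p the reduced model contains only the intercept, so RSS0 equals TSS and this reduces to the overall F-statistic from before.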


That's all for now, wait for more!

hugs&kisses,

szarki9