multiple regression - Q&A 1 | first steps in DS ocean

Hi again on these first days of December!

As promised last time, there are several questions that need to be answered about the multiple linear regression described in my previous post. Let me start with:

How to determine whether there is a relationship between the response and the predictors?

To verify that, we will use the F-statistic with the null hypothesis H₀: β₁ = β₂ = … = β_p = 0, where p is the number of predictors. The alternative hypothesis is that at least one of the coefficients is non-zero.

I hope you remember TSS from the R² statistic, so the formula for F is as follows: F = [(TSS-RSS)/p] / [RSS/(n-p-1)], where (!) p is the number of predictors and n is the number of observations in our sample. When should we reject the null hypothesis? If there is no relationship between the response and the predictors, we expect F to be close to 1; if there is one, we expect F to be greater than 1. When n is large, an F-statistic that is just a little larger than 1 might still provide evidence to reject the null hypothesis. In contrast, a larger F-statistic is needed to reject H₀ if n is small. As with the previously described statistic, we can also look at the p-value and then decide what to do with the hypothesis.
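To see how the sample size changes the picture, here is a minimal sketch of the formula above in Python. The function name and the TSS/RSS numbers are made up purely for illustration; they do not come from any real fit.

```python
def f_statistic(tss, rss, n, p):
    """Overall F-statistic: [(TSS - RSS) / p] / [RSS / (n - p - 1)]."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than p + 1")
    return ((tss - rss) / p) / (rss / (n - p - 1))

# Made-up numbers: the same TSS and RSS, only the sample size differs.
f_small_n = f_statistic(tss=100.0, rss=80.0, n=20, p=3)    # about 1.33
f_large_n = f_statistic(tss=100.0, rss=80.0, n=2000, p=3)  # about 166.33

print(f_small_n, f_large_n)
```

With the same share of explained variation, the statistic grows with n, which is why a fixed value of F means something different for small and large samples.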

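In general, F-tests are used for comparing two variances. In our case, we compare the variability explained by the model (per predictor) with the residual variability left in the data (per degree of freedom). When the errors are normally distributed and H₀ is true, the F-statistic follows an F-distribution with p and n-p-1 degrees of freedom.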
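If you want to turn an F value into a p-value without any extra libraries, one option is to simulate the F-distribution directly. The sketch below is only an approximation under the normal-errors assumption, and the function name, the seed and the example values (F of about 1.33 with p = 3, n = 20, and F of about 166.33 with p = 3, n = 2000) are my own made-up illustration. In practice a library routine such as `scipy.stats.f.sf` would give the exact tail probability.

```python
import random

def simulated_f_p_value(f_observed, df1, df2, n_sims=20000, seed=0):
    """Monte Carlo estimate of P(F >= f_observed) for an F(df1, df2) variable."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_sims):
        # A chi-square variable with k degrees of freedom is Gamma(shape=k/2, scale=2).
        numerator = rng.gammavariate(df1 / 2, 2) / df1
        denominator = rng.gammavariate(df2 / 2, 2) / df2
        if numerator / denominator >= f_observed:
            exceed += 1
    return exceed / n_sims

# Degrees of freedom are p and n - p - 1.
p_small_n = simulated_f_p_value(4 / 3, df1=3, df2=20 - 3 - 1)
p_large_n = simulated_f_p_value(166.33, df1=3, df2=2000 - 3 - 1)

print(p_small_n, p_large_n)
```

The first p-value should come out far too large to reject H₀, while the second should be practically zero.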

Okay, but we might want to check whether just a subset of the coefficients equals zero, that is, whether just that subset matters in our model. In that case we will consider the null hypothesis for a subset of q coefficients, where the variables chosen for exclusion are put at the end: H₀: β_{p-q+1} = β_{p-q+2} = … = β_p = 0

Then we fit a second model that uses all the variables except those last q. The RSS of that smaller model is denoted RSS₀, and the F-statistic is given by: F = [(RSS₀-RSS)/q] / [RSS/(n-p-1)].
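The partial F-statistic can be sketched the same way. Again, the function name and all the numbers are hypothetical and only show how the formula behaves.

```python
def partial_f_statistic(rss0, rss, n, p, q):
    """Partial F-statistic: [(RSS0 - RSS) / q] / [RSS / (n - p - 1)].

    rss0 -- RSS of the smaller model (without the last q predictors)
    rss  -- RSS of the full model with all p predictors
    """
    if not 0 < q <= p:
        raise ValueError("q must be between 1 and p")
    if n - p - 1 <= 0:
        raise ValueError("need more observations than p + 1")
    return ((rss0 - rss) / q) / (rss / (n - p - 1))

# Made-up numbers: dropping the last 2 of 5 predictors raises RSS from 80 to 90.
f_partial = partial_f_statistic(rss0=90.0, rss=80.0, n=100, p=5, q=2)  # 5.875

# With q = p the smaller model has no predictors at all, so RSS0 equals TSS
# and the formula is the same as the overall F-statistic.
f_overall = partial_f_statistic(rss0=100.0, rss=80.0, n=20, p=3, q=3)

print(f_partial, f_overall)
```

So the overall F-test from the beginning of this post is just the special case where we drop all p predictors at once.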

Todo, wait for more!

hugs&kisses,

szarki9
