## Multiple regression Biostatistics and Research Methodology Theory

Multiple regression analysis is used to see if there is a statistically significant relationship between a dependent variable and a set of independent variables. It’s used to find trends in those sets of data.

Multiple regression analysis is *almost* the same as simple linear regression. The only difference between simple linear regression and multiple regression is in the number of predictors (“x” variables) used in the regression.

- Simple regression analysis uses a single x variable for each dependent “y” variable. For example: $(x_1, Y_1)$.
- Multiple regression uses multiple “x” variables for each dependent “y” variable: $((x_1)_1, (x_2)_1, (x_3)_1, Y_1)$.

In one-variable linear regression, you would regress one dependent variable (e.g. “profit”) on one independent variable (e.g. “sales”). But you might be interested in how **different types** of sales affect the regression. You could set your X_{1} as one type of sales, your X_{2} as another type of sales and so on.

From this graph, it might appear there is a relationship between the life expectancy of women and the number of doctors in the population. In fact, that’s probably true and you could say it’s a simple fix: put more doctors into the population to increase life expectancy. But the reality is you would have to look at other factors like the possibility that doctors in rural areas might have less education or experience. Or perhaps they have a lack of access to medical facilities like trauma centres.

Accounting for those extra factors means adding more independent variables to your regression analysis, which turns it into a multiple regression model.

## Multiple Regression Analysis Output

Regression analysis is almost always performed in software, such as Excel or SPSS. The output differs according to how many variables you have, but it’s essentially the same type of output you would find in a simple linear regression. There’s just more of it:

- Simple regression: $Y = b_0 + b_1 x$.
- Multiple regression: $Y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$.
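To make the multiple regression equation concrete, here is a minimal sketch of estimating the coefficients by ordinary least squares with NumPy. The data are hypothetical and noise-free, made up purely for illustration, and the names `x1`, `x2`, `X` and `coeffs` are assumptions that do not come from any particular package's output.

```python
import numpy as np

# Hypothetical, noise-free data generated from y = 2 + 3*x1 - 1*x2
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = 2.0 + 3.0 * x1 - 1.0 * x2

# Design matrix: a column of ones for the intercept b0,
# followed by one column per predictor
X = np.column_stack([np.ones_like(x1), x1, x2])

# Ordinary least squares: choose b to minimise the sum of squared residuals
coeffs, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = coeffs

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
```

Because the data contain no noise, the fit should recover the intercept and slopes used to generate `y`. With real data the estimates would only approximate the underlying relationship.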

The output would include a summary, similar to a summary for simple linear regression, that includes:

- R (the multiple correlation coefficient),
- R-squared (the coefficient of determination),
- adjusted R-squared,
- the standard error of the estimate.

These statistics help you figure out how well a regression model fits the data. The ANOVA table in the output would give you the F-statistic and its p-value, which test whether the model as a whole is statistically significant.
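As a rough illustration of where these summary numbers come from, the sketch below computes them by hand with NumPy using the standard textbook formulas. The function name `regression_summary` and the data are hypothetical, and statistical packages may label or round the values differently.

```python
import numpy as np

def regression_summary(X, y):
    """Fit y on the columns of X (no intercept column) and return fit statistics."""
    n, p = X.shape                              # n observations, p predictors
    A = np.column_stack([np.ones(n), X])        # add the intercept column
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ coeffs

    ss_res = np.sum((y - fitted) ** 2)          # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)        # total sum of squares
    ss_reg = ss_tot - ss_res                    # regression sum of squares

    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    std_error = np.sqrt(ss_res / (n - p - 1))   # standard error of the estimate
    f_stat = (ss_reg / p) / (ss_res / (n - p - 1))

    return {"R": np.sqrt(r2), "R2": r2, "adj_R2": adj_r2,
            "std_error": std_error, "F": f_stat}

# Hypothetical data: two predictors and a noisy response
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([3.5, 6.7, 7.2, 10.6, 11.1, 14.8])

summary = regression_summary(np.column_stack([x1, x2]), y)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

The p-value reported in the ANOVA table is the upper-tail probability of the F-statistic under an F distribution with p and n - p - 1 degrees of freedom. If SciPy is available, `scipy.stats.f.sf(F, p, n - p - 1)` is one way to obtain it.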
