Regression: Biostatistics and Research Methodology Theory, Notes, PDF, Books
Regression analysis is a way of mathematically sorting out which of those variables does indeed have an impact. It answers the questions: Which factors matter most? Which can we ignore? How do those factors interact with each other? And, perhaps most importantly, how certain are we about all of these factors?
In regression analysis, those factors are called variables. You have your dependent variable— the main factor that you’re trying to understand or predict. In Redman’s example above, the dependent variable is monthly sales. And then you have your independent variables — the factors you suspect have an impact on your dependent variable.
In statistics, it’s hard to stare at a set of random numbers in a table and try to make any sense of it. For example, global warming may be reducing average snowfall in your town and you are asked to predict how much snow you think will fall this year. Looking at the following table you might guess somewhere around 10-20 inches. That’s a good guess, but you could make a better guess, by using regression.
Essentially, regression is the “best guess” at using a set of data to make some kind of prediction. It’s fitting a set of points to a graph. There’s a whole host of tools that can run the regression for you, including Excel, which I used here to help make sense of that snowfall data:
Just by looking at the regression line running down through the data, you can fine tune your best guess a bit. You can see that the original guess (20 inches or so) was way off. For 2015, it looks like the line will be somewhere between 5 and 10 inches! That might be “good enough”, but regression also gives you a useful equation, which for this chart is:
y = -2.2923x + 4624.4.
What that means is you can plug in an x value (the year) and get a pretty good estimate of snowfall for any year. For example, 2005:
y = -2.2923(2005) + 4624.4 = 28.3385 inches, which is pretty close to the actual figure of 30 inches for that year.
Best of all, you can use the equation to make predictions. For example, how much snow will fall in 2017?
y = 2.2923(2017) + 4624.4 = 0.8 inches.
Regression also gives you an R squared value, which for this graph is 0.702. This number tells you how good your model is. The values range from 0 to 1, with 0 being a terrible model and 1 being a perfect model. As you can probably see, 0.7 is a fairly decent model so you can be fairly confident in your weather prediction!