RBJLabs ®

Regression analysis

Regression analysis is an interesting topic whose usefulness is often overlooked, even though statistics is a very useful tool for modeling equations. In regression analysis we study the results of an experiment, so remember that the ultimate goal of an experiment is to predict the behavior of a certain phenomenon, which can be represented in a regression graph.

When performing an experiment, we obtain pairs of data (x, y). Plotted on a Cartesian plane, these points form a scatter diagram.

The points may lie close to a curve that describes the behavior of the phenomenon. This curve can be of many types, such as a parabola, an exponential curve, or a geometric curve.

Simple linear regression

Simple linear regression uses one independent variable (x) and one dependent variable (y) for a population.

Suppose that the relationship between the two is a straight line. The line can then be written in the following way:

y = \beta_{0} + \beta_{1}x

The results of an experiment are random; that is, we cannot predict exactly what value of the dependent variable will be measured for a given value of the independent variable. Because we work with a sample obtained from an experiment, the coefficients computed from the data are only estimates of \beta_{0} and \beta_{1}, which gives the following estimated line:

\hat{y} =\hat{\beta}_{0} + \hat{\beta}_{1}x

How to calculate the regression coefficients

\hat{y} is the value predicted by the regression line, and it equals \hat{\beta}_{0} plus \hat{\beta}_{1} x. The values \hat{\beta}_{0} and \hat{\beta}_{1} are called the regression coefficients, and we give you the formulas to calculate them right away. Don't worry: deriving the formulas is a little tedious, but using them is not:

\hat{\beta}_{1} = \cfrac{n\sum xy - \left[ \left( \sum x\right)\left(\sum y \right)\right]}{n\sum x^{2} - \left( \sum x \right)^{2}}

\hat{\beta}_{0} = \bar{y} - \hat{\beta}_{1}\bar{x}
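The two formulas above can be sketched in Python as follows. This is only a minimal illustration using the standard library: the function name `fit_line` and its argument names are assumptions chosen for this example, not part of any library.

```python
def fit_line(xs, ys):
    """Return (beta0_hat, beta1_hat) for the least squares line y = beta0 + beta1 * x.

    `fit_line`, `xs` and `ys` are hypothetical names used only in this sketch.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")

    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # Denominator of the slope formula: n * sum(x^2) - (sum(x))^2
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are equal, so the slope is undefined")

    beta1 = (n * sum_xy - sum_x * sum_y) / denominator
    beta0 = sum_y / n - beta1 * (sum_x / n)  # y_bar - beta1 * x_bar
    return beta0, beta1


# Points that lie exactly on y = 1 + 2x should give back those coefficients.
b0, b1 = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(b0, b1)
```

The guard on the denominator covers the case where every x is the same, in which a vertical cloud of points has no defined slope.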

Some considerations about the regression line

  • Do not use the least squares line when the relationship in the data is not linear.
  • Estimates are not the same as the true values.
  • A relationship between two variables does not imply that one causes the other.

Example of regression analysis

Let’s start with the statement:

In this example, the inertial weight (in tons) and the fuel economy (in miles/gallon) were measured for a sample of seven diesel trucks. Predict how much the mileage of two trucks differs if their weights differ by 5 tons. The following table presents the results:

\begin{array}{|c|c|}
\hline
\text{Weight }(x) & \text{Mileage } (y) \\
\hline
8\text{.}00 & 7\text{.}69 \\
24\text{.}50 & 4\text{.}97 \\
27\text{.}00 & 4\text{.}56 \\
14\text{.}50 & 6\text{.}49 \\
28\text{.}50 & 4\text{.}34 \\
12\text{.}75 & 6\text{.}24 \\
21\text{.}25 & 4\text{.}45 \\
\hline
\end{array}

Here the weight is the independent variable x and the mileage is the dependent variable y. To calculate the regression line we need the quantities that appear in the formulas for the regression coefficients: \sum x, \sum y, \sum xy, \sum x^{2} and \left(\sum x\right)^{2}.

Let’s first determine the sums of x and y:

\begin{array}{| c | c |}
\hline
\text{Weight } (x) & \text{Mileage } (y) \\
\hline
8\text{.}00 & 7\text{.}69 \\
24\text{.}50 & 4\text{.}97 \\
27\text{.}00 & 4\text{.}56 \\
14\text{.}50 & 6\text{.}49 \\
28\text{.}50 & 4\text{.}34 \\
12\text{.}75 & 6\text{.}24 \\
21\text{.}25 & 4\text{.}45 \\
\hline
\sum x = 136\text{.}5 & \sum y = 38\text{.}74 \\
\hline
\end{array}

Now let’s multiply each x by its y and add up the products:

\begin{array}{| c | c | c |}
\hline
\text{Weight }(x) & \text{Mileage } (y) & xy \\
\hline
8\text{.}00 & 7\text{.}69 & 61\text{.}52\\
24\text{.}50 & 4\text{.}97 & 121\text{.}765\\
27\text{.}00 & 4\text{.}56 & 123\text{.}12 \\
14\text{.}50 & 6\text{.}49 & 94\text{.}105 \\
28\text{.}50 & 4\text{.}34 & 123\text{.}69 \\
12\text{.}75 & 6\text{.}24 & 79\text{.}56 \\
21\text{.}25 & 4\text{.}45 & 94\text{.}5625 \\
\hline
& & \sum xy = 698\text{.}3225 \\
\hline
\end{array}

Then we calculate the sum of the squared x values:

\begin{array}{| c | c | c |}
\hline
\text{Weight }(x) & \text{Mileage }(y) & x^{2} \\
\hline
8\text{.}00 & 7\text{.}69 & 64 \\
24\text{.}50 & 4\text{.}97 & 600\text{.}25 \\
27\text{.}00 & 4\text{.}56 & 729 \\
14\text{.}50 & 6\text{.}49 & 210\text{.}25 \\
28\text{.}50 & 4\text{.}34 & 812\text{.}25 \\
12\text{.}75 & 6\text{.}24 & 162\text{.}5625 \\
21\text{.}25 & 4\text{.}45 & 451\text{.}5625 \\
\hline
& & \sum x^{2} = 3029\text{.}875 \\
\hline
\end{array}

Next we calculate the sum of the squared y values (the coefficient formulas above do not use it, but it completes the table of sums):

\begin{array}{| c | c | c |}
\hline
\text{Weight }(x) & \text{Mileage }(y) & y^{2} \\
\hline
8\text{.}00 & 7\text{.}69 & 59\text{.}1361 \\
24\text{.}50 & 4\text{.}97 & 24\text{.}7009 \\
27\text{.}00 & 4\text{.}56 & 20\text{.}7936 \\
14\text{.}50 & 6\text{.}49 & 42\text{.}1201 \\
28\text{.}50 & 4\text{.}34 & 18\text{.}8356 \\
12\text{.}75 & 6\text{.}24 & 38\text{.}9376 \\
21\text{.}25 & 4\text{.}45 & 19\text{.}8025 \\
\hline
& & \sum y^{2} = 224\text{.}3264 \\
\hline
\end{array}
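The column sums in the tables above can be double-checked with a few lines of Python. This is a small sketch using only the data from the statement; the variable names (`weights`, `mileages`, `sum_xy`, and so on) are our own choices for this example.

```python
# Data from the statement: weight in tons (x) and mileage in miles/gallon (y).
weights = [8.00, 24.50, 27.00, 14.50, 28.50, 12.75, 21.25]
mileages = [7.69, 4.97, 4.56, 6.49, 4.34, 6.24, 4.45]

sum_x = sum(weights)
sum_y = sum(mileages)
sum_xy = sum(x * y for x, y in zip(weights, mileages))
sum_x2 = sum(x ** 2 for x in weights)
sum_y2 = sum(y ** 2 for y in mileages)

# Print with four decimals to hide floating-point noise in the last digits.
for name, value in [("sum x", sum_x), ("sum y", sum_y), ("sum xy", sum_xy),
                    ("sum x^2", sum_x2), ("sum y^2", sum_y2)]:
    print(f"{name} = {value:.4f}")
```

The printed values should match the last row of each table.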

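We also need the mean of x, the mean of y, and the square of the sum of x:

\bar{y} = \cfrac{\sum y}{n} = \cfrac{38.74}{7} = 5.5343

\bar{x} = \cfrac{\sum x}{n} = \cfrac{136.5}{7} = 19.5

\left(\sum x\right)^{2} = (136.5)^{2} = 18632.25

Finally, substitute all the values into the formulas for the regression coefficients:

\hat{\beta}_{1} = \cfrac{7(698.3225)-[(136.5)(38.74)]}{7(3029.875) - 18632.25} = \cfrac{-399.7525}{2576.875} = - 0.1551

\hat{\beta}_{0} = \bar{y} - \hat{\beta}_{1}\bar{x} = 5.5343 - ( - 0.1551)(19.5) \approx 8.5593

The value 8.5593 is obtained by carrying the unrounded \bar{y} and \hat{\beta}_{1} through the calculation; substituting the rounded values shown gives a result that agrees to three decimals (8.559).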

Our regression line will be as follows:

\hat{y} = 8.5593 - 0.1551x
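As a cross-check on the hand calculation, the sketch below computes the coefficients for the truck data in two ways: with the sums formula used in this article, and with the algebraically equivalent form based on deviations from the means. The variable names are assumptions made for this example, and only the Python standard library is used.

```python
# Data from the statement: weight in tons (x) and mileage in miles/gallon (y).
weights = [8.00, 24.50, 27.00, 14.50, 28.50, 12.75, 21.25]
mileages = [7.69, 4.97, 4.56, 6.49, 4.34, 6.24, 4.45]
n = len(weights)

# Formula used in the text, built from the raw sums.
sum_x, sum_y = sum(weights), sum(mileages)
sum_xy = sum(x * y for x, y in zip(weights, mileages))
sum_x2 = sum(x * x for x in weights)
beta1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
beta0 = sum_y / n - beta1 * (sum_x / n)

# Equivalent form: sum of products of deviations over sum of squared deviations.
x_bar, y_bar = sum_x / n, sum_y / n
beta1_centered = (
    sum((x - x_bar) * (y - y_bar) for x, y in zip(weights, mileages))
    / sum((x - x_bar) ** 2 for x in weights)
)

print(f"beta1 = {beta1:.4f}, beta0 = {beta0:.4f}")
print(f"beta1 (centered form) = {beta1_centered:.4f}")
```

Both slope values should agree with each other and, after rounding to four decimals, with the regression line found by hand.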

Predicting the mileage difference between two trucks that differ by 5 tons

To answer this part of the exercise, take any two values of x that differ by 5 tons and substitute them into the equation of the line that was found. Here we take x = 9 and x = 14:

x = 9

\hat{y} = 8.5593 - 0.1551(9) = 7.1634

x = 14

\hat{y} = 8.5593 - 0.1551(14) = 6.3879

Then take the difference of the two values obtained:

7.1634 - 6.3879 = 0.7755

So the mileage is predicted to differ by about 0.78 miles/gallon when the trucks differ by 5 tons, with the heavier truck getting the lower mileage.
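The choice of x = 9 and x = 14 is arbitrary. As a short check (writing \Delta x, a notation introduced only for this note, for the difference in weight), the predicted difference depends only on the slope, because the intercept cancels:

\hat{y}(x + \Delta x) - \hat{y}(x) = \left[ \hat{\beta}_{0} + \hat{\beta}_{1}(x + \Delta x) \right] - \left[ \hat{\beta}_{0} + \hat{\beta}_{1}x \right] = \hat{\beta}_{1}\,\Delta x

With \hat{\beta}_{1} = -0.1551 and \Delta x = 5 this gives

\hat{\beta}_{1}\,\Delta x = (-0.1551)(5) \approx -0.78

so the heavier truck is predicted to get about 0.78 miles/gallon less, which agrees with the result above up to rounding.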

Thank you for being at this moment with us:)