You are here: CTSPedia>CTSpedia Web>ResearchTopics>LinearRegression (01 May 2010, ErinEsp)EditAttach

- How much does a chosen variable contribute to reducing the remaining variation in Y after allowance is made for the contributions of other predictor variables that have been included in the model.
- The importance of the variable as a causal agen in the process under analysis.
- The degree to which observations on the variable can be obtained more accurately, or quickly, or economically than on other variables.
- The degree to which the variable can be controlled.

- The mean of the probability distribution of the error term is 0.
- The variance of the probability distribution of the error term is constant for all values of the predictor X.
- The probability distribution of the error term is normal.
- The values of the error term associated with any two observed values of y are independent. That is, the value of the error term associated with one value of Y has no effect on any of the values of the error associated with any other Y value.

First we want to determine whether or not a linear regression function is appropriate for the data being analyzed. This assumption can be studied from a plot of the residuals (y axis) versus the predictor variable (x axis) or, equivalently, from a plot of the residuals versus the fitted values. When we plot the residuals versus the predictor variable we are looking for an even spread around the line Y = 0, this would satisfy the assumption that mean of the error terms is 0. In the same plot if the residuals appear to have an even spread around the line Y = 0 then this would indicate a constant variance which would satisfy another of our assumptions. Lastly, if the residuals fall along the line in a normal probability plot then this would validate our assumption of normality.

The goal is to estimate the regression function based on the above data. In order to determine the type of relationship between X and Y we should first look at a scatter plot. Consider the following plot:

From the above plot we see that there appears to be an upward sloping linear trend between the number of times the carton was transferred and the number of ampules that were broken. Hence to obtain the estimated regression function, we will use a simple linear regression model of the form:

To interpret this model we say that as number of times the cartons are transfered from one aircraft to another increases by 1, on average the number of ampules broken increases by 4. Also if the cartons are never transfered to a different aircraft we can estimate that approximately 10 ampules will be broken.

Next we want to check the model assumptions in order to verify the above interpretation of the model. Let's first consider a plot of the residual values versus the fitted values, given below.

From the above plot we can see that the residuals seem to be evenly clustered around zero and the spread of the residuals appears to be constant. In order to be more confident in this conclusion we would prefer to have more data observations, however based on the scatter plot and the above residual plot it appears that a simple linear model is a good fit for this particular applications. Next let's consider a normal probability plot of the residuals.

Since the points appear to follow the line in the normal probability plot we can conclude that our assumption of normality is reasonable.

-- ErinEsp - 30 Apr 2010

Topic revision: r11 - 01 May 2010 - 01:08:09 - ErinEsp

Copyright &© by the contributing authors. All material on this collaboration platform is the property of the contributing authors.

Ideas, requests, problems regarding CTSPedia? Send feedback

Ideas, requests, problems regarding CTSPedia? Send feedback