James is a biologist who is currently researching beluga whales. Currently he is interested in the relationship between the velocity at which the beluga whale swims and the tail-beat frequency of the whale. James took a sample of 20 whales and measured the swimming velocity, in units of body-lengths per second, and tail-beat frequency, in units of hertz. The following table shows the observed values for each variable.

James wants to visually determine what type of relationship (if any) exists between the swimming velocity and the tail-beat frequency. A scatter plot, which is simply a plot of the predictor variable on the x-axis and the response variable on the y-axis, is typically used for this purpose. James believes that the more tail-beats a whale has the faster that whale will swim. Thus the predictor variable in James' study is the tail-beat frequency and the response variable is the swimming velocity. A scatter plot of the relationship between the two variables is provided below.

From the above scatter plot, James noticed that there is an observation with an unusually large velocity compared to the others. He wants to know if this observation is an outlier and, if so, what his options are for dealing with it.

James thinks that his observed velocity of 2.0 body-lengths per second may be an outlier. One way to check is the z-score, which measures how many standard deviations an observation lies from the mean. To calculate it, James first needs the mean and standard deviation of the observed velocities. He calculates the mean as 0.6549541 and the standard deviation as 0.4165770. Thus the z-score when x = 2.0 is z = (2 - 0.6549541)/0.4165770 = 3.228805. Since this value is greater than 3, James will consider this observation an outlier.
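The z-score arithmetic above can be checked with a few lines of Python. This is a minimal sketch: the mean and standard deviation are the values reported in the text, while the function and variable names are illustrative choices, not part of the original analysis.

```python
# Summary statistics reported in the text for the 20 observed velocities.
mean_velocity = 0.6549541
sd_velocity = 0.4165770


def z_score(value, mean, sd):
    """Number of standard deviations that `value` lies from `mean`."""
    return (value - mean) / sd


z = z_score(2.0, mean_velocity, sd_velocity)

# Rule of thumb used in the text: flag the observation if |z| exceeds 3.
is_outlier = abs(z) > 3

print(f"z = {z:.6f}, flagged as outlier: {is_outlier}")
```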

To determine whether his observation is an outlier by the interquartile range (IQR) criterion, James needs to calculate the quartiles, the interquartile range, and the fences.

Lower Quartile = 0.3984

Upper Quartile = 0.7647

IQR = 0.7647 - 0.3984 = 0.3663

Lower Inner Fence = 0.3984 - 1.5(0.3663) = -0.15105

Upper Inner Fence = 0.7647 + 1.5(0.3663) = 1.31415

Lower Outer Fence = 0.3984 - 3(0.3663) = -0.7005

Upper Outer Fence = 0.7647 + 3(0.3663) = 1.8636

From these values, James notices that the velocity of 2.0 lies not only outside the inner fences, implying that it is an outlier, but also outside the outer fences. Hence he concludes that his observation is an extreme outlier.
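The fence calculation can be sketched in Python as follows. The quartiles are the values given in the text; the function names and the label strings are assumptions made for illustration only.

```python
def fences(q1, q3):
    """Return the IQR plus the inner (1.5 * IQR) and outer (3 * IQR) fences."""
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)
    return iqr, inner, outer


def classify(value, inner, outer):
    """Label a value by which pair of fences it falls outside of."""
    if value < outer[0] or value > outer[1]:
        return "extreme outlier"
    if value < inner[0] or value > inner[1]:
        return "outlier"
    return "not an outlier"


# Quartiles reported in the text for the observed velocities.
iqr, inner, outer = fences(0.3984, 0.7647)
label = classify(2.0, inner, outer)

print(f"IQR = {iqr:.4f}")
print(f"inner fences = ({inner[0]:.5f}, {inner[1]:.5f})")
print(f"outer fences = ({outer[0]:.4f}, {outer[1]:.4f})")
print(f"velocity 2.0 is classified as: {label}")
```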

James has no evidence that the observation is a mistake or comes from a different population, and thus he will keep the observation for his analysis.

To interpret the fitted model, we say that as the tail-beat frequency increases by one hertz, the velocity increases on average by 1.27341 body-lengths per second. The fitted model can be highly distorted by outliers, so James is interested in seeing how much his outlier affects the fit. An observation is considered influential if its exclusion causes major changes in the fitted regression function. James will use two different measures to determine if his outlier is influential.

- Influence on Single Fitted Value -- DFITS
- Influence on All Fitted Values -- Cook's Distance
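As a rough illustration of the first measure, the sketch below computes DFITS (often written DFFITS) for a simple linear regression in plain Python. The data are hypothetical values invented for this example, not James's measurements, and the function name is an assumption. The formula used is the standard one: the externally studentized residual multiplied by sqrt(h / (1 - h)), where h is the leverage of the observation.

```python
import math


def dffits_values(x, y):
    """DFFITS for every observation in a simple linear regression of y on x."""
    n = len(x)
    p = 2  # parameters in the model: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    # Leverage of each point in simple linear regression.
    leverages = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

    result = []
    for e, h in zip(residuals, leverages):
        # Externally studentized (deleted) residual.
        t = e * math.sqrt((n - p - 1) / (sse * (1 - h) - e ** 2))
        result.append(t * math.sqrt(h / (1 - h)))
    return result


# Hypothetical data: tail-beat frequency (Hz) and velocity (body-lengths/s).
freq = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.9]
velocity = [0.35, 0.42, 0.40, 0.55, 0.60, 0.62, 2.0]

dffits = dffits_values(freq, velocity)
for f, v, d in zip(freq, velocity, dffits):
    print(f"({f}, {v}): DFFITS = {d:.3f}")
```

A commonly cited guideline treats an observation as influential when the absolute DFFITS exceeds 1 for small to medium data sets, or 2 * sqrt(p / n) for large ones.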

To measure the influence of the observation (1.9, 2.0) on all the fitted values, James calculates Cook's Distance. This calculation is more computationally intensive, so only the calculated value is given here. James finds the Cook's Distance of this observation to be 3.789618, which is greater than 1. Hence the observation (1.9, 2.0) has high influence.
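For readers who want to see how such a value is obtained, here is a minimal sketch of Cook's Distance for a simple linear regression. It uses hypothetical data invented for this example, so it does not reproduce the value 3.789618 above. The function names are assumptions. The formula is D_i = (e_i^2 / (p * MSE)) * h_i / (1 - h_i)^2, with e_i the residual, h_i the leverage, and p the number of regression parameters.

```python
def fit_line(x, y):
    """Least-squares intercept and slope, plus the leverage of each point."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    leverages = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    return intercept, slope, leverages


def cooks_distances(x, y):
    """Cook's Distance for every observation in a regression of y on x."""
    n = len(x)
    p = 2  # parameters in the model: intercept and slope
    intercept, slope, leverages = fit_line(x, y)
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - p)
    return [
        (e ** 2 / (p * mse)) * h / (1 - h) ** 2
        for e, h in zip(residuals, leverages)
    ]


# Hypothetical data: tail-beat frequency (Hz) and velocity (body-lengths/s).
freq = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.9]
velocity = [0.35, 0.42, 0.40, 0.55, 0.60, 0.62, 2.0]

distances = cooks_distances(freq, velocity)
for f, v, d in zip(freq, velocity, distances):
    flag = "high influence" if d > 1 else ""
    print(f"({f}, {v}): D = {d:.3f} {flag}")
```

In this made-up data set the last point sits far from the other frequencies, so its leverage is large and its Cook's Distance exceeds the cutoff of 1 even though its residual is small.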

Based on the two measures of influence, James has concluded that the outlier has a high influence on his regression model. Because of this, and since James is unsure whether the observation is a mistake or comes from a different population, any inference he makes may be highly inaccurate.

A possible alternative approach would be to use a nonparametric method to model the relationship between the number of tail-beats and the swimming velocity of the beluga whale.
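The text does not say which nonparametric method James would use. As one possible illustration only, the sketch below implements the Theil-Sen estimator, which takes the median of all pairwise slopes and is therefore much less sensitive to a single extreme point than least squares. The data and function name are hypothetical.

```python
from itertools import combinations
from statistics import median


def theil_sen(x, y):
    """Theil-Sen line: median pairwise slope, then median-based intercept."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]  # skip pairs with identical x to avoid dividing by zero
    ]
    slope = median(slopes)
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return intercept, slope


# Hypothetical data: four points on the line y = 2x + 1 and one gross outlier.
x_demo = [1, 2, 3, 4, 5]
y_demo = [3, 5, 7, 9, 30]

intercept, slope = theil_sen(x_demo, y_demo)
print(f"Theil-Sen fit: y = {intercept:.2f} + {slope:.2f} x")
```

Here the median-based fit follows the four well-behaved points, whereas a least-squares line would be pulled toward the outlier.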

-- ErinEsp - 01 May 2010

Topic revision: r15 - 07 Sep 2011 - 17:03:26 - MaryBanach
