Top Twelve Tip #2
Treat Outliers Like Children: correct them when needed, but never throw them out
Many scientists run outlier tests such as Grubbs’ or Rosner’s test to decide whether an observation is an outlier, and then toss the observation away if it is. But outlier tests determine only whether an observation is likely to have been generated from a normal distribution. Most field data in environmental sciences are skewed and do not look like a normal distribution. The simple fact that most data collected in the field have a lower bound of zero introduces skewness, just as in the data here. There is no reason to expect such data to look like a normal distribution. Concluding that the top one or few observations do not come from a normal distribution is no reason to label those observations ‘bad’ and toss them away. They probably cost a great deal of time and money to collect, and are the product of that scientist’s good work.
A box plot of 23 observations shows one ‘outlier’ by the Dixon test.
If logarithms or cube roots of the data were taken, the top observation would not test as a significant outlier. Can an observation be ‘bad’ in one measurement scale but ‘good’ in another? In skewed data, the top observations often test as not coming from a normal distribution. For environmental science, that’s ‘normal’. Don’t be quick to toss your data away and, most importantly, base that decision on science (for example, where and when the data were collected) rather than on a statistical test.
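To make the scale-dependence concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the 23 values in `conc` are made up (they are not the data in the box plot), the test used is a one-sided Grubbs test for the maximum rather than the Dixon test, and the helper names are hypothetical. The critical value comes from the standard Grubbs formula, with the t quantile computed numerically so that only the standard library is needed.

```python
import math
import statistics


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for Student's t with df degrees of freedom (t >= 0), via Simpson's rule."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = t / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3


def t_quantile_upper(p, df):
    """Value t with P(T > t) = p, found by bisection on the tail probability."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > p:
            lo = mid  # tail still too heavy: move right
        else:
            hi = mid
    return (lo + hi) / 2


def grubbs_max(values, alpha=0.05):
    """One-sided Grubbs statistic for the largest value, and its critical value."""
    n = len(values)
    g = (max(values) - statistics.mean(values)) / statistics.stdev(values)
    t = t_quantile_upper(alpha / n, n - 2)
    g_crit = (n - 1) / math.sqrt(n) * math.sqrt(t * t / (n - 2 + t * t))
    return g, g_crit


# Hypothetical right-skewed field data, bounded below by zero (illustration only).
conc = [1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 6, 7, 8, 9, 10, 12, 14, 17, 21, 50]

g_raw, g_crit = grubbs_max(conc)
g_log, _ = grubbs_max([math.log(x) for x in conc])

print(f"critical value (n={len(conc)}, alpha=0.05): {g_crit:.3f}")
print(f"raw scale: G = {g_raw:.3f}, flagged = {g_raw > g_crit}")
print(f"log scale: G = {g_log:.3f}, flagged = {g_log > g_crit}")
```

With these made-up numbers, the largest value exceeds the critical value in the original units but not after taking logarithms. It is the same observation both times; only the measurement scale, and so its distance from a normal shape, has changed.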
There is no test for ‘badness’ in statistics.