Top Twelve Tip #6
Understand the assumptions behind your methods
Correlation is seen as answering a general question: "Is there a systematic relation between these two variables?" The most commonly used correlation coefficient is Pearson's r. Pearson’s coefficient should be remembered as “Pearson’s LINEAR correlation coefficient”, because it measures the strength of a LINEAR association between two variables. There are other types of correlation, and other coefficients to measure them. Spearman’s rho and Kendall’s tau measure monotonic correlation, where the two variables head in the same direction but not necessarily at a constant rate. The graph on the left below shows two variables generated to have a linear relationship. All three coefficients do an outstanding job of detecting the linear correlation in that graph. Note: Kendall’s tau is measured on a different scale, approximately 0.15 to 0.20 below the other two coefficients for the same strength of correlation (like Centigrade vs Fahrenheit).
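For reference, the textbook definitions of the three coefficients are sketched below. Kendall's coefficient is shown in its simplest form (tau-a, which does not adjust for ties), and Spearman's shortcut formula assumes there are no ties. The last two lines use a known result that holds only when the data are bivariate normal with (population) Pearson correlation r; it is one way to see why tau sits on a lower scale.

```latex
% Pearson's r: strength of the LINEAR association
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}

% Spearman's rho: Pearson's r applied to the ranks. With no ties,
% d_i = \operatorname{rank}(x_i) - \operatorname{rank}(y_i) and
\rho_S = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}

% Kendall's tau (tau-a): n_c concordant pairs, n_d discordant pairs
\tau = \frac{n_c - n_d}{n(n-1)/2}

% Bivariate normal data only:
\tau = \frac{2}{\pi}\arcsin(r), \qquad
\rho_S = \frac{6}{\pi}\arcsin\!\left(\frac{r}{2}\right)

% Example: r = 0.5
\tau = \frac{2}{\pi}\cdot\frac{\pi}{6} = \frac{1}{3} \approx 0.33, \qquad \rho_S \approx 0.48
```

At r = 0.5, tau comes out about 0.17 below Pearson's r, which is in line with the 0.15 to 0.20 offset mentioned above.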
Now look at the graph on the right. The Y variable was cubed to produce a nonlinear relation: the slope between Y and X now changes as X increases. Pearson’s r (0.50) does not detect the relation as well because the pattern is no longer linear. Rho and tau remain exactly the same, because cubing preserves the order (the ranks) of the Y values. Rho and tau do not depend on a linear model of the relationship; they measure any monotonic correlation, whether linear or nonlinear.
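A minimal Python sketch of the experiment described above, assuming synthetic data: the seed, sample size, noise level and the helper names `pearson_r`, `ranks`, `spearman_rho` and `kendall_tau` are illustrative and not taken from the original analysis, so the printed values will not match the figure. Spearman's rho is computed as Pearson's r on the ranks, and Kendall's tau as tau-a. Cubing Y changes Pearson's r but leaves the ranks, and therefore rho and tau, untouched.

```python
import math
import random


def pearson_r(x, y):
    """Pearson's linear correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks 1..n; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # average of 1-based ranks start+1..end+1
        for k in range(start, end + 1):
            result[order[k]] = avg_rank
        start = end + 1
    return result


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks of x and y."""
    return pearson_r(ranks(x), ranks(y))


def sign(v):
    """Return 1, 0 or -1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / total number of pairs."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            s += sign(x[j] - x[i]) * sign(y[j] - y[i])
    return s / (n * (n - 1) / 2)


# Hypothetical data: a noisy linear relation, then the same Y values cubed.
random.seed(1)
x = [random.uniform(0, 10) for _ in range(200)]
y = [xi + random.gauss(0, 1) for xi in x]
y_cubed = [v ** 3 for v in y]

for label, yy in (("linear", y), ("cubed", y_cubed)):
    print(f"{label:>6}: r = {pearson_r(x, yy):.2f}, "
          f"rho = {spearman_rho(x, yy):.2f}, tau = {kendall_tau(x, yy):.2f}")
```

Because `spearman_rho` and `kendall_tau` only look at the order of the values, any strictly increasing transformation of Y (cubing, or taking logarithms of positive values) leaves them unchanged; only `pearson_r` reacts to the change in shape.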
Make sure that you understand the assumptions of the methods you use, and use the tool that best fits your objective. If you want a general measure of "systematic relation", rho or tau would be more appropriate than Pearson's r. Kendall’s tau is commonly used in trend analysis as a flexible measure of change over time. Do you really want to look only for linear patterns?
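As a sketch of the trend-analysis use, the hypothetical helper `kendall_tau_vs_time` below computes Kendall's tau between a series and its time index (0, 1, 2, and so on). Its numerator is the S statistic of the Mann-Kendall trend test; this sketch stops at the measure of trend and does not compute a significance test or apply a tie correction.

```python
def kendall_tau_vs_time(series):
    """Kendall's tau-a between a series and its time index 0, 1, 2, and so on.

    Time strictly increases, so a pair (i < j) is concordant when the later
    value is larger and discordant when it is smaller; ties add nothing.
    """
    n = len(series)
    s = 0  # Mann-Kendall S: sum of signs of (later value - earlier value)
    for i in range(n):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)
    return s / (n * (n - 1) / 2)


# Exponential growth is strongly nonlinear but perfectly monotonic.
print(kendall_tau_vs_time([2 ** t for t in range(10)]))  # 1.0
# One dip: of 10 pairs, 9 are concordant and 1 discordant, so (9 - 1) / 10.
print(kendall_tau_vs_time([1, 2, 3, 2.5, 4]))  # 0.8
```

A steadily accelerating series still scores a perfect 1.0, which is why tau suits trend questions where the rate of change is not expected to be constant.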