Data Transformations: statistical voodoo or truth serum for your data?

Anyone who has taken a statistics class has probably learned about transforming data at one time or another (although you may be in denial about it). In short, you may want to transform your data if you need to perform a parametric analysis but your dataset violates that analysis's inherent assumptions. While this seems simple enough, many researchers are hesitant to employ this tactic for handling non-normally distributed data. Often, they can't quite put their finger on what bothers them about it, but the idea of "artificially changing their data" leaves many feeling uneasy.
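To make the idea concrete, here is a minimal sketch in Python of what a transformation does to a skewed variable. Everything in it is an assumption for illustration: the data are simulated from a log-normal distribution (a common stand-in for right-skewed measurements), the natural log is just one of several possible transformations, and `skewness` is a small helper written for this example rather than a library function.

```python
import math
import random
import statistics


def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Simulated right-skewed data (hypothetical, not from any real study).
rng = random.Random(42)
raw = [rng.lognormvariate(0.0, 1.0) for _ in range(5000)]

# The log transformation pulls in the long right tail.
logged = [math.log(v) for v in raw]

print(f"skewness before transformation: {skewness(raw):.2f}")
print(f"skewness after log transformation: {skewness(logged):.2f}")
```

A skewness near zero is what you would expect from a symmetric, roughly normal variable. In this simulated case the log works well because the data were log-normal to begin with; real data may call for a different transformation, or none at all.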

As a researcher myself, I can relate, as many of the same people who preach the importance of considering analysis assumptions also teach us to fiercely protect the integrity of our data. However, for many of us, I believe this may be where our methodological good intentions lead us astray. While protecting the integrity of our data is indeed of paramount importance, transforming variables to bring their distribution closer to normal is unfairly characterized as a threat to this integrity. In fact, one could easily argue that making inferences from results that are biased by non-normally distributed variables may be a greater threat to the integrity of your data and analysis than any transformation. It is perfectly reasonable for a well-intentioned researcher to worry about the consequences of transformation, and to be wary of a transformation that produces a significant result where none was present before. However, that researcher can set those worries aside with the realization that an appropriately applied transformation of data generally raises the likelihood that their test of significance is UNBIASED, while it DOES NOT typically raise the likelihood that one will find significance in the absence of a true relationship (type I error). When no true relationship exists between two variables, a test is no more likely to find significance when its variables are transformed than it is when they are not.
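The type I error claim can be checked with a small simulation. The sketch below is one way to do it under stated assumptions, not a definitive demonstration: two groups of 30 are drawn from the same simulated log-normal distribution (so there is no true difference), a pooled two-sample t-test is run on the raw and on the log-transformed values, and the critical value 2.0017 is the standard two-sided 5% cutoff for Student's t with 58 degrees of freedom. The function names and the simulation settings are hypothetical choices made for this example.

```python
import math
import random
import statistics

# Two-sided 5% critical value of Student's t with 58 degrees of freedom
# (two groups of 30, pooled variance).
T_CRIT = 2.0017


def pooled_t(a, b):
    """Student's two-sample t statistic using a pooled variance estimate."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    std_err = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.fmean(a) - statistics.fmean(b)) / std_err


def false_positive_rates(n_sims=2000, n=30, seed=1):
    """Share of null simulations declared significant, raw vs. log-transformed."""
    rng = random.Random(seed)
    raw_hits = 0
    log_hits = 0
    for _ in range(n_sims):
        # Both groups come from the SAME distribution: no true effect exists.
        a = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]
        b = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]
        if abs(pooled_t(a, b)) > T_CRIT:
            raw_hits += 1
        log_a = [math.log(x) for x in a]
        log_b = [math.log(x) for x in b]
        if abs(pooled_t(log_a, log_b)) > T_CRIT:
            log_hits += 1
    return raw_hits / n_sims, log_hits / n_sims


raw_rate, log_rate = false_positive_rates()
print(f"false positive rate, raw data: {raw_rate:.3f}")
print(f"false positive rate, log-transformed data: {log_rate:.3f}")
```

If the claim holds, the rate for the transformed data should sit close to the nominal 5% rather than above it. Note that this only covers a transformation chosen in advance; trying several transformations and keeping whichever one yields significance is a different situation that this sketch does not address.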

From a pragmatic perspective, transformation lets a researcher use the techniques they are familiar with, and are therefore more likely to apply in their current situation, while minimizing the potential bias from non-normally distributed data. In many cases, depending on the type of analysis being used and the design of your study, alternative and possibly more sophisticated techniques may exist for dealing with non-normal data. However, many of these techniques require substantial statistical experience and may be intimidating to those seeking to deal with their assumption problems. While transformation is surely not the answer to all problems with assumptions, or even to all non-normal data, it is far from "voodoo" and is an attractive alternative to turning a blind eye to the distribution of the data.

Posted in Stats Make Me Cry Blog Entries by Jeremy J. Taylor, April 26, 2010. Tags: assumptions, normal distribution, normality, parametric, regression