Question:

How do I run a statistical hypothesis test when a sample can't be assumed normal?


I am running a t-test on two samples to check whether they differ, but one of them fails a chi-square goodness-of-fit test for normality (a normal plot also shows obvious deviation).

Since this violates the assumption underlying the t-test, what should I do to compare these two data sets?


2 ANSWERS


Use a bootstrap or permutation test. This type of test does not rely on any assumption about the shape of the underlying distribution of the data.

Here is a link with details:

http://www-stat.stanford.edu/~susan/cour...

An overview of a permutation test between two samples:

Assume the two samples come from the same population; this is the null hypothesis. Say sample 1 has m elements and sample 2 has n elements, for a total of n + m data points. Pool all the data into one vector, draw a random sample of size n from it without replacement, and record its mean. Draw another sample of size n from the pooled data and record that mean too. Doing this for every possible split is feasible in principle but can be very painful; it is usually sufficient to take, say, 10000 random samples of size n. (You will likely need a computer and some code for this.)

Once you have these 10000 means, sort them and find where the observed mean of sample 2 ranks among them. That rank gives the p-value: it is the fraction of resampled means at least as extreme as the observed one. If the observed mean of sample 2 lies in the extremes of the sorted list, conclude that the two samples are not from the same population.

You can conduct one- and two-tailed tests using this method.
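
The procedure above can be sketched in a few lines of Python. This is only a minimal sketch using the standard library; the function name `permutation_test_mean`, its parameters, and the "+1" adjustment to the p-values (so that a finite number of resamples never reports exactly 0) are my own choices, not something from the linked notes. For the two-tailed version I measure "extreme" as distance from the pooled mean, which is one reasonable convention among several.

```python
import random
from statistics import mean


def permutation_test_mean(sample1, sample2, n_resamples=10000, seed=0):
    """Approximate permutation test on the mean of sample2.

    Returns (p_lower, p_upper, p_two_sided). The function name and the
    '+1' p-value adjustment are illustrative choices, not a standard API.
    """
    rng = random.Random(seed)
    pooled = list(sample1) + list(sample2)  # null hypothesis: one population
    n = len(sample2)
    observed = mean(sample2)
    pooled_mean = mean(pooled)

    lower = upper = two_sided = 0
    for _ in range(n_resamples):
        # Draw n points from the pooled data without replacement.
        resampled = mean(rng.sample(pooled, n))
        if resampled <= observed:
            lower += 1
        if resampled >= observed:
            upper += 1
        # Two-tailed: at least as far from the pooled mean as the observed mean.
        if abs(resampled - pooled_mean) >= abs(observed - pooled_mean):
            two_sided += 1

    # Count the observed arrangement itself so no p-value is exactly 0.
    denom = n_resamples + 1
    return (lower + 1) / denom, (upper + 1) / denom, (two_sided + 1) / denom
```

Call it as `permutation_test_mean(sample1, sample2)`. A small `p_upper` suggests sample 2 has a larger mean than chance would give, a small `p_lower` suggests a smaller one, and `p_two_sided` covers both directions.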

One option is to try to find another distribution that the sample DOES fit, and then base the hypothesis test on that distribution. That's not necessarily going to be easy, though.

You can also check whether some transformation of the data (e.g. taking logs, square roots, or whatever) produces something that passes the normality tests. Again, that'll involve a lot of trial and error.

Depending on how badly the data failed the tests, you can also try using a t-test anyway (reducing the degrees of freedom as a rough compensation). You can also try some of the other tests for normality to see whether they provide any justification for this route.
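
To cut down on the trial and error, a quick screen can help before rerunning the formal normality test. The sketch below compares the sample skewness of the raw data with that of a few candidate transformations; a value near 0 means the data are roughly symmetric. The data values and the helper `skewness` are made up for illustration, and skewness is only a rough indicator, not a substitute for the goodness-of-fit test. Whatever transformation you settle on has to be applied to both samples before comparing them.

```python
import math
from statistics import mean, pstdev


def skewness(data):
    """Third standardized moment; close to 0 for roughly symmetric data."""
    m = mean(data)
    s = pstdev(data)
    return mean([((x - m) / s) ** 3 for x in data])


# Hypothetical right-skewed sample (all values must be positive for log/sqrt).
raw = [1, 2, 2, 3, 3, 4, 5, 8, 13, 40]

candidates = {
    "raw": raw,
    "sqrt": [math.sqrt(x) for x in raw],
    "log": [math.log(x) for x in raw],
}

# Report skewness for each candidate; smaller magnitude is closer to symmetric.
skews = {name: skewness(values) for name, values in candidates.items()}
for name, value in skews.items():
    print(f"{name:>4}: skewness = {value:.2f}")
```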