# testing statistical difference between medians of a sample and a subsample extracted from the sample

 Hi all! For a data analysis I need to perform a (parametric) statistical test of whether the medians of two samples differ significantly, where one is the full sample and the other is a subsample extracted from it on the basis of a given characteristic (e.g. respondents belonging to a certain age group). Can anyone suggest how to go about it in SPSS? Regards, vini

## Re: testing statistical difference between medians of a sample and a subsample extracted from the sample

 "Parametric" and "median" don't usually go together.  See (3)for a possible meaning.  There are other problems.First - There is no such thing as a "proper" test for a sub-sampleversus the whole sample that it comes from.  The necessary logicsays that you compare a sub-sample to the *rest*  of the sample.You may occasionally see a good presentation that does use theapproximate tests of this sort, for convenience and ease, plus astrong desire to accommodate Ns that are unequal.  For sub-sampleswith equal Ns, you can use a simple Confidence Interval around theoverall mean.Second - Almost nobody actually, ever compares "medians".  That description is less often accurate than it is an erroneous reference to a test of ranks.  Third - The most "non-parametric" way to put a Confidence Intervalaround the median of a single sample (full sample, here?) is to end up using ranks of scores in the sample to delimit the range.For instance, for a sample of a certain N, the 40th and 60th centilesmight determine the scores to mark the 95% CI.  There is no strongreason to expect that CI to be symmetrical around the median.  Ifyou wanted a "parametric" version of that, I suppose you would use the SD to determine a range.  Do you want to pick out the sampleswhose medians do not fall in that range?-- Rich Ulrich
## Re: testing statistical difference between medians of a sample and a subsample extracted from the sample

 I too was wondering why you wanted a test comparing medians.  People sometimes assume that the Wilcoxon-Mann-Whitney test (aka Mann-Whitney U) compares medians (as opposed to means).  But that is only true if the two populations being compared are identical apart from a shift in location.  And in that case, the test could be said to be comparing means, medians, or any other percentile point you might choose.   By the way, the WMW is quite sensitive to small differences in variance or skewness in the populations, which can cause it to reject H0 far too often when it is used purely as a test of differences in location.  See for example the nice article by Fagerland & Sandvik (2009). http://www.ncbi.nlm.nih.gov/pubmed/19247980


-- Bruce Weaver
## Re: testing statistical difference between medians of a sample and a subsample extracted from the sample

 Hi:

1) Bruce, see also Anna Hart, "Mann-Whitney test is not just a test of medians: differences in spread can be important" (BMJ 2001;323:391-3). I lost track of a reference that stated the same for the Kruskal-Wallis test; I'll try to dig it up (too many files on my external hard disk).

2) Maybe vinikalra could compute a 95% CI for the median of the subsample and check whether the full-sample median falls within the limits.

Best regards, Marta GG
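
A rough sketch of suggestion 2), using the rank-based interval Rich described earlier in the thread: the limits are order statistics picked from Binomial(n, 1/2) tail probabilities. It is written in plain Python (standard library only) rather than SPSS syntax. The function name `median_ci` and the example scores are made up for illustration, and ties are ignored.

```python
from math import comb


def median_ci(values, conf=0.95):
    """Distribution-free CI for the median, built from order statistics.

    Returns (lower, upper, actual_coverage). The actual coverage is
    at least `conf` because only whole ranks can be used as limits.
    """
    xs = sorted(values)
    n = len(xs)
    alpha = 1.0 - conf

    # Find the largest k with P(B <= k - 1) <= alpha / 2, B ~ Binomial(n, 1/2).
    # The interval then runs from the k-th smallest to the k-th largest value.
    cdf = 0.0
    k = 0
    for j in range(n // 2):
        cdf_next = cdf + comb(n, j) / 2 ** n
        if cdf_next > alpha / 2:
            break
        cdf = cdf_next
        k = j + 1

    if k == 0:
        raise ValueError("sample too small for the requested confidence level")

    coverage = 1.0 - 2.0 * cdf
    return xs[k - 1], xs[n - k], coverage


if __name__ == "__main__":
    # Hypothetical subsample scores and full-sample median, for illustration only.
    subsample = [12, 15, 11, 19, 14, 13, 22, 17, 16, 18]
    full_sample_median = 15.5

    lo, hi, cov = median_ci(subsample)
    inside = lo <= full_sample_median <= hi
    print(f"CI for subsample median: [{lo}, {hi}], coverage {cov:.3f}")
    print("full-sample median inside the interval:", inside)
```

For N = 100 this picks the 40th and 61st ordered scores (actual coverage about 96.5%), which is close to the 40th/60th centile example given earlier. Since the subsample is itself part of the full sample, the check is only approximate, as Rich pointed out.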
## Re: testing statistical difference between medians of a sample and a subsample extracted from the sample

 Thanks for your replies. In the light of the discussion above, it seems to me that testing the difference between the two samples' means would be a better idea, in which case I can go for a t-test (as the data come from a field survey, they can 'safely' be considered normally distributed. ANY COMMENT?). As for the relevance of comparing a full sample with a subsample, the idea is to analyse whether a particular subsample (extracted on the basis of a certain parameter, e.g. age group, education etc.) has an influence on the full sample. However, my question was how to go about it in SPSS (steps?), i.e. comparing the full sample with a subsample of the full sample. Any suggestions? Regards, vini
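
If a t-test on means is the route taken, the earlier advice in this thread still applies: compare the subsample with the *rest* of the sample, not with the full sample that contains it. Below is a minimal sketch of that comparison as a Welch (unequal-variance) t-test in plain Python. The function name, the 0/1 membership flags and the data are hypothetical, and the p-value uses a normal approximation (reasonable only for large groups) because the standard library has no t distribution.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_subsample_vs_rest(values, in_subsample):
    """Welch t-test of the subsample against the rest of the sample.

    `values` holds every score in the full sample and `in_subsample` holds
    a matching True/False flag per score. Returns (t, df, p_approx), where
    p_approx is a two-sided p-value from a normal approximation.
    """
    sub = [v for v, flag in zip(values, in_subsample) if flag]
    rest = [v for v, flag in zip(values, in_subsample) if not flag]
    n1, n2 = len(sub), len(rest)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")

    # Squared standard errors of the two group means.
    se1, se2 = variance(sub) / n1, variance(rest) / n2

    t = (mean(sub) - mean(rest)) / sqrt(se1 + se2)
    # Welch-Satterthwaite degrees of freedom.
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    p_approx = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return t, df, p_approx


if __name__ == "__main__":
    # Hypothetical scores; the flag marks, say, one age group.
    scores = [3.1, 4.0, 2.8, 3.6, 4.4, 5.2, 4.9, 5.5, 4.7, 5.1]
    flags = [True, True, True, True, False, False, False, False, False, False]
    t, df, p = welch_subsample_vs_rest(scores, flags)
    print(f"t = {t:.3f}, df = {df:.1f}, approximate p = {p:.4f}")
```

In SPSS terms this corresponds to an independent-samples t test on a 0/1 grouping variable that flags membership of the subsample, reading the "equal variances not assumed" row of the output.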
## Re: testing statistical difference between medians of a sample and a subsample extracted from the sample

 Perhaps it is just an exercise to understand the properties of such methods, useful to convince ourselves of their theoretical appropriateness. :)

    BEGIN PROGRAM R.
    # read the dataset; suppose the variable of interest is a column called x
    mydata <- spssdata.GetDataFromSPSS()
    fullsample <- mydata$x
    median.fullsample <- median(fullsample)
    # loop here if you want some kind of bootstrapping
    # replace [whatever condition] with a logical test on fullsample
    subsample <- subset(fullsample, [whatever condition])
    # one-sample t test of the subsample mean against the full-sample median;
    # explicit print() so the result is shown
    print(t.test(subsample, mu = median.fullsample))
    END PROGRAM.
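
One possible reading of the "loop ... bootstrapping" comment, offered as an assumption and not necessarily what the poster had in mind, is a randomization test. Draw many random subsets of the same size from the full sample and count how often their median lies at least as far from the full-sample median as the observed subsample's median does. The sketch below is plain Python rather than R, and all names and data in it are hypothetical.

```python
import random
from statistics import median


def random_subset_median_test(full, sub_size, observed_sub_median,
                              n_draws=10000, seed=0):
    """Monte Carlo p-value for 'the subsample behaves like a random subset'.

    Null hypothesis: the subsample is a simple random draw (without
    replacement) of `sub_size` cases from `full`.
    """
    full = list(full)
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    full_median = median(full)
    observed_gap = abs(observed_sub_median - full_median)

    hits = 0
    for _ in range(n_draws):
        draw = rng.sample(full, sub_size)
        if abs(median(draw) - full_median) >= observed_gap:
            hits += 1

    # Add-one correction keeps the estimated p-value away from exactly zero.
    return (hits + 1) / (n_draws + 1)


if __name__ == "__main__":
    # Hypothetical data: scores 1..100, with the subsample being the top 20.
    full_sample = list(range(1, 101))
    subsample = full_sample[-20:]
    p = random_subset_median_test(full_sample, len(subsample),
                                  median(subsample), n_draws=2000)
    print(f"Monte Carlo p-value: {p:.4f}")
```

Unlike the one-sample t test in the R snippet, this treats the subsample as part of the full sample by construction, which sidesteps the objection raised earlier about testing a subsample against the whole it comes from.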