If I may add my two cents to this thread: the issue of whether

a multiple comparison procedure (either planned comparison or

post hoc test; see Kirk's Experimental Design text for background

on these distinctions) is liberal or conservative requires one to make

a decision about the overall Type I error rate [alpha(overall)] that

one is willing to accept.

The two-stage procedure typically used for Fisher's LSD

(Least Significant Difference test which is equivalent to doing

all pairwise comparisons or t-tests between means but using

the Mean Square Error from the ANOVA instead of just the

variance in the two groups being compared) uses

(a) one statistical test for the ANOVA and (b) a separate set of

statistical tests for the comparisons.

Statisticians such as Rand Wilcox argue against such two-stage

procedures because, technically, one does not have to have a

significant ANOVA to do multiple comparisons (e.g., an ANOVA

is not really necessary for planned comparisons, though some

of the values in an ANOVA table facilitate the computations).

If one keeps alpha(overall) fixed to some reasonable level, such

as alpha(overall)= 0.05, then all of the multiple comparisons have

their per comparison alpha [i.e., alpha(per comparison)] adjusted

to a lower level or value.

To make things clearer, consider: alpha(overall) can be

calculated with the following formula:

alpha(overall) = (1 - (1 - alpha(per comparison))**K)

where K is equal to the number of comparisons or tests

that one is conducting.

In the LSD framework, each test has alpha(per comparison)=0.05,

that is, the usual alpha level is used for each test. But after 3

statistical tests (e.g., all pairwise comparisons between three means),

the alpha(overall) = 0.14; that is, after doing three t-tests among

the means, there is a probability of 0.14 that a Type I error has been

committed. As the number of comparisons increases, the overall

probability that one has committed a Type I error also increases until it is

quite likely that one or more test results are Type I errors (this is

also relevant when one is checking for significant correlations in a correlation

matrix).
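The arithmetic above can be sketched in a few lines of Python (assuming, as the formula does, that the K tests are independent):

```python
# Familywise (overall) Type I error rate for K independent tests,
# each run at the same per-comparison alpha:
# alpha(overall) = 1 - (1 - alpha(per comparison))**K

def alpha_overall(alpha_per_comparison, k):
    """Probability of at least one Type I error across k independent tests."""
    return 1 - (1 - alpha_per_comparison) ** k

# Three pairwise t-tests among three means, each at alpha = 0.05:
print(round(alpha_overall(0.05, 3), 2))   # 0.14, as in the text

# All 45 pairwise correlations in a 10-variable correlation matrix:
print(round(alpha_overall(0.05, 45), 2))  # 0.9
```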

The Bonferroni solution to this situation is to set alpha(overall)=0.05

and divide alpha(overall) by the number of tests (i.e., K from above).

Consider:

corrected alpha(per comparison) = alpha(overall)/K

However, one can change alpha(per comparison) to different values

in the context of planned comparisons.
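As a quick sketch of the Bonferroni arithmetic (using K = 3 pairwise tests, matching the three-mean example above):

```python
# Bonferroni: divide the desired overall alpha by the number of tests K
# to get the corrected per-comparison alpha.

def bonferroni_alpha(alpha_overall, k):
    return alpha_overall / k

corrected = bonferroni_alpha(0.05, 3)
print(round(corrected, 4))  # 0.0167

# Plugging the corrected value back into the familywise formula shows
# the overall rate is now held just under the nominal 0.05:
overall = 1 - (1 - corrected) ** 3
print(round(overall, 4))  # 0.0492
```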

All multiple comparison procedures keep alpha(overall) = 0.05

but calculate alpha(per comparison) in different ways because they

rely upon different distributions and on how many comparisons are

being made. The Scheffe F is the most conservative multiple

comparison procedure because it will have the largest critical value

that an obtained difference is compared against. The Scheffe F can

be used to compare pairs of means or complex combinations of

means (the previous tests are assuming that one is only doing pairwise

comparisons between means).

To make the discussion of multiple comparisons more rational, one

has to adopt a Neyman-Pearson framework that requires one to

specify a specific effect size (e.g., a standardized difference between

population means) and allows one to identify the Type II error rate

or, equivalently, the level of statistical power (= 1 - Type II error rate).

Fisher's LSD can be interpreted in the context of the Neyman-Pearson

framework, but Fisher himself did not accept it as a meaningful or

valid statistical framework; consequently, issues of Type II errors

and statistical power are irrelevant if one is really being "old school".

Gerd Gigerenzer has written about this in his articles on the history

of statistics.

But if one is willing to use the Neyman-Pearson framework and

specify a fixed effect size that one wants to detect for a specific level

of power while keeping alpha(overall)= 0.05, then one can ask

which of the different pairwise comparison procedures produces

the smallest critical difference which has to be exceeded by the

obtained difference between sample means.

I believe that the LSD procedure will produce the smallest critical

difference, the Scheffe F the largest, and all other tests will provide

critical differences between these extremes. In this sense, the LSD

is most liberal because it requires the smallest difference between

means to achieve statistical significance (but at the cost of an increased

alpha(overall)), while the Scheffe is the most conservative because

it will have the largest critical difference (but it will also be the

least powerful).
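This ordering of critical differences can be illustrated with a short scipy sketch for a hypothetical balanced one-way design (the values of k, n, and MSE below are made up for illustration, not taken from any dataset in this thread):

```python
from math import sqrt
from scipy.stats import t, f

# Hypothetical balanced one-way ANOVA: k groups of size n, with the
# Mean Square Error taken from the ANOVA table (values assumed).
k, n, mse, alpha = 4, 10, 4.0, 0.05
df_error = k * (n - 1)            # error degrees of freedom = 36
se_diff = sqrt(mse * (2 / n))     # standard error of a pairwise difference

# LSD: ordinary two-sided t critical value at the per-comparison alpha.
lsd_cd = t.ppf(1 - alpha / 2, df_error) * se_diff

# Bonferroni: same t distribution, but alpha is split across all
# k*(k-1)/2 pairwise comparisons.
n_pairs = k * (k - 1) // 2
bonf_cd = t.ppf(1 - alpha / (2 * n_pairs), df_error) * se_diff

# Scheffe: critical value built from the F distribution; valid for any
# contrast among the means, not just pairwise differences.
scheffe_cd = sqrt((k - 1) * f.ppf(1 - alpha, k - 1, df_error)) * se_diff

print(lsd_cd < bonf_cd < scheffe_cd)  # True
```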

The SPSS procedures that provide these tests, such as in GLM,

do an odd thing. Instead of telling one what the actual LSD value

or Bonferroni or Tukey or whatever is, it just tells you whether the

observed difference between means exceeds this critical difference (i.e.,

it is identified as statistically significant) or not. Formulas for

hand calculation of the actual critical difference are provided in a number

of sources such as Kirk's Experimental Design text and, if memory

serves, the Glass & Hopkins Stat Methods in Ed & Psych.

If one has a dataset where a one-way ANOVA is appropriate,

do the various multiple comparisons and see which results are

significant by LSD but become nonsignificant with other tests.

If the difference/effect size is really different from zero, then

the tests that are nonsignificant are actually Type II errors.

Increasing the sample size (thereby increasing statistical power)

will usually change these test results from nonsignificant to

significant.

In summary, whether a multiple comparison procedure is liberal

or conservative depends upon a number of factors, but what is

critical is (1) keeping alpha(overall) equal to some specified

value such as 0.05, (2) adjusting alpha(per comparison) appropriately,

and (3) identifying what the critical difference is that a difference

between obtained sample means has to exceed to claim that

the difference is greater than zero (i.e., the two means are different

from each other).

I'll shut up now.

-Mike Palij

New York University

[hidden email]
----- Original Message -----

From: "Bruce Weaver" <[hidden email]>

To: <[hidden email]>

Sent: Wednesday, December 09, 2009 11:18 AM

Subject: Re: Liberal and conservative?

> Steve Simon, P.Mean Consulting wrote:

>>

>> Eins Bernardo wrote:

>>

>>> LSD is considered as Liberal of the Post Hoc Test in ANOVA, while the

>>> Duncan is more conservative than the LSD. Can someone

>>> differentiate/contrast between liberal and conservative in

>>> statistical context?

>>

>> A liberal test has a Type I error rate (or alpha level) that is larger

>> than the stated value. So a test that claims to have a Type I error rate

>> of 0.05 might actually have a rate of 0.08 or 0.13. A test can become

>> liberal if you fail to properly adjust for multiple comparisons or if

>> you allow early stopping at several points during the trial without

>> appropriate adjustments. Sometimes failure to meet the underlying

>> assumptions of a statistical test can produce a liberal result.

>>

>> A conservative test has a Type I error rate that is smaller than the

>> stated values. So a test that claims to have a Type I error rate of 0.05

>> might actually have a rate of 0.03 or 0.01. Some adjustments for

>> multiple comparisons can produce conservative tests. Also failure to

>> meet the underlying assumptions of a statistical test can produce a

>> conservative result.

>>

>> The research community generally shuns liberal tests, but do keep in

>> mind that a conservative test often suffers from loss of power and

>> (equivalently) an increase in the Type II error rate.

>> --

>> Steve Simon, Standard Disclaimer

>>

>

> Very nicely stated, Steve.

>

> Going back to the original post, let me add that Fisher's LSD is therefore

> neither liberal nor conservative when there are exactly 3 groups. In that

> situation, the family-wise alpha is maintained at exactly the same level as

> the per-contrast alpha. And so, given its greater power, Fisher's LSD

> ought to be used a lot more than it is WHEN there are 3 groups.

>

> If anyone needs references to support the use of Fisher's LSD with 3 groups,

> here are two.

>

> Howell, DC. Statistical Methods for Psychology (various editions &

> years, chapter on multiple comparison procedures).

>

> Meier U. A note on the power of Fisher’s least significant difference

> procedure. Pharmaceut. Statist. 2006; 5: 253–263.

>

> Here is the abstract from Meier's article.

>

> Fisher's least significant difference (LSD) procedure is a two-step testing

> procedure for pairwise comparisons of several treatment groups. In the first

> step of the procedure, a global test is performed

> for the null hypothesis that the expected means of all treatment groups

> under study are equal. If this global null hypothesis can be rejected at the

> pre-specified level of significance, then in the second step of the

> procedure, one is permitted in principle to perform all pairwise comparisons

> at the same level of significance (although in practice, not all of them may

> be of primary interest). Fisher's LSD procedure is known to preserve the

> experimentwise type I error rate at the nominal level of significance, if

> (and only if) the number of treatment groups is three. The procedure may

> therefore be applied to phase III clinical trials comparing two doses of an

> active treatment against placebo in the confirmatory sense (while in this

> case, no confirmatory comparison has to be performed between the two active

> treatment groups). The power properties of this approach are examined in the

> present paper. It is shown that the power of the first step global test -

> and therefore the power of the overall procedure - may be relevantly lower

> than the power of the pairwise comparison between the more-favourable active

> dose group and placebo. Achieving a certain overall power for this

> comparison with Fisher's LSD procedure - irrespective of the effect size at

> the less-favourable dose group - may require slightly larger treatment

> groups than sizing the study with respect to the simple Bonferroni alpha

> adjustment. Therefore if Fisher's LSD procedure is used to avoid an alpha

> adjustment for phase III clinical trials, the potential loss of power due to

> the first-step global test should be considered at the planning stage.

> Copyright © 2006 John Wiley & Sons, Ltd.

>

>

> -----

> --

> Bruce Weaver

> [hidden email]

> http://sites.google.com/a/lakeheadu.ca/bweaver/

> "When all else fails, RTFM."

> NOTE: My Hotmail account is not monitored regularly.

> To send me an e-mail, please use the address shown above.

> --

> View this message in context:

> http://old.nabble.com/Liberal-and-conservative--tp26710095p26712988.html

> Sent from the SPSSX Discussion mailing list archive at Nabble.com.

>

> =====================

> To manage your subscription to SPSSX-L, send a message to

>

> [hidden email] (not to SPSSX-L), with no body text except the

> command. To leave the list, send the command

> SIGNOFF SPSSX-L

> For a list of commands to manage subscriptions, send the command

> INFO REFCARD
