Liberal and conservative?


Liberal and conservative?

E. Bernardo
The LSD is considered a liberal post hoc test in ANOVA, while Duncan's test is more conservative than the LSD.  Can someone differentiate/contrast between liberal and conservative in a statistical context?
 
Eins



Re: Liberal and conservative?

Tim Daciuk

I have thought of a liberal test as one that is more likely to find statistical significance (even where none truly exists), more likely to make a Type I error, and less prone to Type II errors.  A liberal test has more power.  A conservative test is less likely to find statistical significance (even where it truly exists), less likely to make Type I errors, and more prone to Type II errors.  Conservative tests have less power.

There are many definitions out there, but, my sense is that most are variations on the above themes.
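The Type I / Type II tradeoff described above can be made concrete with a small calculation.  This is a sketch of my own, not from the post: a one-sided z-test with known sigma = 1, an illustrative effect size of 0.5 SD, and n = 25, comparing a liberal alpha of 0.10 against a conservative alpha of 0.01.

```python
from statistics import NormalDist

nd = NormalDist()

def error_rates(alpha, delta, n):
    """Error rates for a one-sided z-test with known sigma = 1.
    The Type I rate is alpha by construction; the Type II rate is the
    chance of missing a true shift of delta SDs with n observations."""
    z_crit = nd.inv_cdf(1 - alpha)               # rejection threshold
    type_ii = nd.cdf(z_crit - delta * n ** 0.5)  # P(fail to reject | delta)
    return alpha, type_ii

liberal = error_rates(0.10, 0.5, 25)       # more Type I, fewer Type II
conservative = error_rates(0.01, 0.5, 25)  # fewer Type I, more Type II
print(liberal, conservative)
```

The liberal test misses the true effect about 11% of the time, the conservative one about 43%: exactly the power difference described above.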


Tim Daciuk
Director, Demo Team
SPSS, an IBM Company
Phone - 1-416-265-9789
Cell - 1-426-996-9789



From: Eins Bernardo <[hidden email]>
To: [hidden email]
Date: 12/09/2009 08:32 AM
Subject: Liberal and conservative?
Sent by: "SPSSX(r) Discussion" <[hidden email]>







Re: Liberal and conservative?

Martin Holt
In reply to this post by E. Bernardo
Hi Eins,
 
Please follow the attached link.....it's a long one.
 
 
It quotes Robert P. Abelson, "Statistics as Principled Argument", which I highly recommend.
 
HTH,
Martin Holt

Re: Liberal and conservative?

Steve Simon, P.Mean Consulting
In reply to this post by E. Bernardo
Eins Bernardo wrote:

> The LSD is considered a liberal post hoc test in ANOVA, while
> Duncan's test is more conservative than the LSD.  Can someone
> differentiate/contrast between liberal and conservative in a
> statistical context?

A liberal test has a Type I error rate (or alpha level) that is larger
than the stated value. So a test that claims to have a Type I error rate
of 0.05 might actually have a rate of 0.08 or 0.13. A test can become
liberal if you fail to properly adjust for multiple comparisons or if
you allow early stopping at several points during the trial without
appropriate adjustments. Sometimes failure to meet the underlying
assumptions of a statistical test can produce a liberal result.

A conservative test has a Type I error rate that is smaller than the
stated value. So a test that claims to have a Type I error rate of 0.05
might actually have a rate of 0.03 or 0.01. Some adjustments for
multiple comparisons can produce conservative tests. Also failure to
meet the underlying assumptions of a statistical test can produce a
conservative result.

The research community generally shuns liberal tests, but do keep in
mind that a conservative test often suffers from loss of power and
(equivalently) an increase in the Type II error rate.
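The first mechanism mentioned above, failing to adjust for multiple comparisons, is easy to demonstrate by simulation.  A minimal sketch of my own (z-tests with known sigma = 1, so only the standard library is needed): run K = 5 independent tests at a nominal alpha = 0.05 under a true null and count how often at least one of them rejects.

```python
import random
from statistics import NormalDist

rng = random.Random(1)
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)   # two-sided 0.05 cutoff
K, n, sims = 5, 20, 4000

false_alarms = 0
for _ in range(sims):
    rejected = False
    for _ in range(K):   # K independent tests, every null true
        z = sum(rng.gauss(0, 1) for _ in range(n)) / n ** 0.5
        if abs(z) > z_crit:
            rejected = True
    false_alarms += rejected

rate = false_alarms / sims
print(rate)   # near 1 - 0.95**5 = 0.226, far above the stated 0.05
```

The unadjusted procedure is "liberal" in exactly the sense above: its actual familywise Type I error rate is well above the 0.05 it claims.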
--
Steve Simon, Standard Disclaimer
Two free webinars coming soon!
"What do all these numbers mean? Odds ratios,
relative risks, and number needed to treat"
Thursday, December 17, 2009, 11am-noon, CST.
"The first three steps in a descriptive
data analysis, with examples in PASW/SPSS"
Thursday, January 21, 2010, 11am-noon, CST.
Details at www.pmean.com/webinars

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Liberal and conservative?

Bruce Weaver
Administrator
Very nicely stated, Steve.  

Going back to the original post, let me add that Fisher's LSD is therefore neither liberal nor conservative when there are exactly 3 groups.  In that situation, the family-wise alpha is maintained at exactly the same level as the per-contrast alpha.  And so, given its greater power, Fisher's LSD ought to be used a lot more than it is WHEN there are 3 groups.

If anyone needs references to support the use of Fisher's LSD with 3 groups, here are two.

Howell, DC. Statistical Methods for Psychology (various editions &
years, chapter on multiple comparison procedures).

Meier U. A note on the power of Fisher’s least significant difference
procedure. Pharmaceut. Statist. 2006; 5: 253–263.

Here is the abstract from Meier's article.

Fisher's least significant difference (LSD) procedure is a two-step testing procedure for pairwise comparisons of several treatment groups. In the first step of the procedure, a global test is performed for the null hypothesis that the expected means of all treatment groups under study are equal. If this global null hypothesis can be rejected at the pre-specified level of significance, then in the second step of the procedure, one is permitted in principle to perform all pairwise comparisons at the same level of significance (although in practice, not all of them may be of primary interest). Fisher's LSD procedure is known to preserve the experimentwise type I error rate at the nominal level of significance, if (and only if) the number of treatment groups is three. The procedure may therefore be applied to phase III clinical trials comparing two doses of an active treatment against placebo in the confirmatory sense (while in this case, no confirmatory comparison has to be performed between the two active treatment groups). The power properties of this approach are examined in the present paper. It is shown that the power of the first step global test - and therefore the power of the overall procedure - may be relevantly lower than the power of the pairwise comparison between the more-favourable active dose group and placebo. Achieving a certain overall power for this comparison with Fisher's LSD procedure - irrespective of the effect size at the less-favourable dose group - may require slightly larger treatment groups than sizing the study with respect to the simple Bonferroni alpha adjustment. Therefore if Fisher's LSD procedure is used to avoid an alpha adjustment for phase III clinical trials, the potential loss of power due to the first-step global test should be considered at the planning stage.
Copyright © 2006 John Wiley & Sons, Ltd.
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

Re: Liberal and conservative?

Mike
If I may add my two cents to this thread: the issue of whether
a multiple comparison procedure (either planned comparison or
post hoc test; see Kirk's Experimental Design text for background
on these distinctions) is liberal or conservative requires one to make
a decision about the overall Type I error rate [alpha(overall)] that
one is willing to accept.

The two stage procedure typically used for the Fisher's LSD
(Least Significant Difference test which is equivalent to doing
all pairwise comparisons or t-tests between means but using
the Mean Square Error from the ANOVA instead of just the
variance in the two groups being compared) uses
(a) one statistical test for ANOVA and (b) a separate set of
statistical tests for the comparisons.

Statisticians such as Rand Wilcox argue against such two stage
procedures because, technically, one does not have to have a
significant ANOVA to do multiple comparisons (e.g., an ANOVA
is not really necessary for planned comparisons though some
of the values in an ANOVA table facilitate computations).
If one keeps alpha(overall) fixed at some reasonable level, such
as alpha(overall) = 0.05, then all of the multiple comparisons have
their per-comparison alpha [i.e., alpha(per comparison)] adjusted
to a lower value.

To make things clearer, consider:  alpha(overall) can be
calculated with the following formula:

alpha(overall) = (1 - (1 - alpha(per comparison))**K)

where K is equal to the number of comparisons or tests
that one is conducting.

In the LSD framework, each test has alpha(per comparison) = 0.05,
that is, the usual alpha level is used for each test.  But after 3
statistical tests (e.g., all pairwise comparisons between three means),
alpha(overall) = 0.14; that is, after doing three t-tests among
the means, there is a probability of 0.14 that a Type I error has been
committed.  As the number of comparisons increases, the overall
probability that one has committed a Type I error also increases, until
it is quite likely that one or more test results are Type I errors (this
is also relevant when one is checking for significant correlations in a
correlation matrix).
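The formula and the 0.14 figure can be checked directly; a quick sketch (mine, not from the post):

```python
def alpha_overall(alpha_pc, k):
    """Chance of at least one Type I error in k independent tests,
    each run at per-comparison alpha alpha_pc."""
    return 1 - (1 - alpha_pc) ** k

for k in (1, 3, 10, 20):
    print(k, round(alpha_overall(0.05, k), 3))
# k = 3 gives 0.143, the ~0.14 quoted above; by k = 20 it is ~0.64
```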

The Bonferroni solution to this situation is to set alpha(overall)=0.05
and divide alpha(overall) by the number of tests (i.e., K from above).
Consider:

corrected alpha(per comparison) = alpha(overall)/K

However, one can change alpha(per comparison) to different values
in the context of planned comparisons.
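A sketch of the Bonferroni arithmetic (my own, with illustrative values):

```python
family_alpha = 0.05
K = 3
per_comparison = family_alpha / K             # 0.05 / 3 ~= 0.0167

# Familywise rate actually achieved if the K tests were independent:
achieved = 1 - (1 - per_comparison) ** K
print(per_comparison, achieved)               # achieved lands just under 0.05
```

The correction slightly overshoots (about 0.0492 rather than 0.05), which is the usual mild conservatism of the Bonferroni adjustment.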

All multiple comparison procedures keep alpha(overall) = 0.05
but calculate alpha(per comparison) in different ways because they
rely upon different distributions and how many comparisons are
being made.  The Scheffe F is the most conservative multiple
comparison procedure because it has the largest critical value
that an obtained difference is compared against.  The Scheffe F can
be used to compare pairs of means or complex combinations of
means (the previous tests assume that one is only doing pairwise
comparisons between means).

To make the discussion of multiple comparisons more rational, one
has to adopt a Neyman-Pearson framework that requires one to
specify a specific effect size (e.g., standardized difference between
population means) and allows one to identify the Type II error rate
or, equivalently, the level of statistical power (= 1 - Type II error rate).

Fisher's LSD can be interpreted in the context of the Neyman-Pearson
framework, but Fisher himself did not accept that framework as
meaningful or valid; consequently, issues of Type II errors
and statistical power are irrelevant if one is really being "old school".
Gerd Gigerenzer has written about this in his articles on the history
of statistics.

But if one is willing to use the Neyman-Pearson framework and
specify a fixed effect size that one wants to detect for a specific level
of power while keeping alpha(overall)= 0.05, then one can ask
which of the different pairwise comparison procedures produces
the smallest critical difference which has to be exceeded by the
obtained difference between sample means.

I believe that the LSD procedure will produce the smallest critical
difference, the Scheffe F the largest, and all other tests will provide
critical differences between these extremes.  In this sense, the LSD
is the most liberal because it requires the smallest difference between
means to achieve statistical significance (but at the cost of an increased
alpha(overall)), while the Scheffe is the most conservative because
it has the largest critical difference (but it will also be the
least powerful).

The SPSS procedures that provide these tests, such as GLM,
do an odd thing.  Instead of telling one what the actual LSD (or
Bonferroni or Tukey or whatever) critical value is, they just tell you
whether the observed difference between means exceeds this critical
difference (i.e., it is identified as statistically significant) or not.
Formulas for hand calculation of the actual critical difference are
provided in a number of sources, such as Kirk's Experimental Design
text and, if memory serves, Glass & Hopkins' Stat Methods in Ed & Psych.
If one has a dataset where a one-way ANOVA is appropriate,
do the various multiple comparisons and see which results are
significant by LSD but become nonsignificant with other tests.
If the difference/effect size is really different from zero, then
the tests that are nonsignificant are actually Type II errors.
Increasing the sample size (thereby increasing statistical power)
will usually change these test results from nonsignificant to
significant.

In summary, whether a multiple comparison procedure is liberal
or conservative depends upon a number of factors, but what is
critical is (1) keeping alpha(overall) equal to some specified
value such as 0.05, (2) adjusting alpha(per comparison) appropriately,
and (3) identifying the critical difference that a difference
between obtained sample means has to exceed to claim that
the difference is greater than zero (i.e., the two means are different
from each other).

I'll shut up now.

-Mike Palij
New York University
[hidden email]







Re: Liberal and conservative?

Bruce Weaver
Administrator
Mike Palij wrote
If I may add my two cents to this thread: the issue of whether
a multiple comparison procedure (either planned comparison or
post hoc test; see Kirk's Experimental Design text for background
on these distinctions) is liberal or conservative requires one to make
a decision about the overall Type I error rate [alpha(overall)] that
one is willing to accept.

The two stage procedure typically used for the Fisher's LSD
(Least Significant Difference test which is equivalent to doing
all pairwise comparisons or t-tests between means but using
the Mean Square Error from the ANOVA instead of just the
variance in the two groups being compared) uses
(a) one statistical test for ANOVA and (b) a separate set of
statistical tests for the comparisons.
But it is important to note that one only proceeds to (b) if the ANOVA at step (a) was significant.

Mike Palij wrote
Statisticians such as Rand Wilcox argue against such two stage
procedures because, technically, one does not have to have a
significant ANOVA to do multiple comparisons (e.g., an ANOVA
is not really necessary for planned comparisons though some
of the values in an ANOVA table facilitate computations).
If one keeps alpha(overall) fixed at some reasonable level, such
as alpha(overall) = 0.05, then all of the multiple comparisons have
their per-comparison alpha [i.e., alpha(per comparison)] adjusted
to a lower value.
I agree that a significant omnibus F-test is not necessary for most multiple comparison procedures, despite what some of us might have been taught back in the day.  But a significant omnibus F-test is required before one proceeds to the pair-wise tests when doing Fisher's LSD.  

Here is the meat of Dave Howell's argument about why Fisher's LSD controls the family-wise alpha at the per-contrast alpha level when there are 3 groups.

1. When the complete null hypothesis is true (i.e., all 3 population means are equal), "the requirement for a significant overall F [before proceeding to the pairwise tests] ensures that the familywise error rate will equal alpha" (6th Ed., p.368).

2. If two of the population means are equal, but different from the 3rd, then there is only one opportunity for a Type I error to occur--i.e., the test that compares samples from the two populations with equal means.

3. If the 3 population means are all different, then there is no opportunity for a Type I error to occur.  
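Howell's three-case argument can be checked by simulation.  The sketch below is my own construction, not from Howell: it uses z-tests with a known sigma = 1 (so only the standard library is needed) rather than t-tests with the MSE, but the protection logic is the same: with 3 groups under the complete null, pairwise tests run only after a significant omnibus test, so the familywise Type I error rate stays at or below the nominal 0.05.

```python
import math
import random
from statistics import NormalDist

rng = random.Random(7)
n, sims = 30, 4000
z_crit = NormalDist().inv_cdf(0.975)      # two-sided 0.05 pairwise cutoff
chi2_crit = -2 * math.log(0.05)           # chi-square(2) 0.05 cutoff, ~5.99

fwe = 0
for _ in range(sims):
    # three group means under the complete null, known sigma = 1
    means = [sum(rng.gauss(0, 1) for _ in range(n)) / n for _ in range(3)]
    grand = sum(means) / 3
    omnibus = n * sum((m - grand) ** 2 for m in means)  # ~ chi-square(2)
    if omnibus <= chi2_crit:
        continue        # step 1 fails: no pairwise tests, no Type I error
    if any(abs(means[i] - means[j]) / math.sqrt(2 / n) > z_crit
           for i, j in ((0, 1), (0, 2), (1, 2))):
        fwe += 1        # at least one false pairwise rejection

print(fwe / sims)       # stays near (at or below) the nominal 0.05
```

Dropping the omnibus gate and running the three pairwise tests unconditionally pushes the same estimate up toward the ~0.14 familywise rate discussed elsewhere in this thread.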

Mike Palij wrote
To make things clearer, consider:  alpha(overall) can be
calculated with the following formula:

alpha(overall) = (1 - (1 - alpha(per comparison))**K)

where K is equal to the number of comparisons or tests
that one is conducting.
That formula applies when the contrasts are all mutually independent.  But that's not the case for all pair-wise contrasts for a set of means.  You're better off using the Bonferroni approximation, I think.  
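Numerically, the Bonferroni bound K*alpha is never smaller than the independence formula, and unlike that formula it holds under any dependence among the contrasts (by Boole's inequality).  A quick check (my own sketch):

```python
a = 0.05
for K in (2, 3, 6, 10):
    independent = 1 - (1 - a) ** K     # exact only for independent tests
    bonferroni = min(1.0, K * a)       # upper bound under any dependence
    print(K, round(independent, 4), bonferroni)
```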

Mike Palij wrote
In the LSD framework, each test has alpha(per comparison) = 0.05,
that is, the usual alpha level is used for each test.  But after 3
statistical tests (e.g., all pairwise comparisons between three means),
alpha(overall) = 0.14; that is, after doing three t-tests among
the means, there is a probability of 0.14 that a Type I error has been
committed.  As the number of comparisons increases, the overall
probability that one has committed a Type I error also increases, until
it is quite likely that one or more test results are Type I errors (this
is also relevant when one is checking for significant correlations in a
correlation matrix).
I think your argument might be right here if one proceeded with the pair-wise tests regardless of whether the omnibus F-test was significant or not.  But remember that you only get to the pair-wise tests if you first reject the null for the overall F-test.  Howell argues that this is what provides the "protection" for Fisher's protected t-tests as they are sometimes called.

Mike Palij wrote
I'll shut up now.
Me too.  ;-)

Cheers,
Bruce
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.