Hi everyone.

An alternative way to determine your sample size, one that is much more
reliable than any rule of thumb (20 participants per group, 10 observations
per parameter, and so on), is to perform a power analysis.

Power can be defined as the probability of rejecting H0 (your null hypothesis)
when it is actually false. In other words, it is the probability of detecting
an effect that really exists. Power depends on three factors: your decision
threshold (namely alpha, most of the time .05, which corresponds to a 95%
confidence level), the effect size, and the sample size.

If you know (or can approximate) the effect size and you choose a particular
alpha level, you can then perform a power analysis to determine the
recommended sample size for a desired power. Cohen (1988) recommends a power
of .80.

Let's take an example. Imagine you ran an experiment with 2 experimental
groups. Each of these groups had 20 participants (classical rule of thumb:
20 participants per group). The effect you are looking for is expected to be
of medium size (Cohen's d = .50, which is equivalent to an effect size
correlation of .243), and your decision threshold is alpha = .05 (which means
that if your test statistic falls below the critical value corresponding to
this threshold, you will not reject H0). Given this information, power can
be calculated. Your 20-participants-per-group comparison of means will have
a power of .46.

If you wish to have a power of .80, your experiment must include about 50
participants per group.
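If you want to check these numbers yourself, here is a minimal sketch in Python (not SPSS). It uses a normal approximation to the exact noncentral-t calculation and assumes a one-tailed test at alpha = .05, which is the convention under which d = .50 with 20 participants per group gives a power near .46; the function names are my own, not from any standard package.

```python
from math import sqrt
from statistics import NormalDist  # Python 3.8+ standard library

def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a one-tailed two-group mean comparison.

    Normal approximation: under the alternative hypothesis the test
    statistic is shifted by roughly d * sqrt(n/2).
    """
    z_crit = NormalDist().inv_cdf(1 - alpha)   # critical value under H0
    delta = d * sqrt(n_per_group / 2)          # expected shift under H1
    return NormalDist().cdf(delta - z_crit)

def n_for_power(d, power=0.80, alpha=0.05):
    """Per-group n reaching the desired power (same approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    return 2 * ((z_alpha + z_beta) / d) ** 2

print(approx_power(0.50, 20))   # close to the .46 quoted above
print(n_for_power(0.50, 0.80))  # roughly 50 per group
```

The approximation slightly overstates power relative to the exact t-based computation, but it reproduces Cohen's table values to about two decimal places at these sample sizes.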

This reasoning about power is very close to Hector's answer.

If you are looking for more information concerning power, you may read the
following:

Cohen, J. (1988). Statistical power analysis for the behavioral sciences
(2nd ed.). Hillsdale, NJ: Erlbaum.

Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155-159.

Regards.

Fabrice.

****************************

Fabrice Gabarrot

PhD Student - Social Psychology

University of Geneva - Switzerland

On Tue, 8 Aug 2006 16:01:44 -0300, Hector Maletta <[hidden email]> wrote:

>Jeff,

>

>When one value of a dichotomy has a low proportion, the variance of the
>variable is LOWER, but the standard deviation is larger relative to the
>observed proportion. We can say the dichotomous variable with the lower
>proportion has a lower standard deviation but a higher coefficient of
>variation. For a

>dichotomy, variance is p(1-p), which has a maximum value of 0.25 for p=0.5

>and decreases steadily as p decreases or increases away from 0.5. For p=0.1

>or 0.9 variance is 0.1 x 0.9=0.09. Now, the standard error of the estimate

>of this proportion is the square root of p(1-p)/n, where n is sample size.

>Suppose your total sample is n=100. If p=0.5 the standard error is the

>square root of 0.25/100=0.05. An approximate confidence interval of two STD

>errors would be +/- 0.10 around the estimate, i.e. from 0.4 to 0.6. You

>cannot be sure whether any of the alternatives is in the majority, but at

>least you are pretty certain that none is zero. Now if the observed

>proportion of one of the alternatives was p=0.10, the standard error of the

>variable would be the square root of 0.09/100=0.03. This is lower than the

>previous case in absolute terms (0.03 < 0.05) but larger in relation to the

>proportion (0.03/0.10 > 0.05/0.50). An approximate confidence interval of

>two standard errors would be 0.10 +/- 0.06 going from 0.04 to 0.16. This

>interval does not contain the zero value. With this sample size (100) you

>can therefore be approx 95% confident that if the sample proportion is 0.10

>the population proportion is larger than zero. But with a lower total sample
>(say n=50 or n=25) you probably will not be able to (work it out as an
>exercise). So the minimum sample needed depends on the size of the

>proportion (0.10 in this example) and the level of confidence desired (95%

>in this example). If your sample is not sufficient for 95% confidence, try

>90% confidence. It is riskier, of course, but such are the perils of

>statistics. There are also some guys around willing to go for less than 90%,

>but don't try it at home. It's too dangerous.
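[Hector's arithmetic is easy to check with a few lines of Python. This is a quick sketch; the two-standard-error interval is his rough approximation, not an exact method, and `prop_ci` is an invented helper name.]

```python
from math import sqrt

def prop_ci(p, n, z=2.0):
    """Approximate CI for a proportion: p +/- z * sqrt(p(1-p)/n)."""
    se = sqrt(p * (1 - p) / n)   # standard error of the estimated proportion
    return p - z * se, p + z * se

print(prop_ci(0.5, 100))  # about (0.40, 0.60): can't tell which value is the majority
print(prop_ci(0.1, 100))  # about (0.04, 0.16): the interval excludes zero
print(prop_ci(0.1, 25))   # lower bound is negative, so zero is not excluded
```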

>

>

>

>Hector

>

>

>

> _____

>

>From: SPSSX(r) Discussion [mailto:[hidden email]] On behalf of Stats Q
>Sent: Tuesday, August 08, 2006 10:15 AM
>To: [hidden email]
>Subject: Re: Multiple Regression & Interactions

>

>

>

>Hi Jeff,

>

>Re: the first issue. There are guidelines regarding sample size requirements
>for multiple regression, but I couldn't find any guidelines on whether
>dichotomous predictors have to have a certain number of cases per group. I
>know SPSS will run the analysis anyway, but I imagine the results aren't
>reliable unless there are, say, more than 20 cases per group. As you say,
>"you may get an estimate for the regression coefficient, but the p value
>will be high".

>

>I follow what you're saying in your example about interactions. I thought

>there would be a simple way to look at it :-)

>

>Thank you for your help Jeff.

>

>

>K S Scot

>

>

> _____

>

>

>From: Jeff <[hidden email]>
>Reply-To: Jeff <[hidden email]>
>To: [hidden email]
>Subject: Re: Multiple Regression & Interactions
>Date: Mon, 7 Aug 2006 09:35:37 -0600

>>At 06:01 AM 8/7/2006, you wrote:

>>>Does anyone know whether there are sample size requirements for
>>>dichotomous predictors in multiple regression? That is, for the
>>>dichotomous predictor, what is the smallest number of cases per group
>>>that is allowed?

>>>

>>>Also, I have another query regarding interpreting interaction effects in
>>>SPSS's multiple regression. When a cross-product term is created by
>>>multiplying together a predictor which is positively associated with the
>>>DV (e.g., happiness) and a predictor that is negatively associated with
>>>the DV (e.g., depression), would the resulting product term be expected
>>>to show a positive or negative beta coefficient? I'm sure there's a
>>>really simple answer to this.

>>>

>>>Thank you in advance.

>>>K S Scot

>>

>>

>>Regarding the first issue - I'm not sure I understand - mathematically,
>>the number of cases doesn't matter as long as the predictor isn't a
>>constant - e.g., all cases in the same group. Practically, if you have a
>>small number of cases in one group, you won't be able to accurately
>>examine the group differences - i.e., you may get an estimate for the
>>regression coefficient, but the p value will be high.

>>

>>Regarding the second issue - the sign of the bivariate correlations
>>doesn't really matter. What matters is whether there is an interaction
>>between the effects (by definition). In other words, let's say happy
>>days/month is positively related to the amount of time spent outside the
>>house per month, while depressed days/month is negatively related. A
>>significant interaction, for example, might imply that if there are many
>>depressed days/month, the desire to go outside during happy days is
>>reduced.
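[Jeff's point, that the sign of the interaction coefficient is not determined by the signs of the two main effects, can be illustrated with a small simulation. This is a sketch using NumPy rather than SPSS; the variable names and coefficient values are invented for illustration.]

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500
happy = rng.normal(size=n)   # predictor positively related to the DV
dep = rng.normal(size=n)     # predictor negatively related to the DV

# Simulated DV: positive main effect of happy, negative main effect of dep,
# and a NEGATIVE interaction (depression dampens the effect of happy days).
y = (1.0 + 0.8 * happy - 0.6 * dep - 0.5 * happy * dep
     + rng.normal(scale=0.5, size=n))

# Regression including the cross-product term, via ordinary least squares.
X = np.column_stack([np.ones(n), happy, dep, happy * dep])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)  # recovers roughly [1.0, 0.8, -0.6, -0.5]: the interaction
          # coefficient's sign reflects the simulated interaction, not
          # the signs of the two main effects
```

Flipping the simulated interaction to +0.5 while keeping the main effects unchanged would produce a positive estimate for `b[3]`, which is exactly why the bivariate signs alone cannot tell you what to expect.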

>>

>>

>>

>>

>>

>>

>>Jeff

>

>

>

>
