Marta, just a quick clarification. The first pair of limits listed below came from your program run with the seed you specified. The following 10 pairs came from runs with a random seed. So the mean and SD are based on 11 runs, not 10. I hope that wasn't confusing.
Greg
 -----Original Message-----
 From: Meyer, Gregory J
 Sent: Monday, June 19, 2006 6:40 PM
 To: 'Marta García-Granero'; '[hidden email]'
 Subject: RE: Re: Confidence Interval for Rsquare: R2.exe (Steiger&Fouladi) vs Bootstrapping

 Hi again Marta,

 Interesting problem. I tinkered with your syntax and replaced
 the SET SEED=29147290 command with SET SEED=Random. Then I
 ran 10 iterations of your bootstrap. Here's what I found.

 As you reported, using the R2.exe program, the 95% CI limits
 should be:
 Lower: Upper:
 .3303 .8879

 What we get from your syntax is:
 Lower: Upper:
 .3306 .8861
 .3377 .8906
 .3358 .8888
 .3412 .8875
 .3356 .8911
 .3169 .8898
 .3288 .8886
 .3316 .8867
 .3396 .8809
 .3363 .8861
 .3391 .8885

 Ave:
 .3339 .8877
 SD:
 .0069 .0028

 It seems that what appeared to be fixed deviations from the
 R2.exe results in your bootstrap were really the consequence
 of having your random number seed fixed at a specific value.
 With a fluctuating random seed it becomes clear that, despite
 the large bootstrap samples, there is still a good amount of
 variability in the CI estimates from one run to the next. The
 R2.exe limits are within one SD of the mean of the 10
 estimates from your bootstrap, so it looks like both
 programs are targeting the same parameters.
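
The summary figures above can be re-derived from the listed limits. The following is only a minimal Python sketch (not SPSS syntax), assuming all 11 listed runs enter the mean and SD, as the clarification at the top of this thread states; the helper name `summarize` is made up for illustration.

```python
from statistics import mean, stdev

# CI limits copied from the runs listed above
# (1 run with the fixed seed + 10 runs with a random seed).
lower = [.3306, .3377, .3358, .3412, .3356, .3169,
         .3288, .3316, .3396, .3363, .3391]
upper = [.8861, .8906, .8888, .8875, .8911, .8898,
         .8886, .8867, .8809, .8861, .8885]

# Limits reported by R2.exe, as quoted above.
r2exe_lower, r2exe_upper = 0.3303, 0.8879


def summarize(values, reference):
    """Return (mean, sample SD, |reference - mean| in SD units)."""
    m = mean(values)
    sd = stdev(values)  # sample SD: n - 1 in the denominator
    return m, sd, abs(reference - m) / sd


for label, values, ref in (("Lower", lower, r2exe_lower),
                           ("Upper", upper, r2exe_upper)):
    m, sd, z = summarize(values, ref)
    print(f"{label}: mean={m:.4f}  SD={sd:.4f}  |R2.exe - mean|/SD={z:.2f}")
```

A ratio below 1 in the last column means the R2.exe limit lies within one SD of the bootstrap mean.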

 The Steiger & Fouladi program allows 2 options for computing
 the CI; one quicker and less exact and the other slower but
 more precise. If the R2.exe results are based on the latter,
 perhaps it would be useful to increase your bootstrap
 parameters to something like h = 100 and k = 10000. On the
 other hand, it may be that your program is more precise and
 theirs more variable. I don't recall the parameters they used
 to generate their estimates. Hope this helps.
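
To experiment with the effect of h (number of runs) and k (bootstrap samples per run), here is a rough Python sketch of a percentile bootstrap for adjusted R2. It is an illustration under stated assumptions, not a port of the SPSS syntax quoted further down: it uses the conventional 2.5 and 97.5 percentiles rather than the 1.2 and 98 in that syntax, it mirrors the syntax's index convention for picking the percentiles, and the function names (`adjusted_r2`, `bootstrap_ci`, `mean_ci`) are invented for this example. The data are the 15 pairs from the thread.

```python
import random

# (height in cm, dead space in ml), copied from the sample dataset in this thread.
DATA = [(110, 44), (116, 31), (124, 43), (129, 45), (131, 56),
        (138, 79), (142, 57), (150, 56), (153, 58), (155, 92),
        (156, 78), (159, 64), (164, 88), (168, 112), (174, 101)]


def adjusted_r2(pairs):
    """Adjusted R2 for simple regression: 1 - (1 - r^2)(n - 1)/(n - 2)."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    if sxx == 0 or syy == 0:
        return None  # degenerate resample: correlation is undefined
    r2 = sxy * sxy / (sxx * syy)
    return 1 - (1 - r2) * (n - 1) / (n - 2)


def bootstrap_ci(pairs, k=1000, alpha=0.05, rng=None):
    """One run: percentile CI from k resamples of the (x, y) pairs."""
    if rng is None:
        rng = random.Random()
    n = len(pairs)
    stats = []
    while len(stats) < k:
        resample = [pairs[rng.randrange(n)] for _ in range(n)]
        value = adjusted_r2(resample)
        if value is not None:
            stats.append(value)
    stats.sort()
    lo = max(0, round(k * alpha / 2) - 1)        # 0-based index
    hi = min(k - 1, round(k * (1 - alpha / 2)))  # 0-based index
    return stats[lo], stats[hi]


def mean_ci(pairs, h=10, k=1000, seed=None):
    """Average the CI limits over h runs; seed=None gives a varying seed."""
    rng = random.Random(seed)
    runs = [bootstrap_ci(pairs, k, rng=rng) for _ in range(h)]
    return (sum(lo for lo, _ in runs) / h,
            sum(hi for _, hi in runs) / h)


if __name__ == "__main__":
    print(mean_ci(DATA, h=10, k=1000))
```

Calling `mean_ci` repeatedly with `seed=None` shows the run-to-run variability discussed above; passing a fixed integer seed makes the result repeat exactly.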

 Greg

  -----Original Message-----
  From: SPSSX(r) Discussion [mailto:[hidden email]]
  On Behalf Of Marta García-Granero
  Sent: Monday, June 19, 2006 12:28 PM
  To: [hidden email]
  Subject: Re: Confidence Interval for Rsquare: R2.exe (Steiger&Fouladi) vs Bootstrapping
 
  Hi Gregory,
 
  Although it is true that I had forgotten to bootstrap adjusted R2
  (Rhosquare) instead of sample R2 (thanks!), this doesn't explain
  the discrepancies between the two methods of estimating the CI for
  Rhosquare. I still have to modify the percentiles (1.2 instead of
  2.5, and 98 instead of 97.5) to get results more consistent with
  those R2.exe gives (see below). Therefore, my question is still
  the same: should I just report the differences, or adjust the
  percentiles?
 
  * Sample dataset (from 'Statistics at Square One', BMJ online book) *.
  DATA LIST FREE/height deadsp (2 F8.0).
  BEGIN DATA
  110 44 116 31 124 43 129 45 131 56
  138 79 142 57 150 56 153 58 155 92
  156 78 159 64 164 88 168 112 174 101
  END DATA.
  VARIABLE LABEL height 'Height (cm)' /deadsp 'Dead space (ml)'.
 
  REGRESSION
  /STATISTICS COEFF OUTS R ANOVA
  /DEPENDENT deadsp
  /METHOD=ENTER height .
 
  * Using R2.exe: 95%CI Lower Limit: 0.33029; Upper Limit: 0.88794 *.
 
  PRESERVE.
  SET SEED=29147290.
  SET MXLOOPS=1000.
  MATRIX.
  PRINT /TITLE='Confidence Interval for Rhosquare in Simple Linear Regression'.
  GET data /VAR=height deadsp /MISSING OMIT.
  * Sample statistics *.
  COMPUTE n=NROW(data).
  COMPUTE mean=CSUM(data)/n.
  COMPUTE variance=(CSSQ(data)-n&*(mean&**2))/(n-1).
  COMPUTE x=data(:,1).
  COMPUTE y=data(:,2).
  COMPUTE covxy=((T(x)*y)-n*mean(1)*mean(2))/(n-1).
  COMPUTE r=covxy/SQRT(variance(1)*variance(2)).
  COMPUTE b=covxy/variance(1).
  COMPUTE a=mean(2)-b*mean(1).
  PRINT {a,b}
  /FORMAT='F8.3'
  /CLABEL='a','b'
  /TITLE='Regression line'.
  PRINT {r**2,1-(1-r**2)*(n-1)/(n-2)}
  /FORMAT='F8.3'
  /CLABEL='RSquare','PSquare'
  /TITLE='Sample & Population (Adjusted) Rsquare'.
  * Bootstrapping PR2 *.
  COMPUTE k=1000.
  COMPUTE bootR2 =MAKE(k,1,0).
  COMPUTE bootsamp=MAKE(n,2,0).
  COMPUTE lowersum=0.
  COMPUTE uppersum=0.
  LOOP h=1 TO 10. /* 10 runs (to average them later) *.
   LOOP i=1 TO k. /* Extracting k bootstrap samples *.
   LOOP j=1 TO n. /* with sample size n *.
   COMPUTE flipcoin=1+TRUNC(n*UNIFORM(1,1)).
   COMPUTE bootsamp(j,:)=data(flipcoin,:).
   END LOOP.
   COMPUTE mean=CSUM(bootsamp)/n.
   COMPUTE variance=(CSSQ(bootsamp)-n&*(mean&**2))/(n-1).
   COMPUTE x=bootsamp(:,1).
   COMPUTE y=bootsamp(:,2).
   COMPUTE covxy=((T(x)*y)-n*mean(1)*mean(2))/(n-1).
   COMPUTE r=covxy/SQRT(variance(1)*variance(2)).
   COMPUTE bootR2(i)=1-(1-r**2)*(n-1)/(n-2).
   END LOOP.
  * Ordered array: sorting algorithm by R Ristow & J Peck *.
   COMPUTE sortedR2=bootR2.
   COMPUTE sortedR2(GRADE(bootR2))=bootR2.
  * NP confidence interval *.
   COMPUTE lower=sortedR2(k*0.012).
   COMPUTE upper=sortedR2(1+k*0.98).
   COMPUTE lowersum=lowersum+lower.
   COMPUTE uppersum=uppersum+upper.
  END LOOP.
  PRINT {lowersum/10,uppersum/10}
  /FORMAT='F8.4'
  /TITLE='RhoSquare CI: Mean of 10 bootstrap runs (1000 reps. each)'
  /CLABEL='LowerCI','UpperCI'.
  END MATRIX.
  RESTORE.
 
  MGJ> I'm not sophisticated enough with MATRIX commands to follow
  MGJ> all that you've done. However, I don't see where you centered
  MGJ> the bootstrap distribution on R^2-Adjusted (what Steiger calls
  MGJ> P^2). It appears that the distribution is centered on R^2 itself.

  MGJ> The population parameter estimate, P^2, is
  MGJ> 1 - [(1 - R^2)(N - 1)]/(N - K), where K is the number of
  MGJ> variables in the equation (i.e., 2 for your example).
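
As a worked instance of the P^2 formula just quoted, the following Python sketch (not SPSS syntax) applies it to the 15 data pairs from the syntax above, with K = 2. The function names `pearson_r` and `adjusted_r2` are hypothetical helpers written for this illustration only.

```python
from math import sqrt

# (height in cm, dead space in ml), copied from the sample dataset above.
data = [(110, 44), (116, 31), (124, 43), (129, 45), (131, 56),
        (138, 79), (142, 57), (150, 56), (153, 58), (155, 92),
        (156, 78), (159, 64), (164, 88), (168, 112), (174, 101)]


def pearson_r(pairs):
    """Pearson correlation between the two columns of pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    return sxy / sqrt(sxx * syy)


def adjusted_r2(r2, n, k=2):
    """P^2 = 1 - (1 - R^2)(N - 1)/(N - K), K = variables in the equation."""
    return 1 - (1 - r2) * (n - 1) / (n - k)


r = pearson_r(data)
print(f"R^2 = {r ** 2:.3f}")
print(f"P^2 = {adjusted_r2(r ** 2, len(data)):.3f}")
```

Because (N - 1)/(N - K) exceeds 1 whenever K > 1, P^2 is always somewhat smaller than R^2, which is why a bootstrap of R^2 and a bootstrap of P^2 are centered on different values.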
 
  Regards
 
  Marta
 
