Bootstrapping pairwise not listwise, a problem with N


walrusandpossum
Hello :-)

First time posting, so forgive me if my description of what I am trying to achieve is unclear.

I need to bootstrap my correlations because age is not normally distributed (age is also correlated with most variables, so I have controlled for it by regressing each task onto age). I would like to use all the data available, but some values are missing (e.g., because a participant didn't understand the task and was excluded, or didn't complete the task). This means I have a different N for each variable. When I bootstrap all the correlations in the matrix at once, it uses listwise deletion. Is it OK to instead bootstrap each pairwise correlation separately and assemble my own matrix? And then run a multiple regression on these bootstrapped correlations using pairwise deletion (or use the matrix as input to Amos)?

Any insights and advice would be greatly appreciated!

Many thanks,

Laura
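
For readers who want to see the idea concretely, here is a minimal sketch of bootstrapping one pairwise correlation. It is written in Python rather than SPSS, and it is only an illustration: the function names (`pearson`, `pairwise_bootstrap_r`), the use of `None` for a missing value, the percentile interval, and the simulated data are all assumptions made for the example, not anything taken from the original analysis.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def pairwise_bootstrap_r(x, y, n_boot=2000, seed=0):
    """Percentile bootstrap interval for r, using pairwise deletion.

    Only cases observed on BOTH variables are kept; missing values are
    assumed to be coded as None (hypothetical convention for this sketch).
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    rng = random.Random(seed)
    reps = []
    for _ in range(n_boot):
        # Resample whole cases (x, y together) with replacement.
        sample = [pairs[rng.randrange(n)] for _ in range(n)]
        xs, ys = zip(*sample)
        reps.append(pearson(xs, ys))
    reps.sort()
    lower = reps[round(0.025 * (n_boot - 1))]
    upper = reps[round(0.975 * (n_boot - 1))]
    r = pearson(*zip(*pairs))
    return n, r, (lower, upper)

# Simulated (hypothetical) data: 60 participants, every 5th task score missing.
data_rng = random.Random(1)
age = [data_rng.uniform(20, 80) for _ in range(60)]
task = [0.05 * a + data_rng.gauss(0, 1) for a in age]
for i in range(0, 60, 5):
    task[i] = None

n, r, ci = pairwise_bootstrap_r(age, task)
print(f"pairwise N = {n}, r = {r:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

Repeating this for every pair of variables gives a matrix in which each cell rests on its own N. One caveat worth keeping in mind: a correlation matrix assembled pairwise is not guaranteed to be positive semi-definite, so it may not be usable as input to a regression or structural equation model.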



Re: Bootstrapping pairwise not listwise, a problem with N

Bruce Weaver
Administrator
If I follow, the ultimate goal is to estimate a multiple regression model that includes age plus a bunch of other variables, but you are concerned because age is not normally distributed.  Right?  What does the age distribution look like?  

Bear in mind the following points.

1. The normal distribution is just a model, and nothing in nature is truly normal (see mkweb.bcgsc.ca/pointsofsignificance/img/Boxonmaths.pdf).  (Nothing in nature is truly linear either.)  

2. The key assumptions of OLS linear regression are that the *errors* (not the variables) are independently and identically distributed as normal with a mean of 0 and some variance (sigma^2).  And normality of the errors is less important than their independence and homoscedasticity.  

With those points in mind, you might want to just fit your regression model and then examine the residuals (e.g., using residual plots).  Googling <SPSS regression residual analysis> will likely turn up some good info on how to proceed.  
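
As a rough illustration of point 2 (in Python rather than SPSS), the sketch below simulates a clearly skewed predictor with normally distributed errors, fits a one-predictor OLS model, and compares the skewness of the predictor with the skewness of the residuals. Everything here is an assumption made for the example: the simulated data, the helper names `fit_ols` and `skewness`, and the choice of skewness as a crude numeric stand-in for a residual plot.

```python
import random

def fit_ols(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

def skewness(values):
    """Moment-based sample skewness (0 for a symmetric distribution)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

rng = random.Random(42)
# Hypothetical data: right-skewed ages, but normally distributed errors.
age = [18 + rng.expovariate(1 / 15) for _ in range(200)]
score = [2.0 + 0.1 * a + rng.gauss(0, 1) for a in age]

intercept, slope = fit_ols(age, score)
fitted = [intercept + slope * a for a in age]
resid = [s - f for s, f in zip(score, fitted)]

print(f"slope estimate:        {slope:.3f}")
print(f"skewness of age:       {skewness(age):.2f}")
print(f"skewness of residuals: {skewness(resid):.2f}")
```

In this simulated setting the predictor is strongly skewed while the residuals are not, which is the situation in which a non-normal age distribution by itself is no reason for concern. With real data, the residual plots remain the more informative check.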

HTH.

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

Re: Bootstrapping pairwise not listwise, a problem with N

Jon Peck
Besides Bruce's points, which are all valid, you may have endogenous selection into the complete cases used for the regression, which could bias the regression estimates.  However, without more information about that model and the dependent variable, it is impossible to know.
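
To make the selection concern concrete, here is a small simulation sketch in Python. It assumes, purely for illustration, a true slope of 1 and two hypothetical missingness mechanisms: one in which cases drop out depending on the dependent variable, and one in which they drop out depending only on the predictor. Nothing in the thread says which mechanism (if either) applies to the actual data.

```python
import random

def ols_slope(x, y):
    """Least-squares slope of y on a single predictor x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(7)
# Hypothetical population: y = x + error, so the true slope is 1.
x = [rng.gauss(0, 1) for _ in range(5000)]
y = [a + rng.gauss(0, 1) for a in x]

full_slope = ols_slope(x, y)

# Mechanism A (assumed): a case is complete only if its outcome is above 0.
kept_on_y = [(a, b) for a, b in zip(x, y) if b > 0]
slope_sel_y = ols_slope(*zip(*kept_on_y))

# Mechanism B (assumed): a case is complete only if its predictor is above 0.
kept_on_x = [(a, b) for a, b in zip(x, y) if a > 0]
slope_sel_x = ols_slope(*zip(*kept_on_x))

print(f"all cases:              slope = {full_slope:.2f}")
print(f"selected on outcome:    slope = {slope_sel_y:.2f}")
print(f"selected on predictor:  slope = {slope_sel_x:.2f}")
```

Under these assumptions, dropping cases according to the outcome pulls the estimated slope well below 1, whereas dropping cases according to the predictor alone leaves it roughly intact. That contrast is why the reason for the missing data matters more than the choice between listwise and pairwise deletion.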

In reply to Bruce Weaver's message of Wed, May 24, 2017 at 5:52 AM.

--
Jon K Peck