# How to: Create a distribution to illustrate Heteroscedasticity?

8 messages

## How to: Create a distribution to illustrate Heteroscedasticity?

Hi all,

I’m trying to put together an example dataset for teaching purposes.

I’ve written a few macros and some additional syntax that create example data with bivariate correlations of various strengths between X1 and Y1, with corresponding error terms, where I can specify the mean and std dev of X1 and Y1 (with N=1,000). However, I’m at a loss as to how to write code that creates a distribution illustrating the concept of heteroscedasticity.

I assume the easiest way is to create a variable (also with N=1,000) that represents error in Y1, where the variance of that error increases as the predicted value of Y increases, and then add that error to my original Y1 variable. But I’m unsure how to create that heteroscedastic error variable.

Any hints/ideas?

Best,

Jeff

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

## Re: How to: Create a distribution to illustrate Heteroscedasticity?

The easy cure for heteroscedasticity, when there is a cure, is the appropriate power transformation of the mis-scaled variable. Assuming that you want to show the cure, you should start by generating your ordinary relationship as you prefer, using non-negative numbers; then transform with the inverse of whichever "cure" you want.

You get more effect with a larger multiplicative range of scores, but the square (for instance) is a mild power; you get more effect from exponentiating (cured by logging) or taking a reciprocal.

["Multiplicative range": (1,2) and (100,200) give the same range; (2,20) is a much larger range, being 10-fold instead of two-fold. I am talking about "power transformations", after all.]

When Tukey talked about simulations for heteroscedasticity, another approach for generating wild variance was to use a mixture of populations. The simulation would assume that there was some regular relationship for most cases, and that some small fraction of cases, like 1% or 5% or 10%, came from a different population where the variance was (say) 10 times as great for the IV. I forget his other details, but I imagine situations where the small-but-variable population differs from the other: it has the only real effects, or it is pure noise, or it contradicts the other effects.

Hope this helps.

-- 
Rich Ulrich

From: SPSSX(r) Discussion <[hidden email]> on behalf of Jeff <[hidden email]>
Sent: Sunday, May 27, 2018 1:13:29 AM
To: [hidden email]
Subject: How to: Create a distribution to illustrate Heteroscedasticity?
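The two ideas in this reply can be sketched outside SPSS. The following is a minimal Python illustration using only the standard library, not code from the thread. The function names, coefficients, and sample sizes are all hypothetical. The first function builds a relationship with constant error variance on the log scale and then exponentiates it, which is the inverse of the log "cure". The second draws a small fraction of cases from a population whose error variance is 10 times larger; inflating the error in Y is an assumption made here for simplicity, since the reply mentions the IV.

```python
import math
import random
import statistics


def inverse_log_example(n=1000, seed=1):
    """Homoscedastic relation on the log scale, exponentiated to the raw scale."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0.0, 3.0)
        log_y = 0.5 + 1.0 * x + rng.gauss(0.0, 0.3)  # constant error SD here
        rows.append((x, math.exp(log_y)))  # exp is the inverse of the log cure
    return rows


def mixture_example(n=1000, frac=0.05, var_ratio=10.0, seed=2):
    """Most cases have error SD 1; a fraction `frac` has variance * var_ratio."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        wild = rng.random() < frac  # True for the small, noisy population
        sd = math.sqrt(var_ratio) if wild else 1.0
        rows.append((x, 1.0 + 2.0 * x + rng.gauss(0.0, sd), wild))
    return rows


def residual_sd_by_half(rows):
    """Fit y = a + b*x by least squares on (x, y) pairs.

    Returns the residual SD for the low-x half and the high-x half.
    """
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    b = sum((x - mx) * (y - my) for x, y in rows) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    ordered = sorted(rows)  # sort cases by x
    resid = [y - (a + b * x) for x, y in ordered]
    half = len(resid) // 2
    return statistics.pstdev(resid[:half]), statistics.pstdev(resid[half:])


if __name__ == "__main__":
    low, high = residual_sd_by_half(inverse_log_example())
    print("residual SD, low-x half vs high-x half:", low, high)
```

Note that exponentiating also bends the relationship, so a straight-line fit to the raw values shows curvature as well as unequal spread; taking logs of Y removes both.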

## Re: How to: Create a distribution to illustrate Heteroscedasticity?

In reply to this post by Jeff6610

    compute heteros = rv.normal(z, k * x).

where z is the source of the mean, x is the source of the heteroscedasticity, and k scales it as desired.

You might also be interested in the Data with Cases custom dialog. It generates code for a dataset of any number of random variables from any of 23 distributions and 5 types of correlations (some of which mess up the chosen distribution).

The dialog generates an input program that can in most cases be run without having the dialog installed.

This dialog is posted on the old SPSS Community website, but I can send it to anyone who is interested.

-- 
Jon K Peck
[hidden email]
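For readers who want to try the same idea outside SPSS, here is a rough Python analogue of the one-line COMPUTE in this reply, using only the standard library. It is a sketch, not part of the thread: `heteros_draws` and the example values are hypothetical. In SPSS, the second argument of RV.NORMAL is the standard deviation, so `k * x` must be positive for every case; the sketch checks that explicitly.

```python
import random
import statistics


def heteros_draws(z, x, k, seed=3):
    """Draw one value per case from Normal(mean=z_i, sd=k * x_i)."""
    rng = random.Random(seed)
    draws = []
    for zi, xi in zip(z, x):
        sd = k * xi
        if sd <= 0:
            # A standard deviation must be positive, as with RV.NORMAL.
            raise ValueError("k * x must be positive for every case")
        draws.append(rng.gauss(zi, sd))
    return draws


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.uniform(1.0, 10.0) for _ in range(1000)]  # positive spread source
    z = [50.0 + 2.0 * xi for xi in x]                  # mean depends on x
    heteros = heteros_draws(z, x, k=0.5)
    dev = [h - zi for h, zi in zip(heteros, z)]
    print("SD of deviations from the mean:", statistics.pstdev(dev))
```

If x can be zero or negative in your data, one option is to use its absolute value plus a small constant as the source of the spread; that choice is an assumption, not something stated in the reply.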

## Re: How to: Create a distribution to illustrate Heteroscedasticity?

Jon,

I'd be interested in this. Thanks.

Brian

From: SPSSX(r) Discussion <[hidden email]> on behalf of Jon Peck <[hidden email]>
Sent: Sunday, May 27, 2018 1:53:18 PM
To: [hidden email]
Subject: Re: How to: Create a distribution to illustrate Heteroscedasticity?

## Re: How to: Create a distribution to illustrate Heteroscedasticity?

In reply to this post by Jeff6610

Also interested and thanks.

Will
WMB & Associates
Statistical Services
mailto: [hidden email]

> On 5/27/2018 2:40:44 PM, Dates, Brian ([hidden email]) wrote:
>
> Jon,
>
> I'd be interested in this. Thanks.
>
> Brian

## Re: How to: Create a distribution to illustrate Heteroscedasticity?

In reply to this post by Jeff6610

Jeff,

Interesting question. Below my name you will find simulation code that should generate data with heteroskedastic errors from a simple linear regression model. Next, there is code that produces a scatterplot of standardized residual values against standardized predicted values from the simple linear regression model.

Hope this helps.

Ryan

--

    * GENERATE DATA.
    SET SEED 98768795.
    NEW FILE.
    INPUT PROGRAM.
    LOOP i = 1 TO 100.
    COMPUTE X = EXP(RV.NORMAL(0,1)).
    COMPUTE B0 = -1.
    COMPUTE B1 = 2.
    COMPUTE ERROR = SQRT(.8*X)*RV.NORMAL(0,1).
    COMPUTE Y = B0 + B1*X + ERROR.
    END CASE.
    END LOOP.
    END FILE.
    END INPUT PROGRAM.
    EXECUTE.

    REGRESSION
      /DEPENDENT Y
      /METHOD=ENTER X
      /SCATTERPLOT=(*ZRESID ,*ZPRED).
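The data-generating step in this reply sets the error to SQRT(.8*X) times a standard normal draw, so the error variance at a given X is 0.8 * X and grows with the predictor. The following Python sketch (standard library only) mirrors that step and checks the variance relation numerically. It is an illustration under stated assumptions: the function names are hypothetical, and although the same seed value is reused, Python's generator will not reproduce the numbers SPSS produces.

```python
import math
import random
import statistics


def simulate(n=100, b0=-1.0, b1=2.0, seed=98768795):
    """Return (x, y) pairs with Var(error | x) = 0.8 * x."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = math.exp(rng.gauss(0.0, 1.0))                 # lognormal, always > 0
        error = math.sqrt(0.8 * x) * rng.gauss(0.0, 1.0)  # SD grows with sqrt(x)
        data.append((x, b0 + b1 * x + error))
    return data


def mean_squared_error_per_x(data, b0=-1.0, b1=2.0):
    """Average of error**2 / x, using the true coefficients.

    Under the model above this average should be close to 0.8.
    """
    return statistics.fmean((y - (b0 + b1 * x)) ** 2 / x for x, y in data)


if __name__ == "__main__":
    sample = simulate(n=20000)
    print("mean of error^2 / x:", mean_squared_error_per_x(sample))
```

With only 100 cases, as in the reply, the fan shape in the residual plot is visible but noisy; a larger n makes the pattern easier to see when teaching.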