

Hi all,

I’m trying to put together an example dataset for teaching purposes.

I’ve written a few macros and additional syntax that create example data (N = 1,000) with bivariate correlations of various strengths between X1 and Y1, with corresponding error terms, where I can specify the mean and std dev of X1 and Y1. However, I’m at a loss as to how to write code that creates a distribution illustrating the concept of heteroscedasticity.

I’m assuming that the easiest way to do this is to create a variable (also with N = 1,000) representing error in Y1, where the variance of that error increases as the predicted value of Y increases, and then add that error to my original Y1 variable. But I’m unsure how to create that heteroscedastic error variable.

Any hints/ideas?

Best,
Jeff
=====================
To manage your subscription to SPSSXL, send a message to
[hidden email] (not to SPSSXL), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSXL
For a list of commands to manage subscriptions, send the command
INFO REFCARD


The easy cure for heteroscedasticity, when there is a cure, is the appropriate power
transformation of the misscaled variable.
Assuming that you want to show the cure, you should start by generating your ordinary
relationship as you prefer, using nonnegative numbers; then transform with the inverse
of whichever "cure" you want. You get more effect with a larger multiplicative range of
scores. The square (for instance) is a mild power; you get more effect from exponentiating
(cured by logging) or taking a reciprocal. [On "multiplicative range": (1,2) and (100,200)
give the same range; (2,20) is a much larger range, being 10-fold instead of twofold. I am
talking about "power transformations", after all.]
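As a rough sketch of the inverse-transformation idea above, the syntax below builds an ordinary, constant-variance relationship on the log scale and then exponentiates it. The resulting y should fan out as x grows, and logging it should restore the even spread. The variable names (x, logy, y), the seed, and all coefficients are illustrative assumptions, not values from this thread.

```spss
* Sketch only, with assumed names and constants.
SET SEED 20180527.
NEW FILE.
INPUT PROGRAM.
LOOP id = 1 TO 1000.
* Positive predictor with a wide range.
COMPUTE x = RV.UNIFORM(1, 10).
* Ordinary relationship with constant error SD on the log scale.
COMPUTE logy = 0.5 + 0.3*x + RV.NORMAL(0, 0.4).
* Exponentiating is the inverse of the log cure, so y is heteroscedastic.
COMPUTE y = EXP(logy).
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
EXECUTE.

* Raw scale, where the spread should widen with x.
GRAPH /SCATTERPLOT(BIVAR)=x WITH y.
* Log scale, where the spread should look even.
GRAPH /SCATTERPLOT(BIVAR)=x WITH logy.
```

Note that exponentiating also bends the relationship into a curve, so the raw-scale plot shows nonlinearity together with the unequal spread.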
When Tukey talked about simulations for heteroscedasticity, another approach for generating
wild variance was to use a mixture of populations. The simulation would assume that there
was some regular relationship for most cases; and, then, some small fraction of cases,
like 1% or 5% or 10%, came from a different population where the variance was (say) 10
times as great for the IV. I forget his other details, but I imagine situations where the
small-but-variable population differs from the other: it has the only real effects, or it is
pure noise, or it contradicts the other effects.
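A minimal sketch of the mixture-of-populations idea might look like the syntax below. It rests on several assumptions that are not in the post: 5% of cases are flagged as the "wild" population, the inflated variance is applied to the error term of y (one possible reading, chosen here because it makes the unequal spread visible in the residuals), and the names (x, wild, esd, y), seed, and coefficients are made up for illustration.

```spss
* Sketch only, with assumed names and constants.
SET SEED 4711.
NEW FILE.
INPUT PROGRAM.
LOOP id = 1 TO 1000.
COMPUTE x = RV.NORMAL(50, 10).
* Roughly 5 percent of cases come from the high-variance population.
COMPUTE wild = RV.BERNOULLI(0.05).
* Regular cases get an error SD of 5.
COMPUTE esd = 5.
* Multiplying the SD by SQRT(10) makes the variance 10 times as great.
IF (wild = 1) esd = 5*SQRT(10).
COMPUTE y = 20 + 0.6*x + RV.NORMAL(0, esd).
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
EXECUTE.

GRAPH /SCATTERPLOT(BIVAR)=x WITH y BY wild.
```

The BY wild part only colors the two populations so the contaminated cases are easy to point out in class. It can be dropped to show students the data as they would normally see it.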
Hope this helps.

Rich Ulrich


compute heteros = rv.normal(z, k * x)
where z is the source of the mean, x is the source of the heteroscedasticity, and k scales it as desired.
You might also be interested in the Data with Cases custom dialog. It generates code for a dataset of any number of random variables from any of 23 distributions and 5 types of correlations (some of which mess up the chosen distribution).
The dialog generates an input program that can in most cases be run without having the dialog installed.
This dialog is posted on the old SPSS Community website, but I can send it to anyone who is interested.
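To make the one-line suggestion above concrete, here is a hedged sketch that wraps it in an input program for N = 1,000. RV.NORMAL takes a mean and a standard deviation, so k * x has to be positive for every case. In this sketch both z and x are set to the predicted value (yhat), so the spread grows with predicted Y as described in the original question. The names x1, yhat, and y1, the value k = 0.15, the seed, and the regression coefficients are assumptions chosen for illustration.

```spss
* Sketch only, with assumed names and constants.
SET SEED 12345.
NEW FILE.
INPUT PROGRAM.
LOOP id = 1 TO 1000.
COMPUTE x1 = RV.UNIFORM(1, 10).
* Predicted value, always positive here because x1 is positive.
COMPUTE yhat = 10 + 2*x1.
* Mean is yhat and SD is 0.15 times yhat, so the error SD grows with yhat.
COMPUTE y1 = RV.NORMAL(yhat, 0.15*yhat).
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
EXECUTE.

GRAPH /SCATTERPLOT(BIVAR)=x1 WITH y1.
```

Raising or lowering the 0.15 multiplier changes how pronounced the fan shape is. If the predictor can take zero or negative values, the standard deviation expression needs to be adjusted so it stays positive.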


Jon,
I'd be interested in this. Thanks.
From: SPSSX(r) Discussion <[hidden email]> on behalf of Jon Peck <[hidden email]>
Sent: Sunday, May 27, 2018 1:53:18 PM
To: [hidden email]
Subject: Re: How to: Create a distribution to illustrate Heteroscedasticity?


Also interested and thanks
Will
WMB & Associates
Statistical Services
============
mailto: [hidden email]
http://home.earthlink.net/~z_statman
============
> On 5/27/2018 2:40:44 PM, Dates, Brian ([hidden email]) wrote:
>
> Jon,
> I'd be interested in this. Thanks.
> Brian


Jeff,
Interesting question. Below my name you will find simulation code that should generate data with heteroskedastic errors from a simple linear regression model. Next, there is code that produces a scatterplot of standardized residual values against standardized predicted values from the simple linear regression model.
Hope this helps.
Ryan 
*GENERATE DATA.
SET SEED 98768795.
NEW FILE.
INPUT PROGRAM.
LOOP i = 1 TO 100.
COMPUTE X = EXP(RV.NORMAL(0,1)).
COMPUTE B0 = 1.
COMPUTE B1 = 2.
COMPUTE ERROR = SQRT(.8*x)*RV.NORMAL(0,1).
COMPUTE Y = B0 + B1*x + ERROR.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
EXECUTE.

REGRESSION
 /DEPENDENT Y
 /METHOD=ENTER X
 /SCATTERPLOT=(*ZRESID ,*ZPRED).


Hi Jon,

I’ve searched quickly for the website you mentioned, but didn’t have any luck, so I’m definitely interested in the dialog you’ve mentioned.

I’m finding there are too many examples, videos, and other pieces of info online that are either far too complex, too detailed or not detailed enough, or just plain wrong for teaching purposes, so I’ve decided to start making my own example data and output to use in class.

Jeff



Thanks. …much easier than the way I’ve been generating a few variables. …will have to explore this code more and what Jon has offered to send.

Jeff

