How to: Create a distribution to illustrate Heteroscedasticity?

How to: Create a distribution to illustrate Heteroscedasticity?

Jeff-2

Hi all,

I’m trying to put together an example dataset for teaching purposes.

I’ve successfully written a few macros and additional syntax that create example data (N=1,000) with bivariate correlations of various strengths between X1 and Y1, with corresponding error terms, where I can specify the mean and std dev of X1 and Y1. However, I’m at a loss as to how to write code that creates a distribution illustrating the concept of heteroscedasticity.

I’m assuming the easiest way to do this is to create a variable (also with N=1,000) that represents error in Y1, where the variance of that error increases as the predicted value of Y increases, and then add that error to my original Y1 variable. But I’m unsure how to create that heteroscedastic error variable.

Any hints/ideas?

Best,

Jeff

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Re: How to: Create a distribution to illustrate Heteroscedasticity?

Rich Ulrich

The easy cure for heteroscedasticity, when there is a cure, is the appropriate power transformation of the mis-scaled variable.

Assuming that you want to show the cure, you should start by generating your ordinary relationship as you prefer, using non-negative numbers; then transform with the inverse of whichever "cure" you want. You get more effect with a larger multiplicative range of scores, but the square (for instance) is a mild power; you get more effect from exponentiating (cured by logging) or taking a reciprocal. ["multiplicative range" - (1,2) or (100,200) give the same range; (2,20) is a much larger range, being 10-fold instead of two-fold. I am talking about "power transformations", after all.]
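A minimal sketch of this idea, assuming the log is the intended "cure": an ordinary linear relationship with constant error variance is generated on the log scale and then exponentiated. The variable names (X, LNY, Y), the seed, the coefficients and the N of 1,000 are illustrative assumptions, not values from this thread.

```spss
* Sketch only. Names, seed and coefficients are illustrative assumptions.
SET SEED 20180527.
NEW FILE.
INPUT PROGRAM.
LOOP #i = 1 TO 1000.
* X is uniform on the interval from 1 to 10.
COMPUTE X = RV.UNIFORM(1, 10).
* LNY is linear in X with a constant error SD of 0.4.
COMPUTE LNY = 0.5 + 0.3*X + RV.NORMAL(0, 0.4).
* Exponentiating undoes the log, so the spread of Y grows with X.
COMPUTE Y = EXP(LNY).
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
EXECUTE.

* Raw scale: fan-shaped scatter. Log scale: even scatter.
GRAPH /SCATTERPLOT(BIVAR)=X WITH Y.
GRAPH /SCATTERPLOT(BIVAR)=X WITH LNY.
```

Note that on the raw scale the relationship is also curved, not just unevenly spread, because the whole linear predictor is exponentiated along with the error.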


When Tukey talked about simulations for heteroscedasticity, another approach for generating wild variance was to use a mixture-of-populations. The simulation would assume that there was some regular relationship for most cases; and, then, some small fraction of cases, like 1% or 5% or 10%, came from a different population where the variance was (say) 10 times as great for the IV. I forget his other details, but I imagine situations where the small-but-variable population differs from the other; it has the only real effects, or it is pure noise, or it contradicts the other effects.
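A rough sketch of a mixture-of-populations simulation follows. It is one possible reading rather than Tukey's actual design: here the inflated variance is applied to the error term of Y, and the 5% fraction, the variable names (X, WILD, ERROR, Y), the seed and the coefficients are all assumptions made for illustration.

```spss
* Sketch only. The 5 percent fraction and all names are assumptions.
SET SEED 20180527.
NEW FILE.
INPUT PROGRAM.
LOOP #i = 1 TO 1000.
COMPUTE X = RV.NORMAL(50, 10).
* About 5 percent of cases are flagged as the high-variance population.
COMPUTE WILD = RV.BERNOULLI(0.05).
* Regular cases get an error SD of 5.
COMPUTE ERROR = RV.NORMAL(0, 5).
* Flagged cases get 10 times the variance, an SD of 5*SQRT(10).
IF (WILD = 1) ERROR = RV.NORMAL(0, 5*SQRT(10)).
COMPUTE Y = 10 + 0.8*X + ERROR.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
EXECUTE.

* Mark the flagged cases in the scatterplot.
GRAPH /SCATTERPLOT(BIVAR)=X WITH Y BY WILD.
```

Because the flag is drawn independently of X, this produces occasional wild cases across the whole range of X rather than a fan shape.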


Hope this helps.

--

Rich Ulrich


Re: How to: Create a distribution to illustrate Heteroscedasticity?

Jon Peck
In reply to this post by Jeff-2
compute heteros = rv.normal(z, k * x).
where z is the source of the mean, x is the source of the heteroscedasticity, and k scales it as desired. The second argument of rv.normal is a standard deviation, so k * x must be positive.
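To show that one-line COMPUTE in context, here is a small self-contained sketch. The ranges and values chosen for X, Z and K, the seed and the N of 1,000 are assumptions for illustration only.

```spss
* Sketch only. The values used for X, Z and K are assumptions.
SET SEED 20180527.
NEW FILE.
INPUT PROGRAM.
LOOP #i = 1 TO 1000.
* X stays positive because K*X is used as a standard deviation.
COMPUTE X = RV.UNIFORM(1, 10).
* Z is the conditional mean of the outcome, a straight line in X.
COMPUTE Z = 2 + 3*X.
* K scales how fast the spread grows with X.
COMPUTE K = 0.5.
COMPUTE HETEROS = RV.NORMAL(Z, K*X).
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
EXECUTE.

GRAPH /SCATTERPLOT(BIVAR)=X WITH HETEROS.
```

With these settings the standard deviation of HETEROS around the line rises from about 0.5 at X = 1 to about 5 at X = 10.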

You might also be interested in the Data with Cases custom dialog.  It generates code for a dataset of any number of random variables from any of 23 distributions and 5 types of correlations (some of which mess up the chosen distribution).

The dialog generates an input program that can in most cases be run without having the dialog installed.

This dialog is posted on the old SPSS Community website, but I can send it to anyone who is interested.

--
Jon K Peck
[hidden email]

Re: How to: Create a distribution to illustrate Heteroscedasticity?

bdates

Jon,


I'd be interested in this. Thanks.


Brian

Re: How to: Create a distribution to illustrate Heteroscedasticity?

WillBaileyz
In reply to this post by Jeff-2

Also interested and thanks

 

Will

WMB & Associates

Statistical Services

============

mailto: [hidden email]

http://home.earthlink.net/~z_statman

============

Re: How to: Create a distribution to illustrate Heteroscedasticity?

Ryan Black
In reply to this post by Jeff-2
Jeff,

Interesting question. Below my name you will find simulation code that should generate data with heteroskedastic errors from a simple linear regression model. Next, there is code that produces a scatterplot of standardized residual values against standardized predicted values from the simple linear regression model.

Hope this helps.

Ryan
--

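* Generate data: 100 cases with error variance proportional to X.
SET SEED 98768795.
NEW FILE.
INPUT PROGRAM.
LOOP i = 1 TO 100.
* X is lognormal, so it is always positive.
COMPUTE X = EXP(RV.NORMAL(0,1)).
COMPUTE B0 = -1.
COMPUTE B1 = 2.
* Error SD is SQRT(.8*X), so the error variance .8*X grows with X.
COMPUTE ERROR = SQRT(.8*X)*RV.NORMAL(0,1).
COMPUTE Y = B0 + B1*X + ERROR.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
EXECUTE.

* Fit the regression and plot standardized residuals against standardized predicted values.
REGRESSION
  /DEPENDENT Y
  /METHOD=ENTER X
  /SCATTERPLOT=(*ZRESID ,*ZPRED).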
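
For reference, the data-generating model in the syntax above can be written out as follows. The symbols W_i and Z_i are labels introduced here for the two standard normal draws; they do not appear in the code.

```latex
\begin{align*}
X_i &= \exp(W_i), & W_i &\sim N(0,1),\\
\varepsilon_i &= \sqrt{0.8\,X_i}\;Z_i, & Z_i &\sim N(0,1),\\
Y_i &= \beta_0 + \beta_1 X_i + \varepsilon_i, & \beta_0 &= -1,\ \beta_1 = 2,\\
\operatorname{Var}(\varepsilon_i \mid X_i) &= 0.8\,X_i .
\end{align*}
```

The error standard deviation therefore grows with the square root of X, and since the slope is positive, the residual plot should fan out toward larger predicted values.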




Re: How to: Create a distribution to illustrate Heteroscedasticity?

Jeff-2
In reply to this post by Jon Peck

Hi Jon,

I’ve searched quickly for the website you mentioned but didn’t have any luck, so I’m definitely interested in the dialog you’ve mentioned.

I’m finding there are too many examples, videos, and other pieces of info online that are far too complex, too detailed or not detailed enough, or just plain wrong for teaching purposes, so I’ve decided to start making my own example data and output to use in class.

Jeff

Re: How to: Create a distribution to illustrate Heteroscedasticity?

Jeff-2
In reply to this post by Ryan Black

Thanks.

…much easier than the way I’ve been generating a few variables. …will have to explore this code more, along with what Jon has offered to send.

Jeff
