syntax for correlation randomly sort Y, say, 10,000 times

Art Kendall
I don't have SPSS where I am right now, but thought somebody on this list
might have written syntax to do this.

As a demo of uncertainty in, e.g., correlations, t-test, an article has
suggested randomly sorting values of a variable many times. (This could be
thought of as a reduced instance of parallel analysis for factor analysis.
If I had access to SPSS right now, I would just cannibalize the syntax for
parallel analysis.)

The steps are:
1. compute a correlation or other stat between X and Y
2. loop 10,000 times
-- 2.1 randomly sort Y leaving X in place
-- 2.2 compute the correlation or other stat between X and Y.
-- 2.3 put the stat in a file
3. end loop.
4. examine the distribution of the computed correlation or other stat.
5. report numerically and visually how the original correlation fits in the
distribution.
6. compare (a) the SE by conventional methods of the original correlation
with (b) the SE of the randomized correlation
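
The steps above can be sketched outside SPSS as follows. This is only a minimal illustration in Python (standard library), not syntax posted to the list: the data are simulated, the seed and the 2,000 iterations are arbitrary, and the names pearson_r and shuffled_correlations are made up for the example.

```python
import math
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def shuffled_correlations(x, y, n_iter, rng):
    """Steps 2-3: shuffle a copy of Y, leave X in place, keep each r."""
    y_perm = list(y)
    results = []
    for _ in range(n_iter):
        rng.shuffle(y_perm)
        results.append(pearson_r(x, y_perm))
    return results


rng = random.Random(10)  # arbitrary seed, only for reproducibility
n = 100
x = [rng.gauss(0, 1) for _ in range(n)]
# Made-up data with a population correlation of 0.5.
y = [0.5 * a + rng.gauss(0, math.sqrt(1 - 0.5 ** 2)) for a in x]

r_obs = pearson_r(x, y)                          # step 1
r_perm = shuffled_correlations(x, y, 2000, rng)  # steps 2-3

# Steps 4-5: where does the observed r fall in the shuffled distribution?
n_extreme = sum(abs(r) >= abs(r_obs) for r in r_perm)
p_perm = (n_extreme + 1) / (len(r_perm) + 1)

# Step 6: conventional SE of r versus the SD of the shuffled r values.
se_conventional = math.sqrt((1 - r_obs ** 2) / (n - 2))
se_shuffled = statistics.stdev(r_perm)

print(f"observed r = {r_obs:.3f}, permutation p = {p_perm:.4f}")
print(f"conventional SE = {se_conventional:.3f}, SD of shuffled r = {se_shuffled:.3f}")
```

Because shuffling destroys the pairing, the shuffled r values centre on zero with variance 1/(n-1) whatever the observed r is, so their SD describes the null distribution rather than the sampling error of the observed r.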





-----
Art Kendall
Social Research Consultants
--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: syntax for correlation randomly sort Y, say, 10,000 times

Andy W
The word you are looking for is "permutations". I have a macro that will
generate permutations of a particular variable and add them into a dataset,
see
https://dl.dropboxusercontent.com/s/2hgi2fqupeyorff/MACRO_Permutations.sps?dl=0.

Below is an example applying that to a correlation coefficient to generate a
reference distribution.

************************************************************************************************.            
DEFINE !PermData (Var = !TOKENS(1)
                 /N = !TOKENS(1)
                 /Base = !TOKENS(1)
                 /File = !TOKENS(1) )
PRESERVE.
SET MXLOOPS=!N.
DATASET ACTIVATE !File.
COMPUTE XX_TempID_XX = $casenum.
MATRIX.
GET z /FILE = * /VARIABLES = !Var.
GET Id /FILE = * /VARIABLES = XX_TempID_XX.
COMMENT generating permutation distributions.
COMPUTE Res = MAKE(NROW(z),!N+1,0).
COMPUTE Res(:,1) = Id.
LOOP #I = 1 TO !N.
 COMPUTE zP = !PERMC(z).
 COMPUTE Res(:,#I+1) = zP.
END LOOP.
SAVE Res
  /VARIABLES = XX_TempID_XX !CONCAT(!Base,"1") TO !CONCAT(!Base,!N)
  /OUTFILE = *.
END MATRIX.
DATASET NAME XX_TempResults_XX.
DATASET ACTIVATE !File.
MATCH FILES FILE = *
  /FILE = 'XX_TempResults_XX'
  /BY XX_TempID_XX.
DATASET CLOSE XX_TempResults_XX.
MATCH FILES FILE = * /DROP XX_TempID_XX.
RESTORE.
!ENDDEFINE.

*Permutes the order of a column vector (or the rows of a matrix).
DEFINE !PERMC (!POSITIONAL !ENCLOSE("(",")") )
(!1(GRADE(UNIFORM(NROW(!1),1)),:))
!ENDDEFINE.  

*Now making fake data to show permutation approach.
SET SEED 10.
INPUT PROGRAM.
LOOP Id = 1 TO 1000.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
DATASET NAME Sim.
COMPUTE #Corr = 0.2.
COMPUTE X = RV.NORMAL(0,1).
COMPUTE Y = #Corr*X + RV.NORMAL(0,SQRT(1-#Corr**2)).
FREQ X Y /FORMAT = NOTABLE /STATISTICS MEAN STDDEV.

!PermData Var = Y N = 1000 Base = YPerm File = Sim.

VARSTOCASES /MAKE YPerm FROM YPerm1 TO YPerm1000 /INDEX Perm.
SORT CASES BY Perm Id.
SPLIT FILE BY Perm.
DATASET DECLARE Corrs.
OMS /SELECT TABLES /IF SUBTYPES='Correlations' /DESTINATION FORMAT=SAV
OUTFILE='Corrs' VIEWER=NO /TAG = 'CorrOut'.
CORRELATIONS X WITH Y YPerm.
OMSEND TAG='CorrOut'.
SPLIT FILE OFF.
DATASET ACTIVATE Corrs.
SELECT IF Var3 = "Pearson Correlation".
FORMATS YPerm Y (F3.2).
EXECUTE.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=YPerm Y MISSING=LISTWISE
REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: YPerm=col(source(s), name("YPerm"))
  DATA: Y=col(source(s), name("Y"))
  TRANS: top=eval(100)
  TRANS: bot=eval(0)
  GUIDE: axis(dim(1), label("Correlation"), delta(0.05))
  GUIDE: axis(dim(2), label("Frequency"))
  ELEMENT: interval(position(summary.count(bin.rect(YPerm))),
shape.interior(shape.square))
  ELEMENT: edge(position(region.spread.range(Y*(bot+top))),
color.interior(color.red))
END GPL.
************************************************************************************************.      

Here you can see that the permutation distribution has correlations that
are centered on zero and range from about -0.1 to 0.1. So in this sample the
observed correlation of 0.17 would be very unlikely if the null of zero
correlation were true.

<http://spssx-discussion.1045642.n5.nabble.com/file/t329824/HistoPerm.png>
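
The comparison in the paragraph above can be put as a number. With 1,000 cases the shuffled correlations have an SD of 1/sqrt(999), about 0.032, so a range of roughly -0.1 to 0.1 is about three SDs either side of zero and 0.17 sits more than five SDs out. The Python sketch below turns that into an empirical p-value. It is an illustration only: the function name is made up, and the "shuffled" values are normal draws standing in for the SPSS output, which is not reproduced here.

```python
import math
import random


def permutation_p_value(r_obs, r_perm):
    """Two-sided p-value: share of shuffled |r| at least as large as |r_obs|.

    The +1 in numerator and denominator counts the observed ordering itself
    as one of the possible permutations, so p is never exactly zero.
    """
    n_extreme = sum(abs(r) >= abs(r_obs) for r in r_perm)
    return (n_extreme + 1) / (len(r_perm) + 1)


n_cases = 1000
sd_null = 1 / math.sqrt(n_cases - 1)  # SD of r under shuffling, about 0.032

# Stand-in values only: normal draws with that SD, NOT the thread's SPSS output.
rng = random.Random(10)
r_perm = [rng.gauss(0, sd_null) for _ in range(1000)]

p = permutation_p_value(0.17, r_perm)
print(f"null SD = {sd_null:.4f}, 0.17 is {0.17 / sd_null:.1f} SDs out, p = {p:.4f}")
```

With B shuffles and none as extreme as the observed value, the smallest p this can report is 1/(B+1), which is one reason to run more than 1,000 shuffles when a small p needs to be pinned down.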



-----
Andy W
[hidden email]
http://andrewpwheeler.wordpress.com/

Re: syntax for correlation randomly sort Y, say, 10,000 times

Art Kendall
Great! All *permutations* are even better than randomizations for a reference
distribution.  When I get back I'll give this a try.

I look forward to seeing how this compares to the conventional SE.




Re: syntax for correlation randomly sort Y, say, 10,000 times

David Marso
Administrator
In reply to this post by Art Kendall
I did precisely that as an experiment and the correlations were uniformly
minuscule.
Why would you expect that the magnitude of the correlation would be
preserved under a randomization of one of the two vectors?
Here is my code.  Maybe I screwed the pooch along the way -Doubtful- ;-)?

---
DEFINE !InputProgram (N  !TOKENS (1) /Corr !TOKENS (1) /Vars !CMDEND )
MATRIX.
SAVE MAKE(!N,2,0) /OUTFILE * / VARIABLES !Vars .
END MATRIX.
DO REPEAT v=!Vars.
COMPUTE v=RV.NORMAL (0,1).
END REPEAT.
FACTOR VARIABLES !Vars / CRITERIA FACTORS(2)/ SAVE REG (2,FS).
DELETE VARIABLES !Vars .
RENAME VARIABLES (FS1 FS2 = !Vars ).
MATRIX.
COMPUTE R={1,!Corr;!Corr,1}.
GET Data /FILE * / VARIABLES=!Vars.
COMPUTE Chol_R=CHOL(R).
COMPUTE CorrData=Data * Chol_R .
SAVE CorrData /OUTFILE * / VARIABLES =!Vars.
END MATRIX.
CORRELATIONS VARIABLES !Vars.
!ENDDEFINE.

DEFINE !DoPermutations ( NIter !TOKENS (1) /  OUTFILE !TOKENS (1) / Vars
!CMDEND )
PRESERVE.
FILE HANDLE fpFolder_With_ReadWriteRights /NAME='%userprofile%\Desktop' .
CD fpFolder_With_ReadWriteRights .
DATASET DECLARE !OUTFILE.
SET MXLOOPS=!NIter.

MATRIX.
GET Data / FILE * /VARIABLES !Vars.
COMPUTE N=NROW(Data).
COMPUTE Sums=T(MAKE(N,1,1))*Data.
COMPUTE S12Term=Sums(1)* Sums(2)/N .
COMPUTE SDs=SQRT((DIAG(T(Data)*Data)-DIAG(T(Sums)*Sums)/N)/(N-1)).
COMPUTE SD12= SDs(1) * SDs(2).
COMPUTE Tx=T(Data(:,1)).
COMPUTE y=Data(:,2).
COMPUTE Perm_Y=y.

LOOP #=1 TO !NIter.
+  COMPUTE g_y=GRADE(UNIFORM(N,1)).
+  LOOP ##= 1 TO N.
+    COMPUTE Y(##)=Perm_Y(g_y(##)).
+  END LOOP.
+  SAVE ((Tx*y-S12Term)/(N-1) / SD12)
        /OUTFILE !OUTFILE
        /VARIABLES Corr.
END LOOP.

END MATRIX.
RESTORE.
!ENDDEFINE.

NEW FILE.
DATASET CLOSE ALL.
!InputProgram N =100 Corr =.5  Vars x y .
!DoPermutations NIter=10000 OUTFILE=test Vars=x y .
DATASET ACTIVATE test.
DESCRIPTIVES VARIABLES Corr / STATISTICS ALL.








-----
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: syntax for correlation randomly sort Y, say, 10,000 times

Bruce Weaver
Administrator
David Marso wrote

> Why would you expect that the magnitude of the correlation would be
> preserved under a randomization of one of the two vectors?

That is exactly my question.

Art, can you provide details on the article you mentioned?

Cheers,
Bruce





-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.


Re: syntax for correlation randomly sort Y, say, 10,000 times

Art Kendall
In reply to this post by David Marso
I would not expect the correlation to be preserved.  I would expect that for
most randomizations of Y most r would be smaller.  

In the instance of a perfect correlation, all other orderings would yield
smaller r's.

The idea is to get a reference distribution with which to compare the
obtained/original r.

Since Andy's post I now know of 3 reference distributions. The goal is to
see how plausible it is that the observed r is simply due to randomness.

1) the conventional p.
2) Andy's using all permutations (at some point it would yield too many
alternative variables, but would be fantastic for small N's).
3) many randomizations of Y.
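
The difference between options 2 and 3 can be illustrated with a small case. The Python sketch below is an illustration under stated assumptions, not the macro posted earlier in the thread: it enumerates every ordering of Y for a made-up sample of 7 cases (7! = 5,040 orderings) and compares the result with 2,000 random shuffles. The data, the seed and the name pearson_r are invented for the example.

```python
import itertools
import math
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5, 6, 7]  # made-up small sample
y = [2, 1, 4, 3, 7, 5, 6]
r_obs = pearson_r(x, y)
tol = 1e-12  # guards against floating-point ties with r_obs

# Option 2: the exact reference distribution from all 7! orderings of Y.
r_all = [pearson_r(x, perm) for perm in itertools.permutations(y)]
p_exact = sum(abs(r) >= abs(r_obs) - tol for r in r_all) / len(r_all)

# Option 3: a Monte Carlo reference distribution from random shuffles of Y.
rng = random.Random(10)  # arbitrary seed
y_perm = list(y)
r_mc = []
for _ in range(2000):
    rng.shuffle(y_perm)
    r_mc.append(pearson_r(x, y_perm))
p_mc = sum(abs(r) >= abs(r_obs) - tol for r in r_mc) / len(r_mc)

print(f"r = {r_obs:.3f}, exact p = {p_exact:.4f}, Monte Carlo p = {p_mc:.4f}")
print(f"orderings for N = 7: {math.factorial(7):,}; for N = 20: {math.factorial(20):,}")
```

Full enumeration stops being practical quickly, since the number of orderings is N!, which is why random shuffles are the usual choice for anything beyond very small N.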







Re: syntax for correlation randomly sort Y, say, 10,000 times

Jon Peck
Why not bootstrap the correlations?  It's not clear to me what the point of randomly permuted reference distributions would be.
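
For comparison with the shuffling approach, here is a minimal Python sketch of bootstrapping a correlation. It is an illustration under stated assumptions rather than SPSS's own bootstrap procedure: the data are simulated, the 2,000 resamples and the seed are arbitrary, the function names are made up, and the interval is a simple percentile interval.

```python
import math
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_correlations(x, y, n_boot, rng):
    """Resample whole cases (x, y pairs) with replacement and recompute r."""
    n = len(x)
    results = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        results.append(pearson_r([x[i] for i in idx], [y[i] for i in idx]))
    return results


rng = random.Random(10)  # arbitrary seed, only for reproducibility
n = 100
x = [rng.gauss(0, 1) for _ in range(n)]
# Made-up data with a population correlation of 0.5.
y = [0.5 * a + rng.gauss(0, math.sqrt(1 - 0.5 ** 2)) for a in x]

r_obs = pearson_r(x, y)
r_boot = sorted(bootstrap_correlations(x, y, 2000, rng))

# Simple 95% percentile interval from the sorted bootstrap values.
lo = r_boot[int(0.025 * len(r_boot))]
hi = r_boot[int(0.975 * len(r_boot)) - 1]
se_boot = statistics.stdev(r_boot)

print(f"r = {r_obs:.3f}, bootstrap SE = {se_boot:.3f}, 95% interval = ({lo:.3f}, {hi:.3f})")
```

The bootstrap keeps each X with its own Y, so its distribution is centred near the observed r and describes the sampling variability of that estimate. Shuffling Y breaks the pairing, so its distribution is centred on zero and describes what r looks like when there is no association.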

On Wed, Dec 20, 2017 at 9:57 AM, Art Kendall <[hidden email]> wrote:
...



--
Jon K Peck
[hidden email]


Re: syntax for correlation randomly sort Y, say, 10,000 times

Rich Ulrich

"Random permutation" is one justification that Fisher used for the F-test.
Assuming near-normal distributions, the F reproduces the p-values that would
be achieved by randomization. Art wants to show natural variability of
results.

However, for an exercise that demonstrates the variability of results, I
think I would start with something other than real data. And I would plan on
plotting the results.  If you repeat the experiment with ordinary "normal"
data, all the plots will have predictable variability, with greater variance
(S.E.) for smaller N.

If you take two distributions that are exponential, rather than normal, the
set of correlations has a fatter tail.  Or if you start with "normal" and a
moderate number of extreme outliers, your set of correlations will be
symmetric but will better match a plot based on smaller Ns (not much more
than the count of outliers).  (Say, 90 cases with SD=1, 10 cases with SD=4.
"Mixtures of distributions" like this are sometimes used for testing
robustness of proposed tests.)
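
A simulation along these lines could look like the Python sketch below. It is only one reading of the 90/10 example: it assumes X and Y are drawn independently and that the same 10 cases have SD=4 on both variables. The 1,000 replications, the seed and the function names are arbitrary choices for illustration.

```python
import math
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_r(case_sds, n_reps, rng):
    """Correlations from n_reps samples of independent x and y.

    case_sds gives one SD per case; that case's x and y both use it.
    """
    results = []
    for _ in range(n_reps):
        x = [rng.gauss(0, sd) for sd in case_sds]
        y = [rng.gauss(0, sd) for sd in case_sds]
        results.append(pearson_r(x, y))
    return results


rng = random.Random(10)  # arbitrary seed
designs = {
    "normal, N=100": [1.0] * 100,
    "normal, N=20": [1.0] * 20,
    "mixture, N=100 (90 x SD=1, 10 x SD=4)": [1.0] * 90 + [4.0] * 10,
}
sd_of_r = {name: statistics.stdev(simulate_r(sds, 1000, rng))
           for name, sds in designs.items()}

for name, sd in sd_of_r.items():
    print(f"{name}: SD of r = {sd:.3f}")
```

Plotting the three sets of correlations as histograms on a common axis would show the spread directly; the printed SDs are a compact stand-in for that plot.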


--

Rich Ulrich



From: SPSSX(r) Discussion <[hidden email]> on behalf of Jon Peck <[hidden email]>
Sent: Wednesday, December 20, 2017 12:26 PM
To: [hidden email]
Subject: Re: syntax for correlation randomly sort Y, say, 10,000 times
 
Why not bootstrap the correlations?  It's not clear to me what the point of randomly permuted reference distributions would be.
...

Re: syntax for correlation randomly sort Y, say, 10,000 times

Art Kendall
the article is
Grice, J. W. (2014). Observation Oriented Modeling. /Comprehensive
Psychology/, 3, 3.

*This approach focuses on the degree to which the expected model exactly
fits individual cases. *



IMO the article reinvents some wheels and is based on some "straw man"
caricatures of psychological research. It uses other terminology to suggest
crosstabs for pairs of items in Likert scales, the DFA classification
phase to further look at t-test data, and vertically displaying histograms
in the same graphs.  Some of the ideas would be useful in getting
students/clients to think of stat as more than magic tools.

In sum, it has some ideas that would widen perspectives, but those ideas can
be implemented in many existing stat packages. But presenting these ideas as
replacing rather than extending conventional stat/methods teaching hints of
disingenuousness.

The author has the data available as SPSS files, so could have been clearer
when (s)he was discussing cases, variables, and values.

I drafted a  brief overview of the article for the other functional
consultants in Statistics Without Borders which I can send if you are
interested.

Jon's suggestion of bootstrapping is another good way to help students get a
broader understanding of correlations etc.

P.S. on another list someone asked for an example of "odd vocabulary".  I
replied the use of "Crossed Observations" for "crosstabs".  It is a good
idea when presenting crosstabs to make it explicit that one is creating a
table (matrix) of how values of one variable go together with values of
another variable.






Re: syntax for correlation randomly sort Y, say, 10,000 times

Bruce Weaver
Administrator
Thanks Art.  It appears to be open access, so anyone who is interested should
be able to view it here:

  http://journals.sagepub.com/doi/full/10.2466/05.08.IT.3.3
  http://journals.sagepub.com/doi/pdf/10.2466/05.08.IT.3.3

Cheers,
Bruce



Re: syntax for correlation randomly sort Y, say, 10,000 times

Art Kendall
This approach has been brought to the attention of the Functional Consultants
for Statistics Without Borders.  This note has details of techniques
suggested in the article.
At least most if not all of this can be done in SPSS.



Brief description of Observation Oriented Modeling.

This article uses the word “observation” in the sense of the value of a
measurement on a case/entity/respondent/unit of analysis. It does not refer
to methods of recording codes/variable values for behaviors, etc. It does
not refer to ‘observations’ as the term for cases/entities/respondents.

It rightfully decries that models and techniques are thought of in a very
mechanistic way and that insufficient attention is paid to the meaning of
the variables and the questions the models represent.

It reinforces the idea that a statistical model does not necessarily
identically fit every case used in building the model.  It advocates
replacing conventional methods by examining the ‘accuracy’ with which a
model fits cases.  It emphasizes looking at the particular/concrete rather
than the general/abstract portions of a model. It advocates examining data
visually rather than with equations.

In much of psychology and other social sciences, it is customary to look at
both the statistical model and how well it fits individual cases.  It is
also customary to look at the data both numerically and visually.

Details. I have seen these approaches since at least the mid 70s.

It uses a variety of techniques to get at how “accurate” a model is.   It
emphasizes the “percent correctly classified”.  Although it does not use
these words, it is much the same thing as “flipping” the roles of an
independent variable with 2 values and a continuous dependent variable.  In
practice this is conventionally done by following a t-test with a 2 group
discriminant function analysis (DFA).  The estimation phase would calculate
predicted scores and assigned group membership for each case. The
classification phase of the DFA would crosstab the original group membership
and the membership assigned by the DFA.
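
As a rough illustration of that classification phase, the Python sketch below builds the crosstab and the percent correctly classified for made-up two-group data. It assumes equal priors and a pooled variance, in which case the two-group linear discriminant rule with a single predictor is a cutoff midway between the group means. The group names, data and seed are invented, and this is not output from SPSS DISCRIMINANT.

```python
import random
import statistics
from collections import Counter

rng = random.Random(10)  # arbitrary seed, only for reproducibility

# Made-up scores for two groups of 50 cases each.
control = [rng.gauss(50, 10) for _ in range(50)]
treatment = [rng.gauss(65, 10) for _ in range(50)]
scores = control + treatment
actual = ["control"] * 50 + ["treatment"] * 50

# The t-test direction asks whether the group means differ ...
mean_c = statistics.fmean(control)
mean_t = statistics.fmean(treatment)

# ... and the "flipped" direction predicts group from score, here with a
# cutoff midway between the two group means.
cutoff = (mean_c + mean_t) / 2
if mean_t > mean_c:
    high, low = "treatment", "control"
else:
    high, low = "control", "treatment"
predicted = [high if s > cutoff else low for s in scores]

# Classification table: original membership by assigned membership.
crosstab = Counter(zip(actual, predicted))
pct_correct = 100 * sum(a == p for a, p in zip(actual, predicted)) / len(scores)

for a in ("control", "treatment"):
    print(a, [crosstab[(a, p)] for p in ("control", "treatment")])
print(f"percent correctly classified = {pct_correct:.1f}")
```

The diagonal of the table holds the correctly classified cases, and reading each row separately gives the per-group accuracy.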

It talks about creating a reference distribution for a correlation by
randomly reordering one of the variables many times. Again, not in these
words.   This is a lot like jackknifing and bootstrapping to enhance
understanding of the uncertainty inherent in a model.  It is also like the
parallel analysis typically done in principal component and principal factor
analysis, only with two variables rather than many.

To look at the data for 2 groups and one variable, it suggests using
side-by-side horizontal bar graphs. The vertical axis represents the
variable. The portion of the bar representing exact fit to the hypothesis is
shaded. It calls this a multigram.  It suggests progressively coarsening
measurement by collapsing variables to see how that changes the visual
impression.

It suggests cross tabulating pairs of individual items in a summative scale.

It suggests cross tabulating a pair of continuous variables and shading
cells to see what the picture would look like IF there were perfect
correlation/fit.





Re: syntax for correlation randomly sort Y, say, 10,000 times

bdates
Just to weigh in on this, it's not the values of the correlations that need to be greater, but the predictive value arising from the correlations with Y against the predictive value arising from the correlations with YPerm. The entire idea is to examine the proportion of accurate predictive capacities of the original correlation with respect to the proportion of accurate predictive capacities based on randomization. Art's explanation below is, as he suggests, shorthand for accomplishing the same outcome. I used a dataset of my own and using Art's method below correctly reproduced 86.7% of the original treatment group membership. With OOM, I reproduced 86.3%. Art's method is a lot more straightforward, easy to develop syntax for, and seems to produce nearly the same values. When I added the option of adding clustered bar charts to the output, I was able to determine relative accuracy in reproducing each of the treatment groups, e.g., the predictive equation was more accurate in reproducing the control group than the treatment group, akin to sensitivity and specificity.

Brian
________________________________________
From: SPSSX(r) Discussion [[hidden email]] on behalf of Art Kendall [[hidden email]]
Sent: Monday, January 08, 2018 10:21 AM
To: [hidden email]
Subject: Re: syntax for correlation randomly sort Y, say, 10,000 times

This approach has been brought to the attention of the Functional Consultants
for Statistics Without Borders.  This note has details of techniques
suggested in the article.
Most, if not all, of this can be done in SPSS.



Brief description of Observation Oriented Modeling.
This article uses the word “observation” in the sense of the value of a
measurement on a case/entity/respondent/unit of analysis. It does not refer
to methods of recording codes/variable values for behaviors, etc. It does
not refer to ‘observations’ as the term for cases/entities/respondents.
It rightfully decries that models and techniques are thought of in a very
mechanistic way and that insufficient attention is paid to the meaning of
the variables and the questions the models represent.
It reinforces the idea that a statistical model does not necessarily
identically fit every case used in building the model.  It advocates
replacing conventional methods by examining the ‘accuracy’ with which a
model fits cases.  It emphasizes looking at the particular/concrete rather
than the general/abstract portions of a model. It advocates examining data
visually rather than with equations.
In much of psychology and other social sciences, it is customary to look at
both the statistical model and how well it fits individual cases.  It is
also customary to look at the data both numerically and visually.
Details. I have seen these approaches since at least the mid 70s.
It uses a variety of techniques to get at how “accurate” a model is.   It
emphasizes the “percent correctly classified”.  Although it does not use
these words, it is much the same thing as “flipping” the roles of an
independent variable with 2 values and a continuous dependent variable.  In
practice this is conventionally done by following a t-test with a 2-group
discriminant function analysis (DFA).  The estimation phase would calculate
predicted scores and assigned group membership for each case. The
classification phase of the DFA would crosstab the original group membership
and the membership assigned by the DFA.
It talks about creating a reference distribution for a correlation by
randomly reordering one of the variables many times. Again, not in these
words.   This is a lot like jackknifing and bootstrapping to enhance
understanding of the uncertainty inherent in a model.  It is also like the
parallel analysis typically done in principal component and principal factor
analysis, only with two variables rather than many.
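
As a rough illustration of such a reference distribution, here is a minimal Python sketch (not SPSS syntax and not the article's software). It leaves X in place, reshuffles Y many times, and recomputes Pearson's r each time. The function names, the seed, and the toy data are hypothetical; the "conventional" standard error shown is the common textbook approximation sqrt((1 - r^2) / (n - 2)), included only for comparison.

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation computed from sums of squares and cross-products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def permutation_distribution(x, y, n_perm=10000, seed=12345):
    """Correlations between x (left in place) and randomly reordered copies of y."""
    rng = random.Random(seed)
    y_perm = list(y)
    r_perm = []
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        r_perm.append(pearson_r(x, y_perm))
    return r_perm

def standard_deviation(values):
    """Sample standard deviation (n - 1 in the denominator)."""
    m = sum(values) / len(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

# Toy data (hypothetical): y tracks x with an alternating +3 / -3 disturbance.
x = list(range(1, 21))
y = [v + 3 * (-1) ** v for v in x]

r_obs = pearson_r(x, y)
r_perm = permutation_distribution(x, y)
# Share of reshuffles whose correlation is at least as extreme as the original.
share_extreme = sum(abs(r) >= abs(r_obs) for r in r_perm) / len(r_perm)
se_conventional = sqrt((1 - r_obs ** 2) / (len(x) - 2))
sd_permuted = standard_deviation(r_perm)

print(f"original r: {r_obs:.3f}")
print(f"share of reshuffled |r| >= |original r|: {share_extreme:.4f}")
print(f"conventional SE: {se_conventional:.3f}; SD of reshuffled r: {sd_permuted:.3f}")
```

A histogram of r_perm with the original r marked on it would give the visual version of the same comparison.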
To look at the data for 2 groups and one variable, it suggests using
side-by-side horizontal bar graphs. The vertical axis represents the
variable. The portion of the bar representing exact fit to the hypothesis is
shaded. It calls this a multigram.  It suggests progressively coarsening
measurement by collapsing variables to see how that changes the visual
impression.
It suggests cross tabulating pairs of individual items in a summative scale.
It suggests cross tabulating a pair of continuous variables and shading
cells to see what the picture would look like IF there were perfect
correlation/fit.




-----
Art Kendall
Social Research Consultants