Just to weigh in on this: it is not the values of the correlations themselves that need to be greater, but the predictive value arising from the correlations with Y as against the predictive value arising from the correlations with the permuted YPerm. The whole idea is to compare the proportion of accurate predictions from the original correlation with the proportion of accurate predictions obtained under randomization. Art's explanation below is, as he suggests, shorthand for accomplishing the same outcome. Using a dataset of my own, Art's method correctly reproduced 86.7% of the original treatment group membership; with OOM, I reproduced 86.3%. Art's method is a lot more straightforward, easy to develop syntax for, and produces nearly the same values. When I added clustered bar charts to the output, I could also gauge the relative accuracy in reproducing each of the treatment groups, e.g., the predictive equation was more accurate in reproducing the control group than the treatment group, akin to sensitivity and specificity.
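For anyone who wants to see the shape of that comparison outside SPSS, here is a rough Python sketch. The data, the number of reshuffles, and the nearest-group-mean classification rule are all my own illustrative choices (this is not my actual dataset, Art's syntax, or the OOM algorithm), but the logic is the same: get the percent correctly classified from the observed membership, then build a reference distribution by reshuffling the membership.

```python
import random
import statistics

def percent_correct(x, groups):
    """Assign each case to the group whose mean on x is nearest, then
    report the percentage whose assignment matches actual membership."""
    means = {g: statistics.mean(v for v, gg in zip(x, groups) if gg == g)
             for g in set(groups)}
    assigned = [min(means, key=lambda g: abs(v - means[g])) for v in x]
    return 100.0 * sum(a == g for a, g in zip(assigned, groups)) / len(groups)

random.seed(1)
# Made-up data: a continuous predictor and 0/1 treatment membership.
x = [random.gauss(50, 10) for _ in range(40)] + \
    [random.gauss(65, 10) for _ in range(40)]
y = [0] * 40 + [1] * 40

observed = percent_correct(x, y)

# Reference distribution: reshuffle the membership many times and
# recompute the same accuracy for each shuffled YPerm.
yp = y[:]
perm = []
for _ in range(1000):
    random.shuffle(yp)
    perm.append(percent_correct(x, yp))

# Proportion of random reorderings that classify at least as well as
# the observed grouping.
p = sum(a >= observed for a in perm) / len(perm)
print(f"observed accuracy {observed:.1f}%, permutation p = {p:.3f}")
```

With any reasonable group separation the observed accuracy should sit well above the reshuffled accuracies, which is exactly the comparison described above.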

Brian

________________________________________
From: SPSSX(r) Discussion [[hidden email]] on behalf of Art Kendall [[hidden email]]
Sent: Monday, January 08, 2018 10:21 AM
To: [hidden email]
Subject: Re: syntax for correlation randomly sort Y, say, 10,000 times

This approach has been brought to the attention of the Functional Consultants for Statistics Without Borders. This note has details of the techniques suggested in the article. Most, if not all, of this can be done in SPSS.

Brief description of Observation Oriented Modeling.

This article uses the word “observation” in the sense of the value of a measurement on a case/entity/respondent/unit of analysis. It does not refer to methods of recording codes/variable values for behaviors, etc. It does not refer to ‘observations’ as the term for cases/entities/respondents.

It rightfully decries that models and techniques are thought of in a very mechanistic way and that insufficient attention is paid to the meaning of the variables and the questions the models represent.

It reinforces the idea that a statistical model does not necessarily fit every case used in building it identically. It advocates replacing conventional methods by examining the ‘accuracy’ with which a model fits cases. It emphasizes looking at the particular/concrete rather than the general/abstract portions of a model. It advocates examining data visually rather than with equations.

In much of psychology and the other social sciences, it is customary to look at both the statistical model and how well it fits individual cases. It is also customary to look at the data both numerically and visually.

Details. I have seen these approaches since at least the mid 70s.

It uses a variety of techniques to get at how “accurate” a model is. It emphasizes the “percent correctly classified”. Although it does not use these words, it is much the same thing as “flipping” the roles of an independent variable with 2 values and a continuous dependent variable. In practice this is conventionally done by following a t-test with a 2-group discriminant function analysis (DFA). The estimation phase would calculate predicted scores and assigned group membership for each case. The classification phase of the DFA would crosstab the original group membership against the membership assigned by the DFA.
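Not SPSS syntax, but a quick Python sketch of that t-test-then-classify sequence, with made-up numbers. With one predictor, two groups, and equal priors, the DFA rule reduces to a cutpoint halfway between the group means, so the cutpoint stands in here for the full discriminant function:

```python
import math
import random
import statistics

random.seed(2)
# Made-up data: a 2-level grouping variable and a continuous outcome.
g0 = [random.gauss(100, 15) for _ in range(30)]  # e.g. control
g1 = [random.gauss(120, 15) for _ in range(30)]  # e.g. treatment

# The usual t-test on the group difference (pooled variance, equal n).
m0, m1 = statistics.mean(g0), statistics.mean(g1)
sp2 = (statistics.variance(g0) + statistics.variance(g1)) / 2
t = (m1 - m0) / math.sqrt(sp2 * (1 / 30 + 1 / 30))

# "Estimation phase": cutpoint halfway between the group means
# (here m1 > m0, so scores above the cut are assigned to group 1).
cut = (m0 + m1) / 2

# "Classification phase": crosstab observed membership against assigned
# membership, as DFA's classification table does.
table = {(obs, asg): 0 for obs in (0, 1) for asg in (0, 1)}
for v in g0:
    table[(0, int(v > cut))] += 1
for v in g1:
    table[(1, int(v > cut))] += 1

pct = 100 * (table[(0, 0)] + table[(1, 1)]) / 60
print(f"t = {t:.2f}; percent correctly classified = {pct:.1f}%")
```

The diagonal of the crosstab gives the percent correctly classified; the off-diagonal cells show which group the rule misclassifies more often.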

It talks about creating a reference distribution for a correlation by randomly reordering one of the variables many times. Again, not in these words. This is a lot like jackknifing and bootstrapping, which enhance understanding of the uncertainty inherent in a model. It is also like the parallel analysis typically done in principal component and principal factor analysis, only with two variables rather than many.
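In Python rather than SPSS syntax, that reference distribution looks like the following sketch. The variables, the 0.6 slope, and the 10,000 reshuffles are illustrative choices, not anything from the article:

```python
import math
import random
import statistics

def pearson_r(x, y):
    """Plain Pearson correlation of two equal-length lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

random.seed(3)
# Made-up data: two moderately correlated continuous variables.
x = [random.gauss(0, 1) for _ in range(50)]
y = [0.6 * a + random.gauss(0, 0.8) for a in x]

r_obs = pearson_r(x, y)

# Reference distribution: randomly reorder y 10,000 times and
# recompute r each time.
yp = y[:]
r_perm = []
for _ in range(10000):
    random.shuffle(yp)
    r_perm.append(pearson_r(x, yp))

# How often does a reshuffled r match or exceed the observed r in
# magnitude? That proportion is the two-sided permutation p-value.
p = sum(abs(r) >= abs(r_obs) for r in r_perm) / len(r_perm)
print(f"r = {r_obs:.3f}, permutation p = {p:.4f}")
```

The spread of the reshuffled r values shows how large a correlation random reordering alone can produce, which is exactly the uncertainty the reference distribution is meant to convey.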

To look at the data for 2 groups and one variable, it suggests using side-by-side horizontal bar graphs. The vertical axis represents the variable. The portion of the bar representing exact fit to the hypothesis is shaded. It calls this a multigram. It suggests progressively coarsening measurement by collapsing variables to see how that changes the visual impression.

It suggests cross tabulating pairs of individual items in a summative scale. It suggests cross tabulating a pair of continuous variables and shading cells to see what the picture would look like IF there were perfect correlation/fit.

-----
Art Kendall
Social Research Consultants
--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================

To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command

SIGNOFF SPSSX-L

For a list of commands to manage subscriptions, send the command

INFO REFCARD
