In general, when you use a criterion such as AIC or BIC, you select the

model that MINIMIZES the criterion. In your reported data, AIC gets smaller

and smaller as the number of clusters increases, up to the maximum of 15

clusters.
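As a quick check, here is a minimal Python sketch of the "pick the minimum" rule applied to the AIC column from your output quoted below. The function name `best_k_by_min` is only illustrative and is not part of SPSS.

```python
# AIC values copied from the reported Auto-Clustering table, K = 1..15.
aic = [42679.177, 29736.437, 25584.939, 22736.637, 21145.674,
       19567.475, 18554.435, 17765.615, 17005.491, 16356.395,
       15802.853, 15398.294, 15063.487, 14765.135, 14496.451]


def best_k_by_min(criterion):
    """Return the 1-based number of clusters with the smallest criterion value."""
    return min(range(len(criterion)), key=criterion.__getitem__) + 1


best_k = best_k_by_min(aic)

# True when every additional cluster lowers AIC further.
strictly_decreasing = all(b < a for a, b in zip(aic, aic[1:]))

print(best_k, strictly_decreasing)
```

With these values the minimum falls at the largest K tried, so a plain minimum-AIC rule would not give two clusters.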

TwoStep Cluster also calculates two other columns that you report. These are

the "ratio of AIC changes" and the "ratio of distance measures." The ratio

of AIC changes sets the AIC change from K=1 to K=2 as 1, and then scales the

other AIC changes relative to this one. The ratio of distance measures is

the ratio of the distance measure in a given step to the distance measure in

the previous step. TwoStep looks at these measures to choose K. It is argued

that a jump in either measure between two consecutive Ks suggests a

tentative number of clusters. While this is given as a rationale for the

choice of K, some research shows that this model selection approach doesn't

always work, in particular when there is a mix of continuous and

categorical basis variables, and when there is no underlying cluster

structure.
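To make the two derived columns concrete, the following Python sketch recomputes the AIC changes and the ratio of AIC changes from the AIC values in the table quoted below, and then looks for the largest ratio of distance measures. The distance ratios are simply copied from the table, since the underlying distances are not reported. Taking the single largest ratio is a simplified stand-in for the "jump" idea described above; it is an assumption for illustration and not necessarily the exact rule TwoStep applies.

```python
# AIC values copied from the reported table, K = 1..15.
aic = [42679.177, 29736.437, 25584.939, 22736.637, 21145.674,
       19567.475, 18554.435, 17765.615, 17005.491, 16356.395,
       15802.853, 15398.294, 15063.487, 14765.135, 14496.451]

# Ratio of distance measures copied from the reported table, keyed by K.
dist_ratio = {2: 3.076, 3: 1.444, 4: 1.751, 5: 1.008, 6: 1.515,
              7: 1.257, 8: 1.034, 9: 1.151, 10: 1.150, 11: 1.305,
              12: 1.167, 13: 1.095, 14: 1.084, 15: 1.225}

# AIC change for K = 2..15: AIC(K) minus AIC(K - 1).
changes = [b - a for a, b in zip(aic, aic[1:])]

# Each change scaled by the change from K=1 to K=2, which is therefore 1.
ratio_of_changes = [c / changes[0] for c in changes]

# Simplified heuristic (an assumption): the K with the largest distance ratio.
k_largest_jump = max(dist_ratio, key=dist_ratio.get)

for k, (c, r) in enumerate(zip(changes, ratio_of_changes), start=2):
    print(f"K={k:2d}  change={c:12.3f}  ratio={r:.3f}")
print("largest ratio of distance measures at K =", k_largest_jump)
```

The recomputed ratios should agree with the "Ratio of AIC Changes" column to three decimals, and the largest distance ratio sits at K=2, which is in line with the two clusters that TwoStep reported.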

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Allan Reese (Cefas)
Sent: Thursday, July 27, 2006 10:14 AM
To: [hidden email]
Subject: Two-step cluster

I'm trying twostep cluster in version 13. The documentation states that it

will select "between 1 and the maximum" number of clusters using the

Information Criterion. I assume it is looking for a maximum AIC/BIC so find

this output (apologies that column headings don't line up with values)

inconsistent with it reporting two clusters for this dataset. This seems so

obvious that I'm probably overlooking the obvious.

Auto-Clustering

Number of   Akaike's Information   AIC           Ratio of AIC   Ratio of Distance
Clusters    Criterion (AIC)        Change(a)     Changes(b)     Measures(c)
 1          42679.177
 2          29736.437              -12942.740    1.000          3.076
 3          25584.939               -4151.498     .321          1.444
 4          22736.637               -2848.302     .220          1.751
 5          21145.674               -1590.963     .123          1.008
 6          19567.475               -1578.199     .122          1.515
 7          18554.435               -1013.040     .078          1.257
 8          17765.615                -788.819     .061          1.034
 9          17005.491                -760.125     .059          1.151
10          16356.395                -649.095     .050          1.150
11          15802.853                -553.542     .043          1.305
12          15398.294                -404.559     .031          1.167
13          15063.487                -334.807     .026          1.095
14          14765.135                -298.352     .023          1.084
15          14496.451                -268.684     .021          1.225

a The changes are from the previous number of clusters in the table.
b The ratios of changes are relative to the change for the two cluster solution.
c The ratios of distance measures are based on the current number of clusters against the previous number of clusters.

If there is in fact only one cluster, this may explain why the next output

lists all the variables in their name order, though the command description

states they "will be sorted by the importance rank of each variable."

The documentation for this command is very sparse, so I'd appreciate

feedback from other users. The menu generates syntax for "twostep cluster"

followed by "aim" but I am unable to trace any documentation for AIM.

Allan
