Allan Reese (Cefas)
I got the explanation from the horse's mouth of how twostep cluster determines its "optimum" number of clusters (Marija Norusis no less).  It's not pretty and there seems to be a rule-of-thumb constant (1.15) that appears from nowhere.  But it explained why some of my runs popped up 6 clusters with no obvious min/max in the IC table.  If the second best IC is less than 1.15 times the best and suggests more clusters, it takes the second.  The relevant documents are on the SPSS support website www.spss.com (pages for TWOSTEP CLUSTER and AIM).
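
A minimal sketch of that rule, read literally from the paraphrase above and not taken from the SPSS documents themselves.  It assumes the IC values are positive and that lower is better; the function name, the dictionary layout and the example numbers are made up for illustration.  The SPSS algorithm pages remain the authority on exactly which quantity is compared against 1.15.

```python
from typing import Mapping


def pick_cluster_count(ic_by_k: Mapping[int, float], threshold: float = 1.15) -> int:
    """Choose a cluster count using the rule as paraphrased in this post.

    Assumptions (not confirmed against SPSS documentation):
    - ic_by_k maps a candidate number of clusters to a positive IC value;
    - a lower IC is better;
    - "second best" means the second-lowest IC in the table.
    """
    # Rank candidate cluster counts from best (lowest IC) to worst.
    ranked = sorted(ic_by_k, key=lambda k: ic_by_k[k])
    best_k = ranked[0]
    if len(ranked) < 2:
        return best_k

    second_k = ranked[1]
    # "Near tie": the runner-up is within the rule-of-thumb factor of the best.
    near_tie = ic_by_k[second_k] < threshold * ic_by_k[best_k]

    # Prefer the runner-up only if it is a near tie AND has more clusters.
    if near_tie and second_k > best_k:
        return second_k
    return best_k


# Hypothetical IC table: 5 clusters has the lowest IC, 6 is a close second.
example = {2: 1000.0, 3: 900.0, 4: 880.0, 5: 870.0, 6: 875.0}
chosen = pick_cluster_count(example)
```

With a table this flat the rule picks 6 even though 5 has the lowest IC, which is the kind of result described above.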

The method worked well on our example as a heuristic, because there were some 3500 cases that appeared homogeneous and a small number (about 150) anomalous.  Using the subcommand for outlier exclusion and running several times with the case order randomized confirmed this interpretation was stable.  Without outlier exclusion, the clustering was strongly influenced by whether an "outlier" appeared early in the case order.  Putting the cases that were marked anomalous into a hierarchical cluster analysis suggested some small tight clusters, which made sense but would not have come out of clustering 3600 by brute force.
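
The order-randomization check can be sketched roughly as follows.  This is not TwoStep: the clusterer is a toy "leader" algorithm used only as a stand-in because it is order dependent, and the Rand index as the agreement measure is my own choice for the illustration.  All names and data here are hypothetical.

```python
import random
from itertools import combinations


def leader_cluster(values, radius):
    """Toy order-dependent clusterer (a stand-in, NOT SPSS TwoStep).

    Each value joins the first existing leader within `radius`;
    otherwise it becomes a new leader.
    """
    leaders, labels = [], []
    for v in values:
        for i, lead in enumerate(leaders):
            if abs(v - lead) <= radius:
                labels.append(i)
                break
        else:
            leaders.append(v)
            labels.append(len(leaders) - 1)
    return labels


def rand_index(a, b):
    """Fraction of case pairs on which two labelings agree (same/different cluster)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def order_stability(values, radius, runs=5, seed=0):
    """Re-cluster under shuffled case orders and compare with the original order."""
    rng = random.Random(seed)
    baseline = leader_cluster(values, radius)
    scores = []
    for _ in range(runs):
        order = list(range(len(values)))
        rng.shuffle(order)
        shuffled = leader_cluster([values[i] for i in order], radius)
        # Map labels back to the original case order before comparing.
        restored = [0] * len(values)
        for pos, idx in enumerate(order):
            restored[idx] = shuffled[pos]
        scores.append(rand_index(baseline, restored))
    return scores


# Two well-separated groups: the partition should not depend on case order.
stable_scores = order_stability([0.0, 0.1, 0.2, 10.0, 10.1, 10.2], radius=1.0)
```

Scores near 1 across the runs suggest the solution does not depend on case order; scores that drop when a borderline or outlying case moves to the front point to the kind of instability described above.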

Google found some fascinating applications of twostep clustering.  For the market researchers on the list, it suggested some 80 segments in the UK consumer market.  Can anyone tell me what "Tom Brown's schooldays" are like?


