>I agree with Art Kendall's opinion that "In DFA, I recommend closely examining

>the probabilities of assignment to each cluster for each case, and the

>probability that a member of a cluster would be as far away from the

>centroid as this particular case is. This is a very old but very useful aid

>in interpreting a clustering. The classification phase of DFA should provide

>insight into the reliability of the cluster assignments."

>Besides using or not using DFA for this purpose, cases far away from the

>centroid are often of doubtful usefulness. In one exercise I did with a

>large sample some time ago, I applied clustering to create a certain number

>of clusters, but there were a lot of cases of borderline membership. We

>figured a small amount of measurement error would land those cases in

>another cluster altogether.

>For certain research purposes it proved useful to divide each cluster into a

>"core" and a "periphery", the core being a relatively small area around the

>centroid. This is only useful when many cases are near the centroid, and few

>are in the no-man's land or borderline area between clusters, far away from

>the centroid.

>I do not remember all the details, but I do remember I tried several ways of

>defining the core, including the following: (1) all cases situated within

>the minimum distance from the centroid that encompassed, say, 25% of all

>cases in the cluster; (2) all cases, whatever their number or proportion as

>long as they were at least 30, located within a Euclidean distance of, say,

>one cluster-specific standard deviation from the centroid.
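
The two "core" rules described above could be sketched roughly as follows, in plain Python rather than SPSS. The function names are hypothetical, and "one cluster-specific standard deviation" is read here as the root-mean-square distance of a cluster's cases from its centroid, which is an assumption; the original definition may have differed.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two cases."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    """Mean of every variable over the cases of one cluster."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def core_by_fraction(points, fraction=0.25):
    """Rule (1): indices of the `fraction` of cases nearest the centroid."""
    c = centroid(points)
    order = sorted(range(len(points)), key=lambda i: euclidean(points[i], c))
    k = max(1, math.ceil(fraction * len(points)))
    return sorted(order[:k])

def core_by_radius(points, n_sd=1.0, min_cases=30):
    """Rule (2): indices of cases within n_sd 'standard deviations' of the
    centroid, or None when fewer than min_cases qualify."""
    c = centroid(points)
    dists = [euclidean(p, c) for p in points]
    # Assumption: SD taken as the RMS distance of the cases to the centroid.
    sd = math.sqrt(sum(d * d for d in dists) / len(dists))
    core = [i for i, d in enumerate(dists) if d <= n_sd * sd]
    return core if len(core) >= min_cases else None
```

Rule (1) always yields a core of fixed relative size, while rule (2) lets the core grow or shrink with how tightly the cluster is packed, which is why it needs the minimum-count guard.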

>The "core" of the cluster is usually quite homogeneous, and proved a very

>useful tool to define the "typical" features of the cluster, and to select

>typical cases for frequent follow-up, at least for means if not for

>variability around the mean.

>In fact, what we did was create a "model" (a "model farm-household" in

>that experience) defined by the centroid values of all variables,

>periodically re-evaluating those values by following up a small rotational

>sample of cases randomly selected from the core. Since the centroid was

>supposed to be defined by the mean of those variables for the entire cluster

>(core+periphery), we boldly multiplied the updated centroid means by the

>clusters' total membership to obtain updated population means and totals in

>an economical way (this was done in order to monitor rural development at

>farm/household level in a poor developing country, where large sample

>surveys cannot be carried out with the necessary frequency, and casual

>visits by extension workers are not enough).
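
The updating arithmetic described above can be illustrated with a small Python sketch. The function names and the data layout are hypothetical; the key assumption, as in the post, is that the mean of a small follow-up sample from the core stands in for the mean of the whole cluster (core plus periphery).

```python
def updated_estimates(core_sample, cluster_size):
    """Per-variable means of the follow-up core sample, and those means
    multiplied by the cluster's total membership."""
    n = len(core_sample)
    means = [sum(col) / n for col in zip(*core_sample)]
    totals = [m * cluster_size for m in means]
    return means, totals

def population_estimates(clusters):
    """clusters: list of (core_sample, cluster_size) pairs, one per cluster.
    Returns (population means, population totals) per variable."""
    n_vars = len(clusters[0][0][0])
    grand_totals = [0.0] * n_vars
    population = 0
    for sample, size in clusters:
        _, totals = updated_estimates(sample, size)
        grand_totals = [g + t for g, t in zip(grand_totals, totals)]
        population += size
    grand_means = [g / population for g in grand_totals]
    return grand_means, grand_totals
```

The population mean that comes out is simply the membership-weighted average of the updated cluster means, so its quality depends entirely on how well each core represents its cluster.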

>Hope this helps.

>Hector Maletta

-----Original message-----

From: SPSSX(r) Discussion [hidden email] On behalf of Art Kendall

Sent: Monday, August 14, 2006 2:13 PM

To: [hidden email]

Subject: Re: Cluster Analysis - Seeds needed for K-Means

>It is some time since I used version 12, but the hierarchical

>clustering part has been around since the 70's.

>If you used the SAVE specification, you should have a new variable that

>indicates for each case to which cluster it is assigned. Say you called

>it Kluster3 and the variables on which the clustering is based Var01 to Var12.

>

>

>To get the centroids

>(I'm not sure how you would have interpreted the cluster meanings

>without using DISCRIMINANT or means already.)

>discriminant groups= kluster3 (1,3)/ variables = var01 to var12 . . ..

>

>or

>means tables= var01 to var12 by kluster3 /cells= count means . . . .

>

>Once you type the above command into a syntax window, highlight (select)

>the procedure name with your mouse and click the syntax button to see

>other possibilities for the procedure.

>

>In DFA, I recommend closely examining the probabilities of assignment to

>each cluster for each case, and the probability that a member of a

>cluster would be as far away from the centroid as this particular case

>is. This is a very old but very useful aid in interpreting a clustering.

>The classification phase of DFA should provide insight into the

>reliability of the cluster assignments.
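
To make the two quantities above concrete, here is a deliberately simplified Python sketch. It is not what DISCRIMINANT computes: it assumes equal prior probabilities and a single shared spherical variance for the assignment probabilities, and it replaces the distance-based probability with an empirical share of cluster members. All function names are hypothetical.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two cases."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def posterior_probabilities(case, centroids, variance=1.0):
    """P(cluster | case), assuming equal priors and one shared spherical
    variance (a simplification of a real discriminant analysis)."""
    scores = [-sq_dist(case, c) / (2.0 * variance) for c in centroids]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def empirical_typicality(case, members, centroid):
    """Share of a cluster's members that lie at least as far from the
    centroid as `case` does (an empirical stand-in, not a model-based one)."""
    d = sq_dist(case, centroid)
    return sum(1 for m in members if sq_dist(m, centroid) >= d) / len(members)
```

A case with assignment probabilities near 0.5/0.5 sits on a border between clusters; a case with a very small typicality value is an outlier even within its own cluster. The two diagnostics flag different kinds of doubtful assignments.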

>The GUI in SPSS is very useful for the first draft of your syntax.

>Simply exit the menus via the "paste" button. This shows you the syntax

>that will do what you specified in the menu. As you look at your

>results, and as you develop your approach you can simply edit the pasted

>syntax.

>

>To get your means into a .sav file (there are more automated ways to

>get the centroids into kmeans, but this is straightforward):

>open a new data file

>label the variables kluster3 and var01 ... var12.

>key in the centroids.

>save the file.
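
For readers who want to see the logic of these steps outside SPSS, the following Python sketch computes the per-cluster means from saved cluster assignments and then starts a basic k-means (Lloyd's algorithm) from those means. It only illustrates the idea, not the SPSS procedures themselves; the function names are hypothetical and the data are assumed to be plain lists of numbers.

```python
def centroids_by_cluster(rows, labels):
    """Mean of every variable within each cluster label
    (the same numbers a means-by-cluster table reports)."""
    sums, counts = {}, {}
    for row, lab in zip(rows, labels):
        acc = sums.setdefault(lab, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sorted(sums)}

def kmeans_from_seeds(rows, seeds, max_iter=100):
    """Lloyd's k-means started from the given seed centroids.
    Returns (final centers, cluster index assigned to each row)."""
    centers = [list(s) for s in seeds]
    assign = None
    for _ in range(max_iter):
        # Assign every case to its nearest current center.
        new_assign = [
            min(range(len(centers)),
                key=lambda k: sum((x - c) ** 2 for x, c in zip(row, centers[k])))
            for row in rows
        ]
        if new_assign == assign:
            break  # assignments stopped changing
        assign = new_assign
        # Recompute each center as the mean of its current members.
        for k in range(len(centers)):
            members = [r for r, a in zip(rows, assign) if a == k]
            if members:  # keep the old center if a cluster empties
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assign
```

Passing `list(centroids_by_cluster(rows, labels).values())` as `seeds` reproduces the workflow in the thread: hierarchical assignments supply the starting centroids, and k-means then refines them.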

>You might also want to consider applying the TWOSTEP procedure.

>It will produce AIC and BIC to check on the number of clusters to retain.

>

>Art Kendall

>Social Research Consultants

>Aaron Eakman wrote:

>

>

>>I am using SPSS 12 for my clustering procedures. I started with

>>hierarchical clustering using Ward's method with squared Euclidean

>>distance. I have identified a three cluster solution as the best option

>>from a possible range of 2-4 that I established a priori.

>

>>Here is my problem: I want to next run a K-means clustering procedure.

>>More specifically, I want to use the centroids of the three clusters from

>>my hierarchical procedure as "seed" or starting values for the K-means

>>clustering procedure. Unfortunately, SPSS does not generate this output

>>from the hierarchical procedure. And I do not know 1) how to generate

>>cluster centroids from the cluster assignment information provided by the SPSS

>>hierarchical procedure, and 2) even if I did, I do not know how

>>to generate an SPSS .sav file with that information for use by the K-means

>>approach. A further problem: I am a point and clicker and not savvy with

>>command syntax; I AM WILLING TO LEARN IF IT CAN GET ME OUT OF MY MESS!!

>>Any persons that are SPSS - Cluster Analysis savvy, or know others that

>>might lend a hand would be met with gratitude for any assistance.

>>

>>Take care,

>>

>>Aaron Eakman

>>

