Clustering is a nonparametric classificatory device for which statistical
significance is of secondary relevance. By "secondary" I mean that once you
have your clusters you may test the hypothesis that they differ (or do not
differ) in some variable of interest, and only then would the number of cases
be important, to determine the significance of the difference and thus accept
or reject the null hypothesis. But for forming the clusters themselves this
plays no role. Just two cases are enough to define two clusters, although
that would be a bit silly.
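To make the point concrete, here is a minimal sketch of testing whether two already-formed clusters differ in some variable of interest. The data values, the function name `permutation_p_value` and the choice of a permutation test on the difference in means are all my own assumptions for illustration; any other two-sample test would serve the same purpose.

```python
import random
from statistics import mean

# Hypothetical values of a variable of interest in two already-formed clusters.
cluster_a = [4.1, 3.8, 4.4, 4.0, 3.9, 4.3]
cluster_b = [5.0, 5.3, 4.9, 5.4, 5.1, 5.2]


def permutation_p_value(a, b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in means (illustrative)."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        # Reassign cases to the two groups at random, keeping group sizes.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_permutations + 1)


# With six cases per cluster the difference can reach significance.
p_full = permutation_p_value(cluster_a, cluster_b)

# With one case per cluster the clusters still exist, but every relabelling
# gives the same absolute difference, so nothing can be concluded.
p_tiny = permutation_p_value(cluster_a[:1], cluster_b[:1])

print(p_full, p_tiny)
```

The number of cases only enters at this testing stage: the two-case version still has two "clusters", but the test has no power at all.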

Besides that, the fact that your cross tabulation shows many empty cells
tends to suggest there is an association between the variables, i.e. cases
tend to cluster in cells representing certain combinations of values of the
variables, and not in others. This would suggest that a cluster analysis
would be meaningful.
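As a rough illustration of how empty cells and association go together, the sketch below builds a small cross table, counts the share of empty cells, and computes the Pearson chi-square statistic and Cramér's V by hand. The cases and the variable labels are invented; with very sparse tables the expected counts are small and the chi-square approximation is poor, so take the numbers as a descriptive indication rather than a formal test.

```python
from collections import Counter
from itertools import product

# Hypothetical cases: each one is a (row category, column category) pair.
cases = (
    [("north", "a")] * 12
    + [("south", "b")] * 9
    + [("east", "c")] * 10
    + [("north", "b")] * 1
)

observed = Counter(cases)
row_totals = Counter(r for r, _ in cases)
col_totals = Counter(c for _, c in cases)
rows, cols = sorted(row_totals), sorted(col_totals)
n = len(cases)

# Every cell of the full cross table, including the empty ones.
cells = list(product(rows, cols))
empty_share = sum(1 for cell in cells if observed[cell] == 0) / len(cells)

# Pearson chi-square: sum over cells of (observed - expected)^2 / expected,
# where expected is the count implied by independence of rows and columns.
chi_square = 0.0
for r, c in cells:
    expected = row_totals[r] * col_totals[c] / n
    chi_square += (observed[(r, c)] - expected) ** 2 / expected

dof = (len(rows) - 1) * (len(cols) - 1)

# Cramér's V rescales chi-square to the 0..1 range as a strength of association.
cramers_v = (chi_square / (n * (min(len(rows), len(cols)) - 1))) ** 0.5

print(f"empty cells: {empty_share:.0%}, chi-square: {chi_square:.1f} "
      f"on {dof} df, Cramer's V: {cramers_v:.2f}")
```

Here more than half the cells are empty and the cases pile up in a few combinations, which is exactly the pattern that shows up as a large chi-square relative to its degrees of freedom.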

However, that is not the usual reason why one may want to do cluster
analysis. What is your purpose? Are you trying to build a multivariate
typology, based on the various dimensions of a complex concept, and thus
classifying your cases in various types or groups? Are you trying to
identify "odd" groups of cases with unusual combinations of values? (The
latter you may do directly in your cross table.)
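Finding the "odd" groups directly in the cross table can be as simple as listing the combinations that occur but are rare. A minimal sketch, assuming invented cases and an arbitrary cutoff (`threshold` is my own choice, not a recommended value):

```python
from collections import Counter

# Hypothetical cases: each one is a tuple of categorical values.
cases = (
    [("north", "a")] * 12
    + [("south", "b")] * 9
    + [("east", "c")] * 10
    + [("north", "b")] * 1
)

counts = Counter(cases)

# Assumed cutoff: combinations seen this many times or fewer count as unusual.
threshold = 2

# Only combinations that actually occur are listed; empty cells are not cases.
odd_combinations = sorted(
    combo for combo, count in counts.items() if count <= threshold
)

for combo in odd_combinations:
    print(combo, counts[combo])
```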

From your message it seems all your variables are categorical, and the
combined number of categories not too large, since they lend themselves to
being cross-tabulated in a single table, albeit a somewhat large one. There
are things you can do directly there, without going into other procedures.
There are also other statistical procedures that may be in order, depending
on your purpose.

Perhaps, therefore, you should consider alternatives to cluster analysis,
and the reasons why you may or may not want to use it.

Hector

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Ole Rohwer
Sent: Saturday, August 12, 2006 9:17 AM
To: [hidden email]
Subject: cluster analysis

Dear List,

I've got a large crosstable with a lot of zero cases, some 80 to 85
percent. Is there a minimum number of cases necessary to run two-step
or cluster center analysis?

Regards

Ole