Clustering is a nonparametric classificatory device for which statistical
significance is of secondary relevance. By "secondary" I mean that once you
have your clusters you may test the hypothesis that they differ (or do not
differ) on some variable of interest, and only then would the number of cases
matter, since it determines the significance of the difference and thus
whether you accept or reject the null hypothesis. But for forming the clusters
themselves this plays no role. Just two cases are enough to define two
clusters, although that would be a bit silly.
Besides that, the fact that your cross tabulation shows many empty cells
tends to suggest there is association between the variables, i.e. cases
tend to cluster in cells representing certain combinations of values of the
variables, and not in others. This would suggest that a cluster analysis
would be meaningful.
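
To make the point about empty cells concrete, here is a minimal Python sketch (not SPSS syntax) that counts how many cells of the full cross table are empty. The function name `empty_cell_share` and the toy data are made up for illustration. A high share of empty cells is only suggestive, since a small number of cases spread over many cells also produces empty cells.

```python
from collections import Counter

def empty_cell_share(cases):
    """Return (share of empty cells, Counter of occupied cells).

    `cases` is a list of tuples, one tuple of category values per case.
    The full table has one cell per combination of observed categories.
    """
    counts = Counter(cases)
    # Categories observed for each variable (one column per variable).
    levels = [set(column) for column in zip(*cases)]
    n_cells = 1
    for level in levels:
        n_cells *= len(level)
    return 1 - len(counts) / n_cells, counts

# Hypothetical data: three categorical variables, with the cases
# concentrated in a few combinations of values.
cases = [("a", "x", 1)] * 5 + [("b", "y", 2)] * 4 + [("a", "y", 2)]
share, counts = empty_cell_share(cases)
print(f"{share:.1%} of the cells are empty")
print(counts.most_common())  # occupied cells, largest first
```

Listing the occupied cells in order of size already shows where the cases concentrate, which is roughly what a cluster analysis would be expected to pick up.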
However, that is not the usual reason why one may want to do cluster
analysis. What is your purpose? Are you trying to build a multivariate
typology, based on the various dimensions of a complex concept, and thus
classifying your cases in various types or groups? Are you trying to
identify "odd" groups of cases with unusual combinations of values? (The
latter you may do directly in your cross table).
From your message it seems all your variables are categorical, and the
combined number of categories is not too large, since they lend themselves to
being cross-tabulated in a single table, albeit a somewhat large one. There are
things you can do directly there, without going into other procedures. There
are also other statistical procedures that may be in order, depending on
Perhaps, therefore, you should consider alternatives to cluster analysis,
and the reasons why you may want to use or not to use it.
From: SPSSX(r) Discussion [mailto:[hidden email]] On behalf of Ole
Sent: Saturday, August 12, 2006 9:17 AM
To: [hidden email]
Subject: cluster analysis
I've got a large crosstable with a lot of zero cases, some 80 to 85
percent. Is there a minimum number of cases necessary to run two-step
or cluster center analysis?
Good morning Keith,
I hope you are doing fine. I would like you to please help me with this
I have two surveys, one of which is a longer version of the other. The goal was to reduce the length of the survey: use respondents' answers to Var1...Var6 and to Var7 & Var8 on the longer survey to identify people who look like them on the other survey, and impute the values of Var7 & Var8 for those people.
Below is a kind of representation of what I'm talking about.
Is there a way to use SPSS to do this type of imputation?
I tried linear regression and the parameters were not good.
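
One way to picture the "people that look like them" idea is nearest-neighbour donor (hot-deck) imputation. The Python sketch below is not an SPSS procedure and not necessarily what was intended here; it only illustrates the matching logic. The function name `impute_from_donors`, the records and the distance (number of Var1 to Var6 answers that differ) are all assumptions made for the example.

```python
def impute_from_donors(long_survey, short_survey, match_vars, target_vars):
    """Copy target values from the closest long-survey respondent.

    Closeness is the number of matching variables on which the answers
    differ (simple matching distance); ties go to the first donor found.
    """
    imputed = []
    for rec in short_survey:
        donor = min(
            long_survey,
            key=lambda d: sum(d[v] != rec[v] for v in match_vars),
        )
        filled = dict(rec)  # leave the original record untouched
        for t in target_vars:
            filled[t] = donor[t]
        imputed.append(filled)
    return imputed

# Hypothetical records; the real surveys' coding is not shown in the post.
match_vars = [f"Var{i}" for i in range(1, 7)]
long_survey = [
    {"Var1": 1, "Var2": 2, "Var3": 1, "Var4": 3, "Var5": 2, "Var6": 1, "Var7": 4, "Var8": 5},
    {"Var1": 3, "Var2": 3, "Var3": 2, "Var4": 1, "Var5": 1, "Var6": 2, "Var7": 1, "Var8": 2},
]
short_survey = [
    {"Var1": 1, "Var2": 2, "Var3": 1, "Var4": 3, "Var5": 1, "Var6": 1},
    {"Var1": 3, "Var2": 3, "Var3": 2, "Var4": 2, "Var5": 1, "Var6": 2},
]
result = impute_from_donors(long_survey, short_survey, match_vars, ["Var7", "Var8"])
for row in result:
    print(row["Var7"], row["Var8"])
```

Copying a single donor's values understates the uncertainty of the imputed Var7 & Var8, so drawing at random among several close donors may be preferable in practice.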