Q. Why do GLM commands not include multicollinearity diagnostics?

Q. Why do GLM commands not include multicollinearity diagnostics?

Bruce Weaver
Administrator
Another recent thread (see link below) reminded me of a question that has
occurred to me frequently over the years:

Why do the GLM commands (GLM, UNIANOVA, CSGLM) NOT include the same
multicollinearity diagnostics that REGRESSION has?  

Taking it further, why do they not compute Fox & Monette's (1992)
Generalized Variance Inflation Factor (gVIF)?  Their article was published
in 1992--it's high time SPSS implemented gVIF, IMO!  


References

http://spssx-discussion.1045642.n5.nabble.com/Multicollinearity-Logistic-regression-td5739623.html

Fox, J., & Monette, G. (1992). Generalized Collinearity Diagnostics. Journal
of the American Statistical Association, 87(417), 178-183.
doi:10.2307/2290467





-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Q. Why do GLM commands not include multicollinearity diagnostics?

Jon Peck
(Forwarded Bruce's email to project management)

But the GVIF is straightforward to compute from the coefficient correlation matrix, as outlined here.

Ignoring the ridge part, the formula given can be computed using the MATRIX procedure.
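For anyone who wants to see the arithmetic, here is a minimal sketch of the Fox & Monette (1992) formula in plain Python rather than MATRIX syntax. It assumes you already have the correlation matrix R of the predictor columns; the function names (`gvif`, `det`, `submatrix`) and the example matrix are purely illustrative. The formula is GVIF = det(R11) * det(R22) / det(R), where R11 is the submatrix for the focal term's columns and R22 the submatrix for the remaining columns.

```python
def det(m):
    """Determinant by Laplace expansion (fine for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j in range(len(m)):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

def submatrix(r, idx):
    return [[r[i][j] for j in idx] for i in idx]

def gvif(r, focal):
    """Fox & Monette GVIF for the predictor columns listed in `focal` (0-based)."""
    rest = [i for i in range(len(r)) if i not in focal]
    return det(submatrix(r, focal)) * det(submatrix(r, rest)) / det(r)

# Made-up example: x1 and x2 correlate at 0.6, x3 is uncorrelated with both.
R = [[1.0, 0.6, 0.0],
     [0.6, 1.0, 0.0],
     [0.0, 0.0, 1.0]]

print(gvif(R, [0]))     # single column: reduces to the ordinary VIF, 1/(1 - 0.6**2)
print(gvif(R, [0, 1]))  # the set {x1, x2} is uncorrelated with x3, so GVIF = 1
```

Note that for a single focal column the GVIF reduces to the familiar VIF, which is a quick sanity check on any implementation.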

However, as I apparently noted in the link Bruce cited, it's a misnomer to talk of testing for multicollinearity, since it is a matter of degree unless the X'X matrix is singular.

Calculating the VIF using REGRESSION with or without the sampling weight might give a good idea of whether a substantial degree of multicollinearity is present.  It also reminds me that ridge regression, while not available as a CS procedure, could be useful in addressing collinearity.  RIDGE is available in the CATREG procedure.  If using that, though, watch out for zero values in the regressors, because that procedure considers such values to be missing :-(.  Add a 1 if necessary.



--
Jon K Peck
[hidden email]


Re: Q. Why do GLM commands not include multicollinearity diagnostics?

Mike
In reply to this post by Bruce Weaver
I am going to speculate here, but I think most statistics programs do
not provide multicollinearity statistics for ANOVA analyses (today we
include these analyses under the heading of the GLM) because of a traditional
distinction that had been made, at least in psychology and the biomedical
fields, between when one uses ANOVA and when one uses
multiple regression analysis. 

If one conducts a bona fide true experiment where one manipulates/controls
the independent variables, then one makes sure that the independent
variables are orthogonal to each other (i.e., uncorrelated).  For example,
if one uses a 2x2 design, the two variables do not represent attributes
of the subjects/participants (e.g., sex/gender, age, etc.), and the design
is balanced -- either the cells all have the same sample sizes or they are
proportional according to some criterion that keeps the two independent
variables orthogonal. 

When the independent variables are orthogonal to each other, the
calculations one needs to do simplify relative to the calculations
for multiple regression -- this is why ANOVA coverage in intro psych
stats textbooks has focused on how to calculate the appropriate Sums of
Squares (SS) for the independent variables and error term(s).  After
determining the degrees of freedom (df) for each source, the Mean Square
for each source can be easily calculated as MS = SS/df.  The
F-ratio is simply F = MS-A/MS-Error (where A is one independent variable),
and so on.
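That textbook arithmetic can be shown end to end with a toy data set (the numbers here are made up purely for illustration): sums of squares, then MS = SS/df, then F = MS-A/MS-Error, for a balanced one-way design with two groups.

```python
def one_way_f(groups):
    """F-ratio for a one-way design via the classical SS/df arithmetic."""
    allv = [x for g in groups for x in g]
    grand = sum(allv) / len(allv)
    means = [sum(g) / len(g) for g in groups]
    # Between-groups and error sums of squares.
    ss_a = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_err = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_a = len(groups) - 1
    df_err = len(allv) - len(groups)
    return (ss_a / df_a) / (ss_err / df_err)  # F = MS-A / MS-Error

groups = [[3, 5, 4, 4], [6, 8, 7, 7]]
print(one_way_f(groups))  # SS-A = 18, SS-Error = 4, so F = 18 / (4/6) = 27
```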

These calculations are inappropriate if the independent variables are
not orthogonal.  One now has to use multiple regression to "save"
the analysis because the design has been mucked up, so to speak.

Experimentalists would argue that these types of "true experiments"
allow one to attribute a causal role to each of the independent variables
if appropriate controls are in place (e.g., no confounding variables are
present).  The use of Randomized Controlled Trials (RCTs) to test the effect
of a drug relative to a placebo or a standard treatment depends upon the
orthogonality of independent variables (one example I use is A = Drug vs Placebo,
B = Psychotherapy [2 levels], and the AxB 2-way interaction).  In this context,
if there is a main effect of drug, one can legitimately say that the drug had a
causal effect (hopefully, a beneficial effect).  One can make a similar
statement for psychotherapy, but if the interaction is significant,
one doesn't focus on the main effects; instead, one focuses on how
the effect of one independent variable varies as a function of the other. 
Note that true experiments typically have a small number of independent
variables -- generally four or fewer (Zar's text goes up to 5 IVs).

If the independent variables are non-orthogonal, one really shouldn't
make statements about the causal effects of the independent variables,
because the non-orthogonality causes the main effect of one independent
variable to be correlated with the main effect of the other -- the
effects are confounded.  Experimental psychologists used this logic for
most of the 20th century and relied on ANOVA analyses (as did
experimentalists in the biomedical fields, which is why the BMDP
statistical package had such great ANOVA programs long before
SPSS did, especially for within-subject/repeated-measures designs;
SAS had good between-subject design analysis capabilities, but
it wasn't until the 1980s that within-subject analyses were made
easier to specify in ways that prevented errors of incorrect design
specification).  Today, one might say that researchers are much more casual
about causality, and non-orthogonality is seen as less of a problem.

In the 1960s Jack Cohen and others argued that ANOVA analyses of
designs with orthogonal independent variables are a special case of
multiple regression analysis, and showed how the traditional ANOVA
analyses could be done in regression.  The first edition of his regression
text brought together the threads of this argument, though many
experimental psychologists didn't see the benefits, and regression
was seen as tainted because it was used with correlated
independent variables, which undermined causal attributions.  Social
scientists using data from nonexperimental designs would beg to
differ.
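Cohen's equivalence can be verified directly with the same sort of toy numbers (again, made up for illustration): regressing the outcome on a 0/1 dummy for group membership reproduces the one-way ANOVA F exactly.

```python
# Two balanced groups of four observations; x is a 0/1 dummy for group.
y = [3, 5, 4, 4, 6, 8, 7, 7]
x = [0, 0, 0, 0, 1, 1, 1, 1]

n = len(y)
mx, my = sum(x) / n, sum(y) / n

# Ordinary least squares slope and intercept for the simple regression.
b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
     / sum((xi - mx) ** 2 for xi in x))
a = my - b * mx
yhat = [a + b * xi for xi in x]

# Regression and residual sums of squares, then the regression F-test.
ss_reg = sum((yh - my) ** 2 for yh in yhat)
ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))
f_reg = (ss_reg / 1) / (ss_res / (n - 2))
print(f_reg)  # 27, the same F the classical ANOVA arithmetic gives
```

The fitted values are just the two group means (intercept = mean of group 0, intercept + slope = mean of group 1), which is why the two analyses must agree.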

So, long story short: for traditional experimentalists using orthogonal
independent variables, there is no multicollinearity, so why provide
measures of it?  If one is using non-orthogonal/correlated independent
variables, well, one is going to need to know how much non-orthogonality
there is. 

In the last quarter of the 20th century it became generally accepted
to view ANOVA and regression analyses as specific instances of the
General Linear Model, and the GLM as a subset of more general linear/nonlinear
analyses.  But the distinction between analyses of true experiments
versus quasi- and nonexperimental designs is maintained by having
separate programs/procedures for ANOVA and regression instead of
a single GLM program (not the one currently in SPSS).  In the structural
equation modeling (SEM) field, LISREL and MPLUS (I assume other SEM
programs as well) have in fact presented this view: one can do everything
from t-tests to MANOVA, canonical correlation analysis, and most of
the other multivariate analyses, with new capabilities being regularly added.
So why use SPSS when an SEM package can do almost everything one
wants?  (One can always use Excel or undergraduates to help clean and
organize the data ;-)

Okay, have I exceeded tldr?

-Mike Palij
New York University

