If X is an ordered, numeric categorical variable, then it might make sense to test for deviations from linearity. In OLS regression, this means comparing the model where X is categorical with one where X is assumed to be linearly related to the outcome.
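
The same comparison can be made for the logistic model in the original question by using a likelihood ratio test instead of an F test. The following is only a rough sketch in plain Python, not SPSS output: the counts are hypothetical, the function names are made up for illustration, and it assumes the five categories are scored 1 to 5 and that every category contains both events and non-events.

```python
import math

# Hypothetical grouped data: X has 5 ordered levels scored 1..5.
levels = [1, 2, 3, 4, 5]
n_cases = [40, 40, 40, 40, 40]     # cases per level (made-up numbers)
n_events = [6, 10, 22, 24, 33]     # cases with Y = 1 per level (made-up numbers)


def binom_loglik(events, cases, probs):
    """Log-likelihood of grouped binary data given fitted probabilities."""
    return sum(y * math.log(p) + (n - y) * math.log(1.0 - p)
               for y, n, p in zip(events, cases, probs))


def fit_linear_logit(x, events, cases, iterations=25):
    """Fit logit(p) = a + b*x by Newton-Raphson and return (a, b)."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, y, n in zip(x, events, cases):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            resid = y - n * p            # score contribution
            w = n * p * (1.0 - p)        # information weight
            g0 += resid
            g1 += resid * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variable with 3 df (closed form)."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


# Model 1: X categorical (4 indicators + intercept). With one parameter per
# level, the fitted probabilities are just the observed proportions.
probs_cat = [y / n for y, n in zip(n_events, n_cases)]
ll_cat = binom_loglik(n_events, n_cases, probs_cat)

# Model 2: X entered as a single linear covariate.
a, b = fit_linear_logit(levels, n_events, n_cases)
probs_lin = [1.0 / (1.0 + math.exp(-(a + b * x))) for x in levels]
ll_lin = binom_loglik(n_events, n_cases, probs_lin)

# Likelihood ratio test for deviation from linearity: 5 - 2 = 3 df.
lr_stat = 2.0 * (ll_cat - ll_lin)
df = len(levels) - 2
p_value = chi2_sf_3df(lr_stat)

print(f"LR chi-square = {lr_stat:.3f}, df = {df}, p = {p_value:.4f}")
```

A small p-value would suggest that the category effects are not well described by a straight line on the logit scale. The closed-form tail probability above is valid only for 3 degrees of freedom, so a different number of categories would need a general chi-square routine.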

Paul

Paul R. Swank, Ph.D.

Professor, Developmental Pediatrics

Director of Research, Center for Improving the Readiness of Children for Learning and Education (C.I.R.C.L.E.)

Medical School

UT Health Science Center at Houston

-----Original Message-----

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Marta García-Granero

Sent: Friday, June 23, 2006 11:36 AM

To: [hidden email]
Subject: Re: unable to test linearity assumption of logistic regression

Hi Benoît,

I might be wrong, but the assumption of linearity of the logit is important for quantitative/ordinal variables, not for categorical ones...
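
One way to see why the Box-Tidwell term probably cannot be tested when X is entered as indicators: X*ln(X) takes only five distinct values, one per category, so it is an exact linear combination of the intercept and the indicator columns and adds no new information. The sketch below is illustrative only; it assumes the categories are scored 1 to 5, and the `rank` helper is a hypothetical stand-in for a library routine.

```python
import math


def rank(matrix, tol=1e-9):
    """Rank of a matrix (list of rows) via Gaussian elimination with partial pivoting."""
    rows = [[float(v) for v in r] for r in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    r = 0
    for c in range(n_cols):
        pivot = max(range(r, n_rows), key=lambda i: abs(rows[i][c]))
        if abs(rows[pivot][c]) < tol:
            continue                      # no usable pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, n_rows):
            factor = rows[i][c] / rows[r][c]
            for j in range(c, n_cols):
                rows[i][j] -= factor * rows[r][j]
        r += 1
        if r == n_rows:
            break
    return r


levels = [1, 2, 3, 4, 5]                  # assumed numeric scores for the 5 categories
box_tidwell = [x * math.log(x) for x in levels]

# One row per distinct level is enough to determine the rank of the design.
# Indicator coding with the last category as reference: intercept + 4 dummies.
indicator = [[1] + [1 if x == k else 0 for k in levels[:-1]] for x in levels]
indicator_bt = [row + [bt] for row, bt in zip(indicator, box_tidwell)]

# Linear coding: intercept + X as a single covariate.
linear = [[1, x] for x in levels]
linear_bt = [row + [bt] for row, bt in zip(linear, box_tidwell)]

print("indicators:            ", rank(indicator))
print("indicators + X*ln(X):  ", rank(indicator_bt))
print("linear X:              ", rank(linear))
print("linear X + X*ln(X):    ", rank(linear_bt))
```

The rank does not increase when X*ln(X) is added to the indicator coding, which is consistent with a likelihood ratio test on zero degrees of freedom and with an extra category being flagged as redundant. With X entered as a single covariate, the added term does raise the rank, so it can be estimated and tested.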

HTH

Marta

BD> I am doing a simple binary logistic regression with the following structure:

BD> logit(Y) = a + bX

BD> X is a categorical variable with 5 categories, which is entered into the model as indicator variables with the last category as the reference.

BD> To test for linearity, I performed the Box-Tidwell transformation on X (= X*ln(X)) and added this variable as a covariate (I actually use the multinomial logistic regression procedure of SPSS as a binary logistic regression to obtain more information).

BD> First Question: Is it OK to enter this new variable as a covariate (= non-categorical)? Or should it be entered as a factor (= categorical)?

BD> However, the results are strange. When taking a look at the Likelihood Ratio Test, SPSS tells us that removing X (or X*ln(X)) does not increase the degrees of freedom, hence no significance is calculated and I cannot check for linearity.

BD> Second Question: How come removing this parameter from the model does not increase the degrees of freedom?

BD> Third Question: How can I test for linearity?

BD> I'm really puzzled here. Also, looking at the parameter estimates, it seems that no parameters are calculated for two of the five categories of X (they are set to 0). I understand that the parameter for the fifth category of X is redundant because this is the reference category. But somehow the fourth category also becomes redundant?

BD> When I enter X as a covariate (= non-categorical variable), none of these problems occur.