Hello,

I am doing a simple binary logistic regression with the following structure:

logit(Y) = a + bX

X is a categorical variable with 5 categories, which is entered into the model as indicator with the last category as reference.

To test for linearity, I performed the Box-Tidwell transformation on X (= X*ln(X)) and added this variable as a covariate. (I actually use the multinomial logistic regression procedure of SPSS as a binary logistic regression to obtain more information.)

First question: Is it OK to enter this new variable as a covariate (= non-categorical)? Or should it be entered as a factor (= categorical)?

However, the results are strange. When taking a look at the Likelihood Ratio Test, SPSS tells us that removing X (or X*ln(X)) does not increase the degrees of freedom, hence no significance is calculated and I cannot check for linearity.

Second question: How come removing this parameter from the model does not increase the degrees of freedom?

Third question: How can I test for linearity? I'm really puzzled here.

Also, looking at the parameter estimates, it seems that no parameters are calculated for two of the five categories of X (they are set to 0). I understand that the parameter for the fifth category of X is redundant because this is the reference category. But somehow the fourth category also becomes redundant?

When I enter X as a covariate (= non-categorical variable), none of these problems occur.

Thanks in advance
 Hi Benoît,

I might be wrong, but the assumption of linearity of the logit is important for quantitative/ordinal variables, not for categorical ones...

HTH
Marta
If X is an ordered, numeric categorical variable, then it might make sense to test for deviations from linearity. In OLS regression, this means comparing the model where X is categorical with one where X is assumed to be linearly related to the outcome.

Paul R. Swank, Ph.D.
Professor, Developmental Pediatrics
Director of Research, Center for Improving the Readiness of Children for Learning and Education (C.I.R.C.L.E.)
Medical School
UT Health Science Center at Houston
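
The comparison described above for OLS has a direct analogue in logistic regression: fit the model with X as indicators, fit the model with X as a single linear covariate, and compare them with a likelihood-ratio test on 4 - 1 = 3 degrees of freedom. The following is only a sketch under stated assumptions: the grouped counts are made up for illustration (they are not from this thread), the categories are assumed to carry the numeric scores 1 to 5, and the helper names are hypothetical.

```python
import math

# Hypothetical grouped data (NOT from the thread):
# category score x -> (number of cases, number of cases with Y = 1).
data = {1: (40, 6), 2: (40, 10), 3: (40, 19), 4: (40, 24), 5: (40, 33)}

def loglik(p_by_x):
    """Binomial log-likelihood for one fitted probability per category."""
    return sum(y * math.log(p_by_x[x]) + (n - y) * math.log(1.0 - p_by_x[x])
               for x, (n, y) in data.items())

def fit_linear(iterations=50):
    """Fit logit(p) = a + b*x by Newton-Raphson on the two parameters."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, (n, y) in data.items():
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            resid = y - n * p            # score contribution
            w = n * p * (1.0 - p)        # information weight
            g0 += resid
            g1 += resid * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variate with 3 df (closed form)."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)

# Model with X as indicators: fitted probabilities are the observed proportions.
p_cat = {x: y / n for x, (n, y) in data.items()}

# Model with X as a linear covariate.
a, b = fit_linear()
p_lin = {x: 1.0 / (1.0 + math.exp(-(a + b * x))) for x in data}

# Likelihood-ratio test for deviation from linearity (5 - 2 = 3 df).
lr_stat = 2.0 * (loglik(p_cat) - loglik(p_lin))
p_value = chi2_sf_df3(lr_stat)
print(f"LR = {lr_stat:.3f}, df = 3, p = {p_value:.3f}")
```

A small p-value would suggest that the five category logits do not lie on a straight line in the assumed scores; a large one gives no evidence against linearity. In SPSS the same comparison could be made by running the model twice (X as factor, X as covariate) and differencing the two -2 log-likelihoods.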
In reply to this post by Benoît Depaire

At 02:52 AM 6/23/2006, Benoît Depaire wrote:

>I am doing a simple binary logistic regression
>with the following structure:
>
>logit(Y) = a + bX
>
>X is a categorical variable with 5 categories,
>which is entered into the model as indicator
>[variables] with the last category as reference.
>
>To test for linearity, I performed the
>box-tidwell transformation on X (=X*ln(x) ) and
>added this variable as a covariate.

Curious: what does the transformation mean, if X is a categorical variable? If it's truly categorical, it would be the same variable after any RECODE, like

RECODE X (1=3) (2=1) (3=5) (5=4) (4=2).

but that would raise Cain with the transformation, wouldn't it? Or, what if the categories were A, B, C, D and E?

>The results are strange. When taking a look at
>the Likelihood Ratio Test, SPSS tells us that
>removing X ( or X*ln(X) ) does not increase the
>degrees of freedom, hence no significance is
>calculated and I cannot check for linearity.

Exactly. The transform is totally confounded with the four categorical indicators - they're collinear. (Algebra below.) If this were a multiple regression, it would fail for multi-collinearity of the variables.

Collinearity: The variables are multi-collinear if any one is a linear combination of the others.

Suppose your indicator variables for the categories are X1, X2, X3, X4 and X5; the numeric values you associate with them are x1, x2, x3, x4, and x5; and BT is the Box-Tidwell transformed variable. Let bt1 = Box-Tidwell value for x1 = x1*ln(x1), etc. Then, you have

BT = bt1*X1 + bt2*X2 + bt3*X3 + bt4*X4 + bt5*X5.

Or, with a constant in the model and 5 as the reference category (so that X5 = 1 - X1 - X2 - X3 - X4),

BT = bt5 + (bt1-bt5)*X1 + (bt2-bt5)*X2 + (bt3-bt5)*X3 + (bt4-bt5)*X4

Collinearity.
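
The algebra above can be checked numerically. The sketch below assumes the five categories are coded 1 to 5 (the thread does not state the actual codes) and uses hypothetical helper names; it rebuilds BT from the constant and the four indicators and confirms that the two agree for every category, which is why BT adds no degrees of freedom.

```python
import math

# Assumed numeric codes for the five categories (hypothetical).
codes = [1, 2, 3, 4, 5]

# Box-Tidwell term x*ln(x) for each code.
bt = {x: x * math.log(x) for x in codes}

def indicators(x):
    """Indicator variables X1..X4; category 5 is the reference."""
    return [1.0 if x == k else 0.0 for k in codes[:4]]

def bt_from_indicators(x):
    """BT rebuilt as bt5 + sum over k of (bt_k - bt5) * X_k."""
    ind = indicators(x)
    return bt[5] + sum((bt[k] - bt[5]) * ind[k - 1] for k in codes[:4])

# Largest discrepancy between the direct and the rebuilt BT values.
max_gap = max(abs(bt[x] - bt_from_indicators(x)) for x in codes)
print(max_gap)
```

Because the rebuilt values match the direct ones up to rounding, BT is an exact linear combination of columns already in the model, whatever numeric codes are chosen.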