problem with logistic regression - linearity to the logit

Hi, I am a complete beginner in statistics. I am working on a research project that applies logistic regression. I have already entered all the data into SPSS and finished coding it, so I presume I should start by testing the assumptions?

So I tested the linearity of the logit. I created the natural log of each continuous IV and ran it using binary logistic regression, but the output showed this warning:

Warning # 602
>The argument for the natural log function is less than or equal to zero.
>The result has been set to the system-missing value.

The p values all came out as 0.999 or 1.0. My missing cases were 740 out of a total sample of 800! I think this is a problem of zero cells? How do I deal with it? I certainly cannot delete those cases, there are so many!

My objective is to examine the existence of the management committee (SRMC) among public companies.
DV: SRMC (0 or 1)
IV:
1) INDDIR - continuous
2) INDCHAIR - categorical
3) BRDSIZE - continuous
4) DIRSHIP - continuous
5) MEETING - continuous
6) EXPERT - continuous
7) INSTI - continuous
8) DEBT - continuous
9) SIZE - continuous
10) BIG4 - categorical

Many thanks!

Re: problem with logistic regression - linearity to the logit

You cannot log-transform values equal to 0 or less.

Re this method you're using to test for "linearity of the logit", do you have references to support it? See the suggestion here about how to test it:

  http://www.stat.ubc.ca/~rollin/teach/643w04/lec/node54.html

I.e., go ahead and run the model, then see how well it fits.

HTH.

--
Bruce Weaver

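The 740 missing cases follow from this point: any case with a zero (or negative) value on at least one logged predictor gets a system-missing log, and LOGISTIC REGRESSION excludes cases that are missing on any variable in the model. A minimal sketch for checking how widespread the problem is, using the variable names from the original post. The name n_nonpos is a hypothetical helper, and whether each variable really contains zeros is an assumption to check against your own data:

```spss
* For each case, count how many continuous predictors are zero or negative.
COUNT n_nonpos = INDDIR BRDSIZE DIRSHIP MEETING EXPERT INSTI DEBT SIZE (LOWEST THRU 0).

* Cases with n_nonpos above 0 get at least one system-missing log term.
FREQUENCIES VARIABLES=n_nonpos.
```

The frequency table shows how many cases have no non-positive predictors (n_nonpos = 0). Those are the only cases that would survive a model containing the log of every continuous IV.
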
Re: problem with logistic regression - linearity to the logit

One approach would be to compare the fit of the model that assumes linearity,

logit(p) = b0 + b1*X1 + b2*X2 + ... + bk*Xk

against the fit of a model that does not make such an assumption (for example, one that also includes polynomial terms of the predictors).

Ryan

Re: problem with logistic regression - linearity to the logit

Dear Bruce,

I thought examining the linearity of the logit assumption is one of the requirements for logistic regression? Should it be done before or after fitting the model? I am too confused about what to do! So, now that I have collected the data, what is the next step?

Many thanks.

Re: problem with logistic regression - linearity to the logit

Dear Ryan,

I don't really understand your method. Do you mean comparing the output of logistic regression with another method? What about the zero cells? Should I just leave them as they are? I cannot simply modify those zero cells, as they are not missing values; they are genuine 0 values.

Re: problem with logistic regression - linearity to the logit

It has to be done after fitting the model. As Ryan has suggested, fit the model assuming linearity. Then fit another model that does not assume linearity, e.g., a model with both the linear and quadratic terms. Does the -2LL value change significantly from one model to the next? If it does, then you have evidence against the linearity assumption.

You have many variables, but let's simplify it to a single continuous predictor variable.

Model 1:  logit(p) = b0 + b1*X1
Model 2:  logit(p) = b0 + b1*X1 + b2*X1^2

Model 1 constrains the relationship between X1 and logit(p) to be linear. Model 2 allows it to be curvilinear (with one change in direction). If Model 2 fits significantly better than Model 1, then you do not want to force a linear fit. The test for improvement in fit is a chi-square test on the change in -2LL from Model 1 to Model 2, with df = the difference in the number of model parameters (here, df = 1).

If this all sounds like Greek to you, you need to do some more background reading. David Garson's StatNotes might be a good place to start.

HTH.

--
Bruce Weaver

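The chi-square test on the change in -2LL can also be worked by hand from the two -2LL values in the output. A minimal sketch, assuming made-up -2LL values: 640.0 and 632.5 are placeholders for illustration, not results from this thread, and the variable names are hypothetical.

```spss
* Run this in a separate session, because DATA LIST replaces the active dataset.
* The -2LL values below are placeholders; substitute the values from your own output.
DATA LIST FREE / neg2ll_m1 neg2ll_m2 df_diff.
BEGIN DATA
640.0 632.5 1
END DATA.

* Change in -2LL from Model 1 (linear) to Model 2 (linear plus quadratic).
COMPUTE chi2 = neg2ll_m1 - neg2ll_m2.
* Upper-tail probability of the chi-square distribution with df_diff degrees of freedom.
COMPUTE p_value = 1 - CDF.CHISQ(chi2, df_diff).
FORMATS chi2 (F8.3) p_value (F8.4).
LIST.
```

A small p_value would count as evidence against the linearity assumption for that predictor.
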
Re: problem with logistic regression - linearity to the logit

Another method to test the assumption of linearity in the logit is to use the Box-Tidwell transformation. This involves adding a term of the form X*ln(X) to the equation. If the coefficient for this term is statistically significant, there is evidence of nonlinearity in the relationship between logit(Y) and X.

~~~~~~~~~~~
Scott R Millis, PhD, ABPP, CStat, PStat(ASA)
Professor
Wayne State University School of Medicine

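A minimal sketch of the Box-Tidwell check for a single predictor, using BRDSIZE and SRMC from the original post as the example. The name BRDSIZE_lnx is a hypothetical helper, and the sketch assumes BRDSIZE is strictly positive for the cases you want to keep:

```spss
* Box-Tidwell term for one continuous predictor.
* LN is defined only for values above zero, so other cases become system-missing.
COMPUTE BRDSIZE_lnx = BRDSIZE * LN(BRDSIZE).
EXECUTE.

* Enter the predictor together with its X*ln(X) term.
LOGISTIC REGRESSION VARIABLES SRMC
  /METHOD=ENTER BRDSIZE BRDSIZE_lnx.
```

The Sig. value for BRDSIZE_lnx in the "Variables in the Equation" table is the test of nonlinearity. For a predictor that contains zeros, one commonly used workaround (not suggested in this thread, so treat it as an assumption) is to add a small positive constant before taking the log, for example (X + 1)*LN(X + 1).
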
Re: problem with logistic regression - linearity to the logit

Are you familiar with SPSS syntax?

Ryan

Re: problem with logistic regression - linearity to the logit

In this post I provide SPSS code that simulates data that approximate the binary logistic regression equation,

logit(p) = -1.5 + 0.9*X + 0.2*X^2

At the end of the code, note that I fit two logistic regression models via the LOGISTIC REGRESSION procedure. The deviance difference test that Bruce referred to in the previous post is part of the LOGISTIC REGRESSION output. Its results are in the first row of the "Omnibus Tests of Model Coefficients" table (for the second block). Clearly the assumption of linearity does not hold, but we already knew that from the simulation code, didn't we? :)

HTH,

Ryan

--

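* Generate data.
SET SEED 98765432.
NEW FILE.

INPUT PROGRAM.

LOOP ID = 1 TO 1000.

* One standard normal predictor.
COMPUTE x = RV.NORMAL(0,1).
* True coefficients of the simulated model.
COMPUTE b0 = -1.5.
COMPUTE b1 = 0.9.
COMPUTE b2 = 0.2.
* Logit with a quadratic term, then converted to a probability.
COMPUTE eta  = b0 + b1*x + b2*x**2.
COMPUTE prob = EXP(eta) / (1 + EXP(eta)).

* Binary outcome drawn with probability prob.
COMPUTE y = RV.BERNOULLI(prob).

END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
EXECUTE.

* Drop the helper variables used only for the simulation.
DELETE VARIABLES b0 b1 b2 eta prob.

COMPUTE x_squared = x**2.
EXECUTE.

* Block 1 assumes linearity and Block 2 adds the quadratic term.
LOGISTIC REGRESSION VARIABLES y
/METHOD=ENTER x
/METHOD=ENTER x x_squared.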

Re: problem with logistic regression - linearity to the logit

I don't use syntax, I use the SPSS dialog box.

Re: problem with logistic regression - linearity to the logit

My favorite method for testing the assumption of linearity in the logit is the one proposed by Hosmer and Lemeshow on page 96 of their 1989 book. You stratify the variable into quartiles and treat it as categorical when entering it in the logistic model. If the ORs for the quartiles show a steady increasing or decreasing trend, then the assumption is met.

Yes, the other methods are simpler, but I take comfort in actually seeing the linearity.

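A minimal sketch of the quartile approach in SPSS syntax, using BRDSIZE and SRMC from the original post as the example. The name BRDSIZE_q is a hypothetical helper, and taking the lowest quartile as the reference category is an assumption made here for readability:

```spss
* Split a continuous predictor into four groups of roughly equal size.
RANK VARIABLES=BRDSIZE (A)
  /NTILES(4) INTO BRDSIZE_q
  /PRINT=NO
  /TIES=MEAN.

* Enter the quartile variable as categorical, with the lowest quartile as the reference.
LOGISTIC REGRESSION VARIABLES SRMC
  /METHOD=ENTER BRDSIZE_q
  /CONTRAST (BRDSIZE_q)=INDICATOR(1)
  /PRINT=CI(95).
```

The Exp(B) values for quartiles 2 to 4 are the ORs relative to quartile 1. Reading them in order shows whether they move steadily in one direction or bend.
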
Re: problem with logistic regression - linearity to the logit

Question from the original poster:

"I thought examining the linearity of the logit assumption is one of the requirements for logistic regression? Should it be done before or after fitting the model? I am too confused about what to do!"

Notice that all of the methods that have been proposed require you to fit the model first, and then observe how well it fits (sometimes in comparison to an alternative model).

HTH.

--
Bruce Weaver

Re: problem with logistic regression - linearity to the logit

Thanks, I will look into it.

Re: problem with logistic regression - linearity to the logit

In reply to this post by SR Millis-3
Yeah, I'm using this method too. But I found something unusual in the output. One continuous variable gives me this result: B=9.62, SE=7.0, Wald=1.89, Sig=0.16, Exp(B)=15203.83. So it is not significant, thus no issue of multicollinearity, right? But the Exp(B) is so large that I think something is wrong here.

Another interaction (X*LnX) is significant at 0.05, thus multicollinearity exists. From what I read in the book, the only solution is to transform this continuous variable to categorical? Also, there is one interaction that has Exp(B) = 0.00. Why?

Thanks.

Re: problem with logistic regression - linearity to the logit

Attached is the output file of the test of nonlinearity.

Re: problem with logistic regression - linearity to the logit

How many cases do you have, and how many fall into each category of the outcome variable? One rule of thumb is that in order to avoid over-fitting the model, you should have 15-20 'events' per model parameter, where the number of events is the number of cases in the outcome category with the lower frequency. You have 12 explanatory variables, plus interaction terms, and 25 parameters in total (including the constant). So you would need at least 750 cases, assuming 50% Yes and 50% No on the outcome variable (25 parameters * 15 * 2).

See Mike Babyak's nice readable article for more information on over-fitting.

  http://www.class.uidaho.edu/psy586/Course%20Readings/Babyak_04.pdf

HTH.

--
Bruce Weaver

Re: problem with logistic regression - linearity to the logit

I have 797 valid cases. Have you seen my attached output? How should I interpret the 'something wrong' there?

Re: problem with logistic regression - linearity to the logit

 How many fall into each category of the outcome (dependent) variable?
Re: problem with logistic regression - linearity to the logit

Around 85% fall under No (0); the rest are Yes (1).

Re: problem with logistic regression - linearity to the logit

The rule of thumb I mentioned earlier says that in order to avoid over-fitting, you should have 15-20 events per model parameter. You have approximately 120 events (i.e., 15% of 797). Therefore, your model should have about 8 parameters at most (120 / 15). The model you posted has 25 parameters. You either need more data, or fewer parameters in your model.

HTH.

--
Bruce Weaver
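
The arithmetic behind the rule of thumb can be laid out in a few lines of syntax. A minimal sketch using the figures quoted in this thread (797 valid cases, about 15% events, 25 parameters); the variable names are hypothetical, and the 15 and 20 are the two ends of the rule-of-thumb range:

```spss
* Run this in a separate session, because DATA LIST replaces the active dataset.
DATA LIST FREE / n_cases prop_events n_params.
BEGIN DATA
797 0.15 25
END DATA.

* Events are the cases in the less frequent outcome category.
COMPUTE n_events = n_cases * prop_events.
* Largest model the data support at 15 and at 20 events per parameter.
COMPUTE max_params_15 = n_events / 15.
COMPUTE max_params_20 = n_events / 20.
* Events the posted model would need at 15 events per parameter.
COMPUTE events_needed = n_params * 15.
FORMATS n_events max_params_15 max_params_20 events_needed (F8.1).
LIST.
```

Comparing events_needed with n_events shows how far the posted model is from the rule of thumb.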
