Cutoff Values in Binary Logistic Regression

Cutoff Values in Binary Logistic Regression

Bob Walker
Hi fellow listers,

I am conducting an analysis for a trade association, and attempting to identify the events and activities that lead to renewal or rejection in the association (a binary measure), hence I am using binary logistic regression. I have each respondent's record of activity in various association events as independent variables in my attempt to correctly classify their renewal status and build a model. My question relates to setting an appropriate cutoff value.

The default cutoff value is .5, and regardless of this value, the overall model and the variables identified do not change, as expected. With the default cut value, I correctly classify renewals (94%), but not rejecters (55%). The actual proportion of those not renewing in the data set is relatively low, at 20%, hence I am more interested in being able to identify activities that predict rejection than renewal.

Setting a higher cutoff value (say .7 or .8) produces better correct classification of rejection but with a slight penalty in the ability to correctly classify renewals, also expected, but this seems to be a reasonable trade-off, since momentum generally leads to a renewal. I could argue that setting a cutoff value approaching 1.0 is the most useful, since we want to identify activities that lead to rejection, but this seems to penalize the correct classification of renewals a bit too heavily.
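
To make the trade-off concrete, here is a minimal Python sketch (not SPSS syntax) of how the two per-group classification rates move as the cutoff rises. It assumes renewal is coded 1 and that the saved predicted probability is the probability of renewal, which is why a higher cutoff classifies more cases as rejections. The function name and the ten data points are made up for illustration only.

```python
def classification_rates(renewed, p_renew, cutoff):
    """Return (share of renewals classified correctly,
               share of rejections classified correctly).

    A case is predicted to renew when its predicted probability of
    renewal is at or above the cutoff; otherwise it is predicted to reject.
    """
    renew_hits = renew_total = reject_hits = reject_total = 0
    for actual, p in zip(renewed, p_renew):
        predicted_renew = p >= cutoff
        if actual == 1:
            renew_total += 1
            renew_hits += predicted_renew
        else:
            reject_total += 1
            reject_hits += not predicted_renew
    return renew_hits / renew_total, reject_hits / reject_total


# Made-up illustration data: 1 = renewed, 0 = did not renew.
renewed = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
p_renew = [0.95, 0.90, 0.85, 0.80, 0.75, 0.65, 0.60, 0.55, 0.65, 0.30]

for cutoff in (0.5, 0.7, 0.8):
    renew_rate, reject_rate = classification_rates(renewed, p_renew, cutoff)
    print(f"cutoff={cutoff:.1f}  renewals correct={renew_rate:.0%}  "
          f"rejections correct={reject_rate:.0%}")
```

The fitted model is the same at every cutoff; only the rule that turns a probability into a predicted group changes.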

Any thoughts or points of view on approaches to setting cutoff values with a skewed distribution like this are appreciated.

Many thanks,

Bob Walker
Surveys & Forecasts, LLC
https://www.safllc.com 

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Cutoff Values in Binary Logistic Regression

Andy W
Save the predicted probability from a regression equation and then plot it using an ROC curve. This plot then gives the % true positives on the Y axis, and the % false positives on the X axis.

It typically gives a curve -- see https://andrewpwheeler.wordpress.com/2015/03/09/roc-and-precision-recall-curves-in-spss/ -- so there are no clear cut-offs. It depends on the application what is a reasonable tradeoff in costs for false-positives vs false-negatives where the cut-off should be located. See https://andrewpwheeler.wordpress.com/2015/05/27/how-wide-to-make-the-net-in-actuarial-tools-false-positives-versus-false-negatives/ for some example discussion.
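
As a rough illustration of the point about costs, the Python sketch below (not SPSS syntax) computes the ROC points from saved predicted probabilities and then picks the cutoff with the lowest total misclassification cost. It treats rejection as the "positive" outcome and assumes `p_reject` is the predicted probability of rejection. The function names, the data, and the cost figures are all hypothetical.

```python
def confusion_counts(is_reject, p_reject, cutoff):
    """Return (tp, fp, fn, tn), predicting 'reject' when p_reject >= cutoff."""
    tp = fp = fn = tn = 0
    for actual, p in zip(is_reject, p_reject):
        predicted = p >= cutoff
        if actual == 1:
            tp += predicted
            fn += not predicted
        else:
            fp += predicted
            tn += not predicted
    return tp, fp, fn, tn


def roc_points(is_reject, p_reject):
    """Return (cutoff, false positive rate, true positive rate) per distinct score."""
    points = []
    for cutoff in sorted(set(p_reject), reverse=True):
        tp, fp, fn, tn = confusion_counts(is_reject, p_reject, cutoff)
        points.append((cutoff, fp / (fp + tn), tp / (tp + fn)))
    return points


def cheapest_cutoff(is_reject, p_reject, cost_fp, cost_fn):
    """Return (cutoff, total cost) minimizing cost_fp * FP + cost_fn * FN."""
    best = None
    for cutoff in sorted(set(p_reject), reverse=True):
        tp, fp, fn, tn = confusion_counts(is_reject, p_reject, cutoff)
        cost = cost_fp * fp + cost_fn * fn
        if best is None or cost < best[1]:
            best = (cutoff, cost)
    return best


# Made-up illustration data: 1 = did not renew, 0 = renewed.
is_reject = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
p_reject = [0.9, 0.7, 0.6, 0.4, 0.35, 0.3, 0.2, 0.15, 0.1, 0.05]

for cutoff, fpr, tpr in roc_points(is_reject, p_reject):
    print(f"cutoff={cutoff:.2f}  FPR={fpr:.2f}  TPR={tpr:.2f}")

# Missing a rejecter costs 5x a false alarm, then the reverse.
print(cheapest_cutoff(is_reject, p_reject, cost_fp=1, cost_fn=5))
print(cheapest_cutoff(is_reject, p_reject, cost_fp=5, cost_fn=1))
```

With the same probabilities, swapping the two costs moves the chosen cutoff, which is why the curve alone does not single out a cut-off.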
Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/

Re: Cutoff Values in Binary Logistic Regression

Bob Walker
Andy - much appreciated... yes, I have run several ROC curves at different cutoff values. They are helpful to a point; knowing the specific use application is probably my best guide.

Many thanks,

Bob Walker
Surveys & Forecasts, LLC
https://www.safllc.com 


Re: Cutoff Values in Binary Logistic Regression

Jon Peck
You might also want to try other classification methods such as SVM, which is available as the STATS SVM extension command. It allows you to specify a misclassification cost factor and finds the best model taking that into account.

--
Jon K Peck
[hidden email]


Re: Cutoff Values in Binary Logistic Regression

Rich Ulrich
In reply to this post by Bob Walker

What is your purpose? Getting renewals? You might want to send a cheap email reminder to almost everyone, and save the expensive approaches for a smaller, targeted audience.

Unless you have a single well-defined purpose, the ROC curve is the "reduced form" of the data that preserves the information you have on hand.


--

Rich Ulrich




Re: Cutoff Values in Binary Logistic Regression

Bob Walker

Hi Rich,

Thanks for your input. The purpose is to (hopefully) identify the variables most predictive of both renewals and resignations of association companies. The database itself is relatively small, about 200 companies. The variables contain the number of people at each company who attended various events over the past 24 months (repeated measures per year; for example, attendance at the annual conference, webinars, etc.). After some experimentation, binary logistic regression with slightly higher cutoff values (.6 or .7) does a good job of classifying these groups, at more than 85% correct. We’re also looking to develop models by member type; the regression results there are even stronger because the reasons for renewal are specific to member type.

You, Jon, and Andy all suggested adding the ROC analysis; I will use the AUC values to further help identify variables that the association might focus on first.
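
One way the AUC comparison might look, sketched in Python rather than SPSS: the AUC is the share of rejecter/renewer pairs in which the rejecter gets the higher score, with ties counting half. The sketch assumes lower attendance points toward rejection, so the score is the negated attendance count. The variable names and counts are invented, and a single-variable AUC ignores the other predictors, so it is only a rough screening aid.

```python
def auc(is_reject, score):
    """Share of (rejecter, renewer) pairs where the rejecter scores higher.

    Ties count as half a win. This equals the area under the ROC curve.
    """
    rejecters = [s for y, s in zip(is_reject, score) if y == 1]
    renewers = [s for y, s in zip(is_reject, score) if y == 0]
    wins = 0.0
    for r in rejecters:
        for n in renewers:
            if r > n:
                wins += 1.0
            elif r == n:
                wins += 0.5
    return wins / (len(rejecters) * len(renewers))


# Made-up illustration data: 1 = did not renew, 0 = renewed.
is_reject = [1, 1, 0, 0, 0, 0]
attendance = {
    "conference": [0, 1, 2, 3, 1, 4],  # hypothetical head counts per company
    "webinars": [2, 0, 1, 3, 0, 2],
}

for name, counts in attendance.items():
    # Negate the counts so that low attendance gives a high "rejection" score.
    print(name, auc(is_reject, [-c for c in counts]))
```

With only about 200 companies, small differences in AUC between variables may not be meaningful.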

 

Many thanks,

Bob Walker
Surveys & Forecasts, LLC
https://www.safllc.com
