# Issue with Computation Differences Between IBM Statistics 26 and MS Products

13 messages
Open this post in threaded view
|

## Issue with Computation Differences Between IBM Statistics 26 and MS Products

 SPSSX-L Digest - 10 May 2020 to 11 May 2020 (#2020-97)   I am calculating percentages in IBM Statistics 26 and finding a small number of cases where I am expecting to yield a whole number percent but am not getting this result.   For instance, 14/25*100 yields 56.00000000000001 instead of 56. I would get a value of 56 with no resulting decimal when performing the same computation in Excel or SQL. In other cases, such as 2/4 I would get 50 with no resulting decimals which matches MS products. This difference seems to only occur in a handful of cases by I’m not sure why.   I know I have come across this issue before, but can’t remember why this occurs. I believe IBM also has some documentation on this issue but can’t seem to locate it from a previous conversation with them. I also can’t seem to put in a support request with IBM at the moment.   Does anyone know the reason for this difference and if there is a workaround to yield the same result as MS products? ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Open this post in threaded view
|

## Re: Issue with Computation Differences Between IBM Statistics 26 and MS Products

 Calculations in SPSS Statistics are done in double precision floating point hardware.  That allows for about 15-16 significant figures.  The value of 14/25 is, in greater precision0.560000000000000053290705182007513940334320068359375which has to be rounded or truncated to fit in a double precision floating point number, so if rounded, it will end in 1 while if truncated it will end in 0.Statistics uses the IEEE standard for floating point arithmetic.  Excel might use truncation or it might just be doing the calculation in a different orderNote that a difference this tiny is utterly negligible except in the case where you are comparing two floating point numbers with fractional parts for exact equality.  That is generally not a good thing to do as even a difference like 14 * 100/25 vs 14/25 * 100 can produce an infinitesimal difference.On Tue, May 12, 2020 at 11:25 AM Veena Nambiar <[hidden email]> wrote:   I am calculating percentages in IBM Statistics 26 and finding a small number of cases where I am expecting to yield a whole number percent but am not getting this result.   For instance, 14/25*100 yields 56.00000000000001 instead of 56. I would get a value of 56 with no resulting decimal when performing the same computation in Excel or SQL. In other cases, such as 2/4 I would get 50 with no resulting decimals which matches MS products. This difference seems to only occur in a handful of cases by I’m not sure why.   I know I have come across this issue before, but can’t remember why this occurs. I believe IBM also has some documentation on this issue but can’t seem to locate it from a previous conversation with them. I also can’t seem to put in a support request with IBM at the moment.   Does anyone know the reason for this difference and if there is a workaround to yield the same result as MS products? ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD -- Jon K Peck[hidden email] ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Open this post in threaded view
|

## Re: Issue with Computation Differences Between IBM Statistics 26 and MS Products

 Reminder of what Jon is talking about: values of 1/2, 1/4, ..., (powers of 2)   are represented precisely in binary notation of the usual computer arithmetic; whereas 1/10, 1/100, ...,    are not. You do run into less chance of a problem if you divide by 10 or 25, etc., /last/ in your computation.  For integer compares of floating point numbers to be secure (in some sense), you can test for a narrow range or round or truncate. -- Rich Ulrich From: SPSSX(r) Discussion <[hidden email]> on behalf of Jon Peck <[hidden email]> Sent: Tuesday, May 12, 2020 1:55 PM To: [hidden email] <[hidden email]> Subject: Re: Issue with Computation Differences Between IBM Statistics 26 and MS Products   Calculations in SPSS Statistics are done in double precision floating point hardware.  That allows for about 15-16 significant figures.  The value of 14/25 is, in greater precision 0.560000000000000053290705182007513940334320068359375 which has to be rounded or truncated to fit in a double precision floating point number, so if rounded, it will end in 1 while if truncated it will end in 0. Statistics uses the IEEE standard for floating point arithmetic.  Excel might use truncation or it might just be doing the calculation in a different order Note that a difference this tiny is utterly negligible except in the case where you are comparing two floating point numbers with fractional parts for exact equality.  That is generally not a good thing to do as even a difference like 14 * 100/25 vs 14/25 * 100 can produce an infinitesimal difference. On Tue, May 12, 2020 at 11:25 AM Veena Nambiar <[hidden email]> wrote:   I am calculating percentages in IBM Statistics 26 and finding a small number of cases where I am expecting to yield a whole number percent but am not getting this result.   For instance, 14/25*100 yields 56.00000000000001 instead of 56. I would get a value of 56 with no resulting decimal when performing the same computation in Excel or SQL. In other cases, such as 2/4 I would get 50 with no resulting decimals which matches MS products. This difference seems to only occur in a handful of cases by I’m not sure why.   I know I have come across this issue before, but can’t remember why this occurs. I believe IBM also has some documentation on this issue but can’t seem to locate it from a previous conversation with them. I also can’t seem to put in a support request with IBM at the moment.   Does anyone know the reason for this difference and if there is a workaround to yield the same result as MS products? ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD -- Jon K Peck [hidden email] ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Open this post in threaded view
|

## Re: Issue with Computation Differences Between IBM Statistics 26 and MS Products

Open this post in threaded view
|

## Re: Issue with Computation Differences Between IBM Statistics 26 and MS Products

Open this post in threaded view
|

## Re: [EXTERNAL] Re: [SPSSX-L] Issue with Computation Differences Between IBM Statistics 26 and MS Products

Open this post in threaded view
|

## Re: [EXTERNAL] Re: [SPSSX-L] Issue with Computation Differences Between IBM Statistics 26 and MS Products

 First, note that the result Statistics produces is mathematically correct as far as is possible with double precision floating point arithmetic.  My earlier post showed the value to more precision (calculated with special code outside of Statistics that is thousands of times slower).  The Statistics result is more accurate than the Excel result, but the difference is way too tiny for anyone to argue that there is a difference that matters.  At least that's my opinion.Statistics has only one numeric data type: all numbers are double precision floating point values.  The number of decimals in the format is just a display issue: the displayed values would be the same in Statistics and Excel to more decimals than  you would ever want.I'm not sure what you want to do, but if you  want to discuss further, let's take this off line ([hidden email]).On Tue, May 12, 2020 at 6:41 PM Veena Nambiar <[hidden email]> wrote: Thanks, Jon. This is helpful. For my purposes, these small differences do end up mattering since we are dual processing data and both processors need to have an exact match on our results.   I’ve applied the following code to the percentages I’m calculating to employ what they call whole number rounding as required by our agency business rules:   Compute Rate2= TRUNC(Rate1) + (Rate1 > TRUNC(Rate1)).   In the case below, the percentage (Rate2) end ups being 57 versus 56 which is what the other person I’m processing with yields in SQL.   It’s only typically a very few cases that our results are mismatched, but those differences are important for our purposes.   I’m not quite sure how to resolve the issue because it’s difficult to say what each package is doing differently – but we need to ensure a consistent, explainable outcomes for these percentages every time. I was asked several questions - will using a different data type such as float or decimal in SPSS would make a difference? Would setting the variable to ratio make a difference? Would lowering the number of decimals change anything? Can FuzzBits be used in the Trunc command? I don’t think any of these suggestions would work and I’m ensure about the FuzzBits piece and its applicability in SPSS. One suggestion was to simply truncate the rate to enforce a match which would work in that case in particular but wouldn’t for other cases – for instance 195/201*100 would yield 97.01492537313433 and truncated would yield 97 instead of 98 which is what I want and is what I yield using the code above.   If anyone has any ideas for me please let me know! Thanks so much!       From: Jon Peck [mailto:[hidden email]] Sent: Tuesday, May 12, 2020 10:56 AM To: Veena Nambiar Cc: SPSS List Subject: [EXTERNAL] Re: [SPSSX-L] Issue with Computation Differences Between IBM Statistics 26 and MS Products   Calculations in SPSS Statistics are done in double precision floating point hardware.  That allows for about 15-16 significant figures.  The value of 14/25 is, in greater precision 0.560000000000000053290705182007513940334320068359375 which has to be rounded or truncated to fit in a double precision floating point number, so if rounded, it will end in 1 while if truncated it will end in 0.   Statistics uses the IEEE standard for floating point arithmetic.  Excel might use truncation or it might just be doing the calculation in a different order   Note that a difference this tiny is utterly negligible except in the case where you are comparing two floating point numbers with fractional parts for exact equality.  That is generally not a good thing to do as even a difference like 14 * 100/25 vs 14/25 * 100 can produce an infinitesimal difference.   On Tue, May 12, 2020 at 11:25 AM Veena Nambiar <[hidden email]> wrote:   I am calculating percentages in IBM Statistics 26 and finding a small number of cases where I am expecting to yield a whole number percent but am not getting this result.   For instance, 14/25*100 yields 56.00000000000001 instead of 56. I would get a value of 56 with no resulting decimal when performing the same computation in Excel or SQL. In other cases, such as 2/4 I would get 50 with no resulting decimals which matches MS products. This difference seems to only occur in a handful of cases by I’m not sure why.   I know I have come across this issue before, but can’t remember why this occurs. I believe IBM also has some documentation on this issue but can’t seem to locate it from a previous conversation with them. I also can’t seem to put in a support request with IBM at the moment.   Does anyone know the reason for this difference and if there is a workaround to yield the same result as MS products? ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD   -- Jon K Peck [hidden email] -- Jon K Peck[hidden email] ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Open this post in threaded view
|

## Re: [EXTERNAL] Re: [SPSSX-L] Issue with Computation Differences Between IBM Statistics 26 and MS Products

Open this post in threaded view
|

## Re: [EXTERNAL] Re: [SPSSX-L] Issue with Computation Differences Between IBM Statistics 26 and MS Products

Open this post in threaded view
|

## Re: [EXTERNAL] Re: [SPSSX-L] Issue with Computation Differences Between IBM Statistics 26 and MS Products

Open this post in threaded view
|