Issue with Computation Differences Between IBM Statistics 26 and MS Products

classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

Issue with Computation Differences Between IBM Statistics 26 and MS Products

Veena Nambiar
SPSSX-L Digest - 10 May 2020 to 11 May 2020 (#2020-97)

 

I am calculating percentages in IBM Statistics 26 and finding a small number of cases where I am expecting to yield a whole number percent but am not getting this result.

 

For instance, 14/25*100 yields 56.00000000000001 instead of 56. I would get a value of 56 with no resulting decimal when performing the same computation in Excel or SQL. In other cases, such as 2/4 I would get 50 with no resulting decimals which matches MS products. This difference seems to only occur in a handful of cases by I’m not sure why.

 

I know I have come across this issue before, but can’t remember why this occurs. I believe IBM also has some documentation on this issue but can’t seem to locate it from a previous conversation with them. I also can’t seem to put in a support request with IBM at the moment.

 

Does anyone know the reason for this difference and if there is a workaround to yield the same result as MS products?

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Issue with Computation Differences Between IBM Statistics 26 and MS Products

Jon Peck
Calculations in SPSS Statistics are done in double precision floating point hardware.  That allows for about 15-16 significant figures.  The value of 14/25 is, in greater precision
0.560000000000000053290705182007513940334320068359375
which has to be rounded or truncated to fit in a double precision floating point number, so if rounded, it will end in 1 while if truncated it will end in 0.

Statistics uses the IEEE standard for floating point arithmetic.  Excel might use truncation or it might just be doing the calculation in a different order

Note that a difference this tiny is utterly negligible except in the case where you are comparing two floating point numbers with fractional parts for exact equality.  That is generally not a good thing to do as even a difference like 14 * 100/25 vs 14/25 * 100 can produce an infinitesimal difference.

On Tue, May 12, 2020 at 11:25 AM Veena Nambiar <[hidden email]> wrote:

 

I am calculating percentages in IBM Statistics 26 and finding a small number of cases where I am expecting to yield a whole number percent but am not getting this result.

 

For instance, 14/25*100 yields 56.00000000000001 instead of 56. I would get a value of 56 with no resulting decimal when performing the same computation in Excel or SQL. In other cases, such as 2/4 I would get 50 with no resulting decimals which matches MS products. This difference seems to only occur in a handful of cases by I’m not sure why.

 

I know I have come across this issue before, but can’t remember why this occurs. I believe IBM also has some documentation on this issue but can’t seem to locate it from a previous conversation with them. I also can’t seem to put in a support request with IBM at the moment.

 

Does anyone know the reason for this difference and if there is a workaround to yield the same result as MS products?

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD


--
Jon K Peck
[hidden email]

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Issue with Computation Differences Between IBM Statistics 26 and MS Products

Rich Ulrich
Reminder of what Jon is talking about: values of 1/2, 1/4, ...,
(powers of 2)   are represented precisely in binary notation of
the usual computer arithmetic; whereas 1/10, 1/100, ...,    are not.

You do run into less chance of a problem if you divide by 10 or 25, etc.,
/last/ in your computation.  For integer compares of floating point
numbers to be secure (in some sense), you can test for a narrow range
or round or truncate.

--
Rich Ulrich


From: SPSSX(r) Discussion <[hidden email]> on behalf of Jon Peck <[hidden email]>
Sent: Tuesday, May 12, 2020 1:55 PM
To: [hidden email] <[hidden email]>
Subject: Re: Issue with Computation Differences Between IBM Statistics 26 and MS Products
 
Calculations in SPSS Statistics are done in double precision floating point hardware.  That allows for about 15-16 significant figures.  The value of 14/25 is, in greater precision
0.560000000000000053290705182007513940334320068359375
which has to be rounded or truncated to fit in a double precision floating point number, so if rounded, it will end in 1 while if truncated it will end in 0.

Statistics uses the IEEE standard for floating point arithmetic.  Excel might use truncation or it might just be doing the calculation in a different order

Note that a difference this tiny is utterly negligible except in the case where you are comparing two floating point numbers with fractional parts for exact equality.  That is generally not a good thing to do as even a difference like 14 * 100/25 vs 14/25 * 100 can produce an infinitesimal difference.

On Tue, May 12, 2020 at 11:25 AM Veena Nambiar <[hidden email]> wrote:

 

I am calculating percentages in IBM Statistics 26 and finding a small number of cases where I am expecting to yield a whole number percent but am not getting this result.

 

For instance, 14/25*100 yields 56.00000000000001 instead of 56. I would get a value of 56 with no resulting decimal when performing the same computation in Excel or SQL. In other cases, such as 2/4 I would get 50 with no resulting decimals which matches MS products. This difference seems to only occur in a handful of cases by I’m not sure why.

 

I know I have come across this issue before, but can’t remember why this occurs. I believe IBM also has some documentation on this issue but can’t seem to locate it from a previous conversation with them. I also can’t seem to put in a support request with IBM at the moment.

 

Does anyone know the reason for this difference and if there is a workaround to yield the same result as MS products?

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD


--
Jon K Peck
[hidden email]

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Issue with Computation Differences Between IBM Statistics 26 and MS Products

Bruce Weaver
Administrator
Rich's comments prompted me to try this:

NEW FILE.
DATASET CLOSE ALL.
DATA LIST FREE / junk(F1).
BEGIN DATA
1 1 1
END DATA.

COMPUTE x1 = 14/25*100.
COMPUTE x2 = 100*14/25.
COMPUTE flag1 = x1 EQ 56.
COMPUTE flag2 = x2 EQ 56.
FORMATS flag1 flag2 (F1).
LIST.

Here is the output:

junk       x1       x2 flag1 flag2
 
  1     56.00    56.00   0     1
  1     56.00    56.00   0     1
  1     56.00    56.00   0     1




Rich Ulrich wrote

> Reminder of what Jon is talking about: values of 1/2, 1/4, ...,
> (powers of 2)   are represented precisely in binary notation of
> the usual computer arithmetic; whereas 1/10, 1/100, ...,    are not.
>
> You do run into less chance of a problem if you divide by 10 or 25, etc.,
> /last/ in your computation.  For integer compares of floating point
> numbers to be secure (in some sense), you can test for a narrow range
> or round or truncate.
>
> --
> Rich Ulrich
>
> ________________________________
> From: SPSSX(r) Discussion &lt;

> SPSSX-L@.UGA

> &gt; on behalf of Jon Peck &lt;

> jkpeck@

> &gt;
> Sent: Tuesday, May 12, 2020 1:55 PM
> To:

> SPSSX-L@.UGA

>  &lt;

> SPSSX-L@.UGA

> &gt;
> Subject: Re: Issue with Computation Differences Between IBM Statistics 26
> and MS Products
>
> Calculations in SPSS Statistics are done in double precision floating
> point hardware.  That allows for about 15-16 significant figures.  The
> value of 14/25 is, in greater precision
> 0.560000000000000053290705182007513940334320068359375
> which has to be rounded or truncated to fit in a double precision floating
> point number, so if rounded, it will end in 1 while if truncated it will
> end in 0.
>
> Statistics uses the IEEE standard for floating point arithmetic.  Excel
> might use truncation or it might just be doing the calculation in a
> different order
>
> Note that a difference this tiny is utterly negligible except in the case
> where you are comparing two floating point numbers with fractional parts
> for exact equality.  That is generally not a good thing to do as even a
> difference like 14 * 100/25 vs 14/25 * 100 can produce an infinitesimal
> difference.
>
> On Tue, May 12, 2020 at 11:25 AM Veena Nambiar &lt;

> VNambiar@.ca

> &lt;mailto:

> VNambiar@.ca

> &gt;> wrote:
>
>
>
> I am calculating percentages in IBM Statistics 26 and finding a small
> number of cases where I am expecting to yield a whole number percent but
> am not getting this result.
>
>
>
> For instance, 14/25*100 yields 56.00000000000001 instead of 56. I would
> get a value of 56 with no resulting decimal when performing the same
> computation in Excel or SQL. In other cases, such as 2/4 I would get 50
> with no resulting decimals which matches MS products. This difference
> seems to only occur in a handful of cases by I’m not sure why.
>
>
>
> I know I have come across this issue before, but can’t remember why this
> occurs. I believe IBM also has some documentation on this issue but can’t
> seem to locate it from a previous conversation with them. I also can’t
> seem to put in a support request with IBM at the moment.
>
>
>
> Does anyone know the reason for this difference and if there is a
> workaround to yield the same result as MS products?
>
> ===================== To manage your subscription to SPSSX-L, send a
> message to

> LISTSERV@.UGA

> &lt;mailto:

> LISTSERV@.UGA

> &gt; (not to SPSSX-L), with no body text except the command. To leave the
> list, send the command SIGNOFF SPSSX-L For a list of commands to manage
> subscriptions, send the command INFO REFCARD
>
>
> --
> Jon K Peck

> jkpeck@

> &lt;mailto:

> jkpeck@

> &gt;
>
> ===================== To manage your subscription to SPSSX-L, send a
> message to

> LISTSERV@.UGA

> &lt;mailto:

> LISTSERV@.UGA

> &gt; (not to SPSSX-L), with no body text except the command. To leave the
> list, send the command SIGNOFF SPSSX-L For a list of commands to manage
> subscriptions, send the command INFO REFCARD
>
> =====================
> To manage your subscription to SPSSX-L, send a message to

> LISTSERV@.UGA

>  (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD





-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.
Reply | Threaded
Open this post in threaded view
|

Re: Issue with Computation Differences Between IBM Statistics 26 and MS Products

Jon Peck
If you  do it with the syntax below, you will see, as I said in my earlier post, that x1 and x2 are not quite equal, and x1 is the rounded value of the higher precision value.  But the difference between x1 and x2 is infinitesimal.


COMPUTE x1 = 14/25*100.
COMPUTE x2 = 100*14/25.
COMPUTE flag1 = x1 EQ 56.
COMPUTE flag2 = x2 EQ 56.
FORMATS flag1 flag2 (F1).
format x1 x2 (f30.20).
compute equal = x1 eq x2.
LIST.

On Tue, May 12, 2020 at 2:11 PM Bruce Weaver <[hidden email]> wrote:
Rich's comments prompted me to try this:

NEW FILE.
DATASET CLOSE ALL.
DATA LIST FREE / junk(F1).
BEGIN DATA
1 1 1
END DATA.

COMPUTE x1 = 14/25*100.
COMPUTE x2 = 100*14/25.
COMPUTE flag1 = x1 EQ 56.
COMPUTE flag2 = x2 EQ 56.
FORMATS flag1 flag2 (F1).
LIST.

Here is the output:

junk       x1       x2 flag1 flag2

  1     56.00    56.00   0     1
  1     56.00    56.00   0     1
  1     56.00    56.00   0     1




Rich Ulrich wrote
> Reminder of what Jon is talking about: values of 1/2, 1/4, ...,
> (powers of 2)   are represented precisely in binary notation of
> the usual computer arithmetic; whereas 1/10, 1/100, ...,    are not.
>
> You do run into less chance of a problem if you divide by 10 or 25, etc.,
> /last/ in your computation.  For integer compares of floating point
> numbers to be secure (in some sense), you can test for a narrow range
> or round or truncate.
>
> --
> Rich Ulrich
>
> ________________________________
> From: SPSSX(r) Discussion &lt;

> SPSSX-L@.UGA

> &gt; on behalf of Jon Peck &lt;

> jkpeck@

> &gt;
> Sent: Tuesday, May 12, 2020 1:55 PM
> To:

> SPSSX-L@.UGA

>  &lt;

> SPSSX-L@.UGA

> &gt;
> Subject: Re: Issue with Computation Differences Between IBM Statistics 26
> and MS Products
>
> Calculations in SPSS Statistics are done in double precision floating
> point hardware.  That allows for about 15-16 significant figures.  The
> value of 14/25 is, in greater precision
> 0.560000000000000053290705182007513940334320068359375
> which has to be rounded or truncated to fit in a double precision floating
> point number, so if rounded, it will end in 1 while if truncated it will
> end in 0.
>
> Statistics uses the IEEE standard for floating point arithmetic.  Excel
> might use truncation or it might just be doing the calculation in a
> different order
>
> Note that a difference this tiny is utterly negligible except in the case
> where you are comparing two floating point numbers with fractional parts
> for exact equality.  That is generally not a good thing to do as even a
> difference like 14 * 100/25 vs 14/25 * 100 can produce an infinitesimal
> difference.
>
> On Tue, May 12, 2020 at 11:25 AM Veena Nambiar &lt;

> VNambiar@.ca

> &lt;mailto:

> VNambiar@.ca

> &gt;> wrote:
>
>
>
> I am calculating percentages in IBM Statistics 26 and finding a small
> number of cases where I am expecting to yield a whole number percent but
> am not getting this result.
>
>
>
> For instance, 14/25*100 yields 56.00000000000001 instead of 56. I would
> get a value of 56 with no resulting decimal when performing the same
> computation in Excel or SQL. In other cases, such as 2/4 I would get 50
> with no resulting decimals which matches MS products. This difference
> seems to only occur in a handful of cases by I’m not sure why.
>
>
>
> I know I have come across this issue before, but can’t remember why this
> occurs. I believe IBM also has some documentation on this issue but can’t
> seem to locate it from a previous conversation with them. I also can’t
> seem to put in a support request with IBM at the moment.
>
>
>
> Does anyone know the reason for this difference and if there is a
> workaround to yield the same result as MS products?
>
> ===================== To manage your subscription to SPSSX-L, send a
> message to

> LISTSERV@.UGA

> &lt;mailto:

> LISTSERV@.UGA

> &gt; (not to SPSSX-L), with no body text except the command. To leave the
> list, send the command SIGNOFF SPSSX-L For a list of commands to manage
> subscriptions, send the command INFO REFCARD
>
>
> --
> Jon K Peck

> jkpeck@

> &lt;mailto:

> jkpeck@

> &gt;
>
> ===================== To manage your subscription to SPSSX-L, send a
> message to

> LISTSERV@.UGA

> &lt;mailto:

> LISTSERV@.UGA

> &gt; (not to SPSSX-L), with no body text except the command. To leave the
> list, send the command SIGNOFF SPSSX-L For a list of commands to manage
> subscriptions, send the command INFO REFCARD
>
> =====================
> To manage your subscription to SPSSX-L, send a message to

> LISTSERV@.UGA

>  (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD





-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD


--
Jon K Peck
[hidden email]

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: [EXTERNAL] Re: [SPSSX-L] Issue with Computation Differences Between IBM Statistics 26 and MS Products

Veena Nambiar
In reply to this post by Jon Peck

Thanks, Jon. This is helpful. For my purposes, these small differences do end up mattering since we are dual processing data and both processors need to have an exact match on our results.

 

I’ve applied the following code to the percentages I’m calculating to employ what they call whole number rounding as required by our agency business rules:

 

Compute Rate2= TRUNC(Rate1) + (Rate1 > TRUNC(Rate1)).

 

In the case below, the percentage (Rate2) end ups being 57 versus 56 which is what the other person I’m processing with yields in SQL.

 

It’s only typically a very few cases that our results are mismatched, but those differences are important for our purposes.

 

I’m not quite sure how to resolve the issue because it’s difficult to say what each package is doing differently – but we need to ensure a consistent, explainable outcomes for these percentages every time. I was asked several questions - will using a different data type such as float or decimal in SPSS would make a difference? Would setting the variable to ratio make a difference? Would lowering the number of decimals change anything? Can FuzzBits be used in the Trunc command? I don’t think any of these suggestions would work and I’m ensure about the FuzzBits piece and its applicability in SPSS. One suggestion was to simply truncate the rate to enforce a match which would work in that case in particular but wouldn’t for other cases – for instance 195/201*100 would yield 97.01492537313433 and truncated would yield 97 instead of 98 which is what I want and is what I yield using the code above.

 

If anyone has any ideas for me please let me know! Thanks so much!

 

 

 

From: Jon Peck [mailto:[hidden email]]
Sent: Tuesday, May 12, 2020 10:56 AM
To: Veena Nambiar
Cc: SPSS List
Subject: [EXTERNAL] Re: [SPSSX-L] Issue with Computation Differences Between IBM Statistics 26 and MS Products

 

Calculations in SPSS Statistics are done in double precision floating point hardware.  That allows for about 15-16 significant figures.  The value of 14/25 is, in greater precision

0.560000000000000053290705182007513940334320068359375

which has to be rounded or truncated to fit in a double precision floating point number, so if rounded, it will end in 1 while if truncated it will end in 0.

 

Statistics uses the IEEE standard for floating point arithmetic.  Excel might use truncation or it might just be doing the calculation in a different order

 

Note that a difference this tiny is utterly negligible except in the case where you are comparing two floating point numbers with fractional parts for exact equality.  That is generally not a good thing to do as even a difference like 14 * 100/25 vs 14/25 * 100 can produce an infinitesimal difference.

 

On Tue, May 12, 2020 at 11:25 AM Veena Nambiar <[hidden email]> wrote:

 

I am calculating percentages in IBM Statistics 26 and finding a small number of cases where I am expecting to yield a whole number percent but am not getting this result.

 

For instance, 14/25*100 yields 56.00000000000001 instead of 56. I would get a value of 56 with no resulting decimal when performing the same computation in Excel or SQL. In other cases, such as 2/4 I would get 50 with no resulting decimals which matches MS products. This difference seems to only occur in a handful of cases by I’m not sure why.

 

I know I have come across this issue before, but can’t remember why this occurs. I believe IBM also has some documentation on this issue but can’t seem to locate it from a previous conversation with them. I also can’t seem to put in a support request with IBM at the moment.

 

Does anyone know the reason for this difference and if there is a workaround to yield the same result as MS products?

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD


 

--

Jon K Peck
[hidden email]

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: [EXTERNAL] Re: [SPSSX-L] Issue with Computation Differences Between IBM Statistics 26 and MS Products

Jon Peck
First, note that the result Statistics produces is mathematically correct as far as is possible with double precision floating point arithmetic.  My earlier post showed the value to more precision (calculated with special code outside of Statistics that is thousands of times slower).  The Statistics result is more accurate than the Excel result, but the difference is way too tiny for anyone to argue that there is a difference that matters.  At least that's my opinion.

Statistics has only one numeric data type: all numbers are double precision floating point values.  The number of decimals in the format is just a display issue: the displayed values would be the same in Statistics and Excel to more decimals than  you would ever want.

I'm not sure what you want to do, but if you  want to discuss further, let's take this off line ([hidden email]).





On Tue, May 12, 2020 at 6:41 PM Veena Nambiar <[hidden email]> wrote:

Thanks, Jon. This is helpful. For my purposes, these small differences do end up mattering since we are dual processing data and both processors need to have an exact match on our results.

 

I’ve applied the following code to the percentages I’m calculating to employ what they call whole number rounding as required by our agency business rules:

 

Compute Rate2= TRUNC(Rate1) + (Rate1 > TRUNC(Rate1)).

 

In the case below, the percentage (Rate2) end ups being 57 versus 56 which is what the other person I’m processing with yields in SQL.

 

It’s only typically a very few cases that our results are mismatched, but those differences are important for our purposes.

 

I’m not quite sure how to resolve the issue because it’s difficult to say what each package is doing differently – but we need to ensure a consistent, explainable outcomes for these percentages every time. I was asked several questions - will using a different data type such as float or decimal in SPSS would make a difference? Would setting the variable to ratio make a difference? Would lowering the number of decimals change anything? Can FuzzBits be used in the Trunc command? I don’t think any of these suggestions would work and I’m ensure about the FuzzBits piece and its applicability in SPSS. One suggestion was to simply truncate the rate to enforce a match which would work in that case in particular but wouldn’t for other cases – for instance 195/201*100 would yield 97.01492537313433 and truncated would yield 97 instead of 98 which is what I want and is what I yield using the code above.

 

If anyone has any ideas for me please let me know! Thanks so much!

 

 

 

From: Jon Peck [mailto:[hidden email]]
Sent: Tuesday, May 12, 2020 10:56 AM
To: Veena Nambiar
Cc: SPSS List
Subject: [EXTERNAL] Re: [SPSSX-L] Issue with Computation Differences Between IBM Statistics 26 and MS Products

 

Calculations in SPSS Statistics are done in double precision floating point hardware.  That allows for about 15-16 significant figures.  The value of 14/25 is, in greater precision

0.560000000000000053290705182007513940334320068359375

which has to be rounded or truncated to fit in a double precision floating point number, so if rounded, it will end in 1 while if truncated it will end in 0.

 

Statistics uses the IEEE standard for floating point arithmetic.  Excel might use truncation or it might just be doing the calculation in a different order

 

Note that a difference this tiny is utterly negligible except in the case where you are comparing two floating point numbers with fractional parts for exact equality.  That is generally not a good thing to do as even a difference like 14 * 100/25 vs 14/25 * 100 can produce an infinitesimal difference.

 

On Tue, May 12, 2020 at 11:25 AM Veena Nambiar <[hidden email]> wrote:

 

I am calculating percentages in IBM Statistics 26 and finding a small number of cases where I am expecting to yield a whole number percent but am not getting this result.

 

For instance, 14/25*100 yields 56.00000000000001 instead of 56. I would get a value of 56 with no resulting decimal when performing the same computation in Excel or SQL. In other cases, such as 2/4 I would get 50 with no resulting decimals which matches MS products. This difference seems to only occur in a handful of cases by I’m not sure why.

 

I know I have come across this issue before, but can’t remember why this occurs. I believe IBM also has some documentation on this issue but can’t seem to locate it from a previous conversation with them. I also can’t seem to put in a support request with IBM at the moment.

 

Does anyone know the reason for this difference and if there is a workaround to yield the same result as MS products?

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD


 

--

Jon K Peck
[hidden email]



--
Jon K Peck
[hidden email]

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: [EXTERNAL] Re: [SPSSX-L] Issue with Computation Differences Between IBM Statistics 26 and MS Products

Bruce Weaver
Administrator
The comparison of SPSS and Excel made me curious about what would happen
using Stata (another package I have).  First, here is my revised SPSS code
and output, with Jon's addition of the variable "equal".  (I changed F30.20
to F24.30, but I don't think that will change anything important.)  

COMPUTE x1 = 14/25*100.
COMPUTE x2 = 100*14/25.
COMPUTE flag1 = x1 EQ 56.
COMPUTE flag2 = x2 EQ 56.
COMPUTE equal = x1 EQ x2.
FORMATS flag1 flag2 equal (F1) / x1 x2 (F24.20).
LIST x1 to equal.

                      x1                       x2 flag1 flag2 equal
 
     56.0000000000000100      56.0000000000000000   0     1     0
     56.0000000000000100      56.0000000000000000   0     1     0
     56.0000000000000100      56.0000000000000000   0     1     0


Now using Stata.  One difference is that Stata has two IEEE 754-2008
floating-point types: float and double.  "float" variables (the default for
new numeric variables) are stored in 4 bytes, whereas "double" variables are
stored in 8 bytes.  (Source:
https://blog.stata.com/2012/04/02/the-penultimate-guide-to-precision/)

* Example 1 using 'float' variables (the default)
clear
set obs 3
generate x1 = 14/25*100
generate x2 = 100*14/25
generate byte flag1 = x1 == 56
generate byte flag2 = x2 == 56
generate byte equal = x1 == x2
format x1 x2 %24.20f
list, clean noobs

OUTPUT for Example 1:
. list, clean noobs

                         x1                        x2   flag1   flag2  
equal  
    56.00000000000000000000   56.00000000000000000000       1       1      
1  
    56.00000000000000000000   56.00000000000000000000       1       1      
1  
    56.00000000000000000000   56.00000000000000000000       1       1      
1  


* Example 2 using 'double' variables
clear
set obs 3
generate double x1 = 14/25*100
generate double x2 = 100*14/25
generate byte flag1 = x1 == 56
generate byte flag2 = x2 == 56
generate byte equal = x1 == x2
format x1 x2 %24.20f
list, clean noobs

Output for Example 2:


. list, clean noobs

                         x1                        x2   flag1   flag2  
equal  
    56.00000000000000710543   56.00000000000000710543       0       0      
1  
    56.00000000000000710543   56.00000000000000710543       0       0      
1  
    56.00000000000000710543   56.00000000000000710543       0       0      
1


Unlike SPSS, Stata is not affected by the order of operations:  14/25*100
returns the same value as 100*14/25, regardless of which numeric type I use
(float or double), so long as I use the same type for both x1 and x2.

The second thing that caught my eye is the differences between SPSS and
Stata for the values that do have some non-zero decimals showing.  

Stata:  56.00000000000000710543
SPSS:  56.0000000000000100

I think this note explains the difference:

 
https://www.ibm.com/support/pages/spss-does-not-display-more-16-digits-dataview

For anyone who wants more info about precision in Stata, this looks good:

  https://www.stata-journal.com/sjpdf.html?articlenum=pr0025

HTH.



Jon Peck wrote

> First, note that the result Statistics produces is mathematically correct
> as far as is possible with double precision floating point arithmetic.  My
> earlier post showed the value to more precision (calculated with special
> code outside of Statistics that is thousands of times slower).  The
> Statistics result is more accurate than the Excel result, but the
> difference is way too tiny for anyone to argue that there is a difference
> that matters.  At least that's my opinion.
>
> Statistics has only one numeric data type: all numbers are double
> precision
> floating point values.  The number of decimals in the format is just a
> display issue: the displayed values would be the same in Statistics and
> Excel to more decimals than  you would ever want.
>
> I'm not sure what you want to do, but if you  want to discuss further,
> let's take this off line (

> jkpeck@

> ).
>
>
>
>
>
> On Tue, May 12, 2020 at 6:41 PM Veena Nambiar &lt;

> VNambiar@.ca

> &gt; wrote:
>
>> Thanks, Jon. This is helpful. For my purposes, these small differences do
>> end up mattering since we are dual processing data and both processors
>> need
>> to have an exact match on our results.
>>
>>
>>
>> I’ve applied the following code to the percentages I’m calculating to
>> employ what they call whole number rounding as required by our agency
>> business rules:
>>
>>
>>
>> Compute Rate2= TRUNC(Rate1) + (Rate1 > TRUNC(Rate1)).
>>
>>
>>
>> In the case below, the percentage (Rate2) end ups being 57 versus 56
>> which
>> is what the other person I’m processing with yields in SQL.
>>
>>
>>
>> It’s only typically a very few cases that our results are mismatched, but
>> those differences are important for our purposes.
>>
>>
>>
>> I’m not quite sure how to resolve the issue because it’s difficult to say
>> what each package is doing differently – but we need to ensure a
>> consistent, explainable outcomes for these percentages every time. I was
>> asked several questions - will using a different data type such as float
>> or
>> decimal in SPSS would make a difference? Would setting the variable to
>> ratio make a difference? Would lowering the number of decimals change
>> anything? Can FuzzBits be used in the Trunc command? I don’t think any of
>> these suggestions would work and I’m ensure about the FuzzBits piece and
>> its applicability in SPSS. One suggestion was to simply truncate the rate
>> to enforce a match which would work in that case in particular but
>> wouldn’t
>> for other cases – for instance 195/201*100 would yield 97.01492537313433
>> and truncated would yield 97 instead of 98 which is what I want and is
>> what
>> I yield using the code above.
>>
>>
>>
>> If anyone has any ideas for me please let me know! Thanks so much!
>>
>>
>>
>>
>>
>>
>>
>> *From:* Jon Peck [mailto:

> jkpeck@

> ]
>> *Sent:* Tuesday, May 12, 2020 10:56 AM
>> *To:* Veena Nambiar
>> *Cc:* SPSS List
>> *Subject:* [EXTERNAL] Re: [SPSSX-L] Issue with Computation Differences
>> Between IBM Statistics 26 and MS Products
>>
>>
>>
>> Calculations in SPSS Statistics are done in double precision floating
>> point hardware.  That allows for about 15-16 significant figures.  The
>> value of 14/25 is, in greater precision
>>
>> 0.560000000000000053290705182007513940334320068359375
>>
>> which has to be rounded or truncated to fit in a double precision
>> floating
>> point number, so if rounded, it will end in 1 while if truncated it will
>> end in 0.
>>
>>
>>
>> Statistics uses the IEEE standard for floating point arithmetic.  Excel
>> might use truncation or it might just be doing the calculation in a
>> different order
>>
>>
>>
>> Note that a difference this tiny is utterly negligible except in the case
>> where you are comparing two floating point numbers with fractional parts
>> for exact equality.  That is generally not a good thing to do as even a
>> difference like 14 * 100/25 vs 14/25 * 100 can produce an infinitesimal
>> difference.
>>
>>
>>
>> On Tue, May 12, 2020 at 11:25 AM Veena Nambiar &lt;

> VNambiar@.ca

> &gt;
>> wrote:
>>
>>
>>
>> I am calculating percentages in IBM Statistics 26 and finding a small
>> number of cases where I am expecting to yield a whole number percent but
>> am
>> not getting this result.
>>
>>
>>
>> For instance, 14/25*100 yields 56.00000000000001 instead of 56. I would
>> get a value of 56 with no resulting decimal when performing the same
>> computation in Excel or SQL. In other cases, such as 2/4 I would get 50
>> with no resulting decimals which matches MS products. This difference
>> seems
>> to only occur in a handful of cases by I’m not sure why.
>>
>>
>>
>> I know I have come across this issue before, but can’t remember why this
>> occurs. I believe IBM also has some documentation on this issue but can’t
>> seem to locate it from a previous conversation with them. I also can’t
>> seem
>> to put in a support request with IBM at the moment.
>>
>>
>>
>> Does anyone know the reason for this difference and if there is a
>> workaround to yield the same result as MS products?
>>
>> ===================== To manage your subscription to SPSSX-L, send a
>> message to

> LISTSERV@.UGA

>  (not to SPSSX-L), with no body text
>> except the command. To leave the list, send the command SIGNOFF SPSSX-L
>> For
>> a list of commands to manage subscriptions, send the command INFO REFCARD
>>
>>
>>
>>
>> --
>>
>> Jon K Peck
>>

> jkpeck@

>>
>
>
> --
> Jon K Peck

> jkpeck@

>
> =====================
> To manage your subscription to SPSSX-L, send a message to

> LISTSERV@.UGA

>  (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD





-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.
Reply | Threaded
Open this post in threaded view
|

Re: [EXTERNAL] Re: [SPSSX-L] Issue with Computation Differences Between IBM Statistics 26 and MS Products

Jon Peck
I haven't read the article Bruce cites, but strictly speaking, I think the Stata result is wrong.  As I said in my previous post, the correct value to the number of digits displayed is
56.0000000000000053290705182007513940334320068359375  
not
56.00000000000000710543  

And, since a double has only 53 bits for the precision, numbers should not be shown with more digits than that can hold.  But I'm not going to lose any sleep over this.

On Wed, May 13, 2020 at 8:23 AM Bruce Weaver <[hidden email]> wrote:
The comparison of SPSS and Excel made me curious about what would happen
using Stata (another package I have).  First, here is my revised SPSS code
and output, with Jon's addition of the variable "equal".  (I changed F30.20
to F24.30, but I don't think that will change anything important.) 

COMPUTE x1 = 14/25*100.
COMPUTE x2 = 100*14/25.
COMPUTE flag1 = x1 EQ 56.
COMPUTE flag2 = x2 EQ 56.
COMPUTE equal = x1 EQ x2.
FORMATS flag1 flag2 equal (F1) / x1 x2 (F24.20).
LIST x1 to equal.

                      x1                       x2 flag1 flag2 equal

     56.0000000000000100      56.0000000000000000   0     1     0
     56.0000000000000100      56.0000000000000000   0     1     0
     56.0000000000000100      56.0000000000000000   0     1     0


Now using Stata.  One difference is that Stata has two IEEE 754-2008
floating-point types: float and double.  "float" variables (the default for
new numeric variables) are stored in 4 bytes, whereas "double" variables are
stored in 8 bytes.  (Source:
https://blog.stata.com/2012/04/02/the-penultimate-guide-to-precision/)

* Example 1 using 'float' variables (the default)
clear
set obs 3
generate x1 = 14/25*100
generate x2 = 100*14/25
generate byte flag1 = x1 == 56
generate byte flag2 = x2 == 56
generate byte equal = x1 == x2
format x1 x2 %24.20f
list, clean noobs

OUTPUT for Example 1:
. list, clean noobs

                         x1                        x2   flag1   flag2 
equal 
    56.00000000000000000000   56.00000000000000000000       1       1     

    56.00000000000000000000   56.00000000000000000000       1       1     

    56.00000000000000000000   56.00000000000000000000       1       1     



* Example 2 using 'double' variables
clear
set obs 3
generate double x1 = 14/25*100
generate double x2 = 100*14/25
generate byte flag1 = x1 == 56
generate byte flag2 = x2 == 56
generate byte equal = x1 == x2
format x1 x2 %24.20f
list, clean noobs

Output for Example 2:


. list, clean noobs

                         x1                        x2   flag1   flag2 
equal 
    56.00000000000000710543   56.00000000000000710543       0       0     

    56.00000000000000710543   56.00000000000000710543       0       0     

    56.00000000000000710543   56.00000000000000710543       0       0     
1


Unlike SPSS, Stata is not affected by the order of operations:  14/25*100
returns the same value as 100*14/25, regardless of which numeric type I use
(float or double), so long as I use the same type for both x1 and x2.

The second thing that caught my eye is the differences between SPSS and
Stata for the values that do have some non-zero decimals showing. 

Stata:  56.00000000000000710543
SPSS:  56.0000000000000100

I think this note explains the difference:


https://www.ibm.com/support/pages/spss-does-not-display-more-16-digits-dataview

For anyone who wants more info about precision in Stata, this looks good:

  https://www.stata-journal.com/sjpdf.html?articlenum=pr0025

HTH.



Jon Peck wrote
> First, note that the result Statistics produces is mathematically correct
> as far as is possible with double precision floating point arithmetic.  My
> earlier post showed the value to more precision (calculated with special
> code outside of Statistics that is thousands of times slower).  The
> Statistics result is more accurate than the Excel result, but the
> difference is way too tiny for anyone to argue that there is a difference
> that matters.  At least that's my opinion.
>
> Statistics has only one numeric data type: all numbers are double
> precision
> floating point values.  The number of decimals in the format is just a
> display issue: the displayed values would be the same in Statistics and
> Excel to more decimals than  you would ever want.
>
> I'm not sure what you want to do, but if you  want to discuss further,
> let's take this off line (

> jkpeck@

> ).
>
>
>
>
>
> On Tue, May 12, 2020 at 6:41 PM Veena Nambiar &lt;

> VNambiar@.ca

> &gt; wrote:
>
>> Thanks, Jon. This is helpful. For my purposes, these small differences do
>> end up mattering since we are dual processing data and both processors
>> need
>> to have an exact match on our results.
>>
>>
>>
>> I’ve applied the following code to the percentages I’m calculating to
>> employ what they call whole number rounding as required by our agency
>> business rules:
>>
>>
>>
>> Compute Rate2= TRUNC(Rate1) + (Rate1 > TRUNC(Rate1)).
>>
>>
>>
>> In the case below, the percentage (Rate2) end ups being 57 versus 56
>> which
>> is what the other person I’m processing with yields in SQL.
>>
>>
>>
>> It’s only typically a very few cases that our results are mismatched, but
>> those differences are important for our purposes.
>>
>>
>>
>> I’m not quite sure how to resolve the issue because it’s difficult to say
>> what each package is doing differently – but we need to ensure a
>> consistent, explainable outcomes for these percentages every time. I was
>> asked several questions - will using a different data type such as float
>> or
>> decimal in SPSS would make a difference? Would setting the variable to
>> ratio make a difference? Would lowering the number of decimals change
>> anything? Can FuzzBits be used in the Trunc command? I don’t think any of
>> these suggestions would work and I’m ensure about the FuzzBits piece and
>> its applicability in SPSS. One suggestion was to simply truncate the rate
>> to enforce a match which would work in that case in particular but
>> wouldn’t
>> for other cases – for instance 195/201*100 would yield 97.01492537313433
>> and truncated would yield 97 instead of 98 which is what I want and is
>> what
>> I yield using the code above.
>>
>>
>>
>> If anyone has any ideas for me please let me know! Thanks so much!
>>
>>
>>
>>
>>
>>
>>
>> *From:* Jon Peck [mailto:

> jkpeck@

> ]
>> *Sent:* Tuesday, May 12, 2020 10:56 AM
>> *To:* Veena Nambiar
>> *Cc:* SPSS List
>> *Subject:* [EXTERNAL] Re: [SPSSX-L] Issue with Computation Differences
>> Between IBM Statistics 26 and MS Products
>>
>>
>>
>> Calculations in SPSS Statistics are done in double precision floating
>> point hardware.  That allows for about 15-16 significant figures.  The
>> value of 14/25 is, in greater precision
>>
>> 0.560000000000000053290705182007513940334320068359375
>>
>> which has to be rounded or truncated to fit in a double precision
>> floating
>> point number, so if rounded, it will end in 1 while if truncated it will
>> end in 0.
>>
>>
>>
>> Statistics uses the IEEE standard for floating point arithmetic.  Excel
>> might use truncation or it might just be doing the calculation in a
>> different order
>>
>>
>>
>> Note that a difference this tiny is utterly negligible except in the case
>> where you are comparing two floating point numbers with fractional parts
>> for exact equality.  That is generally not a good thing to do as even a
>> difference like 14 * 100/25 vs 14/25 * 100 can produce an infinitesimal
>> difference.
>>
>>
>>
>> On Tue, May 12, 2020 at 11:25 AM Veena Nambiar &lt;

> VNambiar@.ca

> &gt;
>> wrote:
>>
>>
>>
>> I am calculating percentages in IBM Statistics 26 and finding a small
>> number of cases where I am expecting to yield a whole number percent but
>> am
>> not getting this result.
>>
>>
>>
>> For instance, 14/25*100 yields 56.00000000000001 instead of 56. I would
>> get a value of 56 with no resulting decimal when performing the same
>> computation in Excel or SQL. In other cases, such as 2/4 I would get 50
>> with no resulting decimals which matches MS products. This difference
>> seems
>> to only occur in a handful of cases by I’m not sure why.
>>
>>
>>
>> I know I have come across this issue before, but can’t remember why this
>> occurs. I believe IBM also has some documentation on this issue but can’t
>> seem to locate it from a previous conversation with them. I also can’t
>> seem
>> to put in a support request with IBM at the moment.
>>
>>
>>
>> Does anyone know the reason for this difference and if there is a
>> workaround to yield the same result as MS products?
>>
>> ===================== To manage your subscription to SPSSX-L, send a
>> message to

> LISTSERV@.UGA

>  (not to SPSSX-L), with no body text
>> except the command. To leave the list, send the command SIGNOFF SPSSX-L
>> For
>> a list of commands to manage subscriptions, send the command INFO REFCARD
>>
>>
>>
>>
>> --
>>
>> Jon K Peck
>>

> jkpeck@

>>
>
>
> --
> Jon K Peck

> jkpeck@

>
> =====================
> To manage your subscription to SPSSX-L, send a message to

> LISTSERV@.UGA

>  (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD





-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD


--
Jon K Peck
[hidden email]

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: [EXTERNAL] Re: [SPSSX-L] Issue with Computation Differences Between IBM Statistics 26 and MS Products

Rich Ulrich
In reply to this post by Bruce Weaver
I did read both the blog and the journal article. I'm unhappy.
I'm not sure why the "float" example shows equality, but I will
accept that "internal round-off" seems to work advantageously.

My bigger problem is the double-precision example.

Writing (100*14)/25  - to force the order of computation - should
give an exact, integer answer in double precision.  Does it?  You
show that 100*14/25  has trailing digits, which must be wrong
if multiplication were done first.

So, it looks to me like Stata is giving division a higher precedence
than multiplication, performing it first, instead of taking the formula
in order. 

--
Rich Ulrich


From: SPSSX(r) Discussion <[hidden email]> on behalf of Bruce Weaver <[hidden email]>
Sent: Wednesday, May 13, 2020 10:23 AM
To: [hidden email] <[hidden email]>
Subject: Re: [EXTERNAL] Re: [SPSSX-L] Issue with Computation Differences Between IBM Statistics 26 and MS Products
 
The comparison of SPSS and Excel made me curious about what would happen
using Stata (another package I have).  First, here is my revised SPSS code
and output, with Jon's addition of the variable "equal".  (I changed F30.20
to F24.30, but I don't think that will change anything important.) 

COMPUTE x1 = 14/25*100.
COMPUTE x2 = 100*14/25.
COMPUTE flag1 = x1 EQ 56.
COMPUTE flag2 = x2 EQ 56.
COMPUTE equal = x1 EQ x2.
FORMATS flag1 flag2 equal (F1) / x1 x2 (F24.20).
LIST x1 to equal.

                      x1                       x2 flag1 flag2 equal
 
     56.0000000000000100      56.0000000000000000   0     1     0
     56.0000000000000100      56.0000000000000000   0     1     0
     56.0000000000000100      56.0000000000000000   0     1     0


Now using Stata.  One difference is that Stata has two IEEE 754-2008
floating-point types: float and double.  "float" variables (the default for
new numeric variables) are stored in 4 bytes, whereas "double" variables are
stored in 8 bytes.  (Source:
https://blog.stata.com/2012/04/02/the-penultimate-guide-to-precision/)

* Example 1 using 'float' variables (the default)
clear
set obs 3
generate x1 = 14/25*100
generate x2 = 100*14/25
generate byte flag1 = x1 == 56
generate byte flag2 = x2 == 56
generate byte equal = x1 == x2
format x1 x2 %24.20f
list, clean noobs

OUTPUT for Example 1:
. list, clean noobs

                         x1                        x2   flag1   flag2 
equal 
    56.00000000000000000000   56.00000000000000000000       1       1     

    56.00000000000000000000   56.00000000000000000000       1       1     

    56.00000000000000000000   56.00000000000000000000       1       1     



* Example 2 using 'double' variables
clear
set obs 3
generate double x1 = 14/25*100
generate double x2 = 100*14/25
generate byte flag1 = x1 == 56
generate byte flag2 = x2 == 56
generate byte equal = x1 == x2
format x1 x2 %24.20f
list, clean noobs

Output for Example 2:


. list, clean noobs

                         x1                        x2   flag1   flag2 
equal 
    56.00000000000000710543   56.00000000000000710543       0       0     

    56.00000000000000710543   56.00000000000000710543       0       0     

    56.00000000000000710543   56.00000000000000710543       0       0     
1


Unlike SPSS, Stata is not affected by the order of operations:  14/25*100
returns the same value as 100*14/25, regardless of which numeric type I use
(float or double), so long as I use the same type for both x1 and x2.

The second thing that caught my eye is the differences between SPSS and
Stata for the values that do have some non-zero decimals showing. 

Stata:  56.00000000000000710543
SPSS:  56.0000000000000100

I think this note explains the difference:

 
https://www.ibm.com/support/pages/spss-does-not-display-more-16-digits-dataview

For anyone who wants more info about precision in Stata, this looks good:

  https://www.stata-journal.com/sjpdf.html?articlenum=pr0025

HTH.



Jon Peck wrote
> First, note that the result Statistics produces is mathematically correct
> as far as is possible with double precision floating point arithmetic.  My
> earlier post showed the value to more precision (calculated with special
> code outside of Statistics that is thousands of times slower).  The
> Statistics result is more accurate than the Excel result, but the
> difference is way too tiny for anyone to argue that there is a difference
> that matters.  At least that's my opinion.
>
> Statistics has only one numeric data type: all numbers are double
> precision
> floating point values.  The number of decimals in the format is just a
> display issue: the displayed values would be the same in Statistics and
> Excel to more decimals than  you would ever want.
>
> I'm not sure what you want to do, but if you  want to discuss further,
> let's take this off line (

> jkpeck@

> ).
>
>
>
>
>
> On Tue, May 12, 2020 at 6:41 PM Veena Nambiar &lt;

> VNambiar@.ca

> &gt; wrote:
>
>> Thanks, Jon. This is helpful. For my purposes, these small differences do
>> end up mattering since we are dual processing data and both processors
>> need
>> to have an exact match on our results.
>>
>>
>>
>> I’ve applied the following code to the percentages I’m calculating to
>> employ what they call whole number rounding as required by our agency
>> business rules:
>>
>>
>>
>> Compute Rate2= TRUNC(Rate1) + (Rate1 > TRUNC(Rate1)).
>>
>>
>>
>> In the case below, the percentage (Rate2) end ups being 57 versus 56
>> which
>> is what the other person I’m processing with yields in SQL.
>>
>>
>>
>> It’s only typically a very few cases that our results are mismatched, but
>> those differences are important for our purposes.
>>
>>
>>
>> I’m not quite sure how to resolve the issue because it’s difficult to say
>> what each package is doing differently – but we need to ensure a
>> consistent, explainable outcomes for these percentages every time. I was
>> asked several questions - will using a different data type such as float
>> or
>> decimal in SPSS would make a difference? Would setting the variable to
>> ratio make a difference? Would lowering the number of decimals change
>> anything? Can FuzzBits be used in the Trunc command? I don’t think any of
>> these suggestions would work and I’m ensure about the FuzzBits piece and
>> its applicability in SPSS. One suggestion was to simply truncate the rate
>> to enforce a match which would work in that case in particular but
>> wouldn’t
>> for other cases – for instance 195/201*100 would yield 97.01492537313433
>> and truncated would yield 97 instead of 98 which is what I want and is
>> what
>> I yield using the code above.
>>
>>
>>
>> If anyone has any ideas for me please let me know! Thanks so much!
>>
>>
>>
>>
>>
>>
>>
>> *From:* Jon Peck [mailto:

> jkpeck@

> ]
>> *Sent:* Tuesday, May 12, 2020 10:56 AM
>> *To:* Veena Nambiar
>> *Cc:* SPSS List
>> *Subject:* [EXTERNAL] Re: [SPSSX-L] Issue with Computation Differences
>> Between IBM Statistics 26 and MS Products
>>
>>
>>
>> Calculations in SPSS Statistics are done in double precision floating
>> point hardware.  That allows for about 15-16 significant figures.  The
>> value of 14/25 is, in greater precision
>>
>> 0.560000000000000053290705182007513940334320068359375
>>
>> which has to be rounded or truncated to fit in a double precision
>> floating
>> point number, so if rounded, it will end in 1 while if truncated it will
>> end in 0.
>>
>>
>>
>> Statistics uses the IEEE standard for floating point arithmetic.  Excel
>> might use truncation or it might just be doing the calculation in a
>> different order
>>
>>
>>
>> Note that a difference this tiny is utterly negligible except in the case
>> where you are comparing two floating point numbers with fractional parts
>> for exact equality.  That is generally not a good thing to do as even a
>> difference like 14 * 100/25 vs 14/25 * 100 can produce an infinitesimal
>> difference.
>>
>>
>>
>> On Tue, May 12, 2020 at 11:25 AM Veena Nambiar &lt;

> VNambiar@.ca

> &gt;
>> wrote:
>>
>>
>>
>> I am calculating percentages in IBM Statistics 26 and finding a small
>> number of cases where I am expecting to yield a whole number percent but
>> am
>> not getting this result.
>>
>>
>>
>> For instance, 14/25*100 yields 56.00000000000001 instead of 56. I would
>> get a value of 56 with no resulting decimal when performing the same
>> computation in Excel or SQL. In other cases, such as 2/4 I would get 50
>> with no resulting decimals which matches MS products. This difference
>> seems
>> to only occur in a handful of cases by I’m not sure why.
>>
>>
>>
>> I know I have come across this issue before, but can’t remember why this
>> occurs. I believe IBM also has some documentation on this issue but can’t
>> seem to locate it from a previous conversation with them. I also can’t
>> seem
>> to put in a support request with IBM at the moment.
>>
>>
>>
>> Does anyone know the reason for this difference and if there is a
>> workaround to yield the same result as MS products?
>>
>> ===================== To manage your subscription to SPSSX-L, send a
>> message to

> LISTSERV@.UGA

>  (not to SPSSX-L), with no body text
>> except the command. To leave the list, send the command SIGNOFF SPSSX-L
>> For
>> a list of commands to manage subscriptions, send the command INFO REFCARD
>>
>>
>>
>>
>> --
>>
>> Jon K Peck
>>

> jkpeck@

>>
>
>
> --
> Jon K Peck

> jkpeck@

>
> =====================
> To manage your subscription to SPSSX-L, send a message to

> LISTSERV@.UGA

>  (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD





-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: [EXTERNAL] Re: [SPSSX-L] Issue with Computation Differences Between IBM Statistics 26 and MS Products

Jon Peck
Forcing the operation order with parentheses does give an exact integer-valued result as expected.

As far as the order of operations in the absence of parenthesizing overrides is concerned, the CSR says.
The order of execution is as follows: functions; exponentiation; multiplication, division, and unary –; and
addition and subtraction.
• Operators at the same level are executed from left to right.


On Wed, May 13, 2020 at 12:11 PM Rich Ulrich <[hidden email]> wrote:
I did read both the blog and the journal article. I'm unhappy.
I'm not sure why the "float" example shows equality, but I will
accept that "internal round-off" seems to work advantageously.

My bigger problem is the double-precision example.

Writing (100*14)/25  - to force the order of computation - should
give an exact, integer answer in double precision.  Does it?  You
show that 100*14/25  has trailing digits, which must be wrong
if multiplication were done first.

So, it looks to me like Stata is giving division a higher precedence
than multiplication, performing it first, instead of taking the formula
in order. 

--
Rich Ulrich


From: SPSSX(r) Discussion <[hidden email]> on behalf of Bruce Weaver <[hidden email]>
Sent: Wednesday, May 13, 2020 10:23 AM
To: [hidden email] <[hidden email]>
Subject: Re: [EXTERNAL] Re: [SPSSX-L] Issue with Computation Differences Between IBM Statistics 26 and MS Products
 
The comparison of SPSS and Excel made me curious about what would happen
using Stata (another package I have).  First, here is my revised SPSS code
and output, with Jon's addition of the variable "equal".  (I changed F30.20
to F24.30, but I don't think that will change anything important.) 

COMPUTE x1 = 14/25*100.
COMPUTE x2 = 100*14/25.
COMPUTE flag1 = x1 EQ 56.
COMPUTE flag2 = x2 EQ 56.
COMPUTE equal = x1 EQ x2.
FORMATS flag1 flag2 equal (F1) / x1 x2 (F24.20).
LIST x1 to equal.

                      x1                       x2 flag1 flag2 equal
 
     56.0000000000000100      56.0000000000000000   0     1     0
     56.0000000000000100      56.0000000000000000   0     1     0
     56.0000000000000100      56.0000000000000000   0     1     0


Now using Stata.  One difference is that Stata has two IEEE 754-2008
floating-point types: float and double.  "float" variables (the default for
new numeric variables) are stored in 4 bytes, whereas "double" variables are
stored in 8 bytes.  (Source:
https://blog.stata.com/2012/04/02/the-penultimate-guide-to-precision/)

* Example 1 using 'float' variables (the default)
clear
set obs 3
generate x1 = 14/25*100
generate x2 = 100*14/25
generate byte flag1 = x1 == 56
generate byte flag2 = x2 == 56
generate byte equal = x1 == x2
format x1 x2 %24.20f
list, clean noobs

OUTPUT for Example 1:
. list, clean noobs

                         x1                        x2   flag1   flag2 
equal 
    56.00000000000000000000   56.00000000000000000000       1       1     

    56.00000000000000000000   56.00000000000000000000       1       1     

    56.00000000000000000000   56.00000000000000000000       1       1     



* Example 2 using 'double' variables
clear
set obs 3
generate double x1 = 14/25*100
generate double x2 = 100*14/25
generate byte flag1 = x1 == 56
generate byte flag2 = x2 == 56
generate byte equal = x1 == x2
format x1 x2 %24.20f
list, clean noobs

Output for Example 2:


. list, clean noobs

                         x1                        x2   flag1   flag2 
equal 
    56.00000000000000710543   56.00000000000000710543       0       0     

    56.00000000000000710543   56.00000000000000710543       0       0     

    56.00000000000000710543   56.00000000000000710543       0       0     
1


Unlike SPSS, Stata is not affected by the order of operations:  14/25*100
returns the same value as 100*14/25, regardless of which numeric type I use
(float or double), so long as I use the same type for both x1 and x2.

The second thing that caught my eye is the differences between SPSS and
Stata for the values that do have some non-zero decimals showing. 

Stata:  56.00000000000000710543
SPSS:  56.0000000000000100

I think this note explains the difference:

 
https://www.ibm.com/support/pages/spss-does-not-display-more-16-digits-dataview

For anyone who wants more info about precision in Stata, this looks good:

  https://www.stata-journal.com/sjpdf.html?articlenum=pr0025

HTH.



Jon Peck wrote
> First, note that the result Statistics produces is mathematically correct
> as far as is possible with double precision floating point arithmetic.  My
> earlier post showed the value to more precision (calculated with special
> code outside of Statistics that is thousands of times slower).  The
> Statistics result is more accurate than the Excel result, but the
> difference is way too tiny for anyone to argue that there is a difference
> that matters.  At least that's my opinion.
>
> Statistics has only one numeric data type: all numbers are double
> precision
> floating point values.  The number of decimals in the format is just a
> display issue: the displayed values would be the same in Statistics and
> Excel to more decimals than  you would ever want.
>
> I'm not sure what you want to do, but if you  want to discuss further,
> let's take this off line (

> jkpeck@

> ).
>
>
>
>
>
> On Tue, May 12, 2020 at 6:41 PM Veena Nambiar &lt;

> VNambiar@.ca

> &gt; wrote:
>
>> Thanks, Jon. This is helpful. For my purposes, these small differences do
>> end up mattering since we are dual processing data and both processors
>> need
>> to have an exact match on our results.
>>
>>
>>
>> I’ve applied the following code to the percentages I’m calculating to
>> employ what they call whole number rounding as required by our agency
>> business rules:
>>
>>
>>
>> Compute Rate2= TRUNC(Rate1) + (Rate1 > TRUNC(Rate1)).
>>
>>
>>
>> In the case below, the percentage (Rate2) end ups being 57 versus 56
>> which
>> is what the other person I’m processing with yields in SQL.
>>
>>
>>
>> It’s only typically a very few cases that our results are mismatched, but
>> those differences are important for our purposes.
>>
>>
>>
>> I’m not quite sure how to resolve the issue because it’s difficult to say
>> what each package is doing differently – but we need to ensure a
>> consistent, explainable outcomes for these percentages every time. I was
>> asked several questions - will using a different data type such as float
>> or
>> decimal in SPSS would make a difference? Would setting the variable to
>> ratio make a difference? Would lowering the number of decimals change
>> anything? Can FuzzBits be used in the Trunc command? I don’t think any of
>> these suggestions would work and I’m ensure about the FuzzBits piece and
>> its applicability in SPSS. One suggestion was to simply truncate the rate
>> to enforce a match which would work in that case in particular but
>> wouldn’t
>> for other cases – for instance 195/201*100 would yield 97.01492537313433
>> and truncated would yield 97 instead of 98 which is what I want and is
>> what
>> I yield using the code above.
>>
>>
>>
>> If anyone has any ideas for me please let me know! Thanks so much!
>>
>>
>>
>>
>>
>>
>>
>> *From:* Jon Peck [mailto:

> jkpeck@

> ]
>> *Sent:* Tuesday, May 12, 2020 10:56 AM
>> *To:* Veena Nambiar
>> *Cc:* SPSS List
>> *Subject:* [EXTERNAL] Re: [SPSSX-L] Issue with Computation Differences
>> Between IBM Statistics 26 and MS Products
>>
>>
>>
>> Calculations in SPSS Statistics are done in double precision floating
>> point hardware.  That allows for about 15-16 significant figures.  The
>> value of 14/25 is, in greater precision
>>
>> 0.560000000000000053290705182007513940334320068359375
>>
>> which has to be rounded or truncated to fit in a double precision
>> floating
>> point number, so if rounded, it will end in 1 while if truncated it will
>> end in 0.
>>
>>
>>
>> Statistics uses the IEEE standard for floating point arithmetic.  Excel
>> might use truncation or it might just be doing the calculation in a
>> different order
>>
>>
>>
>> Note that a difference this tiny is utterly negligible except in the case
>> where you are comparing two floating point numbers with fractional parts
>> for exact equality.  That is generally not a good thing to do as even a
>> difference like 14 * 100/25 vs 14/25 * 100 can produce an infinitesimal
>> difference.
>>
>>
>>
>> On Tue, May 12, 2020 at 11:25 AM Veena Nambiar &lt;

> VNambiar@.ca

> &gt;
>> wrote:
>>
>>
>>
>> I am calculating percentages in IBM Statistics 26 and finding a small
>> number of cases where I am expecting to yield a whole number percent but
>> am
>> not getting this result.
>>
>>
>>
>> For instance, 14/25*100 yields 56.00000000000001 instead of 56. I would
>> get a value of 56 with no resulting decimal when performing the same
>> computation in Excel or SQL. In other cases, such as 2/4 I would get 50
>> with no resulting decimals which matches MS products. This difference
>> seems
>> to only occur in a handful of cases by I’m not sure why.
>>
>>
>>
>> I know I have come across this issue before, but can’t remember why this
>> occurs. I believe IBM also has some documentation on this issue but can’t
>> seem to locate it from a previous conversation with them. I also can’t
>> seem
>> to put in a support request with IBM at the moment.
>>
>>
>>
>> Does anyone know the reason for this difference and if there is a
>> workaround to yield the same result as MS products?
>>
>> ===================== To manage your subscription to SPSSX-L, send a
>> message to

> LISTSERV@.UGA

>  (not to SPSSX-L), with no body text
>> except the command. To leave the list, send the command SIGNOFF SPSSX-L
>> For
>> a list of commands to manage subscriptions, send the command INFO REFCARD
>>
>>
>>
>>
>> --
>>
>> Jon K Peck
>>

> jkpeck@

>>
>
>
> --
> Jon K Peck

> jkpeck@

>
> =====================
> To manage your subscription to SPSSX-L, send a message to

> LISTSERV@.UGA

>  (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD





-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD


--
Jon K Peck
[hidden email]

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: [EXTERNAL] Re: [SPSSX-L] Issue with Computation Differences Between IBM Statistics 26 and MS Products

Bruce Weaver
Administrator
In reply to this post by Rich Ulrich
Right you are, Rich, division, then multiplication.  Well-spotted.  See
section 13.2.5 here:

    https://www.stata.com/manuals/u13.pdf

And you're also right about using parentheses to force the order of
computation.  

* Example 3 using 'double' variables
clear
set obs 3
generate double x1 = 14/25*100
generate double x2 = 100*14/25
generate double x3 = (100*14)/25
generate byte flag1 = x1 == 56
generate byte flag2 = x2 == 56
generate byte flag3 = x3 == 56
generate byte eq12 = x1 == x2
generate byte eq13 = x1 == x3
generate byte eq23 = x2 == x3
format x1-x3 %24.20f
list x1-x3, clean noobs
list flag1-flag3 eq12-eq23, clean noobs

OUTPUT:


. list x1-x3, clean noobs

                         x1                        x2                      
x3  
    56.00000000000000710543   56.00000000000000710543  
56.00000000000000000000  
    56.00000000000000710543   56.00000000000000710543  
56.00000000000000000000  
    56.00000000000000710543   56.00000000000000710543  
56.00000000000000000000  

. list flag1-flag3 eq12-eq23, clean noobs

    flag1   flag2   flag3   eq12   eq13   eq23  
        0       0       1      1      0      0  
        0       0       1      1      0      0  
        0       0       1      1      0      0  



Rich Ulrich wrote

> I did read both the blog and the journal article. I'm unhappy.
> I'm not sure why the "float" example shows equality, but I will
> accept that "internal round-off" seems to work advantageously.
>
> My bigger problem is the double-precision example.
>
> Writing (100*14)/25  - to force the order of computation - should
> give an exact, integer answer in double precision.  Does it?  You
> show that 100*14/25  has trailing digits, which must be wrong
> if multiplication were done first.
>
> So, it looks to me like Stata is giving division a higher precedence
> than multiplication, performing it first, instead of taking the formula
> in order.
>
> --
> Rich Ulrich





-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.
Reply | Threaded
Open this post in threaded view
|

Re: [EXTERNAL] Re: [SPSSX-L] Issue with Computation Differences Between IBM Statistics 26 and MS Products

Rich Ulrich
Bruce, thanks for the confirmation.

I'm a bit surprised to see that Stata gives multiply/divide and
add/subtract separate precedence, each, instead of following
the high school algebra rules that I assumed were universal.

If you don't want to worry about it, use extra parentheses.  I've
sometimes been generous with parentheses in COMPUTE statements,
just for clarity.  That now seems like a good idea, for this other reason.

Giving each symbol its own order -- I wonder if they did that
intentionally (Why?), or if it was a side effect of "easier programming"?

The errors you might get from overflow for multiply/divide (done
in the wrong order) are more obvious than the add/subtract problems
illustrated in one of your references. But the latter (rarely) can matter, too.

--
Rich Ulrich




From: SPSSX(r) Discussion <[hidden email]> on behalf of Bruce Weaver <[hidden email]>
Sent: Wednesday, May 13, 2020 8:53 PM
To: [hidden email] <[hidden email]>
Subject: Re: [EXTERNAL] Re: [SPSSX-L] Issue with Computation Differences Between IBM Statistics 26 and MS Products
 
Right you are, Rich, division, then multiplication.  Well-spotted.  See
section 13.2.5 here:

    https://www.stata.com/manuals/u13.pdf

And you're also right about using parentheses to force the order of
computation. 

* Example 3 using 'double' variables
clear
set obs 3
generate double x1 = 14/25*100
generate double x2 = 100*14/25
generate double x3 = (100*14)/25
generate byte flag1 = x1 == 56
generate byte flag2 = x2 == 56
generate byte flag3 = x3 == 56
generate byte eq12 = x1 == x2
generate byte eq13 = x1 == x3
generate byte eq23 = x2 == x3
format x1-x3 %24.20f
list x1-x3, clean noobs
list flag1-flag3 eq12-eq23, clean noobs

OUTPUT:


. list x1-x3, clean noobs

                         x1                        x2                      
x3 
    56.00000000000000710543   56.00000000000000710543 
56.00000000000000000000 
    56.00000000000000710543   56.00000000000000710543 
56.00000000000000000000 
    56.00000000000000710543   56.00000000000000710543 
56.00000000000000000000 

. list flag1-flag3 eq12-eq23, clean noobs

    flag1   flag2   flag3   eq12   eq13   eq23 
        0       0       1      1      0      0 
        0       0       1      1      0      0 
        0       0       1      1      0      0 



Rich Ulrich wrote
> I did read both the blog and the journal article. I'm unhappy.
> I'm not sure why the "float" example shows equality, but I will
> accept that "internal round-off" seems to work advantageously.
>
> My bigger problem is the double-precision example.
>
> Writing (100*14)/25  - to force the order of computation - should
> give an exact, integer answer in double precision.  Does it?  You
> show that 100*14/25  has trailing digits, which must be wrong
> if multiplication were done first.
>
> So, it looks to me like Stata is giving division a higher precedence
> than multiplication, performing it first, instead of taking the formula
> in order.
>
> --
> Rich Ulrich





-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD