

I am using IBM SPSS Version 21 and have run into a little problem using DESCRIPTIVES. When I request the SUM stat, I get a 0 for SUM in a variable which I know has 1 case. Here is the syntax I'm using:
DESC HYPERTENSIVE /STAT DEF SUM.
My variable is coded 0 or 1 for absence or presence. The syntax works fine with other variables that have the same coding but more than 1 case; e.g., HYPOTENSIVE has an incidence of 110 and its SUM stat = 110. The minimum is 0 and the maximum is 1, so I have no idea what's going on or how to fix it.
Here is the output I get...
Descriptive Statistics
               N    Minimum  Maximum  Sum  Mean  Std. Deviation
HYPERTENSIVE   244  0        1        0    .00   .064
HYPOTENSIVE    244  0        1        110  .45   .499
Thanks,
Rebecca G. Burzette, Ph.D.
Assistant Scientist
Office of Curricular & Student Assessment
2259 Veterinary Medicine
Iowa State University
1800 Christensen Drive
Ames, Iowa 50011-1134
=====================
To manage your subscription to SPSSXL, send a message to
[hidden email] (not to SPSSXL), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSXL
For a list of commands to manage subscriptions, send the command
INFO REFCARD


Those statistics suggest that p (the mean) for HYPERTENSIVE is very small, so the result is probably being rounded to 0 at the number of decimals displayed. Try increasing the number of decimals shown, either by changing the variable format to show more decimals or by editing the pivot table and increasing the display precision there.


Hi Jon,
I see my table did not translate very well... Yes, the mean is very small (.044) and that shows up when I click on it in the output. My problem is that the SUM is 0 when it should be 1. The maximum of 1 tells me that there is at least one case with "1" as the value and in fact, when I look at the raw data, I do have one case of hypertension.
Thanks!
Becky Burzette

Your SD for HYPERTENSIVE is what you would get if there is one observation with HYPERTENSIVE = 1, as shown below. So it is a mystery to me why you are getting a sum = 0. The code below shows me a sum of 1, with everything else matching your output.
NEW FILE.
DATASET CLOSE ALL.
INPUT PROGRAM.
LOOP ID = 1 to 244.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
EXECUTE.
COMPUTE Hypertensive = ID EQ 1.
COMPUTE Hypotensive = ID LE 110.
FORMATS Hypertensive Hypotensive (F1).
DESCRIPTIVES VARIABLES=Hypertensive Hypotensive
/STATISTICS=MEAN SUM STDDEV MIN MAX.
* This duplicates Rebecca's pasted output except that
* the SUM for Hypertensive = 1, as expected.
FORMATS Hypertensive Hypotensive (F5.4).
DESCRIPTIVES VARIABLES=Hypertensive Hypotensive
/STATISTICS=MEAN SUM STDDEV MIN MAX.
* Formatting the variables to display more decimals
* shows more decimals for the stats too, as JP suggested.
* But it is not changing the sums.
You could try the following to check for any cases where Hypertensive = a value other than 1 or 0. Change ID to whatever your ID variable is called (if you have one).
TEMPORARY.
SELECT IF NOT ANY(Hypertensive,0,1).
LIST ID Hypertensive.
If you don't have an ID variable, generate one via $CASENUM so that you can find the problematic case (if there is one).
COMPUTE ID = $CASENUM.
HTH.
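As a side check on the reasoning above, the DESCRIPTIVES figures for a 0/1 variable follow directly from N and the number of cases coded 1. The Python sketch below is not SPSS syntax, and the function name binary_descriptives is made up for illustration. It assumes the sample standard deviation with an N - 1 denominator and works from counts of 1 and 110 cases out of 244.

```python
import math


def binary_descriptives(n_cases, n_ones):
    """Return (sum, mean, sample SD) for a 0/1 variable with n_ones cases coded 1."""
    total = n_ones
    mean = total / n_cases
    # Sum of squared deviations: each 1 contributes (1 - mean)^2, each 0 contributes mean^2.
    ss = n_ones * (1 - mean) ** 2 + (n_cases - n_ones) * mean ** 2
    sd = math.sqrt(ss / (n_cases - 1))  # assumption: N - 1 denominator
    return total, mean, sd


for name, ones in [("HYPERTENSIVE", 1), ("HYPOTENSIVE", 110)]:
    total, mean, sd = binary_descriptives(244, ones)
    print(f"{name:<13} sum={total:>3} mean={mean:.4f} sd={sd:.3f}")
```

With one case coded 1 this gives a mean of about .0041 and an SD of .064; with 110 cases it gives a mean of about .45 and an SD of .499. Both agree with the pasted table, so the only figure that does not fit is the reported sum of 0.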


The problem is solved. The user's system had not been patched.
I don't have V21, but with the oldest version I have, 23, the sum is shown as 1. To understand what you are getting, you should know that sums are not calculated in the straightforward way that people would suspect but are computed with a complicated algorithm such as this https://www.cs.umd.edu/class/sum2003/cmsc311/Notes/BinMath/addFloat.html in order to avoid roundoff errors in floating point numbers.
SPSS Statistics changed to using this type of formula in, I think, V21, but the algorithm did not correctly handle the special case of a sequence of numbers that is almost all zeros (which you might think was the easiest :)). It was off by one as you have observed. I believe that was the maximum error. Users were notified, and this was quickly fixed in a hot fix or fixpack once it was discovered, but apparently you do not have that fixpack installed. You can check by looking at the version number in Help > About. The last digit should be nonzero.
You can get the latest fixpack for V21 here. You might need administrative rights to install it.
That should fix the problem.


I think there are some other odd things going on, which Jon's reply made me think about. For hypertensive the mean is .064 for an N of 244, so 244*.064=15.616. If there were one case with hypertensive=1, the mean should be .0041. The same story applies to hypotensive: the mean is .499 but the sum is 110.45, and .499*244=121.756 while 110.45/244=.452. I'm inclined to think that while you think both variables are 0/1, there are other values in the dataset. So FREQUENCIES would be a useful command.
Gene Maguin
Original Message
From: SPSSX(r) Discussion [mailto: [hidden email]] On Behalf Of Rebecca Burzette
Sent: Thursday, July 20, 2017 2:54 PM
To: [hidden email]
Subject: Sum statistic in descriptives


Jon, thanks for posting that reference. I’ve never known how the floating point representation and computation worked. Gene Maguin
From: SPSSX(r) Discussion [mailto:[hidden email]]
On Behalf Of Jon Peck
Sent: Thursday, July 20, 2017 7:19 PM
To: [hidden email]
Subject: Re: Sum statistic in descriptives
On Thu, Jul 20, 2017 at 3:15 PM, Bruce Weaver <[hidden email]> wrote:
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/
"When all else fails, RTFM."
NOTE: My Hotmail account is not monitored regularly.
To send me an email, please use the address shown above.



The link in my earlier email explains the basics of floating point computation, but the link I really meant to include explains the high precision way of computing a sum via the Kahan algorithm, which is what Statistics uses. A sum is computed by a roundabout algorithm that calculates the floating point error as each value is added and keeps correcting it. For most real data this makes no difference, but when the values vary a great deal in magnitude from each other or relative to the sum, the algorithm gives better accuracy.
Here is a link to an explanation of the Kahan algorithm.
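For readers who want to see the idea, here is a minimal Python sketch of textbook Kahan (compensated) summation. It only illustrates the general technique described above and is not the code SPSS Statistics actually uses. The names naive_sum and kahan_sum are made up here.

```python
def naive_sum(values):
    """Plain left-to-right accumulation, written as an explicit loop because
    the built-in sum() may apply its own compensation in recent Python versions."""
    total = 0.0
    for x in values:
        total += x
    return total


def kahan_sum(values):
    """Textbook Kahan summation: carry the rounding error of each addition forward."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for x in values:
        y = x - compensation            # correct the incoming value
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was lost
        total = t
    return total


# The thread's data pattern: a single 1 among 244 cases.
mostly_zeros = [1.0] + [0.0] * 243
# Values that are tiny relative to the running total.
tiny_terms = [1.0] + [1e-16] * 1_000_000

print(kahan_sum(mostly_zeros))
print(naive_sum(tiny_terms), kahan_sum(tiny_terms))
```

On the second list the plain loop stays at exactly 1.0, because each 1e-16 is too small to change the running total on its own, while the compensated version comes out close to 1.0000000001. On the mostly-zeros list a correct implementation returns 1.0, which is the case the unpatched V21 got wrong.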


This reminds me of some sayings that end in something like,
"It takes a computer to really mess things up."
Now that computers can do so much, so fast, a lesson to take away
might be that a computer should be programmed to check the clever
solution against the obvious one ... and do further checking when they disagree.

Rich Ulrich
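The suggestion above, checking the clever solution against the obvious one, might look something like the following Python sketch. Everything here is hypothetical: cross_checked_sum is a made-up helper, and buggy_sum is only a stand-in for a defective routine, not a reconstruction of the actual SPSS defect.

```python
import math


def cross_checked_sum(values, clever_sum, rel_tol=1e-9, abs_tol=1e-9):
    """Return clever_sum(values), raising if it disagrees with a plain loop."""
    values = list(values)
    obvious = 0.0
    for x in values:
        obvious += x
    clever = clever_sum(values)
    if not math.isclose(clever, obvious, rel_tol=rel_tol, abs_tol=abs_tol):
        raise ArithmeticError(
            f"sum mismatch: clever={clever!r}, obvious={obvious!r}"
        )
    return clever


def buggy_sum(values):
    # Hypothetical stand-in for a defective routine; NOT the actual SPSS code.
    return math.fsum(values) - 1.0


data = [1.0] + [0.0] * 243
print(cross_checked_sum(data, math.fsum))  # the two methods agree
try:
    cross_checked_sum(data, buggy_sum)
except ArithmeticError as err:
    print("caught:", err)
```

The tolerances are arbitrary choices for the sketch. In practice they would have to be loose enough to allow for the ordinary rounding differences that the careful method is meant to remove.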

