Fw: Glitch in CASESTOVARS? (SPSS 14)

SPSS Support-2

Hello Richard,

If the INDEX variable is a string and only one variable varies within the ID values, then the RENAME and SEPARATOR subcommands (whether present or implied) are ignored, and the actual string values are used as the new names without the original name as a prefix. If the string contains characters that are not allowed in a variable name, then warnings are issued and generic names, such as v1, are used. If the only invalid character is a space, then the index value is truncated at the first space to create the new name.
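
The naming rule just described can be modeled roughly as follows. This is only a sketch of the behavior as stated in this message, not SPSS's actual implementation: the function name model_new_names is hypothetical, the validity pattern is a simplified assumption (real SPSS naming rules allow more characters and reserve some keywords), and numbering the generic names in order of replacement is also an assumption.

```python
import re

# Simplified validity check (an assumption): a letter first, then letters,
# digits, periods, or underscores. Real SPSS naming rules are broader.
VALID_NAME = re.compile(r"[A-Za-z][A-Za-z0-9._]*\Z")


def model_new_names(index_values):
    """Model the new variable names for distinct string index values."""
    names = []
    generic = 0
    for value in index_values:
        value = value.strip()  # ignore blank padding around the string value
        if VALID_NAME.match(value):
            # Valid as-is: the index value itself becomes the name.
            names.append(value)
        elif VALID_NAME.match(value.replace(" ", "")):
            # Spaces are the only problem: truncate at the first space.
            names.append(value.split(" ", 1)[0])
        else:
            # Any other violation: fall back to a generic name.
            generic += 1
            names.append(f"v{generic}")
    return names


print(model_new_names(["age", "gender"]))          # ['age', 'gender']
print(model_new_names(["1F", "1L", "2F", "2L"]))   # ['v1', 'v2', 'v3', 'v4']
```

Under these assumptions, index values that begin with a digit fall through to the generic branch, which matches the v1 to v4 names reported in the original posting.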


This behavior is intended for the case where only one variable varies within each ID value (i.e., one variable is to be transposed into multiple variables) and one wants to use the index variable to generate names for the result variables. Suppose that your original file has the variables ID, VAR, and VAL, where VAR is a string index variable. For example:

id        var        val
14        age        23
14        gender        1
18        age        32
18        gender        2
21        age        17
21        gender        2
27        age        42
27        gender        1


The command:

CASESTOVARS /ID=id /INDEX=var.

will generate a file with variables ID, AGE, and GENDER, as follows:

id        age        gender
14        23        1
18        32        2
21        17        2
27        42        1



A documentation bug has been filed to request that the naming algorithm for the above situation be described in Help files and the Command Syntax Reference.
As a current work-around, if you want the new variable names to have the root and separator, you could compute the new index by concatenating the root and separator to the front of the original index.
 
* Build a new index that carries the root and separator.
STRING root_inst (A7).
COMPUTE root_inst = CONCAT("Inst.", Inst).
CASESTOVARS
 /ID = Group
 /INDEX = root_inst
 /GROUPBY = VARIABLE
 /DROP = Inst.

Note that the width of the new index variable must be sufficient to hold the root, separator, and original index (4+1+2=7 in this case).
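
The width arithmetic can be checked with a small script that assembles the workaround syntax for a given root and separator. This is an illustrative sketch only: the helper name workaround_syntax and the root_ prefix for the new index variable are hypothetical, and the emitted text simply mirrors the commands shown in this message.

```python
def workaround_syntax(id_var, index_var, index_width, root, separator="."):
    """Return SPSS syntax text for the prefix-the-index workaround."""
    prefix = root + separator
    # The new index must hold the root, the separator, and the original value.
    width = len(prefix) + index_width
    new_index = f"root_{index_var.lower()}"  # hypothetical naming convention
    return "\n".join([
        f"STRING {new_index} (A{width}).",
        f'COMPUTE {new_index} = CONCAT("{prefix}", {index_var}).',
        "CASESTOVARS",
        f" /ID = {id_var}",
        f" /INDEX = {new_index}",
        " /GROUPBY = VARIABLE",
        f" /DROP = {index_var}.",
    ])


# Root "Inst" (4) + separator "." (1) + index width 2 gives A7.
print(workaround_syntax("Group", "Inst", 2, "Inst"))
```

A longer root or a wider index variable changes only the computed width; the rest of the generated syntax keeps the same shape.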

In your example, where the index values are invalid only because they start with a number, concatenating the root and separator in front of these values would also resolve the problem of invalid names. Other variable-naming violations, such as spaces, may require recodes or other transformation options to fix.


David Matheson
Statistical Support
SPSS, an IBM company


----- Forwarded by David Matheson/Chicago/IBM on 09/13/2010 10:40 AM -----
From: Richard Ristow <[hidden email]>
To: [hidden email]
Date: 09/09/2010 12:23 AM
Subject: Glitch in CASESTOVARS? (SPSS 14)
Sent by: "SPSSX(r) Discussion" <[hidden email]>





Here, in SPSS 14 (the latest version I have running), CASESTOVARS seems to behave in a way I think wrong, and that seems contrary to the documentation.

The Command Syntax Reference article on CASESTOVARS (p. 195, in the v.14 edition) states (underscores added),

RENAME Subcommand

CASESTOVARS creates variable groups with new variables. The first part of the new variable name is either derived from the name of the original variable or is the rootname specified on the RENAME subcommand.

SEPARATOR Subcommand

CASESTOVARS creates variable groups that contain new variables. There are two parts to the
name of a new variable—a rootname and an index. The parts are separated by a string.
The separator string is specified on the SEPARATOR subcommand.
..
If a separator is not specified, the default is a period.

It appears, however, that the root and separator are not used, and the index value alone becomes the variable name, if only one variable is being transposed and (probably) the index variable is a string. That can generate invalid variable names, when using the root and separator would have given valid ones.

Has this persisted in later releases? If so, is it recognized as a bug?

Example (complete data and code are at the end of this posting):
Using the dataset

|-----------------------------|---------------------------|
|Output Created               |09-SEP-2010 00:07:51       |
|-----------------------------|---------------------------|
Group      Inst Name

Primary    1F   Aaron
Primary    1L   Aardvark
Primary    2F   Beth
Primary    2L   Benny
Secondary  1F   Catherine
Secondary  1L   Clark
Secondary  2F   Douglas
Secondary  2L   Draper

Number of cases read:  8    Number of cases listed:  8

and running command

CASESTOVARS
/ID = Group
/INDEX = Inst
/GROUPBY = VARIABLE .

gives messages

Warnings
|---------------------------------------------------------|
|CASES TO VARS created the name "1F" for a new variable.  |
|Its first character is invalid in SPSS, so "v1"          |
|will be used instead.                                    |
|---------------------------------------------------------|
|CASES TO VARS created the name "1L" for a new variable.  |
|Its first character is invalid in SPSS, so "v2"          |
|will be used instead.                                    |
|---------------------------------------------------------|
|CASES TO VARS created the name "2F" for a new variable.  |
|Its first character is invalid in SPSS, so "v3"          |
|will be used instead.                                    |
|---------------------------------------------------------|
|CASES TO VARS created the name "2L" for a new variable.  |
|Its first character is invalid in SPSS, so "v4"          |
|will be used instead.                                    |
|---------------------------------------------------------|

and the result is

|-----------------------------|---------------------------|
|Output Created               |09-SEP-2010 00:07:53       |
|-----------------------------|---------------------------|
Group      v1         v2         v3         v4

Primary    Aaron      Aardvark   Beth       Benny
Secondary  Catherine  Clark      Douglas    Draper

Number of cases read:  2    Number of cases listed:  2

==============================
APPENDIX:  Test data, and code
==============================

*  C:\Documents and Settings\Richard\My Documents  .
*    \Technical\spssx-l\Z-2010abc                  .
*    \2010-09-08 Ristow-Glitch in CASESTOVARS.SPS  .

*  ................................................................. .
*  To support a posting titled "Glitch in CASESTOVARS? (SPSS 14)"    .
*                                                                    .
*  It appears that CASESTOVARS uses only the index value, rather     .
*  than the root, separator and index value, as a variable name,     .
*  when only one variable is being transposed. That can give         .
*  invalid variable names, when using the root and separator would   .
*  give valid ones.                                                  .
*  ................................................................. .

DATA LIST LIST/
   Group     Inst Name
  (A10,      A2,  A10).
BEGIN DATA
   Primary   1F   Aaron
   Primary   1L   Aardvark
   Primary   2F   Beth
   Primary   2L   Benny
   Secondary 1F   Catherine
   Secondary 1L   Clark
   Secondary 2F   Douglas
   Secondary 2L   Draper
END DATA.

LIST.

CASESTOVARS
/ID = Group
/INDEX = Inst
/GROUPBY = VARIABLE .

LIST.


====================To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Re: Fw: Glitch in CASESTOVARS? (SPSS 14)

Richard Ristow
At 12:16 PM 9/13/2010, SPSS Support wrote:

If the INDEX variable is a string and only one variable varies within the ID values, then the RENAME and SEPARATOR subcommands (whether present or implied) are ignored, and the actual string values are used as the new names without the original name as a prefix. If the string contains characters that are not allowed in a variable name, then warnings are issued and generic names, such as v1, are used. This behavior is intended for the case where only one variable varies within each ID value and one wants to use the index variable to generate names for the result variables.

I figured that SPSS was being 'cute' like that. The problem, of course, is that it contradicts the documentation (as you note); and that it can 'break' code that would run if SPSS behaved according to the documentation.

A documentation bug has been filed to request that the naming algorithm for the above situation be described in Help files and the Command Syntax Reference.

Thank you; I appreciate that.

Ah, for 'cute'. I used to teach mini-courses in JCL for IBM's OS/360. JCL's notoriously confusing. I found a useful mantra: "JCL isn't really so bad, except for the features that were put in to make it easy to use." It's always tempting to add such features.

Anyway, you write,

As a current work-around, if you want the new variable names to have the root and separator, you could compute the new index by concatenating the root and separator to the front of the original index:
 
string root_inst (A7) .
compute root_inst = concat("Inst.",inst).
CASESTOVARS
 /ID = Group
 /INDEX = root_inst
 /GROUPBY = VARIABLE
  /drop = Inst .

That's the solution I reached, and used in my posting

Date:    Thu, 9 Sep 2010 01:18:32 -0400
From:    Richard Ristow <[hidden email]>
Subject: Re: sorting out a nested data structure
To:      [hidden email]

But, again, that makes the feature significantly more complicated to use and to explain, by making it 'easier'.

-With thanks, and some sighs,
 Richard