Difficult date question

classic Classic list List threaded Threaded
20 messages Options
Reply | Threaded
Open this post in threaded view
|

Difficult date question

Jeff A

Here’s one I can’t get my head around.

 

I can get part of this via the lag function and it’s easy with one complication that I can’t figure out. I’m assuming it needs some type of macro or a few nested loops.

 

 

What’s the best way to do the following.

 

Each person has an ID and can have several different rows. Each row as a begin date (DateIn) and end date (DateOut) but also date of an occurrence (compare date).

 

I have to compare each of the CompareDates with the DateIn and DateOut periods to see whether any fall within the period from DateIn to DateOut (inclusive of the DateIn and DateOut) across all rows with the same ID number.  Result = 1 when the CompareDate falls within any of the DateIn to DateOut periods within the same Person ID number (either in rows before or after).

 

So it’s possible that the CompareDate can be within an interval on the row either before it or after it and that’s the part that’s throwing me.

 

In getting started, I can do the following easily, but unsure how to finish.

 

If((CompareDate >= DateIn) and (CompareDate <= DateOut))        Result = 1.

If((CompareDate >= lag(DateIn)) and (CompareDate <= lag(DateOut)))        Result = 1.

If((CompareDate >= lag(DateIn 2)) and (CompareDate <= lag(DateOut,2)))  Result = 1.

If((CompareDate >= lag(DateIn,3)) and (CompareDate <= lag(DateOut 3)))  Result = 1.

 

Example with just the first 2 people. Each person will have a different number of rows.

 

ID   DateIn            DateOut          CompareDate          Result

1     01-Sep-04      09-Sep-04          05-Oct-04                 1

1     23-Sep-04      30-Sep-04          15-Jan-04                  0

1     10-Sep-04      09-Nov-04          06-Sep-04                1

 

2     01-Jan-04      09-Feb-04          05-Oct-04                  0

2     20-Feb-04      15-Mar-04          24-Sept-04               0     

 

 

Thanks in advance.

 

Jeff

 


Virus-free. www.avast.com
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Difficult date question

Bruce Weaver
Administrator
If I follow, on each row, you wish to set a flag to 1 if CompareDate on that
row falls between MIN(DateIn) and MAX(DateOut), where the min and max values
are across all rows for a given ID.  Right?  If so, use AGGREGATE with ID as
the BREAK variable and the "add variables" mode to add new variables
MinDateIn and MaxDateOut to the data set.  Then, something like this
(untested):

COMPUTE Flag = RANGE(CompareDate, MinDateIn, MaxDateOut).
FORMATS Flag (F1).
FREQUENCIES Flag.

By the way, MinDateIn and MaxDateOut will be ordinary numeric variables.
Use ALTER TYPE if you want to make them display as date variables.  

HTH.



Jeff A wrote

> Here's one I can't get my head around.
>
>  
>
> I can get part of this via the lag function and it's easy with one
> complication that I can't figure out. I'm assuming it needs some type of
> macro or a few nested loops.
>
>  
>
>  
>
> What's the best way to do the following.
>
>  
>
> Each person has an ID and can have several different rows. Each row as a
> begin date (DateIn) and end date (DateOut) but also date of an occurrence
> (compare date).
>
>  
>
> I have to compare each of the CompareDates with the DateIn and DateOut
> periods to see whether any fall within the period from DateIn to DateOut
> (inclusive of the DateIn and DateOut) across all rows with the same ID
> number.  Result = 1 when the CompareDate falls within any of the DateIn to
> DateOut periods within the same Person ID number (either in rows before or
> after).
>
>  
>
> So it's possible that the CompareDate can be within an interval on the row
> either before it or after it and that's the part that's throwing me.
>
>  
>
> In getting started, I can do the following easily, but unsure how to
> finish.
>
>
>  
>
> If((CompareDate >= DateIn) and (CompareDate <= DateOut))        Result =
> 1.
>
> If((CompareDate >= lag(DateIn)) and (CompareDate <= lag(DateOut)))
> Result = 1.
>
> If((CompareDate >= lag(DateIn 2)) and (CompareDate <= lag(DateOut,2)))
> Result = 1.
>
> If((CompareDate >= lag(DateIn,3)) and (CompareDate <= lag(DateOut 3)))
> Result = 1.
>
>  
>
> Example with just the first 2 people. Each person will have a different
> number of rows.
>
>  
>
> ID   DateIn            DateOut          CompareDate          Result
>
> 1     01-Sep-04      09-Sep-04          05-Oct-04                 1
>
> 1     23-Sep-04      30-Sep-04          15-Jan-04                  0
>
> 1     10-Sep-04      09-Nov-04          06-Sep-04                1
>
>  
>
> 2     01-Jan-04      09-Feb-04          05-Oct-04                  0
>
> 2     20-Feb-04      15-Mar-04          24-Sept-04               0      
>
>  
>
>  
>
> Thanks in advance.
>
>  
>
> Jeff
>
>  
>
>
>
> --
> This email has been checked for viruses by Avast antivirus software.
> https://www.avast.com/antivirus
>
> =====================
> To manage your subscription to SPSSX-L, send a message to

> LISTSERV@.UGA

>  (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD





-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.
Reply | Threaded
Open this post in threaded view
|

Re: Difficult date question

Mario Giesel-2
@Bruce: I think aggregating can create problems if there are dates between min and max that are not included in any time interval.

@Jeff: This can be done along the following logic. I'm sure there are more elegant solutions but this is  straightforward and easy to understand.

DATA LIST FREE(",") /ID (F8) DateIn (DATE) DateOut (DATE) CompareDate (DATE).
BEGIN DATA
1,01-Sep-04,09-Sep-04,05-Oct-04
1,23-Sep-04,30-Sep-04,15-Jan-04
1,10-Sep-04,09-Nov-04,06-Sep-04
2,01-Jan-04,09-Feb-04,05-Oct-04
2,20-Feb-04,15-Mar-04,24-Sep-04
END DATA.

* Restructuring to wide format.
CASESTOVARS /ID = ID.

* Test if CompareDate lies between In and Out.
DEFINE !result(!POS !TOKENS(1) /!POS !TOKENS(1) /!POS !TOKENS(1) /!POS !TOKENS(1))
COMPUTE test1 = DATEDIFF(!3, !1, 'days').
COMPUTE test2 = DATEDIFF(!3, !2, 'days').
IF (test1 >= 0) AND (test2 <= 0) !4 = 1.
!ENDDEFINE.

* Initializing (add more variables if needed).
COMPUTE Result.1 = 0.
COMPUTE Result.2 = 0.
COMPUTE Result.3 = 0.

* Repeat for each available CompareDate several times (add additional variables if needed - same principle).
!result DateIn.1 DateOut.1 CompareDate.1 Result.1.
!result DateIn.2 DateOut.2 CompareDate.1 Result.1.
!result DateIn.3 DateOut.3 CompareDate.1 Result.1.
!result DateIn.1 DateOut.1 CompareDate.2 Result.2.
!result DateIn.2 DateOut.2 CompareDate.2 Result.2.
!result DateIn.3 DateOut.3 CompareDate.2 Result.2.
!result DateIn.1 DateOut.1 CompareDate.3 Result.3.
!result DateIn.2 DateOut.2 CompareDate.3 Result.3.
!result DateIn.3 DateOut.3 CompareDate.3 Result.3.
* ... .

FREQUENCIES Result.1 Result.2 Result.3.

MATCH FILES FILE = * /DROP test1 test2.

* Append additional measurement rows if needed.
VARSTOCASES
 /MAKE DateIn FROM DateIn.1 DateIn.2 DateIn.3
 /MAKE DateOut FROM DateOut.1 DateOut.2 DateOut.3
 /MAKE CompareDate FROM  CompareDate.1 CompareDate.2 CompareDate.3
 /MAKE Result FROM Result.1 Result.2 Result.3 .

SELECT IF NOT SYSMIS(CompareDate).
EXECUTE.

Good luck!

Mario
Munich, Germany


Am Samstag, 27. Juni 2020, 03:58:33 MESZ hat Bruce Weaver <[hidden email]> Folgendes geschrieben:


If I follow, on each row, you wish to set a flag to 1 if CompareDate on that
row falls between MIN(DateIn) and MAX(DateOut), where the min and max values
are across all rows for a given ID.  Right?  If so, use AGGREGATE with ID as
the BREAK variable and the "add variables" mode to add new variables
MinDateIn and MaxDateOut to the data set.  Then, something like this
(untested):

COMPUTE Flag = RANGE(CompareDate, MinDateIn, MaxDateOut).
FORMATS Flag (F1).
FREQUENCIES Flag.

By the way, MinDateIn and MaxDateOut will be ordinary numeric variables.
Use ALTER TYPE if you want to make them display as date variables. 

HTH.



Jeff A wrote

> Here's one I can't get my head around.
>

>
> I can get part of this via the lag function and it's easy with one
> complication that I can't figure out. I'm assuming it needs some type of
> macro or a few nested loops.
>

>

>
> What's the best way to do the following.
>

>
> Each person has an ID and can have several different rows. Each row as a
> begin date (DateIn) and end date (DateOut) but also date of an occurrence
> (compare date).
>

>
> I have to compare each of the CompareDates with the DateIn and DateOut
> periods to see whether any fall within the period from DateIn to DateOut
> (inclusive of the DateIn and DateOut) across all rows with the same ID
> number.  Result = 1 when the CompareDate falls within any of the DateIn to
> DateOut periods within the same Person ID number (either in rows before or
> after).
>

>
> So it's possible that the CompareDate can be within an interval on the row
> either before it or after it and that's the part that's throwing me.
>

>
> In getting started, I can do the following easily, but unsure how to
> finish.
>
>

>
> If((CompareDate >= DateIn) and (CompareDate <= DateOut))        Result =
> 1.
>
> If((CompareDate >= lag(DateIn)) and (CompareDate <= lag(DateOut)))
> Result = 1.
>
> If((CompareDate >= lag(DateIn 2)) and (CompareDate <= lag(DateOut,2)))
> Result = 1.
>
> If((CompareDate >= lag(DateIn,3)) and (CompareDate <= lag(DateOut 3)))
> Result = 1.
>

>
> Example with just the first 2 people. Each person will have a different
> number of rows.
>

>
> ID  DateIn            DateOut          CompareDate          Result
>
> 1    01-Sep-04      09-Sep-04          05-Oct-04                1
>
> 1    23-Sep-04      30-Sep-04          15-Jan-04                  0
>
> 1    10-Sep-04      09-Nov-04          06-Sep-04                1
>

>
> 2    01-Jan-04      09-Feb-04          05-Oct-04                  0
>
> 2    20-Feb-04      15-Mar-04          24-Sept-04              0     
>

>

>
> Thanks in advance.
>

>
> Jeff
>

>
>
>
> --
> This email has been checked for viruses by Avast antivirus software.
> https://www.avast.com/antivirus
>
> =====================
> To manage your subscription to SPSSX-L, send a message to

> [hidden email]

>  (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD





-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/


=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Difficult date question

Bruce Weaver
Administrator
Here's what I had in mind, using Mario's sample dataset.

NEW FILE.
DATASET CLOSE ALL.
DATA LIST FREE(",") /ID (F8) DateIn (DATE) DateOut (DATE) CompareDate
(DATE).
BEGIN DATA
1,01-Sep-04,09-Sep-04,05-Oct-04
1,23-Sep-04,30-Sep-04,15-Jan-04
1,10-Sep-04,09-Nov-04,06-Sep-04
2,01-Jan-04,09-Feb-04,05-Oct-04
2,20-Feb-04,15-Mar-04,24-Sep-04
END DATA.

AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES OVERWRITE=YES
  /BREAK=ID
  /MinDateIn=MIN(DateIn)
  /MaxDateOut=MAX(DateOut).

ALTER TYPE MinDateIn MaxDateOut (DATE11).
COMPUTE Result = RANGE(CompareDate, MinDateIn, MaxDateOut).
FORMATS Result (F1).
LIST ID DateIn DateOut CompareDate Result .

Here is the output from my LIST command:

      ID      DateIn     DateOut CompareDate Result
 
       1 01-SEP-2004 09-SEP-2004 05-OCT-2004    1
       1 23-SEP-2004 30-SEP-2004 15-JAN-2004    0
       1 10-SEP-2004 09-NOV-2004 06-SEP-2004    1
       2 01-JAN-2004 09-FEB-2004 05-OCT-2004    0
       2 20-FEB-2004 15-MAR-2004 24-SEP-2004    0

Here is the output I get if I run Mario's code, but change his final EXECUTE
to LIST:

      ID      DateIn     DateOut CompareDate   Result
 
       1 01-SEP-2004 09-SEP-2004 05-OCT-2004     1.00
       1 23-SEP-2004 30-SEP-2004 15-JAN-2004      .00
       1 10-SEP-2004 09-NOV-2004 06-SEP-2004     1.00
       2 01-JAN-2004 09-FEB-2004 05-OCT-2004      .00
       2 20-FEB-2004 15-MAR-2004 24-SEP-2004      .00


The results look the same to me.  ;-)  



Mario Giesel-2 wrote
> @Bruce: I think aggregating can create problems if there are dates between
> min and max that are not included in any time interval.
> @Jeff: This can be done along the following logic. I'm sure there are more
> elegant solutions but this is  straightforward and easy to understand.
> --- snip---





-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.
Reply | Threaded
Open this post in threaded view
|

Re: Difficult date question

Rich Ulrich
Bruce,
Yabbut ... His is always right, and yours can fail.  I'm reading
the requirement like he does - that the separate ranges have
to be used. Using RANGE( ) would make his Macro easier to read.

The problem with his is, Do you have a (reasonably small) maximum
number of lines for a case?
--
Rich Ulrich

From: SPSSX(r) Discussion <[hidden email]> on behalf of Bruce Weaver <[hidden email]>
Sent: Saturday, June 27, 2020 9:55 AM
To: [hidden email] <[hidden email]>
Subject: Re: Difficult date question
 
Here's what I had in mind, using Mario's sample dataset.

NEW FILE.
DATASET CLOSE ALL.
DATA LIST FREE(",") /ID (F8) DateIn (DATE) DateOut (DATE) CompareDate
(DATE).
BEGIN DATA
1,01-Sep-04,09-Sep-04,05-Oct-04
1,23-Sep-04,30-Sep-04,15-Jan-04
1,10-Sep-04,09-Nov-04,06-Sep-04
2,01-Jan-04,09-Feb-04,05-Oct-04
2,20-Feb-04,15-Mar-04,24-Sep-04
END DATA.

AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES OVERWRITE=YES
  /BREAK=ID
  /MinDateIn=MIN(DateIn)
  /MaxDateOut=MAX(DateOut).

ALTER TYPE MinDateIn MaxDateOut (DATE11).
COMPUTE Result = RANGE(CompareDate, MinDateIn, MaxDateOut).
FORMATS Result (F1).
LIST ID DateIn DateOut CompareDate Result .

Here is the output from my LIST command:

      ID      DateIn     DateOut CompareDate Result
 
       1 01-SEP-2004 09-SEP-2004 05-OCT-2004    1
       1 23-SEP-2004 30-SEP-2004 15-JAN-2004    0
       1 10-SEP-2004 09-NOV-2004 06-SEP-2004    1
       2 01-JAN-2004 09-FEB-2004 05-OCT-2004    0
       2 20-FEB-2004 15-MAR-2004 24-SEP-2004    0

Here is the output I get if I run Mario's code, but change his final EXECUTE
to LIST:

      ID      DateIn     DateOut CompareDate   Result
 
       1 01-SEP-2004 09-SEP-2004 05-OCT-2004     1.00
       1 23-SEP-2004 30-SEP-2004 15-JAN-2004      .00
       1 10-SEP-2004 09-NOV-2004 06-SEP-2004     1.00
       2 01-JAN-2004 09-FEB-2004 05-OCT-2004      .00
       2 20-FEB-2004 15-MAR-2004 24-SEP-2004      .00


The results look the same to me.  ;-) 



Mario Giesel-2 wrote
> @Bruce: I think aggregating can create problems if there are dates between
> min and max that are not included in any time interval.
> @Jeff: This can be done along the following logic. I'm sure there are more
> elegant solutions but this is  straightforward and easy to understand.
> --- snip---





-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Difficult date question

Bruce Weaver
Administrator
Hi Rich.  You prompted me to re-read the original post more carefully, and I
believe you and Mario are right.  

Here's my first stab at it using multiple ranges.  My main aim was to
eliminate the need to manually intervene to indicate the maximum number of
rows per person in the code.  As a test, I added a 3rd person with 5 rows to
Mario's dataset.

NEW FILE.
DATASET CLOSE ALL.
DATA LIST FREE(",") /ID (F8) DateIn (DATE) DateOut (DATE) CompareDate
(DATE).
BEGIN DATA
1,01-Sep-04,09-Sep-04,05-Oct-04
1,23-Sep-04,30-Sep-04,15-Jan-04
1,10-Sep-04,09-Nov-04,06-Sep-04
2,01-Jan-04,09-Feb-04,05-Oct-04
2,20-Feb-04,15-Mar-04,24-Sep-04
3,01-Sep-04,09-Sep-04,05-Oct-04
3,23-Sep-04,30-Sep-04,15-Jan-04
3,10-Sep-04,09-Nov-04,06-Sep-04
3,01-Jan-04,09-Feb-04,05-Oct-04
3,20-Feb-04,15-Mar-04,24-Sep-04
END DATA.

DATASET NAME raw.
DATASET COPY indates.
DATASET COPY outdates.

* Restructure InDates to wide format.
DATASET ACTIVATE indates.
DELETE VARIABLES CompareDate DateOut.
CASESTOVARS /ID = ID /SEPARATOR="" /GROUPBY=INDEX.
NUMERIC junk1 (F1).
EXECUTE.

* Restructure OutDates to wide format.
DATASET ACTIVATE outdates.
DELETE VARIABLES CompareDate DateIn.
CASESTOVARS /ID = ID /SEPARATOR="" /GROUPBY=INDEX.
NUMERIC junk2 (F1).
EXECUTE.

* Add wide InDate and OutDate variables to the original dataset.
DATASET ACTIVATE raw.
MATCH FILES
 FILE = * /
 TABLE='indates' /
 TABLE='outdates' /
 BY=ID.
EXECUTE.
DATASET CLOSE indates.
DATASET CLOSE outdates.

COMPUTE RESULT = 0.
* Set Result = 1 if CompareDate falls within any of the intervals for that
ID.
DO REPEAT in = DateIn1 to junk1 / out = DateOut1 to junk2.
  IF NOT Result AND NOT MISSING(in) and RANGE(CompareDate,in,out) Result =
1.
END REPEAT.
EXECUTE.
DELETE VARIABLES DateIn1 to junk2.
FORMATS Result (F1).
LIST.

Output from the LIST command (spaces between IDs inserted manually):

      ID      DateIn     DateOut CompareDate RESULT
 
       1 01-SEP-2004 09-SEP-2004 05-OCT-2004    1
       1 23-SEP-2004 30-SEP-2004 15-JAN-2004    0
       1 10-SEP-2004 09-NOV-2004 06-SEP-2004    1

       2 01-JAN-2004 09-FEB-2004 05-OCT-2004    0
       2 20-FEB-2004 15-MAR-2004 24-SEP-2004    0

       3 01-SEP-2004 09-SEP-2004 05-OCT-2004    1
       3 23-SEP-2004 30-SEP-2004 15-JAN-2004    1
       3 10-SEP-2004 09-NOV-2004 06-SEP-2004    1
       3 01-JAN-2004 09-FEB-2004 05-OCT-2004    1
       3 20-FEB-2004 15-MAR-2004 24-SEP-2004    1

I was trying to figure out a way to use VECTOR & LOOP rather than DO-REPEAT
so that I could exit the loop as soon as Result = 0.  But I could not work
out how to supply the maximum number of loops needed without hands on
intervention.  So I went with the (probably) less efficient DO-REPEAT.  

Cheers,
Bruce


Rich Ulrich wrote

> Bruce,
> Yabbut ... His is always right, and yours can fail.  I'm reading
> the requirement like he does - that the separate ranges have
> to be used. Using RANGE( ) would make his Macro easier to read.
>
> The problem with his is, Do you have a (reasonably small) maximum
> number of lines for a case?
> --
> Rich Ulrich
>
> --- snip ---





-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.
Reply | Threaded
Open this post in threaded view
|

Re: Difficult date question

Rich Ulrich
Bruce, and all,
I've noticed a solution that allows an infinite number of
lines for one case, so long as all the dates fall in one limited
range.  For the data in these examples, you could use day-of-
year, for the year 2004. I don't know how much the processing
would slow down when using 2 or 3 or 10 years.

Define a vector, InTreat001 to InTreat366, for day of year. 
Initialize:  RECODE InTreat001 to InTreat366(Else=0). 
Compute DateIn_day and DateOut_day as day-of-year (for each).
First pass - Loop from DateIn_day to DateOut_day to set values of InTreat to 1.
Aggregate by ID, keeping the MAX value, putting it on each line.
Next pass - COMPUTE Response = Intreat(CompareDate_day).
Drop InTreat.  I would be tempted to count the days while I had them.

--
Rich Ulrich


From: SPSSX(r) Discussion <[hidden email]> on behalf of Bruce Weaver <[hidden email]>
Sent: Saturday, June 27, 2020 3:45 PM
To: [hidden email] <[hidden email]>
Subject: Re: Difficult date question
 
Hi Rich.  You prompted me to re-read the original post more carefully, and I
believe you and Mario are right. 

Here's my first stab at it using multiple ranges.  My main aim was to
eliminate the need to manually intervene to indicate the maximum number of
rows per person in the code.  As a test, I added a 3rd person with 5 rows to
Mario's dataset.

NEW FILE.
DATASET CLOSE ALL.
DATA LIST FREE(",") /ID (F8) DateIn (DATE) DateOut (DATE) CompareDate
(DATE).
BEGIN DATA
1,01-Sep-04,09-Sep-04,05-Oct-04
1,23-Sep-04,30-Sep-04,15-Jan-04
1,10-Sep-04,09-Nov-04,06-Sep-04
2,01-Jan-04,09-Feb-04,05-Oct-04
2,20-Feb-04,15-Mar-04,24-Sep-04
3,01-Sep-04,09-Sep-04,05-Oct-04
3,23-Sep-04,30-Sep-04,15-Jan-04
3,10-Sep-04,09-Nov-04,06-Sep-04
3,01-Jan-04,09-Feb-04,05-Oct-04
3,20-Feb-04,15-Mar-04,24-Sep-04
END DATA.

DATASET NAME raw.
DATASET COPY indates.
DATASET COPY outdates.

* Restructure InDates to wide format.
DATASET ACTIVATE indates.
DELETE VARIABLES CompareDate DateOut.
CASESTOVARS /ID = ID /SEPARATOR="" /GROUPBY=INDEX.
NUMERIC junk1 (F1).
EXECUTE.

* Restructure OutDates to wide format.
DATASET ACTIVATE outdates.
DELETE VARIABLES CompareDate DateIn.
CASESTOVARS /ID = ID /SEPARATOR="" /GROUPBY=INDEX.
NUMERIC junk2 (F1).
EXECUTE.

* Add wide InDate and OutDate variables to the original dataset.
DATASET ACTIVATE raw.
MATCH FILES
 FILE = * /
 TABLE='indates' /
 TABLE='outdates' /
 BY=ID.
EXECUTE.
DATASET CLOSE indates.
DATASET CLOSE outdates.

COMPUTE RESULT = 0.
* Set Result = 1 if CompareDate falls within any of the intervals for that
ID.
DO REPEAT in = DateIn1 to junk1 / out = DateOut1 to junk2.
  IF NOT Result AND NOT MISSING(in) and RANGE(CompareDate,in,out) Result =
1.
END REPEAT.
EXECUTE.
DELETE VARIABLES DateIn1 to junk2.
FORMATS Result (F1).
LIST.

Output from the LIST command (spaces between IDs inserted manually):

      ID      DateIn     DateOut CompareDate RESULT
 
       1 01-SEP-2004 09-SEP-2004 05-OCT-2004    1
       1 23-SEP-2004 30-SEP-2004 15-JAN-2004    0
       1 10-SEP-2004 09-NOV-2004 06-SEP-2004    1

       2 01-JAN-2004 09-FEB-2004 05-OCT-2004    0
       2 20-FEB-2004 15-MAR-2004 24-SEP-2004    0

       3 01-SEP-2004 09-SEP-2004 05-OCT-2004    1
       3 23-SEP-2004 30-SEP-2004 15-JAN-2004    1
       3 10-SEP-2004 09-NOV-2004 06-SEP-2004    1
       3 01-JAN-2004 09-FEB-2004 05-OCT-2004    1
       3 20-FEB-2004 15-MAR-2004 24-SEP-2004    1

I was trying to figure out a way to use VECTOR & LOOP rather than DO-REPEAT
so that I could exit the loop as soon as Result = 0.  But I could not work
out how to supply the maximum number of loops needed without hands on
intervention.  So I went with the (probably) less efficient DO-REPEAT. 

Cheers,
Bruce


Rich Ulrich wrote
> Bruce,
> Yabbut ... His is always right, and yours can fail.  I'm reading
> the requirement like he does - that the separate ranges have
> to be used. Using RANGE( ) would make his Macro easier to read.
>
> The problem with his is, Do you have a (reasonably small) maximum
> number of lines for a case?
> --
> Rich Ulrich
>
> --- snip ---





-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Difficult date question

Nkem Ntonghanwah
The following should equally do the job assuming that the earliest day is one of the DayIns and the latest day is one of the DayOuts.
**Its easy to modify if that is not the case.

***Obtain Start date and End date for each person.

AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=ID
  /DIn_Min 'Start Date in'=MIN(DateIn) 
 /DOut_max 'Last day out'=MAX(DateOut).
 
COMPUTE RESULT = CompareDate >= DIn_Min) & (CompareDate <= DOut_max.
VARIABLE LABELS  Day_Within 'Compare day is within Start and end'.
EXECUTE.

FORMATS Day_Within (F1).
DELETE VARIABLES DIn_Min DOut_max.

Best wishes
Forcheh


tested using Bruce data augmented by the original


NEW FILE.
DATASET CLOSE ALL.
DATA LIST FREE(",") /ID (F8) DateIn (DATE) DateOut (DATE) CompareDate
(DATE).
BEGIN DATA
1,01-Sep-04,09-Sep-04,05-Oct-04
1,23-Sep-04,30-Sep-04,15-Jan-04
1,10-Sep-04,09-Nov-04,06-Sep-04
2,01-Jan-04,09-Feb-04,05-Oct-04
2,20-Feb-04,15-Mar-04,24-Sep-04
3,01-Sep-04,09-Sep-04,05-Oct-04
3,23-Sep-04,30-Sep-04,15-Jan-04
3,10-Sep-04,09-Nov-04,06-Sep-04
3,01-Jan-04,09-Feb-04,05-Oct-04
3,20-Feb-04,15-Mar-04,24-Sep-04
5,01-Sep-04,09-Sep-04,05-Oct-04
5,23-Sep-04,30-Sep-04,15-Jan-04
5,10-Sep-04,09-Nov-04,06-Sep-04
6,01-Jan-04,09-Feb-04,05-Oct-04
6,20-Feb-04,15-Mar-04,24-Sept-04
END DATA.
EXECUTE.
 
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=ID
  /DIn_Min 'Start Date in'=MIN(DateIn) 
 /DOut_max 'Last day out'=MAX(DateOut).

COMPUTE Result=CompareDate >= DIn_Min & CompareDate <= DOut_max.
VARIABLE LABELS  Day_Within 'Compare day is within Start and end'.
EXECUTE.

FORMATS Result (F1).
DELETE VARIABLES DIn_Min DOut_max.


best wishes
Forcheh

On Sat, Jun 27, 2020 at 11:36 PM Rich Ulrich <[hidden email]> wrote:
Bruce, and all,
I've noticed a solution that allows an infinite number of
lines for one case, so long as all the dates fall in one limited
range.  For the data in these examples, you could use day-of-
year, for the year 2004. I don't know how much the processing
would slow down when using 2 or 3 or 10 years.

Define a vector, InTreat001 to InTreat366, for day of year. 
Initialize:  RECODE InTreat001 to InTreat366(Else=0). 
Compute DateIn_day and DateOut_day as day-of-year (for each).
First pass - Loop from DateIn_day to DateOut_day to set values of InTreat to 1.
Aggregate by ID, keeping the MAX value, putting it on each line.
Next pass - COMPUTE Response = Intreat(CompareDate_day).
Drop InTreat.  I would be tempted to count the days while I had them.

--
Rich Ulrich


From: SPSSX(r) Discussion <[hidden email]> on behalf of Bruce Weaver <[hidden email]>
Sent: Saturday, June 27, 2020 3:45 PM
To: [hidden email] <[hidden email]>
Subject: Re: Difficult date question
 
Hi Rich.  You prompted me to re-read the original post more carefully, and I
believe you and Mario are right. 

Here's my first stab at it using multiple ranges.  My main aim was to
eliminate the need to manually intervene to indicate the maximum number of
rows per person in the code.  As a test, I added a 3rd person with 5 rows to
Mario's dataset.

NEW FILE.
DATASET CLOSE ALL.
DATA LIST FREE(",") /ID (F8) DateIn (DATE) DateOut (DATE) CompareDate
(DATE).
BEGIN DATA
1,01-Sep-04,09-Sep-04,05-Oct-04
1,23-Sep-04,30-Sep-04,15-Jan-04
1,10-Sep-04,09-Nov-04,06-Sep-04
2,01-Jan-04,09-Feb-04,05-Oct-04
2,20-Feb-04,15-Mar-04,24-Sep-04
3,01-Sep-04,09-Sep-04,05-Oct-04
3,23-Sep-04,30-Sep-04,15-Jan-04
3,10-Sep-04,09-Nov-04,06-Sep-04
3,01-Jan-04,09-Feb-04,05-Oct-04
3,20-Feb-04,15-Mar-04,24-Sep-04
END DATA.

DATASET NAME raw.
DATASET COPY indates.
DATASET COPY outdates.

* Restructure InDates to wide format.
DATASET ACTIVATE indates.
DELETE VARIABLES CompareDate DateOut.
CASESTOVARS /ID = ID /SEPARATOR="" /GROUPBY=INDEX.
NUMERIC junk1 (F1).
EXECUTE.

* Restructure OutDates to wide format.
DATASET ACTIVATE outdates.
DELETE VARIABLES CompareDate DateIn.
CASESTOVARS /ID = ID /SEPARATOR="" /GROUPBY=INDEX.
NUMERIC junk2 (F1).
EXECUTE.

* Add wide InDate and OutDate variables to the original dataset.
DATASET ACTIVATE raw.
MATCH FILES
 FILE = * /
 TABLE='indates' /
 TABLE='outdates' /
 BY=ID.
EXECUTE.
DATASET CLOSE indates.
DATASET CLOSE outdates.

COMPUTE RESULT = 0.
* Set Result = 1 if CompareDate falls within any of the intervals for that
ID.
DO REPEAT in = DateIn1 to junk1 / out = DateOut1 to junk2.
  IF NOT Result AND NOT MISSING(in) and RANGE(CompareDate,in,out) Result =
1.
END REPEAT.
EXECUTE.
DELETE VARIABLES DateIn1 to junk2.
FORMATS Result (F1).
LIST.

Output from the LIST command (spaces between IDs inserted manually):

      ID      DateIn     DateOut CompareDate RESULT
 
       1 01-SEP-2004 09-SEP-2004 05-OCT-2004    1
       1 23-SEP-2004 30-SEP-2004 15-JAN-2004    0
       1 10-SEP-2004 09-NOV-2004 06-SEP-2004    1

       2 01-JAN-2004 09-FEB-2004 05-OCT-2004    0
       2 20-FEB-2004 15-MAR-2004 24-SEP-2004    0

       3 01-SEP-2004 09-SEP-2004 05-OCT-2004    1
       3 23-SEP-2004 30-SEP-2004 15-JAN-2004    1
       3 10-SEP-2004 09-NOV-2004 06-SEP-2004    1
       3 01-JAN-2004 09-FEB-2004 05-OCT-2004    1
       3 20-FEB-2004 15-MAR-2004 24-SEP-2004    1

I was trying to figure out a way to use VECTOR & LOOP rather than DO-REPEAT
so that I could exit the loop as soon as Result = 0.  But I could not work
out how to supply the maximum number of loops needed without hands on
intervention.  So I went with the (probably) less efficient DO-REPEAT. 

Cheers,
Bruce


Rich Ulrich wrote
> Bruce,
> Yabbut ... His is always right, and yours can fail.  I'm reading
> the requirement like he does - that the separate ranges have
> to be used. Using RANGE( ) would make his Macro easier to read.
>
> The problem with his is, Do you have a (reasonably small) maximum
> number of lines for a case?
> --
> Rich Ulrich
>
> --- snip ---





-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Difficult date question

Nkem Ntonghanwah
corrected line  : COMPUTE RESULT = CompareDate >= DIn_Min) & (CompareDate <= DOut_max.  to:

COMPUTE Result=CompareDate >= DIn_Min & CompareDate <= DOut_max.

On Sun, Jun 28, 2020 at 1:11 AM Nkem Ntonghanwah <[hidden email]> wrote:
The following should equally do the job assuming that the earliest day is one of the DayIns and the latest day is one of the DayOuts.
**Its easy to modify if that is not the case.

***Obtain Start date and End date for each person.

AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=ID
  /DIn_Min 'Start Date in'=MIN(DateIn) 
 /DOut_max 'Last day out'=MAX(DateOut).
 
COMPUTE RESULT = CompareDate >= DIn_Min) & (CompareDate <= DOut_max.
VARIABLE LABELS  Day_Within 'Compare day is within Start and end'.
EXECUTE.

FORMATS Day_Within (F1).
DELETE VARIABLES DIn_Min DOut_max.

Best wishes
Forcheh


tested using Bruce data augmented by the original


NEW FILE.
DATASET CLOSE ALL.
DATA LIST FREE(",") /ID (F8) DateIn (DATE) DateOut (DATE) CompareDate
(DATE).
BEGIN DATA
1,01-Sep-04,09-Sep-04,05-Oct-04
1,23-Sep-04,30-Sep-04,15-Jan-04
1,10-Sep-04,09-Nov-04,06-Sep-04
2,01-Jan-04,09-Feb-04,05-Oct-04
2,20-Feb-04,15-Mar-04,24-Sep-04
3,01-Sep-04,09-Sep-04,05-Oct-04
3,23-Sep-04,30-Sep-04,15-Jan-04
3,10-Sep-04,09-Nov-04,06-Sep-04
3,01-Jan-04,09-Feb-04,05-Oct-04
3,20-Feb-04,15-Mar-04,24-Sep-04
5,01-Sep-04,09-Sep-04,05-Oct-04
5,23-Sep-04,30-Sep-04,15-Jan-04
5,10-Sep-04,09-Nov-04,06-Sep-04
6,01-Jan-04,09-Feb-04,05-Oct-04
6,20-Feb-04,15-Mar-04,24-Sept-04
END DATA.
EXECUTE.
 
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=ID
  /DIn_Min 'Start Date in'=MIN(DateIn) 
 /DOut_max 'Last day out'=MAX(DateOut).

COMPUTE Result=CompareDate >= DIn_Min & CompareDate <= DOut_max.
VARIABLE LABELS  Day_Within 'Compare day is within Start and end'.
EXECUTE.

FORMATS Result (F1).
DELETE VARIABLES DIn_Min DOut_max.


best wishes
Forcheh

On Sat, Jun 27, 2020 at 11:36 PM Rich Ulrich <[hidden email]> wrote:
Bruce, and all,
I've noticed a solution that allows an infinite number of
lines for one case, so long as all the dates fall in one limited
range.  For the data in these examples, you could use day-of-
year, for the year 2004. I don't know how much the processing
would slow down when using 2 or 3 or 10 years.

Define a vector, InTreat001 to InTreat366, for day of year. 
Initialize:  RECODE InTreat001 to InTreat366(Else=0). 
Compute DateIn_day and DateOut_day as day-of-year (for each).
First pass - Loop from DateIn_day to DateOut_day to set values of InTreat to 1.
Aggregate by ID, keeping the MAX value, putting it on each line.
Next pass - COMPUTE Response = Intreat(CompareDate_day).
Drop InTreat.  I would be tempted to count the days while I had them.

--
Rich Ulrich


From: SPSSX(r) Discussion <[hidden email]> on behalf of Bruce Weaver <[hidden email]>
Sent: Saturday, June 27, 2020 3:45 PM
To: [hidden email] <[hidden email]>
Subject: Re: Difficult date question
 
Hi Rich.  You prompted me to re-read the original post more carefully, and I
believe you and Mario are right. 

Here's my first stab at it using multiple ranges.  My main aim was to
eliminate the need to manually intervene to indicate the maximum number of
rows per person in the code.  As a test, I added a 3rd person with 5 rows to
Mario's dataset.

NEW FILE.
DATASET CLOSE ALL.
DATA LIST FREE(",") /ID (F8) DateIn (DATE) DateOut (DATE) CompareDate
(DATE).
BEGIN DATA
1,01-Sep-04,09-Sep-04,05-Oct-04
1,23-Sep-04,30-Sep-04,15-Jan-04
1,10-Sep-04,09-Nov-04,06-Sep-04
2,01-Jan-04,09-Feb-04,05-Oct-04
2,20-Feb-04,15-Mar-04,24-Sep-04
3,01-Sep-04,09-Sep-04,05-Oct-04
3,23-Sep-04,30-Sep-04,15-Jan-04
3,10-Sep-04,09-Nov-04,06-Sep-04
3,01-Jan-04,09-Feb-04,05-Oct-04
3,20-Feb-04,15-Mar-04,24-Sep-04
END DATA.

DATASET NAME raw.
DATASET COPY indates.
DATASET COPY outdates.

* Restructure InDates to wide format.
DATASET ACTIVATE indates.
DELETE VARIABLES CompareDate DateOut.
CASESTOVARS /ID = ID /SEPARATOR="" /GROUPBY=INDEX.
NUMERIC junk1 (F1).
EXECUTE.

* Restructure OutDates to wide format.
DATASET ACTIVATE outdates.
DELETE VARIABLES CompareDate DateIn.
CASESTOVARS /ID = ID /SEPARATOR="" /GROUPBY=INDEX.
NUMERIC junk2 (F1).
EXECUTE.

* Add wide InDate and OutDate variables to the original dataset.
DATASET ACTIVATE raw.
MATCH FILES
 FILE = * /
 TABLE='indates' /
 TABLE='outdates' /
 BY=ID.
EXECUTE.
DATASET CLOSE indates.
DATASET CLOSE outdates.

COMPUTE RESULT = 0.
* Set Result = 1 if CompareDate falls within any of the intervals for that
ID.
DO REPEAT in = DateIn1 to junk1 / out = DateOut1 to junk2.
  IF NOT Result AND NOT MISSING(in) and RANGE(CompareDate,in,out) Result =
1.
END REPEAT.
EXECUTE.
DELETE VARIABLES DateIn1 to junk2.
FORMATS Result (F1).
LIST.

Output from the LIST command (spaces between IDs inserted manually):

      ID      DateIn     DateOut CompareDate RESULT
 
       1 01-SEP-2004 09-SEP-2004 05-OCT-2004    1
       1 23-SEP-2004 30-SEP-2004 15-JAN-2004    0
       1 10-SEP-2004 09-NOV-2004 06-SEP-2004    1

       2 01-JAN-2004 09-FEB-2004 05-OCT-2004    0
       2 20-FEB-2004 15-MAR-2004 24-SEP-2004    0

       3 01-SEP-2004 09-SEP-2004 05-OCT-2004    1
       3 23-SEP-2004 30-SEP-2004 15-JAN-2004    1
       3 10-SEP-2004 09-NOV-2004 06-SEP-2004    1
       3 01-JAN-2004 09-FEB-2004 05-OCT-2004    1
       3 20-FEB-2004 15-MAR-2004 24-SEP-2004    1

I was trying to figure out a way to use VECTOR & LOOP rather than DO-REPEAT
so that I could exit the loop as soon as Result = 0.  But I could not work
out how to supply the maximum number of loops needed without hands on
intervention.  So I went with the (probably) less efficient DO-REPEAT. 

Cheers,
Bruce


Rich Ulrich wrote
> Bruce,
> Yabbut ... His is always right, and yours can fail.  I'm reading
> the requirement like he does - that the separate ranges have
> to be used. Using RANGE( ) would make his Macro easier to read.
>
> The problem with his is, Do you have a (reasonably small) maximum
> number of lines for a case?
> --
> Rich Ulrich
>
> --- snip ---





-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Difficult date question

Art Kendall
In reply to this post by Rich Ulrich
Interesting approach.

The use of RECODE to initialize is something I'll try to remember.
I appreciate that you frequently bring insight to the questions posed on
this list.

 "I don't know how much the processing would slow down when using 2 or 3 or
10 years. "

That gives you away as one of us old-timers.

It is several years since I timed SPSS runs using different approaches.
YMMV. With today's computers the difference in CPU time, memory, or storage
is usually negligible. It is even becoming increasingly rare for me to
bother redrafting syntax only on a subset of the data.

Often the personnel time (1) to quality-assurance check the syntax and (2)
explain the syntax to clients or critics is much more of a consideration.
So I emphasize the readability of syntax.








-----
Art Kendall
Social Research Consultants
--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Art Kendall
Social Research Consultants
Reply | Threaded
Open this post in threaded view
|

Re: Difficult date question

Nkem Ntonghanwah
In reply to this post by Rich Ulrich
Dear All,
I included the following cases to the data and tested using Bruce update with restructuring etc.
Bruce version gives error messages - "In range 1st value greater than the second value, result set to missing" 
But the final result is exactly the same as using the simple aggregate and compute that I posted (below) which never generates any errors and don't depend on date range, etc..

5,01-Jul-04,09-Sep-04,05-Oct-04
5,23-Jan-05,30-Dec-04,15-Jan-04
5,10-Sep-04,09-Nov-04,06-Sep-04

 
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=ID
  /DIn_Min 'Start Date in'=MIN(DateIn) 
 /DOut_max 'Last day out'=MAX(DateOut).

COMPUTE Result=CompareDate >= DIn_Min & CompareDate <= DOut_max.
VARIABLE LABELS Result 'Compare day is within Start and end'.
EXECUTE.

FORMATS Result (F1).
DELETE VARIABLES DIn_Min DOut_max.

 I would like a counter example where this doesn't work
Thanks.
Forcheh

On Sat, Jun 27, 2020 at 11:36 PM Rich Ulrich <[hidden email]> wrote:
Bruce, and all,
I've noticed a solution that allows an infinite number of
lines for one case, so long as all the dates fall in one limited
range.  For the data in these examples, you could use day-of-
year, for the year 2004. I don't know how much the processing
would slow down when using 2 or 3 or 10 years.

Define a vector, InTreat001 to InTreat366, for day of year. 
Initialize:  RECODE InTreat001 to InTreat366(Else=0). 
Compute DateIn_day and DateOut_day as day-of-year (for each).
First pass - Loop from DateIn_day to DateOut_day to set values of InTreat to 1.
Aggregate by ID, keeping the MAX value, putting it on each line.
Next pass - COMPUTE Response = Intreat(CompareDate_day).
Drop InTreat.  I would be tempted to count the days while I had them.

--
Rich Ulrich


From: SPSSX(r) Discussion <[hidden email]> on behalf of Bruce Weaver <[hidden email]>
Sent: Saturday, June 27, 2020 3:45 PM
To: [hidden email] <[hidden email]>
Subject: Re: Difficult date question
 
Hi Rich.  You prompted me to re-read the original post more carefully, and I
believe you and Mario are right. 

Here's my first stab at it using multiple ranges.  My main aim was to
eliminate the need to manually intervene to indicate the maximum number of
rows per person in the code.  As a test, I added a 3rd person with 5 rows to
Mario's dataset.

NEW FILE.
DATASET CLOSE ALL.
DATA LIST FREE(",") /ID (F8) DateIn (DATE) DateOut (DATE) CompareDate
(DATE).
BEGIN DATA
1,01-Sep-04,09-Sep-04,05-Oct-04
1,23-Sep-04,30-Sep-04,15-Jan-04
1,10-Sep-04,09-Nov-04,06-Sep-04
2,01-Jan-04,09-Feb-04,05-Oct-04
2,20-Feb-04,15-Mar-04,24-Sep-04
3,01-Sep-04,09-Sep-04,05-Oct-04
3,23-Sep-04,30-Sep-04,15-Jan-04
3,10-Sep-04,09-Nov-04,06-Sep-04
3,01-Jan-04,09-Feb-04,05-Oct-04
3,20-Feb-04,15-Mar-04,24-Sep-04
END DATA.

DATASET NAME raw.
DATASET COPY indates.
DATASET COPY outdates.

* Restructure InDates to wide format.
DATASET ACTIVATE indates.
DELETE VARIABLES CompareDate DateOut.
CASESTOVARS /ID = ID /SEPARATOR="" /GROUPBY=INDEX.
NUMERIC junk1 (F1).
EXECUTE.

* Restructure OutDates to wide format.
DATASET ACTIVATE outdates.
DELETE VARIABLES CompareDate DateIn.
CASESTOVARS /ID = ID /SEPARATOR="" /GROUPBY=INDEX.
NUMERIC junk2 (F1).
EXECUTE.

* Add wide InDate and OutDate variables to the original dataset.
DATASET ACTIVATE raw.
MATCH FILES
 FILE = * /
 TABLE='indates' /
 TABLE='outdates' /
 BY=ID.
EXECUTE.
DATASET CLOSE indates.
DATASET CLOSE outdates.

COMPUTE RESULT = 0.
* Set Result = 1 if CompareDate falls within any of the intervals for that
ID.
DO REPEAT in = DateIn1 to junk1 / out = DateOut1 to junk2.
  IF NOT Result AND NOT MISSING(in) and RANGE(CompareDate,in,out) Result =
1.
END REPEAT.
EXECUTE.
DELETE VARIABLES DateIn1 to junk2.
FORMATS Result (F1).
LIST.

Output from the LIST command (spaces between IDs inserted manually):

      ID      DateIn     DateOut CompareDate RESULT
 
       1 01-SEP-2004 09-SEP-2004 05-OCT-2004    1
       1 23-SEP-2004 30-SEP-2004 15-JAN-2004    0
       1 10-SEP-2004 09-NOV-2004 06-SEP-2004    1

       2 01-JAN-2004 09-FEB-2004 05-OCT-2004    0
       2 20-FEB-2004 15-MAR-2004 24-SEP-2004    0

       3 01-SEP-2004 09-SEP-2004 05-OCT-2004    1
       3 23-SEP-2004 30-SEP-2004 15-JAN-2004    1
       3 10-SEP-2004 09-NOV-2004 06-SEP-2004    1
       3 01-JAN-2004 09-FEB-2004 05-OCT-2004    1
       3 20-FEB-2004 15-MAR-2004 24-SEP-2004    1

I was trying to figure out a way to use VECTOR & LOOP rather than DO-REPEAT
so that I could exit the loop as soon as Result = 0.  But I could not work
out how to supply the maximum number of loops needed without hands on
intervention.  So I went with the (probably) less efficient DO-REPEAT. 

Cheers,
Bruce


Rich Ulrich wrote
> Bruce,
> Yabbut ... His is always right, and yours can fail.  I'm reading
> the requirement like he does - that the separate ranges have
> to be used. Using RANGE( ) would make his Macro easier to read.
>
> The problem with his is, Do you have a (reasonably small) maximum
> number of lines for a case?
> --
> Rich Ulrich
>
> --- snip ---





-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Difficult date question

Bruce Weaver
Administrator
In reply to this post by Rich Ulrich
I had a go at implementing Rich's pseudo-code.  And after enlisting his help
on one point that was confusing me, I believe I've now got a working
version.  

* Implementation of Rich Ulrich's suggestion, posted 27-Jun-2020.

NEW FILE.
DATASET CLOSE ALL.
DATA LIST FREE(",") /ID (F8) DateIn (DATE) DateOut (DATE) CompareDate
(DATE).
BEGIN DATA
1,01-Sep-04,09-Sep-04,05-Oct-04
1,23-Sep-04,30-Sep-04,15-Jan-04
1,10-Sep-04,09-Nov-04,06-Sep-04
2,01-Jan-04,09-Feb-04,05-Oct-04
2,20-Feb-04,15-Mar-04,24-Sep-04
3,01-Sep-04,09-Sep-04,05-Oct-04
3,23-Sep-04,30-Sep-04,15-Jan-04
3,10-Sep-04,09-Nov-04,06-Sep-04
3,01-Jan-04,09-Feb-04,05-Oct-04
3,20-Feb-04,15-Mar-04,24-Sep-04
END DATA.

COMPUTE doy1 = XDATE.JDATE(DateIn).
COMPUTE doy2 = XDATE.JDATE(DateOut).
COMPUTE doycd = XDATE.JDATE(CompareDate).
FORMATS doy1 to doycd (F5.0).

* Define a vector, InTreat001 to InTreat366, for day of year.
* Initialize:  RECODE InTreat001 to InTreat366(Else=0).

NUMERIC InTx001 to InTx366 (F1).
RECODE InTx001 to InTx366 (ELSE=0).

* First pass - Loop from DateIn_day to DateOut_day to set values of InTreat
to 1.
VECTOR InTx = InTx001 to InTx366.
LOOP # = doy1 to doy2.
  COMPUTE InTx(#) = 1.
END LOOP.
EXECUTE.

* Aggregate by ID, keeping the MAX values of InTx, writing them to each
line.
  AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES OVERWRITE=YES
  /BREAK=ID
  /InTx001 to InTx366 = MAX(InTx001 to InTx366).

* Next pass - COMPUTE Response = Intreat(CompareDate_day).

VECTOR InTx = InTx001 to InTx366.
COMPUTE Result = InTx(doycd).
COMPUTE DaysInTx = SUM(InTx001 to InTx366).
FORMATS Result (F1) / DaysInTx (F5.0).
EXECUTE.  /* Needed before DELETE VARIABLES.
DELETE VARIABLES InTx001 to InTx366.
LIST.

Output from LIST:  

      ID      DateIn     DateOut CompareDate  doy1  doy2 doycd Result
DaysInTx
 
       1 01-SEP-2004 09-SEP-2004 05-OCT-2004   245   253   279    1      
70
       1 23-SEP-2004 30-SEP-2004 15-JAN-2004   267   274    15    0      
70
       1 10-SEP-2004 09-NOV-2004 06-SEP-2004   254   314   250    1      
70
       2 01-JAN-2004 09-FEB-2004 05-OCT-2004     1    40   279    0      
65
       2 20-FEB-2004 15-MAR-2004 24-SEP-2004    51    75   268    0      
65
       3 01-SEP-2004 09-SEP-2004 05-OCT-2004   245   253   279    1      
135
       3 23-SEP-2004 30-SEP-2004 15-JAN-2004   267   274    15    1      
135
       3 10-SEP-2004 09-NOV-2004 06-SEP-2004   254   314   250    1      
135
       3 01-JAN-2004 09-FEB-2004 05-OCT-2004     1    40   279    1      
135
       3 20-FEB-2004 15-MAR-2004 24-SEP-2004    51    75   268    1      
135


As Rich noted, tweaks would be needed if dates are not all within the same
calendar year.


Rich Ulrich wrote

> Bruce,
> Yabbut ... His is always right, and yours can fail.  I'm reading
> the requirement like he does - that the separate ranges have
> to be used. Using RANGE( ) would make his Macro easier to read.
>
> The problem with his is, Do you have a (reasonably small) maximum
> number of lines for a case?
> --
> Rich Ulrich
>
> --- snip ---





-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.
Reply | Threaded
Open this post in threaded view
|

Re: Difficult date question

Rich Ulrich
Bruce, thanks for showing it.

It occurs to me that the method can save more information that might
be interesting if the Aggregate uses SUM( ) instead of MAX( ).

In that case, Result will contain the number of treatments that
a person was enrolled in at the time of the event, instead of No/Yes.
(To obtain No/Yes like the original Result, I would use RECODE  .... /INTO ) .

And SUM( ) for DaysInTx will report a subtly different value - the sum
of days in treatments, instead of the count of calendar days for which
the person was in any treatment. You could get the latter from the
vector by using  COUNT(1,2,... max num of simultaneous treatments).

As an old-timer, I respect the possibility that someone might want to
generalize the algorithm. "Thousands" will offer little slowdown so long
as everything stays in memory, not on disk.  A file of millions of records
could slow down. Using a vector-length that is in the millions (seconds,
not days?) would be slow merely if that initializing takes time; I suspect
that initializing isn't required. Still, anything that has to page to disk will
take time.

--
Rich Ulrich


From: SPSSX(r) Discussion <[hidden email]> on behalf of Bruce Weaver <[hidden email]>
Sent: Sunday, June 28, 2020 1:45 PM
To: [hidden email] <[hidden email]>
Subject: Re: Difficult date question
 
I had a go at implementing Rich's pseudo-code.  And after enlisting his help
on one point that was confusing me, I believe I've now got a working
version. 

* Implementation of Rich Ulrich's suggestion, posted 27-Jun-2020.

NEW FILE.
DATASET CLOSE ALL.
DATA LIST FREE(",") /ID (F8) DateIn (DATE) DateOut (DATE) CompareDate
(DATE).
BEGIN DATA
1,01-Sep-04,09-Sep-04,05-Oct-04
1,23-Sep-04,30-Sep-04,15-Jan-04
1,10-Sep-04,09-Nov-04,06-Sep-04
2,01-Jan-04,09-Feb-04,05-Oct-04
2,20-Feb-04,15-Mar-04,24-Sep-04
3,01-Sep-04,09-Sep-04,05-Oct-04
3,23-Sep-04,30-Sep-04,15-Jan-04
3,10-Sep-04,09-Nov-04,06-Sep-04
3,01-Jan-04,09-Feb-04,05-Oct-04
3,20-Feb-04,15-Mar-04,24-Sep-04
END DATA.

COMPUTE doy1 = XDATE.JDATE(DateIn).
COMPUTE doy2 = XDATE.JDATE(DateOut).
COMPUTE doycd = XDATE.JDATE(CompareDate).
FORMATS doy1 to doycd (F5.0).

* Define a vector, InTreat001 to InTreat366, for day of year.
* Initialize:  RECODE InTreat001 to InTreat366(Else=0).

NUMERIC InTx001 to InTx366 (F1).
RECODE InTx001 to InTx366 (ELSE=0).

* First pass - Loop from DateIn_day to DateOut_day to set values of InTreat
to 1.
VECTOR InTx = InTx001 to InTx366.
LOOP # = doy1 to doy2.
  COMPUTE InTx(#) = 1.
END LOOP.
EXECUTE.

* Aggregate by ID, keeping the MAX values of InTx, writing them to each
line.
  AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES OVERWRITE=YES
  /BREAK=ID
  /InTx001 to InTx366 = MAX(InTx001 to InTx366).

* Next pass - COMPUTE Response = Intreat(CompareDate_day).

VECTOR InTx = InTx001 to InTx366.
COMPUTE Result = InTx(doycd).
COMPUTE DaysInTx = SUM(InTx001 to InTx366).
FORMATS Result (F1) / DaysInTx (F5.0).
EXECUTE.  /* Needed before DELETE VARIABLES.
DELETE VARIABLES InTx001 to InTx366.
LIST.

Output from LIST: 

      ID      DateIn     DateOut CompareDate  doy1  doy2 doycd Result
DaysInTx
 
       1 01-SEP-2004 09-SEP-2004 05-OCT-2004   245   253   279    1      
70
       1 23-SEP-2004 30-SEP-2004 15-JAN-2004   267   274    15    0      
70
       1 10-SEP-2004 09-NOV-2004 06-SEP-2004   254   314   250    1      
70
       2 01-JAN-2004 09-FEB-2004 05-OCT-2004     1    40   279    0      
65
       2 20-FEB-2004 15-MAR-2004 24-SEP-2004    51    75   268    0      
65
       3 01-SEP-2004 09-SEP-2004 05-OCT-2004   245   253   279    1     
135
       3 23-SEP-2004 30-SEP-2004 15-JAN-2004   267   274    15    1     
135
       3 10-SEP-2004 09-NOV-2004 06-SEP-2004   254   314   250    1     
135
       3 01-JAN-2004 09-FEB-2004 05-OCT-2004     1    40   279    1     
135
       3 20-FEB-2004 15-MAR-2004 24-SEP-2004    51    75   268    1     
135


As Rich noted, tweaks would be needed if dates are not all within the same
calendar year.


Rich Ulrich wrote
> Bruce,
> Yabbut ... His is always right, and yours can fail.  I'm reading
> the requirement like he does - that the separate ranges have
> to be used. Using RANGE( ) would make his Macro easier to read.
>
> The problem with his is, Do you have a (reasonably small) maximum
> number of lines for a case?
> --
> Rich Ulrich
>
> --- snip ---





-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Difficult date question

Mario Giesel-2
In reply to this post by Nkem Ntonghanwah
Counter example:

5,01-Jul-04,09-Sep-04,05-Oct-04
5,23-Jan-05,30-Apr-04,15-Jun-04
5,10-Sep-04,09-Nov-04,06-Sep-04

The second instance will probably be wrong.

--------------------------------------------

Am Dienstag, 30. Juni 2020, 13:50:26 MESZ hat Nkem Ntonghanwah <[hidden email]> Folgendes geschrieben:


Dear All,
I included the following cases to the data and tested using Bruce update with restructuring etc.
Bruce version gives error messages - "In range 1st value greater than the second value, result set to missing" 
But the final result is exactly the same as using the simple aggregate and compute that I posted (below) which never generates any errors and don't depend on date range, etc..

5,01-Jul-04,09-Sep-04,05-Oct-04
5,23-Jan-05,30-Dec-04,15-Jan-04
5,10-Sep-04,09-Nov-04,06-Sep-04

 
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=ID
  /DIn_Min 'Start Date in'=MIN(DateIn) 
 /DOut_max 'Last day out'=MAX(DateOut).

COMPUTE Result=CompareDate >= DIn_Min & CompareDate <= DOut_max.
VARIABLE LABELS Result 'Compare day is within Start and end'.
EXECUTE.

FORMATS Result (F1).
DELETE VARIABLES DIn_Min DOut_max.

 I would like a counter example where this doesn't work
Thanks.
Forcheh

On Sat, Jun 27, 2020 at 11:36 PM Rich Ulrich <[hidden email]> wrote:
Bruce, and all,
I've noticed a solution that allows an infinite number of
lines for one case, so long as all the dates fall in one limited
range.  For the data in these examples, you could use day-of-
year, for the year 2004. I don't know how much the processing
would slow down when using 2 or 3 or 10 years.

Define a vector, InTreat001 to InTreat366, for day of year. 
Initialize:  RECODE InTreat001 to InTreat366(Else=0). 
Compute DateIn_day and DateOut_day as day-of-year (for each).
First pass - Loop from DateIn_day to DateOut_day to set values of InTreat to 1.
Aggregate by ID, keeping the MAX value, putting it on each line.
Next pass - COMPUTE Response = Intreat(CompareDate_day).
Drop InTreat.  I would be tempted to count the days while I had them.

--
Rich Ulrich


From: SPSSX(r) Discussion <[hidden email]> on behalf of Bruce Weaver <[hidden email]>
Sent: Saturday, June 27, 2020 3:45 PM
To: [hidden email] <[hidden email]>
Subject: Re: Difficult date question
 
Hi Rich.  You prompted me to re-read the original post more carefully, and I
believe you and Mario are right. 

Here's my first stab at it using multiple ranges.  My main aim was to
eliminate the need to manually intervene to indicate the maximum number of
rows per person in the code.  As a test, I added a 3rd person with 5 rows to
Mario's dataset.

NEW FILE.
DATASET CLOSE ALL.
DATA LIST FREE(",") /ID (F8) DateIn (DATE) DateOut (DATE) CompareDate
(DATE).
BEGIN DATA
1,01-Sep-04,09-Sep-04,05-Oct-04
1,23-Sep-04,30-Sep-04,15-Jan-04
1,10-Sep-04,09-Nov-04,06-Sep-04
2,01-Jan-04,09-Feb-04,05-Oct-04
2,20-Feb-04,15-Mar-04,24-Sep-04
3,01-Sep-04,09-Sep-04,05-Oct-04
3,23-Sep-04,30-Sep-04,15-Jan-04
3,10-Sep-04,09-Nov-04,06-Sep-04
3,01-Jan-04,09-Feb-04,05-Oct-04
3,20-Feb-04,15-Mar-04,24-Sep-04
END DATA.

DATASET NAME raw.
DATASET COPY indates.
DATASET COPY outdates.

* Restructure InDates to wide format.
DATASET ACTIVATE indates.
DELETE VARIABLES CompareDate DateOut.
CASESTOVARS /ID = ID /SEPARATOR="" /GROUPBY=INDEX.
NUMERIC junk1 (F1).
EXECUTE.

* Restructure OutDates to wide format.
DATASET ACTIVATE outdates.
DELETE VARIABLES CompareDate DateIn.
CASESTOVARS /ID = ID /SEPARATOR="" /GROUPBY=INDEX.
NUMERIC junk2 (F1).
EXECUTE.

* Add wide InDate and OutDate variables to the original dataset.
DATASET ACTIVATE raw.
MATCH FILES
 FILE = * /
 TABLE='indates' /
 TABLE='outdates' /
 BY=ID.
EXECUTE.
DATASET CLOSE indates.
DATASET CLOSE outdates.

COMPUTE RESULT = 0.
* Set Result = 1 if CompareDate falls within any of the intervals for that
ID.
DO REPEAT in = DateIn1 to junk1 / out = DateOut1 to junk2.
  IF NOT Result AND NOT MISSING(in) and RANGE(CompareDate,in,out) Result =
1.
END REPEAT.
EXECUTE.
DELETE VARIABLES DateIn1 to junk2.
FORMATS Result (F1).
LIST.

Output from the LIST command (spaces between IDs inserted manually):

      ID      DateIn     DateOut CompareDate RESULT
 
       1 01-SEP-2004 09-SEP-2004 05-OCT-2004    1
       1 23-SEP-2004 30-SEP-2004 15-JAN-2004    0
       1 10-SEP-2004 09-NOV-2004 06-SEP-2004    1

       2 01-JAN-2004 09-FEB-2004 05-OCT-2004    0
       2 20-FEB-2004 15-MAR-2004 24-SEP-2004    0

       3 01-SEP-2004 09-SEP-2004 05-OCT-2004    1
       3 23-SEP-2004 30-SEP-2004 15-JAN-2004    1
       3 10-SEP-2004 09-NOV-2004 06-SEP-2004    1
       3 01-JAN-2004 09-FEB-2004 05-OCT-2004    1
       3 20-FEB-2004 15-MAR-2004 24-SEP-2004    1

I was trying to figure out a way to use VECTOR & LOOP rather than DO-REPEAT
so that I could exit the loop as soon as Result = 0.  But I could not work
out how to supply the maximum number of loops needed without hands on
intervention.  So I went with the (probably) less efficient DO-REPEAT. 

Cheers,
Bruce


Rich Ulrich wrote
> Bruce,
> Yabbut ... His is always right, and yours can fail.  I'm reading
> the requirement like he does - that the separate ranges have
> to be used. Using RANGE( ) would make his Macro easier to read.
>
> The problem with his is, Do you have a (reasonably small) maximum
> number of lines for a case?
> --
> Rich Ulrich
>
> --- snip ---





-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Difficult date question

Bruce Weaver
Administrator
In reply to this post by Nkem Ntonghanwah
For some reason, I did not see any of Forcheh's posts in the Nabble archive
until today.  Presumably, they were visible sooner to those who receive
posts via email.  

If the reference below is to the revised code I posted on 27-Jun-2020 (after
being corrected by MG and RU), then yes, when I use Forcheh's new cases for
ID=5 where DateIn > DateOut on one row, I also get those warning messages.
(Note that they are warning messages, not error messages.)  But I was able
to eliminate them by making a change in the DO-IF structure near the end of
that code.  Here are the relevant bits of code, original version first
followed by the revised version.

From my code posted on 27-Jun-2020:

COMPUTE RESULT = 0.
* Set Result = 1 if CompareDate falls within any of the intervals for that
ID.
DO REPEAT in = DateIn1 to junk1 / out = DateOut1 to junk2.
  IF NOT Result AND NOT MISSING(in) and RANGE(CompareDate,in,out) Result =
1.
END REPEAT.
EXECUTE.

Revised version posted on 30-Jun-2020:

COMPUTE RESULT = 0.
* Set Result = 1 if CompareDate falls within any of the intervals for that
ID.
DO REPEAT in = DateIn1 to junk1 / out = DateOut1 to junk2.
 DO IF NOT MISSING(in) AND (in LE out) and NOT Result.
  IF RANGE(CompareDate,in,out) Result = 1.
 END IF.
END REPEAT.
EXECUTE.

HTH.

PS- To avoid confusion about which version this is referring to, here is the
complete code.

* Revised version of BW's second attempt after being corrected by MG & RU.
* Original version of this code was posted on Jun 27, 2020.
* In the dataset below, DateIn > DateOut for ID 5's 2nd row.

NEW FILE.
DATASET CLOSE ALL.
DATA LIST FREE(",") /ID (F8) DateIn (DATE) DateOut (DATE) CompareDate
(DATE).
BEGIN DATA
1,01-Sep-04,09-Sep-04,05-Oct-04
1,23-Sep-04,30-Sep-04,15-Jan-04
1,10-Sep-04,09-Nov-04,06-Sep-04
2,01-Jan-04,09-Feb-04,05-Oct-04
2,20-Feb-04,15-Mar-04,24-Sep-04
3,01-Sep-04,09-Sep-04,05-Oct-04
3,23-Sep-04,30-Sep-04,15-Jan-04
3,10-Sep-04,09-Nov-04,06-Sep-04
3,01-Jan-04,09-Feb-04,05-Oct-04
3,20-Feb-04,15-Mar-04,24-Sep-04
5,01-Jul-04,09-Sep-04,05-Oct-04
5,23-Jan-05,30-Dec-04,15-Jan-04
5,10-Sep-04,09-Nov-04,06-Sep-04
END DATA.

* Notice that DateIn > DateOut ID 5's second observation.
* My original code would result in Warning messages for ID 5.
* I have changed the conditional statements in the DO-REPEAT
* structure below to eliminate those warnings.

DATASET NAME raw.
DATASET COPY indates.
DATASET COPY outdates.

* Restructure InDates to wide format.
DATASET ACTIVATE indates.
DELETE VARIABLES CompareDate DateOut.
CASESTOVARS /ID = ID /SEPARATOR="" /GROUPBY=INDEX.
NUMERIC junk1 (F1).
EXECUTE.

* Restructure OutDates to wide format.
DATASET ACTIVATE outdates.
DELETE VARIABLES CompareDate DateIn.
CASESTOVARS /ID = ID /SEPARATOR="" /GROUPBY=INDEX.
NUMERIC junk2 (F1).
EXECUTE.

* Add wide InDate and OutDate variables to the original dataset.
DATASET ACTIVATE raw.
MATCH FILES
 FILE = * /
 TABLE='indates' /
 TABLE='outdates' /
 BY=ID.
EXECUTE.
DATASET CLOSE indates.
DATASET CLOSE outdates.

COMPUTE RESULT = 0.
* Set Result = 1 if CompareDate falls within any of the intervals for that
ID.
DO REPEAT in = DateIn1 to junk1 / out = DateOut1 to junk2.
 DO IF NOT MISSING(in) AND (in LE out) and NOT Result.
  IF RANGE(CompareDate,in,out) Result = 1.
 END IF.
END REPEAT.
EXECUTE.
DELETE VARIABLES DateIn1 to junk2.
FORMATS Result (F1).
LIST.



Nkem Ntonghanwah wrote

> Dear All,
> I included the following cases to the data and tested using Bruce update
> with restructuring etc.
> Bruce version gives error messages - "In range 1st value greater than the
> second value, result set to missing"
> But the final result is exactly the same as using the simple aggregate and
> compute that I posted (below) which never generates any errors and
> don't depend on date range, etc..
>
> 5,01-Jul-04,09-Sep-04,05-Oct-04
> 5,23-Jan-05,30-Dec-04,15-Jan-04
> 5,10-Sep-04,09-Nov-04,06-Sep-04
>
>
> AGGREGATE
>   /OUTFILE=* MODE=ADDVARIABLES
>   /BREAK=ID
>   /DIn_Min 'Start Date in'=MIN(DateIn)
>  /DOut_max 'Last day out'=MAX(DateOut).
>
> COMPUTE Result=CompareDate >= DIn_Min & CompareDate <= DOut_max.
> VARIABLE LABELS Result 'Compare day is within Start and end'.
> EXECUTE.
>
> FORMATS Result (F1).
> DELETE VARIABLES DIn_Min DOut_max.
>
>  I would like a counter example where this doesn't work
> Thanks.
> Forcheh





-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.
Reply | Threaded
Open this post in threaded view
|

Re: Difficult date question

Rich Ulrich
Data clean-up should come first, if qualiity wasn't assured on data entry.

If I am not working with "clean data" - so that IN may be later than OUT -
then I should prefer to see early warning that the data need fixing, instead
of facing awkward questions (days later) about inconsistencies.

Patching to code to avoid warnings is not a good solution.

--
Rich Ulrich

From: SPSSX(r) Discussion <[hidden email]> on behalf of Bruce Weaver <[hidden email]>
Sent: Tuesday, June 30, 2020 10:48 AM
To: [hidden email] <[hidden email]>
Subject: Re: Difficult date question
 
For some reason, I did not see any of Forcheh's posts in the Nabble archive
until today.  Presumably, they were visible sooner to those who receive
posts via email. 

If the reference below is to the revised code I posted on 27-Jun-2020 (after
being corrected by MG and RU), then yes, when I use Forcheh's new cases for
ID=5 where DateIn > DateOut on one row, I also get those warning messages.
(Note that they are warning messages, not error messages.)  But I was able
to eliminate them by making a change in the DO-IF structure near the end of
that code.  Here are the relevant bits of code, original version first
followed by the revised version.

From my code posted on 27-Jun-2020:

COMPUTE RESULT = 0.
* Set Result = 1 if CompareDate falls within any of the intervals for that
ID.
DO REPEAT in = DateIn1 to junk1 / out = DateOut1 to junk2.
  IF NOT Result AND NOT MISSING(in) and RANGE(CompareDate,in,out) Result =
1.
END REPEAT.
EXECUTE.

Revised version posted on 30-Jun-2020:

COMPUTE RESULT = 0.
* Set Result = 1 if CompareDate falls within any of the intervals for that
ID.
DO REPEAT in = DateIn1 to junk1 / out = DateOut1 to junk2.
 DO IF NOT MISSING(in) AND (in LE out) and NOT Result.
  IF RANGE(CompareDate,in,out) Result = 1.
 END IF.
END REPEAT.
EXECUTE.

HTH.

PS- To avoid confusion about which version this is referring to, here is the
complete code.

* Revised version of BW's second attempt after being corrected by MG & RU.
* Original version of this code was posted on Jun 27, 2020.
* In the dataset below, DateIn > DateOut for ID 5's 2nd row.

NEW FILE.
DATASET CLOSE ALL.
DATA LIST FREE(",") /ID (F8) DateIn (DATE) DateOut (DATE) CompareDate
(DATE).
BEGIN DATA
1,01-Sep-04,09-Sep-04,05-Oct-04
1,23-Sep-04,30-Sep-04,15-Jan-04
1,10-Sep-04,09-Nov-04,06-Sep-04
2,01-Jan-04,09-Feb-04,05-Oct-04
2,20-Feb-04,15-Mar-04,24-Sep-04
3,01-Sep-04,09-Sep-04,05-Oct-04
3,23-Sep-04,30-Sep-04,15-Jan-04
3,10-Sep-04,09-Nov-04,06-Sep-04
3,01-Jan-04,09-Feb-04,05-Oct-04
3,20-Feb-04,15-Mar-04,24-Sep-04
5,01-Jul-04,09-Sep-04,05-Oct-04
5,23-Jan-05,30-Dec-04,15-Jan-04
5,10-Sep-04,09-Nov-04,06-Sep-04
END DATA.

* Notice that DateIn > DateOut ID 5's second observation.
* My original code would result in Warning messages for ID 5.
* I have changed the conditional statements in the DO-REPEAT
* structure below to eliminate those warnings.

DATASET NAME raw.
DATASET COPY indates.
DATASET COPY outdates.

* Restructure InDates to wide format.
DATASET ACTIVATE indates.
DELETE VARIABLES CompareDate DateOut.
CASESTOVARS /ID = ID /SEPARATOR="" /GROUPBY=INDEX.
NUMERIC junk1 (F1).
EXECUTE.

* Restructure OutDates to wide format.
DATASET ACTIVATE outdates.
DELETE VARIABLES CompareDate DateIn.
CASESTOVARS /ID = ID /SEPARATOR="" /GROUPBY=INDEX.
NUMERIC junk2 (F1).
EXECUTE.

* Add wide InDate and OutDate variables to the original dataset.
DATASET ACTIVATE raw.
MATCH FILES
 FILE = * /
 TABLE='indates' /
 TABLE='outdates' /
 BY=ID.
EXECUTE.
DATASET CLOSE indates.
DATASET CLOSE outdates.

COMPUTE RESULT = 0.
* Set Result = 1 if CompareDate falls within any of the intervals for that
ID.
DO REPEAT in = DateIn1 to junk1 / out = DateOut1 to junk2.
 DO IF NOT MISSING(in) AND (in LE out) and NOT Result.
  IF RANGE(CompareDate,in,out) Result = 1.
 END IF.
END REPEAT.
EXECUTE.
DELETE VARIABLES DateIn1 to junk2.
FORMATS Result (F1).
LIST.



Nkem Ntonghanwah wrote
> Dear All,
> I included the following cases to the data and tested using Bruce update
> with restructuring etc.
> Bruce version gives error messages - "In range 1st value greater than the
> second value, result set to missing"
> But the final result is exactly the same as using the simple aggregate and
> compute that I posted (below) which never generates any errors and
> don't depend on date range, etc..
>
> 5,01-Jul-04,09-Sep-04,05-Oct-04
> 5,23-Jan-05,30-Dec-04,15-Jan-04
> 5,10-Sep-04,09-Nov-04,06-Sep-04
>
>
> AGGREGATE
>   /OUTFILE=* MODE=ADDVARIABLES
>   /BREAK=ID
>   /DIn_Min 'Start Date in'=MIN(DateIn)
>  /DOut_max 'Last day out'=MAX(DateOut).
>
> COMPUTE Result=CompareDate >= DIn_Min & CompareDate <= DOut_max.
> VARIABLE LABELS Result 'Compare day is within Start and end'.
> EXECUTE.
>
> FORMATS Result (F1).
> DELETE VARIABLES DIn_Min DOut_max.
>
>  I would like a counter example where this doesn't work
> Thanks.
> Forcheh





-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Difficult date question

Nkem Ntonghanwah
In reply to this post by Mario Giesel-2
What is the correct result for each of the 3 lines?
This is what I get.

5,01-Jul-04,09-Sep-04,05-Oct-04  1
5,23-Jan-05,30-Apr-04,15-Jun-04  0
5,10-Sep-04,09-Nov-04,06-Sep-04  1
 But Bruce code fails to produce any results when these lines are added -- 
See below.

NEW FILE. 
DATASET CLOSE ALL. 
DATA LIST FREE(",") /ID (F8) DateIn (DATE) DateOut (DATE) CompareDate 
(DATE). 
BEGIN DATA 
1,01-Sep-04,09-Sep-04,05-Oct-04 
1,23-Sep-04,30-Sep-04,15-Jan-04 
1,10-Sep-04,09-Nov-04,06-Sep-04 
2,01-Jan-04,09-Feb-04,05-Oct-04 
2,20-Feb-04,15-Mar-04,24-Sep-04 
3,01-Sep-04,09-Sep-04,05-Oct-04 
3,23-Sep-04,30-Sep-04,15-Jan-04 
3,10-Sep-04,09-Nov-04,06-Sep-04 
3,01-Jan-04,09-Feb-04,05-Oct-04 
3,20-Feb-04,15-Mar-04,24-Sep-04 
5,01-Jul-04,09-Sep-04,05-Oct-04 
5,23-Jan-05,30-Dec-04,15-Jan-04 
5,10-Sep-04,09-Nov-04,06-Sep-04 
6,01-Jan-04,09-Feb-05,05-Oct-04 
6,20-Feb-04,15-Mar-04,24-Sept-04 
4,01-Jul-04,09-Sep-04,05-Oct-04 
4,23-Jan-05,30-Apr-04,15-Jun-04 
4,10-Sep-04,09-Nov-04,06-Sep-04 
END DATA. 
EXECUTE. 
 
AGGREGATE 
  /OUTFILE=* MODE=ADDVARIABLES 
  /BREAK=ID 
  /DIn_Min 'Start Date in'=MIN(DateIn) 
 /DOut_max 'Last day out'=MAX(DateOut). 
 
COMPUTE Result=CompareDate >= DIn_Min & CompareDate <= DOut_max. 
VARIABLE LABELS Result 'Compare day is within Start and end'. 
EXECUTE. 
FORMATS Result (F1). 
list.


ID      DateIn     DateOut CompareDate     DIn_Min    DOut_max Result 
 
       1 01-SEP-2004 09-SEP-2004 05-OCT-2004 01-SEP-2004 09-NOV-2004    1 
       1 23-SEP-2004 30-SEP-2004 15-JAN-2004 01-SEP-2004 09-NOV-2004    0 
       1 10-SEP-2004 09-NOV-2004 06-SEP-2004 01-SEP-2004 09-NOV-2004    1 
       2 01-JAN-2004 09-FEB-2004 05-OCT-2004 01-JAN-2004 15-MAR-2004    0 
       2 20-FEB-2004 15-MAR-2004 24-SEP-2004 01-JAN-2004 15-MAR-2004    0 
       3 01-SEP-2004 09-SEP-2004 05-OCT-2004 01-JAN-2004 09-NOV-2004    1 
       3 23-SEP-2004 30-SEP-2004 15-JAN-2004 01-JAN-2004 09-NOV-2004    1 
       3 10-SEP-2004 09-NOV-2004 06-SEP-2004 01-JAN-2004 09-NOV-2004    1 
       3 01-JAN-2004 09-FEB-2004 05-OCT-2004 01-JAN-2004 09-NOV-2004    1 
       3 20-FEB-2004 15-MAR-2004 24-SEP-2004 01-JAN-2004 09-NOV-2004    1 
       5 01-JUL-2004 09-SEP-2004 05-OCT-2004 01-JUL-2004 30-DEC-2004    1 
       5 23-JAN-2005 30-DEC-2004 15-JAN-2004 01-JUL-2004 30-DEC-2004    0 
       5 10-SEP-2004 09-NOV-2004 06-SEP-2004 01-JUL-2004 30-DEC-2004    1 
       6 01-JAN-2004 09-FEB-2005 05-OCT-2004 01-JAN-2004 09-FEB-2005    1 
       6 20-FEB-2004 15-MAR-2004 24-SEP-2004 01-JAN-2004 09-FEB-2005    1 
       4 01-JUL-2004 09-SEP-2004 05-OCT-2004 01-JUL-2004 09-NOV-2004    1 
       4 23-JAN-2005 30-APR-2004 15-JUN-2004 01-JUL-2004 09-NOV-2004    0 
       4 10-SEP-2004 09-NOV-2004 06-SEP-2004 01-JUL-2004 09-NOV-2004    1 
 
Number of cases read:  18    Number of cases listed:  18

USING BRUCE OLD AND REVISED CODE, I get warning message
"The first sequence error was found at case 16. The result data may not be correct."
AND the following output
      ID      DateIn     DateOut CompareDate RESULT 
 
       1 01-SEP-2004 09-SEP-2004 05-OCT-2004    0 
       1 23-SEP-2004 30-SEP-2004 15-JAN-2004    0 
       1 10-SEP-2004 09-NOV-2004 06-SEP-2004    0 
       2 01-JAN-2004 09-FEB-2004 05-OCT-2004    0 
       2 20-FEB-2004 15-MAR-2004 24-SEP-2004    0 
       3 01-SEP-2004 09-SEP-2004 05-OCT-2004    0 
       3 23-SEP-2004 30-SEP-2004 15-JAN-2004    0 
       3 10-SEP-2004 09-NOV-2004 06-SEP-2004    0 
       3 01-JAN-2004 09-FEB-2004 05-OCT-2004    0 
       3 20-FEB-2004 15-MAR-2004 24-SEP-2004    0 
       5 01-JUL-2004 09-SEP-2004 05-OCT-2004    0 
       5 23-JAN-2005 30-DEC-2004 15-JAN-2004    0 
       5 10-SEP-2004 09-NOV-2004 06-SEP-2004    0 
       6 01-JAN-2004 09-FEB-2005 05-OCT-2004    0 
       6 20-FEB-2004 15-MAR-2004 24-SEP-2004    0 
       4 01-JUL-2004 09-SEP-2004 05-OCT-2004    0 
       4 23-JAN-2005 30-APR-2004 15-JUN-2004    0   
       4 10-SEP-2004 09-NOV-2004 06-SEP-2004    0

Regards
Forcheh
 


On Tue, Jun 30, 2020 at 8:33 AM [hidden email] <[hidden email]> wrote:
Counter example:

5,01-Jul-04,09-Sep-04,05-Oct-04
5,23-Jan-05,30-Apr-04,15-Jun-04
5,10-Sep-04,09-Nov-04,06-Sep-04

The second instance will probably be wrong.

--------------------------------------------

Am Dienstag, 30. Juni 2020, 13:50:26 MESZ hat Nkem Ntonghanwah <[hidden email]> Folgendes geschrieben:


Dear All,
I included the following cases to the data and tested using Bruce update with restructuring etc.
Bruce version gives error messages - "In range 1st value greater than the second value, result set to missing" 
But the final result is exactly the same as using the simple aggregate and compute that I posted (below) which never generates any errors and don't depend on date range, etc..

5,01-Jul-04,09-Sep-04,05-Oct-04
5,23-Jan-05,30-Dec-04,15-Jan-04
5,10-Sep-04,09-Nov-04,06-Sep-04

 
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=ID
  /DIn_Min 'Start Date in'=MIN(DateIn) 
 /DOut_max 'Last day out'=MAX(DateOut).

COMPUTE Result=CompareDate >= DIn_Min & CompareDate <= DOut_max.
VARIABLE LABELS Result 'Compare day is within Start and end'.
EXECUTE.

FORMATS Result (F1).
DELETE VARIABLES DIn_Min DOut_max.

 I would like a counter example where this doesn't work
Thanks.
Forcheh

On Sat, Jun 27, 2020 at 11:36 PM Rich Ulrich <[hidden email]> wrote:
Bruce, and all,
I've noticed a solution that allows an infinite number of
lines for one case, so long as all the dates fall in one limited
range.  For the data in these examples, you could use day-of-
year, for the year 2004. I don't know how much the processing
would slow down when using 2 or 3 or 10 years.

Define a vector, InTreat001 to InTreat366, for day of year. 
Initialize:  RECODE InTreat001 to InTreat366(Else=0). 
Compute DateIn_day and DateOut_day as day-of-year (for each).
First pass - Loop from DateIn_day to DateOut_day to set values of InTreat to 1.
Aggregate by ID, keeping the MAX value, putting it on each line.
Next pass - COMPUTE Response = Intreat(CompareDate_day).
Drop InTreat.  I would be tempted to count the days while I had them.

--
Rich Ulrich


From: SPSSX(r) Discussion <[hidden email]> on behalf of Bruce Weaver <[hidden email]>
Sent: Saturday, June 27, 2020 3:45 PM
To: [hidden email] <[hidden email]>
Subject: Re: Difficult date question
 
Hi Rich.  You prompted me to re-read the original post more carefully, and I
believe you and Mario are right. 

Here's my first stab at it using multiple ranges.  My main aim was to
eliminate the need to manually intervene to indicate the maximum number of
rows per person in the code.  As a test, I added a 3rd person with 5 rows to
Mario's dataset.

NEW FILE.
DATASET CLOSE ALL.
DATA LIST FREE(",") /ID (F8) DateIn (DATE) DateOut (DATE) CompareDate
(DATE).
BEGIN DATA
1,01-Sep-04,09-Sep-04,05-Oct-04
1,23-Sep-04,30-Sep-04,15-Jan-04
1,10-Sep-04,09-Nov-04,06-Sep-04
2,01-Jan-04,09-Feb-04,05-Oct-04
2,20-Feb-04,15-Mar-04,24-Sep-04
3,01-Sep-04,09-Sep-04,05-Oct-04
3,23-Sep-04,30-Sep-04,15-Jan-04
3,10-Sep-04,09-Nov-04,06-Sep-04
3,01-Jan-04,09-Feb-04,05-Oct-04
3,20-Feb-04,15-Mar-04,24-Sep-04
END DATA.

DATASET NAME raw.
DATASET COPY indates.
DATASET COPY outdates.

* Restructure InDates to wide format.
DATASET ACTIVATE indates.
DELETE VARIABLES CompareDate DateOut.
CASESTOVARS /ID = ID /SEPARATOR="" /GROUPBY=INDEX.
NUMERIC junk1 (F1).
EXECUTE.

* Restructure OutDates to wide format.
DATASET ACTIVATE outdates.
DELETE VARIABLES CompareDate DateIn.
CASESTOVARS /ID = ID /SEPARATOR="" /GROUPBY=INDEX.
NUMERIC junk2 (F1).
EXECUTE.

* Add wide InDate and OutDate variables to the original dataset.
DATASET ACTIVATE raw.
MATCH FILES
 FILE = * /
 TABLE='indates' /
 TABLE='outdates' /
 BY=ID.
EXECUTE.
DATASET CLOSE indates.
DATASET CLOSE outdates.

COMPUTE RESULT = 0.
* Set Result = 1 if CompareDate falls within any of the intervals for that
ID.
DO REPEAT in = DateIn1 to junk1 / out = DateOut1 to junk2.
  IF NOT Result AND NOT MISSING(in) and RANGE(CompareDate,in,out) Result =
1.
END REPEAT.
EXECUTE.
DELETE VARIABLES DateIn1 to junk2.
FORMATS Result (F1).
LIST.

Output from the LIST command (spaces between IDs inserted manually):

      ID      DateIn     DateOut CompareDate RESULT
 
       1 01-SEP-2004 09-SEP-2004 05-OCT-2004    1
       1 23-SEP-2004 30-SEP-2004 15-JAN-2004    0
       1 10-SEP-2004 09-NOV-2004 06-SEP-2004    1

       2 01-JAN-2004 09-FEB-2004 05-OCT-2004    0
       2 20-FEB-2004 15-MAR-2004 24-SEP-2004    0

       3 01-SEP-2004 09-SEP-2004 05-OCT-2004    1
       3 23-SEP-2004 30-SEP-2004 15-JAN-2004    1
       3 10-SEP-2004 09-NOV-2004 06-SEP-2004    1
       3 01-JAN-2004 09-FEB-2004 05-OCT-2004    1
       3 20-FEB-2004 15-MAR-2004 24-SEP-2004    1

I was trying to figure out a way to use VECTOR & LOOP rather than DO-REPEAT
so that I could exit the loop as soon as Result = 0.  But I could not work
out how to supply the maximum number of loops needed without hands on
intervention.  So I went with the (probably) less efficient DO-REPEAT. 

Cheers,
Bruce


Rich Ulrich wrote
> Bruce,
> Yabbut ... His is always right, and yours can fail.  I'm reading
> the requirement like he does - that the separate ranges have
> to be used. Using RANGE( ) would make his Macro easier to read.
>
> The problem with his is, Do you have a (reasonably small) maximum
> number of lines for a case?
> --
> Rich Ulrich
>
> --- snip ---





-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Difficult date question

Bruce Weaver
Administrator
In reply to this post by Rich Ulrich
Fair point.


Rich Ulrich wrote

> Data clean-up should come first, if qualiity wasn't assured on data entry.
>
> If I am not working with "clean data" - so that IN may be later than OUT -
> then I should prefer to see early warning that the data need fixing,
> instead
> of facing awkward questions (days later) about inconsistencies.
>
> Patching to code to avoid warnings is not a good solution.
>
> --
> Rich Ulrich





-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.
Reply | Threaded
Open this post in threaded view
|

Re: Difficult date question

Art Kendall
In reply to this post by Rich Ulrich
Hear! Hear!

Data quality checks should be done before any further analysis steps are
done. For example,
Sysmis reserved as a debugging help. All missing values are labeled user
missing with meaningful labels.
All variable values are in a plausible range.
A reasonable set of between variable checks should have been done, e.g.,
post-test date should be before pre-test date, intake-date before
discharge-date, cells in age by grade crosstab should be checked, etc.
The "check for anomalies" procedure is a major benefit.

YMMV but I am amazed when less than 95% of the work time in a project is in
cleaning and checking the data.

Out of many hundreds of datasets I have seen over the years, I can only
recall 3 that surprised me by being clean.   One of the problems with how
stat is taught is that there is no mention that the example data sets have
the peculiarity of being clean.





-----
Art Kendall
Social Research Consultants
--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Art Kendall
Social Research Consultants
Reply | Threaded
Open this post in threaded view
|

Re: Difficult date question

Art Kendall
"Patching to code to avoid warnings is not a good solution."

That is an understatement.
The consensus in social and behavioral sciences is that Murphy was an
optimist

When there is complex logic it is often helpful to code to *get *warnings.

For example, in a long DO IF sequence, it is often helpful to end with
something like
ELSE.
      PRINT 'OOPS unaccounted for combination of conditions.'.
      PRINT /$casenum ID CondVar1 Condvar2 CondVar3.
END IF.



-----
Art Kendall
Social Research Consultants
--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Art Kendall
Social Research Consultants