How to correct sequential and/or between dates of events in SPSS?

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

How to correct sequential and/or between dates of events in SPSS?

FelipeMS
Hi guys, I need some help here.
I have a dataset with hospitalization events (1 per row) with the date that
the patient was admitted (EntryDate) and discharged (OutDate). The situation
is that the hospitals "break" these dates in order to charge the health
operators time by time. So, I need to correct the dates of the rows that are
for the same patient and have sequential dates or are between the dates of
another event. Below I paste an example:
Patient ID EntryDate OutDate
0000001 14/nov/14 19/nov/14
0000001 14/nov/14 14/nov/14
0000001 15/nov/14 15/nov/14
0000001 19/nov/14 22/nov/14
These 4 rows represent the same event (you can see that every date is
sequential - its starts on 14/nov/14 and goes until 22/nov/14), but it was
broken by the hospital in order to charge the health operators more often.
One of the solutions I thought was to compare one row with the previous row
to check if the EntryDate is between the previous EntryDate and OutDate OR
if the EntryDate is equal or ate least have only one day of difference of
the previous OutDate. The problem is that this logical test fails in the
last row, because I am only able to correct the one date variable at a time.
Can someone help me with this? The database has more than 1 million rows so
manually correcting is impossible.
I'm using SPSS version 24.

Thanks!



--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: How to correct sequential and/or between dates of events in SPSS?

William Dudley-2
Felipe
It might work to use the aggregate function to create min max entry dates .
These dates with then show up on each record for a given patient thus 
you do not have to use the lag function to compare the previous records.

Maybe something like this would work.
Not sure about the details but maybe this is a start.

SORT CASES BY PT_ID.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /PRESORTED
  /BREAK=PT_ID
  /Entry_min=MIN(entrydate) 
  /Entry_max=MAX(entrydate).

Bill


On Sat, May 5, 2018 at 9:31 PM, FelipeMS <[hidden email]> wrote:
Hi guys, I need some help here.
I have a dataset with hospitalization events (1 per row) with the date that
the patient was admitted (EntryDate) and discharged (OutDate). The situation
is that the hospitals "break" these dates in order to charge the health
operators time by time. So, I need to correct the dates of the rows that are
for the same patient and have sequential dates or are between the dates of
another event. Below I paste an example:
Patient ID EntryDate OutDate
0000001 14/nov/14 19/nov/14
0000001 14/nov/14 14/nov/14
0000001 15/nov/14 15/nov/14
0000001 19/nov/14 22/nov/14
These 4 rows represent the same event (you can see that every date is
sequential - its starts on 14/nov/14 and goes until 22/nov/14), but it was
broken by the hospital in order to charge the health operators more often.
One of the solutions I thought was to compare one row with the previous row
to check if the EntryDate is between the previous EntryDate and OutDate OR
if the EntryDate is equal or ate least have only one day of difference of
the previous OutDate. The problem is that this logical test fails in the
last row, because I am only able to correct the one date variable at a time.
Can someone help me with this? The database has more than 1 million rows so
manually correcting is impossible.
I'm using SPSS version 24.

Thanks!



--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD



--
William N. Dudley, PhD
Professor - Public Health Education
The School of Health and Human Sciences
The University of North Carolina at Greensboro
437-L Coleman Building
Greensboro, NC 27402-6170
See my research on
ResearchGate
VOICE 336.256 2475

email signature image example.png

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: How to correct sequential and/or between dates of events in SPSS?

Andy W
Aggregate will work if you assume a patient ID is unique. If a single patient
can have multiple stays though AGGREGATE will not quite work. See the
example below and one potential option for code to work around it -- I
imagine though you can find some tricky cases that this does not work for
(especially with missing data).

*****************************************************.
DATA LIST FREE /PatientID (N7) EntryDate OutDate (2DATE8).
BEGIN DATA
0000001 14/nov/14 19/nov/14
0000001 14/nov/14 14/nov/14
0000001 15/nov/14 15/nov/14
0000001 19/nov/14 22/nov/14
0000001 25/nov/14 30/nov/14
END DATA.
DATASET NAME TestDays.
DATASET ACTIVATE TestDays.
*Note case #5 should be a separate set of days.

SORT CASES BY PatientID EntryDate OutDate.
DO IF PatientID <> LAG(PatientID) OR $casenum = 1.
  COMPUTE Set = 1.
  COMPUTE End = OutDate.
ELSE.
  *Due to sort, this rows EntryDate should be larger.
  *Or equal to prior rows EntryDate.
  DO IF EntryDate <= LAG(End).
    COMPUTE Set = LAG(Set).
    COMPUTE End = MAX(LAG(End),OutDate).
  ELSE.
    COMPUTE Set = LAG(Set) + 1.
    COMPUTE End = OutDate.
  END IF.
END IF.
EXECUTE.
*Before this will want to make sure OutDate >= EntryDate.
*And that there is no missing data.

*Now we can aggregate up to the different sets of days per each patient.
DATASET DECLARE Stays.
AGGREGATE OUTFILE='Stays'
  /BREAK PatientID Set
  /BeginTime = MIN(EntryDate)
  /EndTime = MAX(OutDate).
*****************************************************.



-----
Andy W
[hidden email]
http://andrewpwheeler.wordpress.com/
--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/
Reply | Threaded
Open this post in threaded view
|

Re: How to correct sequential and/or between dates of events in SPSS?

FelipeMS
In reply to this post by William Dudley-2
Hi William,

Many thanks for sparing some time to try to solve my problem!
 
That would work if I had an unique ID to identify the event, but I don't have.
My unique ID is for the patient.
And each patient has several events of hospitalization.
So my intention with the correction of dates is to also identify separated events.
For that, my criteria is that events that occur with more than 1 day of difference would be different events.
Rereading my explanation of the problem I realized that I wasn't clear on that...

Kind regards,

Felipe

2018-05-06 15:53 GMT-03:00 William Dudley <[hidden email]>:
Felipe
It might work to use the aggregate function to create min max entry dates .
These dates with then show up on each record for a given patient thus 
you do not have to use the lag function to compare the previous records.

Maybe something like this would work.
Not sure about the details but maybe this is a start.

SORT CASES BY PT_ID.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /PRESORTED
  /BREAK=PT_ID
  /Entry_min=MIN(entrydate) 
  /Entry_max=MAX(entrydate).

Bill


On Sat, May 5, 2018 at 9:31 PM, FelipeMS <[hidden email]> wrote:
Hi guys, I need some help here.
I have a dataset with hospitalization events (1 per row) with the date that
the patient was admitted (EntryDate) and discharged (OutDate). The situation
is that the hospitals "break" these dates in order to charge the health
operators time by time. So, I need to correct the dates of the rows that are
for the same patient and have sequential dates or are between the dates of
another event. Below I paste an example:
Patient ID EntryDate OutDate
0000001 14/nov/14 19/nov/14
0000001 14/nov/14 14/nov/14
0000001 15/nov/14 15/nov/14
0000001 19/nov/14 22/nov/14
These 4 rows represent the same event (you can see that every date is
sequential - its starts on 14/nov/14 and goes until 22/nov/14), but it was
broken by the hospital in order to charge the health operators more often.
One of the solutions I thought was to compare one row with the previous row
to check if the EntryDate is between the previous EntryDate and OutDate OR
if the EntryDate is equal or ate least have only one day of difference of
the previous OutDate. The problem is that this logical test fails in the
last row, because I am only able to correct the one date variable at a time.
Can someone help me with this? The database has more than 1 million rows so
manually correcting is impossible.
I'm using SPSS version 24.

Thanks!



--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD



--
William N. Dudley, PhD
Professor - Public Health Education
The School of Health and Human Sciences
The University of North Carolina at Greensboro
437-L Coleman Building
Greensboro, NC 27402-6170
See my research on
ResearchGate
VOICE 336.256 2475

email signature image example.png


===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: How to correct sequential and/or between dates of events in SPSS?

FelipeMS
In reply to this post by Andy W
Hi Andy,

Yes, I have multiple events for the same patient. I'm defining separated
events as those that have more than 1 day of difference. I will try your
output later and I will provide here a feedback, but many thanks for trying
to help too!



--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: How to correct sequential and/or between dates of events in SPSS?

David Marso
Administrator
In reply to this post by Andy W
Here is a similar approach.  Probably similar caveats ;-)
--
DATA LIST FREE /PatientID (N7) EntryDate OutDate (2DATE9).
BEGIN DATA
0000001 14/nov/14 19/nov/14
0000001 14/nov/14 14/nov/14
0000001 15/nov/14 15/nov/14
0000001 19/nov/14 22/nov/14
0000001 25/nov/14 30/nov/14
0000002 14/nov/14 19/nov/14
0000002 14/nov/14 14/nov/14
0000002 15/nov/14 15/nov/14
0000002 19/nov/14 22/nov/14
0000002 25/nov/14 30/nov/14
END DATA.
DATASET NAME TestDays.
DATASET ACTIVATE TestDays.
*Note cases #5 and 10 should be a separate sets of days.
SORT CASES BY PatientID EntryDate OutDate.
IF ($CASENUM EQ 1 OR PatientID NE LAG(PatientID)) Group=1.
IF PatientID EQ LAG(PatientID)
Group=SUM(LAG(Group),DATEDIFF(EntryDate,LAG(OutDate),"days") GT 1).
AGGREGATE OUTFILE * MODE ADDVARIABLES /BREAK PatientID Group /NewEntry
=MIN(EntryDate)/NewOut=MAX(OutDate).
IF PatientID EQ LAG(PatientID)
Group=SUM(LAG(Group),DATEDIFF(NewEntry,LAG(NewOut),"days") GT 1).
AGGREGATE OUTFILE * MODE ADDVARIABLES OVERWRITE=YES /BREAK PatientID Group
/NewEntry =MIN(EntryDate)/NewOut=MAX(OutDate).




Andy W wrote

> Aggregate will work if you assume a patient ID is unique. If a single
> patient
> can have multiple stays though AGGREGATE will not quite work. See the
> example below and one potential option for code to work around it -- I
> imagine though you can find some tricky cases that this does not work for
> (especially with missing data).
>
> *****************************************************.
> DATA LIST FREE /PatientID (N7) EntryDate OutDate (2DATE8).
> BEGIN DATA
> 0000001 14/nov/14 19/nov/14
> 0000001 14/nov/14 14/nov/14
> 0000001 15/nov/14 15/nov/14
> 0000001 19/nov/14 22/nov/14
> 0000001 25/nov/14 30/nov/14
> END DATA.
> DATASET NAME TestDays.
> DATASET ACTIVATE TestDays.
> *Note case #5 should be a separate set of days.
>
> SORT CASES BY PatientID EntryDate OutDate.
> DO IF PatientID <> LAG(PatientID) OR $casenum = 1.
>   COMPUTE Set = 1.
>   COMPUTE End = OutDate.
> ELSE.
>   *Due to sort, this rows EntryDate should be larger.
>   *Or equal to prior rows EntryDate.
>   DO IF EntryDate <= LAG(End).
>     COMPUTE Set = LAG(Set).
>     COMPUTE End = MAX(LAG(End),OutDate).
>   ELSE.
>     COMPUTE Set = LAG(Set) + 1.
>     COMPUTE End = OutDate.
>   END IF.
> END IF.
> EXECUTE.
> *Before this will want to make sure OutDate >= EntryDate.
> *And that there is no missing data.
>
> *Now we can aggregate up to the different sets of days per each patient.
> DATASET DECLARE Stays.
> AGGREGATE OUTFILE='Stays'
>   /BREAK PatientID Set
>   /BeginTime = MIN(EntryDate)
>   /EndTime = MAX(OutDate).
> *****************************************************.
>
>
>
> -----
> Andy W

> apwheele@

> http://andrewpwheeler.wordpress.com/
> --
> Sent from: http://spssx-discussion.1045642.n5.nabble.com/
>
> =====================
> To manage your subscription to SPSSX-L, send a message to

> LISTSERV@.UGA

>  (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD





-----
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"