
Delete cases with more than 80% of missing data; Handling duplicates

Nogitsune
Good evening, 

I'm new to SPSS and trying my best to find manuals and guides to help me understand it better. Right now I am working with a substantial dataset (3000 cases and 1500 variables) of respondents' answers to various psychological measures. I need to delete cases with more than 80% missing data. How can I automate this process? 

Also, I have about 500 duplicate cases. I need to compare cases with identical names and delete the one that has less data filled in. Is there any way to do this without manually going through each pair across all 1500 variables? 

Thank you! 
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Re: Delete cases with more than 80% of missing data; Handling duplicates

Jon Peck
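On the first point, see Transform > Count Values within Cases (COUNT) and choose System-missing, or System- or user-missing, as the value to count. Then use Data > Select Cases (SELECT IF) with the "Delete unselected cases" option to remove the cases with too many missing values. With 1500 variables, "more than 80% missing" means a count above 1200.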
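For readers who want to see the counting and selection logic spelled out, here is a minimal sketch in plain Python (not SPSS syntax). It assumes each case is a dict with None standing for a missing value; the names `count_missing` and `drop_mostly_missing` are hypothetical. It mirrors "count missing values per case, then keep only cases at or below the cutoff".

```python
from typing import Any, Dict, List, Optional

# Hypothetical layout: one dict per case (respondent), None = missing value.
Case = Dict[str, Optional[Any]]


def count_missing(case: Case, variables: List[str]) -> int:
    """Analogue of the COUNT step: how many listed variables are missing.

    A variable absent from the dict is also treated as missing.
    """
    return sum(1 for v in variables if case.get(v) is None)


def drop_mostly_missing(cases: List[Case], variables: List[str],
                        max_percent: int = 80) -> List[Case]:
    """Analogue of SELECT IF: keep cases with at most max_percent missing.

    Integer arithmetic avoids floating-point surprises at the cutoff:
    with 1500 variables and max_percent=80, a case with 1200 missing
    values is kept and a case with 1201 is dropped.
    """
    return [
        c for c in cases
        if count_missing(c, variables) * 100 <= max_percent * len(variables)
    ]


if __name__ == "__main__":
    items = ["q1", "q2", "q3", "q4", "q5"]
    data = [
        {"id": 1, "q1": 1, "q2": 2, "q3": 3, "q4": 4, "q5": 5},                # 0% missing
        {"id": 2, "q1": 1, "q2": None, "q3": None, "q4": None, "q5": None},    # 80%: kept
        {"id": 3, "q1": None, "q2": None, "q3": None, "q4": None, "q5": None}, # 100%: dropped
    ]
    print([c["id"] for c in drop_mostly_missing(data, items)])  # prints [1, 2]
```

Comparing `missing * 100` with `max_percent * n` keeps the rule "more than 80% is dropped" exact, so a case sitting precisely at 80% survives.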

For the second point, you will already have the missing-value counts from the first step. Sort the file by name and then by that count (ascending), so that within each name the most complete case comes first. SELECT IF with the LAG function can then keep only the first case of each name. The exact syntax depends on details such as whether a case can have more than one duplicate.
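The sort-then-compare-with-the-previous-case idea can be sketched in plain Python as follows (again not SPSS syntax). The dict layout, the `name` key and the function `keep_most_complete` are hypothetical. A `previous_name` variable stands in for LAG(name), and because only the first case of each name is kept, this version also copes with names that occur more than twice.

```python
from typing import Any, Dict, List, Optional

# Hypothetical layout: one dict per case, None = missing value.
Case = Dict[str, Optional[Any]]


def keep_most_complete(cases: List[Case], variables: List[str],
                       key: str = "name") -> List[Case]:
    """Keep one case per name: the one with the fewest missing values.

    Assumes every case has a non-missing string under `key`.
    """
    def n_missing(case: Case) -> int:
        # None or an absent variable counts as missing.
        return sum(1 for v in variables if case.get(v) is None)

    # Sort by name, then by missing count ascending, so the most complete
    # case of each name comes first. Python's sorted() is stable, so on a
    # tie the case that appeared earlier in the input is kept.
    ordered = sorted(cases, key=lambda c: (c[key], n_missing(c)))

    kept: List[Case] = []
    previous_name: Optional[str] = None  # plays the role of LAG(name)
    for case in ordered:
        if case[key] != previous_name:  # first case of a new name group
            kept.append(case)
        previous_name = case[key]
    return kept


if __name__ == "__main__":
    items = ["q1", "q2"]
    data = [
        {"name": "Ann", "q1": 1, "q2": None},
        {"name": "Bob", "q1": None, "q2": None},
        {"name": "Ann", "q1": 1, "q2": 2},
        {"name": "Bob", "q1": 3, "q2": None},
    ]
    for case in keep_most_complete(data, items):
        print(case)
```

Here Ann's fully answered case and Bob's case with q1 = 3 are kept; the less complete duplicates are dropped.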

On Mon, Feb 20, 2017 at 8:22 PM, Kseniya Katsman <[hidden email]> wrote:



--
Jon K Peck
[hidden email]
