# How can I compare two columns of Text of different length?

7 messages

## How can I compare two columns of Text of different length?

Thank you very much for your help! I am using SPSS 24. I have two groups of people writing definitions for the same course. The definitions differ in length and use different words, although some keywords may match. I want to find out how similar each row of one column is to each row of the other column. If they are similar, do they match on one keyword or on three? Can I get a count for each category, e.g. how many rows have three matching keywords, or how many rows have five?

## Re: How can I compare two columns of Text of different length?

Do you have pairs of responses, or do you have a single column and two groups of cases?

Please create a small subset of your data so we can better understand how it is set up. Then create a syntax file that says DISPLAY DICTIONARY. Copy that output and paste it into a reply on this list.

Art Kendall
Social Research Consultants

## Re: How can I compare two columns of Text of different length?

For example, one column says,

"Examines the psychological development of individuals moving from their early twenties into old age."

The other says,

"Developmental and Child Psychology"

Here the word Psychology may be counted as a match. There are over 40,000 cases in both columns, but we don't know in advance which words appear in each row.

## Re: How can I compare two columns of Text of different length?

But psychological and Psychology are *NOT* the same word. How do you propose to resolve this?

Some ideas:

1. SPLIT the strings into two VECTORS (search this archive for Parse).

Two alternatives from there:

2a. Take these vectors from wide to long using VARSTOCASES.
3a. Do a Cartesian merge of the two vectors (search the archives for this).

2-3b. Compare the two vectors with a nested LOOP.

4ab. Decide whether the various substrings should be considered the same by applying an appropriate distance function (see the archives; this has been discussed).
5ab. Enumerate matches with AGGREGATE and merge to the original file.

Sorry for the lack of specific detail, but I'm slammed. Maybe this will get the ball rolling in the right direction.

HTH.
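The steps above can be sketched outside SPSS to see the logic end to end. The following is a minimal Python illustration, not the SPSS syntax the reply has in mind: it swaps in the standard library's `difflib.SequenceMatcher` as the distance function and a `Counter` in place of AGGREGATE. The function names and the 0.7 threshold are assumptions chosen for illustration only.

```python
import re
from collections import Counter
from difflib import SequenceMatcher


def split_words(text):
    """Step 1: split a string into a list of lowercase words."""
    return re.findall(r"[a-z]+", text.lower())


def similar(a, b, threshold=0.7):
    """Step 4: treat two words as 'the same' if their similarity ratio
    reaches the threshold (0.7 is an arbitrary, illustrative cutoff)."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def count_matches(text_a, text_b, threshold=0.7):
    """Steps 2-3b: nested comparison of the two word lists. Counts the
    distinct words of text_a that have a similar word in text_b."""
    words_a = set(split_words(text_a))
    words_b = set(split_words(text_b))
    return sum(
        1 for wa in words_a
        if any(similar(wa, wb, threshold) for wb in words_b)
    )


def tally(pairs, threshold=0.7):
    """Step 5: number of rows having 0, 1, 2, ... matching words."""
    return Counter(count_matches(a, b, threshold) for a, b in pairs)


if __name__ == "__main__":
    rows = [
        ("Examines the psychological development of individuals moving "
         "from their early twenties into old age.",
         "Developmental and Child Psychology"),
        ("Statistics", "Intro to statistics"),
        ("Linear algebra", "Music"),
    ]
    for n_matches, n_rows in sorted(tally(rows).items()):
        print(f"{n_rows} row(s) with {n_matches} matching word(s)")
```

At this threshold "psychological" and "psychology" count as a match (ratio 18/23, about 0.78), which is exactly the judgment call raised at the top of the reply. Raising the threshold to 1.0 would require exact matches only. With 40,000 rows the nested comparison is per row pair, so the cost grows with the number of words in each definition rather than with the square of the file size.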

## Re: How can I compare two columns of Text of different length?

You should really have full-fledged text analysis software to handle stemming of word forms, but a rough approach could be carried out if you can provide more details. For example, how do you want to deal with strings where the same word appears more than once? Presumably you would want to ignore common words such as a, the, if, and, but, ..., and perhaps do something rough about plurals, such as always ignoring a final s. Similarity of words could be an exact match or, say, within a few keystrokes. If you wanted to make lists of word forms for important words, that could also be addressed.

Are you always comparing two variables in the same case, or is cross-case comparison required?

Jon K Peck

On Wed, May 10, 2017 at 11:22 AM, ljttet wrote:

> For example, one column says, Examines the psychological development of individuals moving from their early twenties into old age. The other says, Developmental and Child Psychology Here the word Psychology may be counted as match. There are over 40,000 cases in both columns. But we don't know what words they are in each case of each row.
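The rough approach described in this reply can be sketched as follows. This is an illustrative Python version under stated assumptions, not a substitute for proper stemming software: the stop-word list is a short made-up sample, the plural rule simply drops a final "s" from words longer than three letters, repeated words are counted once, and "within a few keystrokes" is read as Levenshtein edit distance. All function names here are hypothetical.

```python
import re

# Illustrative stop-word list only; a real list would be much longer.
STOP_WORDS = {"a", "an", "the", "if", "and", "but", "of", "to",
              "in", "into", "from", "their"}


def normalize(text):
    """Lowercase, drop stop words, crudely strip a final 's', and keep
    each word once (so repeated words are not double counted)."""
    kept = set()
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in STOP_WORDS:
            continue
        if len(word) > 3 and word.endswith("s"):
            word = word[:-1]  # rough plural handling: ignore a final s
        kept.add(word)
    return kept


def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def matching_words(text_a, text_b, max_edits=0):
    """Words of text_a that lie within max_edits keystrokes of some
    word of text_b; max_edits=0 means exact match after normalizing."""
    words_b = normalize(text_b)
    return {w for w in normalize(text_a)
            if any(edit_distance(w, v) <= max_edits for v in words_b)}


if __name__ == "__main__":
    print(matching_words("Child psychology courses",
                         "Developmental and Child Psychology"))
```

A small keystroke budget only goes so far: "development" and "developmental" are 2 edits apart, but "psychological" and "psychology" are 4 edits apart, and a budget that large would also start merging unrelated short words. That is the practical argument for stemming or for hand-built lists of word forms for the important terms.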