Read and Merge Excel Files Utility

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Read and Merge Excel Files Utility

SPSSNewb89
I have been trying to use the above tool to extract a particular cell range
from a set of excel workbooks (602 separate files).

The issue I have is that the excel files are the result of scanning some pdf
documents (original data source was lost) and they do not conform to normal
formatting. I am interested in a cell range of rows 8 - 18 and columns 2 -
10 within each workbook.

I have installed xlrd and it is working. Can you suggest syntax that would
do the job?

thanks

Damian

_____________________________________
Sent from http://spssx-discussion.1045642.n5.nabble.com



_____________________________________
Sent from http://spssx-discussion.1045642.n5.nabble.com





--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Read and Merge Excel Files Utility

Jon Peck
Why are you using sled instead of the regular SPSS Database Wizard or the direct Excel import command?  I’m presuming you want this in SPSS.  And a sample of the data, one of the Excel files, would be helpful.

On Mon, Nov 11, 2019 at 7:04 PM SPSSNewb89 <[hidden email]> wrote:
I have been trying to use the above tool to extract a particular cell range
from a set of excel workbooks (602 separate files).

The issue I have is that the excel files are the result of scanning some pdf
documents (original data source was lost) and they do not conform to normal
formatting. I am interested in a cell range of rows 8 - 18 and columns 2 -
10 within each workbook.

I have installed xlrd and it is working. Can you suggest syntax that would
do the job?

thanks

Damian

_____________________________________
Sent from http://spssx-discussion.1045642.n5.nabble.com



_____________________________________
Sent from http://spssx-discussion.1045642.n5.nabble.com





--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
--
Jon K Peck
[hidden email]

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Fwd: [SPSSX-L] Read and Merge Excel Files Utility

Jon Peck


(posted to the listserv without the attachment).



You can read the data area with this code.

GET DATA
  /TYPE=XLSX
  /FILE='R:\IBM\users\gibbins\example.xlsx'
  /CELLRANGE=RANGE 'B8:I18'
  /READNAMES=ON
  /HIDDEN IGNORE=YES.

The values will all come in as strings, since they don't look like SPSS numeric formats, and there are some spurious empty variables V3, v5, v7, but you can delete them later.  I specified row 8 as having the variable names.  The sav file generated for your example Excel file is attached.

That is only the beginning of the work, however.  The rest depends on what you want to do with these datasets after reading.  In order to process all the files at once, you can use the SPSSINC PROCESS FILES extension command to iterate a file of syntax over all the files in a directory or some wildcard specification.  You would have a SAVE command plus the GET DATA command in the file to be applied to each.

Then you could use the same approach to merge all these save files together, but it is possible that the string variables will sometimes have different widths depending on the data, since Excel cell widths vary with the data.  That can be fixed with the STATS ADJUST WIDTHS extension command to make everything consistent.

Finally you can merge all these together using ADD FILES.  However, that command can only merge 50 files at a time.  Another application of SPSSINC PROCESS FILES could incrementally combine these.

You can install extension commands using the Extensions > Extension Hub menu.

To see all the details on how to specify SPSSINC PROCESS FILES, look at the syntax help for the command.  To see this, type SPSSINC PROCESS FILES in the syntax editor, put the cursor on that command, and press F1.  It will take a bit of work to get that set up right, so read the help carefully.



On Mon, Nov 11, 2019 at 10:43 PM Damian Gibbins <[hidden email]> wrote:
Hi Jon

Thanks for getting back to me.

I have attached one of the typical forms.

If I could extract all data in the range B9:I18 I would be happy. I am aiming to extract this range for all 601 other similarly formatted files.

I am happy to follow any methodology you suggest.

thanks

Damian.

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Read and Merge Excel Files Utility

Rick Oliver
In reply to this post by Jon Peck

As Jon notes, you can import Excel files directly into SPSS using the GET DATA /TYPE=XLS, and you can use the CELLRANGE subcommand to specify the block of rows and columns to import. If you have to import 600+ files, you can wrap it in a simple Python program that iterates through all the files.

 

 

Sent from Mail for Windows 10

 

From: [hidden email]
Sent: Monday, November 11, 2019 10:03 PM
To: [hidden email]
Subject: Re: Read and Merge Excel Files Utility

 

Why are you using sled instead of the regular SPSS Database Wizard or the direct Excel import command?  I’m presuming you want this in SPSS.  And a sample of the data, one of the Excel files, would be helpful.

 

On Mon, Nov 11, 2019 at 7:04 PM SPSSNewb89 <[hidden email]> wrote:

I have been trying to use the above tool to extract a particular cell range
from a set of excel workbooks (602 separate files).

The issue I have is that the excel files are the result of scanning some pdf
documents (original data source was lost) and they do not conform to normal
formatting. I am interested in a cell range of rows 8 - 18 and columns 2 -
10 within each workbook.

I have installed xlrd and it is working. Can you suggest syntax that would
do the job?

thanks

Damian

_____________________________________
Sent from http://spssx-discussion.1045642.n5.nabble.com



_____________________________________
Sent from http://spssx-discussion.1045642.n5.nabble.com





--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

--

Jon K Peck
[hidden email]

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

 

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Read and Merge Excel Files Utility

Mario Giesel-2
Based on Ricks comments this code could work (without Python):

------------------------------------------------------------------------
* Insert your working directory.
FILE HANDLE path /NAME = "mypath". 

* Insert your filenames; get them with "dir" command in the command prompt window.
DEFINE !files() /* files without filename extension
myfile1
myfile2
myfile3
!ENDDEFINE.

* Verify subcommands TYPE, SHEET, READNAMES.
DEFINE !extract()
PRESERVE.
SET MPRINT = ON PRINTBACK = ON.
!DO !f !IN (!EVAL(!files))
GET DATA
 /TYPE = XLSX
 /FILE = !QUOTE(!CONCAT('path\',!f,'.xlsx'))
 /SHEET= INDEX 1
 /CELLRANGE = RANGE 'B8:J18'
 /READNAMES = ON.
SAVE OUTFILE = !QUOTE(!CONCAT('path\',!f,'.sav')).
!DOEND
RESTORE.
!ENDDEFINE.

* Call.
!extract.

------------------------------------------------------------------------

Am Dienstag, 12. November 2019, 15:40:47 MEZ hat Rick Oliver <[hidden email]> Folgendes geschrieben:


As Jon notes, you can import Excel files directly into SPSS using the GET DATA /TYPE=XLS, and you can use the CELLRANGE subcommand to specify the block of rows and columns to import. If you have to import 600+ files, you can wrap it in a simple Python program that iterates through all the files.

 

 

Sent from Mail for Windows 10

 

From: [hidden email]
Sent: Monday, November 11, 2019 10:03 PM
To: [hidden email]
Subject: Re: Read and Merge Excel Files Utility

 

Why are you using sled instead of the regular SPSS Database Wizard or the direct Excel import command?  I’m presuming you want this in SPSS.  And a sample of the data, one of the Excel files, would be helpful.

 

On Mon, Nov 11, 2019 at 7:04 PM SPSSNewb89 <[hidden email]> wrote:

I have been trying to use the above tool to extract a particular cell range
from a set of excel workbooks (602 separate files).

The issue I have is that the excel files are the result of scanning some pdf
documents (original data source was lost) and they do not conform to normal
formatting. I am interested in a cell range of rows 8 - 18 and columns 2 -
10 within each workbook.

I have installed xlrd and it is working. Can you suggest syntax that would
do the job?

thanks

Damian

_____________________________________
Sent from http://spssx-discussion.1045642.n5.nabble.com



_____________________________________
Sent from http://spssx-discussion.1045642.n5.nabble.com





--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

--

Jon K Peck
[hidden email]

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

 

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD