|
SPSS 20
Am I right that if SPSS encounters an overestimated model in a GLM procedure (for example, more columns than rows, i.e. more predictors than cases), it does not just return an error message and stop, but tries to solve the equations by deleting 'unnecessary' predictors? (R just returned error messages in these cases.)
Frank
Dr. Frank Gaeth
FU-Berlin |
|
Frank,
Try it and report back!
|
|
David,
Just because it works in the following example doesn't mean it always works like that. (In the following example, pre_1 fits exactly: R = 1. However, the t and sig. values are no longer printed.)

* Create 100 cases.
input program.
loop a = 1 to 100 by 1.
end case.
end loop.
end file.
end input program.
EXECUTE.

* Create 120 random normal predictors v1 to v120.
VECTOR v(120).
DO REPEAT #i = v1 to v120.
COMPUTE #i = RV.NORMAL(0,1).
END REPEAT.
EXECUTE.

* Dependent variable: sum of all predictors plus noise.
COMPUTE w = SUM(v1 to v120) + RV.NORMAL(0,1).
EXECUTE.

* Regress w on all 120 predictors with only 100 cases.
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT w
  /METHOD=ENTER v1 to v120
  /SAVE PRED.

Dr. Frank Gaeth
FU-Berlin |
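To illustrate why the fit above is exact and why no t values can be printed, here is a minimal sketch in Python (not SPSS) with made-up data: 3 cases and an intercept plus 2 predictors, so there are as many parameters as cases. The data and the helper `solve_exact` are hypothetical and only serve the illustration.

```python
from fractions import Fraction

def solve_exact(a, b):
    """Solve the square system a * beta = b by Gauss-Jordan elimination,
    using exact rational arithmetic so a perfect fit shows up as exactly 0."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Find a row with a non-zero pivot and move it into place.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[n] for row in m]

# Hypothetical data: first column is the intercept, then two predictors.
X = [[1, 2, 5],
     [1, 3, 1],
     [1, 7, 4]]
y = [4, -1, 6]

beta = solve_exact(X, y)
residuals = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]
residual_df = len(y) - len(beta)  # N - (k + 1)

print("residuals:", residuals)
print("residual df:", residual_df)
```

With zero residual degrees of freedom the residual mean square is 0/0, so standard errors and t values are undefined. In the SPSS example above the same thing happens once 99 predictors plus the constant have been entered for 100 cases.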
|
What else would you expect?
--
|
|
Well, R would just return an error message.
The R extension 'robust regression' stops as well:

Warnings
Subcommand : ENTER must specify a valid variable list.
Execution of this command stops.

So, am I right that SPSS is <always> deleting variables from the list until the model is no longer overestimated?
Frank
Dr. Frank Gaeth
FU-Berlin |
|
Frank,
Look at the algorithms (ftp://public.dhe.ibm.com/software/analytics/spss/documentation/statistics/20.0/en/client/Manuals/IBM_SPSS_Statistics_Algorithms.pdf)! Variables are *NOT* deleted; they are simply not entered if the tolerance is too low. With more variables than cases, at some point (k <= N) the remaining variables will have a tolerance of 0. Why are you trying to regress more variables than you have cases? Sounds fishy to me.
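The tolerance idea can be sketched in a few lines of Python. This is a simplified illustration, not the SPSS algorithm itself: the function `enter_by_tolerance`, the threshold of 1e-4 and the data are assumptions made for the example. Tolerance here is 1 minus the R-squared of a candidate regressed on the predictors already entered (with a constant in the model).

```python
import math

def center(col):
    """Subtract the mean, which accounts for the constant term."""
    m = sum(col) / len(col)
    return [x - m for x in col]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def enter_by_tolerance(columns, min_tolerance=1e-4):
    """Try to enter predictors one by one; skip those whose tolerance
    relative to the already entered predictors is too low.
    `columns` maps a (hypothetical) variable name to its list of values."""
    basis = []  # orthonormal basis of the centered, entered predictors
    entered, skipped = [], []
    for name, col in columns.items():
        c = center(col)
        total = dot(c, c)
        resid = c
        for q in basis:  # remove what the entered predictors already explain
            coef = dot(resid, q)
            resid = [r - coef * qi for r, qi in zip(resid, q)]
        tolerance = dot(resid, resid) / total if total > 0 else 0.0
        if tolerance > min_tolerance:
            norm = math.sqrt(dot(resid, resid))
            basis.append([r / norm for r in resid])
            entered.append(name)
        else:
            skipped.append(name)
    return entered, skipped

# Hypothetical data: 4 cases, 5 candidate predictors.
data = {
    "x1": [1, 0, 0, 0],
    "x2": [0, 1, 0, 0],
    "x3": [0, 0, 1, 0],
    "x4": [0, 0, 0, 1],
    "x5": [1, 2, 3, 4],
}
entered, skipped = enter_by_tolerance(data)
print("entered:", entered)
print("skipped:", skipped)
```

With 4 cases and a constant, only 3 predictors can carry independent information. Every later candidate is an exact linear combination of the constant and the entered predictors, so its tolerance is (numerically) 0 and it is never entered.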
|
|
Why are you trying to regress more variables than you have cases? Sounds fishy to me.
Because I'm a fisher ;-)
Fishing for the best odds in sports betting. (Engineering approach: "erlaubt ist, was funktioniert" *)
Frank
*) allowed is, what works
Dr. Frank Gaeth
FU-Berlin |
|
If you aren't careful you'll find this sucker in your net ;-)
|
|
I'll bet.
Dr. Frank Gaeth
FU-Berlin |
|
For more effective fishing: use Step 1 to put into the equation only the one or two variables that you are sure must belong there. Then ask SPSS to print out the statistics on everything "not in the equation." That's what you want to look at, anyway.
--
Rich Ulrich

> Date: Sun, 19 Feb 2012 00:38:43 -0800
> From: [hidden email]
> Subject: Re: overestimated model (GLM)
> To: [hidden email]
>
> /Why are you trying to regress more variables than you have cases? Sounds
> fishy to me./
>
> Because I'm a fisher ;-)
>
> Fishing for the best odds in sports betting. (Engineering approach: "erlaubt
> ist, was funktioniert" *)
>
> Frank
>
> *)
> allowed is, what works -

"Whatever works, do it." Or just, "Whatever works ...."

> ... |
|
@ Rich Ulrich
"Whatever works, do it." Or just, "Whatever works ...."

Sure. Sorry for my Google translations.

However: would it make sense to use blocks, putting into block 1 of the equation only the one or two variables that I am sure must belong there, and all the others into a block 2?

(Printing anything out wouldn't make sense, since the system is expected to run automatically and as fast as possible, during the night, when I want to sleep ;-)
Dr. Frank Gaeth
FU-Berlin |
|
What I had in mind --
Enter Must-belong in block 1, and examine the out-statistics on the rest. Figure out what to do next. There never was a block 2, so far as Entering was concerned.

Data-mining is potentially legitimate, but stepwise inclusion from EVERYTHING has extremely limited value. Almost none. When you start with a very large sample, you can use some cases for "training" and most of the cases for very extensive cross-validation. Otherwise, your results are mainly capitalizing on chance.

I presume that you want something that might replicate. Selecting from 100 variables almost guarantees that your next variables, beyond the obvious and face-valid ones, will include a large share of "random contributors". You can Google for < Frank Harrell stepwise > to get some good comments on the drawbacks of stepwise.

Especially with limited N -- I would want to get rid of variables, either by dumping a bunch entirely, or by creating composites to replace them.
--
Rich Ulrich

> Date: Sun, 19 Feb 2012 14:10:29 -0800
> From: [hidden email]
> Subject: Re: overestimated model (GLM)
> To: [hidden email]
>
> @ Rich Ulrich
>
> /"Whatever works, do it." Or just,
> "Whatever works ...."/
>
> Sure. Sorry for my Google translations.
>
> However: would it make sense to use blocks, putting into block 1 of the
> equation only the one or two variables that I am sure must belong
> there, and all the others into a block 2?
>
> (Printing anything out wouldn't make sense, since the system is expected to
> run automatically and as fast as possible, during the night, when I want
> to sleep ;-)
>
> ... |
