Kind of a summary; these are points that have been made

At 12:02 PM 6/25/2006, Martin Sherman wrote:

>A continuation of the evacuation question. I went ahead and looked at

>the time for evacuation (started to evacuate to exited onto the

>street). The correlations between floor started and time was .75 for

>one building and .77 for the other building.

It's a good idea to take a naive look. In this case, what the

correlation means is, it takes longer to walk down more stairs. It's

fine to look at the correlation, but in this case, I don't think it can

be regarded as telling you much you didn't know.

>Next I did a regression analysis allowing for linear and quadratic. In

>both instances the linear and quadratic functions were significant.

>Next I allowed for a cubic function and this is what happened. For

>building 1 Linear was significant, quadratic and cubic were not

>significant. For building 2 linear was not significant, but quadratic

>and cubic were significant. My question is how would I lose

>significance for the quadratic when the cubic was allowed to enter for

>building 1. Also why would I lose significance for linear (given the

>very large zero-order correlation - my expectation was that the linear

>would stay in) for building 2 while

>picking up the cubic (besides the quadratic). Befuddled. TIA.

You should probably post the exact model you fit, and means and SDs of

the dependent and independent variables.

To say again what others have said: The linear, quadratic, and cubic

functions of a variable are VERY HIGHLY CORRELATED under very ordinary

conditions. Having all values positive (as yours are) is enough. The

larger the mean is relative to the SD, the worse. Look at the

correlation matrix of the predictors - linear, quadratic, and cubic -

that you used.
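
A quick way to see this is to compute the correlations for a made-up floor variable. The sketch below assumes one observation per starting floor, floors 1 through 75; that layout is an assumption for illustration only, not the actual evacuation data, and `pearson` is just a helper written for this example.

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Assumption: one observation per starting floor, floors 1..75 (all positive).
floors = list(range(1, 76))
linear = floors
quadratic = [f ** 2 for f in floors]
cubic = [f ** 3 for f in floors]

r_lin_quad = pearson(linear, quadratic)
r_lin_cub = pearson(linear, cubic)
r_quad_cub = pearson(quadratic, cubic)

print(f"linear vs quadratic: {r_lin_quad:.3f}")
print(f"linear vs cubic:     {r_lin_cub:.3f}")
print(f"quadratic vs cubic:  {r_quad_cub:.3f}")
```

With these assumed floors, all three correlations come out above 0.9, which is the kind of overlap that lets significance shift from one term to another when a higher power enters.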

As Paul Swank said, you can remove most of the correlation between

linear and quadratic terms by centering the independent variable.

Choosing a convenient point near the middle will do; it needn't be the

exact mean.

In your case, if time to evacuate is the dependent and starting floor

the independent, I'd keep the linear term uncentered, so its

coefficient has the natural meaning, and the constant has a reasonable

interpretation: mean time to start evacuation. But for 75 or so floors

to evacuate, I'd use, say, (Floor-40)**2 as the quadratic term. You can

interpret this as taking time per floor from the 40th floor as the

norm, and the quadratic term as the systematic change in time per floor

above and below the 40th.
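
The effect of centering the quadratic term can be checked the same way. This is a minimal sketch under the same assumption of one observation per floor, floors 1 through 75 (so the exact mean is 38); the variable names are illustrative only.

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Assumption: one observation per starting floor, floors 1..75.
floors = list(range(1, 76))
mean_floor = sum(floors) / len(floors)  # 38.0 for this made-up layout

raw_sq = [f ** 2 for f in floors]                  # Floor**2
sq_at_40 = [(f - 40) ** 2 for f in floors]         # (Floor-40)**2
sq_at_mean = [(f - mean_floor) ** 2 for f in floors]

# The linear term stays uncentered in every comparison.
r_raw = pearson(floors, raw_sq)
r_at_40 = pearson(floors, sq_at_40)
r_at_mean = pearson(floors, sq_at_mean)

print(f"Floor vs Floor**2:        {r_raw:.3f}")
print(f"Floor vs (Floor-40)**2:   {r_at_40:.3f}")
print(f"Floor vs (Floor-mean)**2: {r_at_mean:.3f}")
```

For this layout, centering at the exact mean gives a correlation of essentially zero, and centering at the convenient value 40 leaves only a modest one (roughly -0.2), compared with well above 0.9 for the raw square. Centering the linear term as well would not change these correlations, since shifting a variable by a constant leaves its correlations untouched.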

Centering won't remove the correlation between linear and cubic terms.

As a cubic term, you might try subtracting the linear component:

(Floor-40)**3 - (Floor-40)

Notice that, here, I'm centering both the linear and the cubic

components of the variable. It makes the shape, if you plot it, much

more meaningful.
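
It may be worth checking numerically how much linear component is left in whichever cubic term is used. The sketch below assumes one observation per floor, floors 1 through 79, so that 40 is the exact center; that layout is made up for illustration. The scaled variant `c**3 - k*c`, with `k` taken as the least-squares slope of the centered cube on the centered linear term, is an addition for comparison and is not part of the suggestion above.

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Assumption: one observation per floor, floors 1..79, symmetric about 40.
floors = list(range(1, 80))
c = [f - 40 for f in floors]                 # centered linear term

cube = [v ** 3 for v in c]                   # (Floor-40)**3
cube_minus_c = [v ** 3 - v for v in c]       # (Floor-40)**3 - (Floor-40)

# Hypothetical alternative: subtract k times the linear term, where k is
# the least-squares slope of c**3 on c (c has mean zero in this layout).
k = sum(v ** 4 for v in c) / sum(v ** 2 for v in c)
cube_resid = [v ** 3 - k * v for v in c]

r_cube = pearson(c, cube)
r_minus_c = pearson(c, cube_minus_c)
r_resid = pearson(c, cube_resid)

print(f"linear vs (Floor-40)**3:              {r_cube:.3f}")
print(f"linear vs (Floor-40)**3 - (Floor-40): {r_minus_c:.3f}")
print(f"linear vs (Floor-40)**3 - k*(Floor-40), k={k:.1f}: {r_resid:.3f}")
```

Under these assumptions the centered cube is still correlated above 0.9 with the linear term, and subtracting the linear term with a coefficient of 1 barely changes that, because the cube is so much larger in scale. Subtracting `k` times the linear term brings the correlation to essentially zero. With real data the floors will not be evenly spread, so `k` would have to be estimated from the data, for example by regressing the centered cube on the centered linear term.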

Finally, it's a common rule, in fitting polynomials, that if any term

is included, all lower terms must also be included. Keep the linear

term, whether it loses 'significance' or not. But I think, if you

transform like this, that the linear term will stay dominant.

Has the question of data censoring been settled? One would think this

involved censored data, since there's no observation for people who

failed to make it out of the buildings. However, as I understand it,

the pattern was peculiar: there was enough time to evacuate, and little

or no cutoff for people who were lost because the building collapsed

before they completed evacuation. The losses were people above the

points of impact, who were cut off from evacuation altogether, and need

to be excluded from this model.

A sad business, this. It is worth knowing about, though.