Here are a couple of general comments.

While the normal distribution might be a useful assumed distribution
for errors in regression, there is no reason to think that it is
necessarily useful for summarizing all phenomena out there in the world.

As you have described your data, they are counts. In other words, the
values are 1, 2, 3, etc., and not real values in some interval. Are you
looking at consumption in some fixed unit of time, say a week, month, or
year? Given some assumptions, there are distributions such as the
Poisson that might be appropriate. It could also be the case that what
you are studying represents a mixture of types, say usage types (low,
medium, high), though that may or may not be the case here.
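
One rough way to check whether a single Poisson is plausible is to compare the sample variance with the sample mean, since the two are equal for a Poisson. The sketch below uses only the Python standard library. The counts are made-up numbers, not your data, and the function names are my own.

```python
import math
from collections import Counter


def poisson_pmf(k, lam):
    """Probability of observing the count k under a Poisson with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def dispersion_index(counts):
    """Sample variance divided by sample mean (about 1 for Poisson data)."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return var / mean


# Hypothetical units consumed per person; made-up numbers for illustration.
counts = [1] * 40 + [2] * 20 + [3] * 13 + [4] * 10 + [6] * 7 + [10] * 5 + [25] * 3 + [60] * 2

lam = sum(counts) / len(counts)  # the Poisson mean is estimated by the sample mean
observed = Counter(counts)

print("dispersion index:", round(dispersion_index(counts), 2))
for k in range(1, 5):
    # observed share of cases at k versus the share a Poisson would predict
    print(k, observed[k] / len(counts), round(poisson_pmf(k, lam), 3))
```

A dispersion index far above 1 would suggest that a single Poisson is too narrow for the data, which is one reason to consider a mixture of usage types. Note also that a plain Poisson puts some probability on zero, so if zero consumption cannot occur in the sample, a zero-truncated version may be the better comparison.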

Pete Fader (Wharton) and Bruce Hardie (London Business School) have a
nice course on probability models in marketing that is regularly given
at AMA events.

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Stevan Nielsen
Sent: Thursday, July 13, 2006 10:12 AM
To: [hidden email]
Subject: A Distinctly Non-Normal Distribution

Dear Colleagues,

I have stumbled upon an interesting phenomenon: I have discovered that
consumption of a valuable resource conforms to a very regular, reverse
J-shaped distribution. The modal case in our large sample (N = 16,000)
consumes one unit, the next most common case consumes two units, the
next most common three units, the next most common four units -- and
this is the median case, and so on. The average is at about 9.7 units,
which falls between the 72nd and 73rd percentiles in the distribution --
clearly NOT an indicator of central tendency.

I used SPSS Curve Estimation to examine five functional relationships
between units consumed and proportion of consumers in the sample,
testing proportion of consumers in the sample as linear, logarithmic,
inverse, quadratic, or cubic functions of number of units consumed. I
found that the reciprocal model, estimating proportion of cases as the
inverse of units consumed, was clearly the best solution, yielding a
remarkable and very reliable R^2 = .966. All five models were reliable,
but the next best was the logarithmic solution, with R^2 = .539; worst
was the linear model, with R^2 = .102.
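
For readers who want to reproduce this kind of fit outside SPSS, here is a minimal sketch of the inverse model, assuming it has the form proportion = b0 + b1 / units and is fitted by ordinary least squares. The function name and the data are hypothetical; the numbers are not the sample described above.

```python
def fit_inverse_model(x, y):
    """Least-squares fit of y = b0 + b1 * (1 / x); returns (b0, b1, r_squared)."""
    z = [1.0 / xi for xi in x]  # regress y on the reciprocal of x
    n = len(z)
    z_mean = sum(z) / n
    y_mean = sum(y) / n
    sxy = sum((zi - z_mean) * (yi - y_mean) for zi, yi in zip(z, y))
    sxx = sum((zi - z_mean) ** 2 for zi in z)
    b1 = sxy / sxx
    b0 = y_mean - b1 * z_mean
    ss_res = sum((yi - (b0 + b1 * zi)) ** 2 for zi, yi in zip(z, y))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return b0, b1, 1.0 - ss_res / ss_tot


# Hypothetical data: units consumed and the share of consumers at each level.
units = [1, 2, 3, 4, 5, 6, 7, 8]
share = [0.30, 0.16, 0.11, 0.08, 0.07, 0.05, 0.05, 0.04]

b0, b1, r2 = fit_inverse_model(units, share)
print("b0 =", round(b0, 4), "b1 =", round(b1, 4), "R^2 =", round(r2, 3))
```

Because the model is linear in 1 / units, it reduces to a simple regression on the transformed predictor, and the R^2 is the usual one for that regression.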

This seems like a remarkably regular, quite predictable relationship.
I've spent my career so enamored with normal distributions that I'm not
sure what to make of this distribution. I have several questions for
your consideration:

Do any of you have experience with such functions? (I believe it would
be correct to call this a decay function.)

Where are such functions most likely to occur in nature, commerce,
epidemiology, genetics, healthcare, and so on?

What complications arise when attempting to form statistical inferences
where such population distributions are present? (We have other
measurements for subjects in this distribution, measurements which are
quite nicely normal in their distributions.)

Your curious colleague,

lars nielsen

Stevan Lars Nielsen, Ph.D.
Brigham Young University
801-422-3035; fax 801-422-0175