Correlational Example Involving COVID-19 Useful for Classes

Correlational Example Involving COVID-19 Useful for Classes

Mike
This Washington Post article provides an interesting example of correlation
between the "percentage of people one knows with COVID-19 symptoms"
and "percent of people wearing masks in public" -- the unit of analysis is U.S.
states.  There is a nice scatterplot and the data used is provided in a table
(comes from Delphi COVIDcast, Carnegie-Mellon U). The R^2 is 0.72 (r = 0.85).
I have not checked the Carnegie-Mellon source but there may be more
interesting data/analysis available there.


-Mike Palij
New York University

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Fwd: Correlational Example Involving COVID-19 Useful for Classes

Mike
Note:  r = -0.85; the R^2 is provided in the article and I used a calculator to
get the square root and forgot to include the negative sign (as the percentage
of mask users in a state increases, the percentage of people one knows with
COVID-19 symptoms decreases).  Sorry about that.
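
For anyone using this in class, the slip is easy to reproduce: R^2 carries only the magnitude of r, so the sign has to come from the direction of the fitted slope. A minimal sketch, assuming a simple one-predictor regression; the function name and the slope value are hypothetical placeholders, and only R^2 = 0.72 comes from the article.

```python
import math


def r_from_r_squared(r_squared, slope):
    """Recover Pearson r from R^2 in a one-predictor linear regression.

    The square root gives only |r|; the sign is taken from the slope.
    """
    magnitude = math.sqrt(r_squared)
    return math.copysign(magnitude, slope)


# R^2 = 0.72 is the value reported in the article. The slope below is a
# made-up placeholder; only its negative sign matters here.
r = r_from_r_squared(0.72, slope=-1.0)
print(round(r, 2))
```

With a positive slope the same call returns the positive root, which is the value a calculator gives by default.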

-Mike Palij
New York University



Re: Fwd: Correlational Example Involving COVID-19 Useful for Classes

Bruce Weaver
Administrator
Thanks Mike.  It's a nice example of an ecological correlation.  For some
reason, it reminded me of this other well-known example:

https://i.insider.com/5353e29b6da8115322dd4816?width=1000&format=jpeg&auto=webp

Unfortunately, some readers didn't understand that the NEJM article
describing the link between chocolate consumption and Nobel laureates was
meant to be a joke.  :-)  

https://www.wbur.org/commonhealth/2012/10/15/nobel-chocolate-joke

I'm not suggesting the correlation in the Washington Post article is a joke,
by the way.  But I am suggesting that we must always be mindful of the
ecological and atomistic fallacies when examining associations at different
levels.  

Cheers,
Bruce



-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

Re: Correlational Example Involving COVID-19 Useful for Classes

spss.giesel@yahoo.de
"... we must always be mindful of the ecological and atomistic fallacies when examining associations at different levels."

Well spoken, Bruce.

Cheers,
Mario

Re: Fwd: Correlational Example Involving COVID-19 Useful for Classes

Mike
In reply to this post by Bruce Weaver
Thanks for bringing up the correlation between chocolate consumption (CC) and the number
of Nobel Laureates (#NL); I remember when it first came out.  However, although the correlation
is between group/aggregate values, I think that this is a better example of spurious correlation
than an ecological correlation.  It can be argued that the correlation between CC and #NL is
dependent on a third variable Z which might be national wealth/GDP, number of graduate
degree granting institutions, and/or other variables (or systems of variables) that are causally
related to #NL.
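
A minimal simulation sketch of this third-variable account, under made-up assumptions: Z drives both X and Y, X and Y have no direct link, and all variances are 1. The variable names and numbers are illustrative only and are not taken from the chocolate/Nobel data.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of X and Y, controlling for Z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(2020)  # fixed seed so the run is repeatable
n = 5000                   # hypothetical number of units

# Assumption: Z is the common cause; X and Y each equal Z plus independent noise.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]

r_xy = pearson(x, y)
r_xy_given_z = partial_corr(r_xy, pearson(x, z), pearson(y, z))

print(f"r(X, Y)     = {r_xy:.2f}")          # clearly positive
print(f"r(X, Y | Z) = {r_xy_given_z:.2f}")  # close to zero
```

The zero-order correlation is sizable even though X never influences Y; once Z is partialled out, it essentially vanishes.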

My understanding of ecological correlation/inference (also known as the ecological fallacy) is
that statistics and relationships based on aggregate/grouped data do not necessarily reflect
the statistics or relationships based on individual level data (or whatever the lowest unit of analysis
is; in the social sciences, this would usually be the person level).  The Wikipedia entry on the
Ecological Fallacy (see: https://en.wikipedia.org/wiki/Ecological_fallacy ) is consistent with
this view but I think Simpson's Paradox presents the fallacy most directly (see:
https://en.wikipedia.org/wiki/Ecological_fallacy#Simpson's_paradox ).

However, to make my comments relevant to the WaPo/Carnegie-Mellon correlation
between percentage of mask wearers and percentage who knew a person with COVID-19
symptoms, I think a 1999 paper by David Freedman provides a good review of the
state of the art in ecological analysis back then (though, in part, it incorporates some
previous analysis and writing that is critical of Gary King's 1998 model for ecological analysis;
King has updated his model but I'll put that aside for now).  The reference for the
Freedman article is:
Freedman, D. A. (1999). Ecological inference and the ecological fallacy. International
Encyclopedia of the Social & Behavioral Sciences, 6(4027-4030), 1-7.
And a Pdf can be accessed at:

Freedman reviews some of the procedures that have been developed over the decades
to perform ecological analysis and how to determine whether such analysis is valid.
Remember that the ecological fallacy does not always occur, that is, the results seen
with aggregate data may well be consistent with analysis based on individual cases/units.
A positive correlation between two variables when one is using state-level data may
very well turn out to be positive and of similar magnitude when calculated on individual
persons/cases.  Thinking in terms of levels and that the ecological fallacy arises from
some inconsistency or problem across levels helps to better understand when and
how the ecological fallacy occurs.
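
To make the levels idea concrete, here is a small sketch with simulated data in which the sign agrees across levels but the magnitude does not. Everything in it is hypothetical: 10 "states", 500 people per state, and a noise level chosen so that individuals vary much more than state means do.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1999)  # fixed seed so the run is repeatable
n_groups = 10              # hypothetical "states"
n_per_group = 500          # hypothetical respondents per state

all_x, all_y = [], []
mean_x_by_group, mean_y_by_group = [], []
for g in range(n_groups):
    # Each person's X and Y share only the state-level component g;
    # the person-level noise (SD = 5) is independent for X and Y.
    gx = [g + rng.gauss(0, 5) for _ in range(n_per_group)]
    gy = [g + rng.gauss(0, 5) for _ in range(n_per_group)]
    all_x.extend(gx)
    all_y.extend(gy)
    mean_x_by_group.append(sum(gx) / n_per_group)
    mean_y_by_group.append(sum(gy) / n_per_group)

r_group = pearson(mean_x_by_group, mean_y_by_group)  # state-level (ecological)
r_individual = pearson(all_x, all_y)                 # person-level

print(f"state-level r  = {r_group:.2f}")
print(f"person-level r = {r_individual:.2f}")
```

Averaging within states removes most of the person-level noise, so the state-level r comes out far stronger than the person-level r. A state-level value such as -0.85 therefore says little by itself about how strong the relationship is among individuals.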

This is one reason why current approaches to ecological analysis make use of
multilevel modeling.  One review of such work in the pharmaceutical area is the
following:
Greenland, S. (2018). Ecologic inference. In Chow, S. C. (Ed.),
Encyclopedia of Biopharmaceutical Statistics-Four Volume Set. CRC Press.

This article is available on books.google.com though the first page is not
available for preview; see:

There are many ways that ecological analysis can go wrong but knowing what
the specific problems are (e.g., cross-level interactions that are not modeled,
covariates in different measurement formats, etc.) can help a researcher
achieve more valid conclusions from the data that is available.  So, though
the WaPo/Carnegie-Mellon correlation is an ecological correlation, it doesn't
necessarily follow that one won't see a negative correlation between percent of 100% mask
use and knowing people with COVID-19 symptoms (and deaths) when
individual person data is used.  It is, after all, an empirical question.

-Mike Palij
New York University


Re: Correlational Example Involving COVID-19 Useful for Classes

Rich Ulrich
In reply to this post by Mike
Based strictly on the data presented, one could draw the
arrow of causation in either direction. 

Fauci assumes that wearing a mask prevents disease.

Trump might argue that those people who have experience
(illness in friends) realize that mask-wearing is a result
of the panic caused by the medical profession.  People "more
familiar" with the disease do not bother with masks.

For these data, interviews with persons who have switched status
(to or from mask-wearing) would be helpful for interpretation.

Herman Rubin in the stats groups offered the example of the
correlation between the number of trucks responding to a fire
alarm and the cost of the subsequent damage.  More respondents
mean ("result in"?) more damage.

--
Rich Ulrich


Re: Correlational Example Involving COVID-19 Useful for Classes

Mike
Just remember that Sir Ronald Fisher's argument against cigarette smoking causing
multiple health problems (e.g., lung cancer, heart disease, etc.) in humans was
that all researchers had was a correlation between amount of cigarette use
and illness condition at some age (usually middle-aged and older).  Fisher's
justification for his explanation was that people who smoked and went on to
develop, say, lung cancer may have been genetically predisposed to have
lung cancer and, perhaps, a tendency to (enjoy) smoking.

Remember that the evidence for a causal relationship of smoking leading
to lung cancer is based on observational research because it would be
unethical to do a randomized clinical trial (RCT) like the following:

(1)  Take several thousand males and females who are randomly assigned
to one of two levels: (a) starting at age 18, equal numbers of males and
females to a smoking condition where everyone is required to smoke one
pack of cigarettes per day for an indefinite period of time but on the order
of decades, and (b) also starting at age 18, equal numbers of males and
females are required to be tobacco abstinent and to avoid situations where
one might be exposed to second-hand smoke.

(2) At 5 to 10 year intervals all participants are given medical examinations
to screen for the presence of major diseases/illnesses.  20 years of such
data collection would probably be a minimum but it might be best (if funding
can be obtained) to do up to 30 or 40 years of follow-up, at which point
mortality rates may become a more important dependent measure.

(3)  Care should be taken to make sure that participants in the two groups
have similar representation of racial/ethnic groups, SES and education
level, live in similar neighborhoods/environments, etc.  Identification of other
relevant variables that might be used as covariates should be an ongoing
process, especially to better understand people who have smoked their
entire life (some into their 90s) but have not developed any significant
health conditions.  This group may have a genetically based protection
against the damaging effects of smoking, something comparable to people
who have had HIV infection for decades but do not develop AIDS -- the viral load
is kept very low by the person's immune system, suggesting that some
genetic condition bolsters the resistance to HIV developing into AIDS.

I'm sure that the above design has to be polished up before it would
be a viable undertaking but ethical considerations would probably
prevent such research from ever being done.  Which is unfortunate
because this would be an experimentally based procedure to establish
a causal link between cigarette smoking and health problems in HUMANS.

So, one can use observational research to address this situation (which
has been the traditional method of studying smoking in humans).
Unfortunately, there are a large number of problems with such research;
for more on this point, see:
Vandenbroucke, J. P., Von Elm, E., Altman, D. G., Gøtzsche, P. C., Mulrow, C. D.,
Pocock, S. J., ... & Strobe Initiative. (2007). Strengthening the Reporting of Observational
Studies in Epidemiology (STROBE): explanation and elaboration. PLoS Med, 4(10), e297.

The article can be accessed here:

A somewhat more cynical view of the medical research process is provided by
John P. A. Ioannidis in publications such as the following:
Ioannidis, J. P. (2005). Why most published research findings are false. PLoS medicine, 2(8), e124.
This article can be accessed at:

So, the contemporary medical understanding and treatment of the effects of cigarette
smoking on human health is not based on RCTs showing a causal relationship between
smoking and human health.  RCTs with animals (e.g., smoking vs non-smoking dogs)
appear to support the smoking-causes-illness proposition but effects shown in animals
don't always transfer to humans -- the animal effect is not replicated in humans where
the effect is weak, nonexistent, or is manifested in different ways.  Consequently,
the warrant to claim that cigarette smoking produces serious medical problems has to
be based on (a) observational studies with humans (but with the problem of having to explain
why a number of humans do NOT develop illnesses), (b) RCT/experimental studies with
animals to show that they indeed develop serious illness as a function of time spent
smoking, and (c) bench research examining cellular and biochemical effects of the
chemicals (poisons) found in cigarette smoke, trying to determine why and which
physiological systems are being damaged.

To summarize an overlong yadda-yadda:  correlations can provide information about
causation (or its absence) but one needs a large variety of evidence in order
to make the argument that a correlation means one thing (i.e., smoking is positively
related to the development of medical illness) and not another thing (e.g., genetic
factors that predispose one to develop illnesses [compare to the diathesis-stress
model] may also lead one to engage in smoking, but smoking plays a lesser role in the
development of an illness).  Experimental designs that can be used to determine
which interpretation of the correlational relationship should be accepted cannot be
done for ethical reasons, so using info from a variety of sources that converge on an
overwhelming conclusion (i.e., smoking causes illness) is the way that the argument
has to be established.

We can leave it as an exercise to interested parties to determine which experimental
design might provide evidence that wearing a face mask reduces the number of
illnesses/deaths, and the willingness to wear a mask may depend upon a number of
factors, including how many people one knows that had COVID-19 and how bad
a case it was.

On an aside, the argument that people have AGAINST wearing masks reminds me
of the early part of the movie "Aliens" when Ripley is trying to convince the corporate
suits to take seriously the threat that the nearly unstoppable xenomorphs are a danger
not only on LV-426 but to earth as well.  And the suits, being suits, blow Ripley off. At least
until earth is not able to communicate with LV-426 and someone has to go there to find
out why.  Ripley understands that if she goes there, she may be walking into a deathtrap
while the Marines and the suit Burke think they can handle anything there.  Well, most
of us know how that turned out.  People who don't wear masks are like the Marines and Burke
but have to be hit in the face with a 2x4 to realize how much danger they are in -- it's
not until their first encounter with the aliens that they realize how unprepared they are
to deal with them, though Ripley's pleadings tried to get them to understand how bad
it is.  Similar to the coronavirus and its resultant illness COVID-19 -- until you see how
terrible it can be, one can make believe that the virus is no worse than the flu or that
it is a hoax or it's just an attempt to undermine the president.  Sometimes one might
have to let a kid touch a hot pot on a stove to realize that they shouldn't touch hot
pots and pans on stoves.  But some kids might need several such learning trials while
a few might turn out to be Darwin award winners.  One can always give advice but
it is foolish to expect people to follow it unless they understand what is really going on.

tl;cw

-Mike Palij
New York University

P.S. It's late.  Sorry about the typos and sentences that appear to suggest that
I had a temporary psychotic break with reality. ;-). 

Re: Fwd: Correlational Example Involving COVID-19 Useful for Classes

Bruce Weaver
Administrator
In reply to this post by Mike
Mike wrote
> Thanks for bringing up the correlation between chocolate consumption (CC)
> and the number of Nobel Laureates (#NL); I remember when it first came out.
> However, although the correlation is between group/aggregate values, I think
> that this is a better example of spurious correlation than an ecological
> correlation.

Fair point.  


> --- snip ---
> My understanding of ecological correlation/inference (also known as the
> ecological fallacy) is that statistics and relationships based on
> aggregate/grouped data do not necessarily reflect the statistics or
> relationships based on individual level data (or whatever the lowest unit
> of analysis is; in the social sciences, this would usually be the person
> level).  The Wikipedia entry on the Ecological Fallacy (see:
> https://en.wikipedia.org/wiki/Ecological_fallacy ) is consistent with
> this view

Agreed.  Ditto for the atomistic fallacy, except for the reversed direction
(i.e., associations at the level of individuals do not necessarily match
associations between the same variables at the aggregate level).  


> but I think Simpson's Paradox presents the fallacy most directly (see:
> https://en.wikipedia.org/wiki/Ecological_fallacy#Simpson's_paradox ).

Hmm. You're going to have to explain this one to me.  Simpson's Paradox is
often illustrated with examples where there appears to be no association
between X and Y, but when one "controls" for Z, the X-Y association becomes
apparent.  As this article suggests, it is an example of suppression, or
negative confounding, as epidemiologists might call it:

https://link.springer.com/article/10.1186/s12982-019-0087-0

See the example in Table 1.  

Perhaps what you're suggesting is that to get the correct estimate of the
X-Y association, one must compute estimates within each stratum of the
confounder, and then a pooled estimate of those within-stratum estimates
(rather than pooling the data across strata)?  I don't see that as being the
same thing as computing the association between aggregate measures of X and
Y, though.  
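
The three quantities being contrasted here can be put side by side in a toy example. This is only a sketch with invented numbers (three strata of Z, three cases each), chosen so that the within-stratum association is positive while the pooled and aggregate-level associations are negative.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented data: stratum of Z -> (X values, Y values).
strata = {
    0: ([1, 2, 3], [7, 8, 9]),
    1: ([4, 5, 6], [4, 5, 6]),
    2: ([7, 8, 9], [1, 2, 3]),
}

# (a) X-Y correlation within each stratum of Z.
within = {z: pearson(xs, ys) for z, (xs, ys) in strata.items()}

# (b) X-Y correlation after pooling the raw data across strata.
pooled_x = [x for xs, _ in strata.values() for x in xs]
pooled_y = [y for _, ys in strata.values() for y in ys]
r_pooled = pearson(pooled_x, pooled_y)

# (c) Correlation between the stratum means (the aggregate-level analysis).
mean_x = [sum(xs) / len(xs) for xs, _ in strata.values()]
mean_y = [sum(ys) / len(ys) for _, ys in strata.values()]
r_means = pearson(mean_x, mean_y)

print("within-stratum r:", within)
print("pooled r:", round(r_pooled, 2))
print("r between stratum means:", round(r_means, 2))
```

In this toy case (a) is +1 in every stratum, while (b) and (c) are both negative but not equal to each other, so the pooled analysis and the aggregate-means analysis are distinct computations even when they point the same way.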

--- snip the rest ---




Re: Fwd: Correlational Example Involving COVID-19 Useful for Classes

Mike
On Sun, Oct 25, 2020 at 10:55 AM Bruce Weaver <[hidden email]> wrote:
> > --- snip ---
> > but I think Simpson's Paradox presents the fallacy most directly (see:
> > https://en.wikipedia.org/wiki/Ecological_fallacy#Simpson's_paradox ).
>
> Hmm. You're going to have to explain this one to me.  Simpson's Paradox is
> often illustrated with examples where there appears to be no association
> between X and Y, but when one "controls" for Z, the X-Y association becomes
> apparent.  As this article suggests, it is an example of suppression, or
> negative confounding, as epidemiologists might call it:
> https://link.springer.com/article/10.1186/s12982-019-0087-0
>
> See the example in Table 1.

A few points:
(1)  I think that the case you are referring to, i.e., no apparent association
between X and Y until Z is controlled for, is a special case of Simpson's paradox,
that is, sometimes suppression may give rise to Simpson's paradox but
Simpson's paradox can still occur without suppression.  More on this
point shortly.

(2) Please see the following article:
Kievit, R., Frankenhuis, W., Waldorp, L., & Borsboom, D. (2013). Simpson's paradox in
psychological science: a practical guide. Frontiers in Psychology, 4, 513.

The article can be accessed at:

The abstract to the article follows:
The direction of an association at the population-level may be reversed within the subgroups
comprising that population --- a striking observation called Simpson's paradox. When facing this
pattern, psychologists often view it as anomalous. Here, we argue that Simpson's paradox is
more common than conventionally thought, and typically results in incorrect interpretations --
potentially with harmful consequences. We support this claim by reviewing results from cognitive
neuroscience, behavior genetics, clinical psychology, personality psychology, educational psychology,
intelligence research, and simulation studies. We show that Simpson's paradox is most likely to
occur when inferences are drawn across different levels of explanation (e.g., from populations
to subgroups, or subgroups to individuals). We propose a set of statistical markers indicative
of the paradox, and offer psychometric solutions for dealing with the paradox when encountered --
including a toolbox in R for detecting Simpson's paradox. We show that explicit modeling of situations
in which the paradox might occur not only prevents incorrect interpretations of data, but also
results in a deeper understanding of what data tell us about the world.
NOTE: emphasis of the last sentence is added.  Modeling the data pattern is important because
of the next point.
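
For readers who want to see the reversal described in the abstract, here is a minimal sketch in Python. The data are invented purely for illustration (they are not from the article or from the mask data), and pearson_r is a small helper written for this example, not part of the authors' R toolbox.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Invented data: within each subgroup, Y falls as X rises,
# but subgroup B sits higher on both X and Y than subgroup A.
group_a = {"x": [1, 2, 3, 4], "y": [6, 5, 4, 3]}
group_b = {"x": [6, 7, 8, 9], "y": [11, 10, 9, 8]}

r_a = pearson_r(group_a["x"], group_a["y"])
r_b = pearson_r(group_b["x"], group_b["y"])

# Ignore subgroup membership and correlate all eight cases at once.
r_pooled = pearson_r(group_a["x"] + group_b["x"],
                     group_a["y"] + group_b["y"])

print(f"within A: {r_a:+.2f}  within B: {r_b:+.2f}  pooled: {r_pooled:+.2f}")
```

With these made-up numbers the correlation is -1 inside each subgroup and about +0.67 when the subgroups are pooled, so the sign of the association depends on the level at which one looks.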

(3)  On page 6 of the PDF for the article (scroll down on the webpage) the following quote
appears:

A Survival Guide to Simpson's Paradox
We have shown that SP may occur in a wide variety of research designs, methods, and questions.
As such, it would be useful to develop means to “control” or minimize the risk of SP occurring, much
like we wish to control instances of other statistical problems. Pearl (1999, 2000) has shown that
(unfortunately) there is no single mathematical property that all instances of SP have in common, and
therefore, there will not be a single, correct rule for analyzing data so as to prevent cases of SP.

Based on graphical models, Pearl (2000) shows that conditioning on subgroups may sometimes be
appropriate, but may sometimes increase spurious dependencies (see also Spellman et al., 2001).
It appears that some cases are observationally equivalent, and only when it can be assumed that the
cause of interest does not influence another variable associated with the effect, a test exists to determine
whether SP can arise (see Pearl, 2000, chapter 6 for details).

Note #1:  Emphasis is added to the sentence containing Judea Pearl's statement that there is no single mathematical
property that underlies all instances of Simpson's Paradox.  This implies that some cases of SP may be due
to suppression but other mechanisms are probably operating to produce the pattern, hence the need
for something like the authors' R toolbox to investigate an instance of SP in detail.

Note #2:  I think that this article is helpful in thinking about Simpson's Paradox even though most of
the examples are from psychology because it shows how it can appear in a wide variety of situations
(sometimes unnoticed) as well as the difference between SP based on different groups of subjects
and SP based on repeated measurements of individuals in different groups.

Perhaps what you're suggesting is that to get the correct estimate of the
X-Y association, one must compute estimates within each stratum of the
confounder, and then a pooled estimate of those within-stratum estimates
(rather than pooling the data across strata)?  I don't see that as being the
same thing as computing the association between aggregate measures of X and
Y, though. 
--- snip the rest ---

No, I was trying to suggest that Simpson's paradox may reflect the operation of
different mechanisms, which is one reason why I pointed out that multilevel analysis
is one strategy that some researchers are using to understand SP.
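
As a rough illustration of the idea behind the multilevel view, the sketch below separates a between-group association (the correlation of group means, which is what an analysis of aggregate units such as states sees) from a within-group association (the correlation after centering each case on its own group mean). This is only a descriptive decomposition on invented data, not a fitted multilevel model, and the group labels and values are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Invented data: three groups, each given as (x values, y values).
groups = {
    "g1": ([1, 2, 3], [5, 4, 3]),
    "g2": ([4, 5, 6], [8, 7, 6]),
    "g3": ([7, 8, 9], [11, 10, 9]),
}

mean_x, mean_y = [], []          # one pair of means per group
centered_x, centered_y = [], []  # case-level deviations from group means

for xs, ys in groups.values():
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    mean_x.append(mx)
    mean_y.append(my)
    centered_x += [a - mx for a in xs]
    centered_y += [b - my for b in ys]

# Between-group: association among the aggregate (group-level) values.
r_between = pearson_r(mean_x, mean_y)
# Within-group: association among cases once group differences are removed.
r_within = pearson_r(centered_x, centered_y)

print(f"between groups: {r_between:+.2f}  within groups: {r_within:+.2f}")
```

Here the group means are perfectly positively related while the within-group relation is perfectly negative, which is why a correlation computed on aggregates need not describe the cases inside them.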

-Mike Palij
New York University



Re: Fwd: Correlational Example Involving COVID-19 Useful for Classes

Bruce Weaver
Administrator
Thanks for the links, Mike.  I see that I also have access to Simpson (1951)
via JSTOR, so when I have time to dig into this a bit more, I'll start with
that.  

https://www.jstor.org/stable/2984065?seq=1#metadata_info_tab_contents

Bruce



Mike wrote
> --- snip ---
> The article can be accessed at:
> https://www.frontiersin.org/articles/10.3389/fpsyg.2013.00513/full
> --- snip the rest ---





-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/


Re: Fwd: Correlational Example Involving COVID-19 Useful for Classes

Mike
Bruce, may I suggest that you also read the following by Judea Pearl:

Judea Pearl (2014) Comment: Understanding Simpson’s Paradox, The
American Statistician, 68:1, 8-13, DOI: 10.1080/00031305.2014.876829

I can provide a copy if you need one.

-Mike Palij
New York University



On Sun, Oct 25, 2020 at 1:24 PM Bruce Weaver <[hidden email]> wrote:
Thanks for the links, Mike.  I see that I also have access to Simpson (1951)
via JSTOR, so when I have time to dig into this a bit more, I'll start with
that. 

https://www.jstor.org/stable/2984065?seq=1#metadata_info_tab_contents

Bruce


