Thursday, June 28, 2012

Against the Storks of Oldenburg

A Humean Being
A few centuries back as the crow flies, David Hume (among others) discarded the concept of final causation.  However, this left efficient causation hanging in the air.  If there is nothing in A that "points toward" B, then there is no reason to suppose that A causes B "always or for the most part."  So committed was he to discarding finality that, faced with this inconvenient truth, Hume discarded causality entirely.  A does not "cause" B.  It is only that B happens to follow A "always or for the most part."  So far.  Tomorrow, it might not.  What appear to be laws of nature are simply products of the human tendency to "see" patterns regardless of whether they are there.  This, of course, pulled the entire metaphysical rug out from under the new natural science; but scientists responded with a clean, manly cognitive dissonance.  They accepted the premise (final causes = boo!) while whistling past the graveyard (of efficient causes).  That is, they acted for the most part as if causality were alive and well.

Well, they were physicists.  But over the centuries, Humean correlation gradually encroached on causation.   In the social sciences, correlation is triumphant. 

Which brings us to the topic du jour.  Does belief in heaven encourage criminal behavior?


[Figure: A correlation in which X (carbon content of steel) does cause Y (the tensile strength of that steel).]
Correlationships
And you thought relationships were hard....

In general, there are four reasons why two factors (X and Y) may be co-related. 

1. X is a cause of Y.  The example shown is the relationship of the tensile strength of steel to its carbon content.  There are sound chemo-physical reasons why this should be so.

[Figure: A correlation in which Y (population) causes X (stork observations).]
2. Y is a cause of X.  That is, the experimenter has gotten things backward and confused the effect with the cause.  The classic example is that of the Storks of Oldenburg, in which the population of Oldenburg, Germany, correlated strongly with the number of storks observed during the years 1930-36.  Do storks bring babies?  You can't argue with the data!  This is sometimes referred to as a spurious correlation; but there actually is a causal chain:
babies→population→houses→chimneys→stork nesting places
So babies bring storks!

The short story is that the bigger the population, the more eyeballs available to spot storks. 

3. Z causes both X and Y.  This is far and away the most common reason for correlation.  The Z-factor is called a "lurking variable," some examples of which can be found in a paper by Brian Joiner, founder of one division of the consulting firm for which I once worked.  Two schematic examples:
[Figure: Two examples showing how correlation might exist despite no causal connection between X and Y.]
  • Frothing.  A chemical operation beset by poor yields had a statistician on staff who performed a regression analysis on Yield versus a variety of operational variables.  He found one, pressure in the reactor, and went to tell the plant manager. "All you need to do is lower the pressure!" 
    "Please turn around and bend over," the manager said, "so that I may thank you properly."  For he knew something the statistician did not.  Namely that a certain impurity in the raw materials would cause low yields in reaction.  That same impurity would cause the batch to froth.  The standard operating procedure was to increase the pressure to hold the frothing down.  So, yes, low yields were "associated with" high pressures, but there is no causal connection between them. 
  • Grading.  In the 1950s, Harvard conducted a study in which they found a correlation between the grades earned in high school and the salary earned ten years after graduation.   (In those days, to get a good job required a high school education.  That's still true today, but you have to go to college to get one.)  This resulted in hordes of crazed parents descending on the schools of America demanding that their little Jimmy or Jane be given a higher grade.  This was based on the magical belief that if a teacher erased one letter in a grade book and wrote in another letter, their child would get a raise or a promotion ten years later.  However, both X and Y were effects, resulting from an intangible we might call "drive to succeed."  In school success is measured by grades and promotions; in work it is measured by wages and promotions.  Jiggering one effect does nothing to the other effect. 
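The frothing story can be mimicked with made-up numbers.  What follows is a minimal sketch, not the plant's data: the impurity levels, the coefficients, and the helper names `pearson` and `residuals` are all assumptions chosen for illustration.  Pressure and yield are each generated from the impurity alone, and then the correlation is checked twice, once raw and once after the impurity's contribution has been subtracted from both.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def residuals(ys, zs):
    """What is left of ys after removing a least-squares straight line in zs."""
    n = len(ys)
    mz, my = sum(zs) / n, sum(ys) / n
    slope = (sum((z - mz) * (y - my) for z, y in zip(zs, ys))
             / sum((z - mz) ** 2 for z in zs))
    return [(y - my) - slope * (z - mz) for z, y in zip(zs, ys)]


rng = random.Random(1)  # fixed seed so the run is repeatable
n_batches = 5000

# Hypothetical numbers throughout. The impurity is the lurking Z.
impurity = [rng.random() for _ in range(n_batches)]
# Operators raise the pressure to hold down froth, and froth comes from impurity.
pressure = [10 + 5 * z + rng.gauss(0, 0.5) for z in impurity]
# Yield falls with impurity. Pressure does not appear in this line at all.
yield_pct = [90 - 20 * z + rng.gauss(0, 2) for z in impurity]

r_raw = pearson(pressure, yield_pct)
r_given_z = pearson(residuals(pressure, impurity),
                    residuals(yield_pct, impurity))

print(f"pressure vs yield, raw:              r = {r_raw:+.2f}")
print(f"pressure vs yield, impurity removed: r = {r_given_z:+.2f}")
```

In this toy setup the raw correlation comes out strongly negative even though pressure never enters the yield calculation, and it collapses toward zero once the impurity is accounted for.  Lowering the pressure in this simulated plant would change nothing about the yield.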
Lurking Z.  
The lurking variable can create the appearance of a correlation when it's not there or mask a correlation when it is.  Two experiences of mine are pertinent. 
[Figure: Left, a spurious correlation (in the actual case there were four clerks).  Right, a masked correlation.]

  • Workload and errors.  In an office producing certificates of inspection, a negative correlation was observed between the error rate and the workload.  Does this mean that if we load down the clerks with more work they will make fewer errors?  Not even management believed that.  No, there were four clerks typing certificates, and those with more time on the job made fewer errors and completed more certificates.  The correlation disappeared when the factor "Clerks" was taken into account.
  • Tablet potency.  Data collected on the weights and potencies of pharmaceutical tablets showed no correlation.  But it should have: the heavier the tablet, the more active ingredient it ought to contain.  The lurking Z here was that samples had been inadvertently taken from two bulk powder batches.  One batch had been mixed with more of the active ingredient than the other -- variation in the weighing and mixing operation -- and so tablets of identical weights would have different potencies, obscuring the relationship that did exist. 
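Both experiences can be imitated with synthetic data.  This is a sketch under stated assumptions: the clerk workloads, error rates, tablet weights, and batch strengths are invented, not the original office or laboratory figures.  In the first half, workload and errors are unrelated within any one clerk; in the second, potency follows weight closely within either batch.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(2)  # fixed seed so the run is repeatable

# Spurious correlation: four hypothetical clerks. More experience (larger k)
# means more certificates AND fewer errors. Within one clerk, the two
# quantities are generated independently of each other.
clerks = {}
for k in range(4):
    work = [20 + 10 * k + rng.gauss(0, 2) for _ in range(1000)]
    errs = [8 - 2 * k + rng.gauss(0, 0.5) for _ in range(1000)]
    clerks[k] = (work, errs)

all_work = [w for work, _ in clerks.values() for w in work]
all_errs = [e for _, errs in clerks.values() for e in errs]
r_pooled_clerks = pearson(all_work, all_errs)
r_within_clerks = [pearson(work, errs) for work, errs in clerks.values()]

# Masked correlation: two hypothetical powder batches of different strength.
# Within a batch, potency is proportional to tablet weight plus assay noise.
batches = {}
for name, strength in (("A", 0.10), ("B", 0.12)):
    weight = [100 + rng.gauss(0, 2) for _ in range(1000)]
    potency = [strength * w + rng.gauss(0, 0.1) for w in weight]
    batches[name] = (weight, potency)

all_weight = [w for weight, _ in batches.values() for w in weight]
all_potency = [p for _, potency in batches.values() for p in potency]
r_pooled_tablets = pearson(all_weight, all_potency)
r_within_tablets = [pearson(w, p) for w, p in batches.values()]

print(f"clerks pooled:   r = {r_pooled_clerks:+.2f}")
print("clerks within:  ", [f"{r:+.2f}" for r in r_within_clerks])
print(f"tablets pooled:  r = {r_pooled_tablets:+.2f}")
print("tablets within: ", [f"{r:+.2f}" for r in r_within_tablets])
```

With these made-up settings the pooled clerk data show a strong negative correlation that is absent for every individual clerk, while the pooled tablet data show a much weaker correlation than either batch shows on its own.  The simulated masking is only partial; how completely a real relationship gets hidden depends on how far apart the batches are.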
4. Coincidence.  
A nearly perfect correlation was found between the percentage of imported passenger cars sold in the US and the percentage of women in the labor force.  Does this mean that to save Detroit we must get all the women back in the kitchen?  Does it mean that working women tended to buy more efficient imported cars?  Nay, I say.  It means that if you take two data series that are trending over the same time frame they will always correlate.  Period.  Imports were increasing linearly from 1955 on, and women were entering the labor force in exponentially increasing numbers. 
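The point about trending series is easy to check on invented data.  The sketch below assumes 300 hypothetical time steps, one series rising linearly and one rising exponentially, each with its own independent noise; none of it is the actual import or labor-force data.  It compares the correlation of the levels with the correlation of the step-to-step changes, which removes most of a smooth trend.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def changes(series):
    """Step-to-step changes, which strip out most of a smooth trend."""
    return [b - a for a, b in zip(series, series[1:])]


rng = random.Random(3)  # fixed seed so the run is repeatable
steps = range(300)      # hypothetical time steps, not the real 1955-on data

# One series climbs in a straight line, the other exponentially.
# The noise terms are independent: neither series knows the other exists.
imports = [5 + 0.05 * t + rng.gauss(0, 0.5) for t in steps]
women = [30 * math.exp(0.002 * t) + rng.gauss(0, 0.5) for t in steps]

r_levels = pearson(imports, women)
r_changes = pearson(changes(imports), changes(women))

print(f"levels:  r = {r_levels:+.2f}")
print(f"changes: r = {r_changes:+.2f}")
```

In this simulation the levels correlate almost perfectly while the changes hardly correlate at all, which is what one would expect when the shared upward drift is the only thing the two series have in common.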

With all this in mind...
Behold this news item from CBS Seattle
Study Finds People Who Believe In Heaven Commit More Crimes
We may find this counter-intuitive, but it confirmed the a priori beliefs of the researchers, so we know it must be true.  The University press release was a little different:
Belief in hell, according to international data, is associated with reduced crime
So, it's not just "data," it's "international data," which we all know makes it really really, er umm, "diverse." 
Now look again at that PHD Comics cartoon, above.  Did he call it, or did he call it?  So what did the researchers actually say?
Divergent Effects of Beliefs in Heaven and Hell on National Crime Rates
Data for belief in hell, belief in heaven, belief in God, and religious attendance were taken from the 1981–1984, 1990–1993, 1994–1999, 1999–2004, and 2005–2007 waves of the World Values Surveys (WVS) and European Value Surveys...
Mean standardized crime rates were computed from the 10 crimes for which the United Nations Office on Drugs and Crime (UNODC) had reliable statistics.... These data were compiled by the UNODC from national government sources, including police and court records, national statistics ministries, and other national government bodies.
The first thing we notice is that they actually did make an effort to say "associated with" or "predictor of" rather than "causes."  But then in the end they try to develop causal explanations for why belief in heaven is associated with higher crime rates while belief in hell is associated with lower rates.  (This seems especially problematic since in certain religions, one believes in both.) 

The second thing is that they simply accepted data collected by a variety of governments.  Were they using the same definitions of the various crimes?  For the US (e.g.) did they use the UCR series of the FBI or the Victimization surveys conducted by BuCensus?  How well does the rate of reported crimes in Tanzania track the actual frequency of crimes there?  Canada?  Russia?  How the data was defined, measured, and collected matters.  At least it did in chemical and mechanical processes I worked with.  I find it hard to believe that demographic data, gathered without aid of micrometers or titration columns, is somehow better defined and measured. 

The third thing is this.  If you wanted to study the relationship of the height of trees to their girths, would you pair the height of one tree to the girth of another tree simply because they were growing in the same forest?  Then why pair crime rates and religious beliefs simply because they are in the same country?  First rule of correlation: X and Y must be measured on the same unit.  Unless the people believing in heaven were the ones committing the crimes there is no intelligible basis for a causal connection from the belief to the crime. 
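A toy simulation shows why the unit of measurement matters.  Everything in it is an assumption made for illustration: thirty hypothetical countries, a single national factor that shifts both the share of believers and the crime rate, and individuals whose belief and whose offending are drawn independently of each other.  It is not a model of the WVS or UNODC data.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(4)  # fixed seed so the run is repeatable
n_countries, n_people = 30, 2000

belief_share, crime_rate, within_r = [], [], []
for c in range(n_countries):
    z = c / (n_countries - 1)   # hypothetical national factor, from 0 to 1
    p_belief = 0.2 + 0.6 * z    # share of believers rises with z
    p_crime = 0.02 + 0.08 * z   # crime rate also rises with z
    # Each person's belief and each person's offending are independent draws.
    believes = [1 if rng.random() < p_belief else 0 for _ in range(n_people)]
    offends = [1 if rng.random() < p_crime else 0 for _ in range(n_people)]
    belief_share.append(sum(believes) / n_people)
    crime_rate.append(sum(offends) / n_people)
    within_r.append(pearson(believes, offends))

# Country as the unit: one (belief share, crime rate) pair per country.
r_countries = pearson(belief_share, crime_rate)
# Person as the unit: average of the within-country correlations.
r_people = sum(within_r) / len(within_r)

print(f"country-level: r = {r_countries:+.2f}")
print(f"person-level:  r = {r_people:+.2f}")
```

Here the national averages correlate strongly even though, by construction, a believer is no more likely to offend than a non-believer living in the same country.  The country-level number says nothing about the people.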

In fact, one might wonder: if a country is more crime-ridden, perhaps its victims turn more to religion.  (Y causes X).  Or perhaps a more religious population is more apt to report the crimes done to them, leading to a higher reported crime rate.  There are a number of possibilities.  The researchers say they considered some of these and dismissed them, but give no details on why they did so.  But people who deal with databases rather than going down to the shop floor to study the data in situ sometimes learn that the data in the database is not so reliable after all. 

5 comments:

  1. I detect a correlation between what appears on your blog and what you talk about with SIGMA.

  2. This comment has been removed by a blog administrator.

  3. (In those days, to get a good job required a high school education. That's still true today, but you have to go to college to get one.)

    Loved this. :-D

  4. I decided to write a better survey. Because I’m genuinely interested in a slightly different data set- as in what actual people believe vs what they're willing to self report as encounters with law enforcement. I’m also interested in seeing if as many online places as I can think to post this will give me anything close to the number of responses I would need to become statistically significant (which I figure is about 7 million).
    http://outsidetheautisticasylum.blogspot.com/2012/06/bad-data-so-i-wrote-better-survey.html

