Wednesday, January 7, 2015

A Startling Proposal

Top panel of three; see first link. Year axis is on the bottom panel.
The suggestion has been made that predictions made by scientific studies be checked against Actual Results in what TOF joshingly refers to as "The Real World™." A band of intrepid researchers has compared the actual rates of glioma to the rates expected from the seminal Swedish study linking them to cell phone use. The graph to the right covers non-Hispanic white males from 1992/97 to 2008. Corrections were made for the delay of onset. The results are discussed less dauntingly here.

As we can see, the rate of gliomas has remained essentially unchanged even while cell phone use was skyrocketing. The exponential curve is where we would expect to find glioma rates if we took the predictions of the Swedish study as, well, predictions.
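For those who like to see the exercise spelled out, the comparison amounts to overlaying the observed incidence series on the series the risk model implies. Here is a minimal sketch in Python, with placeholder numbers standing in for both the observed rates and the study's predicted curve; nothing below is real data.

```python
# Sketch: compare observed incidence against a model-implied curve.
# All numbers are placeholders for illustration, NOT the actual glioma
# figures or the Swedish study's risk estimates.
import numpy as np
import matplotlib.pyplot as plt

years = np.arange(1992, 2009)                # 1992 through 2008
observed = np.full(years.shape, 6.0)         # hypothetical flat rate per 100,000
predicted = 6.0 * 1.08 ** (years - 1992)     # hypothetical exponential rise

plt.plot(years, observed, "o-", label="observed incidence (placeholder)")
plt.plot(years, predicted, "--", label="model-implied incidence (placeholder)")
plt.xlabel("Year")
plt.ylabel("Gliomas per 100,000")
plt.legend()
plt.show()
```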


Should this novel approach be applied to other studies, especially those based not on data (whether case control or observational) but on "data" produced by "models"? Will the idea catch on? What a notion!

A Note on Bacon

To understand the decay of modern science properly, recall that Francis Bacon proposed the following method for ascertaining whether X should be considered a cause of Y.

Make three tables:
  • Table 1: Cases in which Y always occurs.
  • Table 2: Cases in which Y does not occur.
  • Table 3: (if appropriate) Cases in which Y occurs more or less.

Study each table diligently and make a list of the possible factors appearing in each. (A toy sketch of the whole test appears after the criteria below.)

X is to be a cause of Y iff
  • It appears in every case on Table 1
  • It appears in no case on Table 2
and, if appropriate,
  • It appears in greater intensity in cases of Table 3 where Y is greater.
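Rendered as a procedure, Bacon's test is simple enough to write down. Below is a toy sketch in Python; the function name, the data structures, and the heat/light example are hypothetical illustrations of the three criteria, not anything drawn from Bacon himself.

```python
# A minimal sketch of Bacon's three-table test. Each "case" is a dict of
# observed factors and their intensities (all names hypothetical).

def baconian_cause(factor, table1, table2, table3=None):
    """Return True if `factor` passes Bacon's criteria as a cause of Y.

    table1: cases in which Y occurs (factor must appear in every one)
    table2: cases in which Y does not occur (factor must appear in none)
    table3: optional list of (case, y_intensity) pairs; the factor's
            intensity must rise with Y's.
    """
    # Criterion 1: present in every case where Y occurs.
    if not all(factor in case for case in table1):
        return False
    # Criterion 2: absent from every case where Y does not occur.
    if any(factor in case for case in table2):
        return False
    # Criterion 3 (if appropriate): intensity tracks Y's intensity.
    if table3:
        ordered = sorted(table3, key=lambda pair: pair[1])   # sort by Y intensity
        intensities = [case.get(factor, 0) for case, _ in ordered]
        if intensities != sorted(intensities):
            return False
    return True


# Hypothetical usage: does "heat" qualify as a cause of expansion?
table1 = [{"heat": 3, "light": 1}, {"heat": 5}]          # Y occurs
table2 = [{"light": 2}, {"moisture": 1}]                 # Y does not occur
table3 = [({"heat": 2}, 1.0), ({"heat": 6}, 3.5)]        # Y in degrees
print(baconian_cause("heat", table1, table2, table3))    # True
```
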
As Modern Science weakened, this rigorous approach was replaced by "correlation." X would be considered "associated with" Y (in good Humean fashion) if it appears lots of times in Table 1 and few times in Table 2. It is unclear what the researchers make of those cases in Table 2 in which X occurs but Y does not, or those in Table 1 in which Y occurs but X does not. Quite often, we suppose, these cases are overlooked because the researcher is blinded by the light of a sufficiently small p-value.

Of course, life is a bit more complex than Bacon and the Scientific Revolutionaries supposed, and Bacon's approach suffers a bit from monocausalitis. If a football game is complicated by the presence of the other team, causal analysis is complicated by the presence of Other Causes, some of which act in concert with our X and others in opposition to it, so that the pure X-Y relationship always gets muddied up.

Here, the end is swaged out, not in.
What too many researchers seem to overlook is that the discovery of such a relationship is the beginning, not the end, of the search for a causal factor. Juran tells of a case in which 23 different types of torque tubes were ranked according to their percent defective for dynamic imbalance. In order to fit onto common connections, the larger-diameter tubes had their ends swaged down to the common diameter. Those swaged types were found to be concentrated in the upper ranks of the failure-prone types. Further investigation (and this was the crucial step) revealed that during the swaging there was poor control over maintaining the coaxiality of the swaged and unswaged diameters of the tube; when these were not aligned properly, dynamic imbalance resulted.
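For the curious, the ranking step amounts to sorting the types by percent defective and noting where the swaged ones land. A toy sketch in Python follows, with invented figures; only the method follows Juran's account.

```python
# Rank tube types by percent defective and see whether the swaged types
# cluster at the top. All figures below are invented for illustration.
tube_types = [
    # (type name, swaged?, percent defective) -- hypothetical values
    ("A", True, 9.1), ("B", False, 1.2), ("C", True, 7.4),
    ("D", False, 0.8), ("E", True, 6.0), ("F", False, 2.3),
]

ranked = sorted(tube_types, key=lambda t: t[2], reverse=True)
for rank, (name, swaged, pct) in enumerate(ranked, start=1):
    tag = "swaged" if swaged else "unswaged"
    print(f"{rank}. type {name}: {pct:.1f}% defective ({tag})")

# The ranking only points a finger; the crucial step is the follow-up
# investigation that found poor control of coaxiality during swaging.
```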

IOW, it ain't over till it's over, at least in quality improvement and troubleshooting. It's important to identify the actual physical cause of a thing, not wave hands at reified abstractions like "randomness" or "correlation."

5 comments:

  1. This reminds me of the outrage experienced by psychologists when some barbaric bullies tried - and failed - to duplicate their results: http://tinyurl.com/lydg2vz. You do a study, get it peer-reviewed by your buddies, get tenure - and then some punk shows that you kind of made it all up. Some people have no respect for a professor's career.

  2. Dear TOF. As I'm sure you are aware, the output of Models is not Data. Which is probably why you put them in quotes. Best case is that observed and verified data goes into the model and the output is a . . . guess at one probable outcome. I only mention it because not everyone understands. I've been in discussions with Professors who truly believed that the output of models was more data.

  3. Hi TOF,

    This is not strictly on-topic except as it pertains to statistics, but I am a wee baby would-be-statistician one-quarter of the way through a master's program, and I have some questions about a messy regression problem I dealt with last semester that I wanted a little more insight on as it pertains to something you mentioned about being able to fully model any process once you stuff enough variables in there. Where would be a good place to ask you such questions, if you have time to entertain them? (Which I realize is probably not the case, but figured I might ask anyway!)

    Replies
    1. Don't know how much insight I can give, but you can try theofloinn@aol.com

    2. Thank you very much! I'll send them along.

