Tuesday, May 28, 2013

Quote of the Day. Models and Turism

"A dirty, actually filthy, open secret in statistics is that for any set of data you can always find a model which fits that data arbitrarily close. Finding “statistical significance” is as difficult as the San Francisco City Council discovering something new to ban. The only evidence weaker than hypothesis tests are raw assertions and fallacies of appeal to authority.

The exclusive, or lone, or only, or single, solitary, sole way to check whether any model is good is if it can skillfully predict new data, where “new” means as yet unknown to the model in any way—as in in any way. The reason skeptics exist is because no known model has been able to do this with temperatures past a couple of months ahead."
One way of avoiding this test is to pretend that the skill of the model lies in predicting the values of a parameter of the model, which is tested by comparing models against one another and pretending that the differences among the models are due to random causes.

It is also worth noting the Turing Fallacy. The basic programme in AI research is to mimic human performance with a computer algorithm until the mimicry is indistinguishable from actual human responses, and then to assume that whatever the algorithm did to achieve this is what the human mind does as well. The same may be said of any modeling. Even if the model outputs match the Real World™ with adequate precision, there is no guarantee that what the model is doing matches what the Real World™ does. The Ptolemaic model of the World adequately matched the observed positions of stars and planets for well over a thousand years; and the Tychonic model was mathematically equivalent to the Copernican model, differing only in the origin point of the coordinate system. In the end, both the Ptolemaic and the Tychonic systems failed, but they did so for reasons of physics, not because the models were a poor fit.

The sad fact is that scientific theories are underdetermined: there will always be multiple theories that account for the same body of facts. IOW, a dirty open secret is that for any set of data you can always find multiple models which fit that data arbitrarily closely.
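
For the numerically inclined, here is a minimal sketch of the point in Python. Everything in it is an assumption made for illustration: the five "observations" are made up, and the names `lagrange_model` and `rival_model` are hypothetical helpers, not anyone's published method. The first model is the interpolating polynomial through the data; the second adds a term that is zero at every observed point, so it fits the same data just as well while saying something quite different about data it has not seen.

```python
from math import prod


def lagrange_model(xs, ys):
    """Return the polynomial of degree < len(xs) that passes through every point."""
    def model(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            # Basis polynomial: equals 1 at xi and 0 at every other sample point.
            basis = prod((x - xj) / (xi - xj) for j, xj in enumerate(xs) if j != i)
            total += yi * basis
        return total
    return model


def rival_model(xs, ys, c):
    """Return a different model that agrees with lagrange_model on the samples."""
    base = lagrange_model(xs, ys)

    def model(x):
        # The extra term vanishes at every sample point, so the fit is unchanged,
        # but away from the samples it can be made as large as we like via c.
        return base(x) + c * prod(x - xi for xi in xs)
    return model


def max_abs_error(model, xs, ys):
    """Largest absolute miss of the model on the given points."""
    return max(abs(model(x) - y) for x, y in zip(xs, ys))


# Made-up observations (illustrative only, not real measurements).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 2.0, 5.0, 4.0]

model_a = lagrange_model(xs, ys)
model_b = rival_model(xs, ys, c=0.5)

x_new = 5.0  # a point neither model has "seen"
print("fit error, model A:", max_abs_error(model_a, xs, ys))
print("fit error, model B:", max_abs_error(model_b, xs, ys))
print("prediction gap at x_new:", model_b(x_new) - model_a(x_new))
```

Both models are indistinguishable on the data in hand, and changing `c` yields as many further rivals as you please. Only the new observation at `x_new` can tell them apart, and goodness of fit alone says nothing about which one, if either, describes what the Real World™ is doing.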

