- Organized simplicity. Systems with few elements. Analyzed mathematically.
- Disorganized complexity. Systems with many elements acting "randomly." Analyzed statistically.
- Organized complexity. Systems with many interconnected elements. Analyzed with operations research/model-building methods.
Joe Martino tells TOF that "One of the most horrible examples I ever encountered was the use of a Cobb-Douglas Production Function to predict the effectiveness of bombing the Ho Chi Minh trail. When I first saw the model it fairly screamed 'wrong!' But the people who put it together saw nothing wrong with it."

And yet, these things have so many pretty equations they seem like they damn well ought to work. And they do, in some cases. Kingsbury Bearings has a model for hydraulic bearings that works well in predicting the performance of new bearing designs.* So what's the problem? Operationally, what is the difference between a model that is "useful" and one that is "true"?
(*) hydraulic bearings. TOF digresses. At the entrance to the Kingsbury plant is a placard honoring the Kingsbury Bearing installed in Holtwood #5 in 1912. TOF inquired of his hosts in the 1980s when he spent some time with them: "How long did it last?" "We don't know yet," was the response. "It's still running." As of 2008, Wikipedia tells us, it was still running, with an estimated TTF (time to failure) of 1,300 years. That's craftsmanship! It's also a system whose elements and interactions are pretty well understood.
The Most Famous Model In the History of the World, Maybe.
Faithful Reader may recall TOF's extended discussion of the Great Ptolemaic Smackdown. The Ptolemaic model of the heavens was fabulously successful for 1,460 years -- slightly longer than a Kingsbury bearing at the Holtwood Dam. It predicted sunrises, sunsets, eclipses, and sundry other stellar phenomena with tolerable accuracy. If the proof of the pudding is in the eating,* surely the proof of the model is in its forecasts. Or maybe that just is the difference between "useful" and "true."

That a model makes good predictions, as the Ptolemaic (and later Tychonic) model did, is no assurance that the real world matches the internal arrangements of the model. (See Cartwright: "How the Laws of Physics Lie.") It is an "associative law," not a "causal law." Let's call it the Turing Fallacy. There's always more than one way to skin a cat, and more than one way to model a phenomenon. And you cannot proceed automatically from post hoc results to propter hoc models. Even Hume recognized that.
(*) proof. Means "to test", as in "proving grounds" or the proof of whiskey.
Data, Data Everywhere, and Not a Jot to Think
Heinlein, self-demonstrating
What we observe is not nature itself, but nature exposed to our method of questioning.
– Werner Karl Heisenberg, Physics and Philosophy: The Revolution in Modern Science
IOW, how you ask the question determines the sort of answers you can get. If your method of questioning is a hammer, Nature will testify to her sterling nail-like qualities. So Facts have meaning only relative to a conceptual Model of the phenomenon.
What Do You Mean, "Probably"?
"Give me data!" he cried. "I can't make bricks without straw!"
-- The Copper Beeches
A model is a mechanism to assign probabilities to propositions p, given evidence E:
Pr(p|E)
Usually, the evidence E is extended by tacking on “I believes.” For example, “I believe p is represented by a normal distribution with parameters μ and σ.” These “I believes” comprise the model, M. This makes the model "credo-ble."*
So the probability is assigned to propositions p, given evidence E and model M:
Pr(p|EM)
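By way of a concrete illustration -- a minimal sketch with invented numbers, not data from any real study -- the same evidence E yields different values of Pr(p|EM) depending on which "I believe" gets packed into M:

```python
# Toy illustration: the same evidence E, the same proposition p,
# but two different models M -- and two different probabilities.
# All numbers below are invented for the example.
import numpy as np
from scipy import stats

E = np.array([4.2, 5.1, 3.8, 6.0, 4.9, 5.5, 4.4, 5.8, 5.2, 4.7])  # evidence: ten measurements
threshold = 8.0        # proposition p: "the next observation will exceed 8"

# M1: "I believe p is represented by a normal distribution with parameters mu and sigma."
mu, sigma = stats.norm.fit(E)
pr_M1 = stats.norm.sf(threshold, mu, sigma)

# M2: "I believe the quantity is lognormally distributed."
s, loc, scale = stats.lognorm.fit(E, floc=0)
pr_M2 = stats.lognorm.sf(threshold, s, loc, scale)

print(f"Pr(p|E,M1) = {pr_M1:.2e}")   # under the normal "I believe"
print(f"Pr(p|E,M2) = {pr_M2:.2e}")   # under the lognormal "I believe"
```

Both outputs are perfectly rigorous given their premises; the premises are where the credo sneaks in.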
A bad Model
(*) structured relationship. In Latin, fictio. Facts acquire meaning as part of a fiction.
But nothing in this life is certain. All is fraught with uncertainty.* That includes models. Yet the conclusions, the press releases, the requests for your money always seem couched in the language of certainty. Let's take a look at the "certain amount of uncertainty" that models deal with. Not all the freight is statistical.
(*) TOF is suddenly struck by the term "fraught." Can a situation be 'fraught with certainty'?
What else might we be fraught with? These are weighty matters.
A Model of a Model
What does a model even look like?
First, there's the real-world situation or system that you want to understand, the context that determines what's in the system and what's outside the system.
Example: In modelling velocities, Newton confined himself to macroscopic bodies traveling well below the speed of light. He didn't know he did this, since quantum events and relativistic speeds were beyond anyone's experience at the time. But in consequence his model stumbles whenever it crosses the boundaries into the very fast or the very small. TOF tells you three times: a model that gives a good account of Situation A may not do so for Situation B.
Second, there's the model structure itself, a sort of skeleton that in some manner "fills in" and "supports" the blob of the real-world situation with nodes and links: various factors and their relationships. These relationships are often expressed in mathematical form.
Third, there are the inputs, or signals from outside the system boundary; and the outputs or performances.
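A deliberately trivial sketch, with assumptions invented purely for illustration, of how these three pieces line up: a free-fall model whose context excludes air resistance and anything far from the Earth's surface.

```python
# A toy model, for illustration only: time for a dense object to fall a given height.
# Context: near the Earth's surface, no air resistance, heights of ordinary size.
# Structure: t = sqrt(2h / g), a relationship linking the factors h and g to t.
# Input: the height h (a signal from outside the system boundary).
# Output: the predicted time to impact (the model's "performance").

G = 9.81  # parameter: local acceleration of gravity, m/s^2, fixed from prior investigations

def drop_time(height_m: float) -> float:
    """Predicted seconds for an object to fall height_m metres, within the stated context."""
    return (2.0 * height_m / G) ** 0.5

print(drop_time(20.0))   # reasonable for a cannonball dropped from a tower
# Hand it a feather in a breeze, or a drop from orbit, and the arithmetic still
# returns a number -- but the context has been violated and the output means little.
```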
How Models Go Bad
Models gone bad
Following Walker et al., the uncertainties in a model may be sorted by where they enter:
- Context Uncertainty
- Model Structure Uncertainty
- Input Uncertainty
- Parameter Uncertainty
- Model Outcome Uncertainty
1. Context Uncertainty. This is uncertainty in framing the situation to be modeled: e.g., have we properly identified the problem to be solved, the system boundaries, the economic, political, social, environmental, and technological situation, etc. Basically, it's hard to model a situation if you don't know what you're talking about. Although that does not ever seem to stop anyone.
What exactly is the problem?
This sort of uncertainty cannot be quantified, which can be a pain in the butt when clients want to know if we are "95% confident" in a conclusion. There is no mathematical-statistical approach to this, and we often wind up simply brainstorming a consensus figure. Just because a number is announced is no guarantee that something has been measured.
Ed Schrock used to call this sort of thing Type III Error: "getting the right solution to the wrong problem."
2. Model Uncertainty. This is uncertainty in designing and executing the model itself; that is, error in:
- doing the right thing and
- doing the thing right.
Obamacare web portal
Aquinas once wrote: "The suppositions that these astronomers have invented need not necessarily be true; for perhaps the phenomena of the stars are explicable on some other plan not yet discovered by men." (De coelo, II, lect. 17) That is, there is always more than one way to skin a cat -- and more than one way to model a situation. For example:
- Is a photon a particle or a wave?
- Should we apply a mean or a median -- or a harmonic mean?
- Copernican or Tychonic?
- Etc.
Then there is the uncertainty of executing the model -- "doing the thing right":
- Bugs (software errors): keystroke errors in model source code, unclosed loops, registers not set properly, etc.
- Malfs (hardware faults): malfunctions in the technical equipment used to run the model, equipment capabilities, available bandwidth, etc.
3. Input Uncertainty
This is uncertainty in the data describing the reference (base case) system and in the external driving forces that influence the system.

3.1. Uncertainty about the external driving forces and their magnitudes -- especially drivers not under the control of policymakers. Model-users tend to assume they have control of the system simply because they've identified the Xs. This can be exacerbated when a congeries of actual measurements is given a single symbolic name for convenience. One is then at risk of treating this composite variable as if it were a real-world measurement. (See the model of coups in sub-Saharan Africa.) There is also uncertainty regarding the system response to these forces, leading to model structure uncertainty.
3.2. Uncertainty about the system data (e.g., land-use maps, data on infrastructure, business registers, etc.) on which the model will operate. These may be in error in a variety of ways. The information may be wrong, outdated, missing, or (more subtly) be from outside the problem situation. Examples:
a) The Literary Digest poll that attempted to predict the 1936 US presidential election, in which the target population (context) was "all registered voters" but the sampling frame (system data) comprised mostly lists of phone numbers. The two did not coincide: during the Depression, many voters did not have telephones, and those without telephones voted differently from those with them.
Uncertainty about system data is generated by a lack of knowledge of the properties of the underlying system and deficiencies in the description of the variability. Modelers often take their databases and such very much for granted. A historical example: Copernicus' model failed not only because his model structure insisted on pure Platonic circles for the orbits, but also because he used the old Alphonsine Tables of astronomical data, which were rife with centuries of accumulated copyist errors and which transferred to the new Prussian Tables.

b) Data for general climate models come from weather stations established and sited for other purposes. These stations often have missing data, have suffered damage, or have gone out of use. Or they may be subject to local effects, such as concrete or asphalt surroundings that may be acceptable for air traffic control -- where you want to know the specific conditions, heat island and all -- but not for other purposes.
4. Parameter Uncertainty
Parameters are constants in the model, supposedly invariant within the chosen context and scenario. There are the following types of parameters:
- Exact parameters: universal constants, e.g., e, the base of the natural logarithms.
- Fixed parameters: considered exact from previous investigations, e.g., g, the acceleration of gravity.
- A priori chosen parameters: based on prior experience in similar situations. (The uncertainty must be estimated on the basis of a priori experience.)
- Calibrated parameters: essentially unknown from previous investigations, or known but unusable due to dissimilarity of circumstances. These must be determined by calibration against current data. They directly affect model structure uncertainty.
5. Output Uncertainty
This is uncertainty in the predictions of the model and is typically a combined effect of all the other uncertainties. But specifically it includes uncertainties in the reference data used to perform the calibrations.

Another Fine Math You've Gotten Us Into
Another fine math...
A common problem arises when a metric of interest -- the density of coal in a bunker, say -- is for practical reasons difficult or impossible to measure directly. If we can construct a model by which the density is expressed as a function of radiation backscatter, then we can measure the more accessible metric and convert the results to equivalent density, as was done at the link.
The calibration data is shown in red. These were tubes packed to a known density whose backscatter was then measured. (This sort of thing had to be done for each shipment of coal to the power plant, since the relationship between density and radiation differed for different veins of coal. Context!) A linear regression was deemed a reasonable model over the range of interest, both for empirical reasons (the data was accounted for) and for scientific reasons. So Y=b0+b1X.
The chart shows two envelopes around the regression line. The inner envelope (red) is a confidence bound for the regression line itself -- the "slop in the slope." That is, it tells us how closely the analysis has pinned down the parameter b1. The outer envelope (blue) is the prediction interval, which tells us what likely densities could account for the observed backscatter. This is what is really of interest. If the bunker backscatter measures 6000, the density is most likely somewhere betwixt 64 and 70. Which means it might not be. It is possible to have a very precise interval around a very wrong value. But that is a topic for another day.
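For the curious, here is a rough sketch of the sort of calculation behind such a chart, in Python with the statsmodels package. The eleven (backscatter, density) pairs are invented stand-ins for the actual calibration tubes, so the coefficients and envelopes are illustrative only:

```python
# Calibration regression sketch: fit Y = b0 + b1*X to (invented) calibration data,
# then get both envelopes at a bunker reading of X = 6000:
#   - confidence interval for the fitted line (the "slop in the slope"),
#   - prediction interval for an individual density (the envelope we really care about).
import numpy as np
import statsmodels.api as sm

X = np.array([5400, 5500, 5600, 5700, 5800, 5900, 6000, 6100, 6200, 6350, 6500], dtype=float)
Y = np.array([60.2, 62.0, 61.9, 63.5, 65.1, 65.8, 67.5, 67.9, 69.4, 70.1, 71.8])  # hypothetical densities

fit = sm.OLS(Y, sm.add_constant(X)).fit()
print(fit.params)      # b0, b1 -- the calibrated parameters
print(fit.rsquared)    # coefficient of determination

x_new = sm.add_constant(np.array([6000.0]), has_constant='add')  # force the constant column
pred = fit.get_prediction(x_new)
print(pred.conf_int(alpha=0.05))             # inner envelope: mean response at X = 6000
print(pred.conf_int(obs=True, alpha=0.05))   # outer envelope: an individual density at X = 6000
```

The b0 and b1 that come out are "calibrated parameters" in the sense of item 4 above: they exist only by grace of those eleven tubes.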
The point here is that the model Y=b0+b1X, simple as it is, is subject to other uncertainties that cannot be addressed quite so mathematically. One is how widely it applies.
- We have already decided that this particular regression will not apply to other coal shipments because the very relationship Y=b0+b1X may differ.
- It would not be appropriate to apply the model to bunkers with backscatters much outside the range of roughly 5400 to 6500. It might still be valid there, but we don't know. The b1 parameter was calculated from the calibration data, and the results cannot be extrapolated beyond the range of those data.
- There is also the question of whether a one-variable linear regression is the best model. There may be other Xs we could include that would give us a tighter prediction. That the calibration points do not fall on a perfectly straight line indicates that there are other factors in play. The coefficient of determination is 91%, which means (loosely speaking!) that 91% of the variation in density is accounted for by its relationship to backscatter, which leaves 9% unaccounted for.
- The parameter b1 was estimated using eleven calibration tube samples. This produced part of the uncertainty in the parameter estimation; that is, in the slope of the regression line. Was this sample size sufficient? Sufficient for what purposes?
- The backscatter is not measured with perfect precision. The vertical line at 6000 implies the measured backscatter at the bunker was exactly 6000. In reality, all instruments suffer from uncertainties related to precision and reproducibility. That vertical line should be a band of probabilities, and the uncertainty in X is propagated through the model to an additional uncertainty in Y over and above what is shown. (A rough sketch of this propagation follows the list.)
- The Y values on the calibration data were also measured with uncertainty.
- In addition to the instrument, there are uncertainties related to the technician: technique, attentiveness, skill, and so forth. The coal samples were supposed to be taken according to ASTM D2234 and D2013. Were they? What about the calibration tubes? Do the scribe lines mark the exact volumes for the calibration?
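As promised in the bullet on instrument precision, here is a crude Monte Carlo sketch of what propagating the uncertainty in X does to the predicted density. The coefficients, residual scatter, and instrument standard deviation are all assumed for the sake of illustration; they are not the plant's actual figures:

```python
# Propagating measurement uncertainty in X through Y = b0 + b1*X.
# All numeric values are assumptions for the sketch, not real calibration results.
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

b0, b1 = 4.5, 0.0104            # hypothetical calibration coefficients (density ~ backscatter)
resid_sd = 0.6                  # hypothetical residual scatter about the calibration line
x_reading, x_sd = 6000.0, 75.0  # the bunker reading and an assumed instrument standard deviation

# Case 1: pretend the reading is exactly 6000; only the residual scatter matters.
y_fixed = b0 + b1 * x_reading + rng.normal(0.0, resid_sd, n)

# Case 2: treat the reading itself as uncertain and push that band through the model.
y_prop = b0 + b1 * rng.normal(x_reading, x_sd, n) + rng.normal(0.0, resid_sd, n)

for label, y in (("X taken as exact", y_fixed), ("X uncertainty propagated", y_prop)):
    lo, hi = np.percentile(y, [2.5, 97.5])
    print(f"{label}: 95% of simulated densities lie between {lo:.1f} and {hi:.1f}")
```

The second interval comes out wider than the first: the vertical line at 6000 really is a band, and the band widens the prediction envelope accordingly.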
Coming next, a closer look at some uncertainties in models, with special attention to that model.
Part III link.
References
- Box, George E.P., William G. Hunter, J. Stuart Hunter. Statistics for Experimenters, Pt.IV “Building Models and Using Them.” (John Wiley & Sons, 1978)
- Cartwright, Nancy. "How the Laws of Physics Lie."
- Curry, Judith and Peter Webster. “Climate Science and the Uncertainty Monster” Bull. Am. Met. Soc., V. 92, Issue 12 (December 2011)
- El-Haik, Basem and Kai Yang. "The components of complexity in engineering design," IIE Transactions (1999) 31, 925-934
- von Hayek, Friedrich August. "The Pretence of Knowledge," Lecture to the memory of Alfred Nobel, December 11, 1974
- Jackman, Robert W. "The Predictability of Coups d'Etat: A Model with African Data." Am. Pol. Sci. Rev., Vol. 72, No. 4 (Dec. 1978)
- Petersen, Arthur Caesar. "Simulating Nature" (dissertation, Vrije Universiteit, 2006)
- Swanson, Kyle L. "Emerging selection bias in large-scale climate change simulations," Geophysical Research Letters
- Turney, Jon. "A model world." aeon magazine (16 December 2013)
- Walker, W.E., et al. "Defining Uncertainty: A Conceptual Basis for Uncertainty Management in Model-Based Decision Support." Integrated Assessment (2003), Vol. 4, No. 1, pp. 5–17
- Weaver, Warren. "Science and Complexity," American Scientist, 36:536 (1948)
Oddly enough, Holmes misquotes that biblical reference. He actually says "bricks without clay". My sister quoted it a while back and I was sure she was wrong, but then I discovered no, Holmes (or Doyle) was, and she was quoting their misquote correctly.
Fraught with danger, obviously. Peril even!
ReplyDeleteI have often claimed to be fraught with certainty, but most people insist that's a vice on my part.
I usually avoid saying "thank you for this post" so as to avoid cluttering up your comments page, but this was a truly exceptional outline; thank you for this post.
Is that Schrock book the best introduction to Context Uncertainty you could recommend?
No. I just couldn't find a photo of Ed on the Internet!
The Walker paper was the source for much of the skeleton of the post. His "taxonomy" of uncertainties "by location" is pretty much the standard. The Curry/Webster paper discusses the uncertainties applied to climate models.
Classic case Number 1:
The "Paraffin Test" (famous from '40s film noir). Dip someone's hand in paraffin. Peel the paraffin off. See if it fluoresces. If so, they just fired a gun. Why? Because the gunpowder residue embedded in the paraffin and extracted from the hot, open pores fluoresces.
Unfortunately that is where the "science" stopped. Then it became a tool for prosecutors. Periodically they would get anomalous results.
Turns out that some cosmetics fluoresce as do farm chemicals. But they kept using it. It was "scientific".
They had "forensic scientists" appear and testify at each trial about how the test was "generally accepted".
The accuracy with which the "model" fitted "reality" was tested by the number of convictions. If you can show a 95% conviction rate using the paraffin test, that is proof that it is a reliable scientific test.
Classic case Number 2: The "Castro Case" (ca. 1987) in which a DNA test using a fuzzy Western Blot was used to convict Mr. Castro. The prosecution claimed the probability of the DNA coming from someone other than Castro was ~1 in 5 billion, based on the population frequency of several pairs of matching bands in Castro's blood and in the blood found on the victim.
When questioned on how they determined whether any pair of bands "matched", they said they took a vote in the lab, and if two out of three "forensic scientists" thought the bands matched then they matched.
So they got a two-thirds popular vote on maybe five pairs of bands, and based on that they told the jury there was a 1 in 5 billion chance of error on the DNA test.
Chilling.
It certainly clarifies how writing "The Wreck of the River of Stars" might be a diversion after long days at the office. Seems like going to an Aquinas conference ought to be tax deductible as professional development....
ReplyDeletea few disjointed remarks:
1. It might be helpful to relate the remarks about the scope of Newton's model to the (unremarked AFAICS) scope of the Copernican model. One of the nice things about Newton's model is that it covered more things (some existing things, like comets, and some easy-to-imagine things, like Jules Verne's voyage to the moon) in a consistent way. The old astronomical models performed impressively well (especially when no one as careful as Tycho Brahe was checking them) but one of their limitations was they treated as essential distinctions things that Newton blew past.
2. You wrote "That is, there is always more than one way to skin a cat -- and more than one way to model a situation." You give examples of one sense in which this is true. There is another sense in which this is true: models can look very different and still come to similar or even exactly-equal results. E.g., Hamiltonian mechanics, Lagrangian mechanics, and Newtonian mechanics look very different, and it's not just superficial: their mathematical plumbing is so different that a problem that is easy to solve in one is hard to solve in another. But they are exactly equal, no different than Roman numerals vs. Arabic numerals. A similar precisely-the-sameness apparently exists between the Feynman, Schwinger, and Tomonaga representations of QED, but it's very hard to see: one of the things that Freeman Dyson is known for is showing it.
This apparently caused a lot of confusion in early quantum mechanics (1920 or so) as people disagreed over things which later turned out to mean the same thing in practice. I studied quantum mechanics in the 1980s, and encountered the lingering fallout from this: especially a tendency to be very impatient about possibly-superficial disagreements about representation of things, and jump quickly to asking whether there is any essential disagreement about what will be observed when a particular experiment is performed.
3. You might want to point interested readers at http://yudkowsky.net/rational/technical/ which says some useful related things (about uncertainty and partial correctness, e.g.) at a more detailed level than you are likely to, because while it tries to be easy to read it makes much heavier use of equations and numerical examples than I remember seeing on your blog.
(I think Curry is right to ask many of those uncertainty questions, but while I understand why you choose to write at the avoid-equations-and-numbers level, I don't understand why she seems to want to investigate only at the avoid-equations-and-numbers level. Besides the Yudkowsky Bayesian-lite page above, or a text like Jaynes' _Probability Theory_, the machine learning people have done a lot of quantitative inference stuff that bears on the questions of inference and attribution, importantly including quantitative information-theoretic generalizations of Occam's Razor which seem to bear rather directly on Curry's investigations of what we can sensibly conclude from climate models, and about them. See e.g. http://stellar.mit.edu/S/course/6/sp08/6.080/courseMaterial/topics/topic1/lectureNotes/lec20/lec20.pdf and Gruenwald _The Minimum Description Length Principle_ for pointers into the two main ways I am aware of -- which are closely related, but not as equivalently identical as e.g. the Newton/Lagrange/Hamilton example I gave above. Curry is a professor of atmospheric sciences and coauthor of a book on thermodynamics; it seems as though the basics of either the VC or MDL approaches should be fairly lightweight math compared to her background, especially the fluid mechanics.)
Your remarks are quite well-taken and I would like to cite them in Part III when I get around to it.
I generally avoid too much math here for two reasons: 1) most of the readership is not into it and 2) I have a hard time writing equations on Blogspot. I usually have to write them out as an Object in PowerPoint, then save the slide as a jpeg image and insert it using the "picture" button.
Newton, or rather Kepler, invented Astrophysics. The ancients were doing Astronomy.
There is a difference here that is overlooked by the word "model" for both.
Astronomy describes the What; Astrophysics seeks to know the Why.
Kepler, by his three laws, launched the concept of a force that keeps the planets in their orbits.
The goal of Science generally and Physics particularly is to know the causes of things. That an eclipse would take place at such and such a time is a statement of Astronomy, but that the eclipse occurs because of the Moon interposing between Sun and Earth is a more physical statement.
This was precisely the revolution: astronomy moved from the math department to the physics department. The fundamental contribution of the telescope was that it allowed people to see the heavenly bodies as actual physical places.
"To know the causes of things." In the medieval period this was called the "propter quid." The "quia" were what we call the facts/observations.
"The real reason why Copernicus raised no ripple and Galileo raised a storm, may well be that whereas the one offered a new supposal about celestial motions, the other insisted on treating this supposal as fact. If so, the real revolution consisted not in a new theory of the heavens but in 'a new theory of the nature of theory'.
The Discarded Image (CS Lewis)
---------------------------------------------------------------------------------------------
Are theories merely useful or are they true in some sense?
On one hand, clearly more than one theory may explain the same phenomena; thus the theories cannot pretend to the truth.
On the other hand, we have theological reasons to believe that God wants us to know the truth: he has set the cosmos open to us, the cosmos reveals the glory of God, etc. So the theory has some claim on the truth.
So, interestingly, philosophically the scientific theories can be no more than useful, while theologically they may be true or point to the truth.
"Context uncertainty" calls to mind an old joke:
A lawyer objected, and the judge asked why. When the lawyer finished explaining, the judge replied, "You reason eloquently, sir, and your argument is compelling. I sincerely hope you one day try a case in which it may be relevant. Overruled."
Didn't the Ptolemaic model have a known but ignored problem with the varying brightness of the planets as seen from the Earth?
They took care of that with the epicycles, which brought the planets closer and farther.
Could the epicycles of Venus, for instance, account for the (I believe) eightfold variation in the apparent brightness of Venus?
DeleteFor epicycle is only a perturbation on the circle, and its radius must be smaller than the circle radius. Take circle radius as unity and epicycle radius as 0.2, Then the distance Venus-Earth goes from 0.8 to 1.2. And brightness going as square of distance would be 0.64 to 1.44, A factor of 2.
You would need pretty big epicycles to account for brightness variations. And were they really fitting the curve for the brightness?
This comment has been removed by the author.
Accounting for variation in apparent brightness was one of the things the Ptolemaic system was deliberately designed to do, and one of the reasons it was taken to work so well. It was celebrated for its ability to handle the problem. You have to keep in mind that the apparent brightness of Venus in reality does not depend on distance alone but on the phases of Venus as well, so that actual variation is much, much smaller than real distance alone would indicate. I'm not sure where you get the eightfold variation; surely the difference is closer to two or three times. (It's also unclear to me why you are bringing in the inverse square law for light; are you speaking solely of the post-Kepler situation?)
I have taken the brightness problem in the Ptolemaic system from Koestler's The Sleepwalkers. I may be recalling the factor of eight wrongly and need to recheck.
It might be a matter of what stage in the game he was considering; it's possible (I don't know) that you could re-introduce the problem once you move from naked eye observation to telescope observation (which allows you to recognize the phases of Venus). But the Ptolemaic system had a lot of tools to allow re-calibration for new measurements -- circles didn't have to be strictly concentric, they could be slightly tilted, and new circles could always be added -- so I'm not sure exactly how the argument would work.
What's going on here is an interesting puzzle I'll have to look into at some point. Looking around, I find one source (Gunnar Andersson's Criticism and the History of Science) making the opposite claim -- i.e., that both the Ptolemaic and the Copernican systems suggested more variation in the brightness of Venus than we actually get (with Galileo being able to explain why Venus is nearly constant in brightness). Of course, if that's the case, as Andersson recognizes, then we deal with the problem of measurement precision -- on its own it could be explained by the limitations of naked eye observations, or atmospheric effects, or the like.
"A model is a mechanism to assign probabilities to propositions p, given evidence E:"
A rather instrumental view. A physicist would rather talk in terms of understanding. A model helps us to understand the system.
And what does "probability" mean here?
If I assign a probability, say 0.3, to a proposition X, what precisely have I done?
This comment has been removed by a blog administrator.
Hi.
Are you the author of the first image (Organized simplicity, Disorganized complexity, Organized complexity)? I would like to cite it in a work.
yes