No, no, no. Not that kind. The interesting and exciting kind. Mathematical and statistical models!
TOF can hear your pulses quickening all the way up here in his Fortress of Solitude. "Tell us more, TOF!" he hears you cry.
But you knew that the moretelling was going to happen, didn't you.
Some years ago, TOF read a book review, the title and author of which he has forgotten. Which is too bad, because he now has an Amazon.com gift card and wouldn't mind buying the book, if the price were right. The gist of it was that a research team studying beach erosion had concluded that the actual erosion on the beaches they studied never matched the theoretical erosion predicted by mathematical models of erosion.
One marker of the Late Modern Age is a blind faith in Pythagorean number magic. Sometimes -- and TOF knows this is hard to believe -- the outputs of computer models are even referred to as "data." People who would believe that are subject to any number of other delusions and TOF knows of a bridge they could buy in Pennsylvania.
TOF tells you three times: the output of a model is supposed to be a prediction of a real-world characteristic. That prediction will be wrong, but sometimes not too wrong, and the nature of its wrongness may lead to insights. But it is not in and of itself "data." Yet so much of what is reported in the newsfog -- number of illegal immigrants from Patagonia, number of cases of Patagonian rat fever, and so on -- is actually the output of models, not the counting of noses. This is a great comfort to those who might otherwise be tasked with nasal enumeration. Building a model is done in safe, clean, and air-conditioned accommodations. Counting actual noses runs the risk of being bitten by Patagonian rats, or immigrants.*
(*) TOF recollects a computer model of illegal immigrants that, broken down by country-of-origin, predicted a negative amount of illegal Irish immigration. One can only imagine sons of the Gael slipping clandestinely from Maclean Ave. in the Bronx on their way to Tir na nOg. No one seemed to regard this as a deficiency in the model.
Models provide a drastic simplification of reality, and are in no sense accurate or true representations of real drainage systems. This is why they are wrong.
Simplicity and Complexity
So why model? Because waiting until after a new design is launched or the new policy promulgated is a heckuva time to discover that the design or policy sucks great horny toads. Modeling is a way to test a plan "on paper" before you have to "bend tin." However, for this very reason, the design of the model deserves painstaking attention, especially to the uncertainties that are always built into it. Otherwise, it's simply one more way to fool yourself. Believing your own PR is insane.
In 1948, Dr. Warren Weaver defined the notions of simplicity and complexity in the sciences.
1. Organized simplicity describes a system characterized by a small number of significant variables tied together in deterministic relationships. Sometimes only one significant variable is involved, perhaps two; at most, maybe four. The interactions and effects of these variables can be grasped analytically through mathematics.
A good example is Newton's Law of Universal Gravitation:
$$F = \frac{G M m}{d^2}$$
where G is the gravitational constant, M and m are the two masses, and d is the distance between them. Analytically, the only two variables are mass and distance. Physically, there are only two masses and one relationship (gravity). For three or more bodies, there is no analytical solution other than an infinite convergent series, and in practice one uses numerical methods of approximation.
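As a worked instance of organized simplicity, Newton's formula can simply be evaluated. The sketch below is a minimal Python illustration, assuming rounded textbook values for G and for the Earth and Moon (these figures are not from the post); the function name `gravitational_force` is hypothetical.

```python
# Newton's law of universal gravitation: F = G * M * m / d**2
G = 6.674e-11  # gravitational constant in N·m²/kg² (rounded)


def gravitational_force(M: float, m: float, d: float) -> float:
    """Attractive force in newtons between masses M and m (kg) whose centers are d metres apart."""
    return G * M * m / d**2


# Rounded textbook values, assumed for illustration only
EARTH_MASS = 5.972e24          # kg
MOON_MASS = 7.342e22           # kg
EARTH_MOON_DISTANCE = 3.844e8  # m, mean center-to-center distance

force = gravitational_force(EARTH_MASS, MOON_MASS, EARTH_MOON_DISTANCE)
print(f"Earth-Moon attraction: {force:.2e} N")  # about 1.98e+20 N
```

Two masses, one distance, one closed-form answer: doubling `d` cuts the force to a quarter, exactly the kind of relationship that can be grasped analytically. Add a third body and no such tidy formula exists; the forces have to be stepped forward numerically.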
Physicists can get away with organized simplicity because they make simplifying assumptions: motion in a vacuum; perfectly elastic collisions; "ideal" gases; infinite Euclidean 3-space; and so on. They abstract just that which is amenable to those Cartesian methods which, since the 17th century, have been the only permissible ways of doing Science!™ And it works well enough in many cases.
But not all.
2. Disorganized complexity. As the n-body problem illustrates, there are also systems characterized by a large number of elements. Each of the many elements may be individually erratic, or even unknown. However, despite this "random" (recte: "unknown") individual behavior, the system as a whole possesses certain orderly and analyzable average properties. In effect, we can substitute a group average for the (possibly unknown) individual element values.
A common example is the actuarial table. The casino does not know in which roulette slot the little ball will drop. (Or, if they do, do not gamble in that casino.) But they can know with great precision the likelihood that the ball will drop into each slot. The same applies to things like insurance, quality control of manufactured product, thermodynamics in physics, etc.
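The casino's position can be sketched in a few lines of Python. This is a minimal simulation, assuming a fair single-zero wheel with 37 pockets and the standard 35-to-1 payout on a single number; the helper `spin_frequencies` is hypothetical. Individual spins are erratic, yet each pocket's share of many spins settles near 1/37, and the house edge can be computed exactly without predicting any single spin.

```python
import random
from collections import Counter
from fractions import Fraction

POCKETS = 37  # assumption: a single-zero wheel with pockets numbered 0..36


def spin_frequencies(n_spins: int, seed: int = 1) -> dict[int, float]:
    """Spin a fair simulated wheel n_spins times; return each pocket's observed share."""
    rng = random.Random(seed)
    counts = Counter(rng.randrange(POCKETS) for _ in range(n_spins))
    return {pocket: counts[pocket] / n_spins for pocket in range(POCKETS)}


# Individual spins: erratic, unpredictable one at a time.
rng = random.Random(7)
print("ten spins:", [rng.randrange(POCKETS) for _ in range(10)])

# The aggregate: every pocket settles near 1/37 of the spins.
freqs = spin_frequencies(370_000)
worst = max(abs(share - 1 / POCKETS) for share in freqs.values())
print(f"expected share {1 / POCKETS:.4f}; largest deviation {worst:.4f}")

# House edge on a single-number bet paying 35 to 1, computed exactly:
# win 35 with probability 1/37, lose 1 with probability 36/37.
edge = Fraction(35, POCKETS) - Fraction(POCKETS - 1, POCKETS)
print("expected return per unit staked:", edge)  # -1/37
```

The same averaging logic is what lets insurers and quality engineers plan around elements they cannot predict individually, provided the variation really is random.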
But only if the complexity is disorganized; that is, the variation among elements is random. It is precisely the randomness that makes statistical mechanics work. There is another kind of complexity that cannot be studied this way.
3. Organized complexity. Significant problems in biology, economics, and so on can seldom be characterized by two to four variables. It is impossible to hold all the other variables constant. We wind up with a half-dozen, or even several dozen, quantities varying simultaneously in interconnected ways. Genomes or economies have a wholeness to them that a mass of gas or even a mass of policyholders does not. Weaver mentions such things as predicting the price of wheat, stabilizing a currency, controlling economic business cycles, predicting behavior patterns of a labor union, a group of manufacturers, or a racial minority, allocating resources to win a war (or avoid one), and so on.
These problems are too complex to handle with the 19th-century mathematical techniques that were so successful on problems with two to four elements. But neither can they be handled with the statistical mechanics of the 20th century that worked on problems of disorganized complexity. You cannot take an average of heterogeneous data.* This is the reason for the failure of things like socialism, or of the assumption that the underlying elements of evolution are random "like" thermodynamics. Genes are complex, but organized. They do not vary "at random."
(*) Average of heterogeneous data. Example: The average human being has roughly one testicle and one ovary. So what does that tell you? Or consider taking the average of a time series of automotive battery paste weights, or of aggregate temperatures from weather stations.
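A minimal Python sketch of the footnote's point, using made-up numbers (not the battery-paste or weather-station data the footnote mentions): when one series quietly pools two different sources, its average can land where no actual measurement lies.

```python
from statistics import mean

# Hypothetical weights from two different sources, pooled into one series
source_a = [10.0, 10.2, 9.9, 10.1]
source_b = [14.0, 13.8, 14.1, 14.1]
pooled = source_a + source_b

overall = mean(pooled)
closest_gap = min(abs(x - overall) for x in pooled)

print(f"source A mean {mean(source_a):.2f}, source B mean {mean(source_b):.2f}")
print(f"pooled mean {overall:.3f}; nearest actual value is {closest_gap:.3f} away")
```

Neither source runs anywhere near 12, so the pooled mean describes a process that does not exist. Splitting the series by source (machine, shift, station) before summarizing it avoids that trap.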
A third way of doing science was needed for problems of this type, as Friedrich August von Hayek noted in his Nobel Prize lecture:
Organized complexity... means that the character of the structures showing it depends not only on the properties of the individual elements of which they are composed, and the relative frequency with which they occur, but also on the manner in which the individual elements are connected with each other. In the explanation of the working of such structures we can for this reason not replace the information about the individual elements by statistical information, but require full information about each element if from our theory we are to derive specific predictions about individual events. Without such specific information about the individual elements we shall be confined to what on another occasion I have called mere pattern predictions - predictions of some of the general attributes of the structures that will form themselves, but not containing specific statements about the individual elements of which the structures will be made up. [Emph. added]
-- Friedrich August von Hayek, "The Pretence of Knowledge"
Cannot substitute the average: Suppose a set of machines suffers stoppages an average of four times per day, and that each stoppage requires an average of seven hours of maintenance time. If we multiply 4 stops by 7 hours and budget 28 man-hours per day for maintenance, the maintenance backlog will steadily increase until it reaches saturation; that is, until there are enough undone jobs in the queue that maintenance will never lack for 28 man-hours of work. This is because variation above and below those averages has different consequences. If there are more than 28 hours of work on a given day, the undone work goes into the backlog. If there are fewer than 28 hours of work, the unused capacity cannot be saved for the next day.
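The "Cannot substitute the average" maintenance example can be pushed through a small simulation. This Python sketch assumes, beyond what the example states, that stoppages per day follow a Poisson distribution and repair times an exponential one; those distributions, the function names, and the 35-hour comparison are illustrative assumptions, not the author's method.

```python
import math
import random


def poisson(rng: random.Random, lam: float) -> int:
    """Draw a Poisson count using Knuth's multiplication method (fine for small means)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def backlog_after(days: int, capacity: float, rng: random.Random,
                  mean_stops: float = 4.0, mean_hours: float = 7.0) -> float:
    """Hours of unfinished repair work left in the queue after `days` days."""
    backlog = 0.0
    for _ in range(days):
        stops = poisson(rng, mean_stops)
        new_work = sum(rng.expovariate(1 / mean_hours) for _ in range(stops))
        # Work beyond today's capacity carries over; idle capacity is simply lost.
        backlog = max(0.0, backlog + new_work - capacity)
    return backlog


def mean_backlog(days: int, capacity: float = 28.0, runs: int = 100, seed: int = 42) -> float:
    """Average end-of-period backlog over many simulated histories."""
    rng = random.Random(seed)
    return sum(backlog_after(days, capacity, rng) for _ in range(runs)) / runs


for days in (20, 100, 500):
    print(f"capacity 28 h/day, after {days:3d} days: mean backlog {mean_backlog(days):6.1f} h")
print(f"capacity 35 h/day, after 500 days: mean backlog {mean_backlog(500, capacity=35.0):6.1f} h")
```

Budgeting exactly the average of 28 hours lets the queue keep drifting upward, because surplus work carries forward while idle hours vanish. This idealized model has no ceiling, so the backlog grows roughly with the square root of elapsed days; in a real shop the growth stops once the queue is deep enough to keep the crew permanently busy, as the example describes. Adding slack above the average keeps the backlog small.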
The Third Way
[Yang and el-Haik, disguised as a book.]
The bad news is that in most complex situations, we cannot know the "full information about each element." Important factors may be unmeasured and unmeasurable. And if the information is even the least little bit off, the calculated response may diverge enormously. This part of complexity theory is sometimes called "chaos theory," and it is where little butterflies begin flapping their wings in Patagonia to cause storms in Kalamazoo.
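A standard textbook illustration of this sensitivity is the logistic map, swapped in here rather than taken from the post; the parameter r = 4 and the starting values are arbitrary choices. The Python sketch starts two trajectories that differ by one part in ten billion and watches them part company.

```python
def logistic_orbit(x0: float, steps: int, r: float = 4.0) -> list[float]:
    """Iterate the logistic map x -> r*x*(1 - x) starting from x0."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs


# Two starting points that differ by one part in ten billion
a = logistic_orbit(0.2, 60)
b = logistic_orbit(0.2 + 1e-10, 60)

for n in range(0, 61, 10):
    print(f"step {n:2d}: {a[n]:.6f} vs {b[n]:.6f}  gap {abs(a[n] - b[n]):.1e}")

first_big_gap = next(n for n in range(61) if abs(a[n] - b[n]) > 0.1)
print("trajectories disagree by more than 0.1 from step", first_big_gap)
```

With r = 4 the gap roughly doubles each step on average, so an error in the tenth decimal place swamps the forecast after a few dozen steps. Better measurement only postpones the divergence; it does not prevent it.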
[Juran, a native of Transylvania, and looking it.]
The good news is that even though real-world systems are multi-causal, a small subset of the contributing factors generally accounts for the bulk of the response. This is known as the Pareto Principle, so named by the late Joseph M. Juran after the economist Vilfredo Pareto. It is sometimes called the 80-20 rule: 80% of the results are due to 20% of the contributors. For example, most of the rushing yardage gained in the NFL is gained by a small percentage of the running backs, who mostly seem to play for Seattle. (But let us not beat a dead horse.) Most of the words in any document are drawn from a small percentage of the words in the dictionary. (Compare how often "the" appears in this post to how often "numinous" appears.) The percentages need not be precisely 80% and 20%, of course. Forty US Metropolitan Statistical Areas account for 50% of the population of the USA. (Two metro areas -- NYC and LA -- account for one in ten Americans.)
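Finding the "vital few" amounts to ranking contributors from largest to smallest and taking them until the running total reaches the target share. The Python sketch below is a minimal version of that ranking; the function `vital_few` and the counts for ten contributors labeled A through J are made up for illustration, not drawn from yardage, word, or census data.

```python
def vital_few(contributions: dict[str, float], share: float = 0.8) -> list[str]:
    """Smallest list of contributors, largest first, whose combined total reaches `share` of the whole."""
    total = sum(contributions.values())
    ranked = sorted(contributions, key=contributions.get, reverse=True)
    chosen: list[str] = []
    running = 0.0
    for name in ranked:
        if running >= share * total:
            break
        chosen.append(name)
        running += contributions[name]
    return chosen


# Made-up counts for ten contributors (causes, players, words, cities...)
counts = {"A": 50, "B": 22, "C": 10, "D": 5, "E": 4,
          "F": 3, "G": 2, "H": 2, "I": 1, "J": 1}

few = vital_few(counts)
covered = sum(counts[k] for k in few) / sum(counts.values())
print(f"{len(few)} of {len(counts)} contributors account for {covered:.0%}: {few}")
```

Here three of ten contributors cover 82% of the total. The split need not be exactly 80-20; the point is that the ranked curve is steep at the top.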
So sometimes we can handle organized complexity by identifying the "vital few" elements and modeling those, leaving the others "to vary as they vary." But that won't always work. A model of the US population based solely on the forty largest Metropolitan areas will overlook the fact that non-metropolitan areas -- rural, for example -- may behave very differently. In 1936, the US Presidential election was forecast from a massive sample that predicted a big victory for Alf Landon. (Franklin Roosevelt won in a landslide.) The trouble was that the sample was drawn from telephone numbers, and during the Great Depression, telephone ownership was correlated with greater wealth and therefore with different political inclinations.
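Why a massive sample did not save the 1936 forecast can be sketched with a hypothetical electorate. The Python simulation below assumes made-up numbers (30% phone ownership; 70% support for a candidate "L" among phone owners, 30% among everyone else); these are not the actual 1936 figures, and the helper names are hypothetical.

```python
import random

# Hypothetical electorate, chosen only to illustrate selection bias (not 1936 data)
PHONE_SHARE = 0.30            # fraction of voters with a telephone
SUPPORT_WITH_PHONE = 0.70     # share of phone owners favoring candidate "L"
SUPPORT_WITHOUT_PHONE = 0.30  # share of everyone else favoring "L"


def draw_voter(rng: random.Random) -> tuple[bool, bool]:
    """Return (has_phone, supports_L) for one randomly chosen voter."""
    has_phone = rng.random() < PHONE_SHARE
    p = SUPPORT_WITH_PHONE if has_phone else SUPPORT_WITHOUT_PHONE
    return has_phone, rng.random() < p


def poll(n: int, rng: random.Random, phone_only: bool) -> float:
    """Share supporting L among n respondents, optionally reachable only by phone."""
    supporters = respondents = 0
    while respondents < n:
        has_phone, supports = draw_voter(rng)
        if phone_only and not has_phone:
            continue  # this voter never shows up in the sample frame
        respondents += 1
        supporters += supports
    return supporters / n


rng = random.Random(1936)
true_support = (PHONE_SHARE * SUPPORT_WITH_PHONE
                + (1 - PHONE_SHARE) * SUPPORT_WITHOUT_PHONE)
huge_phone_poll = poll(100_000, rng, phone_only=True)
small_random_poll = poll(2_000, rng, phone_only=False)

print(f"true support for L:            {true_support:.3f}")
print(f"100,000 phone-owner responses: {huge_phone_poll:.3f}")
print(f"2,000 random responses:        {small_random_poll:.3f}")
```

The hundred-thousand-strong phone poll clusters tightly around 0.70 and is precisely wrong; the small random sample lands near the true 0.42. Sample size shrinks random error, not the bias built into who can be reached.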
So in 1948 Weaver proposed, as the "Third Way" of dealing with organized complexity, the methods of operations research pioneered during the War, coupled with the newfangled "electronic calculating machines." In short, modeling.
Next time, we will look at How Models Go Bad.