Wednesday, August 22, 2012

The Wonderful World of Statistics - Part I

Yes, it's that time once again.  So sit yourself down, kick your feet up, and pop a couple of brewskies, and ready yourself for:

The Allegory of the Fluoropolymers

Once upon a time, when the world was younger and TOF still had to work for a living, TOF came to the Land of the Fluoropolymers somewhere in the wilds of central New Jersey.  Among the many sterling qualities of these fluoropolymers - indeed of all polymers and many other substances beside - is the viscosity of the material.  Viscosity may be thought of as the flow of that which is thick.  High viscosity is "thick" and low is "thin."  It is measured (usually) in centipoise (cps) although TOF has been in situations measured in poise and (once, memorably) in which it was measured in kilopoise.  (The latter involved not liquids dripped through a Zahn cup, but solid plastic pellets pushed through an orifice.  TOF will leave you with that thought.)  The science fiction masterpieces of Flynn have sometimes been called not "hard SF" but "high viscosity SF," much like the wit of TOF, which is also said to be thick. 

[Figure: time plots for Reactor C (above) and Reactor D (below), showing the effect of changing the grade of polymer on viscosity.]

Now TOF had come among the heathen to preach the wisdom of the ancients - i.e., statistical process control - to a class generously described as The Unwilling.  In preparing lessons, he had acquired data on polymer viscosity, which he used as an example of various time series analyses. 

He showed the class the time series for Reactor D (lower panel of picture).  Note, he said, that the viscosities for Polymer Grade 500 are not compatible with those for Polymer Grade 400.  This can be shown by fairing a median through the data for Grade 400 (about 32 cps) and then counting runs above and below this median for subsequent batches.  We observe that all the batches for Grade 500 fell above the median for Grade 400 when by chance alone we would expect only about half of them to do so.  We therefore conclude that Grade 500 had a higher median viscosity (about 35 cps).  We can even say something about how unlikely this would be, given various assumptions about the Model; but it is best not to get too carried away with that.  Probabilities are almost always wrong, and are not directly the object of process control.
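For those who like to see the counting done, here is a minimal sketch in Python.  The viscosity figures are invented for illustration (TOF's actual data are not reproduced here), and the function names are hypothetical.  It is a simplified count of batches above the old median rather than Ott's full analysis of runs, and the probability assumes each batch independently has an even chance of landing on either side.

```python
from statistics import median

def count_above(baseline, later):
    """Return (median of baseline, number of later values strictly above it)."""
    m = median(baseline)
    return m, sum(1 for x in later if x > m)

def chance_all_above(n):
    """Probability that n batches all land above the median, ASSUMING each
    batch independently has a 50/50 chance of falling on either side."""
    return 0.5 ** n

# Hypothetical viscosities in cps, invented for illustration only.
grade_400 = [31.5, 32.5, 32.0, 31.0, 33.0, 32.0, 31.5, 32.5]
grade_500 = [34.5, 35.5, 35.0, 34.0, 36.0, 35.0]

m, above = count_above(grade_400, grade_500)
print(f"Grade 400 median: {m} cps")
print(f"Grade 500 batches above it: {above} of {len(grade_500)}")
print(f"Chance of that under 'no change': {chance_all_above(len(grade_500))}")
```

With these made-up numbers, all six Grade 500 batches sit above the Grade 400 median.  The small probability is only as good as the even-odds assumption behind it, which is the reason not to get carried away.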

BFD, said the Unwilling.  Grade 500 is supposed to be more viscous than Grade 400.  It's what we expect to see.  Your stupid statistical methods told us nothing we did not already ken. 

But then, asked TOF, what went wrong on Reactor C (top panel), which did not exhibit the "expected" shift when the Grade was changed? The viscosity was about 34 cps for both grades. 

TOF had not been interested in detecting previously unknown issues, only in demonstrating how Ellis Ott's analysis of runs can be used on a simple time plot.  But one thing he now knew for certain sure: either Reactor C or Reactor D was not behaving the right way.

But he has always wondered since then what the Unwilling would have said had he first displayed the plot for Reactor C and noted that there had been no change when the grades were changed.

Adventures in Alternate History

Now the final cause of process control is to identify and eliminate assignable (special) causes of variation, not to create imaginary data for a Reactor D that might have been.  To illustrate, let us engage in the herculean task of supposing that the charts ran backward.  This would give us a situation in which Reactors C (white dots) and D (black dots) were producing batches of similar viscosity for a period of time.  Then, supposing we had switched from Grade 500 to Grade 400, we notice that the two reactors are now producing markedly different results.  Reactor D, instead of running 1 cps thicker than Reactor C, is now running 2 cps thinner, a change of 3 cps.

Suppose that, rather than a search and destroy mission against the cause of the change, we simply wished to "adjust the data" so that the past similarity of the two reactors was preserved.  We could do this by calculating D' = D + 3 cps and replacing the actual data with the imaginary data.  This is fine, especially if the root cause of the change lay in instrument calibration; but note that a number of assumptions leap on board: namely, that the difference between C and D "ought" to be 1 cps.  This requires some faith, since the data was noisy to begin with and we can't be entirely sure that the difference "really" was 1 cps.  We also cannot be sure that the change we saw for Grade 400 will continue in the future.  So do not, above all else, implement an automated adjustment algorithm!  Who knows if there might not have been a lurking variable that just happened to coincide with the production run for Grade 400?  The next time 400 is run, we might see something else entirely.  And what if we ran Grade 100, 200, or 300?  To make the adjustment legitimate, we need non-statistical data about both the manufacturing and measuring processes.
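The arithmetic of such an adjustment can be sketched as follows.  This is an illustration only: the readings are invented, the function names are hypothetical, and using medians to estimate the gap between the reactors is an assumption of this sketch.  The adjusted values are carried alongside the measured ones rather than overwriting them.

```python
from statistics import median

def estimated_shift(c_before, d_before, c_after, d_after):
    """Change in the D-minus-C median gap between two periods, in cps."""
    gap_before = median(d_before) - median(c_before)
    gap_after = median(d_after) - median(c_after)
    return gap_before - gap_after

def adjusted(d_after, shift):
    """Pair each measured value with its adjusted value; the raw reading is kept."""
    return [(raw, raw + shift) for raw in d_after]

# Hypothetical viscosities in cps, invented for illustration only.
c_500 = [34.0, 33.5, 34.5, 34.0]
d_500 = [35.0, 34.5, 35.5, 35.0]
c_400 = [34.0, 34.5, 33.5, 34.0]
d_400 = [32.0, 31.5, 32.5, 32.0]

shift = estimated_shift(c_500, d_500, c_400, d_400)
print(f"Estimated shift: {shift} cps")
for raw, adj in adjusted(d_400, shift):
    print(f"measured {raw} cps -> adjusted {adj} cps (imaginary)")
```

Note that the shift is itself estimated from a handful of noisy batches, so every adjusted value inherits that uncertainty.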

Why would we ever want to do this?  

Well, if an instrument were found to be mis-calibrated or used improperly and it was physically impossible to replicate the original measurements, it might be a necessary resort.  In his book Statistical Adjustment of Data, W. Edwards Deming uses the example of a triangular part whose angles were measured with a protractor but did not sum to 180°.  If the part cannot be remeasured, it might be legitimate to adjust the three angles so that they do add to a proper sum.  (The non-statistical information here is Euclidean geometry.)  The same might be true for a running record of process variables like temperature of a curing oven or vapor pressure of a separating column.  Without a time machine, it would be hard to correct the past; but it might still be operationally important to estimate what the real temperatures or pressures might have been.
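A minimal sketch of the triangle case, assuming all three angles were measured with equal precision: the least-squares adjustment then spreads the excess (or shortfall) evenly across the three angles.  The measured values are invented, the function name is hypothetical, and this is not claimed to be the exact treatment in Deming's book.

```python
def adjust_triangle(a, b, c):
    """Adjust three measured angles (degrees) so they sum to 180,
    ASSUMING equal measurement precision: each angle absorbs one third
    of the discrepancy."""
    excess = (a + b + c) - 180.0
    return tuple(x - excess / 3.0 for x in (a, b, c))

# Hypothetical protractor readings in degrees; they sum to 181.5.
measured = (61.0, 59.5, 61.0)
fitted = adjust_triangle(*measured)
print("measured:", measured, "sum =", sum(measured))
print("adjusted:", fitted, "sum =", sum(fitted))
```

If one angle were known to be measured less precisely than the others, it would be reasonable to give it a larger share of the correction; the even split is only the equal-precision case.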

Just don't confuse these with actual measured data. 

Coming Soon: The Allegory of the Cookies.
