Wednesday, June 11, 2014

America's Next Top Model -- Part VI

Now that we know how to measure our Xs (and Ys) we need to select units on which we will make the measurements not only for the model baseline and calibration, but also for the input data used in running the model later.

The Eighth Uncertainty: Cherry Picking

On rare occasions, we can measure every item in the population, but more usually, we have to select a sample. This is the third pitfall in data collection. Sampling error and sampling bias could be the subject of entire books -- and have been! TOF's favorite for non-statisticians is A Sampler on Sampling, by Bill Williams, a man whose middle name he has always wanted to know. Other books of a more grimly technical nature include Deming, Sample Design in Business Research (Wiley Classics, 1990), Cochran, Sampling Techniques (Wiley, 1977), and Deming (again), Some Theory of Sampling (Wiley, 1950). TOF directs Faithful Reader's attention toward "sources of error in sampling surveys" and similar topics.

A TOFling at the cute age
Two major sources of sampling uncertainty are Judgment Sampling and Convenience Sampling. The former is exemplified by the market researcher with the questionnaire who is instructed that when Mickey's big hand is pointing straight up, he is to stop the next person to reach his position and ask them to answer some questions. The appointed moment arrives and with it, two people approach. One is a young mother pushing a stroller in which is a little goo-goo baby almost, but not quite, as cute as TOF's grandkids at that age (see left for objective evidence). The other is a 250-pound biker with a tattoo on his face. So who does the marketeer stop and question: the mother or the mother? Either answer is wrong because, regardless, he is imposing his own judgment on the sample.

Judgment samples may be useful -- for example, in finding exemplars of different defects or different opinions -- but NOT for forming estimates of the larger population. Face it. If you already know which units are "typical," why are you taking the sample?

Bird's eye view: Units selected conveniently from the outside rows of the pallet will drastically under-represent Machine #2, which tends to fill the middle of each layer, as shown.
Convenience sampling is even more insidious. It consists of selecting units simply because they are more easily obtained. This is like forecasting the weather by looking out TOF's window, a task complicated by the unfortunate fact that his office does not possess a window. Other examples include:
  • parts from the top tray in a shipping carton
  • product on the outside rows of a pallet
  • the last fifty invoices received
  • shoppers haphazardly encountered in a store
  • the past week's data
  • temperature data from wherever thermometers happen to be already located
  • questionnaires taken of a class of college psych students
Such chunks -- it would be unseemly to dignify them as "samples" -- might tell you something about the students in that psych class, but they are unlikely to tell you about all psych students, all students, all human beings, or any other larger population in which you have an interest. Samples from the top of the carton will tell you about the parts most recently produced, since these tend to go in the box last, but may not say much about parts produced earlier, which wound up in the bottom layers.
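
The pallet example can be made concrete with a small sketch. The layout below is purely hypothetical (a 6-by-6 layer in which Machine #2 happens to fill most of the middle); it is not data from any real pallet. It only shows how units taken from the outside rows can misstate a machine's share of the whole layer.

```python
# Hypothetical pallet layer seen from above. Each digit names the machine
# that produced the unit in that position (an assumed layout, not real data).
LAYER = [
    "111111",
    "122221",
    "222222",
    "222222",
    "122221",
    "111111",
]

def share_of_machine(units, machine="2"):
    """Fraction of the given units that came from the named machine."""
    return sum(1 for u in units if u == machine) / len(units)

rows, cols = len(LAYER), len(LAYER[0])

# Every unit in the layer: what a full count would see.
all_units = [LAYER[r][c] for r in range(rows) for c in range(cols)]

# Only the units on the outside rows and columns: the convenient ones.
edge_units = [
    LAYER[r][c]
    for r in range(rows)
    for c in range(cols)
    if r in (0, rows - 1) or c in (0, cols - 1)
]

true_share = share_of_machine(all_units)
edge_share = share_of_machine(edge_units)

print(f"Machine #2 share of the whole layer:  {true_share:.1%}")
print(f"Machine #2 share of the outside rows: {edge_share:.1%}")
```

Under this assumed layout, Machine #2 makes more than half the layer but only a fifth of the units an inspector can reach without unstacking anything.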

A bank once tried to estimate the annual errors in accounting by taking the month of July, reviewing 100% of all transactions (so they could be really really sure) and then multiplying the result by 12. It is impossible to imagine that such an estimate could ever possibly be correct.

Sampling error is the variation found among samples pulled from the same population. In exercises, one may pull 100 beads from a box of 1000 variously-colored beads -- TOF has such a box -- and count the number of blue beads. Then, replacing the beads (so the population remains identical), a second sample of 100 can be drawn in the same way, and will generally yield a different count. This sampling error is generally what is meant when a pollster says that his results are subject to an error of ±5%. Sampling error can be reduced by increasing the sample size (and by various stratification techniques).
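
The bead exercise is easy to imitate in a few lines. This is a minimal sketch, assuming a box in which 300 of the 1000 beads are blue; the post does not give the actual color mix, so that figure, the seed, and the function names are all illustrative.

```python
import random
import statistics

random.seed(11)  # arbitrary fixed seed so that reruns give the same draws

# Assumed box: 1000 beads, 300 of them blue (a made-up mix).
BOX = ["blue"] * 300 + ["other"] * 700

def blue_counts(sample_size, draws):
    """Draw `draws` samples of `sample_size` beads and count the blue ones.

    Each sample is drawn without replacement, and the box is whole again
    before the next draw, so the population never changes.
    """
    return [random.sample(BOX, sample_size).count("blue") for _ in range(draws)]

# Twenty samples of 100 beads from the same box: the counts differ.
counts_100 = blue_counts(100, 20)
print("Blue beads in 20 samples of 100:", counts_100)

def spread_of_proportion(sample_size, draws=500):
    """Standard deviation of the sample proportion of blue over many draws."""
    proportions = [c / sample_size for c in blue_counts(sample_size, draws)]
    return statistics.pstdev(proportions)

# A larger sample narrows the spread of the estimates.
spread_small = spread_of_proportion(25)
spread_large = spread_of_proportion(400)
print(f"Spread of proportion, n=25:  {spread_small:.3f}")
print(f"Spread of proportion, n=400: {spread_large:.3f}")
```

The box is identical for every draw, yet the counts wander. That wandering is the sampling error, and it shrinks as the sample grows. Nothing in the sketch touches the non-sampling errors discussed next.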

But sampling error is often only a small component of the uncertainty in sampling. Much greater errors result when the field workers do not understand what they are counting.
Example: Counting jams. A supervisor called in his two lead operators, each of whom was responsible for 10 machines, and told them they were to count the number of machine jams on each machine for each shift. The supervisor proposed to record the tally on a board from which, for sundry reasons, he could discern when there was a problem with the metal, the tooling, the previous operation, the lubricant circulation loop, etc. Each causal factor would make a different footprint on his board. It was quite clever. 

It became evident after two days that the ten machines on one side of the line were reporting twice as many jams as the other ten machines. The reason was that one lead operator had told his operators to make a tally mark on the checksheet every time the machine jammed. The other lead had told his operators to make a tally mark every time there was a jam caused by the machine. They had discovered two different ways of hearing "count the number of machine jams." And since the two banks of machines were separated by a formidable piece of equipment, the two groups did not realize that they were counting differently!
TOF used to run an exercise in his training classes in which the students inspected in three minutes a page containing 450 random three-digit numbers, with the goal of counting all those within a certain numerical range designated as "defects." Results were all over: some counted fewer "defects" than were actually there, while others counted more! This held true for class after class after class at a variety of clients (including scientists) from a variety of countries. Now tell TOF what a great idea it was to manually verify all the punch card ballots in a Florida election because the counting machines might have made errors! That sound you heard back in 2000 was thousands of statisticians and quality engineers rolling their eyes.

It doesn't matter how large a sample you take if you are measuring or counting the wrong thing. In fact, larger sample sizes can exacerbate the non-sampling error by adding fatigue and boredom. TOF has encountered cases in which inspectors recorded data without the ugly necessity of making the actual measurements beforehand. It is unlikely that the bank mentioned earlier obtained an accurate count even for the month of July, since non-sampling uncertainty -- errors due to fatigue, boredom, 'highway hypnosis,' misunderstood definitions, and so on -- is far more formidable than sampling variation.

Geographical Sampling

A special situation applies to samples taken from a continuous material, such as throughout a sheet of dough, a bolt of cloth, a coil of aluminum, etc. All these are essentially alike. Another example is the siting of temperature stations for the purpose of estimating atmospheric temperature.

Some years ago NOAA noted in its USCRN Program Overview, "we do not have, in fact, an observing network capable of ensuring long-term climate records free of time-dependent biases." To eliminate or reduce biases due to heat islands, different instrumentation, asphalt, geographical clustering, etc., NOAA inaugurated a new network of stations using identical equipment calibrated on a regular basis, more evenly spread across the territory to be measured, and located well away from areas likely to be urbanized in the future. Wherever possible, the stations were paired so that unusual measurements could be cross-checked. A number of other precautions were taken to eliminate issues that had plagued the older network (which had simply used stations originally sited for air traffic control or other situations independent of estimating atmospheric averages).

The Ninth Uncertainty: Data Recording and Entry

Outliers may indicate data entry problems
TOF draws your attention to the fill weights of a liquid pharmaceutical from Port #5 of an 8-headed filling machine. The hourly weights (in coded values) are:
63, 57, 62, 56, 55, 63, 56, 56, 58, 55, 56, 75, 55

Without an advanced degree in statistics, you will probably not suspect something funny about the penultimate data point. Haha. TOF jests. The "75" jumps up with its thumbs in its ears, wiggling its fingers and going wubba-wubba. Let's have a show of hands. How many suspect that the 75 was actually a 57 but was entered with the digits flipped?

TOF knew he could count on his loyal Reader. In another case, involving tube fill weights of zinc oxide, it was quite clear that one of the entries on the data sheet for the 1-oz. tube filler line had been measured on a tube from the 2-oz. line. (You could tell from the "2" in front of the number: among all the 1.1, 1.2, and 1.3 entries was a big fat 2.2.)

Guess what we found when we looked at the data sheet for the 2-oz. filler?
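
The suspicion about the Port #5 fill weights can be checked mechanically. The sketch below flags any reading outside Tukey's 1.5 × IQR fences (a rule chosen here for illustration; the post does not name one) and then asks whether swapping the digits of a flagged reading would bring it back among its neighbors. The helper `digits_swapped` is hypothetical. The result is a hint for whoever goes back to the data sheet, not a correction.

```python
import statistics

# Hourly coded fill weights from Port #5, as listed in the post.
weights = [63, 57, 62, 56, 55, 63, 56, 56, 58, 55, 56, 75, 55]

# Tukey's fences: anything beyond 1.5 IQRs from the quartiles is suspect.
q1, _, q3 = statistics.quantiles(weights, n=4)
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# (position, value) pairs for readings outside the fences; positions start at 1.
suspects = [
    (position, w)
    for position, w in enumerate(weights, start=1)
    if not low_fence <= w <= high_fence
]

def digits_swapped(value):
    """Reverse the digits of a reading, e.g. 75 -> 57 (hypothetical helper)."""
    return int(str(value)[::-1])

for position, w in suspects:
    # Compare the swapped value with the range of all the other readings.
    rest = [x for i, x in enumerate(weights, start=1) if i != position]
    swapped = digits_swapped(w)
    fits = min(rest) <= swapped <= max(rest)
    print(f"Reading #{position} = {w}: swapped digits give {swapped}, "
          f"{'inside' if fits else 'outside'} the range of the other readings")
```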

Outliers may indicate lab errors
Computers don't eliminate transcription errors. They simply automate them. Keystroke errors are no less a problem than pencil and paper errors. Direct electronic data entry, from the instrument to the database, is less so; but not always. TOF recalls an instance in which a reported average was greater than all of the data in the sample. It had been obtained courtesy of a four-function hand calculator which the technician was pleased to call a "computer." (This was early days. It's about as close to one as anyone in operations was likely to get.)

The reason? The data was generally of the sort 5.7, 8.1, 3.2, 4.5, 6.3, etc. Now just imagine what happens if on a single entry the decimal point key was not pressed well enough to register. Suddenly a 3.2 becomes a 32 and blows the average out of the water. The most difficult task was actually convincing the technician that the average was wrong because "that's what the computer [sic] told me!"
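
The arithmetic is worth seeing once. The sketch below uses only the five values quoted above as a stand-in for the technician's sheet (the real sample is not given) and drops the decimal point from the 3.2. It also shows the sanity check that would have caught the problem: an average can never exceed the largest reading.

```python
# The five readings quoted in the post, standing in for the real data sheet.
intended = [5.7, 8.1, 3.2, 4.5, 6.3]

# The same readings as keyed, with the decimal point missed on the 3.2.
keyed = [5.7, 8.1, 32, 4.5, 6.3]

mean_intended = sum(intended) / len(intended)
mean_keyed = sum(keyed) / len(keyed)

print(f"Average of the intended readings: {mean_intended:.2f}")
print(f"Average of the readings as keyed: {mean_keyed:.2f}")

# An average above every reading on the sheet is arithmetically impossible,
# so this comparison exposes the keying error without any statistics.
average_is_impossible = mean_keyed > max(intended)
print("Keyed average exceeds every reading on the sheet:", average_is_impossible)
```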

Outliers may also indicate genuine happenings in the process
Ignore data after the change in die set: different problem.
Outlier analysis is a useful way of identifying data that may have been measured or entered incorrectly. It may also indicate genuine changes in the process under study. Dixon's Test is one such method, but really experience with process behavior will normally flag unusual data to be looked into. Examples include not only those illustrated or mentioned here, but cases in which traffic experts misclassified waybills, technicians tightened a micrometer too much, transient power surges, data from a non-linear scale, breakdowns, incomplete operation, etc.

Each instance of an outlier must be investigated on a case-by-case basis. Data should never be corrected with an automatic algorithm because the outlier is not always "bad" data! There might really have been a spike, as in the transient spike in downtime caused by a tool wreck.
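
For readers who want to see what a formal outlier test computes, here is a minimal sketch of the simplest Dixon ratio (often written r10, or Q): the gap between the suspect value and its nearest neighbor, divided by the range of the sample. It is applied to the Port #5 fill weights from earlier in the post. Dixon's method has other ratio variants for different sample sizes, and the critical values come from published tables such as those in the ASTM practice listed below; neither is reproduced here. The function name is hypothetical, and the output is a flag for investigation, never an automatic correction.

```python
def dixon_ratio_high(values):
    """Dixon's r10 ratio for the largest value: gap to its neighbor over the range.

    This covers only the single-high-outlier case. Compare the result with a
    published critical value for the sample size before drawing conclusions.
    """
    ordered = sorted(values)
    gap = ordered[-1] - ordered[-2]          # suspect value minus next largest
    full_range = ordered[-1] - ordered[0]    # largest minus smallest
    return gap / full_range

# Hourly coded fill weights from Port #5, as listed in the post.
weights = [63, 57, 62, 56, 55, 63, 56, 56, 58, 55, 56, 75, 55]

ratio = dixon_ratio_high(weights)
print(f"Dixon ratio for the largest reading ({max(weights)}): {ratio:.2f}")
```

Here the suspect reading stands 12 units above its nearest neighbor in a sample whose whole range is 20, so the gap is more than half the range. Whether that is statistically significant is for the table to say; whether it was a keying error or a real event is for the investigation to say.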

Which will bring us to the next Uncertainty in Part VII: Adjustment of Data. (The Link will be added when the next post is up.)

Suggested Reading

  1. ASTM Designation E 178-75, "Dealing with Outlying Observations" (American Society for Testing and Materials, 1975)
  2. Cochran, William G. Sampling Techniques (John Wiley and Sons, 1977) esp. Ch.13.
  3. Curry, Judith and Peter Webster. "Climate Science and the Uncertainty Monster," Bull. Am. Met. Soc., Vol. 92, Issue 12 (December 2011)
  4. Deming, W. Edwards. Some Theory of Sampling (John Wiley and Sons, 1950) esp. pp 26-30.
  5. Deming, W. Edwards. Sample Design in Business Research (Wiley-Interscience, 1960) esp. Ch.5.
  6. Juran, J. M., "Different to You but Alike to Me," Industrial Quality Control, Vol. 19, No. 10 (1963)
  7. Natrella, Mary Gibbons. Experimental Statistics (National Bureau of Standards Handbook 91, 1963) Ch.17 "The Treatment of Outliers"
  8. NOAA, USCRN Program Overview
  9. NOAA, Climate Reference Network (CRN) Site Information Handbook, NOAA-CRN/OSD-2002-0002R0UD0, December 10, 2002. 
  10. Ott, Ellis R. Process Quality Control: Troubleshooting and Interpretation of Data (McGraw-Hill, 1975) Ch.9 "Ideas from Outliers"
  11. Ravetz, Jerome R. NUSAP - The Management of Uncertainty and Quality in Quantitative Information
  12. Rosander, A.C. Case Studies in Sample Design, (Marcel Dekker, 1977)
  13. Williams, Bill. A Sampler on Sampling (Wiley Interscience, 1978)


  1. I don't know if you want to comment on this but it's all in the news, global warming is causing coastal flooding

    1. I note that the story says that "The seas have risen and fallen before." So why is this time attributable to humans?

  2. The second question is "why is this attributable to humans"; the first is "are the oceans rising." The article claims levels have risen eight inches since 1880, with the clear implication that this is due to melting ice, in turn due to global warming. They leave it unspoken that this is man-made.

  3. The first hit I get showing the rising oceans

  4. Reminds me of an incident several years ago. At the end of a test in intro to psychology, it was announced that a grad student wanted people to stay and take a survey. It was at a fairly rigorous science/engineering school and most of us had no time to spare. One student asked if it was required, the answer was no. So half of us got up and left. The subject of the survey had something to do with how people dealt with authority. Unless the point was to see how many people would leave, and I do not think it was, then that study had to have a horrible bias in the sample.