The Eighth Uncertainty: Cherry Picking
|A TOFling at the cute age|
Judgment samples may be useful -- for example, in finding exemplars of different defects or different opinions -- but NOT for forming estimates of the larger population. Face it. If you already know which units are "typical," why are you taking the sample?
|Bird's eye view: Units selected conveniently from the outside rows of the pallet will drastically under-represent Machine #2, which tends to fill the middle of each layer, as shown.|
- parts from the top tray in a shipping carton
- product on the outside rows of a pallet
- the last fifty invoices received
- shoppers haphazardly encountered in a store
- the past week's data
- temperature data from wherever thermometers happen to be already located
- questionnaires taken of a class of college psych students
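The pallet case can be put in numbers with a small sketch. Everything here is an assumption made for illustration: a 6 x 6 layer, Machine #2 filling every interior position, Machine #1 filling the outer ring, and the helper names themselves. A real pallet will be messier, but the direction of the bias is the same.

```python
import random

# Hypothetical 6 x 6 pallet layer (the size is an assumption).
ROWS, COLS = 6, 6

def is_outside(r, c):
    """True if the unit at row r, column c can be reached from outside the pallet."""
    return r in (0, ROWS - 1) or c in (0, COLS - 1)

def machine_at(r, c):
    """Assumed layout: Machine #1 fills the outer ring, Machine #2 the middle."""
    return 1 if is_outside(r, c) else 2

# Each unit is (row, column, source machine).
layer = [(r, c, machine_at(r, c)) for r in range(ROWS) for c in range(COLS)]

def share_of_machine_2(units):
    """Fraction of the given units that came from Machine #2."""
    return sum(1 for _, _, m in units if m == 2) / len(units)

# Convenience sample: whatever can be grabbed without unstacking anything.
outside_units = [u for u in layer if is_outside(u[0], u[1])]

# Random sample of the same layer, for comparison (seed fixed for repeatability).
random_units = random.Random(0).sample(layer, 12)

print("whole layer :", round(share_of_machine_2(layer), 2))
print("outside rows:", round(share_of_machine_2(outside_units), 2))
print("random 12   :", round(share_of_machine_2(random_units), 2))
```

Under these assumptions the outside rows contain no Machine #2 product at all, however many units are pulled from them. The random sample will wander from draw to draw, but it is at least aiming at the right target.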
A bank once tried to estimate the annual errors in accounting by taking the month of July, reviewing 100% of all transactions (so they could be really really sure) and then multiplying the result by 12. It is impossible to imagine that such an estimate could ever possibly be correct.
Sampling error is the variation found among samples pulled from the same population. In exercises, one may pull 100 beads from a box of 1000 variously-colored beads -- TOF has such a box -- and count the number of blue beads. Then, replacing the beads (so the population remains identical), a second sample of 100 can be drawn in the same way, and will generally yield a different count. This sampling error is generally what is meant when a pollster says that his results are subject to an error of ±5%. Sampling error can be reduced by increasing the sample size (and by various stratification techniques).
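The bead exercise is easy to imitate in a few lines. This is only a sketch: the proportion of blue beads (200 in 1000) is an assumption, since the text does not say how many are blue, and the function names are made up for the illustration.

```python
import random
import statistics

# Hypothetical box: 1000 beads, 200 of them blue (the 20% is an assumption).
BOX = ["blue"] * 200 + ["other"] * 800

def blue_count(rng, sample_size):
    """Draw sample_size beads without replacement and count the blue ones.

    The box itself is never altered, so every draw faces the same population.
    """
    return rng.sample(BOX, sample_size).count("blue")

def spread_of_blue_fraction(sample_size, repeats=2000, seed=1):
    """Standard deviation of the observed blue fraction over repeated samples."""
    rng = random.Random(seed)
    fractions = [blue_count(rng, sample_size) / sample_size for _ in range(repeats)]
    return statistics.pstdev(fractions)

# Repeat the exercise at three sample sizes and compare the scatter.
for n in (25, 100, 400):
    print(n, round(spread_of_blue_fraction(n), 4))
```

The scatter among samples shrinks as the sample size grows, which is all that "reducing sampling error" means. Nothing in this sketch says anything about the other kinds of error.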
But sampling error is often only a small component of the uncertainty in sampling. Much greater errors result when the field workers do not understand what they are counting.
Example: Counting jams. A supervisor called in his two lead operators, each of whom was responsible for 10 machines, and told them they were to count the number of machine jams on each machine for each shift. The supervisor proposed to record the tally on a board from which, for sundry reasons, he could discern when there was a problem with the metal, the tooling, the previous operation, the lubricant circulation loop, etc. Each causal factor would make a different footprint on his board. It was quite clever.
TOF used to run an exercise in his training classes in which the students inspected in three minutes a page containing 450 random three-digit numbers, with the goal of counting all those within a certain numerical range designated as "defects." Results were all over: some counted fewer "defects" than were actually there, while others counted more! This held true for class after class after class at a variety of clients (including scientists) from a variety of countries. Now tell TOF what a great idea it was to manually verify all the punch card ballots in a Florida election because the counting machines might have made errors! That sound you heard back in 2000 was thousands of statisticians and quality engineers rolling their eyes.
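A rough simulation of the classroom exercise shows how counts can land on both sides of the truth. The miss rate, the false-alarm rate, the "defect" range of 400 to 449, and all the function names are invented for the sketch; nothing here was measured in the actual classes.

```python
import random

def make_page(rng, size=450):
    """A page of random three-digit numbers, as in the classroom exercise."""
    return [rng.randint(100, 999) for _ in range(size)]

def true_defects(page, low, high):
    """Exact number of values falling inside the designated 'defect' range."""
    return sum(low <= n <= high for n in page)

def inspector_count(page, low, high, rng, miss=0.10, false_alarm=0.01):
    """One hurried inspector: misses some defects, flags some good numbers.

    The default miss and false_alarm rates are assumptions, not data.
    """
    count = 0
    for n in page:
        is_defect = low <= n <= high
        if is_defect and rng.random() >= miss:
            count += 1  # a real defect that was caught
        elif not is_defect and rng.random() < false_alarm:
            count += 1  # a good number wrongly tallied
    return count

rng = random.Random(2000)
page = make_page(rng)
truth = true_defects(page, 400, 449)
counts = [inspector_count(page, 400, 449, rng) for _ in range(30)]

print("true count      :", truth)
print("inspector counts:", sorted(counts))
```

Missed defects pull a count down and false alarms push it up, so a class of thirty inspectors looking at the same page can easily bracket the true value.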
It became evident after two days that the ten machines on one side of the line were reporting twice as many jams as the other ten machines. The reason was that one lead operator had told his operators to make a tally mark on the checksheet every time the machine jammed. The other lead had told his operators to make a tally mark every time there was a jam caused by the machine. They had discovered two different ways of hearing "count the number of machine jams." And since the two banks of machines were separated by a formidable piece of equipment, the two groups did not realize that they were counting differently!
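The two readings of "count the number of machine jams" can be written down as two different counting rules applied to the same events. The jam log below is invented purely for illustration, as are the cause labels and function names.

```python
from collections import Counter

# Hypothetical jam log for one shift: (machine id, cause). All values invented.
jam_log = [
    (1, "machine"), (1, "metal"), (2, "lubricant"),
    (2, "machine"), (3, "tooling"), (3, "metal"),
]

def count_every_jam(log):
    """Definition A: one tally mark every time a machine jams."""
    return Counter(machine for machine, _ in log)

def count_machine_caused(log):
    """Definition B: one tally mark only when the machine itself caused the jam."""
    return Counter(machine for machine, cause in log if cause == "machine")

print(sum(count_every_jam(jam_log).values()))       # total under definition A
print(sum(count_machine_caused(jam_log).values()))  # total under definition B
```

The same shift yields different tallies under each rule, and neither operator has miscounted. Taking more data would not have reconciled the two boards; agreeing on the definition would.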
It doesn't matter how large a sample you take if you are measuring or counting the wrong thing. In fact, larger sample sizes can exacerbate the non-sampling error by adding fatigue and boredom. TOF has encountered cases in which inspectors recorded data without the ugly necessity of making the actual measurements beforehand. It is unlikely that the bank mentioned earlier obtained an accurate count even for the month of July, since non-sampling uncertainty -- errors due to fatigue, boredom, 'highway hypnosis,' misunderstood definitions, and so on -- is far more formidable than sampling variation.
A special situation applies to samples taken from a continuous material, such as throughout a sheet of dough, a bolt of cloth, a coil of aluminum, etc. All these are essentially alike. Another example is the siting of temperature stations for the purpose of estimating atmospheric temperature.
Some years ago NOAA noted in its USCRN Program Overview, "we do not have, in fact, an observing network capable of ensuring long-term climate records free of time-dependent biases." To eliminate or reduce biases due to heat islands, different instrumentation, asphalt, geographical clustering, etc., NOAA inaugurated a new network of stations using identical equipment calibrated on a regular basis, more evenly spread across the territory to be measured, and located well away from areas likely to be urbanized in the future. Wherever possible, the stations were paired so that unusual measurements could be cross-checked. A number of other precautions were taken to eliminate issues that had plagued the older network (which had simply used stations originally sited for air traffic control or other situations independent of estimating atmospheric averages).
The Ninth Uncertainty: Data Recording and Entry
|Outliers may indicate data entry problems|
63, 57, 62, 56, 55, 63, 56, 56, 58, 55, 56, 75, 55
Without an advanced degree in statistics, you will probably not suspect something funny about the penultimate data point. Haha. TOF jests. The "75" jumps up with its thumbs in its ears, wiggling its fingers and going wubba-wubba. Let's have a show of hands. How many suspect that the 75 was actually a 57 but was entered with the digits flipped?
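For the skeptics, here is a minimal sketch that flags the suspect rather than correcting it. It swaps in Tukey's 1.5 x IQR fences as the screening rule, which is a choice made for this illustration and not a method the post prescribes, and it assumes the readings are non-negative whole numbers so that reversing the digits makes sense.

```python
import statistics

# The thirteen readings quoted above.
readings = [63, 57, 62, 56, 55, 63, 56, 56, 58, 55, 56, 75, 55]

def tukey_fences(data, k=1.5):
    """Lower and upper fences: quartiles pushed outward by k times the IQR."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def flag_suspects(data):
    """List (position, value, plausible digit swap or None) for values outside the fences.

    Nothing is altered; a flag is only an invitation to go look at the data sheet.
    Assumes non-negative integers.
    """
    low, high = tukey_fences(data)
    suspects = []
    for position, value in enumerate(data, start=1):
        if low <= value <= high:
            continue
        swapped = int(str(value)[::-1])  # same digits, reversed
        plausible_swap = swapped if low <= swapped <= high else None
        suspects.append((position, value, plausible_swap))
    return suspects

print(flag_suspects(readings))  # [(12, 75, 57)]
```

Only the twelfth reading falls outside the fences, and its reversed digits sit comfortably inside them. That is a reason to check the original record, not a license to overwrite the entry.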
TOF knew he could count on his loyal Reader. In another case, involving tube fill weights of zinc oxide, it was quite clear that one of the entries on the data sheet for the 1-oz. tube filler line had been measured on a tube from the 2-oz. line. (You could tell from the "2" in front of the number: Among all the 1.1, 1.2, and 1.3 entries was a big fat 2.2.)
Guess what we found when we looked at the data sheet for the 2-oz. filler?
|Outliers may indicate lab errors|
The reason? The data was generally of the sort 5.7, 8.1, 3.2, 4.5, 6.3, etc. Now just imagine what happens if on a single entry the decimal point was not pressed well enough to register. Suddenly a 3.2 becomes a 32 and blows the average out of the water. The most difficult task was actually convincing the technician that the average was wrong because "that's what the computer [sic] told me!"
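The arithmetic is worth seeing once. The five values below are the ones quoted in the text, which only says the data were "of the sort"; treating them as a complete data set is an assumption of this sketch.

```python
import statistics

# The five illustrative values quoted above.
as_measured = [5.7, 8.1, 3.2, 4.5, 6.3]

# Same values, but the decimal point of the 3.2 failed to register.
as_keyed = [5.7, 8.1, 32, 4.5, 6.3]

print(round(statistics.mean(as_measured), 2))  # 5.56
print(round(statistics.mean(as_keyed), 2))     # 11.32
```

One slipped keystroke in five entries roughly doubles the average, and the computer will report the result with perfect confidence.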
|Outliers may also indicate genuine happenings in the process|
Ignore data after the change in die set: different problem.
Each instance of an outlier must be investigated on a case-by-case basis. Data should never be corrected with an automatic algorithm because the outlier is not always "bad" data! There might really have been a spike, as in the transient spike in downtime caused by a tool wreck.
Which will bring us to the next Uncertainty in Part VII: Adjustment of Data. (The Link will be added when the next post is up.)
- ASTM Designation E 178-75, "Dealing with Outlying Observations" (American Society for Testing and Materials, 1975)
- Cochran, William G. Sampling Techniques (John Wiley and Sons, 1977) esp. Ch.13.
- Curry, Judith and Peter Webster. “Climate Science and the Uncertainty Monster” Bull. Am. Met. Soc., V. 92, Issue 12 (December 2011)
- Deming, W. Edwards. Some Theory of Sampling (John Wiley and Sons, 1950) esp. pp 26-30.
- Deming, W. Edwards. Sample Design in Business Research (Wiley-Interscience, 1960) esp. Ch.5.
- Juran, J. M., "Different to You but Alike to Me," Industrial Quality Control, Vol. 19, No. 10 (1963)
- Natrella, Mary Gibbons. Experimental Statistics (National Bureau of Standards Handbook 91, 1963) Ch.17 "The Treatment of Outliers"
- NOAA, USCRN Program Overview
- NOAA, Climate Reference Network (CRN) Site Information Handbook, NOAA-CRN/OSD-2002-0002R0UD0, December 10, 2002.
- Ott, Ellis R. Process Quality Control: Troubleshooting and Interpretation of Data (McGraw-Hill, 1975) Ch.9:"Ideas from Outliers"
- Ravetz, Jerome R. NUSAP - The Management of Uncertainty and Quality in Quantitative Information
- Rosander, A.C. Case Studies in Sample Design (Marcel Dekker, 1977)
- Williams, Bill. A Sampler on Sampling (Wiley Interscience, 1978)