The Value of Averaging Noisy Data

Data are noisy. Measurements will be inaccurate for many reasons. Suppose you are given the task of measuring the height of a giraffe. Here are some scenarios that demonstrate various sources of inaccuracy in your measurements (to skip the discussion of variability and noise, and get right to the demonstration of how averaging can attenuate noise, skip ahead to the spreadsheet simulation that begins at Figure 1):

  1. The giraffe, sadly, was accidentally exposed to a blast of liquid nitrogen that froze her in place painlessly as she stood fully erect. You have a step-ladder tall enough to reach the giraffe’s head, a tape measure accurate to the nearest millimeter, and an assistant on the ground. You climb to the top of the ladder while your assistant holds the zero end of the tape measure against the ground, place the tape against the giraffe’s head keeping it as straight as possible, and read 5.372 m. 
  2. The giraffe has suffered the same fate as in #1. As you reach the top of the ladder you discover that your tape measure is only 5 m long. You carefully hold your right hand at the 5-m height and have your assistant release the tape so that you can pull it up to read the amount to be added to 5 m. You determine that the giraffe’s height is 5.42 m.
  3. Poor giraffe. This time you are alone. You place the end of the tape against the ground as you climb the ladder, and hope to keep it there as you climb. You discover that it is only a 5-m tape, so you hold your hand at the 5-m height and pull up the tape to determine the excess. You get a height of 5.316 m.
  4. In this scenario the giraffe is fine, but she is not too happy to have your assistant on the ground beside her and you climbing a ladder by her head. As she swings her head from side to side near you, you try to read the tape when her head swings by, and get a height of 5.5 m.
  5. In this case the zookeeper did not know you were coming, so he allowed the giraffe to have a double espresso just before you arrived. There’s no way you can safely climb a ladder anywhere near her. You have your brave and under-paid assistant enter the giraffe’s enclosure and stand as near to her as he can, and you take a photo of the two of them. You then measure your assistant and determine that he is 1.832 m tall. You use the photo to determine that the giraffe is 2.95 times as tall as your assistant, so you conclude that the giraffe is 5.404 m tall.
  6. And finally, in a case of true forgetfulness, you get to the zoo without the tape measure, and it’s nighttime. You have the keeper stand near the giraffe, but it’s too dark for a photo, so you estimate that the giraffe is 3 times his height. He tells you that he is 6 feet 2 inches tall. You do a quick mental conversion, putting the keeper at 1.9 m. The giraffe is estimated to be 5.7 m tall.

Six different measurements, six different answers. Which one is right? The first is probably the most accurate, but was the tape straight? Was its bottom end properly against the ground? Could you really be sure it said 5.372 m and not 5.373 m? Was the tape measure manufactured and calibrated properly? The problems in measurement are amplified in the other scenarios. In #2 and #3, how sure are you that your hand marked the 5-m point accurately? In #3, did you keep the tape end exactly on the ground? In #4 and #5, the movement of the giraffe and/or your assistant will add some error. In #6, your rough estimates and the zookeeper’s exaggeration of his height both corrupt your answer.

If you have the opportunity to make a measurement repeatedly using exactly the same procedure, these sources of variability will affect each measurement, but many causes of inaccuracy will sometimes make the measured height too large and other times too small. Your hand will sometimes mark the 5-m position too high, and other times too low. The giraffe occasionally holds her head a bit low; at other times she jumps just as you measure. If a source of variability is equally likely to lead to over- and underestimates of the correct number, then the average of many repeated measurements will tend to converge on the correct result. Such a source of variability is said to be “random.”

Some other sources of variability are non-random; that is, they tend to lead to errors in the same direction every time. The zookeeper’s vanity will invariably lead to his overstating his height, for example. Averaging multiple measurements will not eliminate this error.
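To make the distinction concrete, here is a minimal Python sketch of repeated giraffe measurements. The numbers are assumptions chosen only for illustration: the "true" height is borrowed from scenario 1, the random error is taken to be bell-shaped with a spread of 5 cm, and the non-random error is a constant 10 cm overstatement. None of these come from real measurements.

```python
import random
import statistics

TRUE_HEIGHT_M = 5.372   # assumed "true" height, borrowed from scenario 1
RANDOM_SD_M = 0.05      # assumed spread of the random error (5 cm)
BIAS_M = 0.10           # assumed constant error, always in the same direction

def measure(rng, bias=0.0):
    """One simulated measurement: truth + constant bias + zero-mean random error."""
    return TRUE_HEIGHT_M + bias + rng.gauss(0.0, RANDOM_SD_M)

def mean_of_measurements(n, bias=0.0, seed=1):
    """Average n simulated measurements made with the same procedure."""
    rng = random.Random(seed)
    return statistics.fmean([measure(rng, bias) for _ in range(n)])

for n in (1, 10, 100, 10_000):
    unbiased = mean_of_measurements(n)
    biased = mean_of_measurements(n, bias=BIAS_M)
    print(f"n={n:>6}  random error only: {unbiased:.3f} m   with constant bias: {biased:.3f} m")
```

As n grows, the first column should settle near 5.372 m, while the second settles near 5.472 m: averaging removes the random part of the error but leaves the constant offset untouched.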

In the case of a signal that varies across space or time, recognizing the signal as distinct from its background can be a problem if there is lots of “noise,” or random variability, in the measurement. A weak radio signal might be hard to understand when embedded in lots of static, say. This is often described as a problem of distinguishing signal from noise. The same problem occurs in determining the brain activity triggered by a specific event (signal) against the background of all the other things the brain is doing at the same time (noise). A neuroscientist finds the “evoked potential,” the electrical signal caused by a particular event, by recording overall activity of the brain when the event occurs multiple times.  The triggered brain activity will occur each time in the same way, embedded in a background of presumably random noise. Averaging the many signals will cause the noise to average out (sometimes it is high, sometimes low) while the evoked potential, the same each time, reveals itself.

We do the same thing in astrophotography. An image of the night sky might contain many very weak details (signal) embedded in a background of noise, often caused by random electrical activity of the digital camera or air currents in the atmosphere deflecting light rays. A single photo will contain weak signal and lots of noise. If the signal is the same across time (and assuming that there is no supernova occurring this is probably the case) and the noise is random, then averaging many photos will allow the noise to cancel out, revealing the signal — in this case the image of the heavens.

Figure 1. An arbitrary “signal” that was hidden by random noise in each list of 100 numbers in the spreadsheet array.

To simulate the advantage provided by averaging many images (in astrophotography this is called “stacking” the images) I created a spreadsheet consisting of 100 lists of random numbers between -50 and 50. To each of these lists I added a list of numbers that represented a patterned image (see Figure 1) composed of numbers between 40 and 60. This yielded 100 lists with values ranging from -10 to 110, with a signal of maximum magnitude 20 units (60 - 40) embedded in noise of magnitude 100 (50 - (-50)). Each of these 100 lists of numbers appears pretty random; I would argue that it is not possible to recognize the signal in any one of them (see Figures 2, 3, & 4).

Figure 2. An example of random noise obscuring the signal. Signal is shown in black, red curve is Noise and Signal combined.

Figure 3. Another example of noise obscuring the signal. Signal, black; Signal + Noise, green.

Figure 4. Signal, black; Signal + Noise, blue.

Each of these 100 lists can be thought of as a single very noisy photograph – in fact a photograph in which the noise is so great that it totally obscures the image.  The spreadsheet allows me to average these 100 lists. If I average 5 of them (Figure 5), the noise is attenuated a little bit, but the signal is still hard to discern. However, if all 100 are averaged, the noise is greatly attenuated and the averaged image very faithfully approximates the underlying signal (Figure 6).
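For readers who would rather experiment in code than in a spreadsheet, the following Python sketch imitates the same procedure. It is not the spreadsheet itself: the sine-wave signal is a hypothetical stand-in for the Figure 1 pattern, the list length of 200 is an assumption, and the noise is assumed to be uniformly distributed. Only the ranges (signal 40 to 60, noise -50 to 50) and the count of 100 lists come from the description above.

```python
import math
import random
import statistics

N_LISTS = 100     # number of noisy lists, as in the post
N_POINTS = 200    # assumed length of each list (not stated in the post)

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical stand-in for the Figure 1 pattern: a sine wave confined to 40..60.
signal = [50 + 10 * math.sin(2 * math.pi * i / 50) for i in range(N_POINTS)]

# Each noisy list = signal + uniform random noise between -50 and 50.
noisy_lists = [[s + rng.uniform(-50, 50) for s in signal] for _ in range(N_LISTS)]

def stack(lists):
    """Average the lists position by position (the 'stacking' step)."""
    return [statistics.fmean(column) for column in zip(*lists)]

def rms_error(estimate):
    """Root-mean-square difference between an estimate and the true signal."""
    return math.sqrt(statistics.fmean([(e - s) ** 2 for e, s in zip(estimate, signal)]))

for n in (1, 5, 100):
    averaged = stack(noisy_lists[:n])
    print(f"lists averaged: {n:>3}   RMS error vs. signal: {rms_error(averaged):.2f}")
```

The RMS error should shrink as more lists are averaged, mirroring the progression from a single noisy list to the average of 5 and then of all 100.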

Figure 5. An average of 5 noisy signals (orange) shown against the original signal (black).

Figure 6. All 100 noisy signals averaged (orange), compared to the original signal.
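A rough back-of-the-envelope check shows why 5 lists are not enough and 100 are. It assumes the noise values are independent and uniformly distributed between -50 and 50; the range is from the spreadsheet description, but independence and uniformity are assumptions. Under those assumptions the standard deviation of the noise, and of the noise remaining after averaging N lists, would be roughly:

```latex
\begin{align*}
\sigma_{\text{noise}} &= \frac{50 - (-50)}{\sqrt{12}} \approx 28.9 \\
\sigma_{\text{avg}}(N) &= \frac{\sigma_{\text{noise}}}{\sqrt{N}} \\
\sigma_{\text{avg}}(5) &\approx \frac{28.9}{\sqrt{5}} \approx 12.9 \\
\sigma_{\text{avg}}(100) &\approx \frac{28.9}{\sqrt{100}} \approx 2.9
\end{align*}
```

With 5 lists the leftover noise (about 13 units) is still comparable to the 20-unit swing of the signal; with 100 lists it drops to about 3 units, small enough for the signal to stand out.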

Stacking procedures in astrophotography average images in this manner, reducing random noise. Astrophotographers reduce random noise created by the atmosphere in their sky photos (“lights”) by averaging many lights, that is, many photos of the same target.

“Darks” are photos shot at the same time and under the same conditions as the lights (same lens, same exposure, same ISO…) but with the lens covered so that no light gets in. The darks contain what the camera records as total darkness under the conditions in which the lights were taken, plus random and non-random camera noise related to the exposure, to defective pixels, etc.; subtracting the darks from the averaged lights leaves the signal and some other non-random noise.

The remaining non-random noise can be removed through the use of two other kinds of images. “Bias” frames are photos taken at the fastest possible shutter speed (a light might involve an exposure of several minutes; a bias frame will be exposed at 1/4,000 sec or so), again with no light coming into the lens. The bias frames record the noise introduced when the camera’s electronics read out the pixels of the sensor; you don’t want to interpret this noise as part of the image. Finally, “flats” are images taken with a plain, even, diffuse white light coming through the lens, focused as it was for the lights and exposed to create a white or light grey image. The flats allow any aberrations caused by dust on the lens or camera sensor, or unevenness in the light distribution caused by the lens (“vignetting”), to be corrected in the image.

So, to summarize the astrophotography process: lights are averaged to remove random atmospheric noise, darks are subtracted to remove camera noise related to the exposure, bias frames are subtracted to eliminate electronic noise, and flats correct for the effects of dust and vignetting. A nice discussion of all of this can be found at NightSkyPix.
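The arithmetic behind this kind of calibration can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the procedure of any particular stacking program: the function names `average_frames` and `calibrate` are hypothetical, each "image" is a tiny list of four pixel values, and the darks are assumed to match the lights' exposure so that they already include the bias level (which is why, in this sketch, the bias frames are used only to correct the flats). Other workflows handle the bias step differently.

```python
import statistics

def average_frames(frames):
    """Pixel-by-pixel mean of a list of equally sized frames."""
    return [statistics.fmean(pixels) for pixels in zip(*frames)]

def calibrate(lights, darks, flats, biases):
    """Simplified calibration: (mean light - mean dark) / normalized flat."""
    master_light = average_frames(lights)
    master_dark = average_frames(darks)   # assumed to already include the bias level
    master_bias = average_frames(biases)
    # Remove the bias level from the flat, then scale it so its mean is 1.0.
    flat = [f - b for f, b in zip(average_frames(flats), master_bias)]
    flat_mean = statistics.fmean(flat)
    flat_norm = [f / flat_mean for f in flat]
    return [(l - d) / f for l, d, f in zip(master_light, master_dark, flat_norm)]

# Noise-free toy data (all values invented for illustration):
#   true sky          = [10, 20, 30, 40]
#   vignetting gain   = [1.0, 0.8, 1.0, 1.2]
#   dark signal       = [2, 2, 9, 2]   (third pixel is "hot")
#   bias level        = 5 on every pixel
lights = [[17, 23, 44, 55]]     # sky * gain + dark + bias
darks = [[7, 7, 14, 7]]         # dark + bias
flats = [[105, 85, 105, 125]]   # 100 * gain + bias
biases = [[5, 5, 5, 5]]

calibrated = calibrate(lights, darks, flats, biases)
print(calibrated)  # should be very close to the true sky values
```

In practice each list would hold many frames rather than one, so the averaging inside `average_frames` would also reduce the random noise in each master frame.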

If you have read this far, I hope you got something out of this. I’m only beginning to learn how to accomplish all of this in astrophotography, but I prepared this discussion to illustrate the benefits. At a minimum, I hope you understand how, in general, averaging data can help to reduce variability, and specifically how stacking images can reduce noise. With regard to behavioral data, the subject of my scientific career, averaging across the people or nonhuman animals being studied is of value only if there is an underlying signal to be revealed. This is not necessarily always the case – see for example Murray Sidman’s argument against averaging behavioral data for learning curves.

Feel free to email me with any comments or questions.


