Pages 12-25 of these lecture notes were revised on September 27, 2001. These were the last changes, I hope - SR

Lecture 1 (Aug. 30 and 31, 2001)

Discussion of syllabus:

I. Lecture part of the course:

1. Buy the textbook, John Taylor's Error Analysis, and do the reading assignments.
2. Go to six lectures and be prepared for the exam at the seventh lecture. Then no more lectures.
3. Lecture homework is given out at lectures 2 through 6 and is due at 5:00 pm on Wednesday of the following week in your TA's mailbox in the lab, G2B66.

II. Lab part of the course: go to lab.

  1. Prepare for the lab by turning in a "prelab" (lab homework). These are questions at the end of the written lab exercises that you will find stacked in G2B66.
  2. Turn in a printed out lab report about every two weeks. You should have done 7 lab reports at the end of the semester.
  3. The first week you will learn Mathcad which is the program that is used for creating the lab reports.

Prerequisites:
1. Some physics. This course presumes that you are familiar with physical quantities like meters/second.
2. Some calculus. We will ask you to take derivatives of polynomial expressions and trig functions. For example, find (d/dx)(5 + x^(1/2)) or (d/dy)[cos(3x + 5y)].

Reading assignment: Chapters 1 and 2
Chapters 2.5-2.9 are not covered in lecture but are useful reading nevertheless.
In lecture we skip chapter 2.5 and the provisional rule for uncertainty on p. 23.

The web page contains the syllabus, class schedule, TA email addresses, lecture notes, and other goodies.

Professor's props for the first lecture:

  1. A clear plastic ruler and a length of string.
  2. 4.000 ± 0.001 hard-boiled eggs, exactly.

Chapter 1: Preliminaries

Difference between a mistake and uncertainty:

Mistake: I read the ruler wrong and got 2.3 cm; it should have been 12.3 cm.
Uncertainty: I got 2.3 cm, my lab partner got 2.4 cm. I measured it again and got 2.5 cm.

The word "error" is used in this course to mean uncertainty, not a mistake.

Why does anyone care?

At 31 F your pipes (or crops) freeze; at 33 F they're OK.
At 6 ft. 4 in. you can be an astronaut; at 6 ft. 5 in. you're too tall.
If my blood alcohol is 0.09%, no sober person will ride with me; if it's 0.10%, I could sit 5 days in jail.

Most of the time, we will use metric units in this course.

Chapter 2: Uncertainties

Chapter 2.1. Absolute and relative uncertainty

Any measurement has an uncertainty.
For example, x = 2.123 ± 0.005 m means a length between 2.118 m and 2.128 m.
To tell people the uncertainty, we write x = x_best ± dx.

dx is called the absolute uncertainty. It has the same units as the measurement, such as m or mm.
dx / x is called the relative uncertainty. It has no units.
dx / x = 0.005 m / 2.123 m = 0.002, which is the same as 0.2%.
Reminder: to get percent multiply the number by 100.
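
The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch; the function name relative_uncertainty is an illustrative choice and does not come from the textbook.

```python
def relative_uncertainty(best, dx):
    """Return dx / |best| as a plain fraction (multiply by 100 for percent)."""
    if best == 0:
        raise ValueError("relative uncertainty is undefined when the best value is zero")
    return abs(dx) / abs(best)


# The length from the example above: x = 2.123 m with dx = 0.005 m.
rel = relative_uncertainty(2.123, 0.005)
print(f"relative uncertainty = {rel:.3f}, or {100 * rel:.1f}%")
```

The result has no units because the meters in dx and x cancel.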

Exception:
There is no uncertainty when counting small numbers of individuals: 2 eggs, 18 students.
These numbers are exact.
2.000 ± 0.001 hard-boiled eggs makes no sense.

Chapter 2.2: Significant figures

Significant figures = the number of digits in the number that we are sure of.
Standard practice:

8 means 8, not 7 and not 9. We have one significant figure.
8.1 means 8.1, not 8.0 and not 8.2. We have two significant figures.
8.00 means 8.00, not 7.99 and not 8.01. We have three significant figures.
8.000 means 8.000, not 7.999 and not 8.001. We have four significant figures.

It is standard practice to
1. Write the number of digits you know, and not more or less.
2. Assume that the uncertainty is ±1 in the last significant place unless a different uncertainty is stated.

There is a problem with the significance of zeroes at the end of big numbers:

1955 Colorado population: 1,480,000.
1955 was not a census year so this has to be a guess.
This probably means more than 1,475,000 and less than 1,485,000.
Don't count the zeroes on the right side of big numbers in figuring significant digits.
There are only three significant figures (the 1, 4, and 8).
Do count the zeroes on the right side of the decimal point, however.
8.000 has four significant digits.
They are not needed to write 8, so they were put there to indicate just how small the uncertainty is. If I write 1,480,000 I am forced to put the zeroes at the end in order to write the number, even though they aren't significant.

There is a problem with the significance of zeroes at the beginning of small numbers:

25 millimeters has two significant figures.
25 millimeters = 2.5 centimeters = 0.025 meters = 0.000025 kilometers
0.025 meters has two significant figures also. It is the same measurement.
Don't count the zeroes on the left of decimal numbers like 0.000025.

Standard writing style:

Put a zero before a decimal point so it can't be confused with a period.
0.025 is good writing style and .025 is not.

Scientific notation solves the problem with zeroes:
A quick reminder of scientific notation:

160 = 1.6 × 10²

0.016 = 1.6 × 10⁻²

16,000 = 1.6 × 10⁴

0.00016 = 1.6 × 10⁻⁴

Expressing uncertainties in scientific notation:

Avogadro's number: 6.0235 × 10²³ ± 0.0001 × 10²³ is awkward.
It's easier to write: (6.0235 ± 0.0001) × 10²³.
1955 population of Colorado: (1.48 ± 0.01) × 10⁶.
Scientific notation solves the problem with zeroes at the beginning and end!

Writing uncertainties that are not ±1 in the last significant figure:

6.3 ± 0.2 is the way to write uncertainties that are not ±1 in the last significant figure.
6.3 ± 0.2397 makes no sense.
Uncertainties are never more accurately known than the measurement.

Standard practice is to round uncertainties to one significant figure.
6.3 ± 0.2 does make sense.
Occasionally, you may know the uncertainty to two significant figures if you have kept careful track of your errors.
6.33 ± 0.35 is ok if you really calculated the error to two significant figures.
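
The rounding practice described above can be sketched in Python. This is an illustration only: round_measurement is a hypothetical helper name, and the sketch assumes the value should simply be rounded to the same decimal place as the rounded uncertainty.

```python
import math


def round_measurement(value, uncertainty, sig_figs=1):
    """Round the uncertainty to sig_figs figures and the value to the same decimal place."""
    if uncertainty <= 0:
        raise ValueError("uncertainty must be positive")
    exponent = math.floor(math.log10(uncertainty))  # decimal position of the leading digit
    decimals = sig_figs - 1 - exponent              # negative for uncertainties of 10 or more
    return round(value, decimals), round(uncertainty, decimals)


print(round_measurement(6.3, 0.2397))     # uncertainty kept to one significant figure
print(round_measurement(6.33, 0.35, 2))   # two significant figures, if really justified
```

Passing sig_figs=2 covers the occasional case where the error has been tracked carefully enough to keep two figures.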

Lecture 2: (Sept. 6 & 7, 2001)

Reading assignment: Chapter 3

Concepts covered so far:

Absolute and relative uncertainty
Significant figures
Scientific notation
Writing uncertainty
Round off uncertainty to one significant figure (usually)
Standard notation: (3.0 ± 0.1) × 10⁸ m/s

Professor's props:

Digital and analog voltmeters on a viewgraph
Meter stick
A meteorite 4,570,000,032 years old. (4.57 billion years when it fell in 1969).

Chapters 2.3 and 2.4: Did I get the right answer?

If there is an accepted value for an answer, the answer you get should lie in the range of values between x + dx and x – dx about 70% of the time and between x - 2 dx and x + 2 dx about 95% of the time.
We will learn where these percentages come from later in Chapter 5. If there are many measurements, some will be too big and others too small and the errors, if random, will tend to cancel.
But recall that when you toss a coin, you expect half the tosses to yield heads, yet sometimes you will get four heads in a row. This also happens with errors in measurement. Occasionally, the errors are all in the same direction and your answer in that case is way too big. That doesn't happen often, but it does happen. And it doesn't mean that you were "wrong", just unlucky.

Example:
Known average length of eggs = 60 mm. John's answer is (59 ± 2) mm.
His range includes the known value so it is "right."
Mary got (59 ± 1) mm. This is less uncertain so it's a better answer, and is also right.
Monica got (60 ± 3) mm. Is that better?
Bill got (58 ± 1) mm. Is he "wrong"? This differs from the known answer by 2 dx.
Conclusion? "right" and "wrong" are difficult concepts to apply.

When is the answer in my lab report correct?

In Physics 1140, if the known answer lies within the range you find, you should comment that it does and pat yourself on the back.

If the answer lies outside the range permitted by your error, you should say that it lies outside and write a few sentences saying what you think went wrong. No points are counted off for being "wrong" by an amount that is a few times what you expect.

If the accepted answer for the speed of sound is 340 m/s and you get 340,000 m/s, you forgot to convert millimeters to meters and should go back and fix it.

If you get (320 ± 10) m/s and should have gotten 340 m/s, then your answer is not as good as you expected, but this is not bad enough to make the measurements again. If you got (390 ± 10) m/s, you should spend time finding out what went wrong and fix it.
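
One way to make this comparison concrete is to count how many uncertainties separate your answer from the accepted one. The sketch below is a hedged illustration; the function name discrepancy_in_units_of_dx is made up for this example and is not a course-mandated procedure.

```python
def discrepancy_in_units_of_dx(measured, dx, accepted):
    """How many uncertainties (dx) lie between the measured and accepted values."""
    if dx <= 0:
        raise ValueError("dx must be positive")
    return abs(measured - accepted) / dx


# The two speed-of-sound results discussed above, compared with 340 m/s.
for measured, dx in [(320, 10), (390, 10)]:
    n = discrepancy_in_units_of_dx(measured, dx, accepted=340)
    print(f"({measured} ± {dx}) m/s is {n:.1f} dx away from 340 m/s")
```

A result about 2 dx away happens now and then by chance; a result 5 dx away almost never does, which is why the second case calls for tracking down what went wrong.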

If you get 393.57 ± 0.0336 for the speed of sound, you have made five mistakes.
1) You forgot the units of meters/second.
2) Your answer is far from the expected value of 340 m/s.
3) The answer and the uncertainty are not given to the same number of decimal places.
4) Your uncertainty is too small. We can't measure anything that accurately (to 0.01%) in our lab. Mathcad and your calculator may give you too many digits!
5) The uncertainty is never known to three significant figures.

Classification of error into two kinds:

1. Random error, just as likely to be + as -.

Example:
I average 100 measurements of the length of an egg and get 59 mm.
Any one egg may be bigger or smaller than this with equal likelihood.

2. Systematic error, an error in one direction

Example:
My plastic ruler sits in the hot sun and stretches. All future measurements are too small.
(A 10 cm object looks like 9 cm on the stretched ruler.)

Effect of averaging many measurements:

An average of many measurements will reduce random error because the errors tend to cancel, since some are + and some are -.
Averaging many measurements with systematic errors (using the same bad ruler over and over) will not improve the error.

Question: If your gas gauge reads too high, are you more or less likely to run out of gas?
Hint: this is a case where more is less.

 (SystematicAndRandom.gif)

Here we begin Chapter 3 of textbook:

Errors in calculations with uncertain numbers: Propagation of uncertainty

Three ways of finding uncertainty:

I. For a quantity measured directly (length, voltage, etc.)

Read the scale on the instrument.

The uncertainty is usually ±1 scale division or one significant digit.
Example: I can easily read a ruler to the nearest millimeter. With care, to 0.5 mm.

For an old fashioned voltmeter with a hand that moves, again you can read to ± one scale division.

 

 

In the lab, assume that the uncertainty in a digital voltmeter is 1% of the reading OR ±1 in the last decimal place on the meter, whichever is bigger.

So 3.08 volts on the meter is 3.08 ± 0.03 V because 1% is 0.03 V, which is bigger than 1 digit, which is 0.01 V.

However, 0.05 volts has an uncertainty of 0.01 volts and is 20% uncertain, not 1%!
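
The "whichever is bigger" rule for the digital voltmeter can be written as a one-line function. This is a sketch under the lab's stated assumption (1% of the reading or one count in the last digit); the name voltmeter_uncertainty and the 3.08 V reading are illustrative choices.

```python
def voltmeter_uncertainty(reading, last_digit):
    """Larger of 1% of the reading and one count in the last displayed digit."""
    return max(0.01 * abs(reading), last_digit)


# A meter that displays hundredths of a volt (last digit = 0.01 V).
for reading in (3.08, 0.05):
    dv = voltmeter_uncertainty(reading, last_digit=0.01)
    print(f"{reading} V: uncertainty {dv:.2f} V, or {100 * dv / reading:.0f}% of the reading")
```

For large readings the 1% term wins; for small readings the last digit dominates, and the relative uncertainty becomes large.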

 

II. Uncertainty of a quantity calculated from measured quantities (lectures 2 and 3)

What is the speed (and its uncertainty) of a turtle that travels (200 ± 3) cm in (100 ± 3) sec?
Recall that speed = length / time. Looks like the best value is 2 cm / sec.
But what is the uncertainty? It's not obvious.
For calculated quantities, there are rules for calculating the error in the result.
These rules are called "propagation of error" rules. We cover these next in lecture.

III. Statistical uncertainty (to be covered later in Lectures 4 and 5)

Statistics applies to quantities measured many times.
The random error is reduced by repeated measurements.
This will be covered later.

Review of concepts covered so far today:

There is a 70% probability that the range of values x ± dx contains the true value of x, and a 95% probability that x ± 2 dx contains the true value.

"Is my answer right?" is the wrong question. The right question is what is the probability that I am right within my estimated error.

Random error is equally likely to be plus or minus, but systematic error is usually in one direction (such as when the bathroom scales are off in your favor.)

Errors in directly measured quantities (lengths, masses, times) are determined by our measuring technique.
Errors in calculated results are determined by combining the errors in the quantities used in calculation.
Errors in repeated measurements are statistical and have special properties.

Propagation of error in calculations

Simple rules of thumb (not covered in lecture):

Rule 1. In sums and differences, the absolute error is determined roughly by the absolute error in the sloppiest measurement:

x = y + z,
y = 2.0 ± 0.1,
z = 5.000 ± 0.001,
then x = 7.0 ± 0.1.

The error in z is too small to add significantly.

Rule 2a. In multiplication and division, the fractional error in the answer is roughly determined by the fractional error in the sloppiest measurement.

z = x y
x is known to 1%.
y is known to 10%.
Then z is known to about 10%.

Rule 2b. In multiplication and division, the number of significant digits in the answer is roughly determined by the number in the sloppiest measurement.

z = x y,
x = 2.0,
y = 1.234,
then z = 2.5 (not 2.468)

The above rules of thumb are approximate. They are very nearly correct when one error is much bigger than the other.
Next we learn how to accurately calculate the uncertainty.

I. Errors in sums and differences (Chapter 3.5)

Example: What is Charlie's height measured with a meter stick?

Floor to Charlie's belt: x ± dx = (1.00 ± 0.02) m
Charlie's belt to his head: y ± dy = (0.78 ± 0.02) m

First guess: x + y = (1.78 ± 0.04) m (guess only)

This overestimates the error since one error may be positive and the other negative.
The correct answer for the error in z = x + y, when the errors are random, is

the square root of the sum of the squares of the errors in x and y. In equation form:
dz = √[(dx)² + (dy)²].

This formula applies also for a difference z = x – y.
This kind of sum in mathematics is called a "sum in quadrature".

And it works for more complicated sums and differences: For z = a + b – c – d,
the uncertainty is the square root of the sum [(da)² + (db)² + (dc)² + (dd)²].

Apply this to the example above: (dz)² = (0.02)² + (0.02)² = 0.0008, then dz = 0.028 m.
Now I round off 0.028 m to 0.03 m, and say Charlie's height is (1.78 ± 0.03) m.
This is smaller than the 0.04 I get from adding dx and dy.

Example: z = x - y
x = 21 ± 4
y = 45 ± 3
(dz)² = 3² + 4² = 5², so dz = 5
z = -24, then
z = -24 ± 5

Example: z = x - y + 2
The two at the end increases the answer by 2 but does not increase the uncertainty because numerical constants have no uncertainty.
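
The sum in quadrature is easy to check numerically. The sketch below reworks Charlie's height and the z = x - y example; the helper name quadrature_sum is an illustrative choice.

```python
import math


def quadrature_sum(*errors):
    """Square root of the sum of the squares of the absolute errors."""
    return math.sqrt(sum(e * e for e in errors))


# Charlie's height: two lengths, each uncertain by 0.02 m.
height = 1.00 + 0.78
d_height = quadrature_sum(0.02, 0.02)
print(f"Charlie's height = {height:.2f} ± {d_height:.2f} m")

# z = x - y with x = 21 ± 4 and y = 45 ± 3; the errors still add in quadrature.
z = 21 - 45
dz = quadrature_sum(4, 3)
print(f"z = {z} ± {dz:.0f}")
```

An exact constant added to z would shift z but would not appear in the call to quadrature_sum, because it has no uncertainty.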

II. Errors in multiplication and division

Suppose z = x y.
The relative uncertainty in x is dx / x.
The relative uncertainty in y is dy /y.
The relative uncertainties "add in quadrature" which means

(dz/z)² = (dx/x)² + (dy/y)².

Example: x = 6 ± 1, y = 12 ± 2, then (dz/z)² = (1/6)² + (2/12)² = 2/36 and dz/z = 0.24,

and then z = 72 ± 17, because 0.24 × 72 is 17 (when rounded).

You could round off the 17 to 20 (one significant digit); then, to be consistent, you should round off 72 to 70. Then the answer is 70 ± 20. Is this better? Standard practice varies on this issue.

The same rule, addition in quadrature of relative uncertainties, holds for division as well.

If z = x / y, then (dz/z)² = (dx/x)² + (dy/y)².
Rule of thumb: if one uncertainty is much bigger than the other (4 times, for example), the final uncertainty is going to be approximately the same as the bigger relative uncertainty.

How about z = (w x) / y? If I apply the rules in succession I find that:

(dz/z)² = (dw/w)² + (dx/x)² + (dy/y)².
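
The same check can be made for products and quotients. In this sketch the helper name product_uncertainty is hypothetical; it takes the computed result z and a (value, uncertainty) pair for every factor or divisor.

```python
import math


def product_uncertainty(z, *pairs):
    """Absolute uncertainty dz when z is a product or quotient of the given quantities."""
    relative = math.sqrt(sum((dx / x) ** 2 for x, dx in pairs))
    return abs(z) * relative


x, dx = 6, 1
y, dy = 12, 2

z = x * y
dz = product_uncertainty(z, (x, dx), (y, dy))
print(f"x * y = {z} ± {dz:.0f}")

q = x / y
dq = product_uncertainty(q, (x, dx), (y, dy))
print(f"x / y = {q:.2f} ± {dq:.2f}")
```

The product and the quotient have the same relative uncertainty, about 24%, even though their absolute uncertainties are very different.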

III. Mixed addition and subtraction with multiplication or division:

Given z = r/(s-t). What is dz in terms of dr, ds and dt?
Apply the addition/subtraction rule to (s-t). It helps to call it w. Then z = r/w and
w = s-t so use the addition rule to get (dw)² = (ds)² + (dt)².
Then the final answer comes from the division rule: (dz/z)² = (dr/r)² + (dw/w)².
And don't forget to multiply dz/z by z to get dz.
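
Here is the two-step recipe worked with numbers. The values of r, s and t below are made up purely for illustration; they are not from the lecture or the textbook.

```python
import math

# Hypothetical measured values (value, uncertainty).
r, dr = 10.0, 1.0
s, ds = 7.0, 0.3
t, dt = 2.0, 0.4

# Step 1: w = s - t, so absolute errors add in quadrature.
w = s - t
dw = math.sqrt(ds ** 2 + dt ** 2)

# Step 2: z = r / w, so relative errors add in quadrature.
z = r / w
relative_dz = math.sqrt((dr / r) ** 2 + (dw / w) ** 2)

# Step 3: multiply the relative error by z to get the absolute error.
dz = abs(z) * relative_dz
print(f"w = {w:.1f} ± {dw:.1f}")
print(f"z = {z:.2f} ± {dz:.2f}")
```

Note that the subtraction makes w small compared with s and t, so its relative uncertainty (10% here) is larger than that of either s or t alone.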

IV. Rule regarding exact numbers

Example: if the length of one egg is 59 ± 1 mm, then the length of two eggs is 118 ± 2 mm. In other words, 2 (x ± dx) = 2x ± 2 dx.

The rule: Multiplication or division by an exact number does not change the relative uncertainty.

The other rule: Addition or subtraction of exact numbers does not change the absolute uncertainty.

Concepts covered since the last review

In addition and subtraction, the error in the result is (dz)² = (dx)² + (dy)².

In multiplication and division, the error in the result is (dz/z)² = (dx/x)² + (dy/y)².

Addition or subtraction of a constant does not contribute to the uncertainty of the result. Constants which multiply or divide do not change the relative uncertainty of the result.

For formulas where more than one rule applies such as (r+st)/u, break the formula into smaller parts to which only one rule applies (x = st, y = r+x, z = y/u).  

Lecture 3 (Sept. 13 & 14, 2001)

Finding errors in complicated functions using derivatives

Consider a mathematical function y = f(x)

 (DerivativeExample.gif)

By inspecting the graph, we see that as x moves from x to x + dx, then y moves from y to y + dy.

We know some basic calculus, so we know that for small dx, dy = f(x + dx) - f(x) = dx [df(x) / dx].

The change in y is biggest where the slope (derivative) is the biggest. So dy is bigger in the upper right corner than in the lower left corner.

Example 1:
y = x³
x = 2 ± 0.1. [5% uncertain] What is y ± dy?

y = 2³ = 8. dy/dx = 3x², or dy = 3x² dx. (Note that the uncertainty in x, 0.1, is the value we plug in for dx.)

Then dy = 3 * 4 * 0.1 = 1.2.
So y = 8.0 ± 1.2. [15% uncertain].

Looks like the relative uncertainty tripled! I can prove this: dy / y = 3x² dx / x³ = 3 dx / x.

Example 2: y = sin x

 (SineCurve.gif)
x near the origin: x = 0 ± 3 degrees = 0 ± 0.05 radians, and sin x = 0.0
dy/dx = cos x = 1.0 when x = 0.
dy = (dy/dx) dx = 0.05
So y = 0.0 ± 0.05

Example 3: y = sin x again, except:
x near the peak of the curve: x = 90 ± 3 degrees = 1.57 ± 0.05 radians, and sin x = 1.0
dy/dx = cos x = 0 when x = π/2.
dy = (dy/dx) dx = 0.0
So y = 1.0 ± 0. The uncertainty is zero (to first order) because sin x is very nearly 1.0 at 90 degrees and at small angles on either side.
Check this: sin 93 degrees = sin 1.62 radians = 0.999 and sin 87 degrees = sin 1.52 radians = 0.999.

Example 4: y = 3 sin (2x). x = 2.0 ± 0.1

dy/dx = 3 [2 cos (2x)] = 6 cos (2x). Then dy = 6 (cos 4) (0.1) = -0.39. The sign does not matter for an uncertainty, so y = 3 sin 4 = -2.27 ± 0.39.

Note: I use angles in radians, not in degrees. So if you repeat this on your calculator, be sure it's set for radian angles.
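
If you would rather not take the derivative by hand, the slope can be estimated numerically. The sketch below is one possible approach, not the method used in lecture: it estimates df/dx with a central difference (step h is an arbitrary small number) and returns the magnitude of the slope times dx. The name propagate is illustrative.

```python
import math


def propagate(f, x, dx, h=1e-6):
    """Estimate |df/dx| * dx using a central-difference slope at x."""
    slope = (f(x + h) - f(x - h)) / (2 * h)
    return abs(slope) * dx


# Example 1: y = x^3 at x = 2 ± 0.1
dy_cube = propagate(lambda x: x ** 3, 2.0, 0.1)

# Example 3: y = sin x at the peak, x = pi/2 ± 0.05 radians
dy_peak = propagate(math.sin, math.pi / 2, 0.05)

# Example 4: y = 3 sin(2x) at x = 2.0 ± 0.1 radians
dy_sine = propagate(lambda x: 3 * math.sin(2 * x), 2.0, 0.1)

print(f"x^3: dy = {dy_cube:.2f}")
print(f"sin x at the peak: dy = {dy_peak:.2f}")
print(f"3 sin(2x): dy = {dy_sine:.2f}")
```

Python's math.sin and math.cos always work in radians, which matches the note above.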

Example 5: Functions of more than one variable

Let z = f(x,y) = x / y²
x = 2.0 ± 0.1
y = 0.5 ± 0.1

Note: this is just like the L/T² that occurs in the pendulum lab, only different: L is x and T is y.

Rule: the uncertainties in z from dx and dy are found from the derivatives with respect to x and y, and they add in quadrature.

The uncertainty from x alone: dz = (df/dx) dx

The uncertainty from y alone: dz = (df/dy) dy

The rule, again: The combined uncertainty: (dz)² = [(df/dx) dx]² + [(df/dy) dy]²

df/dx = 1/y². So (df/dx) dx = 4 (0.1) = 0.4

df/dy = -2x / y³. So (df/dy) dy = -[(2*2) / (0.125)] 0.1 = -3.2

Now I add in quadrature, but I note that the df/dx term is actually much smaller than the df/dy term so the answer is about the same as using df/dy alone.

dz = 3.2. Then z = 8 ± 3.2.

This is 40% uncertain! I could have guessed that because y is 20% uncertain and then y², which is part of f(x,y), is 40% uncertain because a square has twice the relative uncertainty of the number that is squared.
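
The two-variable rule can be checked the same way. This sketch assumes numerical (central-difference) slopes in place of the hand-calculated derivatives; propagate_two and the step size h are illustrative choices.

```python
import math


def propagate_two(f, x, dx, y, dy, h=1e-6):
    """Return the x term, the y term, and their sum in quadrature for z = f(x, y)."""
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)  # slope in x with y held fixed
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)  # slope in y with x held fixed
    term_x = dfdx * dx
    term_y = dfdy * dy
    return term_x, term_y, math.hypot(term_x, term_y)


term_x, term_y, dz = propagate_two(lambda x, y: x / y ** 2, 2.0, 0.1, 0.5, 0.1)
print(f"(df/dx) dx = {term_x:.1f}")
print(f"(df/dy) dy = {term_y:.1f}")
print(f"dz = {dz:.1f}")
```

Printing the two terms separately shows at a glance which measurement dominates the error, here the one in y.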

Summary of what we just learned:

For the error in f(x) when dx is known, the absolute error is [df/dx] dx.

Statistical Uncertainty (Chapter 4)

This section is about random error, not systematic error that might occur from an instrument that reads incorrectly.

Professor's props:

A viewgraph of a bullseye and an overlay of "shots" as in Figure 4.1 of text

The new big ideas:

I. Mean value: the value obtained by averaging
II. Standard deviation: a measure of how much the values differ from one another
III. Standard deviation of the mean: the uncertainty in the average value of many measurements of the same thing

How do I know if the error is random or systematic?

If there is a known correct answer, and your answer is too high or too low by about four or more times the expected error, it is very likely that you have systematic errors.
An example is the archery target in textbook Fig. 4.1. If your arrows consistently land high, low, left, or right of the bullseye, this is a systematic error and you need to correct your aim. If your arrows are distributed randomly around the center, then your errors are random.

Calibration as a means of finding systematic error:

Suppose you don't know what the answer should be. Then you don't know for sure if the error is random or systematic. One way to find if there is systematic error is to use your instruments to measure another quantity that is known precisely. This, of course, is calibration of your instruments. If your instruments show no systematic error measuring the known quantity, then you can say that you have no systematic error from your instruments. In this case, the errors in your measurement are random.
Of course, if you calibrate your digital voltmeter with new batteries, and the batteries start running down, the calibration could change. Elimination of systematic error is a tricky business. Does the meter read "high" on warm days and "low" on cold days?
If there is no way to check the instruments, you just don't know whether or not there is systematic error.

Statistical analysis of random error

I. The mean

The mean value of a quantity measured many times is simply the arithmetic average that we are familiar with.

If we measure the length of 12 different eggs, we learn something about the population of eggs.

If we measure the same egg 12 times, we learn something about the repeatability of our measurements.

The mathematical way of writing the mean of N measurements x_i:

x_mean = (x_1 + x_2 + … + x_N) / N = [Σ_i x_i] / N.

Example: the measurements are: 71, 72, 72, 73, 71
The sum is 359, and divide this by 5 to get a mean value of 71.8.

Lecture 4 (Sept. 20 & 21, 2001)

Statistical analysis (Chap. 4) continued:

II. Standard deviation of the sample (the square root of the variance)

In my example above (71, 72, 72, 73, 71), the average is 71.8
The difference between these numbers and 71.8, the deviations, are -0.8, 0.2, 0.2, 1.2, and -0.8.
How much are the measurements off by, on average? Well, if I average the deviations I get zero! Always, because the average is in the middle by definition. So I need a way to find the average deviation that doesn't come up with zero.
If a number is squared, the sign is lost, so the sum of the squares will prevent the numbers from tending to cancel.
We find the standard deviation from the square root of the sum of the squares of the individual deviations, dividing by N-1 before taking the square root.

The mathematical expression for the standard deviation, σ:

σ = √[ Σ (x_i - x_mean)² / (N-1) ]

where the x_i are the individual measurements and x_mean is the mean value of the measurements.
The square root may not reproduce well on a web page; another way to write this is

σ² = Σ (x_i - x_mean)² / (N-1)

Note: the upper case Greek Σ (sigma) is the summation sign and the lower case Greek σ is the standard deviation. In MathCad, use s or S, then use Ctrl-G to convert to Greek.

Example:
The standard deviation of (71, 72, 72, 73, 71) is obtained by finding the deviations of these numbers from the average 71.8, summing the squares of the deviations, dividing by 4 to get 0.7, then taking the square root to get 0.84 which I can round to 0.8.
Then the measurements can be written: 71.8 ± 0.8.

Question for you: Do all the measurements lie between 71.8 + 0.8 and 71.8 - 0.8?

In some textbooks you may see a different definition of standard deviation that substitutes division by N for division by N-1 (on page 99 of our textbook for example). We are using the definition on p. 100 and on the inside of the cover.  The reason to use N-1 and not N is mathematically very subtle and has to do with "degrees of freedom." If you know the average of N samples, and the value of N-1 of the samples, you can calculate the value of the missing sample. So only N-1 of the samples are random.

III. Standard deviation of the mean, σ_mean

When I average many measurements, the random errors tend to cancel, since by definition, random errors tend to be equally plus and minus. The following we will use without proof:

Rule for finding standard deviation of the mean:
The uncertainty in the average of N measurements of the same thing (the standard deviation of the mean) is smaller than the standard deviation of the sample (the N measurements) by a factor of 1/√N.

σ_mean = σ / √N.

For our measurements above (71, 72, 72, 73, 71), the standard deviation of the mean is 0.837 / √5 = 0.374.
So my repeated measurements of a value close to 72 have yielded 71.8 with a standard deviation of the mean of 0.37.

What does this mean?

If I make the measurement 36 times, the uncertainty in the average of these measurements (σ_mean) is smaller by a factor of 6 than the standard deviation (σ) calculated for the numbers that I averaged.

Quick review:

For a quantity measured many times, the "best" value is the mean of the measurements.

The standard deviation σ describes the variation about the mean value.

The uncertainty in the mean value is actually less than the variation and is the standard deviation of the mean, σ_mean = σ / √N.

Example: textbook problem 4.7

What are x_mean, σ and σ_mean for these Geiger counter measurements: 16, 21, 12, 13, 15

The mean is x_mean = 77/5 = 15.4

The sum of the squared deviations is (0.6)² + (5.6)² + (3.4)² + (2.4)² + (0.4)² = 49.2;
divide by N-1 = 4, get 12.3, √12.3 = 3.5, so σ = 3.5

σ_mean = σ / √N = 3.5 / √5 = 1.57

So the way to write this is:
x ± dx = x_mean ± σ_mean = 15.4 ± 1.57
which should be rounded to 15 ± 2 if it goes in a lab report.
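
The mean, standard deviation, and standard deviation of the mean can all be computed with Python's standard library. This sketch uses statistics.stdev, which divides by N-1 as in our definition; the function name summarize is an illustrative choice.

```python
import math
import statistics


def summarize(measurements):
    """Return (mean, standard deviation with N-1, standard deviation of the mean)."""
    n = len(measurements)
    mean = statistics.mean(measurements)
    sigma = statistics.stdev(measurements)  # sample standard deviation, divides by N-1
    sigma_mean = sigma / math.sqrt(n)
    return mean, sigma, sigma_mean


# The five measurements from the lecture example, then the Geiger counter data.
for data in ([71, 72, 72, 73, 71], [16, 21, 12, 13, 15]):
    mean, sigma, sigma_mean = summarize(data)
    print(f"mean = {mean:.1f}, sigma = {sigma:.2f}, sigma_mean = {sigma_mean:.2f}")
```

The function statistics.pstdev divides by N instead, which is the other definition mentioned above.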

A very subtle point (optional material not covered in lecture):

If one egg is 59 ± 2 mm long, how long are 2 eggs?

Let’s say z = 2 y, then 2 is an exact number, so the relative uncertainty in z is the same as that in y. Here we are multiplying so we use the relative error. The answer is 118 mm ± 4 mm. Note that 4/118 = 2/59.

Let’s say, instead, that z = y + y. Here we are adding so we use the absolute error dy, adding in quadrature two (dy)². The answer for the error is (√2) dy, which gives 118 ± 3 mm.

Which is right? Both are, but for different circumstances.
If we take different eggs from a crate, some will be on the plus side and some on the minus side and the measurement errors will tend to cancel so the relative error will go down. This is what we get when we do z = y + y. We get a smaller relative error.
On the other hand, if we clone one egg and measure the length of these two identical eggs, this does not reduce the uncertainty and 59 ± 2 mm for one egg becomes 118 ± 4 mm for the egg and its duplicate. If the measurement for the initial egg was high, the measurement is high when applied to the duplicate. There is no tendency for the error to cancel. Use z = 2 y for identical eggs and z = y + y for two different eggs.

Chapter 5: The normal distribution

Professor's props: A bag of beans from Safeway. 90% are white beans and 10% are red beans and a scoop from a box of Tide.

Today's one big idea:
Random events are described by the normal (or Gaussian) distribution.

Random events:

Examples: airplane crashes (not a good example in Fall 2001), coin tosses, counts of cosmic rays on a Geiger counter.
There is a number associated with random events:

Auto crashes per year
Number of coin tosses with "heads"
Counts per minute on the Geiger counter

The rule:
The measured quantity associated with a random event is distributed according to the normal distribution when the number of events becomes very large.

What is very large?
If you toss the coin only once, you don't learn that 50% of the time it's heads and 50% of the time it's tails. You have to toss it many times. So one is not large. A thousand is large. A hundred is pretty good, and ten is kind of iffy. Personally, I would toss the coin 25 times and then quit.

The normal curve:

The formula for the normal curve is: G(x) = [1 / (σ √(2π))] exp[-(x-X)² / (2σ²)]

where x is the variable, X is the mean value of x, and σ is the standard deviation. G(x) gives the probability of having the value actually be x. The maximum value of G(x) occurs at the mean value X.

Which in MathCad looks like:

 (NormalCurveEquation.gif)

 

In this formula, the exponential function (e) is dwarfed by all the things around it.
This function "goes like" exp[-y²]. It peaks in the center and falls off away from the center.
y² in this case is (x-X)² / (2σ²). The σ is the standard deviation, which determines the width of the curve.
The X simply shifts the peak over so the curve is centered on the mean value.
For students tossing coins and reporting the percentage of heads, the normal curve predicts the answer, and that looks like:
 (NormalCurve2.gif)

Note that equation 5.25 on page 133 in the textbook is missing the minus sign in the exponent. It appears correctly on the inside of the front cover.
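
The normal curve is simple to evaluate in Python. The sketch below writes out the formula (with the minus sign in the exponent) and checks numerically that the area under the curve is one. The mean of 50 and standard deviation of 5 are hypothetical values chosen only for illustration, and the crude midpoint-rule integration is just one convenient way to estimate the area.

```python
import math


def normal_curve(x, mean, sigma):
    """Normal (Gaussian) curve with the given mean and standard deviation."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coefficient * math.exp(-((x - mean) ** 2) / (2.0 * sigma ** 2))


def area_under_curve(mean, sigma, half_width=8.0, steps=4000):
    """Midpoint-rule estimate of the area from mean - 8 sigma to mean + 8 sigma."""
    start = mean - half_width * sigma
    step = 2.0 * half_width * sigma / steps
    total = sum(normal_curve(start + (i + 0.5) * step, mean, sigma) for i in range(steps))
    return total * step


mean, sigma = 50.0, 5.0  # hypothetical values
print(f"height at the peak: {normal_curve(mean, mean, sigma):.4f}")
print(f"height one sigma away: {normal_curve(mean + sigma, mean, sigma):.4f}")
print(f"area under the curve: {area_under_curve(mean, sigma):.4f}")
```

Making sigma bigger lowers the peak and widens the curve, so the area stays the same.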

Lecture 5 (Sept. 27 & 28, 2001)

Uncertainty in a random variable:

Rule: the absolute uncertainty in a random variable with value N is √N.

The same rule, restated: the relative uncertainty in a random variable with value N is 1/√N.

I divided √N by N and got 1/√N, so these two rules are the same rule.

For now, we will forget trying to prove this.

Example: I toss a coin 100 times. What number should be heads? What is the uncertainty in the number of heads?
If the coin is honest, then there should be 50 heads so N = 50.
√N = 7 (about) so N = 50 ± 7.
If I did the experiment, I would most likely get a number from 43 to 57 and I might conclude that 50% should be heads.

Example: How many times must I toss the coin to determine that the answer is 50% heads and not 51% or 49%?
I want an answer like 0.500 ± 0.005. The uncertainty is one part in 100. So 1/√N is 0.01.
And N = 10,000. That's a lot of tosses so I will ask the teaching assistants to do it and go to Starbucks for coffee.
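
Both coin-toss examples follow from the square-root rule, which the following sketch applies directly. The function names counting_uncertainty and counts_needed are illustrative, and the sketch simply takes the lecture's rule as given rather than proving it.

```python
import math


def counting_uncertainty(n):
    """Absolute uncertainty in a random count of size n."""
    return math.sqrt(n)


def counts_needed(relative_uncertainty):
    """The N for which 1 / sqrt(N) equals the requested relative uncertainty."""
    return 1.0 / relative_uncertainty ** 2


heads = 50
print(f"expected heads: {heads} ± {counting_uncertainty(heads):.0f}")

for target in (0.10, 0.01):
    print(f"for {100 * target:.0f}% uncertainty, N is about {counts_needed(target):.0f}")
```

Improving the precision by a factor of 10 costs a factor of 100 in N, which is why the last few percent are so expensive.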

What is very large (again)?
From the example above, you see that "very large" depends upon how good you want your measurement to be. If you want to know the random variable to 10%, you need 100 measurements and if you want to know to 1% you need 10,000 measurements.

Let's have students toss a coin 100 times. Each reports the percentage of heads. I plot a histogram of the number of students reporting 49, 50, 51 heads, etc. It looks like:

(100Tosses.gif)

The curve shows that the most frequent answer is 50, but some sets of tosses gave 45 and others gave 55. The "answer" looks like 50 ± 5 or 10% uncertainty. So I could look at the curve and guess that the students tossed the coin about 100 times because 10% uncertainty in a random variable implies a sample size of about 100.

 (10000Tosses.gif)

The mean of the data is 4995 heads in 10,000 tosses. The standard deviation is 46.5 which rounds to about 50.
This is 1% of the answer which is consistent with 10,000 tosses because the uncertainty in N things is about the square root of N.
Most students (68% of them) are getting answers between 4950 and 5050, which is a variation of about 1%. This is much better than the 10% variation (the previous graph) from doing only 100 tosses.
The standard deviation of the mean turns out to be 4.65 which rounds to 5. The mean is 4995 which is based upon all the tosses (1 million of them!). The mean is only off by about 5 tosses because when we add together the results of the 100 students (with some being high and some being low), the answer is closer to the true answer than the result of any student alone.

Normal curve again: let's take the coin toss data above and plot it with the normal curve

Normal…Tosses.gif

The normal curve works! I had to adjust the height of the curve so that the number of tosses comes out to be 15 in the highest bin of the histogram. [The G(x) formula above is designed so that the area under the curve is one, so it doesn't work for 5000 tosses.] Also, I set σ = 46.5 and X = 4995, the calculated standard deviation and the mean. So there is no fudge factor in the width.

A comment about "pooled" data (optional material):
Is it better for 25 students to measure 100 times or one student to measure 2500 times?
Case A: 25 students tossing the coin 100 times
I get 25 measurements good to 10% because there are 100 tosses. Now I average these 25 measurements. The standard deviation of the mean is a factor of 5 less because there are 25 measurements. The relative error is only 2%.
Case B: The professor takes the measurements from the students and pretends he made them all himself.
There are 2500 coin tosses. The relative uncertainty in N is 1/√2500, or 2%. It’s the same.
Is there any advantage then to having 25 sets of 100 tosses rather than 2500 tosses?
Yes. Using 25 students helps eliminate systematic errors. A few students may be incompetent and write down heads when they meant tails. One of the coins may have two heads and land on heads 100% of the time. A selection of coins and students tends to eliminate this kind of error.
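
The arithmetic for Case A and Case B can be written out in a few lines. This is only a sketch of the calculation described above; the variable names are chosen for this illustration.

```python
import math

# Case A: 25 students, 100 tosses each.
students = 25
tosses_each = 100
rel_error_one_student = 1 / math.sqrt(tosses_each)               # 10% for each student
rel_error_case_a = rel_error_one_student / math.sqrt(students)   # standard deviation of the mean

# Case B: one person is credited with all 2500 tosses.
rel_error_case_b = 1 / math.sqrt(students * tosses_each)

print(f"Case A: {100 * rel_error_case_a:.1f}%")
print(f"Case B: {100 * rel_error_case_b:.1f}%")
```

Both cases give the same relative error because dividing by √100 and then by √25 is the same as dividing by √2500.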

Does this have anything to do with my lab reports?

Well, yes. You can now see from these examples why it's ok to get (310 ± 20) m/s for the speed of sound when the real answer is 340 m/s. The standard deviation is a measure of the average value of the error. For errors made by a class of 200 students doing lab reports, about 1/3 of the errors will be bigger than the average error and 2/3 will be less.

 

Using the normal curve to predict probability:

Let's call the area under the normal curve 100%. Then 68% of the area is under the part within one standard deviation of the center. That tells us that for measurements with random error, the answer we measure will be within one standard deviation of the true answer 68% of the time. 95% of the area is within two standard deviations, so 95% of the time our measurement will be within two standard deviations of the true answer. We can construct a table like the one in Appendix A of the textbook:

How close?     Percent of area under curve:     Percent not included:

1σ             68.3%                            31.7%
2σ             95.5%                            4.5%
3σ             99.73%                           0.27%
4σ             99.994%                          0.006%
5σ             99.99994%                        0.00006%
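
The entries in this table can be reproduced from the error function. This is a minimal sketch, assuming a normal distribution and using the standard result that the fraction within k standard deviations is erf(k/√2); the function name fraction_within is made up for this illustration.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3, 4, 5):
    inside = fraction_within(k)
    # erfc gives the outside fraction directly, avoiding round-off in 1 - inside
    outside = math.erfc(k / math.sqrt(2))
    print(f"{k} sigma: {100 * inside:.5f}% inside, {100 * outside:.5f}% outside")
```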

The curve below shows the 68.3% of the area that is within one standard deviation of the mean.
Then 31.7% is in the two dark colored "wings" on the sides. In each of these is about 16% of the area.
So 16% of the time, some random variable will be more than one standard deviation too high.

NormalCurveArea.gif

 

 

How is this useful?

I. Checking results

The known answer for the speed of sound is 340 m/s with essentially no error. You measure 300 m/s and calculate the error in your measurement to be 10 m/s. You are 4σ (four standard deviations) too low. There is a 0.006% chance of this. So your error is probably not random. It's probably because you did something wrong or there is a systematic error.

II. Predicting outcomes

You toss a coin 100 times. What is the chance of getting 65 or more heads?
For tossing a coin 100 times, the mean number of heads is 50 and the standard deviation is 5 (see the histogram in the previous lecture). Then 65 heads is three standard deviations higher than the mean. So 99.73% of the time answers will be between 50-15 = 35 and 50+15 = 65. And 0.27% of the time the answer will be below 35 or above 65. Half of this applies to above 65. So you expect more than 65 heads 0.135% of the time.
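
The estimate above uses the normal curve as an approximation. As a cross-check, here is a sketch that compares it with a direct count of coin-toss outcomes. It is not part of the original lecture: it assumes a fair coin, takes the standard deviation from the binomial formula √(N/4) (which gives 5 for N = 100), and the function names are made up for this illustration.

```python
import math

def normal_upper_tail(k):
    """Chance that a normal variable is more than k standard deviations above the mean."""
    return 0.5 * math.erfc(k / math.sqrt(2))

def binomial_upper_tail(n, k_min):
    """Exact chance of k_min or more heads in n tosses of a fair coin."""
    return sum(math.comb(n, k) for k in range(k_min, n + 1)) / 2 ** n

n = 100
mean = n / 2
sd = math.sqrt(n) / 2            # sqrt(N/4) = 5 for 100 tosses

k_sigma = (65 - mean) / sd       # 65 heads is 3 standard deviations high
print(f"normal estimate:         {100 * normal_upper_tail(k_sigma):.3f}%")
print(f"exact, 65 or more heads: {100 * binomial_upper_tail(n, 65):.3f}%")
print(f"exact, more than 65:     {100 * binomial_upper_tail(n, 66):.3f}%")
```

The normal estimate falls between the two exact answers because the number of heads is a whole number while the normal curve is continuous. For the purposes of this course the normal estimate is good enough.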

III. Checking significance

There were 200 hurricanes in the 1970s and 250 in the 1990s. Is this a significant increase?
[These data are made up. Do not worry.]
First, we assume these occur randomly. It looks like the rate is 20 to 25 per year.
In the 200 hurricanes (call this x), the uncertainty is √200 = 14.
In the 250 hurricanes (call this y), the uncertainty is √250 = 16.
What is the uncertainty in the increase, z = y - x?
z = 50. The uncertainty is √(250 + 200) = 21. So z = 50 ± 21.
What is the chance of zero increase? Zero is at 50 - 2.5σ, because 2.5σ is approximately 50.
The chart in Appendix A says that 2.5σ includes 98.76% of the data, so 1.24% of the data is outside of this, and half of this, or 0.62%, is in the "wing" on the small side (more than 2.5σ below the mean).
So there is only a 0.62% probability of this happening by chance.
That means it probably didn't happen by chance and the increase is statistically significant.
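
The hurricane calculation can be repeated without rounding. This is a sketch under the same assumptions as the text (made-up counts, √N counting uncertainties, a normal curve for the difference); the helper name lower_tail is made up for this illustration.

```python
import math

x = 200                      # hurricanes in the 1970s (made-up data)
y = 250                      # hurricanes in the 1990s (made-up data)

dx = math.sqrt(x)            # counting uncertainty, about 14
dy = math.sqrt(y)            # counting uncertainty, about 16
z = y - x
dz = math.hypot(dx, dy)      # errors add in quadrature: sqrt(dx**2 + dy**2), about 21

n_sigma = z / dz             # how many standard deviations z is away from zero

def lower_tail(k):
    """Chance that a normal variable falls more than k standard deviations below its mean."""
    return 0.5 * math.erfc(k / math.sqrt(2))

print(f"z = {z} +/- {dz:.0f}, which is {n_sigma:.2f} sigma away from zero")
print(f"chance using 2.5 sigma (rounded as in the text): {100 * lower_tail(2.5):.2f}%")
print(f"chance using {n_sigma:.2f} sigma (no rounding):          {100 * lower_tail(n_sigma):.2f}%")
```

Without rounding, the difference is a little less than 2.5 standard deviations from zero, so the chance comes out somewhat larger than 0.62% but still below 1%. The conclusion does not change.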

Usually, if someone asks "is this significant", they want a "yes" if the answer is more than one standard deviation different from what is expected and a "no" if it is within one standard deviation.

IV. Proving something is wrong or proving something new

If the accepted ratio of the proton mass to the electron mass is 1836, and you find 1831 ± 1, you are five standard deviations low and one of these two answers is for sure wrong because the chance of this is negligible. In most fields of science, a five standard deviations difference is considered proof that one of the two numbers is wrong.

Some mathematical proofs:

Random walk (in one dimension):

The random walk is the standard example that is given for a random process. It is simplest to understand in one dimension. A student (drank too much??) takes random steps forward or backward. How much progress is made?

A sober student could model this by flipping a coin. If "heads" she takes one step forward and if "tails", one step backwards. What is the mean distance traveled and the standard deviation?

Let each step be x_i, where x_i = +1 or -1. These each have 50% probability.

The total distance traveled is the sum of the steps: X_total = (x_1 + x_2 + … + x_N) = Σ_i x_i.

The mean value is x_mean = (x_1 + x_2 + … + x_N) / N = [ Σ_i x_i ] / N.

The most probable value for X_total and x_mean is going to be zero if the number of steps is large, because +1 and -1 occur with equal probability.

What is the average value of X_total^2? This is the square of the final position and is never negative. So if you take many random walks and average the squares of the final positions, the average is not zero.

From the definition of X_total above, X_total^2 = x_1 (x_1 + x_2 + … + x_N) + x_2 (x_1 + x_2 + … + x_N) + etc.

All the terms like x_1 x_1, x_2 x_2 (same subscript) are equal to +1 and add up to N, and all the other terms (different subscripts) are equally likely to be positive or negative and add up to zero on average. So we get

X_total^2 = x_1 x_1 + x_2 x_2 + ... + x_N x_N = N. Then √(X_total^2) = √N.

So we just proved that for the random walk of N steps, the place you are most likely to be is the origin, and the uncertainty in the outcome is √N.
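
The result can be checked by simulating many random walks. This is a minimal sketch, not part of the original lecture: it assumes steps of +1 or -1 with equal probability, a fixed random seed, and function names made up for this illustration.

```python
import random

def random_walk(n_steps, rng):
    """Final position after n_steps steps of +1 or -1, each with 50% probability."""
    return sum(rng.choice((1, -1)) for _ in range(n_steps))

def walk_statistics(n_steps, n_walks, seed=1):
    """Average of X_total and of X_total squared over many independent walks."""
    rng = random.Random(seed)
    finals = [random_walk(n_steps, rng) for _ in range(n_walks)]
    mean_x = sum(finals) / n_walks
    mean_x2 = sum(x * x for x in finals) / n_walks
    return mean_x, mean_x2

n_steps = 100
mean_x, mean_x2 = walk_statistics(n_steps, n_walks=5000)
print(f"average X_total   = {mean_x:.2f} (expected near 0)")
print(f"average X_total^2 = {mean_x2:.1f} (expected near N = {n_steps})")
print(f"rms distance      = {mean_x2 ** 0.5:.2f} (expected near sqrt(N) = {n_steps ** 0.5:.0f})")
```

The cross terms cancel only on average, so any single walk can end far from the origin. It is the average over many walks that approaches N.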

Why do errors add in quadrature?

This can be seen by adding two random walks. The number of steps in the two walks is N1 and N2.

If I add the two walks, the number of steps is N1 + N2. The uncertainties are δN1 = √N1 and δN2 = √N2.

From our rule for calculating uncertainties in random numbers: δ(N1 + N2) = √(N1 + N2).

From our rule about calculating the uncertainty in a sum: δ(N1 + N2) = √[(δN1)^2 + (δN2)^2] = √(N1 + N2).

So the addition of errors in quadrature gives the same number for the error in N1 + N2 that we get from using the square root of N1 + N2.
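
The same comparison can be made numerically by simulating two independent walks and adding their final positions. This is only a sketch: the step counts N1 = 90 and N2 = 160, the number of walks, the seed, and the helper names are all assumptions chosen for this illustration.

```python
import math
import random

def final_position(n_steps, rng):
    """Final position of a walk of n_steps steps of +1 or -1 (50% each)."""
    return sum(rng.choice((1, -1)) for _ in range(n_steps))

def rms(values):
    """Root-mean-square of a list of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))

rng = random.Random(2)           # fixed seed so the run is repeatable
n1, n2, n_walks = 90, 160, 2000

walk1 = [final_position(n1, rng) for _ in range(n_walks)]
walk2 = [final_position(n2, rng) for _ in range(n_walks)]
combined = [a + b for a, b in zip(walk1, walk2)]

d1, d2, d_sum = rms(walk1), rms(walk2), rms(combined)
print(f"spread of walk 1          = {d1:.2f} (sqrt(N1) = {math.sqrt(n1):.2f})")
print(f"spread of walk 2          = {d2:.2f} (sqrt(N2) = {math.sqrt(n2):.2f})")
print(f"spread of combined walk   = {d_sum:.2f}")
print(f"quadrature sum of spreads = {math.hypot(d1, d2):.2f}")
print(f"sqrt(N1 + N2)             = {math.sqrt(n1 + n2):.2f}")
```

The last three numbers should agree to within a few percent. Adding the two spreads directly (without squaring) would give a noticeably larger number.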

This finishes the material for this course!

 Old Lecture 6 (Oct. 11 & 12, 2001)

Lecture 6 is a review for the exam.

Old Lecture 7 (Oct. 18 & 19, 2001)

Lecture 7 is the exam.