Lesson 2: Empirical Probabilities and Probability Trees

Theory

At the end of the last lesson, we learned how to calculate the probability of some event in a sample space by dividing the number of outcomes in the event by the number of outcomes in the sample space. What we calculated is technically called the theoretical probability, because we based it on nothing but mathematical reasoning. Assuming that each outcome had the same individual probability, all we needed to know was how many outcomes there were in total. Unfortunately, not every chance experiment in the real world will be this easy. Imagine a simple carnival game where you throw baseballs at a target. How can you possibly count all of the possible outcomes? There are an infinite number of points where the ball can hit, an infinite number of ways it can fly through the air... We need another way to calculate probability numbers for situations like this, if we're going to use mathematics to think about them.
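As a quick refresher on that counting approach, here is a small sketch in Python (the six-sided die example is just an illustration, not part of the carnival game):

```python
# Theoretical probability: (outcomes in the event) / (outcomes in the sample space).
sample_space = {1, 2, 3, 4, 5, 6}                        # one fair six-sided die
event = {n for n in sample_space if n % 2 == 0}          # the event "roll an even number"

p_even = len(event) / len(sample_space)
print(p_even)  # 0.5
```

This only works because we can list every outcome and assume each is equally likely; for the baseball throw, neither assumption holds.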

Practice

Empirical (or experimental) probabilities are our answer. Instead of starting from nothing, and trying to predict what will happen, we can simply perform the chance experiment over and over and observe what actually does happen. Assuming nothing changes, we can still use the results of our experiments to calculate a probability number that we can use just like a theoretical one.

As an example of a simple experiment, let's consider coin flips again. While writing this lesson, Peter took out a quarter and flipped it fifteen times, and got these results:

Flip #   Heads   Tails
  1        H
  2        H
  3                T
  4                T
  5                T
  6        H
  7        H
  8                T
  9        H
 10                T
 11        H
 12        H
 13                T
 14                T
 15        H
Totals     8       7

Calculating empirical probabilities from these results works a lot like calculating theoretical probabilities, except that instead of looking at the number of outcomes in the event and the number in the sample space, we look at the number of times the event occurred and the number of times we ran the experiment:

(Times the event occurred) / (Times we ran the experiment)

So in this experiment, the probabilities are:

P(H) = 8/15 ~= 0.533 ; P(T) = 7/15 ~= 0.467
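The same calculation can be sketched in a few lines of Python, working directly from the recorded flips (the `flips` list below just transcribes the table above):

```python
# Results of the fifteen recorded flips (H = heads, T = tails).
flips = ["H", "H", "T", "T", "T", "H", "H", "T",
         "H", "T", "H", "H", "T", "T", "H"]

# Empirical probability: (times the event occurred) / (times we ran the experiment).
p_heads = flips.count("H") / len(flips)
p_tails = flips.count("T") / len(flips)

print(round(p_heads, 3))  # 0.533
print(round(p_tails, 3))  # 0.467
```

Note that the two probabilities still sum to 1, just as they would for theoretical probabilities.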

Sample sizes, error, and statistical power

You might notice something funny about these probabilities, though: they're wrong! Unless there's something wrong with the coin, shouldn't each side have a probability of exactly ½? This is the most important problem with empirical probabilities: they won't give you the exact, correct answer. In fact, with the experiment Peter did, it's absolutely impossible to get exactly 0.5, because 15 is an odd number: no matter what, one of the two outcomes will occur at least one more time than the other one. This doesn't mean empirical probabilities aren't useful — in fact they're often the only way we can calculate probability numbers for a chance experiment — but it does mean that we have to be careful to remember that the number we're working with has some error built in. Error, in statistics, is simply the amount by which a measurement, such as an empirical probability, differs from the real, exact value. [1] In this case, because there are only two outcomes, each one has the same amount of error, but in opposite directions:

E(H) = 8/15 - 1/2 = 16/30 - 15/30 = 1/30 ~= 0.033 ; E(T) = 7/15 - 1/2 = 14/30 - 15/30 = -1/30 ~= -0.033
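Working through that arithmetic with exact fractions (a small Python sketch using the standard library's `fractions` module) avoids any rounding along the way:

```python
from fractions import Fraction

# Error = empirical probability minus the true (theoretical) value.
true_p = Fraction(1, 2)
e_heads = Fraction(8, 15) - true_p   # 16/30 - 15/30 = 1/30
e_tails = Fraction(7, 15) - true_p   # 14/30 - 15/30 = -1/30

print(e_heads, float(e_heads))   # 1/30 ~ 0.033
print(e_tails, float(e_tails))   # -1/30 ~ -0.033
```

Because there are only two outcomes, the two errors are equal in size and opposite in sign, so they cancel out when added.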

Statistical power is a very deep mathematical concept on its own, but all you need to know for now is that it's the likelihood (the probability, in fact!) that some prediction or conclusion we make with statistics (usually that we have estimated some number with no more than a certain amount of error) is correct. [2] The exact numbers aren't important for now — it's enough to realize that more statistical power is good, and error is bad. We want our numbers to be as close to right as possible, and we want to be as sure of that as we can possibly be. Fortunately, there's only one thing we need to do to get more statistical power, and less error: more tests!

Sample size (not to be confused with the sample space!) is the number of tests we've run, or samples we've taken, before calculating our empirical probability. For the very simple kinds of statistics you'll be using to test your carnival game, all you really need to know is that the larger your sample size is, the closer your empirical probability calculations will be to the theoretical probability. More tests means we will have a smaller error, and we will have more statistical power to back up the claim that our empirical probability is good enough to use.
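One way to see the error shrink for yourself is to simulate fair coin flips at several sample sizes. This is a sketch using Python's `random` module; the particular sample sizes and seed are arbitrary choices for illustration:

```python
import random

random.seed(42)  # fixed seed so the run is repeatable

def empirical_p_heads(n):
    """Flip a simulated fair coin n times and return the empirical probability of heads."""
    heads = sum(random.random() < 0.5 for _ in range(n))
    return heads / n

# Larger sample sizes tend to land closer to the theoretical value of 0.5.
for n in (15, 150, 15000):
    p = empirical_p_heads(n)
    print(n, round(p, 3), "error:", round(abs(p - 0.5), 3))
```

Each run will differ (and any single small sample can still land far from 0.5 by chance), but on the whole the error column shrinks as the sample size grows.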

Review

Let's go back over what we learned in this lesson: