Uncertain Observations

What happens when you are uncertain about observations you made? For instance, you remember something happening, but you don’t remember who did it. Or you remember some fact you read on Wikipedia, but you don’t know whether it said that hydrogen or helium was used in some chemical process.

How do we take this information into account in the context of Bayes’ rule? First, I’d like to note that there are different ways something could be uncertain. It could be that you observed X, but you don’t remember if it was in state A or state B. Or it could be that you think you observed X in state A, but you aren’t sure.

These are different because in the first case you don’t know whether to concentrate probability mass towards A or B, whereas in the second case you don’t know whether to concentrate probability mass at all.

Fortunately, both cases are pretty straightforward as long as you are careful about using Bayes’ rule. However, today I am going to focus on the latter case. In fact, I will restrict my attention to the following problem:

You have a coin that comes up heads with some probability \pi, and you know that all flips of this coin are independent, but you don’t know what \pi is. You have observed this coin n times in the past, but for each observation you aren’t completely sure that it was this coin you were observing; in particular, you assign only a probability r_i to your ith observation actually being about this coin. Given this, and the sequence of heads and tails you remember, what is your estimate of \pi?

To use Bayes’ rule, let’s first figure out what we need to condition on. In this case, we need to condition on remembering the sequence of coin flips that we remembered. So we are looking for

p(\pi = \theta | we remember the given sequence of flips),

which is proportional to

p(we remember the given sequence of flips | \pi = \theta) \cdot p(\pi = \theta).

The only thing the uncertain nature of our observations does is create multiple ways to eventually land in the set of universes where we remember the given sequence of flips: each observation we remember either actually happened, or was incorrectly remembered. Thus if \pi = \theta and we remember the ith coin flip as heads, this could happen with probability 1-r_i by incorrectly remembering a flip of heads; in the remaining probability-r_i case, it could happen with probability \theta by the coin actually coming up heads. Therefore the probability of us remembering that the ith flip was heads is (1-r_i)+r_i\theta.

A similar computation shows that the probability of us remembering that the ith flip was tails is (1-r_i)+r_i(1-\theta) = 1-r_i\theta.

For convenience of notation, let’s split our remembered flips into those that were heads and those that were tails. Let h_i be the probability that the ith remembered heads is real, and t_j the probability that the jth remembered tails is real; say there are m remembered heads and n remembered tails (reusing n, since the total number of observations won’t matter from here on). Then we get

\displaystyle p(\pi = \theta \mid \mathrm{\ our \ memory}) \propto p(\pi = \theta) \cdot \left(\prod_{i=1}^m \left((1-h_i)+h_i\theta\right) \right) \cdot \left(\prod_{j=1}^n \left(1-t_j\theta\right)\right).
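As a sanity check, here is a small numerical sketch of this formula, evaluating the posterior on a grid of \theta values with a uniform prior. The reliabilities h_i and t_j below are made up purely for illustration:

```python
import numpy as np

# Hypothetical reliabilities: h[i] = P(ith remembered heads is real),
# t[j] = P(jth remembered tails is real). These numbers are invented.
h = np.array([0.9, 0.8, 0.95])   # three remembered heads
t = np.array([0.7, 0.85])        # two remembered tails

theta = np.linspace(0.001, 0.999, 999)  # grid over possible values of pi
dtheta = theta[1] - theta[0]

prior = np.ones_like(theta)  # uniform prior p(pi = theta); swap in your own

# Likelihood: prod_i ((1-h_i) + h_i*theta) * prod_j (1 - t_j*theta)
heads_term = np.prod((1 - h[:, None]) + h[:, None] * theta, axis=0)
tails_term = np.prod(1 - t[:, None] * theta, axis=0)

posterior = prior * heads_term * tails_term
posterior /= posterior.sum() * dtheta  # normalize into a density on [0, 1]

posterior_mean = np.sum(theta * posterior) * dtheta  # point estimate of pi
```

Any summary of the posterior (mean, mode, credible interval) can then be read off the grid.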

Note that when we consider values of \theta close to 0, the term from the remembered tails becomes close to 1-\theta raised to the power of the expected number of tails, whereas the term from the remembered heads becomes close to the probability that we incorrectly remembered each of the heads. A similar phenomenon will happen when \theta gets close to 1. This is an instance of a more general phenomenon whereby unlikely observations get “explained away” by whatever means possible.

A Caveat

Applying the above model in practice can be quite tricky. The reason is that your memories are intimately tied to all sorts of events that happen to you; in particular, your assessment of how likely you are to remember an event probably already takes into account how well that event fits into your existing model. So if you saw 100 heads, and then a tails, you would place more weight than normal on your recollection of the tails being incorrect, even though discounting that recollection is the job of the above model. In essence, you are conditioning on your data twice, once intuitively and once as part of the model, which effectively treats each observation as if you had made it twice.

What is interesting, though, is that you can actually compute things like the probability that you incorrectly remembered an event, given the rest of the data, and it will be different from the prior probability. So in addition to a posterior estimate of \pi, you get posterior estimates of the likelihood of each of your recollections. Just be careful not to take these posterior estimates and use them as if they were prior estimates (which, as explained above, is what we are likely to do intuitively).
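One hedged way to compute such a posterior recollection probability: conditioned on \theta, Bayes’ rule gives the probability that a remembered heads with reliability h_1 was real as h_1\theta / ((1-h_1) + h_1\theta); averaging this over the posterior for \theta gives the unconditional answer. The reliabilities below are again made up for illustration:

```python
import numpy as np

# Same invented setup as before: reliabilities of remembered heads and tails.
h = np.array([0.9, 0.8, 0.95])
t = np.array([0.7, 0.85])
theta = np.linspace(0.001, 0.999, 999)
dtheta = theta[1] - theta[0]

heads_term = np.prod((1 - h[:, None]) + h[:, None] * theta, axis=0)
tails_term = np.prod(1 - t[:, None] * theta, axis=0)
posterior = heads_term * tails_term          # uniform prior
posterior /= posterior.sum() * dtheta

# P(first remembered heads was real | theta), by Bayes' rule per theta...
p_real_given_theta = h[0] * theta / ((1 - h[0]) + h[0] * theta)
# ...then averaged over the posterior for theta.
p_real = np.sum(p_real_given_theta * posterior) * dtheta
```

In general p_real will differ from the prior reliability h[0], because the rest of the data tells you something about how plausible that recollection was.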

There are other issues to using this in practice, as well. For instance, if you really want the coin to be fair, or unfair, or be biased in a certain direction, it is very easy to fool yourself into assigning skewed probability estimates towards each of your recollections, thus ending up with a biased answer at the end. It’s not even difficult — if I take a fair coin, and underestimate my recollection of each tails by 20%, and overestimate my recollection of each heads by 20%, then all of a sudden I “have a coin” that is 50% more likely to come up heads than tails.
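The direction of this effect is easy to check numerically. The sketch below simulates a genuinely fair coin and compares the posterior mean when every recollection is trusted fully against the posterior mean when only the tails are discounted by 20% (the specific numbers here are illustrative, not the exact scenario above):

```python
import numpy as np

rng = np.random.default_rng(0)
flips = rng.random(200) < 0.5           # 200 flips of a genuinely fair coin
m = int(flips.sum())                    # remembered heads
n = len(flips) - m                      # remembered tails

theta = np.linspace(0.001, 0.999, 999)
dtheta = theta[1] - theta[0]

def posterior_mean(h, t):
    # Same reliability h for every remembered heads, t for every tails;
    # work in log space to avoid underflow, uniform prior.
    log_post = m * np.log((1 - h) + h * theta) + n * np.log(1 - t * theta)
    post = np.exp(log_post - log_post.max())
    post /= post.sum() * dtheta
    return np.sum(theta * post) * dtheta

honest = posterior_mean(1.0, 1.0)   # trust every recollection fully
skewed = posterior_mean(1.0, 0.8)   # discount only the tails by 20%
```

Discounting the tails alone always shifts the posterior toward heads, so skewed comes out above honest even though the underlying coin is fair.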

Fortunately, my intended application of this model will be in a less slippery domain (hopefully). The purpose is to finally answer the question I posed in the last post, which I’ll repeat here for convenience:

Suppose that you have never played a sport before, and you play soccer, and enjoy it. Now suppose instead that you have never played a sport before, and play soccer, and hate it. In the first case, you will think yourself more likely to enjoy other sports in the future, relative to in the second case. Why is this?

Or if you disagree with the premises of the above scenario, simply “If X and Y belong to the same category C, why is it that in certain cases we think it more likely that Y will have attribute A upon observing that X has attribute A?”

In the interest of making my posts shorter, I will leave that until next time, but hopefully I’ll get to it in the next week.

