# Latticework of Mental Models: Bayes Theorem

A statistics professor who travels a lot was concerned about the possibility of a bomb onboard his plane. He determined the probability of this and found it to be low but not low enough for him. So now he always travels with a bomb in his suitcase. He reasons that the probability of two bombs being onboard would be infinitesimal.

Do you think he has really reduced the risk?

Even those who aren’t well versed in the basic concepts of probability can see that the professor’s logic is absurd.

Well, the bomb riddle is a famous joke among mathematicians. Nevertheless, it’s a thought-provoking joke.

A man wakes up in the middle of the night with a splitting headache. He remembers that there are a few aspirin bottles in the bathroom. He stumbles dizzily into the bathroom, grabs one of the four bottles in the dark and pops a pill from it. An hour later, instead of getting relief from the headache, he starts feeling terribly nauseous. Suddenly he realizes that only three of the four bottles in the bathroom contained aspirin and the fourth contained poison.

The tragedy of the situation is that, like poison, aspirin also causes nausea sometimes. The label on the poison bottle says that 75 percent of people who take the poison will show symptoms of nausea, whereas the label on the aspirin bottle says that only 10 percent will feel the same symptoms after taking aspirin.

What are the chances that the man took the poison pill instead of aspirin?

This question is different from asking what the probability of having taken poison was before the man started feeling nauseous, because here we have an additional piece of information: the man is experiencing nausea.

I guess before you start thinking about the solution, your first comment would be – “A man stupid enough to keep a poison bottle next to aspirin deserves to die like this.” I agree 🙂

But this is just an imaginary case study to illustrate an academic idea in a simplified form. The idea we’re going to explore comes from the field of probability. It’s known as Bayes Theorem.

Bayes theorem is named after the 18th century English minister Thomas Bayes whose essays concerned how we should adjust probabilities when we encounter new data.

## Conditional Probability

Bayes rule helps us calculate conditional probability, which measures the probability of an event given that another event has occurred. Most of us remember learning this definition and its formula in school and have conveniently forgotten it. That’s because it’s a textbook definition. It doesn’t really add much to our understanding of its utility in the real world.

The big idea behind Bayes theorem is that we must keep updating our probability estimates as new evidence comes in.

Instead of looking at the Bayes formula, let’s first try to solve the ‘man with poison pill’ problem using basic concepts of probability. To make it simpler let’s talk in terms of numbers instead of percentages. So let’s extend the problem and assume that there were 400 men going through the same ordeal at the same time on the same fateful night.

Step 1
Since there were 4 bottles in each case, 1 in 4 men is likely to pick up the poison bottle and 3 in 4 men are likely to pick up the aspirin bottle.

• Number of men who took the poison pill = 100
• Number of men who took the aspirin = 300

Step 2
Based on the information provided by medical text on the bottle labels, we can say –

• Number of men who took the poison pill and showed the symptoms = 0.75 X 100 = 75 (75% according to the medical text)
• Number of men who took the aspirin and showed the symptoms = 0.10 X 300 = 30 (10% according to the medical text)

Step 3

• Total number of men who showed the symptoms = 75 + 30 = 105
• Number of men with symptoms who actually took the poison pill = 75

Step 4
So the probability of a man taking the poison pill, given that he showed symptoms = 75/105 = 0.71 (71 percent)

Thus, about 71 percent of the nauseated men (75 of the 105) are likely to have taken the poison. And the answer to the problem is that the man should call the doctor immediately.
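The four counting steps above can be sketched in a few lines of Python. This is only an illustration: the function name `count_based_posterior` and its parameters are made up for this example, and exact fractions are used so that no rounding creeps in.

```python
from fractions import Fraction


def count_based_posterior(population, bottles, poison_bottles,
                          p_nausea_poison, p_nausea_aspirin):
    """Follow the four counting steps for the 'man with poison pill' problem."""
    # Step 1: split the men by the kind of bottle they picked.
    took_poison = population * Fraction(poison_bottles, bottles)
    took_aspirin = population - took_poison

    # Step 2: count the men in each group who show symptoms.
    sick_from_poison = took_poison * p_nausea_poison
    sick_from_aspirin = took_aspirin * p_nausea_aspirin

    # Step 3: everyone who shows symptoms, whatever the cause.
    total_sick = sick_from_poison + sick_from_aspirin

    # Step 4: the share of nauseated men who took the poison.
    return sick_from_poison, total_sick, sick_from_poison / total_sick


sick_from_poison, total_sick, posterior = count_based_posterior(
    population=400, bottles=4, poison_bottles=1,
    p_nausea_poison=Fraction(75, 100), p_nausea_aspirin=Fraction(10, 100))

print(sick_from_poison, total_sick, round(float(posterior), 2))
```

Changing `population` to any other number leaves the final ratio unchanged, which is why the trick of imagining 400 men works.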

To save us from doing all these elaborate calculations, Pierre-Simon Laplace, a French mathematician and astronomer, translated Bayes’s idea into the following formula:

`P(A/B) = P(B/A) x P(A) / P(B)`
• P(A) = probability of taking the poison pill (This is 0.25 as one of the four bottles contains poison)
• P(B) = probability of a person showing symptoms (This is the 105 out of 400 men we counted in step 3, i.e. about 0.26)
• P(B/A) = probability of symptoms, given that the person had taken the poison (This is the 75 percent from the label, which we used in step 2)
• P(A/B) = probability of taking the poison, given that the person had symptoms (This is the answer to the main question i.e. step 4)
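As a rough sketch, the formula can be written as a small Python function. The name `bayes_posterior` and its argument names are assumptions made for this illustration. P(B) is not passed in directly; it is built from the two ways symptoms can arise (poison or aspirin).

```python
def bayes_posterior(prior, p_evidence_given_a, p_evidence_given_not_a):
    """Return P(A/B) = P(B/A) x P(A) / P(B).

    prior                  -- P(A), e.g. probability of taking the poison
    p_evidence_given_a     -- P(B/A), e.g. nausea given poison
    p_evidence_given_not_a -- P(B/not A), e.g. nausea given aspirin
    """
    # P(B): symptoms can come from A or from not-A, weighted by how likely each is.
    p_evidence = (prior * p_evidence_given_a
                  + (1 - prior) * p_evidence_given_not_a)
    return prior * p_evidence_given_a / p_evidence


p_poison_given_nausea = bayes_posterior(
    prior=0.25, p_evidence_given_a=0.75, p_evidence_given_not_a=0.10)

print(round(p_poison_given_nausea, 2))  # 0.71
```

The result matches the 75/105 obtained by counting, because both routes compute the same ratio.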

## Using Bayes Formula

Let’s take a look at a realistic case where Bayes rule will help us make a rational decision. The following example is from Daniel Kahneman’s book Thinking, Fast and Slow.

A taxi was involved in a hit-and-run accident at night. Two taxi companies, the Green and the Blue, operate in the city. You are given the following data:

• 85% of the taxis in the city are Green and 15% are Blue.
• A witness identified the taxi as Blue. The court tested the reliability of the witness under the circumstances that existed on the night of the accident and concluded that the witness correctly identified each one of the two colors 80% of the time and failed 20% of the time.

What is the probability that the taxi involved in the accident was Blue?

If this were a courtroom scene in a Bollywood movie, the drama would probably revolve around the witness’s honesty. But for a Bayesian thinker, the answer is a matter of probabilities. Let’s use Bayes rule here.

P(A) = The probability that the taxi involved was Blue = 0.15

P(B/A) = The probability of the witness identifying the taxi as Blue, given that the taxi involved was indeed Blue = 0.8

P(B) = The probability of the witness identifying the taxi involved as Blue
= Probability of the witness identifying a Blue taxi correctly + Probability of the witness identifying a Green taxi incorrectly
= 0.15 X 0.8 + 0.85 X 0.2 = 0.29

P(A/B) = The probability of the taxi being Blue, given that the witness identified it as Blue

Plugging the values into the Bayes formula we get
P(A/B) = 0.15 X 0.8 / 0.29 = 0.41

Therefore, the jury’s confidence that a Blue taxi was the culprit should be only 41 percent!
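If the 41 percent feels counterintuitive, one way to check it is to simulate the situation many times and count. The sketch below is only an illustration: the function name, the number of trials and the fixed random seed are arbitrary choices, not part of the original example.

```python
import random


def simulate_taxi_case(trials=200_000, p_blue=0.15, accuracy=0.80, seed=0):
    """Estimate P(taxi was Blue / witness says Blue) by brute-force counting."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    said_blue = 0
    said_blue_and_was_blue = 0

    for _ in range(trials):
        is_blue = rng.random() < p_blue        # base rate: 15% of taxis are Blue
        is_correct = rng.random() < accuracy   # witness is right 80% of the time
        says_blue = is_blue if is_correct else not is_blue

        if says_blue:
            said_blue += 1
            if is_blue:
                said_blue_and_was_blue += 1

    # Among all the "Blue" testimonies, how many involved a Blue taxi?
    return said_blue_and_was_blue / said_blue


estimate = simulate_taxi_case()
print(f"{estimate:.2f}")  # should land close to 0.41
```

Most of the "Blue" testimonies come from the much larger pool of Green taxis being misidentified, which is why the estimate stays well below the witness’s 80 percent accuracy.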

No matter how many examples I quote, you can always argue that we don’t really have the luxury of knowing the precise values for probabilities in many cases in real life. How do you put a number on the reliability of the witness’s testimony?

The point of learning Bayes theorem is not to master the formula but to develop Bayesian thinking. And how do you do that? We’ll explore that, but before we dig deeper let’s first understand the two important ideas on which Bayes theorem rests.

First is the base rate (or prior odds) and the second is likelihood ratio (strength of the new evidence).

In our taxi example, the base rate is the relative population of taxis. In other words, in the absence of a witness, the probability of the guilty taxi being Blue (15 percent) is the base rate of that outcome. Similarly, in the ‘man with a poison pill’ example, the base rate was 25 percent (because there were four bottles out of which one was poison). Base rates typically come from historical statistical information.

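The eyewitness’s reliability gives us the likelihood ratio. The witness says “Blue” 80 percent of the time when the taxi is Blue and 20 percent of the time when it is Green, so the likelihood ratio is 0.8/0.2 = 4. The likelihood ratio represents new information about a specific case which changes the base rate for that particular case.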

Using these two ideas, base rates and likelihood ratio, Daniel Kahneman simplifies the Bayes rule which becomes –

Posterior odds (Conditional Probability) = Prior odds (Base Rate) × Likelihood ratio
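Note that this version of the rule works with odds rather than probabilities, so a conversion is needed at both ends. Here is a minimal sketch for the taxi case, using exact fractions. The helper names `probability_to_odds` and `odds_to_probability` are invented for this illustration.

```python
from fractions import Fraction


def probability_to_odds(p):
    """Odds in favour of an event that has probability p."""
    return p / (1 - p)


def odds_to_probability(odds):
    """Probability of an event with the given odds in favour."""
    return odds / (1 + odds)


# Base rate: 15% Blue taxis, i.e. odds of 15 to 85 in favour of Blue.
prior_odds = probability_to_odds(Fraction(15, 100))

# Witness says "Blue" 80% of the time for Blue taxis, 20% for Green ones.
likelihood_ratio = Fraction(80, 100) / Fraction(20, 100)

posterior_odds = prior_odds * likelihood_ratio
posterior_probability = odds_to_probability(posterior_odds)

print(prior_odds, likelihood_ratio, posterior_odds)
print(round(float(posterior_probability), 2))  # 0.41
```

The posterior odds of 12 to 17 are the same answer as the 41 percent computed with the formula, expressed on a different scale.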

Watch this video from Julia Galef which visually explains this idea.

Imagine that a new eyewitness comes forward in the taxi case and agrees with the first witness. This witness correctly identifies each of the two colors 70% of the time and fails 30% of the time.

With this new evidence (the new likelihood ratio is 0.7/0.3, or about 2.3), the probability of the taxi involved being Blue will change again. By how much?

Use Bayes rule, but this time the base rate will not be 15 percent. Now you need to use the new base rate of 41 percent, which is the posterior probability calculated previously. Assuming the two witnesses are independent, the new posterior probability becomes 0.41 X 0.7 / (0.41 X 0.7 + 0.59 X 0.3) = 0.62, or about 62 percent.
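The chain of updates can be sketched as a loop in which each posterior becomes the prior for the next piece of evidence. This assumes the two witnesses are independent of each other, which the example does not state explicitly. The function name `update_on_blue_testimony` is made up for this illustration.

```python
from fractions import Fraction


def update_on_blue_testimony(prior, accuracy):
    """Posterior probability of Blue after one witness says 'Blue'."""
    # The witness says "Blue" correctly for Blue taxis, wrongly for Green ones.
    p_says_blue = prior * accuracy + (1 - prior) * (1 - accuracy)
    return prior * accuracy / p_says_blue


belief = Fraction(15, 100)  # start from the base rate of Blue taxis
witness_accuracies = [Fraction(80, 100), Fraction(70, 100)]

for accuracy in witness_accuracies:
    # Yesterday's posterior is today's prior.
    belief = update_on_blue_testimony(belief, accuracy)
    print(round(float(belief), 2))
```

With unrounded intermediate values the second update gives 28/45, or roughly 0.62. Feeding the witnesses in the opposite order leads to the same final belief.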

## Bayesian Thinking

People who aren’t familiar with Bayesian thinking make two kinds of mistakes. In both cases, the final estimate of the probability will lead to erroneous judgment.

The first is when they completely ignore the base rates and get carried away by the story behind a single piece of evidence. Base rate neglect is a serious thinking error where people forget about the historical statistical evidence and tend to believe the anecdotal evidence. A simple example is when someone argues – “My grandfather was a lifelong chain smoker and still lived to the ripe age of 90 years. So don’t tell me that smoking kills.” He’s ignoring the fact that statistically a chain smoker isn’t likely to live a long life. This bias is sometimes referred to as ‘I know a man syndrome’. The reason for this bias is our love for stories.

How does an investor avoid this mistake of base rate neglect in picking stocks? You should know the historical success rates in different industries and businesses. IPOs have been known to lose money for investors historically. For companies where the integrity of the management is questionable or where the balance sheet is highly leveraged, you have to find out the base rate of success in such situations.

The second mistake is when we underestimate the strength of the new evidence (or a compelling story) and fail to update the base rates. Some of the world’s best thinkers change their mind in the light of new evidence. John Maynard Keynes reminds us –

When facts change I change my mind. What do you do sir?

Prof. Bakshi explains this beautifully in his talk Worldly Wisdom in an Equation

For investors investigating a specific opportunity, a genuinely good story improves the likelihood ratio which then translates into higher posterior odds…When it comes to narratives, it’s important to recognize that underneath every great compounding machine, there is a compelling story which makes it different from the rest of the crowd. Usually, that story is about an extraordinary individual or a group of such individuals who have demonstrated capabilities of creating value even in those businesses where it’s hard to create a lot of value. Charlie Munger likes to call such individuals intelligent fanatics.

Sometimes, information specific to the situation is so powerful that it should force you to change your mind. But how do you know that the facts are strong enough and relevant enough to invite a change in your mind? In fact, that’s one of the key problems faced by investors – how to respond to new information?

You need to learn to differentiate between noise and signal. Essentially you should avoid news, because most news doesn’t just distract; it can be toxic. Prof. Bakshi writes –

How does one go about teasing subtle signals from noisy news flows? To observe, one has to first quieten the mind. And then one has to look for slow gradual changes that are taking place. The way to do that is focus on long-term changes and not quarterly changes. These could be changes in the quality of the balance sheet, the earnings statement and the cash flow statement. And those changes should be related to a qualitative analysis of the reasons. Often such analysis creates unique insights.

## Where Bayes Fails

Besides seeing the world as an ever shifting array of probabilities, we must also remember the limitations of inductive reasoning. There are certain situations where a steady accumulation of small pieces of evidence doesn’t reveal the underlying risk.

Nassim Taleb, in his book The Black Swan, writes about the Turkey problem –

Consider a turkey that is fed everyday. Every single feeding will firm up the bird’s belief that it is the general rule of life to be fed everyday by friendly members of the human race “looking out for its best interests,” as a politician would say. On the afternoon of the Wednesday before Thanksgiving, something unexpected will happen to the turkey. It will incur a revision of belief.

This is what happened during the 2008 financial crisis. Such extreme events, which are not only high impact but also hard to predict, have been termed Black Swan events by Taleb.

Just because Bayesian thinking fails to predict black swan events doesn’t mean that one should abandon it. The way to handle a black swan is to limit your exposure to it, i.e. to build a margin of safety.

## Conclusion

Bayesian thinking is a habit worth developing. In fact, thinking like a Bayesian is a way of life. If you learn it and practice it, it will change you in many ways.

Prof. Bakshi summarizes it brilliantly –

Basically, what Bayes Rule tells you is to be a bit less prejudiced. You may have a prejudice against family owned businesses, or Hyderabad companies, or Delhi based companies or turnaround situations or highly leveraged businesses or holding companies etc. That prejudice is reflected in your prior odds. At the same time, however, you should recognize the possibility that this particular business which you are evaluating could be different from the statistical class to which it belongs.

Charlie Munger says –

If you don’t get this elementary, but mildly unnatural, mathematics of elementary probability into your repertoire, then you go through a long life like a one-legged man in an ass-kicking contest. You’re giving a huge advantage to everybody else.

I think that statement from Munger should be more than enough to motivate you to learn more about probability and Bayesian thinking.

Take care and keep learning.

Anshul Khare worked for 12+ years as a Software Architect. He is an avid learner and enjoys reading about human behaviour and multidisciplinary thinking. You can connect with Anshul on Twitter.

1. Amazing post which explains the Bayesian theory quite beautifully. The connect of Bayesian theory to the world of investing has been done in a very lucid manner. Thanks for the wonderful explanation.

Bayesian theory is useful for revisiting and revising the assumptions underlying any decision in light of more data becoming available.

Since the goodness of any decision is primarily based on the assumptions made to arrive at it, use of Bayesian approach ensures appropriate course corrections can be determined whenever there are any changes to the assumptions.

2. Nirint says:

Is the following calculation correct?
P(A/B) = 0.15 X 0.12 / 0.29 = 0.41

0.12/0.29 = 0.41
0.15 X 0.41 = 0.0615

Is my understanding correct?

• Anshul Khare says:

Thanks for pointing out the error Nirint.

I have corrected the mistake. P(B/A) shouldn’t be 0.15 *0.8. It should just be 0.8. I have updated the definition of each event to make it clearer.