Beliefs are based on probabilistic information. Bayes' Theorem says that our initial beliefs are updated to posterior beliefs after observing new conditions.
This is highly subjective, and somewhat controversial compared to more objective probability theories in statistics. Our initial beliefs carry a high margin of error; as we observe more conditional events, we grow more certain of the probability. This is the tool we use to measure incomplete knowledge and uncertainty. It is part of the inductive process of learning.
I simply want to make the point that intelligence analysis uses Bayes' Theorem and is very subjective. I hope to clarify some misunderstandings about how probability estimates are determined. The results are often very counterintuitive.
The prior belief is updated to a posterior belief after the observation of conditional events. Below is a variant of the Bayes' Rule formula, where p is probability, C is the conditional event, O is the observation, and ¬ means "not":
p(C|O) = p(O|C)p(C) / [p(O|C)p(C) + p(O|¬C)p(¬C)]
In English: the posterior belief equals the likelihood of the observation times the prior belief, divided by the marginal probability of the observation.
Bayesian probability produces interesting results because it accounts for uncertainties created by False Positives and False Negatives.
As we accumulate more evidence, we become more certain. Say we see two possible events: A and B. Without knowing their actual frequency, we assign a 50% probability to either happening. We then see A occur. The probability of A occurring is now 66.6%. We see A occur again. The probability of A is now 75%. If we see A constantly but never see B, we end up with a 99% probability of A. But we cannot rule out B simply because we have not seen it yet.
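The 50% → 66.6% → 75% → 99% sequence above matches Laplace's rule of succession, which estimates a probability as (observations + 1) / (trials + 2). A minimal sketch (the function name is my own, not from the source):

```python
def prob_A(a_count, total):
    """Laplace's rule of succession: (successes + 1) / (trials + 2)."""
    return (a_count + 1) / (total + 2)

print(prob_A(0, 0))    # 0.5 before any observations
print(prob_A(1, 1))    # ~0.667 after seeing A once
print(prob_A(2, 2))    # 0.75 after seeing A twice
print(prob_A(98, 98))  # 0.99 after a long run of A with no B
```

Note that the estimate never reaches exactly 1.0, which is the formal version of "we cannot rule out B simply because we have not seen it yet."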
There is a historical case of this. Europeans believed that swans were always white and there could be no black swans. They updated their probability of a swan being white to 99% based on their limited experience. As they explored the world, they found black swans in Australia. This reduced the probability of a swan being white and increased the probability of a swan being black. This process of inductive reasoning can be explained via Bayesian probability.
Eliezer Yudkowsky offers an excellent description of the mathematics behind Bayes and explains the results.
Let’s make a simple test example. There is a population of 10,000, 1% of whom are insurgents pretending to be civilians. Police can investigate individuals and determine whether they are an insurgent or a civilian with 95% certainty.
The prior probabilities split the population like this: 0.01 × 10,000 and 0.99 × 10,000. So
Group 1: 100 insurgents
Group 2: 9,900 Civilians
The Police investigate the entire population. This produces four groups:
Group 1: Insurgents – Positive test (0.95)
Group 2: Insurgents – False Negative test (0.05)
Group 3: Civilians – False Positive test (0.05)
Group 4: Civilians – Negative test (0.95)
How certain are the police that the men they captured are actually insurgents? The answer is about 16%.
(0.95 × 0.01) / [(0.95 × 0.01) + (0.05 × 0.99)]
= 0.0095 / 0.0590 ≈ 0.161
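The arithmetic above can be checked with a few lines of Python (a sketch; the function name is mine):

```python
def posterior(prior, true_pos, false_pos):
    """P(insurgent | positive test) via Bayes' Rule."""
    # Marginal probability of testing positive at all:
    evidence = true_pos * prior + false_pos * (1 - prior)
    return true_pos * prior / evidence

p = posterior(prior=0.01, true_pos=0.95, false_pos=0.05)
print(round(p, 3))  # 0.161
```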
The result is very counterintuitive. This is because of the high level of uncertainty created by false negatives and false positives. If police investigate the entire population, about 6% of the population will test positive for being insurgents. But we know only 1% of them can be insurgents; the others are innocent. We also know some insurgents may have escaped detection. Thus the 16% certainty.
The “true positive” rate of a test is deceptive. It assumes an even 50% probability of a person being an insurgent or civilian. If half the population are insurgents, then the certainty of a positive test really is 95%. But if only 1% of the population are insurgents, then the certainty is only 16%.
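A quick way to see the effect of the base rate is to hold the test's 95% accuracy fixed and vary the prevalence (a sketch, with an assumed helper function):

```python
def posterior(prior, true_pos=0.95, false_pos=0.05):
    """P(insurgent | positive test) for a given prior prevalence."""
    evidence = true_pos * prior + false_pos * (1 - prior)
    return true_pos * prior / evidence

for prior in (0.5, 0.1, 0.01):
    # Prints roughly 0.950, 0.679, 0.161 respectively
    print(f"prevalence {prior}: posterior {posterior(prior):.3f}")
```

Only when the prior is 50/50 does the posterior match the advertised 95% accuracy; as the prevalence drops, false positives swamp the true positives.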
Say there are two independent investigations of the community: tests 1 and 2. Individually, they each produce only 16% certainty. But combined, they produce roughly 78.5% certainty. Consider the overlap between the two tests: it narrows the field through naive Bayesian assumptions. It’s naive because it assumes the results of one test do not affect the probability of the second test.
The only way to make the test more accurate is to repeat it again and again. More complex Bayesian reasoning updates the probabilities after every test. This is why police carefully investigate suspects many times, and occasionally widen their field of suspects if they believe the initial investigation led them in the wrong direction. They narrow the field of suspects by accumulating evidence and growing more certain that they have actually captured an insurgent or criminal.
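The repeated-testing idea can be sketched as a loop, under the naive assumption that each test is independent and keeps the same 95% accuracy; each positive result uses the previous posterior as the new prior:

```python
def update(prior, true_pos=0.95, false_pos=0.05):
    """One Bayesian update after a positive test result."""
    evidence = true_pos * prior + false_pos * (1 - prior)
    return true_pos * prior / evidence

belief = 0.01  # prior: 1% of the population are insurgents
for n in range(1, 4):
    belief = update(belief)
    print(f"after test {n}: {belief:.3f}")
```

With these numbers the belief climbs from 16% after one positive test to roughly 78.5% after two and about 98.6% after three, which is why repeated investigation narrows the field so quickly.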
Actual Bayesian probability in the real world grows painfully complex and could only recently be handled through advanced computing. The above insurgent/civilian example is idealized and simplified.
Take a more realistic example. I won’t even try to do the math on this one:
Intel knows there are insurgents in the town. The census says the town has a population of 10,000, but traders and migrants are constantly moving around and the census has not been updated in a few years. Intel has 90% confidence that the population is between 9,000 and 11,000 at any point in time, 80% confidence that there are between 100 and 200 insurgents in the town, and 90% confidence that 50% of the insurgents move in and out of the town regularly. Intel has investigated an estimated 14% of the town with a test that gives them 60% accuracy. And they must use spatial and temporal analysis to estimate the probability of an insurgent being in a particular building at a particular time.
Now the commanding officer walks up to the S-2 intel officer and demands to know exactly how many insurgents there are and in which houses at exactly 3:00pm today, so he can eliminate them without causing any civilian casualties. Yes sir. The math goes to hell and back.
Intelligence uses predictive analysis to forecast a range of events in the face of extreme uncertainty. Yes, that is possible: it estimates the probability of events, but it cannot predict which events will occur.
It is similar to forecasting the weather. Meteorologists in the morning predict a 70% chance of rain this afternoon, based on their current knowledge and modeling of the clouds and wind patterns. By noon, their radar has updated the cloud positions and wind direction, and they determine there is a 50% chance of showers in the afternoon. It does not rain that day. The meteorologists were correct: they never said for certain that it would rain. But there are a lot of people carrying an umbrella on a sunny day.
One of the biggest problems in intelligence is “Black Swans.” These are highly improbable events that have never occurred before. Just because they have not occurred yet does not mean they will not. Yet there is no real way to predict or prepare for such events. It’s almost irrational to worry about and make expensive preparations for Black Swan events. And yet, they sometimes do happen.
Intel analysts are in a difficult situation. The social sciences today are historical sciences, much like the science of biological evolution. They offer probabilities of future events but do not make predictions the way the mechanical sciences do. Intelligence is about people, not machines. Intel must use whatever tools are available to shed a little light for leaders stumbling through the darkness.
I want to correct the wrong perception that intelligence actually predicts future events. The inability to stop a Black Swan event, or a false prediction of one, does not always mean that the intelligence community “failed.” Politicians have long distorted the role of intel analysis. They treat intel like it is magic, as if analysts were ancient soothsayers.