Bayes's Theorem
In the study of probability, two events A and B are said to be independent if the occurrence of one does not affect the probability of the other. For example, tossing a coin twice yields independent events, because whether the coin shows heads or tails on the second toss is not influenced by what it showed on the first toss, or vice versa. Whenever two events A and B are independent, we can compute the probability of both A and B happening by simply multiplying the individual probabilities of A and B, i.e., P(A and B) = P(A)P(B), also written as P(AB) = P(A)P(B).
The English mathematician Thomas Bayes (1702-1761) considered the situation in which events A and B are not independent. In this situation, the formula for P(AB) is not as simple as in the independent case. It first requires a notation for the probability of B happening given that A has happened. This notation is P(B│A), known as "the conditional probability of B given A." If event B is influenced by event A, then the probability of both A and B happening is equal to the probability that A happens times the probability of B happening given that A happens, or, in symbols, P(AB) = P(A)P(B│A). Dividing both sides by P(A) gives the equivalent form P(B│A) = P(AB)/P(A).
To illustrate, suppose that we toss a fair die and ask for the probability that a number less than 4 turns up. If no further information is given, then the probability is P(1 or 2 or 3) = P(1) + P(2) + P(3) = 1/6 + 1/6 + 1/6 = 1/2. But suppose that the question is about the probability that a number less than 4 turns up given that the toss turned up an odd number. This is a conditional probability: P(less than 4│odd) = P(less than 4 and odd)/P(odd) = (2/6)/(1/2) = 2/3. Note that the additional information that the upturned number was odd increased the probability from 1/2 to 2/3. Bayes would use this principle of the effect of increased information on probabilities to establish the theory of what is now called "Bayesian estimation."
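The die calculation can be checked by simply counting equally likely outcomes. The following is a minimal sketch in Python; the helper names (probability, less_than_4, is_odd) are illustrative choices and do not come from the article.

```python
from fractions import Fraction

# The six equally likely outcomes of one toss of a fair die.
OUTCOMES = range(1, 7)


def probability(event, given=lambda n: True):
    """Return P(event | given) by counting equally likely outcomes.

    `event` and `given` are functions that take a die face and return
    True or False. With the default `given`, this is the ordinary
    (unconditional) probability of `event`.
    """
    conditioned = [n for n in OUTCOMES if given(n)]
    favorable = [n for n in conditioned if event(n)]
    return Fraction(len(favorable), len(conditioned))


def less_than_4(n):
    return n < 4


def is_odd(n):
    return n % 2 == 1


p_unconditional = probability(less_than_4)               # P(less than 4)
p_conditional = probability(less_than_4, given=is_odd)   # P(less than 4 | odd)

print(p_unconditional, p_conditional)  # prints: 1/2 2/3
```

Restricting the count to the odd faces (1, 3, 5) is exactly what "given that the toss turned up an odd number" means: two of those three faces are less than 4.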
This approach brought forth the possibility of using "subjective" probabilities as well as "objective" probabilities. Objective probabilities are those describing events such as coin tossing, dice throwing, and card playing, where the probabilities are calculated objectively from the underlying mathematics of the event. Subjective probabilities, on the other hand, can be human estimates of the likelihood that certain events will occur, given some past history of similar events occurring. This use of subjective probability estimates is controversial in the community of classical statisticians, but it is used frequently in decision making when objective probabilities are unknown. Bayes was the first mathematician to suggest such an inductive approach to the calculation of probabilities.
Bayes's theorem first appeared in "An Essay towards solving a Problem in the Doctrine of Chances," published posthumously in 1763. The theorem addresses the following question: Given that an event has occurred and that this event may have been the result of two or more causes, what is the probability that the event was the result of a particular cause? To start simply, let us assume that an event A must have been the result of exactly one of two causes, A1 or A2. Bayes asks: What is the probability that A1 caused A? In other words, what is the probability that A1 occurred given that A occurred? Thus we are looking for P(A1│A), which, by the definition of conditional probability discussed earlier, is equal to P(A1A)/P(A), which in turn is equal to P(A1)P(A│A1)/P(A). Now the "total" probability of A is P(A) = P(AA1) + P(AA2) = P(A1)P(A│A1) + P(A2)P(A│A2). Putting all this together, we have Bayes's theorem:
P(A1│A) = P(A1)P(A│A1) / (P(A1)P(A│A1) + P(A2)P(A│A2))
The theorem may be extended to any number of possible causes, in which case it is written as:
P(Ak│A) = P(Ak)P(A│Ak) / (P(A1)P(A│A1) + P(A2)P(A│A2) + ... + P(An)P(A│An)), where k takes on integer values from 1 to n, the total number of possible causes.
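The general form of the theorem, in which the denominator sums P(Ai)P(A│Ai) over all n possible causes, translates directly into a few lines of code. The following is a minimal sketch in Python; the function name posteriors and the numbers in the example are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def posteriors(priors, likelihoods):
    """Apply Bayes's theorem to n mutually exclusive causes.

    priors[k]      is P(Ak), the probability of cause k.
    likelihoods[k] is P(A | Ak), the probability of the event A under cause k.
    Returns a list whose k-th entry is P(Ak | A).
    """
    if len(priors) != len(likelihoods):
        raise ValueError("priors and likelihoods must have the same length")
    # Numerators P(Ak)P(A | Ak), one per cause.
    joint = [p * l for p, l in zip(priors, likelihoods)]
    # Denominator: the total probability of A, summed over all causes.
    total = sum(joint)
    if total == 0:
        raise ValueError("the event A has probability zero under every cause")
    return [j / total for j in joint]


# Hypothetical example with two causes (numbers invented for illustration):
# P(A1) = 3/5, P(A2) = 2/5, P(A | A1) = 1/50, P(A | A2) = 1/20.
example = posteriors(
    [Fraction(3, 5), Fraction(2, 5)],
    [Fraction(1, 50), Fraction(1, 20)],
)
print(example[0], example[1])  # prints: 3/8 5/8
```

In this made-up example, A2 is the less probable cause beforehand, yet it becomes the more probable explanation once A is observed, because A is more likely under A2 than under A1.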
In the above formula, P(Ak│A) is called the "posterior" probability of Ak given A. This is what we seek to calculate. The expressions P(Ak) are called "prior" probabilities, and the expressions P(A│Ak) are the conditional probabilities of A given each cause. The prior probabilities are often subjective and, therefore, controversial to some statisticians. The scientific, engineering, and business worlds seem to have made their peace with prior probabilities, because these disciplines make extensive use of Bayesian statistics in decision making.
Ironically, it was only in the latter half of the 20th century that this 18th-century theorem came into its own, largely due to the number-crunching capabilities of modern high-speed computers. When the number of prior probabilities is large, calculating posterior probabilities by hand with Bayes's theorem is practically impossible, but it poses no problem for the fast computers available at the end of the 20th century. In the 1990s, computing power and speed reached a level at which it was possible to bring Bayes's theorem to bear on complex decision making through the concept of "Bayesian networks." These are diagrams that organize a knowledge base in a given field by mapping out cause-and-effect relationships among key variables and assigning probabilities that represent the extent to which one variable is likely to affect another. When these Bayesian networks are programmed into computers, they can generate optimal predictions even in the absence of key pieces of information. The idea is that professionals in a field can make intuitive, educated estimates of certain prior probabilities for which the precise values may be unknown. Furthermore, Bayesian networks allow practitioners to periodically update prior probabilities on the basis of new information.
Microsoft Corporation was a pioneer in the research leading to the utilization of Bayesian networks, which it now includes in many of its software products. The German conglomerate Siemens has adopted Bayesian networks for application in industry. Hospitals are using Bayesian techniques to assist doctors in diagnosing diseases. General Electric has introduced Bayesian networks that take information from sensors attached to an engine and combine it with data on past engine performance and expert opinion from its own engineers and scientists to predict the likelihood of various engine problems. The field of artificial intelligence has received a boost from the introduction of Bayesian networks that are able to "learn" automatically based on new knowledge. The essence of Bayes's message to the world is that decision making can be greatly improved when new information is taken into account in calculating probabilities.
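The idea of updating probabilities as new information arrives can be sketched by applying Bayes's theorem repeatedly, with each posterior serving as the prior for the next observation. The following Python sketch is not a Bayesian network and does not reflect any company's actual system; the engine-fault scenario, the function name update, and all numbers are hypothetical, and the sketch assumes that successive sensor readings are conditionally independent given the true state of the engine.

```python
from fractions import Fraction


def update(prior, p_obs_if_true, p_obs_if_false):
    """One application of Bayes's theorem to a yes/no hypothesis.

    prior          is the current probability that the hypothesis is true.
    p_obs_if_true  is the probability of the observation if it is true.
    p_obs_if_false is the probability of the observation if it is false.
    Returns the posterior probability that the hypothesis is true.
    """
    joint_true = prior * p_obs_if_true
    joint_false = (1 - prior) * p_obs_if_false
    return joint_true / (joint_true + joint_false)


# Hypothetical numbers: an engineer's initial estimate that a fault is
# present is 1/10; a sensor alarm occurs with probability 4/5 when a fault
# is present and with probability 1/5 when it is not.
belief = Fraction(1, 10)
history = [belief]

# Two alarms in a row (assumed conditionally independent): the posterior
# after the first alarm becomes the prior for the second.
for _ in range(2):
    belief = update(belief, Fraction(4, 5), Fraction(1, 5))
    history.append(belief)

print(", ".join(str(b) for b in history))  # prints: 1/10, 4/13, 16/25
```

Under these invented numbers, the estimated probability of a fault rises from 1/10 to 4/13 after one alarm and to 16/25 after two, which illustrates how each new piece of information revises the previous estimate.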