Forgot your password?  


Information and Information Theory | Research & Encyclopedia Articles

Print-Friendly   Order the PDF version   Order the RTF version
About 5 pages (1,394 words)
Information theory Summary

 


Information and Information Theory

In 1948 Claude Shannon published a seminal paper, "A Mathematical Theory of Communication," in the Bell System Technical Journal, inaugurating the body of concepts now known as information theory. In his paper, Shannon defined a precise quantitative measure for the amount of "information" contained in any message and proved important mathematical theorems about the fundamental limits at which one can communicate information. To state and prove his theorems, Shannon used a probabilistic model of a communication system which consists of a source, say X, that generates symbols belonging to some predefined alphabet, which are to be transmitted over an imperfect communication channel to a decoder, Y, which then hopefully reproduces the original message (string of alphabetic symbols).

In this model of communication, an important characteristic of the source is its entropy, H(X). If the source X emits N different symbols xi, each with probability p(xi), then the entropy H(X) is defined as the sum over all the N alphabet symbols xi of the quantity p(xI) log 1/p(xi). If the logarithm in the foregoing measure is taken in base 2 then the entropy is said to have units of bits. The entropy of the source X measures the amount of uncertainty inherent in the source. One can understand this most simply by calculating the entropy in a few simple cases. Suppose one has a source that emits symbols from an alphabet of size N; furthermore suppose that each symbol is equally likely to occur (i.e., occurs with probability 1/N). In this case the reader may calculate the entropy to be simply log N. However, suppose another source also emits N possible symbols, but emits one of them with an extremely high probability (close to 1, which signifies certainty) and the rest with negligibly small probabilities. In this case the entropy would be very close to 0. The former case represents the situation of maximum uncertainty, where we have no right to expect any source symbol to be transmitted versus any other before receiving the actual message. The latter case represents one of almost total certainty, hence of almost zero entropy or uncertainty. In general, for a source with an N-symbol alphabet the entropy ranges from zero to log N bits depending on how spread out the probability distribution for symbol emission is.

As mentioned above, the entropy of a message source has units of bits. The entropy can be interpreted as specifying the amount of information transmitted by a source per symbol. Note that the word "bit" is thus used in two very different senses with regard to communications channels. We may receive a 1 or a 0--physically, a single "bit"--from a message source that is nevertheless sending less than 1 bit of information per symbol (i.e., has an entropy of less than 1). In other words, not every physical bit represents a bit of actual information. A completely predetermined 1 or 0, in fact, supplies 0 bits of information. A hard drive might be packed with billions of random "bits" that contain zero bits of information.

A deeper understanding of the entropy of an information source can be obtained by considering the problem of constructing codes to communicate the information. A code for an information source, in computer applications, usually takes the form of assigning a binary number, or code word, to each alphanumeric symbol. These binary code words can then be used to transmit information over a digital communication network. An important measure of the efficiency of a code is the average code-word length, which is merely the sum over all alphabet symbols of the number of binary digits assigned to each alphabet symbol weighted by the probability of that alphabet symbol's occurrence. In other words, the average code word length is the number of bits needed to transmit a symbol using that particular code (averaged over many messages). An important question in information theory is: Given a particular source X, what is the most efficient code that can be constructed, where "most efficient" means having the minimum average code word length?

Shannon answered this question in his celebrated Source Coding Theorem, which states that the average code worth length for a source X cannot be less than the entropy H(X) of the source. For a source with N symbols with each symbol equally likely, it is obvious that one can simply use log2 N bits to represent each of the N symbols. If N happens to be a power of two, this relationship is particularly simple: 2B = N, where B is the number of bits in each and every word in the code alphabet. However, if some alphabet symbols are more likely than others, representing a situation with less source uncertainty, Shannon's theorem indicates that we can do better. One intuitive (and workable) strategy for constructing more efficient codes is to assign shorter code words to symbols that are very likely and leave the leftover, longer code words for symbols that are unlikely to be produced by the source. This simple strategy was already exploited in Morse code even before the advent of information theory. In this code for English-language message sources, the high-probability letter "e" is assigned the short code dot, whereas low-probability letter "q" is assigned code dash-dash-dot-dash. It is clear that this strategy will reduce the average code word length; it is the principle idea behind Huffman coding, which is a well-known coding scheme that can be shown to approach the fundamental limits of efficiency outlined by Shannon in his source coding theorem.

Another important pillar of information theory established by Shannon is his Channel Coding Theorem, which places a fundamental limit on the rate at which information can be reliably transmitted by a communication channel. In order to prove this theorem, Shannon again uses a probabilistic description for a channel, namely a set of conditional probabilities. Given a source X which sends codes for symbols through a channel to a decoder Y, the properties of the channel are characterized by the conditional probabilities p(xi|yi) which are to be understood as the probability that the source did indeed send the symbol xi if we know that the decoder has emitted the symbol yi. If the channel were perfect, then these conditional probabilities would all be 0 unless xi = yi, in which case the probability would be 1. In other words, if the decoder says yi than we can be sure, every time, that the source sent yi; with an ideal channel, there is no chance of a mistake. The more general formulation allows for channel error. Given this model for the channel, one can define the concept of relative entropy or H(X|Y), which represents the amount of uncertainty remaining about the source after the decoder has been observed. Its definition is similar to the definition of H(X) but uses the conditional probabilities for the channel. The mutual information is then defined as I(X|Y) = H(X) - H(X|Y) and represents how much uncertainty in the source was reduced by the process of observing the decoder. When one maximizes the mutual information over all possible source probability distributions, one obtains the channel capacity C, a measure of how good the communication channel is. Finally, and as an additional justification for this interpretation of C, the Channel Coding Theorem states that if the channel capacity C is greater than the entropy H(X) of the source, then there exists a coding scheme where data transmission can take place with an arbitrarily low probability of error. Conversely, if the channel capacity C is smaller than the source entropy H(X), no such coding scheme can be found.

Unfortunately, the proof of the Channel Coding Theorem is non-constructive; that is, Shannon does not indicate how to construct an optimal code, only that it can be constructed. The construction of efficient error-correcting codes has become an important area of research in information theory, and many such codes exist, including Hamming code, BCH code, the Reed-Muller code, the Reed-Solomon code, and others. However, the success of information theory as a science lies not so much in the construction of specific codes as in the construction of a precise theoretical framework in which one can describe the process of communication mathematically and by means of which one can place fundamental limits on rates of information transfer. These fundamental limits continue to stand as benchmarks for all coding and transmission algorithms.

This is the complete article, containing 1,394 words (approx. 5 pages at 300 words per page).

More Information
  • View Information and Information Theory Study Pack
  • Search Results for "Information and Information Theory"
  • More Products on This Subject
    Information Theory
    "Information" is a term used universally in fields associated with computing technolo... more

    Information Theory
    While researching methods for how to more efficiently transmit information over noisy communication... more


    Ask any question on Information theory and get it answered FAST!
    Answer questions in BookRags Q&A and earn points toward
    discounted or even FREE Study Guides and other BookRags products!
    Learn more about BookRags Q&A
    Copyrights
    Information and Information Theory from World of Computer Science. ©2005-2006 Thomson Gale, a part of the Thomson Corporation. All rights reserved.

    Join BookRagslearn moreJoin BookRags

    Join BookRagslearn moreJoin BookRags