BookRags.com Literature Guides Literature
Guides
Criticism & Essays Criticism &
Essays
Questions & Answers Questions &
Answers
Lesson Plans Lesson
Plans
My Bibliography Periodic Table U.S. Presidents Shakespeare Sonnet Shake-Up
Research Anything:        
History | Encyclopedias | Films | News | Create a Bibliography | More... Login | Register | Help

Not What You Meant?  There are 20 definitions for Folding.  Also try: Coding or Programming or WAD or Opacity.

Coding Techniques

Print-Friendly  Order the PDF version  Order the RTF version
About 4 pages (1,312 words)
Computer programming Summary

Bookmark and Share Questions on this topic? Just ask!

Coding Techniques

Representation of information is a fundamental aspect of all communication from bird songs to human language to modern telecommunications. In the case of digital storage and transmission of information, mathematical analysis has led to principles that drive the design of symbolic representations. For example, it has let a binary code be defined as a mapping from a set of source symbols, or source alphabet, to unique bit strings. A familiar example is the standard American Standard Code for Information Interchange (ASCII) code (see Table 1) in which each character from a standard keyboard is represented as a 7-bit sequence.

ASCII is an example of a block code, where each symbol is represented by a fixed-length "block" of n bits. Given a number of symbols (K) to encode in binary form, a number of bits (n) is chosen such that there are enough binary patterns of that length to encode all K symbols. With n bits, 2n unique strings exist, and so we choose the smallest integer n that satisfies K 2n. Thus a 3-bit code can represent up to eight symbols, a 4-bit code can be used for a set of up to 16 symbols, etc.

Because of its universal use, the ASCII code has great advantages as a means of storing textual data and communicating between machines. On the face of it, the ASCII design seems perfectly reasonable. After all, a common language is central to communication. However, ASCII lacks certain properties desirable in a code. One of these is efficiency and the other is robustness.

Efficiency

Knowledge of symbol probabilities can be exploited to make a code more efficient. Morse code, the system of dots and dashes used for telegraph transmission

Table 1. THE 7-BIT ASCII CODE

SymbolCodeSymbolCodeSymbolCodeSymbolCode
NUL0000000 0100000@1000000'1100000
SOH0000001!0100001A1000001a1100001
STX0000010"0100010B1000010b1100010
ETX0000011#0100011C1000011c1100011
EOT0000100$0100100D1000100d1100100
ENQ0000101%0100101E1000101e1100101
ACK0000110&0100110F1000110f1100110
BEL0000111'0100111G1000111g1100111
BS0001000(0101000H1001000h1101000
TAB0001001)0101001I1001001i1101001
LF0001010*0101010J1001010j1101010
VT0001011+0101011K1001011k1101011
FF0001100,0101100L1001100l1101100
CR0001101-0101101M1001101m1101101
SO0001110.0101110N1001110n1101110
SI0001111/0101111O1001111o1101111
DLE001000000110000P1010000p1110000
DC1001000110110001Q1010001q1110001
DC2001001020110010R1010010r1110010
DC3001001130110011S1010011s1110011
DC4001010040110100T1010100t1110100
NAK001010150110101U1010101u1110101
SYN001011060110110V1010110v1110110
ETB001011170110111W1010111w1110111
CAN001100080111000X1011000x1111000
EM001100190111001Y1011001y1111001
SUB0011010:0111010Z1011010z1111010
ESC0011011;0111011[1011011{1111011
FS0011100<011110010111001111100
GS0011101=0111101]1011101}1111101
RS0011110>0111110^1011110~1111110
US0011111?0111111_1011111[.alpha]1111111

in the early days of electric communication, made use of such knowledge. By representing the more frequent letters in common English with shorter dash-dot sequences, the average time to transmit a character is reduced in a message whose character statistics are consistent with the assumed frequencies (see Table 2).

Consider codes in which the number of bits assigned to each symbol is not fixed, and let l i denote the number of bits in the string for the i th symbol si. In such a variable length code, it makes sense to assign shorter bit strings to symbols that tend to occur more frequently than average in typical use. Hence, an efficient code can be designed by making l i a function of a

Table 2. MORSE CODE

SymbolCodeSymbolCodeSymbolCode
A. -N- .0- - - - -
B-. . .O- - -1. - - - -
C-. - .P. - - .2. . - - -
D-. .Q- -. -3. .. - -
E.R. - .4. .. . -
F. . - .S. . .5. .. . .
G- - .T-6-. .. .
H. .. .U. . -7- -. . .
I. .V. .. -8- - -. .
J. - - -W. - -9- - - - .
K-. -X-. . -period. -. -. -
L. -. .Y-. - -comma- -. . - -
M- -Z- -. .
symbol's probability, pi. Let the efficiency of a code be measured as the average number of bits per symbol, Lavg, weighted by the probabilities [Eq. 1].

The example that follows illustrates the increase in efficiency offered by a variable length code. Consider a set of four symbols a, b, c, and d with corresponding probabilities p a 0.5, p b 0.25, p c pd 0.125. Two codes are listed in Table 3, with the average lengths for each computed according to Equation 1. Note that the average length computed for Code II depends on the probability distribution, whereas the average number of bits per symbol for an n -bit block code is obviously n, regardless of the probabilities. Which code would be more efficient if the four symbols all have the same probability? (They would be equally efficient, both two bits long for each symbol.)

A potential problem with variable length codes is that an encoded string of symbols may not be uniquely decodeable; that is, there may be more than one interpretation for a sequence of bits. For example, let the symbol set {a,b,c} be encoded as a 0, b 10, c 01. The sequence 010 could be interpreted as either ab or ca, thus this code is not uniquely decodeable. This problem does not occur with all variable length codes. Huffman Codes are uniquely decodeable codes that are generated based on symbol probabilities.

The entropy of a probability distribution (denoted H and defined in Equation 2) is the lower bound for L avg. That is, for a given set of probabilities p 1, p 2,… p K, the most efficient uniquely decodeable code must satisfy:

Robustness

A principle of redundancy underlies the design of error correcting codes. By using more bits than are actually required to represent a set of symbols uniquely, a more robust code can be generated. If the code is designed such that any two legal codewords differ in at least 3 bits, then the result of "flipping" the value of any bit (that is, converting a 1 to a 0 or vice versa) will result in a string that remains closer to the original than it is to any other codeword. Similarly, if the minimum distance is 5 bits, double errors can be reliably corrected, with a 7-bit minimum distance, triple errors can be corrected, etc. A very simple illustration of this principle is the case of two symbols. In each of the four codes in Table 4, the symbol A is represented by a set of 0s, and B is represented by a block of 1s. For codes of increasing blocksize, more errors can be tolerated. For example, in the case of the 5-bit double-error correcting code, the received sequence 10100 would be interpreted as an A.

Table 4. SIMPLE ERROR CORRECTING CODES

SymbolUnique encodingSingle error correctingDouble error correctingTriple error correcting
A0000000000000000
B1111111111111111

Table 3. TWO CODES ON FOUR SYMBOLS

SymbolProbabilityCode ICode II
a0.5000
b0.250110
c0.12510110
d0.12511111
Average Length 21.75

Table 5. A 7-BIT HAMMING

CODE
SymbolCodewordSymbolCodeword
A0000000
B0001011
C0010110
D0011101
E0100111
F0101100
G0110001
H0111010
I1000101
J1001110
K1010011
L1011000
M1100010
N1101001
O1110100
P1111111

The codes noted in Table 4 are inefficient, in that they require many bits per symbol. Even the single error correcting code in Table 4 uses three times as many bits than are necessary without error correction. Much more efficient error-correcting codes are possible. The code in Table 5 is an example of a family of codes developed by Richard Hamming. It is a representation of a set of 16 symbols using 7 bits per symbol. While 4 bits would be sufficient to represent each of the symbols uniquely, this 7-bit code designed by Hamming guarantees the ability to correct a single bit error in any 7-bit block. The Hamming code is designed such that any two codewords are different in at least 3 bits. Hence, if one bit is altered by a storage or transmission error, the resulting bit string is still closer to the original codeword than it is to any of the other 15 symbols. Thus, this code is robust to single errors. Note that if two errors occur in the same block, the code fails.

TABLE 5. A 7-BIT HAMMING

Conclusion

The primary framework for symbolic representation of information is human language, which has evolved over a period spanning more than 100,000 years. But only the past century has seen the application of mathematical principles to the design of encoding schemes. In combination with high-speed electronic signal transmission, the result is a system that enables communication with efficiency, reliability, and range that would have been inconceivable a few generations ago. Ongoing improvements in high-density magnetic and optical storage media have brought about a tremendous reduction in the physical space required to store information, thus amplifying the utility of these recently developed encoding techniques.

Paul Munro

Bell Labs; Binary Number System; Codes.

Bibliography

Lucky, Robert W. Silicon Dreams. New York: St. Martin's Press, 1989.

Wells, Richard B. Applied Coding and Information Theory for Engineers. Upper Saddle River, NJ: Prentice Hall, 1999.

This complete Coding Techniques contains 1,292 words. This article contains 1,312 words (approx. 4 pages at 300 words per page).

More Information
  • View Coding Techniques Study Pack
  • 20 Alternative Definitions
  • Search Results for "Coding Techniques"
  • Add This to Your Bibliography
  • More Products on This Subject
    Programming
    Award-winning computer designer and engineer W. Daniel Hillis captured the essence of programming w... more

    Programming
    Computers have one big problem—they do not comprehend English, nor in fact any other human la... more


     
    Ask any question on Computer programming and get it answered FAST!
    Answer questions in BookRags Q&A and earn points toward
    discounted or even FREE Study Guides and other BookRags products!
    Learn more about BookRags Q&A
    Copyrights
    Coding Techniques from Macmillan Science Library: Computer Sciences. Copyright © 2001-2006 by Macmillan Reference USA, an imprint of the Gale Group. All rights reserved.

    Join BookRagslearn moreJoin BookRags




    About BookRags | Customer Service | Report an Error | Terms of Use | Privacy Policy