Chomsky and Computer Science
In the 1950s the formal study of languages—linguistics—was revolutionized almost single-handedly by Noam Chomsky, a linguist, M.I.T. professor, and political activist.
Chomsky's Linguistics
Traditional linguistic theory has two major subdivisions: syntax and semantics. Syntax is the study of linguistic form, and its "primary concern is to determine the grammatical sentences of any given language and to bring to light their underlying formal structure," Chomsky writes. Semantics, on the other hand, is concerned "with the meaning and reference of linguistic expressions. It is thus the study of how this instrument . . . is actually put to use in a speech community." It is through syntax that the fundamental nature of language can best be understood, in Chomsky's view. Much effort, he notes, has been expended in attempts to answer the question, "how can you construct a grammar with no appeal to meaning. But one might ask with equal justification: "How can you construct a grammar with no knowledge of the hair color of speakers?"
Instead of focusing on minimal units of sound, as the structural linguists had, Chomsky focuses on the creation of rudimentary sentences. In his scheme of "transformational grammar" every intelligible sentence follows not only the learned grammatical rules of a particular language but also a universal grammar that is innate in the human brain. This "deep structure" underpins all languages and can be studied as an abstract system. In Chomsky's view, only such a quasi-mathematical analysis will establish linguistics as a science.
Prior to the publication of his Syntactic Structures, social scientists viewed most of human behavior, including language, as a product of conditioning. Learning to speak was, like other physical skills, acquired through observation and trial and error. Chomsky's argument that language acquisition was based on capacities that were part of the genetic endowment of human beings was heretical. According to Chomsky experience does not supply linguistic understanding but merely awakens and enriches a framework of shared parameters that he calls "universal grammar." The relatively simple rules of this grammar allow for rich variation in content. "This is the way we learn language. We simply learn the label that goes with the preexisting concept."
Syntactic Structures defines language as a finite or infinite set of sentences, "each finite in length and constructed out of a finite set of elements. . . . Similarly, the set of 'sentences' of some formalized system of mathematics can be considered a language." A series of "rewrite rules" can be used to generate the underlying phrase-structure of a sentence, and further rules can form more-complex sentences from the phrase-structure. The result of this transformational-generative grammar is a surface structure that is identical to an actual sentence in a natural language. All languages, Chomsky says, share the same deep-structure elements; they differ from each other in surface structure because of the application of differing rules for transformations, pronunciation, and word usage. English, like other languages, consists of an infinite number of possible sentences, only a small fraction of which have been or ever will be uttered. Any system of linguistic analysis has to account for the ability of language users to create and to understand sentences that they have never before said or heard: "It seems clear that we must regard linguistic competence--knowledge of a language--as an abstract system underlying behavior, a system constituted by rules that interact to determine the form and intrinsic meaning of a potentially infinite number of sentences." Chomsky calls this system a "generative grammar." The universal grammatical rules that make the utterances possible are simple, but the operations that give rise to sentences that convey meaning are complex.
In Syntactic Structures Chomsky moved from the realm of descriptive linguistics, based on empiricism, to the rationalism and idealism of generative linguistics. He thereby put the field on a footing closer to the physical sciences than it had ever been before.
Importance to Computer Science
It is important to understand that the word "languages" here refers not only to real, spoken languages such as English or Urdu but to virtually any expressive system that can be built up by stringing together symbols. The study of "languages" in this broad sense includes not only spoken (or "natural") languages but also artificial languages such as those used to program computers, hence the usefulness of this field to computer science. Especially in the study of computability (solvable vs. unsolvable problems) and in the design of compilers (the programs that translate programs in C, FORTRAN, BASIC, or other high-level languages into executable machine code) the mathematical theory of formal languages is an essential tool.
To create a formal language one begins by defining an alphabet , that is, a set of symbols or marks: the statement "B = {a, b,1, <S>}" says that B is an alphabet containing four symbols, namely a, b, 1, and <S>. A grammar is a set of rules that produces strings of alphabet symbols; in Chomsky's words, "a grammar can be regarded as a device that enumerates the sentences of a language" (given an alphabet to work with). That is, if one takes the rules of a given grammar and applies them systematically to an alphabet, producing all the strings that can be made by all possible combinations of the rules, the resulting set of strings or "sentences" (which may be infinitely large) is the formal language defined by that grammar. A language is thus said to be "generated by" its grammar. A string of symbols that does not obey the grammar of a language is not a statement in that language: "dog blue car" is a string of English words but is not an English sentence because it does not obey English grammar.
A simple grammar that generates a language from the alphabet B = {a, b, 1, <S>} is given by the following four rules (also called "productions," because they produce one string from another):
Rule 1. <S> a
Rule 2. <S> b
Rule 3. <S> <S>a
Rule 4. <S> b<S>
The symbol "" means "may be replaced by." Rule 1, for instance, says that wherever the symbol <S> occurs, whether alone or embedded in a longer string, it may be replaced by a. If we start with the symbol <S> (as we clearly must) and apply rules 3, 4, and 1, we obtain a finished sentence (baa) in the formal language generated by this grammar:
In Chomsky's words, "a grammar can be regarded as a device," that is, as a machine (a software machine), a set of rules that does things to symbols mechanically, without concern for their meaning. Seeing grammars as machines that produce languages was one of Chomsky's key contributions to formal language theory (and thus to computer science). For if grammars are machines, we can build them to order. We can design grammars that are very simple or very complex. We can analyze them with all the rigor with which we are accustomed to analyzing other machines.
While a "grammar" is a software machine for generating strings that belong to a language, an "automaton" is a software machine for testing strings to see whether they belong to a given language. An automaton may be conceived as a device that reads the symbols of a string one by one and decides, after having read in the whole string, whether to accept it (certify it as a valid sentence) or reject it. An automaton indicates its decision by setting a binary output flag: Accept or Reject. The simpler the grammar, the simpler the automaton that checks for obedience to it.
In the language described above, it is easy to see that no sentence consisting entirely of a's or b's can be generated; an automaton checking for obedience to this grammar would therefore have to reject aaa or bbbb. It would also (as the reader can verify) have to reject aba and accept bbaa.
Chomsky showed that all possible grammars, along with the automata that check for obedience to them, fall into four classes. He also showed that these classes can be ranked from narrowest to most inclusive and that each level in the hierarchy includes all the levels below it in the sense that whatever language features a lower-level grammar can produce, the grammars above it can all produce too. This structure is known as the Chomsky hierarchy and is basic to the study of formal languages in computer science. The Chomsky hierarchy, including the types of automata that are capable of deciding to Accept or Reject at each level, is as follows (ordered from the most inclusive class of grammars down to the narrowest):
The relevance of grammars, automata, and the Chomsky hierarchy to computers is basic. Say that a program is written in the high-level language C--or what the programmer hopes is C. The program may contain mistakes, that is, words or phrases that are not really part of the C language and which the computer cannot use. The first step of compiling such a program is to check all its words to see if they are part of the word list or "lexicon" of C. This stage of compilation is carried out by an automaton called a lexical analyzer or lexer. The lexer checks each word in the program; if it accepts them all, the program at least might be a valid C program. (Or, like "dog blue car," the program might contain valid words but fail to make higher-level sense.) A lexer is checking for grammatical correctness at the lowest level of the Chomsky hierarchy, that of "regular grammar." If a program passes this test, a compiler's next step is to accept or reject at the next-highest level of the Chomsky hierarchy, that of "context-free grammar."
Natural languages like English can be successfully described, for the most part, by grammars at the third level of the Chomsky hierarchy (context-sensitive). Most programming languages can be fully described at the second level (context-free), but may contain some context-sensitive features. The highest level of the hierarchy seems, so far, to be mostly of theoretical interest.
This is the complete article, containing 1,654 words
(approx. 6 pages at 300 words per page).

Chomsky and Computer Science article
Copyrights
Chomsky and Computer Science from World of Computer Science. ©2005-2006 Thomson Gale, a part of the Thomson Corporation. All rights reserved.