A compiler is a computer program that converts a program as written by the programmer, known as the "source code," into a different form that the computer can execute. This second form is called "object code" and consists of very "low-level" instructions that tell the computer's Central Processing Unit to perform very specific and basic operations. This layer of abstraction is important because it means that the same source code can (theoretically at least) be compiled for many different machines, each using a compiler that knows which instructions to write for that machine.
Object code is an intermediate form of code between the source code and the executable code the computer can run. The object code contains the instructions in the right form, but they still need to be organized so that they use the computer's memory correctly, and they need to be reconciled with a few other run-time systems. Object code is turned into executable code by a "linker." Most compilers can also perform the link step themselves and accept options when they are executed to tell them whether they should do so. Compilers usually accept many more options that dictate the way they produce the object code; these options often depend on the machine the code is being compiled for and can be extremely specific and abstruse.
In effect, a compiler "represents" the language it is compiling, because one of its tasks is to read the source code and decide whether the code comprises legal statements and expressions of the language. It does this by "parsing" the code and creating an "abstract syntax tree," a data structure that compilers and interpreters use internally to represent a sequence of language code that has been parsed. The "abstract syntax" of the language represents the rules of the language and defines every possible legal abstract syntax tree that can be created.
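The parsing step can be illustrated with a small sketch. Python is used here only because its standard `ast` module exposes the parser; the article does not discuss Python, and the helper name `is_legal` and the sample statements are made up for the illustration.

```python
import ast


def is_legal(source: str) -> bool:
    """Return True if `source` parses as legal Python code, False otherwise."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


# Parse one legal statement and look at the tree the parser built.
tree = ast.parse("total = price * 2 + tax")
assign = tree.body[0]                # the single statement in the program
print(type(assign).__name__)         # the statement node: an assignment
print(type(assign.value).__name__)   # its right-hand side: a binary operation

# The same parser rejects text that breaks the rules of the language.
print(is_legal("total = price * 2 + tax"))
print(is_legal("total = * 2 +"))
```

The parser never runs the statement; it only decides whether the text fits the grammar and, if it does, records its structure as a tree.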
If, and only if, the code is syntactically correct will the compiler create the object code and, if the link option is required, link the object code into an executable program. The compiler does not, however, check whether the code does what the programmer intended. Beyond limited checks such as type checking in some languages, the meaning or "semantics" of the program is outside its reach. This means that if the programmer has made a mistake in how calculations are made, or even a mistake that can cause a catastrophic system failure, the compiler cannot know this and will happily write code that does the wrong thing.
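As a small illustration of this point, the sketch below compiles a function that is syntactically legal but computes the wrong answer. It uses Python's built-in `compile` and `exec` as stand-ins for a compiler and a run; the `average` function and its deliberate bug are invented for the example.

```python
source = """
def average(values):
    # Deliberate bug: divides by a fixed 2 instead of len(values).
    return sum(values) / 2
"""

# Compilation succeeds because the code is syntactically legal.
code = compile(source, "<example>", "exec")

# Running the compiled code defines the (wrong) function.
namespace = {}
exec(code, namespace)

result = namespace["average"]([10, 20, 30])
print(result)  # 30.0, although the true mean of these values is 20.0
```

No error or warning is produced at any stage; only a test against a known answer reveals the mistake.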
Compilers are different from "interpreters" because interpreters typically take one statement of source code at a time and execute it before moving on to the next one; compilers, in contrast, process the whole of the source code in one go and then create a file that the user gets the computer to run later. The advantage of compilers is that compiled code usually runs much faster than interpreted code because all of the hard work has been done ahead of time. Compiled code, however, cannot be run immediately, as it takes time for the compilation and linking processes to complete, whereas interpreters can begin executing code straight away.
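One visible consequence of this difference can be sketched with a toy program whose last line is illegal. This is only a simulation written in Python: the statement-at-a-time loop stands in for an interpreter, and compiling the whole text at once stands in for a compiler. The three-line program and all variable names are hypothetical.

```python
lines = ["x = 1", "y = x + 1", "z = = y"]  # the last line is not legal code

# Interpreter-style: handle one statement at a time, executing as we go.
interp_ns = {}
executed = 0
try:
    for line in lines:
        exec(compile(line, "<line>", "exec"), interp_ns)
        executed += 1
except SyntaxError:
    pass  # the first two statements have already run by this point

# Compiler-style: process the whole program before running any of it.
compiled_ns = {}
try:
    program = compile("\n".join(lines), "<program>", "exec")
    exec(program, compiled_ns)
    compiled_ok = True
except SyntaxError:
    compiled_ok = False  # rejected up front; nothing was executed

print(executed)            # 2
print(interp_ns["y"])      # 2
print(compiled_ok)         # False
print("x" in compiled_ns)  # False
```

The statement-at-a-time loop starts work immediately and fails partway through, while the whole-program approach does nothing until the entire source has been checked.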
In recent years, however, relatively new languages like Perl and, especially, Java have blurred the distinction between compilers and interpreters almost to the point of invisibility. Perl and Java compile their source code to an intermediate form called "byte code." Conceptually, byte code is similar to object code; the difference is that byte code is then interpreted by a program called a "virtual machine" rather than being linked into an executable file of machine instructions.
This means that the byte code for a given fragment of source code is identical no matter what physical machine produced it, whereas object code is different from machine type to machine type. One of the advantages of this is that the language becomes more "portable"; that is, the same source code can be compiled once and then executed on any machine that supports a virtual machine of the correct type. This is markedly different from truly compiled languages like C and C++, which are minefields of incompatibility problems when it comes to different machines and operating systems.
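To make byte code more concrete, the sketch below uses Python, which also compiles source code to byte code for its own virtual machine. It is an analogy only, not a description of the Perl or Java tools. Note also that Python's byte code format changes between Python versions, so the instruction names listed will vary. The one-line program is invented for the example.

```python
import dis

source = "area = width * height"

# Compile the source to a code object holding byte code.
code = compile(source, "<example>", "exec")

raw = code.co_code  # the byte code itself, as a sequence of bytes
opnames = [ins.opname for ins in dis.get_instructions(code)]

print(len(raw), "bytes of byte code")
print(opnames)  # instruction names; these differ between Python versions

# The virtual machine interprets the byte code; no machine-specific
# executable file is produced.
namespace = {"width": 3, "height": 4}
exec(code, namespace)
print(namespace["area"])  # 12
```

The instructions refer to names and operations of the language's virtual machine, not to the registers or instruction set of any particular processor, which is what makes this form of code independent of the physical machine.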
The Perl and Java compilers differ in several ways. For example, the Perl compiler behaves in an interpreter-like fashion in that, once it has compiled the source code to byte code, it immediately begins to execute the byte code. This means that unless the Perl programmer uses an add-on Perl compiler (a program that is not part of the Perl distribution and that turns Perl byte code into machine-specific executable code), he or she must distribute the source code in its entirety. For some applications this has unacceptable implications for intellectual property.
The Java compiler, on the other hand, can behave in the same interpreter-like way, but it is more often used to emit byte code into a file that the Java Virtual Machine can then read in and execute at a different time, either on the same machine or on a different one. This allows Java programmers to distribute Java programs as byte code in much the same way as C programmers can distribute executable programs.
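The idea of writing byte code to a file and executing it later can be sketched as follows. Python's standard `marshal` module is used here as a stand-in for a byte code file format; this is an analogy to the Java workflow, not the Java tools themselves. The file name `example.bytecode` and the one-line program are hypothetical.

```python
import marshal
import os
import tempfile

source = "greeting = 'Hello, ' + name"

# "Compile time": translate the source and save only the byte code.
code = compile(source, "<example>", "exec")
path = os.path.join(tempfile.mkdtemp(), "example.bytecode")
with open(path, "wb") as f:
    marshal.dump(code, f)

# "Run time", possibly much later: load the byte code and execute it.
# The source text is not needed at this point.
with open(path, "rb") as f:
    loaded = marshal.load(f)

namespace = {"name": "world"}
exec(loaded, namespace)
print(namespace["greeting"])  # Hello, world
```

Only the saved file has to be handed over to run the program, which mirrors the way byte code can be distributed in place of source code.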