Instruction Set
The functioning of a CPU, the processor chip at the heart of a computer, involves many secondary aspects: the voltages and currents required, the cooling arrangements that keep the chip from overheating, the proper functioning of ancillary devices, and so on. In bare terms, however, the CPU itself may be abstracted to just the instruction set it implements. An instruction set is the set of basic "instructions", or programmer-visible commands, that a chip can carry out directly. A chip with a given instruction set must be given orders drawn from that set; more complex commands must be broken down into sequences of its instructions.
To see what an instruction looks like, consider the following instructions, which are among those implemented by the 80x86 family of processors (the predecessors of the Pentium series of chips, which implement these along with a great many more). The processor named in parentheses is the one that first featured the instruction; later ones in the series have it as well.
- ADC--Add With Carry (8086)
- ADD--Arithmetic Addition (8086)
- BSR--Bit Scan Reverse (80386)
- BSWAP--Byte Swap (80486)
- BT--Bit Test (80386)
- BTC--Bit Test with Complement (80386)
- BTR--Bit Test with Reset (80386)
- CLC--Clear Carry (8086)
- CLD--Clear Direction Flag (8086)
- SHL--Shift Logical Left (8086)
- SHR--Shift Logical Right (8086)
- XOR--Exclusive OR (8086)
These instructions, like most others, don't make much sense in plain English, but they're not supposed to. Unless one has studied the machine language of that family of processors and knows something about microprocessor instruction sets in general, it is hard to make out what a specific instruction means. Fortunately, compilers make it possible for programmers and other end-users of computer chips to ignore the raw details of instruction sets and work at a higher level of abstraction.
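To give a rough sense of what a few of the listed instructions compute, here is a minimal Python sketch. It assumes 32-bit registers and models only the result value (plus the carry for ADD and ADC); the function names are illustrative, and the other processor flags and corner cases of the real 80x86 instructions are deliberately left out.

```python
MASK32 = 0xFFFFFFFF  # assumption: registers are 32 bits wide


def add(a, b):
    """ADD: 32-bit addition; returns (result, carry_out)."""
    total = a + b
    return total & MASK32, total >> 32


def adc(a, b, carry_in):
    """ADC: like ADD, but also adds the incoming carry flag."""
    total = a + b + carry_in
    return total & MASK32, total >> 32


def shl(a, count):
    """SHL: shift left, discarding bits pushed past bit 31."""
    return (a << count) & MASK32


def shr(a, count):
    """SHR: shift right, filling the vacated high bits with zeros."""
    return a >> count


def xor(a, b):
    """XOR: bitwise exclusive OR."""
    return a ^ b


def bswap(a):
    """BSWAP: reverse the order of the four bytes in a 32-bit value."""
    return int.from_bytes(a.to_bytes(4, "big"), "little")


def bsr(a):
    """BSR: index of the highest set bit (a must be nonzero)."""
    return a.bit_length() - 1


# ADC exists so that wide additions can be chained: add the low halves
# with ADD, then feed the resulting carry into the high halves with ADC.
low, carry = add(0xFFFFFFFF, 0x00000001)
high, _ = adc(0x00000000, 0x00000000, carry)
print(hex(high), hex(low))
```

The last three lines add 1 to the 64-bit value 0x00000000FFFFFFFF using two 32-bit operations, which is the kind of sequence a compiler emits when a program works with numbers wider than a register.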
Instruction sets are designed along a few major lines called instruction set architectures, much as buildings follow Tudor, Greek, Gothic, Victorian, Renaissance, Spanish, and other architectural styles. In general, the type of internal storage in the CPU is the most basic point of difference among architectures, and the major choices are a stack, an accumulator, or a set of registers. The corresponding instruction set architectures are called the stack architecture, the accumulator architecture, and the general-purpose register architecture.
Instructions take the form of a command (usually called an opcode, for operation code) and one or more operands on which the opcode acts. Translated into English, an instruction might mean, "Add the number 1 to the value stored in register A"; here the opcode might be ADD, and the operands would be register A and the number 1. The operands in a stack architecture are implicitly found on top of the stack (hence the name), while in an accumulator architecture one operand is implicitly the accumulator. General-purpose register (GPR) architectures have only explicit operands, either registers or memory locations. Today, such architectures are quite common.
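The difference between implicit and explicit operands can be sketched by computing C = A + B on three toy machines. The mnemonics (PUSH, POP, LOAD, STORE, ADD), the register names, and the interpreter functions below are hypothetical and chosen only for illustration; they do not correspond to any real instruction set.

```python
def run_stack(program, memory):
    """Stack machine: ADD names no operands; it uses the top two stack entries."""
    stack = []
    for op, *args in program:
        if op == "PUSH":
            stack.append(memory[args[0]])
        elif op == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "POP":
            memory[args[0]] = stack.pop()
    return memory


def run_accumulator(program, memory):
    """Accumulator machine: one operand is always the implicit accumulator."""
    acc = 0
    for op, addr in program:
        if op == "LOAD":
            acc = memory[addr]
        elif op == "ADD":
            acc += memory[addr]
        elif op == "STORE":
            memory[addr] = acc
    return memory


def run_gpr(program, memory):
    """GPR machine: every operand, register or memory, is named explicitly."""
    regs = {}
    for op, *args in program:
        if op == "LOAD":
            reg, addr = args
            regs[reg] = memory[addr]
        elif op == "ADD":
            dst, src1, src2 = args
            regs[dst] = regs[src1] + regs[src2]
        elif op == "STORE":
            reg, addr = args
            memory[addr] = regs[reg]
    return memory


STACK_PROGRAM = [("PUSH", "A"), ("PUSH", "B"), ("ADD",), ("POP", "C")]
ACC_PROGRAM = [("LOAD", "A"), ("ADD", "B"), ("STORE", "C")]
GPR_PROGRAM = [
    ("LOAD", "R1", "A"),
    ("LOAD", "R2", "B"),
    ("ADD", "R3", "R1", "R2"),
    ("STORE", "R3", "C"),
]

for run, program in [
    (run_stack, STACK_PROGRAM),
    (run_accumulator, ACC_PROGRAM),
    (run_gpr, GPR_PROGRAM),
]:
    print(run.__name__, run(program, {"A": 2, "B": 3})["C"])
```

All three programs leave 5 in C. What differs is how much each ADD has to say: nothing on the stack machine, one address on the accumulator machine, and three register names on the GPR machine.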
Within GPR architectures, memory may be accessible as part of any instruction, which corresponds to the register-memory architecture, or only through dedicated load and store instructions, which corresponds to the register-register or load-store architecture. Another kind of GPR architecture, not currently used in chips, is the memory-memory architecture, which keeps all operands in memory at all times. (There are seven possible types of GPR architectures along the lines indicated here, but these three cover almost all such instruction sets ever implemented.)
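As a rough illustration of the register-memory and load-store styles, the sketch below runs the same computation, C = A + B, under both rules. The two-operand ADD, the register names, and the `load_store_only` switch are assumptions made for this toy interpreter rather than features of any particular chip.

```python
REGISTERS = {"R1", "R2", "R3"}  # hypothetical register file


def run_gpr(program, memory, load_store_only):
    """Toy GPR interpreter; ADD computes dst = dst + src."""
    regs = dict.fromkeys(REGISTERS, 0)
    for op, *args in program:
        if op == "LOAD":
            reg, addr = args
            regs[reg] = memory[addr]
        elif op == "STORE":
            reg, addr = args
            memory[addr] = regs[reg]
        elif op == "ADD":
            dst, src = args
            if src in REGISTERS:
                value = regs[src]
            elif load_store_only:
                # A load-store machine reaches memory only via LOAD and STORE.
                raise ValueError("ADD may not take a memory operand")
            else:
                # A register-memory machine may read an operand from memory.
                value = memory[src]
            regs[dst] += value
    return memory


# Register-memory style: ADD reads its second operand straight from memory.
REGISTER_MEMORY_PROGRAM = [
    ("LOAD", "R1", "A"),
    ("ADD", "R1", "B"),
    ("STORE", "R1", "C"),
]

# Load-store style: both operands must be loaded into registers first.
LOAD_STORE_PROGRAM = [
    ("LOAD", "R1", "A"),
    ("LOAD", "R2", "B"),
    ("ADD", "R1", "R2"),
    ("STORE", "R1", "C"),
]

print(run_gpr(REGISTER_MEMORY_PROGRAM, {"A": 2, "B": 3}, load_store_only=False)["C"])
print(run_gpr(LOAD_STORE_PROGRAM, {"A": 2, "B": 3}, load_store_only=True)["C"])
```

The load-store version needs one more instruction, but every arithmetic instruction in it touches registers only, which keeps each instruction simple.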
Operands may be of many types (integer, floating-point, character, etc.). The type of an operand is usually designated by the opcode that operates on it, so that, for example, integer addition and floating-point addition have different opcodes. It is also possible for the operand data itself to be annotated with tags that specify its type. However, instruction sets using tagged data are relics of the early decades of computer design, and are not found in contemporary instruction sets.
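The two ways of designating operand type can be sketched as follows. The opcode names ADD_I32 and ADD_F32 and the (tag, bits) pairs are hypothetical; the sketch assumes 32-bit operands and uses Python's standard `struct` module to reinterpret bit patterns as single-precision floats.

```python
import struct


def bits_to_float(bits):
    """Reinterpret a 32-bit pattern as a single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def float_to_bits(value):
    """Return the 32-bit pattern that represents a single-precision float."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def execute_typed_opcode(opcode, a_bits, b_bits):
    """Type lives in the opcode: the same bits mean different things per opcode."""
    if opcode == "ADD_I32":
        return (a_bits + b_bits) & 0xFFFFFFFF
    if opcode == "ADD_F32":
        return float_to_bits(bits_to_float(a_bits) + bits_to_float(b_bits))
    raise ValueError(f"unknown opcode: {opcode}")


def execute_tagged_add(a, b):
    """Type lives in the data: a single ADD inspects the tags on its operands."""
    tag_a, bits_a = a
    tag_b, bits_b = b
    if tag_a != tag_b:
        raise TypeError("operand tags do not match")
    opcode = {"int": "ADD_I32", "float": "ADD_F32"}[tag_a]
    return tag_a, execute_typed_opcode(opcode, bits_a, bits_b)


one = float_to_bits(1.0)  # the bit pattern 0x3F800000

# Same operand bits, different opcodes, different results.
print(hex(execute_typed_opcode("ADD_I32", one, one)))
print(bits_to_float(execute_typed_opcode("ADD_F32", one, one)))

# With tagged data, the operands themselves say how to interpret the bits.
tag, bits = execute_tagged_add(("float", one), ("float", one))
print(tag, bits_to_float(bits))
```

With typed opcodes, the hardware never checks what the bits "really" are; choosing the right opcode is the compiler's job.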
In encoding an instruction set, an architect also faces a choice between variable instruction encodings (which allow all addressing modes to be used with all operations) and fixed instruction encodings (which specify which operation takes which addressing mode). The trade-off is between the size of programs and the ease of decoding instructions in the CPU. Variable encodings make for smaller programs but somewhat lower performance, since the chip has to do more work to decode each instruction. Fixed encodings may increase program size, but usually allow the program to run faster. One way architects have dealt with this trade-off is through hybrid designs that fall between the extremes of variable and fixed encodings.
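The size-versus-decoding trade-off can be sketched with two toy byte formats. Both formats are assumptions invented for this example: the variable one stores an opcode byte, an operand-count byte, and then only the operands actually present, while the fixed one always stores an opcode byte followed by three operand bytes, padding unused slots with zeros.

```python
def encode_variable(program):
    """Variable format: opcode, operand count, then exactly that many operands."""
    code = bytearray()
    for opcode, operands in program:
        code += bytes([opcode, len(operands), *operands])
    return bytes(code)


def encode_fixed(program):
    """Fixed format: every instruction is 4 bytes (opcode + 3 operand slots)."""
    code = bytearray()
    for opcode, operands in program:
        if len(operands) > 3:
            raise ValueError("fixed format holds at most 3 operands")
        padded = list(operands) + [0] * (3 - len(operands))
        code += bytes([opcode, *padded])
    return bytes(code)


def decode_variable(code):
    """Each instruction's start depends on the length of the one before it."""
    program = []
    pc = 0
    while pc < len(code):
        opcode, count = code[pc], code[pc + 1]
        operands = list(code[pc + 2 : pc + 2 + count])
        program.append((opcode, operands))
        pc += 2 + count
    return program


def decode_fixed(code):
    """Instruction boundaries are known in advance: every 4 bytes."""
    return [(code[pc], list(code[pc + 1 : pc + 4])) for pc in range(0, len(code), 4)]


# A hypothetical program: (opcode, operands) pairs with 1, 3, and 0 operands.
PROGRAM = [(1, [5]), (2, [1, 2, 3]), (3, [])]

variable_code = encode_variable(PROGRAM)
fixed_code = encode_fixed(PROGRAM)
print(len(variable_code), len(fixed_code))
print(decode_variable(variable_code) == PROGRAM)
```

The variable encoding is shorter here because it spends no bytes on unused operand slots, but its decoder must walk the bytes in order to find where each instruction begins. The fixed decoder can locate any instruction by simple arithmetic, which is the kind of simplification that lets hardware decode quickly.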
Historically, stack architectures were popular in the 1960s, during the pre-microprocessor era. Given the compiler technology of the day, they were probably the best choice. In the 1970s, after the software crisis, the major concern became reducing software costs, and this meant transferring as much of the overall system burden to hardware as possible. This in turn led to the creation of specialized architectures like the VAX, with large numbers of addressing modes, multiple data types, and so on. Starting in the 1980s, with a better grasp of software engineering and many improvements in compiler technology, instruction set design started to return to the simpler GPR style, particularly the load-store model.