Assembly Language
Computer programming can be categorized in a variety of ways. Programming methods can be procedural, object oriented, or event driven, while languages themselves can be divided into two major categories: low-level and high-level. Assembly language and machine language are low-level languages because they "speak," directly or almost directly, the language the computer understands. Languages such as C, C++, Pascal, BASIC, Visual Basic, and COBOL, on the other hand, are high-level languages because their statements are further removed from the hardware and must be translated more extensively (by a compiler, for example) before the computer can execute them. Assembly language and machine language are considered the most fundamental of all programming languages because of the close relationship between their statements and actual central processing unit (CPU) operations at the bit and byte level. Either language gives programmers nearly complete control over microprocessor behavior.
Carefully written assembly language programs can be smaller and run faster than equivalent programs written in high-level languages. However, assembly language does not provide prewritten functions that perform common or repeated tasks. For this reason, assembly language is commonly used to write subroutines that are then called by programs written in a high-level language, which helps speed up the performance-critical parts of those programs. While complete programs (called stand-alone programs) can be written in assembly language, this is not generally practical, given the programming power of today's high-level languages. There are notable exceptions, however: early releases of the spreadsheet program Lotus 1-2-3 (those prior to Release 3.0) were written entirely in assembly language.
Assembly language is an expression of a computer's architecture, that is, the pattern of connectivity among its fundamental working units. Many of the nuances of high-level programming tasks can be traced to the microprocessor and its design. For instance, knowledge of a microprocessor's architecture (and of its machine language) helps in understanding Windows (an operating system for PC-type computers) and protected-mode programming. The term "protected-mode programming" reflects the fact that most operating-system code and almost all application programs run in protected mode, a processor mode that restricts memory access so that essential data is not unintentionally overwritten.
Assembly language is composed of a set of symbolic instructions that each tell the microprocessor to perform one relatively simple operation, such as adding two numbers. An assembler program (discussed further below) translates these instructions and any data associated with them into a binary form, machine language. Machine language can reside in random-access memory (RAM) for speedy access by the central processor in performing certain tasks.
Here is a sample list of instructions that are recognized by all 8086-family microprocessors, which are products of Intel Corporation. On the left side, in all uppercase letters, are the mnemonic abbreviations for operations; these are entered in the operation code field of each assembly language program instruction, that is, the part of the instruction that specifies the operation to be performed. On the right side of the equals sign are the full names of the mnemonics.
- CBW = Convert Byte to Word
- INT = INTerrupt
- INTO = INTerrupt on Overflow
- LODSB = LOaD String (Byte)
- MOVSB = MOVe String (Byte)
- MUL = MULtiply
- NOP = No OPeration (do nothing)
- OUT = OUTput to I/O (input-output) port
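To make the link between a mnemonic and its machine-language form concrete, here is a minimal Python sketch. It covers only those mnemonics from the list above that take no operands and encode as a single byte on the 8086. The names `ONE_BYTE_OPCODES` and `assemble_simple` are hypothetical, chosen for illustration; a real assembler must also handle operands, addressing modes, and multi-byte instructions such as INT, MUL, and OUT.

```python
# Single-byte 8086 opcodes for operand-free instructions from the list above.
# The table and function names are illustrative, not part of any real tool.
ONE_BYTE_OPCODES = {
    "CBW": 0x98,
    "INTO": 0xCE,
    "LODSB": 0xAC,
    "MOVSB": 0xA4,
    "NOP": 0x90,
}


def assemble_simple(source_lines):
    """Translate a list of operand-free mnemonics into machine-code bytes."""
    machine_code = bytearray()
    for line in source_lines:
        mnemonic = line.strip().upper()
        if not mnemonic:
            continue  # skip blank lines
        if mnemonic not in ONE_BYTE_OPCODES:
            raise ValueError(f"unsupported instruction: {mnemonic}")
        machine_code.append(ONE_BYTE_OPCODES[mnemonic])
    return bytes(machine_code)


program = ["NOP", "CBW", "MOVSB"]
code = assemble_simple(program)
print(" ".join(f"{byte:02X}" for byte in code))  # 90 98 A4
```

Each symbolic instruction becomes exactly one byte here, which is the simplest possible case of the translation an assembler performs.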
Some microprocessor families, such as Intel's 80386 and 80486 families, share common instructions; others have instructions that are unique to them. Many if not most of these instructions involve the manipulation of binary numbers. Binary numbers are written with only two digits, 1 and 0; computers use them because it is easy to build electronic devices that switch rapidly and reliably between two distinct states. Instead of counting in decimal numbers such as 1, 2, 3, 4, and 5, the computer counts in base 2: 1, 10, 11, 100, 101. As you can imagine, counting in binary produces some very long strings of ones and zeros that are awkward to work with. For this reason, programmers use another numbering system as an aid: the hexadecimal system. Hexadecimal (also known as hex) numbers provide a compact way to write binary numbers. Understanding assembly language requires an understanding of hexadecimal notation.
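The counting sequence above, and the space saving that hex offers, can be checked with a short Python sketch. The value 48879 is an arbitrary example chosen here, not a number from the article.

```python
# Count from 1 to 5 and show each value in base 2.
for n in range(1, 6):
    print(n, format(n, "b"))

# An arbitrary larger value: the binary form is long, the hex form compact.
value = 48879
binary_text = format(value, "b")  # sixteen binary digits
hex_text = format(value, "X")     # four hex symbols
print(binary_text, hex_text)      # 1011111011101111 BEEF
```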
The hexadecimal numbering system is base 16; that is, just as the familiar decimal system uses 10 symbols (0 through 9) in each place before adding a new place to express a higher number, the hex system uses 16 symbols in each place before adding a new place. The values zero through 9 are represented by the same 10 symbols as in the decimal system, and the symbols A, B, C, D, E, and F represent the values 10 through 15. Every hex symbol represents four bits of binary information (i.e., one of 16 possible values): hex symbol 0 = 0000; hex symbol 1 = 0001; hex symbol 2 = 0010; hex symbol B = 1011 (decimal 11); and so on.
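Because each hex symbol stands for exactly four bits, a hex number can be expanded into binary one symbol at a time. The following Python sketch illustrates this; `HEX_SYMBOLS` and `hex_to_binary` are hypothetical names introduced only for this example.

```python
HEX_SYMBOLS = "0123456789ABCDEF"  # position in the string = value 0..15


def hex_to_binary(hex_text):
    """Expand each hex symbol into its own 4-bit group."""
    groups = []
    for symbol in hex_text.upper():
        value = HEX_SYMBOLS.index(symbol)    # e.g. 'B' -> 11
        groups.append(format(value, "04b"))  # e.g. 11 -> '1011'
    return " ".join(groups)


print(hex_to_binary("B"))   # 1011
print(hex_to_binary("2B"))  # 0010 1011
```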
A special program called an assembler translates assembly-language code (also known as source code) into machine language (also known as object code). If changes to an assembly-language program alter the address of certain program items, such as instructions or variables, the assembler program computes the new addresses and modifies all references to them in the assembled program.
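The address bookkeeping described above can be sketched as a toy two-pass assembler in Python. Everything here is an assumption made for illustration: the mnemonics LOAD, ADD, JUMP, and HALT, their byte sizes, and the `label:` syntax do not correspond to any real processor or assembler.

```python
# Hypothetical instruction sizes in bytes (not a real instruction set).
SIZES = {"LOAD": 2, "ADD": 2, "JUMP": 3, "HALT": 1}


def resolve_labels(lines):
    """Pass 1: walk the program and record the address of every label."""
    addresses = {}
    location = 0
    for line in lines:
        if line.endswith(":"):
            addresses[line[:-1]] = location
        else:
            location += SIZES[line.split()[0]]
    return addresses


def assemble(lines):
    """Pass 2: replace label operands with the addresses from pass 1."""
    addresses = resolve_labels(lines)
    output = []
    for line in lines:
        if line.endswith(":"):
            continue  # labels occupy no space in the output
        parts = line.split()
        operands = [str(addresses.get(p, p)) for p in parts[1:]]
        output.append(" ".join([parts[0]] + operands))
    return output


original = ["JUMP done", "ADD 1", "done:", "HALT"]
edited = ["JUMP done", "LOAD 5", "ADD 1", "done:", "HALT"]
print(assemble(original))  # ['JUMP 5', 'ADD 1', 'HALT']
print(assemble(edited))    # ['JUMP 7', 'LOAD 5', 'ADD 1', 'HALT']
```

Inserting the LOAD instruction moves the label `done` from address 5 to address 7, and the reference in JUMP is updated without the programmer touching it.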
Assembly language is a useful tool even for today's programmer, but it does have a few disadvantages:
- Different processors have different instruction sets, registers, and memory configurations, requiring a separate assembly language for each processor.
- Assembly language code is difficult to read and maintain (though it is still easier to understand than machine language code).
- The cryptic nature of assembly language code makes it hard to document the functional meaning of a program, that is, what it is meant to accomplish.