C
Origins of C
C is a general-purpose, high-level, compiled programming language. Dennis Ritchie developed C in the early 1970s for use with the UNIX operating system running on Digital Equipment Corporation (DEC) PDP-11 computers at Bell Laboratories in Murray Hill, New Jersey. The UNIX operating system, most of its utility programs, and the C compiler itself were written in C. (C is sometimes erroneously thought to be inseparable from UNIX.)
Many of the important ideas of C originally came from the language BCPL, developed by Martin Richards. The influence of BCPL on C came indirectly through the language B, which Ken Thompson wrote in 1970 for the very first UNIX system, a DEC PDP-7. By 1972 B had evolved into C. (The original source code of the very first C compiler has been released into the public domain on the Internet at <http://cm.bell-labs.com/cm/cs/who/dmr/primevalC.html>.)
For a long time C was considered difficult to learn and its use was largely confined to DEC machines and dedicated programmers. C did not really escape into the wider world of the IBM-type Personal Computer (PC) and the Disk Operating System (DOS, forerunner of Windows) until the early 1980s. It was used briefly on the earlier microcomputer operating system CP/M, but CP/M was an 8-bit operating system and the consequent memory restrictions meant C was severely limited in what it could do. Around 1980, as machines built on Intel Corporation's 16-bit 8086 and 8088 microprocessors became available and could address far more memory than the 8-bit chips, C took off and several vendors turned out C compilers.
In 1983 the American National Standards Institute (ANSI) formed a committee, X3J11, to standardize the C language. After a long and difficult process the committee's work was ratified on December 14, 1989 as the standard ANSI X3.159-1989, which was published in the spring of 1990. ANSI C chiefly standardized existing practice, with a few additions from C++ (most notably function prototypes) and added support for multinational character sets. The ANSI C standard also formalized the C run-time library support routines. In 1990 the Standard was adopted as an international standard, ISO/IEC 9899:1990, and this ISO Standard replaced the earlier X3.159 even within the United States.
Uses of C
Programmers often consider C to be only a small step up from assembly language, in which the instructions for the computer are encoded as a series of mnemonics and symbols--a very detailed, specific, and difficult-to-read form of program. C is similar to assembly language in that it shares some of the basic concepts, such as representing operations and data as mnemonics and symbols, as well as allowing direct access to individual bits in the computer's memory. It is, however, more comprehensive than assembly language because it allows much more sophisticated operations using very concise language constructs. For example, C syntax allows programmers to form loops using "for" and "while" statements that the compiler will translate into a much larger number of machine instructions. More important, C embodies the concept of functions, which are like assembly language subroutines but much more powerful.
Functions are self-contained blocks of code that perform a set action and return a value of a known type. It is possible to pass variables to a function and have the function operate on them and even change them if desired. Another powerful characteristic of C is that programs can be written spanning multiple source files (blocks of C code created by a programmer) which can be compiled independently of each other. To form the final program, the independently compiled files are linked into a cohesive whole by the linker (see linkers and loaders). To allow code in one file to use code in another file, C has "header files" or "include files" that contain declarations of what functions and variables exist in the individual source files.
C has frequently been called a "systems programming language" because it works very close to the machine level. That is, it is easy to use C to communicate with and control the physical devices that make up the computer's hardware (e.g., to access disk files and chunks of memory or "memory buffers"). This makes C extremely useful for writing compilers and operating systems. Nevertheless, C is not bound to any one operating system, machine, or machine architecture.
For example, much of the Windows operating system is written in C, and the original Windows Application Programming Interface (API) is a C API, which means that most early Windows applications were also written in C. In industry C has been used to develop systems as diverse as Web sites and automated stock trading systems.
As C is a compiled language, C programs are written as human-readable source files and then translated by another program (a compiler) into a form computers can execute directly. The resulting instructions are often called machine instructions or machine code.
Data Types
In contrast to its predecessors, BCPL and B, C has a number of different built-in data types. This means that every item of data the program operates on is one of a finite number of defined flavors or types that have distinct characteristics and formats. The three built-in type categories are characters, integers, and floating-point numbers. Each of these comes in several sub-types, as discussed below.
The notion of type is pervasive in C. The value of a variable or the value returned from a function is partially determined by its type. For example, an integer type number, such as 42, cannot have a fractional part, but a floating-point type number, such as 42.237 x 10^4, can. Every variable in a C program has a type, and C broadly divides object types into two categories: simple and complex.
Simple types are the basic data types that the standard defines; they include signed and unsigned characters, signed and unsigned integers of various lengths, and floating-point numbers. The exact sizes (bit lengths) of these fundamental data types are not set by the language standard, which specifies only minimum ranges. Even the character type (type char) is guaranteed only to be at least eight bits wide, although it is exactly eight bits on nearly all current machines; by definition a char occupies one unit of storage as measured by the sizeof operator. The sizes of the other types are chosen by the compiler, and are usually determined by the size of the machine word in the computer hardware. This can cause real problems when trying to use programs on a kind of computer different from the kind they were written on.
ANSI C recognizes five complex (or derived) data types: arrays, functions, pointers, structures, and unions.
An array is a set of contiguously allocated objects in memory, all of which have the same type.
A function type is characterized chiefly by the type of the value the function returns. It is thus common to speak of integer functions, character functions, and so on. In ANSI C a function declared without an explicit type is assumed to return an integer; later revisions of the standard removed this default. A function's type is often called its return type. Functions can return values of basic types, structures, unions, or pointers. C also supports recursion, which occurs when a function calls itself. Many searching and sorting algorithms (which locate and order data) are most naturally expressed using recursion. Local variables--variables that are defined only within a function, not in the main body of the program--are generally automatic, that is, are created anew each time the function is called. C does not allow functions to be defined inside other functions, but variables may be declared inside any block of code.
A pointer type variable holds a value that gives the location (address) of an object (piece of information) elsewhere in memory. For instance, if a person's name is stored in memory, then a pointer to this object will contain the location in memory where it resides. One reason for using pointers is that they allow the programmer to pass references to data objects around in the program without having to pass the objects themselves--which are generally much bulkier than the pointers to them. This can make programs run much faster.
A structure type variable is a set of sequentially allocated objects called member objects that may all have different types. These member objects can have basic types or can be other derived types. A structure recording information about a person, for example, might have members that store the person's age, height, gender, address, and so on.
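A union type is conceptually like a structure type whose members overlap: all the members share the same storage, so an object of the union type can hold a value of only one of its member types at a time and can have different representations in different circumstances.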
Functions
C programs are composed of functions that operate in combination to perform some task. Every C program has at least one function, "main( )." (In source code the function's name is followed by parentheses, which enclose its parameter list; the body of the function comes after the parentheses, enclosed in braces.) Usually a C program will have many functions, perhaps hundreds or thousands, spread across many source files.
Probably the very simplest possible C program is the ubiquitous "Hello World" program encountered by all new programmers:
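- #include <stdio.h>  /* declares printf */
- int main(int argc, char *argv[])
- {
-     printf("Hello, World!\n");  /* write the greeting and a new line */
-     return 0;  /* report success to the operating system */
- }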
The "Hello World" program displays some features typical of all C programs. Within the bounds of a function, which are always delimited with curly braces {like so}, C programs are written as a series of statements, each one being terminated by a semicolon (";"). In the program above there is only one expression statement--printf("Hello, World!\n");--which writes the message "Hello, World!" on the screen and adds a new line after it.
Statements
C has five types of statements: expressions, flow controls, labels, empty statements, and block statements.
Expressions are the mainstay of the language. Expressions are formed from operators and operands (the items operators operate on). Expression statements manipulate data objects or communicate with the world outside the computer via its input/output devices. Assignments (e.g., "a = 7.95;") and function calls (e.g., "a = cos(3.145);") are the two most common types of expression statements. C expressions resemble conventional algebraic statements in that they are series of terms joined together by operators.
Flow control statements embody the decision logic that tells the executing program what action to carry out next depending on the values of certain variables or expressions. Flow control statements include selection, iteration, and jump statements that work together to direct program flow. C has all the basic flow-control constructions needed for writing properly structured programs: decision logic ("if" and "else" statements); selecting one of a finite number of cases ("switch" statements); looping with the termination test located either at the top ("while" and "for" loops) or at the bottom ("do"); early loop repeat ("continue"); and early loop exit ("break"). Flow control statements are often referred to as conditional statements because the route they choose is conditional on the value of some expression or variable.
Label statements are something of an anomaly in a structured language like C and are reminiscent of lower-level languages like assembly language. A label marks a pre-defined point in a function to which the program can jump using a "goto [label]" instruction. This is analogous to the "jump [address]" instruction found in assembly language.
The null statement is a strange one, being seemingly useless. It does, however, make programs readable and more compact by taking the place of a "real" statement when C syntax demands it. For example, some expressions and functions have very desirable side effects in C, and when these are used in conditional expressions, the null statement fills the syntactically required gap without actually doing anything.
Finally, block statements are blocks of other statements (including, if necessary, block statements) delimited by pairs of braces. Block statements return no value and exist to allow the programmer to write several statements where the syntax requires or allows only one.
Memory Management
The C language itself does not define any memory-storage allocation facility other than static definition and the stack discipline provided by the local variables of functions. The "heap" is managed through standard library functions such as malloc and free rather than by the language, and there is no "garbage collection" (automatic reclamation of unused memory, employed by some other languages). What this means is that the programmer is almost exclusively responsible for allocating and then freeing up the chunks of the computer's memory that a program uses. This has made C notorious as a language whose programs suffer from "memory leaks" (where memory is allocated and never released) and "crashes" (where the program reads or writes memory areas that do not belong to it, introducing fatal errors). Even the best programmers make mistakes in memory manipulation.
Conclusion
C is a relatively "low-level" language. This characterization is not a negative thing; it simply means that C deals with the same sort of objects that most computers do, namely characters, numbers, and addresses.
Despite its memory-management drawbacks and relatively low-level nature, C is a very functionally rich language and its closeness to the hardware level makes it a very fast, powerful, and flexible tool for writing programs. Even though it has been overtaken in recent years by C++ (of which C is very nearly a subset), Java, and even Perl, the number of extant programs written in C and the amount of current development still going on in C assure that it will continue to be around for a long time to come.