A randomized algorithm is one which is able to take one or more steps in a non-deterministic fashion, on purpose. Such algorithms are used in many cases. The correctness, or the worst-case or average performance of many deterministic algorithms is often derived from an "adversary argument" which posits the existence of a hypothetical "adversary" who is able to arrange things so as to make things as difficult as possible for the algorithm. In this context, it is often assumed that the adversary has full knowledge of the expected behavior of the algorithm. A randomized algorithm which may behave in several ways at one or more steps may be thought of as a probability distribution on a set of deterministic algorithms (one for each possible choice the randomized algorithm may make). An adversary in such a situation may not be able to devise a single input strategy that defeats any randomly chosen algorithm from this set. This is the paradigm that lies behind the success of randomized algorithms.
Randomized algorithms are used for many reasons, some of which we look at below.
Random sampling--if a population space of items to be evaluated is very large so that not every member of the space can be measured, then a randomization procedure is applied to draw a sample whose characteristics are likely to be close to that of the whole population itself. A real-life example of random sampling is in political polls and the like (where instead of asking the whole electorate whom they will vote for in the next election, a sample of a few thousand or so is asked, and the results extrapolated for the whole community), but in computer science the technique is used most notably in selection algorithms, geometric and graph algorithms, and approximate counting.
Abundance of witnesses--if it is required for an algorithm to determine whether a given input has a certain property (such as whether a number given is prime), then the problem is often solved by finding a "witness" which can attest to the property in the input, or conversely, by showing that no witness exists. In many problems, the search space in which to look for such a witness is very large, and an exhaustive search cannot be made. However, if one is able to establish that there are many witnesses in the search space, it is possible to draw candidates from that space at random and test whether they are witnesses. If the randomization is done well, then after a limited number of trials it can be said with a high degree of assurance that if no witness is found, then the input does not have the required property.
Fingerprinting--a long input such as a string can often be represented by a short "fingerprint" using a random algorithm. If two inputs are to be compared, for example, it may suffice to show that their fingerprints are identical, rather than having to compare the complete inputs in every detail.
Random reordering--in card games, it is almost always the practice to shuffle the deck well before the cards are dealt. The reason for this is that shuffling removes any patterns there may be in the deck (from previous games, etc.), and makes the game fair. Likewise, in many sorting algorithms and the like, it is a good idea to use a pre-processing step and randomly reorder the input before the main algorithm is applied. The chances that the reordered input will be a pathological worst-case for the algorithm is then reduced considerably.
Load balancing--in parallel or distributed computing applications where it is desirable for processors to share the computing burden evenly, randomization is often a good technique to spread the load around. Deterministic load-balancing is usually not possible because real-time global information concerning all processors is not available at any one location in the system.
Randomized algorithms are of two basic types, called Las Vegas and Monte Carlo algorithms. A Las Vegas algorithm will always give a correct result, but its performance with respect to running time and use of other resources may vary. A Monte Carlo algorithm, by contrast, is one that may fail. Monte Carlo algorithms that compute decision problems--those for which the answer is always YES or NO, such as, "Can this non-planar graph be colored with five or fewer colors?"--may be of two types, having either one-sided error or two-sided error. A Monte Carlo algorithm with one-sided error can only be wrong one way; for instance, it may be wrong when it says NO, but it will always be right if it says YES. One with two-sided error can be wrong both ways.
Genetic algorithms are a special kind of randomized algorithms which scour a large search space using randomization.
One important issue in all randomization procedures used in algorithms—or even in the real world--is how one obtains a random value for use. In the real world, it is common to toss a coin or roll a die; if there are no structural deformities that bias the coin or the die to land a particular way, we are able to say that the result is a random value. However, within computers, the problem is more difficult. It has been shown mathematically that no deterministic algorithm can be used to generate truly random numbers. However, a sufficiently complex and well-designed program or routine may generate "pseudo-random numbers" that look sufficiently random unless one understands the workings of the program used to create them. Such a routine is usually begun by taking as input a specific value, called a "seed," and then generates a fixed but random-looking sequence. Every sequence of invocations of the routine starting with a particular seed is the same, which can be useful in some cases. If such repetitive behavior is not desirable, it is common practice to seed a pseudo-random generator using an unpredictable variable, such as the process id or the time of day. (Every process on Unix and other operating systems has a unique process id, one that cannot easily be predicted because on a larger system, many processes are created and killed constantly and the complete process history since the most recent start of the system is not known. Likewise, the time-of-day with an accuracy of microseconds is almost impossible to fix with any accuracy by an adversary.)
This is the complete article, containing 1,044 words
(approx. 3 pages at 300 words per page).