Sorting
In many cases it is necessary to take a sequence of values and sort them in some specified order. For example, a list of class examination scores may be sorted by decreasing order of scores, or by the identification number of the students in the class. In most cases, there is a record with several fields, such as student's name, number, score, and grade. A field such as the score which is sorted against is called a key, while other fields are considered ancillary to it and are referred to as satellite data. When a sorting algorithm rearranges the order of the key fields, it rearranges the satellite fields as well. However, these details are often irrelevant in considering the question of the methods to be used for sorting, as details of implementation are usually left to the programmer. For purposes of understanding the algorithms, we pretend that the key fields are the only ones that exist.
The simplest sorting algorithm is called bubble sort. Here, the list is scanned in order, and when a key value is found to be in the wrong position relative to its immediate neighbor, the two are switched. The sorting is not accomplished in one pass over the entire sequence, however, and as many passes are needed as there are values to be sorted. The algorithm is given its name because a value that needs to move into a different position in the sequence "bubbles through" the sequence one step at a time.
Another simple sorting algorithm is called insertion sort, which proceeds the way people sort a hand of cards while playing a game like gin rummy. The algorithm simply starts with an empty sequence, and inserts each element into its right place respective to the others. This algorithm has the disadvantage of requiring greater storage space, and is not as efficient as possible.
Merge sort is another algorithm that uses the divide-and-conquer approach by breaking up the sequence to be sorted into smaller subsequences, and then sorting them using simple methods. In fact, when the size of each individual subsequence is 1, there is no real work to be done, as each such subsequence is trivially sorted. Using the divide-and-conquer approach, it is then possible to combine the smaller sorted sequences to obtain the total solution.
Two of the other commonly used sorting algorithms are heap sort and quick sort.
Heap sort relies on a special data structure called a heap, which is a complete binary tree with values stored at the nodes, such that each node in the tree has a value no larger than its parent. Because the root of the heap holds the largest value, one can obtain the largest value in the heap immediately simply by reading the root. After the root is read, its value is destroyed, and a value from another node is moved up to take its place. This disturbs the heap so that the "heap property" that no node can be smaller than its child no longer holds, so it is necessary to heapify the data structure again by switching any parent and child wherever the parent is smaller than the child. In this way, the heap is restored again. By repeatedly reading the root and restoring the heap, it is possible to obtain the elements in decreasing order. Alternatively, we could define the heap to have the property that no parent is larger than its child, and obtain them in increasing order as well.
Quicksort is very similar to merge sort in its use of the divide-and-conquer method and recursive solution.
- Divide--the array to be sorted is partitioned into two parts such that every element in the first part is less than or equal to every element in the second part.
- Conquer--the two parts are sorted by recursive calls to quicksort.
- Combine--since the parts are in the correct positions relative to each other, no further work is required, and the final array is readily obtained.
The previous sorting algorithms all involve comparisons between the sequence of values that are to be sorted. In some cases, it is not necessary to carry out comparisons in sorting, and there is a well-known result that says that if one does use comparisons in sorting, then the performance achieved is not going to be proportional to the size of the input array but will be larger, while if comparisons are not used, then such proportional results could be achieved. If one has an input array of size 10000, for instance, the difference could be of a factor of between 4 and 10000 in the time taken. For this reason, in some cases, it is preferable to use methods like counting sort and radix sort, which do not involve comparisons. For example, if it is known that all student identification numbers in a school run from 1 to 2300, then it is not necessary to compare values while sorting student records by identification number; one may place each value appropriately simply as determined by its correct value.
This is the complete article, containing 829 words
(approx. 3 pages at 300 words per page).