Forgot your password?  

Not What You Meant?  There are 15 definitions for Big O.  Also try: Sort or Order (sort).

Sorting Algorithms | Research & Encyclopedia Articles

Print-Friendly   Order the PDF version   Order the RTF version
About 5 pages (1,412 words)
Sorting algorithm Summary

 


Sorting Algorithms

Sorting is the process of taking an initially unordered array of records and arranging them in some specific target order. The target order is specified by a comparison function which, given two records, decides which one should come first. For example, if the records are two integers, the comparison function may be the rule: "put the smaller number earlier in the list." There are many sorting algorithms, and some can be quite computationally complex. The main justification for going through the process of sorting is that once it is done it is relatively easy to find records in the sorted list using fast algorithms such as binary search.

An important criterion used to rate sorting algorithms is their running time. Running time is measured by the number of computational steps it takes the sorting algorithm to terminate when sorting n records. We say that an algorithm is O(n2) (read, "of order n-squared") if the number of computational steps needed to terminate as n tends to infinity increases in proportion to n2

An important theoretical constraint on the speed of sorting algorithms can be obtained from the following reformulation of the sorting problem. For n records there are n! possible linear arrangements, only one of which is the correct, sorted set of records. (n! = 1 x 2 x 3 x . . . [n -2] x [n - 1] x n.) One may imagine these n! arrangements as the leaves of a binary decision tree. The process of sorting then involves starting at the root of the tree (the unordered initial list) and traveling down the nodes, choosing either the left or right node based on results of comparisons. (In this terminology, the "root" of a tree is at the top, and all the branches expand downward.) The maximum number of steps needed would correspond to the height of the tree, which is log n!. Using Stirling's approximation of log n! for large n, this process would be O(n log n). This is the fastest we could hope for in a sorting algorithm based on comparisons.

Examples of some comparison-based sorting algorithms are bubble sort, selection sort, and insertion sort, all of which run quite slowly since they are O(n2). Selection sort is arguably the simplest. This algorithm simply scans the list for the smallest record (assuming, for this example, that record magnitude is the relevant sorting criterion) and switches that record's position with the record in the first position. Then it scans through the remaining n - 1 records, finds the smallest, and switches it with the record in second position, and so on until all the records have been arranged in ascending order.

Bubble sort, instead of going directly for the smallest record, works on the premise that whenever two adjacent records are out of order, they should be switched. On the first pass bubble sort goes down the array of records and compares each pair of records. If record i + 1 is smaller than record i than it switches them. On a single pass the list will not be sorted, but the largest record will definitely be at the end. After n such passes the list is ordered. The name "bubble sort" is derived from the image of smaller items bubbling up to the top of the list. Although both bubble sort and selection sort are O(n2), bubble sort is usually slower, since more exchanges are involved on average, and exchanging large records can be an expensive operation in terms of computation time.

Insertion sort is the algorithm most people use automatically when ordering a deck of cards. At any given point in the algorithm, insertion sort maintains in the first j occupied positions of the record array a sorted list of records. The remaining n - j records are unsorted and need to be inserted into the sorted list one by one. The first record in the unsorted list, in position j + 1, is inserted into the appropriate place in the sorted list, moving any records down if needed to make room. As the algorithm continues in this manner, the sorted list gets larger and the unsorted list gets smaller until the whole list is sorted.

All these algorithms are O(n2), far short of the theoretical limit for comparison searches, which is O(n log n). Heap sort is an algorithm that approaches this limit. The key to its success is the underlying data structure of a heap, which is a complete binary tree, that is, a binary tree where each node has 0 or 2 children, except possibly nodes at the next to highest level. The nodes themselves contain the data to be sorted and satisfy the heap property, which states that for every node other than the root, the value of that node is larger than or equal to that of its parents. Note that this forces the root to contain the largest record. One can put the nodes of the heap into one-to-one correspondence with positions in an array simply by numbering the nodes in order level by level from top to bottom and from left to right within each level. Under this scheme the root (at the very top) is the first element in the array, and the rightmost leaf at the lowest level is the last element of the array. Thus the heap and the array of data to be sorted can be implemented in exactly the same locations in memory and no extra memory is needed. In fact one should think of the heap and the array as one and the same object; the heap is just another way of looking at the array.

We will now see how to use this heap data structure to order the array. We first assume that we can somehow impose a heap structure on the array by rearranging the array elements so that the heap condition is satisfied. Then the largest record will be at the root of the heap or, equivalently, in the first position in the array. If we switch the first record with the last record, the largest record will be in the last position. Now if we ignore the last position, which is equivalent to ignoring the rightmost leaf of the lowest level of the heap, we are left with a smaller heap except for one problem: the root node does not satisfy the heap property since it cannot be larger than its children. A procedure called heapification fixes this problem. Heapification merely amounts to letting the root node contents bubble down the tree by successively exchanging them with the contents of its immediate children until the heap property is satisfied. Then we are left with the original situation where the root note contains the largest element. We again exchange it with the record in position n - 1, ignore the last two positions, and heapify to get a smaller heap. After repeating this n times we get a fully sorted array. The process of heapification is O(log n) since log n is the height of the binary tree, or the longest distance the root contents can bubble down in order to satisfy the heap property. Thus the whole algorithm is O(n log n), provided we can build the initial heap quickly. It turns out that the initial heap can be built in O(n) time by iteratively using the heapify algorithm from the bottom up. Heap sort thus achieves the theoretical limit.

Whereas heap sort achieves its O(n log n) efficiency by making use of a binary tree structure, another algorithm, quicksort, achieves the same efficiency on average by using a recursive divide-and-conquer strategy. The idea behind quicksort is that the given array is first partitioned into two nonempty subarrays such that each element of the first subarray is less than or equal to each element of the second. The two subarrays are then themselves sorted by recursive calls to quicksort. It is clear that this recursive process will completely sort the array. The partitioning itself is done simply by choosing the first element of the array to be partitioned as an element around which to pivot. Any element less than or equal to this first element is inserted directly below it. This achieves the partitioning. Although in the worst case scenario, this strategy can yield an O(n2) algorithm depending on how unbalanced or unequal in size the resulting partitions are, on average the partitions are usually quite balanced and the average behavior is O(n log n).

This is the complete article, containing 1,412 words (approx. 5 pages at 300 words per page).

Ask any question on Sorting algorithm and get it answered FAST!
Answer questions in BookRags Q&A and earn points toward
discounted or even FREE Study Guides and other BookRags products!
Learn more about BookRags Q&A
Copyrights
Sorting Algorithms from World of Computer Science. ©2005-2006 Thomson Gale, a part of the Thomson Corporation. All rights reserved.

Join BookRagslearn moreJoin BookRags

Join BookRagslearn moreJoin BookRags