Sorting

Introduction

There are many algorithms for sorting. Some are faster than others. We can analyze various algorithms to determine their complexity--that is, how fast they run as a function of the size of the input. A bubble sort runs "on the order of n2" where n is the number of values being sorted. This is denoted using "big oh" notation [see Section 11.3 of the text]

O(n2)

and means that, given n values, an algorithm's execution requires at most some constant times n2 operations. These operations are typically comparisons ( ==, !=, <, <=, >, or >=), arithmetic operations (+, -, *, /, %), or assignments (=). So, a bubble sort of 100 numbers will take some constant times 1002 operations to perform its work. [We usually don't worry about the constant and just talk about the 10,000 operations since the constant is usually a lot smaller than n or n2.

An algorithm having complexity

O(n)

takes only on the order of n operations (multiplied y some constant, which we'll ignore) and a sorting algorithm having that complexity would be much better to use than a bubble sort. Unfortunately, the best sorting algorithm has complexity

O(n * log2n)

[log2n is a value x such that 2x = n. For example, log28 = 3 since 23 = 8, log21024 = 10 since 210 = 1024, and  log2100 = 6.643856 since 26.643856 = 100 [well, pretty close!]. So, the best sorting algorithm for 100 values requires on the order of 100 * 6.643856 = 664 operations. That is a significant improvement over 10,000! Now, imagine the improvement when sorting 1,000 or 10,000 values--that is n=1,000 or n=10,000.

In this lab, you will look at six sorting algorithms having various complexities (and constants). In all cases, assume we are sorting an array of n elements--that is, a prototype for such a function is:

void sort(int a[], int n);

References to a and n in the following sections are to the array being sorted and to its length, respectively.

Bubble Sort

In a bubble sort, successive pairs of elements are compared and, if necessary swapped. That is, for all values of i between 0 and n-2, compare a[i] and a[i+1]. If a[i] > a[i+1] then swap them in a. Repeat this process of passing over the n array elements a total of n times--hence, O(n2).

This algorithm can be improved by noticing that after one pass over the array, the largest element must have "bubbled" down to its correct position in a[n-1] (the last element in a). Thus, on the second pass over a, only the elements between index values 0 and n-2 need be processed. On the next pass,only the elements between index values 0 and n-3 need be processed. An so on.

This improvement seems to really speed up sorting, but the algorithm is still O(n2). Why? Because the first pass processes n elements, the next (n-1) elements, the next (n-2) elements and so on down to 1 element. The total is thus

n + (n-1) + (n-2) + ... + 2 + 1 = (n+1)(n) / 2 = (n2 + n)/2

[Aha, that theorem proved in MATH 261 proves useful after all!] The constant 1/2 can be ignored and the "+ n " part doesn't add much compared to n2 when n gets pretty big.

The algorithm can be improved even further by keeping track of whether a swap was needed on any particular pass. As soon as a pass is made for which no swaps were necessary, then the array is sorted and no more processing is required.

Bi-Directional Bubble Sort

The bubble sort algorithm can be modified to work in two directions. On one pass, work from "top" to "bottom" to move the largest value to the "bottom" of the array. On the next pass work from "bottom" to "top" to move the smallest value to the top. Alternate forward and backward passes until the whole array is sorted.

Insertion Sort

The insertion sort algorithm is described in Section 11.2 [pages 428-430]. Insertion sort follows the way that people typically put playing cards in their hands when the cards are dealt:

pick up the first card and put it in your hand;
while ( cards are being dealt ) {
    pick up the next card and, starting from the left, find the position at
        which the card belongs and insert it there;
}

Selection Sort

The selection sort algorithm is described in Section 10.4 of the textbook. [pp. 384-386]. The algorithm is:

for each index position i in a:
    find the index j in the range i .. n-1 that contains the smallest data value;
    swap the values at positions i and j;

Note that i and j could sometimes be the same position. When that happens, swapping the value still works, so that special case is hardly ever tested for in the code.

Mergesort

The mergesort algorithm is based on a problem-solving technique known as divide and conquer--to take a problem, break it up into smaller problems that are easier to solve, and then synthesize solutions to the simpler problems into a solution to the big problem. Recursion usually plays a central role in divide-and-conquer techniques. [Mergesort is described in Section 18.4.]

Mergesort for an array a of n elements takes the following approach:

if ( n <= 3 ) {
    sort a using another algorithm;
}
else {
    use mergesort [recursively] to sort the "top" half of a;
    use mergesort [recursively] to sort the "bottom" half of a;
    merge the sorted top and bottom halves of a into a sorted sequence;
}

The merge operation is commonly used in data processing and is straightforward. [We discussed it in connection with lists of accounts.]

Quicksort

People familiar with sorting algorithms will almost undoubtedly list quicksort as the fastest sorting algorithm [see Section 18.2]. It has a complexity of O(n * log2n).

Quicksort works by first picking one of the values in the array, called the pivot, (picked pretty much at random) and then finding the location in the array at which the value belongs once the sort has completed--that is, position the pivot in the array such that all the values "above" it are less than the pivot value and such that all the values "below' it are greater than the pivot value. Then use quicksort [recursively] to sort the part of the array "above" the pivot in its found location and use quicksort [recursively] to sort the part of the array "below" the pivot.

Exercise

Work alone for this lab.
  1. Create a subdirectory and copy the following programs into that directory:
  2. Run the various sort programs you find there to see how they work. Try to put the algorithms in order according to the time needed to sort--that is, which is fastest? Next fastest? And so on. Make a list and send it via e-mail to csci208@cs.winthrop.edu.

Last updated Wednesday, July 04, 2001 12:25 PM by Will Thacker