Friday, February 14, 2020

Stack Abuse: Selection Sort in Python

Introduction

Sorting, although a basic operation, is one of the most important operations a computer should perform. It is a building block in many other algorithms and procedures, such as searching and merging. Knowing different sorting algorithms could help you better understand the ideas behind the different algorithms, as well as help you come up with better algorithms.

The Selection Sort algorithm sorts an array by finding the minimum value of the unsorted part and then swapping it with the first unsorted element. It is an in-place algorithm, meaning you won't need to allocate additional lists. While slow, it is still used as the main sorting algorithm in systems where memory is limited.

In this article, we will explain how the Selection Sort works and implement it in Python. We will then break down the actions of the algorithm to learn its time complexity.

Selection Sort

So how does the selection sort work? Selection sort breaks the input list in two parts, the sorted part which initially is empty, and the unsorted part, which initially contains the list of all elements. The algorithm then selects the minimum value of all the unsorted file and swaps it with the first unsorted value, and then increases the sorted part by one.

A high level implementation of this sort would look something like this:

def selection_sort(L):
    for i in range(len(L) - 1):
        min_index = argmin(L[i:])
        L[i], L[min_index] = L[min_index], L[i]

In the above pseudocode, argmin() is a function that returns the index of the minimum value. The algorithm uses a variable i to keep track of where the sorted list ends and where the unsorted one begins. Since we start with no sorted items and take the minimum value, it will always be the case that every member of the unsorted part is greater than any member of the sorted part.

The first line increments the value of i, the second line finds the minimum value's index, and the third line swaps those values. Swapping works because Python calculated the right-hand side before assigning anything to the left-hand side, so we don't need any temporary variables.

Let's see how it works in action with a list that contains the following elements: [3, 5, 1, 2, 4].

We begin with the unsorted list:

  • 3 5 1 2 4

The unsorted section has all the elements. We look through each item and determine that 1 is the smallest element. So, we swap 1 with 3:

  • 1 5 3 2 4

Of the remaining unsorted elements, [5, 3, 2, 4], 2 is the lowest number. We now swap 2 with 5:

  • 1 2 3 5 4

This process continues until the list is sorted:

  • 1 2 3 5 4
  • 1 2 3 4 5
  • 1 2 3 4 5

Let's see how we can implement this in Python!

Implementation

The trick to implementing this algorithm is keeping track of the minimum value and swapping two elements of the list. Open a file named sort.py in your favorite editor and enter the following code in it:

def selection_sort(L):
    # i indicates how many items were sorted
    for i in range(len(L)-1):
        # To find the minimum value of the unsorted segment
        # We first assume that the first element is the lowest
        min_index = i
        # We then use j to loop through the remaining elements
        for j in range(i+1, len(L)-1):
            # Update the min_index if the element at j is lower than it
            if L[j] < L[min_index]:
                min_index = j
        # After finding the lowest item of the unsorted regions, swap with the first unsorted item
        L[i], L[min_index] = L[min_index], L[i]

Now let's add some code to the file to test the algorithm:

L = [3, 1, 41, 59, 26, 53, 59]
print(L)
selection_sort(L)

# Let's see the list after we run the Selection Sort
print(L)

You can then open a terminal and run to see the results:

$ python sort.py
[3, 1, 41, 59, 26, 53, 59]
[1, 3, 26, 41, 53, 59, 59]

The list was correctly sorted! We know how it works and we can implement the Selection Sort in Python. Let's get into some theory and look at its performance with regards to time.

Time Complexity Calculation

So how long does it take for selection sort to sort our list? We are going to take an approach and calculate exactly how much time the selection sort algorithm takes, given an array of size n. The first line of the code is:

def selection_sort(L):

This line shouldn't take that much since it's only setting the function stack. We say that this is a constant - the size of our input does not change how long it takes for this code to run. Let's say it takes c1 operations to perform this line of code. Next, we have:

for i in range(len(L)-1):

This one is a little trickier. First of all, we have two function invocations, len() and range(), which are performed before the for loop begins. The cost of len() is also independent of size in CPython, which is the default Python implementation on Windows, Linux, and Mac. This is also true for the initialization of range(). Let's call these two together c2.

Next, we have the for, which is running n - 1 times. This is not a constant, the size of the input does make an impact on how long this is executed. So we have to multiply whatever time it takes for one loop to complete by n - 1.

There is a constant cost for evaluating the in operator, let's say c3. That covers the outer for-loop.

The variable assignment is also done in constant time. We'll call this one c4:

min_index = i

We now encounter the inner for-loop. It has two constant function invocations. Let's say they take c5 operations.

Note that c5 is different from c2, because range here has two arguments, and there is an addition operation being performed here.

So far we have c1 + c2 + (n - 1) * (c3 + c4 + c5) operations, and then our inner loop begins, multiplying everything by...? Well, it's tricky, but if you look closely, it takes n - 2 times in the first loop, n - 3 in the second one, and 1 in the last time.

We need to multiply everything by the sum of all numbers between 1 and n - 2. Mathematicians have told us that the sum would be (n - 2) * (n - 1) / 2. Feel free to read more about the sum of integers between 1 and any positive number x here.

The contents of the inner loop are completed in constant time as well. Let's assign the time it takes Python to do the in, if, assignment statement and the variable swap take up an arbitrary constant time of c6.

for j in range(i+1, len(L)-1):
    if L[j] < L[min_index]:
        min_index = j
L[i], L[min_index] = L[min_index], L[i]

All-together we get c1 + c2 + (n - 1) * (c3 + c4 + c5) + (n - 2) * (n - 3) * c6 / 2.

We can simplify this to: a * n * n + b * n + c, where a, b and c representing the values of the evaluated constants.

This is known as O(n2). What does that mean? In summary, our algorithm's performance is based on the squared size of our input list. Therefore, if we double the size of our list, the time it takes to sort it would be multiplied by 4! If we divide the size of our input by 3, the time would shrink by 9!

Conclusion

In this article, we looked at how Selection Sort works and implemented it in Python. We then broke the code down line by line to analyze the algorithm's time complexity.

Learning sorting algorithms will help you get a better understanding of algorithms in general. So, in case you haven't already, you can check out our sorting algorithms overview!



from Planet Python
via read more

No comments:

Post a Comment

TestDriven.io: Working with Static and Media Files in Django

This article looks at how to work with static and media files in a Django project, locally and in production. from Planet Python via read...