Tuesday, June 15, 2021

Stack Abuse: Counting Sort in Python

Introduction

Counting sort is a sorting algorithm used to sort elements of an array in linear time. We usually use Counting Sort to sort integer arrays.

Counting Sort a stable, non-comparative algorithm.

Non-comparative sorting algorithms perform sorting without any comparison between elements to be sorted.

Stable sorting algorithms preserve the relative order of elements with the same value in the sorted array. That means that the relative order of two same-value elements in the original array will be the same as their relative order in the sorted array.

Stable sorting

Counting sort is not an in-place algorithm, it uses an auxiliary array to sort elements of an input array.

How Does Counting Sort Work?

Let's first take an intuitive look at how the algorithm works.

Assume that we have the array I = [2, 2, 0, 6, 1, 9, 9, 7] and we want to sort it using. We'll call the array I the input array.

Counting Sort works by counting the number of elements, that fit a distinct key value, and then calculates the positions of each key.

First of all, we need to find the element with the highest value, we'll call it the maximum element - maxElement = 9.

Then, we'll create an auxiliary array with maxElement+1 elements, called the count array (C). We'll use it to store the number of occurrences of each individual element in the input array I. Therefore, all counts should be initialized to 0:

           C = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0] # Count array     
 # indices: 0  1  2  3  4  5  6  7  8  9

Now, we need to go through the following steps:

1. Go over each element of the input array and increase its corresponding count by 1

For example, if we come across an element with the value of 2 in the input array (I), we add 1 to the element with the index 2 in the count array:

    I = [2, 2, 0, 6, 1, 9, 9, 7] # The first element is 2
         ^
        
    C = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0] # We increase count of 2nd element by 1
#indices: 0  1  2  3  4  5  6  7  8  9

After this step, the count array will store the number of occurrences of each element in the input array:

     C = [1, 1, 2, 0, 0, 0, 1, 1, 0, 2] 
#indices: 0  1  2  3  4  5  6  7  8  9
   
# Element 0 has 1 occurrence
# Element 1 has 1 occurrence
# Element 2 has 2 occurrences 
# Element 3 has no occurrences...

2. For each element in the count array, sum up its value with the value of all its previous elements, and store that value as the value of the current element:

     C = [1, 2, 4, 4, 4, 4, 5, 6, 6, 8] 
#indices: 0  1  2  3  4  5  6  7  8  9
# Element  0 = 1
# Element  1 = 1 + 1
# Element  2 = 1 + 1 + 2
# Element  3 = 1 + 1 + 2 + 0
#...

This way, we are storing the cumulative sum of the elements of the count array, on each step.

3. Calculate element position based on the count array values:

To store this sorted sequence, we'll need to create a new array. Let's call it the output array (O), and initialize it with k zeros, where k is the number of elements in the input array:

     O = [0, 0, 0, 0, 0, 0, 0, 0] // Initialized output array
#indices: 0  1  2  3  4  5  6  7 

For each element I[i] (starting from the end) in the input array:

  1. Find the index in the count array that is equal to the value of the current element I[i]
    • That's the element C[j] where j=I[i]
  2. Subtract 1 from the value of the C[i]
    • Now we have newValue = C[i]-1
  3. Store the I[i] to the O[newValue]
  4. Update the C[i] with the newValue

Counting sort

In the end, the output array contains the sorted elements of the input array!

Counting Sort - Python Implementation

Now, with all that out of the way - let's go ahead and implement Counting Sort in Python:

def countingSort(inputArray):
    # Find the maximum element in the inputArray
    maxElement= max(inputArray)

    countArrayLength = maxElement+1

    # Initialize the countArray with (max+1) zeros
    countArray = [0] * countArrayLength

    # Step 1 -> Traverse the inputArray and increase 
    # the corresponding count for every element by 1
    for el in inputArray: 
        countArray[el] += 1

    # Step 2 -> For each element in the countArray, 
    # sum up its value with the value of the previous 
    # element, and then store that value 
    # as the value of the current element
    for i in range(1, countArrayLength):
        countArray[i] += countArray[i-1] 

    # Step 3 -> Calculate element position
    # based on the countArray values
    outputArray = [0] * len(inputArray)
    i = len(inputArray) - 1
    while i >= 0:
        currentEl = inputArray[i]
        countArray[currentEl] -= 1
        newPosition = countArray[currentEl]
        outputArray[newPosition] = currentEl
        i -= 1

    return outputArray

inputArray = [2,2,0,6,1,9,9,7]
print("Input array = ", inputArray)

sortedArray = countingSort(inputArray)
print("Counting sort result = ", sortedArray)

Running the code above will produce the following output:

Input array =  [2, 2, 0, 6, 1, 9, 9, 7]
Counting sort result =  [0, 1, 2, 2, 6, 7, 9, 9]

The Complexity of the Counting Sort Algorithm

The Counting sort algorithm uses only simple for and while loops without any complex recursions and subroutine calls, therefore, its complexity analysis is a pretty straightforward process.

Before we dive into the complexity analysis, let's label the length of the input array as n and the value of the maximum element in the input array as k.

Time Complexity

The first step of the algorithm iterates over the input array n times in order to initialize the count array, so it has the complexity of O(n).

The second step iterates over the count times k times in order to calculate the cumulative sum of each element, so it has the complexity of O(k).

The third step performs the sorting based on the counting array, so it has to iterate in a while loop n times, therefore it has the complexity of O(n).

Collectively, the time complexity of the Counting Sort algorithm is O(n+k).

Space Complexity

Counting sort uses input and output array, both of length n and one count array of length (k+1).

Therefore, the total space that this algorithm uses is O(n+k).

Conclusion

All in all, Counting Sort is a great and efficient, yet simple sorting algorithm. In ideal circumstances, it is really easy to understand and learn but still manages to maintain linear complexity.

The real problem occurs when the value of the largest element k exceeds the number of elements in the input array n. As the k approaches , the time complexity of counting sort gets closer to O(n²), which is a horrible time complexity for a sorting algorithm. Therefore, it's not recommended to use counting sort if the input array has a large range of values.

Ideally, we will use Counting Sort to sort some integer arrays with a small range of values or as a subroutine for some other soring algorithm, such as Radix Sort. That way, we will ensure maximizing the full potential of the counting sort, while still avoiding all of the suboptimal use cases.



from Planet Python
via read more

No comments:

Post a Comment

TestDriven.io: Working with Static and Media Files in Django

This article looks at how to work with static and media files in a Django project, locally and in production. from Planet Python via read...