Wednesday, December 23, 2020

Python Pool: TimSort: Algorithm and Implementation in Python

Hello coders!! In this article, we will learn about the TimSort algorithm and learn its implementation in Python. Tim Peters created TimSort in the year 2002 to improve the sorting performance of the list. sort() function makes use of this algorithm and is the fastest sorting algorithm.

What is Python TimSort Algorithm?

TimSort is a sorting algorithm that is a hybrid of merge sort and insertion sort algorithm. It is a stable algorithm and works on real-time data. In this, the list that needs to be sorted is first analyzed, and based on that the best approach is selected.

Python TimSort Algorithm:

  1. Divide the array into blocks known as run
  2. The size of a run can either be 32 or 64
  3. Sort the elements of every run using insertion sort
  4. Merge the sorted runs using the merge sort algorithm
  5. Double the size of the merged array after every iteration

Understanding the algorithm in detail:

Calculation of minrun:

We know now that the first step of the Python Timsort algorithm is to divide the blocks into runs. minrun is the minimum length of each run. It is calculated from the number of elements of the array N.

  • It should not be very long as we will implement insertion sort to sort each run and we know that insertion sort works more efficiently for shorter arrays
  • However, it should also not be too short as the next step is to merge these runs, shorter runs will result in more number of runs
  • It is beneficial if N/minrun is a power of 2 as it will result in the best performance by merge sort.

Splitting and sorting of runs:

When we reach this step, we already have two values – N and minrun

  • descending flag is by the comparison between the first 2 items, if there is only 1 item left, then it is set as false.
  • Then the other elements are iterated and checked whether they are still in ascending or “strict” descending order until an item that does not follow this order is found.
  • After this, we will have a run in either ascending or “strict” descending order. If it is in a “strict” descending order we need to reverse it.
  • Then we check to make sure that the length of the run satisfies minrun. If it is smaller, we compensate for the following items to make it achieve the minrun size.

merging of runs:

  • At first, we will create a temporary run having the size of the smallest of the merged runs.
  • After this, we need to copy the shortest run into the temporary one.
  • Now, we will mark the first element of the large and temporary array is as the current position.
  • In each following step, we will compare the elements in the large and temporary arrays, and the smaller ones will be moved into a new sorted array.
  • We need to repeat the above step until one of the arrays runs out.
  • Lastly, we will add All the remaining elements to the end of the new one.

Implementation of Tim’s Algorithm in Python:

MINIMUM= 32

def find_minrun(n): 

        r = 0
        while n >= MINIMUM: 
                r |= n & 1
                n >>= 1
        return n + r 

def insertion_sort(array, left, right): 
    for i in range(left+1,right+1):
        element = array[i]
        j = i-1
        while element<array[j] and j>=left :
            array[j+1] = array[j]
            j -= 1
        array[j+1] = element
    return array
                        
def merge(array, l, m, r): 

        array_array_length1= m - l + 1
        array_array_length2 = r - m 
        left = []
        right = []
        for i in range(0, array_length1): 
                left.append(array[l + i]) 
        for i in range(0, array_length2): 
                right.append(array[m + 1 + i]) 

        i=0
        j=0
        k=l
 
        while j < array_length2 and  i < array_length1: 
                if left[i] <= right[j]: 
                        array[k] = left[i] 
                        i += 1

                else: 
                        array[k] = right[j] 
                        j += 1

                k += 1

        while i < array_length1: 
                array[k] = left[i] 
                k += 1
                i += 1

        while j < array_length2: 
                array[k] = right[j] 
                k += 1
                j += 1

def tim_sort(array): 
        n = len(array) 
        minrun = find_minrun(n) 

        for start in range(0, n, minrun): 
                end = min(start + minrun - 1, n - 1) 
                insertion_sort(array, start, end) 
 
        size = minrun 
        while size < n: 

                for left in range(0, n, 2 * size): 

                        mid = min(n - 1, left + size - 1) 
                        right = min((left + 2 * size - 1), (n - 1)) 
                        merge(array, left, mid, right) 

                size = 2 * size 




array = [-1,5,0,-3,11,9,-2,7,0] 

print("Array:") 
print(array) 

tim_sort(array) 

print("Sorted Array:") 
print(array)

Output:

Output for timsort algorithm obtained by executing the given codeOUTPUT

Complexity Analysis of Python TimSort Algorithm:

  • Worst-case time complexity: O(n log n)
  • Best-case time complexity: O(n)
  • Average-case time complexity: O(n log n)
  • Worst-case space complexity: O(n)

Advantages of Python Timsort:

  • It is a stable sorting algorithm
  • Works for real-time data

Must Read

Conclusion:

That is all about the Python TimSort algorithm. The in-built sort() function of Python uses this algorithm as it provides a stable sorting approach. It is faster than other sorting algorithms and can be directly implemented in python using the sort() method.

The post TimSort: Algorithm and Implementation in Python appeared first on Python Pool.



from Planet Python
via read more

No comments:

Post a Comment

TestDriven.io: Working with Static and Media Files in Django

This article looks at how to work with static and media files in a Django project, locally and in production. from Planet Python via read...