Saturday, February 5, 2022

Stack Abuse: Python - How to Sort List with sort() and sorted()

In this short guide, learn how to sort a list in Python using the built-in sort() and sorted() functions.

  • sort() is a method of the list class, and sorts the list in-place, returning None.
  • sorted() is a method built into the Python namespace, and sorts the list out-of-place, returning a sorted copy of the list, without affecting the original one.

Generally speaking - sort() is more efficient on larger datasets, while sorted() is more convenient, because it returns a copy of the list and leaves the original one intact.

Note: Both methods, by default, use the logical less-than (<) operator for element comparison, and thus sort in ascending order. You can override the default comparison function and by extension, both the logic of comparison and sorting order.

In the proceeding sections, we'll take a look at how to sort lists of various data types, including custom comparison functions, descending order, and perform a performance benchmark on the two.

Sort List with sort() and sorted()

The sort() function is called on a list we'd like to sort, and sorts it in-place, returning None:

# my_list is sorted in-place - the original list is changed
my_list.sort()

It sorts in ascending order by default. To sort in descending order, you can supply the reverse=True argument to the function:

my_list.sort(reverse=True)

The sorted() function works in much the same way as the sort() function does - and also accepts the same arguments. However, sorted() creates a copy of the list we supply, sorts the copy, and returns it - leaving the original intact:

# Sorts copy of `my_list` and returns it
sorted_list = sorted(my_list)

The way comparisons are done depends on the data type of the elements of the list. Strings are compared differently than integers, which are in turn, compared differently to custom objects, for instance.

Sort List of Strings

Strings are sorted lexicographically, when compared with the > operator:

string_list = ['I', 'see', 'skies', 'of', 'blue', 'red', 'roses', 'too']

string_list.sort()
print(string_list)
# ['I', 'blue', 'of', 'red', 'roses', 'see', 'skies', 'too']

The same logic is applied to the sorted() function:

sorted_list = sorted(string_list)
print(sorted_list)
# ['I', 'blue', 'of', 'red', 'roses', 'see', 'skies', 'too']

I has a lesser lexicographical value than blue, even though b should be before i in the dictionary, because capital letters always have a lesser lexicographic value than lowercase letters. Other than the capital letters - the rest of the strings are sorted in ascending dictionary order!

Sort List of Integers

Integers are more straightforward in their comparison with the > operator:

int_list = [1, 7, 2, 3, 8, 5, 6]
int_list.sort()

print(int_list)
# [1, 2, 3, 5, 6, 7, 8]

Or, with sorted():

sorted_list = sorted(int_list)
print(sorted_list)
# [1, 2, 3, 5, 6, 7, 8]

Sort List of Tuples

Tuples are sorted by key, not value. For instance, say you had a leaderboard of preferred programming languages, stored in a tuple of (language, rank) format - you might want to sort them in order of rank:

tuple_list = [('Java', 2), ('Python', 1), ('JavaScript', 3)]
tuple_list.sort()

print(tuple_list)
# [('Java', 2), ('JavaScript', 3), ('Python', 1)]

Or, to sort a list of tuples with sorted():

sorted_tuples = sorted(tuple_list)
print(sorted_tuples)
# [('Java', 2), ('JavaScript', 3), ('Python', 1)]

Since tuples are sorted by key, this list of tuples is sorted lexicographically, by the strings used as the keys.

Sort List of Tuples with Custom Key

To alter the item based on which tuples are sorted, without changing the tuples themselves - you can specify any item in a tuple instead as the key argument. Typically, it's easiest to map the key to another item in the list of tuples, through a lambda function:

tuple_list = [('Java', 2), ('Python', 1), ('JavaScript', 3)]
tuple_list.sort(key=lambda x:x[1])
print(tuple_list)
# [('Python', 1), ('Java', 2), ('JavaScript', 3)]

Or, with sorted():

sorted_tuples = sorted(tuple_list, key=lambda x:x[1])
print(sorted_tuples)
# [('Python', 1), ('Java', 2), ('JavaScript', 3)]

Here, we've mapped the key by which to sort, to the second item (indexing is 0-based) of the tuple, thus, sorting by the second item (integer).

If you'd like to read more about Lambda Functions - read our Guide to Lambda Functions in Python!

Note: The key doesn't correspond to the first value of the tuple, which is oftentimes referred to as a "key" as in, a "key-value" pair. The key refers to the key by which the sort() method sorts elements.

This stands true for any number of tuple elements:

tuple_list = [('Java', 2, 'General purpose'), ('Python', 1, 'General purpose'), ('JavaScript', 3, 'Web-oriented')]
tuple_list.sort(key=lambda x:x[1])

print(tuple_list)
# [('Python', 1, 'General purpose'), ('Java', 2, 'General purpose'), ('JavaScript', 3, 'Web-oriented')]

Or, with sorted():

sorted_tuples = sorted(tuple_list, key=lambda x:x[1])
print(sorted_tuples)
# [('Python', 1, 'General purpose'), ('Java', 2, 'General purpose'), ('JavaScript', 3, 'Web-oriented')]

Sort List with Custom Comparator

Ultimately, you might want to supply a custom comparator to the key argument of either sort() or sorted()! A comparator is simply a function that returns a comparable return type. For instance, you can sort by length, by passing in the len() function:

string_list = ['I', 'see', 'skies', 'of', 'blue', 'red', 'roses', 'too']
string_list.sort(key=len)

print(string_list)
# ['I', 'of', 'see', 'red', 'too', 'blue', 'skies', 'roses']

Or, with sorted():

sorted_list = sorted(string_list, key=len)
print(sorted_list)
# ['I', 'of', 'see', 'red', 'too', 'blue', 'skies', 'roses']

Similarly enough, you can sort by any custom function:

def custom_comparator(element):
    return element[-1]

string_list = ['I', 'see', 'skies', 'of', 'blue', 'red', 'roses', 'too']
string_list.sort(key=custom_comparator)

print(string_list)
# ['I', 'red', 'see', 'blue', 'of', 'too', 'skies', 'roses']

Or, with sorted():

sorted_list = sorted(string_list, key=custom_comparator)

print(sorted_list)
# ['I', 'red', 'see', 'blue', 'of', 'too', 'skies', 'roses']

Here, we've simply returned the last character of a string, via the slice notation, and sorted by that returned character. If you pay attention to the last character of each word (excluding the capital letter) - they're sorted in lexicographical order.

Benchmarking sort() vs sorted()

As stated earlier - sorted() is slightly less efficient than sort(), mainly because it creates a copy and sorts that copy, rather than changing the original collection. Though, just how much is "slightly less" efficient?

This depends on various factors - such as your hardware, and the specifics of that hardware, but you can run a very simple test to check which one works better for you, based on multiple input sizes.

Let's sort lists of 10, 100 and 1000 elements respectively, and time the runtimes of these functions using timeit. To make sure that the test is fair, we want to ensure that:

  • The lists of elements are generated before calling timeit() so the generation logic doesn't account for the benchmark time
  • The methods are run on the exact same input

Since sort() changes the lists in-place, we'll run sorted() first, and then benchmark how long it takes sort() to do those same lists:

import timeit
import random

def generate_random_strings(num):
    result = []
    for i in range(num):
        s = ''.join(random.choice([chr(i) for i in range(ord('a'),ord('z'))]) for _ in range(5))
        result.append(s)
    return result

ten = generate_random_strings(10)
hundred = generate_random_strings(100)
thousand = generate_random_strings(1000)

# For eval() statements where input is translated to list names
mapping = {
    10:'ten',
    100:'hundred',
    1000:'thousand'
}

# Based on input, evaluate the expression to sort adequate list
def run_sorted(num):
    eval(f'sort({mapping[num]})')

# Based on input, evaluate the expression to sort adequate list
def run_sorted(num):
    eval(f'sorted({mapping[num]})')

for index, num_samples in enumerate([10, 100, 1000]):
    result = timeit.timeit(f"run_sorted({num_samples})", number=100000, globals=globals())
    print(f'sorted() on {num_samples} took {result} seconds')

print('____________________________________________________')    
  
for index, num_samples in enumerate([10, 100, 1000]):
    result = timeit.timeit(f"run_sort({num_samples})", number=100000, globals=globals())
    print(f'sort() on {num_samples} took {result} seconds')

This piece of code compares the time it takes to run 100k iterations of each of the run_sort() and run_sorted() methods, on the same lists of 10, 100, 1000, and 1000000 elements produced by the generate_random_strings() method, and results in:

sorted() on 10 took 0.5450385000003735 seconds
sorted() on 100 took 0.9972869999996874 seconds
sorted() on 1000 took 10.934083999999984 seconds
____________________________________________________
sort() on 10 took 0.4839348999998947 seconds
sort() on 100 took 0.5398832000000766 seconds
sort() on 1000 took 1.3094285000001946 seconds

For 10 elements, the time is practically the same - ~0.5s. However, as soon as with 100 elements, sort() takes half the time to sort the same list. Finally, at 1000 elements, sorted() takes nearly ten times as much computation time as sort() does.

The larger the dataset you're working with - the more benefits you'll be reaping with using sort() instead of `sorted(), if you don't need an out-of-place sort.

Conclusion

In this short guide - we've taken a look at how to sort a list in Python with the help of sort() and sorted().

We've then explored sorting in descending order, instead of ascending, as well as setting a different sorting key, including writing a custom sorting method.

Finally, we've benchmarked the two methods and explored how they perform with an increasing input size.



from Planet Python
via read more

No comments:

Post a Comment

TestDriven.io: Working with Static and Media Files in Django

This article looks at how to work with static and media files in a Django project, locally and in production. from Planet Python via read...