Wednesday, January 2, 2019

Stack Abuse: Introduction to Python's Collections Module

Introduction

Collections in Python are containers used to store groups of data, for example list, dict, set, and tuple. These are the built-in collections. Several modules provide additional data structures for storing collections of data. One such module is the Python collections module.

The collections module was introduced to extend the functionality of the built-in containers. It first appeared in Python 2.4. This tutorial is based on Python 3.7, the latest stable release at the time of writing.

Collections Module

In this tutorial we will discuss six of the most commonly used data structures from the Python collections module: Counter, defaultdict, OrderedDict, deque, ChainMap, and namedtuple().

The Counter

Counter is a subclass of dict. The Counter() constructor in the collections module takes an iterable or a mapping as its argument and returns a Counter object. In this object, each key is an element of the iterable (or a key of the mapping), and its value is the count of that element.

You have to import the Counter class before you can create a counter instance.

from collections import Counter  

Create Counter Objects

There are multiple ways to create Counter objects. The simplest is to call Counter() without any arguments.

cnt = Counter()  

You can pass an iterable (such as a list) to the Counter() constructor to create a Counter object.

numbers = [1,2,3,4,1,2,6,7,3,8,1]
Counter(numbers)

Finally, the Counter() constructor can take a dictionary as an argument. In this dictionary, the value of each key should be the count of that key.

Counter({1:3,2:4})  

You can access any counter item with its key as shown below:

numbers = [1,2,3,4,1,2,6,7,3,8,1]
cnt = Counter(numbers)
print(cnt[1])

When you print cnt[1], you get the count of 1.

Output:

3  

In the above examples, cnt is an instance of the Counter class, which is a subclass of dict, so it inherits the methods of the dict class.
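One place where a Counter behaves differently from a plain dict is the lookup of a key that was never counted. The following is a minimal sketch of that behavior, reusing the sample list from above; the variable name missing is only illustrative.

```python
from collections import Counter

cnt = Counter([1, 2, 3, 4, 1, 2, 6, 7, 3, 8, 1])

# Looking up a key that was never counted returns 0 instead of raising KeyError.
missing = cnt[5]
print(missing)

# The lookup alone does not add the key to the counter.
print(5 in cnt)
```

This is why a Counter can be incremented for new elements without first checking whether the key exists.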

Apart from that, Counter has three additional methods:

  1. elements()
  2. most_common([n])
  3. subtract([iterable-or-mapping])

The elements() Function

You can get the items of a Counter object with the elements() function. It returns an iterator that yields each element as many times as its count.

Look at the following example:

cnt = Counter({1:3,2:4})  
print(list(cnt.elements()))  

Output:

[1, 1, 1, 2, 2, 2, 2]

Here, we create a Counter object with a dictionary as the argument. In this Counter object, the count of 1 is 3 and the count of 2 is 4. Calling elements() on the cnt object returns an iterator, which is passed as an argument to list().

The iterator yields 1 three times and 2 four times, and list() collects these values into a list. Finally, the list is printed using the print function.

The most_common() Function

A Counter does not keep its items sorted by count. You can get them ordered from the most common to the least common with the most_common() function of the Counter object.

numbers = [1,2,3,4,1,2,6,7,3,8,1]
cnt = Counter(numbers)
print(cnt.most_common())  

Output:

[(1, 3), (2, 2), (3, 2), (4, 1), (6, 1), (7, 1), (8, 1)]

You can see that the most_common() function returns a list of (element, count) tuples, sorted by count in descending order. 1 has a count of three, therefore it is the first element of the list.
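The optional n argument of most_common() was listed earlier but not demonstrated. The sketch below assumes a made-up sentence as input and asks only for the two highest counts.

```python
from collections import Counter

# Hypothetical sample text, split into words on whitespace.
words = "the quick brown fox jumps over the lazy dog the fox".split()
cnt = Counter(words)

# Passing n limits the result to the n elements with the highest counts.
top_two = cnt.most_common(2)
print(top_two)
```

Here 'the' occurs three times and 'fox' twice, so those two pairs are the ones returned.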

The subtract() Function

The subtract() function takes an iterable (such as a list) or a mapping (such as a dictionary) as an argument and subtracts the counts it contains from the Counter. Check the following example:

cnt = Counter({1:3,2:4})  
deduct = {1:1, 2:2}  
cnt.subtract(deduct)  
print(cnt)  

Output:

Counter({1: 2, 2: 2})  

Notice that the cnt object we first created has a count of 3 for key 1 and a count of 4 for key 2. The deduct dictionary has the value 1 for key 1 and the value 2 for key 2. The subtract() function deducted 1 from the count of key 1 and 2 from the count of key 2.
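The example above passes a mapping. As a rough sketch of the iterable form, the code below uses a hypothetical stock counter and a list of items to deduct; it also shows that counts are allowed to drop below zero.

```python
from collections import Counter

# Hypothetical inventory used only for illustration.
stock = Counter({"apple": 2, "pear": 1})

# With an iterable, each occurrence deducts one from that element's count.
stock.subtract(["apple", "pear", "pear", "kiwi"])

print(stock["apple"])  # 2 - 1
print(stock["pear"])   # 1 - 2, counts may become negative
print(stock["kiwi"])   # 0 - 1, the key is created with a negative count
```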

The defaultdict

The defaultdict works like a regular Python dictionary, except that it does not raise a KeyError when you try to access a non-existent key.

Instead, it creates the key and sets its value by calling the callable (typically a data type) that you passed as an argument when creating the defaultdict. This callable is called default_factory.

Import defaultdict

First, you have to import defaultdict from collections module before using it:

from collections import defaultdict  

Create a defaultdict

You can create a defaultdict with the defaultdict() constructor. You pass a callable, such as a data type, as the argument. Check the following code:

nums = defaultdict(int)  
nums['one'] = 1  
nums['two'] = 2  
print(nums['three'])  

Output:

0  

In this example, int is passed as the default_factory. Notice that you only pass int, not int(). Next, the values are defined for the two keys, namely, 'one' and 'two', but in the next line we try to access a key that has not been defined yet.

In a normal dictionary, this would raise a KeyError. But defaultdict initializes the new key with the value returned by default_factory, which is 0 for int. Hence, when the program is executed, 0 is printed. This feature of initializing non-existent keys can be exploited in various situations.
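The default_factory does not have to be a built-in type. The sketch below assumes a hypothetical status dictionary with a lambda as the factory, and it also illustrates that get() does not trigger the factory.

```python
from collections import defaultdict

# Any zero-argument callable can serve as default_factory.
status = defaultdict(lambda: "unknown")
status["db"] = "up"

# Indexing a missing key calls the factory and stores the result.
print(status["cache"])
print("cache" in status)

# get() bypasses default_factory and does not insert the key.
print(status.get("queue"))
print("queue" in status)
```

Only indexing with square brackets creates the missing key; membership tests and get() leave the dictionary unchanged.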

For example, let's say you want the count of each name in a list of names given as "Mike, John, Mike, Anna, Mike, John, John, Mike, Mike, Britney, Smith, Anna, Smith".

from collections import defaultdict

count = defaultdict(int)  
names_list = "Mike John Mike Anna Mike John John Mike Mike Britney Smith Anna Smith".split()  
for name in names_list:
    count[name] += 1
print(count)  

Output:

defaultdict(<class 'int'>, {'Mike': 5, 'John': 3, 'Anna': 2, 'Britney': 1, 'Smith': 2})

First, we create a defaultdict with int as default_factory. The names_list includes a set of names which repeat several times. The split() function returns a list from the given string. It breaks the string wherever whitespace is encountered and returns the words as elements of the list. In the loop, the first time a name is encountered it is added to the defaultdict named count with the value 0 supplied by default_factory, and then incremented to 1. Each time the same name is encountered again, its count is incremented.
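The same pattern works with other factories. As a minimal sketch, the code below uses list as the default_factory to group names by their first letter; the names list and the by_initial variable are made up for illustration.

```python
from collections import defaultdict

names = ["Mike", "John", "Anna", "Britney", "Smith", "Adam"]

# list() returns an empty list, so every new key starts out as [].
by_initial = defaultdict(list)
for name in names:
    by_initial[name[0]].append(name)

print(dict(by_initial))
```

Without defaultdict, each iteration would need to check whether the key exists before appending.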

The OrderedDict

OrderedDict is a dictionary where keys maintain the order in which they are inserted, which means if you change the value of a key later, it will not change the position of the key.

Import OrderedDict

To use OrderedDict you have to import it from the collections module.

from collections import OrderedDict  

Create an OrderedDict

You can create an OrderedDict object with the OrderedDict() constructor. In the following code, you create an OrderedDict without any arguments. After that, some items are inserted into it.

od = OrderedDict()  
od['a'] = 1  
od['b'] = 2  
od['c'] = 3  
print(od)  

Output:

OrderedDict([('a', 1), ('b', 2), ('c', 3)])  

You can access each element using a loop as well. Take a look at the following code:

for key, value in od.items():  
    print(key, value)

Output:

a 1  
b 2  
c 3  
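To illustrate the earlier point that changing a value does not move its key, here is a small sketch. It rebuilds the same od object and additionally uses move_to_end(), an OrderedDict method not covered above, to show how a key can be repositioned explicitly.

```python
from collections import OrderedDict

od = OrderedDict()
od["a"] = 1
od["b"] = 2
od["c"] = 3

# Updating an existing key changes the value but not the position.
od["a"] = 10
print(list(od.items()))

# move_to_end() repositions a key explicitly.
od.move_to_end("a")
print(list(od.keys()))
```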

The following example is an interesting use case of OrderedDict with Counter. Here, we create a Counter from a list and insert the elements into an OrderedDict based on their count.

The most frequently occurring letter is inserted as the first key and the least frequently occurring letter is inserted as the last key.

letters = ["a","c","c","a","b","a","a","b","c"]
cnt = Counter(letters)
od = OrderedDict(cnt.most_common())  
for key, value in od.items():  
    print(key, value)

Output:

a 4  
c 3  
b 2  

The deque

The deque is a double-ended queue: a list-like container optimized for adding and removing items at both ends.

Import the deque

You have to import deque class from collections module before using it.

from collections import deque  

Creating a deque

You can create a deque with the deque() constructor. You can pass any iterable, such as a list, as an argument.

letters = ["a","b","c"]
deq = deque(letters)
print(deq)  

Output:

deque(['a', 'b', 'c'])  

Inserting Elements to deque

You can easily insert an element at either end of the deq we created. To add an element to the right end of the deque, use the append() method.

To add an element to the start of the deque, use the appendleft() method.

deq.append("d")  
deq.appendleft("e")  
print(deq)

Output:

deque(['e', 'a', 'b', 'c', 'd'])  

Notice that d is added at the end of deq and e is added at the start of deq.

Removing Elements from the deque

Removing elements is similar to inserting them. To remove an element from the right end, use the pop() function, and to remove an element from the left end, use popleft().

deq.pop()  
deq.popleft()  
print(deq)  

Output:

deque(['a', 'b', 'c'])  

You can notice that both the first and last elements are removed from the deq.
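Combining append() with popleft() gives a simple first-in, first-out queue. The sketch below assumes a hypothetical list of task names, and it also shows the optional maxlen argument of deque(), which is not covered above.

```python
from collections import deque

# Treat the deque as a FIFO queue: enqueue on the right, dequeue on the left.
tasks = deque()
tasks.append("parse")
tasks.append("validate")
tasks.append("save")

first = tasks.popleft()
print(first)
print(tasks)

# With maxlen set, appending to a full deque discards items from the opposite end.
recent = deque(maxlen=2)
for item in ["a", "b", "c"]:
    recent.append(item)
print(recent)
```

A bounded deque like recent is a convenient way to keep only the last few items seen.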

Clearing a deque

If you want to remove all elements from a deque, you can use clear() function.

letters = ["a","b","c"]
deq = deque(letters)
print(deq)
deq.clear()
print(deq)

Output:

deque(['a', 'b', 'c'])
deque([])

You can see in the output that at first the deque holds three elements. After calling clear(), the deque is empty. Note that clear() itself returns None, so printing its return value would show None rather than the emptied deque.

Counting Elements in a deque

If you want to find the count of a specific element, use count(x) function. You have to specify the element for which you need to find the count, as the argument.

letters = ["a","b","c"]
deq = deque(letters)
print(deq.count("a"))  

Output:

1  

In the above example, the count of 'a' is 1. Hence 1 is printed.

The ChainMap

ChainMap is used to combine several dictionaries or mappings into a single view. The underlying dictionaries are kept in a list, available through the maps attribute.

Import ChainMap

You have to import ChainMap from the collections module before using it.

from collections import ChainMap  

Create a ChainMap

To create a ChainMap we can use the ChainMap() constructor. We pass the dictionaries we are going to combine as arguments.

dict1 = { 'a' : 1, 'b' : 2 }  
dict2 = { 'c' : 3, 'b' : 4 }  
chain_map = ChainMap(dict1, dict2)  
print(chain_map.maps)  

Output:

[{'a': 1, 'b': 2}, {'c': 3, 'b': 4}]

You can see a list of dictionaries as the output, because the maps attribute holds the underlying dictionaries. You can access ChainMap values by key name.

print(chain_map['a'])  

Output:

1  

1 is printed because the value of key 'a' is 1. Another important point is that a ChainMap reflects changes made to its underlying dictionaries. For example, if you change the value of 'c' in dict2 to 5, you will see the change in the ChainMap as well.

dict2['c'] = 5  
print(chain_map.maps)  

Output:

[{'a': 1, 'b': 2}, {'c': 5, 'b': 4}]

Getting Keys and Values from ChainMap

You can access the keys of a ChainMap with keys() function. Similarly, you can access the values of elements with values() function, as shown below:

dict1 = { 'a' : 1, 'b' : 2 }  
dict2 = { 'c' : 3, 'b' : 4 }  
chain_map = ChainMap(dict1, dict2)  
print(list(chain_map.keys()))
print(list(chain_map.values()))

Output:

['c', 'b', 'a']
[3, 2, 1]

Notice that the value of the key 'b' in the output is the value of key 'b' in dict1. As a rule of thumb, when a key appears in more than one of the associated dictionaries, ChainMap takes the value for that key from the first dictionary that contains it.
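This lookup rule makes ChainMap a natural fit for layered settings. The sketch below assumes two hypothetical dictionaries, defaults and user_prefs, and also shows that writes through the ChainMap only affect the first dictionary.

```python
from collections import ChainMap

# Hypothetical settings dictionaries used only for illustration.
defaults = {"theme": "light", "lang": "en"}
user_prefs = {"theme": "dark"}

# user_prefs comes first, so its values win on duplicate keys.
settings = ChainMap(user_prefs, defaults)
print(settings["theme"])
print(settings["lang"])

# Writes go to the first mapping only; later mappings are left untouched.
settings["lang"] = "fr"
print(user_prefs)
print(defaults)
```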

Adding a New Dictionary to ChainMap

If you want to add a new dictionary to an existing ChainMap, use the new_child() function. It returns a new ChainMap with the new dictionary placed in front of the existing ones.

dict3 = {'e' : 5, 'f' : 6}  
new_chain_map = chain_map.new_child(dict3)  
print(new_chain_map)  

Output:

ChainMap({'e': 5, 'f': 6}, {'a': 1, 'b': 2}, {'c': 3, 'b': 4})

Notice that the new dictionary is added to the beginning of the ChainMap's list of maps.
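Because new_child() returns a new object, the original ChainMap is left as it was. The following sketch uses hypothetical names base and child to show this, along with the parents property, which gives the chain without its first dictionary.

```python
from collections import ChainMap

base = ChainMap({"a": 1, "b": 2}, {"c": 3, "b": 4})
child = base.new_child({"b": 99})

# The child looks in its own, newly added dictionary first.
print(child["b"])

# The original ChainMap still has two maps and its old value for 'b'.
print(len(base.maps))
print(base["b"])

# parents is a ChainMap of everything except the first dictionary.
print(child.parents == base)
```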

The namedtuple()

The namedtuple() function returns a tuple subclass with a name for each position in the tuple. One of the biggest problems with ordinary tuples is that you have to remember the index of each field of a tuple object, which becomes difficult as the number of fields grows. The namedtuple was introduced to solve this problem.

Import namedtuple

Before using namedtuple, you have to import it from the collections module.

from collections import namedtuple  

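Create a namedtuple

You create a namedtuple class by passing a type name and the field names to namedtuple(). You can then create instances of that class:

from collections import namedtuple

Student = namedtuple('Student', 'fname, lname, age')
s1 = Student('John', 'Clarke', '13')
print(s1)
print(s1.fname)

Output:

Student(fname='John', lname='Clarke', age='13')
John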

In this example, a namedtuple class named Student has been declared. You can access the fields of any instance of the Student class by the defined field name.
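A namedtuple instance is still a tuple, so positional access and unpacking keep working alongside the field names. A minimal sketch, reusing the Student class from above:

```python
from collections import namedtuple

Student = namedtuple("Student", "fname, lname, age")
s1 = Student("John", "Clarke", "13")

# Named access and positional access return the same value.
print(s1.fname == s1[0])

# A namedtuple instance unpacks like a regular tuple.
fname, lname, age = s1
print(fname, lname, age)

# The field names are available on the class as a tuple of strings.
print(Student._fields)
```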

Creating a namedtuple Using List

The Student constructor requires each value to be passed to it as a separate argument. If your values are already in a list, you can use _make() to create a namedtuple instance from it. Check the following code:

s2 = Student._make(['Adam','joe','18'])  
print(s2)  

Output:

Student(fname='Adam', lname='joe', age='18')  

Create a New Instance Using Existing Instance

The _asdict() function can be used to create an OrderedDict from an existing namedtuple instance. (In Python 3.8 and later, _asdict() returns a regular dict instead.)

s2 = s1._asdict()  
print(s2)  

Output:

OrderedDict([('fname', 'John'), ('lname', 'Clarke'), ('age', '13')])  

Changing Field Values with _replace() Function

To change the value of a field of an instance, use the _replace() function. Remember that _replace() creates a new instance. It does not change the value of the existing instance.

s2 = s1._replace(age='14')  
print(s1)  
print(s2)  

Output:

Student(fname='John', lname='Clarke', age='13')  
Student(fname='John', lname='Clarke', age='14')  
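The reason _replace() returns a new instance is that namedtuple instances, like ordinary tuples, are immutable. The sketch below, which reuses the Student class and introduces an illustrative flag named assigned, shows what happens when you try to assign to a field directly.

```python
from collections import namedtuple

Student = namedtuple("Student", "fname, lname, age")
s1 = Student("John", "Clarke", "13")

# Assigning to a field fails because namedtuple instances are immutable.
try:
    s1.age = "14"
    assigned = True
except AttributeError:
    assigned = False
print(assigned)

# _replace() returns an updated copy and leaves the original unchanged.
s2 = s1._replace(age="14")
print(s2.age, s1.age)
```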

Conclusion

With that, we conclude our tutorial on the collections module. We have discussed six of its most commonly used data structures. Compared with Java's Collections framework, the Python collections module still has room for improvement, so we may see further changes in upcoming versions.
