Introduction
Collections in Python are containers that are used to store collections of data, for example, list, dict, set, tuple etc. These are built-in collections. Several modules have been developed that provide additional data structures to store collections of data. One such module is the Python collections module.
Python collections module was introduced to improve the functionalities of the built-in collection containers. Python collections module was first introduced in its 2.4 release. This tutorial is based on its latest stable release (3.7 version).
Collections Module
In this tutorial we will discuss 6 of the most commonly used data structures from the Python collections module. They are as follows:
The Counter
Counter is a subclass of dictionary object. The Counter()
function in collections module takes an iterable or a mapping as the argument and returns a Dictionary. In this dictionary, a key is an element in the iterable or the mapping and value is the number of times that element exists in the iterable or the mapping.
You have to import the Counter
class before you can create a counter
instance.
from collections import Counter
Create Counter Objects
There are multiple ways to create counter
objects. The simplest way is to use Counter()
function without any arguments.
cnt = Counter()
You can pass an iterable (list) to Counter()
function to create a counter
object.
list = [1,2,3,4,1,2,6,7,3,8,1]
Counter(list)
Finally, the Counter()
function can take a dictionary as an argument. In this dictionary, the value of a key should be the 'count' of that key.
Counter({1:3,2:4})
You can access any counter item with its key as shown below:
list = [1,2,3,4,1,2,6,7,3,8,1]
cnt = Counter(list)
print(cnt[1])
when you print cnt[1]
, you will get the count of 1.
Output:
3
In the above examples, cnt
is an object of Counter
class which is a subclass of dict
. So it has all the methods of dict
class.
Apart from that, Counter
has three additional functions:
- Elements
- Most_common([n])
- Subtract([interable-or-mapping])
The element() Function
You can get the items of a Counter
object with elements()
function. It returns a list containing all the elements in the Counter
object.
Look at the following example:
cnt = Counter({1:3,2:4})
print(list(cnt.elements()))
Output:
[1, 1, 1, 2, 2, 2, 2]
Here, we create a Counter
object with a dictionary as an argument. In this Counter object, count of 1 is 3 and count of 2 is 4. The elements()
function is called using cnt
object which returns an iterator which is passed as an argument to the list.
The iterator repeats 3 times over 1 returning three '1's, and repeats four times over 2 returning four '2's to the list. Finally, the list is printed using the print
function.
The most_common() Function
The Counter()
function returns a dictionary which is unordered. You can sort it according to the number of counts in each element using most_common()
function of the Counter
object.
list = [1,2,3,4,1,2,6,7,3,8,1]
cnt = Counter(list)
print(cnt.most_common())
Output:
[(1, 3), (2, 2), (3, 2), (4, 1), (6, 1), (7, 1), (8, 1)]
You can see that most_common
function returns a list, which is sorted based on the count of the elements. 1 has a count of three, therefore it is the first element of the list.
The subtract() Function
The subtract()
takes iterable (list) or a mapping (dictionary) as an argument and deducts elements count using that argument. Check the following example:
cnt = Counter({1:3,2:4})
deduct = {1:1, 2:2}
cnt.subtract(deduct)
print(cnt)
Output:
Counter({1: 2, 2: 2})
You can notice that cnt
object we first created, has a count of 3 for '1' and count of 4 for '2'. The deduct
dictionary has the value '1' for key '1' and value '2' for key '2'. The subtract()
function deducted 1 count from key '1' and 2 counts from key '2'.
The defaultdict
The defaultdict
works exactly like a python dictionary, except for it does not throw KeyError
when you try to access a non-existent key.
Instead, it initializes the key with the element of the data type that you pass as an argument at the creation of defaultdict
. The data type is called default_factory
.
Import defaultdict
First, you have to import defaultdict
from collections
module before using it:
from collections import defaultdict
Create a defaultdict
You can create a defaultdict
with the defaultdict()
constructor. You have to specify a data type as an argument. Check the following code:
nums = defaultdict(int)
nums['one'] = 1
nums['two'] = 2
print(nums['three'])
Output:
0
In this example, int
is passed as the default_factory
. Notice that you only pass int
, not int()
. Next, the values are defined for the two keys, namely, 'one' and 'two', but in the next line we try to access a key that has not been defined yet.
In a normal dictionary, this will force a KeyError
. But defaultdict
initialize the new key with default_factory
's default value which is 0 for int
. Hence, when the program is executed, and 0 will be printed. This particular feature of initializing non-existent keys can be exploited in various situations.
For example, let's say you want the count of each name in a list of names given as "Mike, John, Mike, Anna, Mike, John, John, Mike, Mike, Britney, Smith, Anna, Smith".
from collections import defaultdict
count = defaultdict(int)
names_list = "Mike John Mike Anna Mike John John Mike Mike Britney Smith Anna Smith".split()
for names in names_list:
count[names] +=1
print(count)
Output:
defaultdict(<class 'int'>, {'Mike': 5, 'Britney': 1, 'John': 3, 'Smith': 2, 'Anna': 2})
First, we create a defaultdict
with int as default_factory
. The names_list
includes a set of names which repeat several times. The split()
function returns a list from the given string. It breaks the string whenever a white-space is encountered and returns words as elements of the list. In the loop, each item in the list is added to the defaultdict
named as count
and initialized to 0 based on default_factory
. If the same element is encountered again, as loop continues, count of that element will be incremented.
The OrderedDict
OrderedDict
is a dictionary where keys maintain the order in which they are inserted, which means if you change the value of a key later, it will not change the position of the key.
Import OrderedDict
To use OrderedDict
you have to import it from the collections module.
from collections import OrderedDict
Create a OrderedDict
You can create an OrderedDict object with OrderedDict()
constructor. In the following code, You create an OrderedDict
without any arguments. After that some items are inserted into it.
od = OrderedDict()
od['a'] = 1
od['b'] = 2
od['c'] = 3
print(od)
Output:
OrderedDict([('a', 1), ('b', 2), ('c', 3)])
You can access each element using a loop as well. Take a look at the following code:
for key, value in od.items():
print(key, value)
Output:
a 1
b 2
c 3
Following example is an interesting use case of OrderedDict
with Counter
. Here, we create a Counter
from a list and insert element to an OrderedDict
based on their count.
Most frequently occurring letter will be inserted as the first key and the least frequently occurring letter will be inserted as the last key.
list = ["a","c","c","a","b","a","a","b","c"]
cnt = Counter(list)
od = OrderedDict(cnt.most_common())
for key, value in od.items():
print(key, value)
Output:
a 4
c 3
b 2
The deque
The deque
is a list optimized for inserting and removing items.
Import the deque
You have to import deque
class from collections
module before using it.
from collections import deque
Creating a deque
You can create a deque with deque()
constructor. You have to pass a list as an argument.
list = ["a","b","c"]
deq = deque(list)
print(deq)
Output:
deque(['a', 'b', 'c'])
Inserting Elements to deque
You can easily insert an element to the deq
we created at either of the ends. To add an element to the right of the deque, you have to use append()
method.
If you want to add an element to the start of the deque, you have to use appendleft()
method.
deq.append("d")
deq.appendleft("e")
print(deq)deque
Output:
deque(['e', 'a', 'b', 'c', 'd'])
You can notice that d
is added at the end of deq and e
is added to the start of the deq
Removing Elements from the deque
Removing elements is similar to inserting elements. You can remove an element the similar way you insert elements. To remove an element from the right end, you can use pop()
function and to remove an element from left, you can use popleft()
.
deq.pop()
deq.popleft()
print(deq)
Output:
deque(['a', 'b', 'c'])
You can notice that both the first and last elements are removed from the deq
.
Clearing a deque
If you want to remove all elements from a deque, you can use clear()
function.
list = ["a","b","c"]
deq = deque(list)
print(deq)
print(deq.clear())
Output:
deque(['a', 'b', 'c'])
None
You can see in the output, at first there is a queue with three elements. Once we applied clear()
function, the deque is cleared and you see none
in the output.
Counting Elements in a deque
If you want to find the count of a specific element, use count(x)
function. You have to specify the element for which you need to find the count, as the argument.
list = ["a","b","c"]
deq = deque(list)
print(deq.count("a"))
Output:
1
In the above example, count of 'a' is 1. Hence '1' printed.
The ChainMap
ChainMap
is used to combine several dictionaries or mappings. It returns a list of dictionaries.
Import chainmap
You have to import ChainMap
from the collections
module before using it.
from collections import ChainMap
Create a ChainMap
To create a chainmap we can use ChainMap()
constructor. We have to pass the dictionaries we are going to combine as an argument set.
dict1 = { 'a' : 1, 'b' : 2 }
dict2 = { 'c' : 3, 'b' : 4 }
chain_map = ChainMap(dict1, dict2)
print(chain_map.maps)
Output:
[{'b': 2, 'a': 1}, {'c': 3, 'b': 4}]
You can see a list of dictionary as the output. You can access chain map values by key name.
print(chain_map['a'])
Output:
1
'1' is printed as the value of key 'a' is 1. Another important point is ChainMap
updates its values when its associated dictionaries are updated. For example, if you change the value of 'c' in dict2
to '5', you will notice the change in ChainMap
as well.
dict2['c'] = 5
print(chain_map.maps)
Output:
[{'a': 1, 'b': 2}, {'c': 5, 'b': 4}]
Getting Keys and Values from ChainMap
You can access the keys of a ChainMap
with keys()
function. Similarly, you can access the values of elements with values()
function, as shown below:
dict1 = { 'a' : 1, 'b' : 2 }
dict2 = { 'c' : 3, 'b' : 4 }
chain_map = ChainMap(dict1, dict2)
print (list(chain_map.keys()))
print (list(chain_map.values()))
Output:
['b', 'a', 'c']
[2, 1, 3]
Notice that the value of the key 'b' in the output is the value of key 'b' in dict1
. As a rule of thumb, when one key appears in more than one associated dictionaries, ChainMap
takes the value for that key from the first dictionary.
Adding a New Dictionary to ChainMap
If you want to add a new dictionary to an existing ChainMap
, use new_child()
function. It creates a new ChainMap
with the newly added dictionary.
dict3 = {'e' : 5, 'f' : 6}
new_chain_map = chain_map.new_child(dict3)
print(new_chain_map)
Output:
ChainMap({'f': 6, 'e': 5}, {'a': 1, 'b': 2}, {'b': 4, 'c': 3})
Notice that new dictionary is added to the beginning of ChainMap
list.
The namedtuple()
The namedtuple()
returns a tuple with names for each position in the tuple. One of the biggest problems with ordinary tuples is that you have to remember the index of each field of a tuple object. This is obviously difficult. The namedtuple
was introduced to solve this problem.
Import namedtuple
Before using namedtuple
, you have to import it from the collections
module.
from collections import namedtuple
Create a namedtuple
from collections import namedtuple
Student = namedtuple('Student', 'fname, lname, age')
s1 = Student('John', 'Clarke', '13')
print(s1.fname)
Output:
Student(fname='John', lname='Clarke', age='13')
In this example, a namedtuple
object Student
has been declared. You can access the fields of any instance of a Student
class by the defined field name.
Creating a namedtuple Using List
The namedtuple()
function requires each value to be passed to it separately. Instead, you can use _make()
to create a namedtuple
instance with a list. Check the following code:
s2 = Student._make(['Adam','joe','18'])
print(s2)
Output:
Student(fname='Adam', lname='joe', age='18')
Create a New Instance Using Existing Instance
The _asdict()
function can be used to create an OrderedDict
instance from an existing instance.
s2 = s1._asdict()
print(s2)
Output:
OrderedDict([('fname', 'John'), ('lname', 'Clarke'), ('age', '13')])
Changing Field Values with _replace() Function
To change the value of a field of an instance, the _replace()
function is used. Remember that, _replace()
function creates a new instance. It does not change the value of existing instance.
s2 = s1._replace(age='14')
print(s1)
print(s2)
Output:
Student(fname='John', lname='Clarke', age='13')
Student(fname='John', lname='Clarke', age='14')
Conclusion
With that, we conclude our tutorial on Collections module. We have discussed all the important topics in the collection module. Python collection module still needs improvements if we compare it with Java's Collection library. Therefore, we can expect a lot of changes in upcoming versions.
References
from Planet Python
via read more
No comments:
Post a Comment