Thursday, February 3, 2022

CodersLegacy: Python Object Serialization with Pickle

Object Serialization is a fascinating programming concept that is readily available in Python through the Pickle library. Pickle provides a set of built-in functions that make dumping and loading objects to and from files a piece of cake.


What is Object Serialization?

Serialization is the process of converting an object to a byte stream. The inverse process, deserialization, converts a byte stream back into a Python object.

In simpler words, Object Serialization is the process of converting actual Python Objects into bytes, allowing the whole object to be preserved (with all its current values). This is commonly known as “pickling” or “dumping”, where we save the byte stream into a file.

The reverse process of this is where we convert these bytes back into a Python Object.
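As a minimal sketch of this round trip, the snippet below uses pickle.dumps() and pickle.loads(), the in-memory counterparts of dump() and load(), so no file is involved. The dictionary and variable names are only for illustration.

```python
import pickle

inventory = {"Apple": 20, "Meat": 100, "Bread": 10}

# Serialize: Python object -> bytes
data = pickle.dumps(inventory)
print(type(data))             # <class 'bytes'>

# Deserialize: bytes -> a new Python object with the same contents
restored = pickle.loads(data)
print(restored == inventory)  # True
print(restored is inventory)  # False, it is a separate copy
```

The restored dictionary is equal to the original but is a distinct object, which is exactly what you get when loading from a file later on.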

Object Serialization is super useful in many scenarios, such as creating save files to store things like game data, or storing trained models for AI/Machine Learning problems. It can take a long time for AI algorithms to generate a model, so instead of doing it every time you run the program, you can dump the model to a file once and then read it from there on each run. How much time this saves depends on how long the model takes to build, but the difference can be dramatic.
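The “dump once, load every time” idea could look roughly like the sketch below. The train_model() function, the load_or_train() helper and the file name ModelCache are all made up for this illustration; a real model would take far longer to build than the short sleep used here.

```python
import os
import pickle
import tempfile
import time


def train_model():
    """Stand-in for an expensive computation (hypothetical)."""
    time.sleep(0.1)
    return {"weights": [0.1, 0.2, 0.3]}


def load_or_train(path):
    """Return the cached result if the file exists, otherwise compute and cache it."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    model = train_model()
    with open(path, "wb") as f:
        pickle.dump(model, f)
    return model


# A temporary folder is used here only to keep the example self-contained
cache_path = os.path.join(tempfile.mkdtemp(), "ModelCache")
first = load_or_train(cache_path)   # slow path: computes and writes the cache
second = load_or_train(cache_path)  # fast path: reads the cache from disk
print(first == second)              # True
```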


Dumping (Pickling) Objects

In this section we’ll explore how to dump (serialize) objects in Python, into Binary files.

Example# 1

All you have to do is use a little file handling to open up a file in binary mode, and then use the pickle.dump() function.

In order to open the file in binary mode, there must be a “b” in the second parameter. For example, instead of “w” for writing a normal text file, use “wb” for writing a binary file, and so on. The file name does not need an extension, so we leave it out here (.pkl and .pickle are common conventions if you want one).

The pickle.dump() function takes two required parameters, the first being the object to be pickled/dumped, and the second being the file stream.

import pickle

mydict = { "Apple": 20, "Meat" : 100, "Bread": 10 }

ofile = open("BinaryData",'wb')
pickle.dump(mydict,ofile)
ofile.close()
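A common variation, shown here as a sketch rather than a required change, is to open the file in a with block so that it gets closed automatically. The temporary folder is an assumption made only to keep the example self-contained.

```python
import os
import pickle
import tempfile

mydict = {"Apple": 20, "Meat": 100, "Bread": 10}

# Write into a temporary folder (an assumption for this sketch)
path = os.path.join(tempfile.mkdtemp(), "BinaryData")

# The with block closes the file automatically, even if dump() raises an error
with open(path, "wb") as ofile:
    pickle.dump(mydict, ofile)

print(ofile.closed)               # True
print(os.path.getsize(path) > 0)  # True, the pickle was written
```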

If you open up the Binary file where this Data was dumped, you will see something similar to what is shown below.

[Image: Python Pickle Object Serialization - Binary Data]

As you can see, it doesn’t really make much sense, because it’s stored in Binary format. When you try to read it as text, it comes out all weird. But don’t worry, it doesn’t need to make sense to us. Its only purpose is to sit in the file until we are ready to read it.
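If you are curious about what those bytes contain, the sketch below prints them directly and then uses the standard pickletools module to list the pickle instructions in a readable form. It works on an in-memory pickle of the same dictionary rather than on the file, purely to keep the example self-contained.

```python
import pickle
import pickletools

mydict = {"Apple": 20, "Meat": 100, "Bread": 10}
data = pickle.dumps(mydict)

# The raw bytes: the keys are recognizable, the rest are pickle opcodes
print(data)

# Protocol 2 and later start with the PROTO opcode (0x80)
# followed by the protocol number
print(data[0] == 0x80, data[1])

# pickletools.dis() prints a human-readable listing of the opcodes
pickletools.dis(data)
```

The exact bytes vary with the Python version and pickle protocol, so the listing on your machine may differ from someone else’s.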


Example# 2

In this example let’s take a look at creating Objects of a Custom class, then dumping them into a Binary file.

We simply created three Objects of Class Student, added them into the list, and then dumped the whole list into the Binary file. Having all the data in one Data structure makes it easier to both dump and load data back and forth.

import pickle

class Student:
    def __init__(self, Id, Name):
        self.Id = Id
        self.Name = Name

    def display(self):
        print("ID:", self.Id)
        print("Name:", self.Name)


list1 = [Student(1,"Bob"), Student(2,"Sam"), Student(3,"James")]

ofile = open("BinaryData2",'wb')
pickle.dump(list1,ofile)
ofile.close()

You can also choose to dump each student individually into the BinaryData file. However, this can be a little complicated, so we’ll bring this up again a bit later during this tutorial.


Loading (Un-Pickling) Objects

In this section we’ll explore how to load (Unpickle) objects in Python, from Binary files back into Python Objects.

Example# 1

So we’ll be continuing our previous example and show you how to read data from the Binary file. We’ve created an entirely new file here, where we will write our loading code.

All you need to do is open the file in Binary read mode (“rb”), and then use the pickle.load() function. This function takes a single parameter, which is the file stream we opened up earlier.

import pickle

ifile = open("BinaryData",'rb')
mydict2 = pickle.load(ifile)
ifile.close()

print(mydict2["Apple"])
print(mydict2["Bread"])
print(mydict2["Meat"])

This produces the following output:

20
10
100

Example# 2

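Here we will do the same as in Example# 1, this time reading the data from the second dumping example back into a Python list.

Since we dumped/pickled objects of Class Student, we can now iterate over them and call the display() function to verify all has gone as planned. Keep in mind that Pickle stores each object’s data along with a reference to its class, not the class code itself, so the Student class must be defined (or imported) in the file that does the loading.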

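import pickle

# The Student class must already be defined or imported at this point,
# otherwise pickle.load() cannot rebuild the objects and raises an error.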

ifile = open("BinaryData2",'rb')
list2 = pickle.load(ifile)
ifile.close()

for student in list2:
    student.display()

The output:

ID: 1
Name: Bob
ID: 2
Name: Sam
ID: 3
Name: James

Multiple Pickles in One File

You may be wondering whether it’s possible to make multiple pickles (multiple dumps) in a single binary file. Well yes, it’s possible, as the example below shows.

The objects are read back in the same order in which they were dumped, so you need to load them in that order to avoid mixing them up. The example below demonstrates this pretty clearly.

import pickle

class Student:
    def __init__(self, Id, Name):
        self.Id = Id
        self.Name = Name

    def display(self):
        print("ID:", self.Id)
        print("Name:", self.Name)


s1 = Student(1, "Bob")
s2 = Student(2, "Sam")
s3 = Student(3, "James")

ofile = open("BinaryData3",'wb')
pickle.dump(s1,ofile)
pickle.dump(s2,ofile)
pickle.dump(s3,ofile)
ofile.close()

ifile = open("BinaryData3",'rb')
s4 = pickle.load(ifile)
s5 = pickle.load(ifile)
s6 = pickle.load(ifile)
ifile.close()

s4.display()
s5.display()
s6.display()

Output:

ID: 1
Name: Bob
ID: 2
Name: Sam
ID: 3
Name: James

Making multiple pickles in a single file can be a little confusing, however, as you need to keep track of the order of the pickles and the number of pickles. An easier solution would be to have all the data wrapped in a container like a list or dictionary, then dumped to a file.

This is the same approach we have been using in the examples prior to this section.
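If you do end up with several pickles in one file, you don’t strictly need to know how many there are. A minimal sketch, assuming a helper we will call load_all() (not part of Pickle), is to keep calling pickle.load() until it raises EOFError. Plain tuples stand in for the Student objects here to keep the example short.

```python
import os
import pickle
import tempfile


def load_all(path):
    """Yield every object pickled into the file, in the order it was dumped."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                # pickle.load() raises EOFError once no data is left
                return


# A temporary folder is used only to keep the example self-contained
path = os.path.join(tempfile.mkdtemp(), "BinaryData3")
records = [(1, "Bob"), (2, "Sam"), (3, "James")]

with open(path, "wb") as f:
    for record in records:
        pickle.dump(record, f)

loaded = list(load_all(path))
print(loaded)  # [(1, 'Bob'), (2, 'Sam'), (3, 'James')]
```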


Unpicklable Types

As awesome as Pickle is, it can’t “pickle” everything. There are certain things that Pickle is unable to dump, and it will throw an error if you try. The number of unpicklable types is rather low, and they tend to be things like lambda functions, open file handles, and other objects tied to system resources. A situation like this occurring would actually be pretty rare, so don’t worry too much.

List of things Pickle can Pickle:

  • None, True, and False
  • integers, floating point numbers, complex numbers
  • strings, bytes, bytearrays
  • tuples, lists, sets, and dictionaries containing only picklable objects
  • functions defined at the top level of a module (using def, not lambda)
  • built-in functions defined at the top level of a module
  • classes that are defined at the top level of a module
  • instances of such classes whose __dict__ or the result of calling __getstate__() is picklable.
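A quick way to check whether a particular object falls into this list is simply to try pickling it. The is_picklable() helper below is a hypothetical name used for this sketch, not something Pickle provides. It catches the error types that pickling failures commonly raise.

```python
import pickle


def is_picklable(obj):
    """Return True if pickle.dumps() accepts obj, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


def double(x):
    """A function defined with def at the top level of the module."""
    return x * 2


print(is_picklable({"a": [1, 2.5, None, True]}))  # containers of basic types
print(is_picklable(len))                          # a built-in function
print(is_picklable(double))                       # a top-level def function
print(is_picklable(lambda x: x * 2))              # a lambda, not picklable
```

When run as a script, the first three checks should succeed and the lambda should fail, matching the list above.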

Alternatives:

  1. Dill Library: This is a special extension of the Pickle library with roughly the same syntax and functions. It’s able to Pickle more types of structures than the original Pickle library can, such as nested functions. It’s not able to handle 100% of all situations though, so it’s not a perfect solution.

Compressing Pickle Dumps (Bonus)

As a slight bonus, we’ll also explain how to compress a Pickle file as we dump it. Pickle Dumps are all in binary format, so they are pretty easy and simple to compress.

All we need is the bz2 module from the Python standard library, an interface to the bzip2 compression library, which will handle all the hard work for us. We just have to call the right function, and our compressed file will be ready.

Here is the code which we’ll be using. It’s the same as before, but with a longer list instead, which repeats the original three Student objects 100 times. The reason for this was to make the difference between the non-compressed and compressed version noticeable. Otherwise, due to overhead, it’s possible the non-compressed version could be smaller in size if the data is very small.

import bz2
import os
import pickle

class Student:
    def __init__(self, Id, Name):
        self.Id = Id
        self.Name = Name

    def display(self):
        print("ID:", self.Id)
        print("Name:", self.Name)


list1= [Student(1,"Bob"),Student(2,"Sam"),Student(3,"James")] * 100

We’ll now try to dump the list with both the regular File handling method and the bz2 method, and then compare the file sizes using the os.path.getsize() function.

ofile = open("BinaryData2",'wb')
pickle.dump(list1,ofile)
ofile.close()
print(os.path.getsize("BinaryData2"))

The file size:

722

Now let’s try this with bz2.

ofile = bz2.BZ2File("BinaryData2",'wb')
pickle.dump(list1,ofile)
ofile.close()
print(os.path.getsize("BinaryData2"))

The file size:

155

Using bz2 to compress the data made the file almost 5x smaller (722 bytes down to 155). And this is only the beginning. The larger the dataset, the more noticeable the compression becomes, and it can result in 10x to 30x smaller files (depending on the contents).

Note: The result of the above test may be slightly skewed, because the same data was written to the file 100 times. Compression is usually more effective if the data is repetitive (due to how compression techniques work).
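Reading a compressed pickle back works the same way in reverse: open the file through bz2 in “rb” mode and pass it to pickle.load(). The sketch below uses a list of small dictionaries (made-up data for this illustration) and a temporary folder so that it runs on its own.

```python
import bz2
import os
import pickle
import tempfile

# Made-up sample data for this sketch
records = [{"Id": i, "Name": "Student" + str(i)} for i in range(300)]
path = os.path.join(tempfile.mkdtemp(), "BinaryData2")

# Dump through bz2 so the pickle is compressed as it is written
with bz2.BZ2File(path, "wb") as ofile:
    pickle.dump(records, ofile)

# Load through bz2 as well; a plain open() would hand pickle
# compressed bytes that it cannot interpret
with bz2.BZ2File(path, "rb") as ifile:
    restored = pickle.load(ifile)

print(restored == records)  # True
# Compare the compressed file with the size of an uncompressed pickle
print(os.path.getsize(path), len(pickle.dumps(records)))
```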

If you want to learn more about bz2 and its various compression and decompression functions, check out this bz2 (bzip2) tutorial.


This marks the end of the Python Object Serialization with Pickle Tutorial. Any suggestions or contributions for CodersLegacy are more than welcome. Questions regarding the tutorial content can be asked in the comments section below.

The post Python Object Serialization with Pickle appeared first on CodersLegacy.



from Planet Python