As a developer, you may sometimes need to send complex object hierarchies over a network or save the internal state of your objects to a disk or database for later use. To accomplish this, you can use a process called serialization, which is fully supported by the standard library thanks to the Python pickle module.
In this tutorial, you’ll learn:
- What it means to serialize and deserialize an object
- Which modules you can use to serialize objects in Python
- Which kinds of objects can be serialized with the Python pickle module
- How to use the Python pickle module to serialize object hierarchies
- What the risks are when deserializing an object from an untrusted source
Let’s get pickling!
Serialization in Python
The serialization process is a way to convert a data structure into a linear form that can be stored or transmitted over a network.
In Python, serialization allows you to take a complex object structure and transform it into a stream of bytes that can be saved to a disk or sent over a network. You may also see this process referred to as marshalling. The reverse process, which takes a stream of bytes and converts it back into a data structure, is called deserialization or unmarshalling.
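As a minimal sketch of that round trip, the snippet below uses pickle.dumps() to turn a nested structure into a bytes object and pickle.loads() to rebuild it. The variable names are illustrative only.

```python
import pickle

# A nested structure: a dict holding a list and another dict
config = {"layers": [64, 32, 10], "optimizer": {"name": "sgd", "lr": 0.01}}

# Serialization (marshalling): object hierarchy -> stream of bytes
data = pickle.dumps(config)
print(type(data))  # <class 'bytes'>

# Deserialization (unmarshalling): stream of bytes -> new object hierarchy
restored = pickle.loads(data)
print(restored == config)  # True: same contents
print(restored is config)  # False: a brand-new object
```

The restored object is equal to the original but is a separate copy, which is what you want when reloading state that was saved earlier.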
Serialization can be used in a lot of different situations. One of the most common uses is saving the state of a neural network after the training phase so that you can use it later without having to redo the training.
Python offers three different modules in the standard library that allow you to serialize and deserialize objects:
- The marshal module
- The json module
- The pickle module
In addition, Python supports XML, which you can also use to serialize objects.
The marshal module is the oldest of the three listed above. It exists mainly to read and write the compiled bytecode of Python modules, or the .pyc files you get when the interpreter imports a Python module. So, even though you can use marshal to serialize some of your objects, it's not recommended.
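To illustrate one practical limitation, this sketch round-trips a built-in dict with marshal and then tries to marshal an instance of a hypothetical Point class, which marshal rejects with a ValueError.

```python
import marshal

class Point:
    """Hypothetical user-defined class used only for illustration."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

# Built-in types such as dicts, lists, and ints round-trip fine
blob = marshal.dumps({"a": [1, 2, 3]})
print(marshal.loads(blob))  # {'a': [1, 2, 3]}

# Instances of custom classes are rejected
marshal_failed = False
try:
    marshal.dumps(Point(1, 2))
except ValueError as exc:
    marshal_failed = True
    print("marshal failed:", exc)
```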
The json module is the newest of the three. It allows you to work with standard JSON files. JSON is a very convenient and widely used format for data exchange.
There are several reasons to choose the JSON format: it's human readable and language independent, and it's lighter than XML. With the json module, you can serialize and deserialize several standard Python types:
- bool
- dict
- int
- float
- list
- str
- tuple
- None
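The following sketch shows two practical consequences of that type list. A tuple comes back as a list after a JSON round trip, because JSON has only one array type, and json.dumps() raises a TypeError for an instance of a hypothetical Point class that it doesn't know how to encode.

```python
import json

class Point:
    """Hypothetical user-defined class used only for illustration."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

record = {"name": "origin", "coords": (0, 0), "visible": True, "label": None}

# Encode to a JSON string: True -> true, None -> null, tuple -> array
text = json.dumps(record)
print(text)  # {"name": "origin", "coords": [0, 0], "visible": true, "label": null}

# Decode: the tuple comes back as a list
decoded = json.loads(text)
print(decoded["coords"])  # [0, 0]

# Custom objects are not supported out of the box
json_failed = False
try:
    json.dumps(Point(1, 2))
except TypeError as exc:
    json_failed = True
    print("json failed:", exc)
```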
The Python pickle module is another way to serialize and deserialize objects in Python. It differs from the json module in that it serializes objects in a binary format, which means the result is not human readable. However, it's also faster, and it works with many more Python types right out of the box, including your custom-defined objects.
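Here is a minimal sketch in which pickle serializes an instance of a hypothetical Point class together with a set, a type that has no JSON equivalent. The class is defined at module level because pickle stores a reference to the class, which must be importable again when the data is unpickled.

```python
import pickle

class Point:
    """Hypothetical user-defined class, defined at module level so pickle can find it."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

shapes = {"tags": {"red", "blue"}, "corner": Point(3, 4)}

data = pickle.dumps(shapes)
print(data[:20])  # binary output, not human-readable text

restored = pickle.loads(data)
print(restored["tags"] == {"red", "blue"})         # True
print(restored["corner"].x, restored["corner"].y)  # 3 4
```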
Note: From now on, you'll see the terms pickling and unpickling used to refer to serializing and deserializing with the Python pickle module.
So, you have several different ways to serialize and deserialize objects in Python. But which one should you use? The short answer is that there’s no one-size-fits-all solution. It all depends on your use case.
Here are three general guidelines for deciding which approach to use:
- Don't use the marshal module. It's used mainly by the interpreter, and the official documentation warns that the Python maintainers may modify the format in backward-incompatible ways.
- The json module and XML are good choices if you need interoperability with different languages or a human-readable format.
- The Python pickle module is a better choice for all the remaining use cases. If you don't need a human-readable format or a standard interoperable format, or if you need to serialize custom objects, then go with pickle.
Inside the Python pickle Module
The Python pickle module basically consists of four functions:
pickle.dump(obj, file, protocol=None, *, fix_imports=True, buffer_callback=None)
pickle.dumps(obj, protocol=None, *, fix_imports=True, buffer_callback=None)
pickle.load(file, *, fix_imports=True, encoding="ASCII", errors="strict", buffers=None)
pickle.loads(bytes_object, *, fix_imports=True, encoding="ASCII", errors="strict", buffers=None)
The first two functions are used during the pickling process, and the other two are used during unpickling. The only difference between dump() and dumps() is that the first writes the serialization result to a file object opened in binary mode, whereas the second returns it as a bytes object.
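The sketch below contrasts the two pairs. It uses io.BytesIO as an in-memory stand-in for a real file opened in binary mode; the file name scores.pkl in the comment is hypothetical, and with an actual file on disk the pickle calls are the same.

```python
import io
import pickle

scores = {"alice": [90, 85], "bob": [72, 88]}

# dump()/load(): write to and read from a binary file-like object
buffer = io.BytesIO()      # stand-in for open("scores.pkl", "wb")
pickle.dump(scores, buffer)
buffer.seek(0)             # rewind before reading back
from_file = pickle.load(buffer)

# dumps()/loads(): work with a bytes object in memory
data = pickle.dumps(scores, protocol=pickle.HIGHEST_PROTOCOL)
from_bytes = pickle.loads(data)

print(from_file == scores, from_bytes == scores)  # True True
print(type(data))  # <class 'bytes'>
```

Note that load() and loads() take no protocol argument: the protocol version used when pickling is detected automatically from the data.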
Read the full article at https://realpython.com/python-pickle-module/ »
from Planet Python