Monday, April 27, 2020

Real Python: The Python pickle Module: How to Persist Objects in Python

As a developer, you may sometimes need to send complex object hierarchies over a network or save the internal state of your objects to a disk or database for later use. To accomplish this, you can use a process called serialization, which is fully supported by the standard library thanks to the Python pickle module.

In this tutorial, you’ll learn:

  • What it means to serialize and deserialize an object
  • Which modules you can use to serialize objects in Python
  • Which kinds of objects can be serialized with the Python pickle module
  • How to use the Python pickle module to serialize object hierarchies
  • What the risks are when deserializing an object from an untrusted source

Let’s get pickling!

Serialization in Python

The serialization process is a way to convert a data structure into a linear form that can be stored or transmitted over a network.

In Python, serialization allows you to take a complex object structure and transform it into a stream of bytes that can be saved to a disk or sent over a network. You may also see this process referred to as marshalling. The reverse process, which takes a stream of bytes and converts it back into a data structure, is called deserialization or unmarshalling.
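As a rough illustration of that round trip, here is a minimal sketch using the pickle module mentioned in the introduction. The `inventory` dictionary is a made-up stand-in for a complex object structure, not something from the tutorial.

```python
import pickle

# Hypothetical nested structure standing in for a "complex object structure".
inventory = {
    "name": "widget",
    "sizes": [1, 2, 3],
    "meta": {"active": True, "price": 9.99},
}

# Serialization: Python object -> stream of bytes.
payload = pickle.dumps(inventory)

# Deserialization: stream of bytes -> a new Python object with equal contents.
restored = pickle.loads(payload)

print(type(payload).__name__)  # bytes
print(restored == inventory)   # True: same contents
print(restored is inventory)   # False: a separate copy was rebuilt
```

The restored object is a copy: it compares equal to the original, but it is a different object in memory.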

Serialization can be used in a lot of different situations. One of the most common uses is saving the state of a neural network after the training phase so that you can use it later without having to redo the training.

Python offers three different modules in the standard library that allow you to serialize and deserialize objects:

  1. The marshal module
  2. The json module
  3. The pickle module

In addition, Python supports XML, which you can also use to serialize objects.

The marshal module is the oldest of the three listed above. It exists mainly to read and write the compiled bytecode of Python modules, or the .pyc files you get when the interpreter imports a Python module. So, even though you can use marshal to serialize some of your objects, it’s not recommended.

The json module is the newest of the three. It allows you to work with standard JSON files. JSON is a very convenient and widely used format for data exchange.

There are several reasons to choose the JSON format: It’s human readable and language independent, and it’s lighter than XML. With the json module, you can serialize and deserialize several standard Python types:

  • bool
  • dict
  • int
  • float
  • list
  • str
  • tuple
  • None
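A short sketch of a json round trip with built-in types may help here. The `record` dictionary and its keys are invented for illustration. Note that JSON has no tuple type, so a tuple is written as a JSON array and comes back as a list.

```python
import json

# Hypothetical record built only from types the json module handles natively.
record = {
    "id": 7,
    "tags": ["a", "b"],
    "ratio": 0.5,
    "ok": True,
    "note": None,
    "point": (1, 2),
}

# Serialize to a human-readable text string.
text = json.dumps(record)
print(text)

# Deserialize back into Python objects.
restored = json.loads(text)

print(restored["point"])  # [1, 2]: the tuple was stored as a JSON array
print(restored["note"])   # None: JSON null maps back to None
```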

The Python pickle module is another way to serialize and deserialize objects in Python. It differs from the json module in that it serializes objects in a binary format, which means the result is not human readable. However, it’s also faster and it works with many more Python types right out of the box, including your custom-defined objects.
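The difference with custom objects can be sketched as follows. `Point` is a hypothetical class written just for this example. With its default encoder, json refuses to serialize the instance, while pickle handles it without extra code.

```python
import json
import pickle


class Point:
    """Hypothetical custom class used only for this illustration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)

# The default JSON encoder does not know how to serialize a Point.
try:
    json.dumps(p)
    json_ok = True
except TypeError:
    json_ok = False

# pickle serializes the instance to a binary (not human-readable) payload.
blob = pickle.dumps(p)
clone = pickle.loads(blob)

print(json_ok)           # False
print(type(blob).__name__)  # bytes
print(clone.x, clone.y)  # 1 2
```

This sketch assumes the class is defined at the top level of an importable module or script, because pickle stores a reference to the class rather than its code and needs to find the class again when unpickling.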

Note: From now on, you’ll see the terms pickling and unpickling used to refer to serializing and deserializing with the Python pickle module.

So, you have several different ways to serialize and deserialize objects in Python. But which one should you use? The short answer is that there’s no one-size-fits-all solution. It all depends on your use case.

Here are three general guidelines for deciding which approach to use:

  1. Don’t use the marshal module. It’s used mainly by the interpreter, and the official documentation warns that the Python maintainers may modify the format in backward-incompatible ways.

  2. The json module and XML are good choices if you need interoperability with different languages or a human-readable format.

  3. The Python pickle module is a better choice for all the remaining use cases. If you don’t need a human-readable format or a standard interoperable format, or if you need to serialize custom objects, then go with pickle.

Inside the Python pickle Module

The core interface of the Python pickle module consists of four functions:

  1. pickle.dump(obj, file, protocol=None, *, fix_imports=True, buffer_callback=None)
  2. pickle.dumps(obj, protocol=None, *, fix_imports=True, buffer_callback=None)
  3. pickle.load(file, *, fix_imports=True, encoding="ASCII", errors="strict", buffers=None)
  4. pickle.loads(bytes_object, *, fix_imports=True, encoding="ASCII", errors="strict", buffers=None)

The first two functions are used during the pickling process, and the other two are used during unpickling. The only difference between dump() and dumps() is that the first writes the serialization result to an open binary file object, whereas the second returns it as a bytes object. Likewise, load() reads from an open binary file object, whereas loads() takes a bytes-like object.
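The four calls can be sketched side by side as below. The `state` dictionary and the `state.pickle` file name are assumptions made for this example, and a temporary directory is used so that nothing is left on disk.

```python
import os
import pickle
import tempfile

# Hypothetical object to persist.
state = {"weights": [0.1, 0.2, 0.3], "epoch": 5}

# dumps() returns the pickled data as a bytes object; loads() reads it back.
blob = pickle.dumps(state)
from_bytes = pickle.loads(blob)

# dump() writes to a file opened in binary mode; load() reads from one.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "state.pickle")  # assumed file name
    with open(path, "wb") as file:
        pickle.dump(state, file)
    with open(path, "rb") as file:
        from_file = pickle.load(file)

print(type(blob).__name__)   # bytes
print(from_bytes == state)   # True
print(from_file == state)    # True
```

Both files are opened in binary mode ("wb" and "rb") because pickle produces and consumes bytes, not text.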

Read the full article at https://realpython.com/python-pickle-module/ »


from Planet Python