Monday, August 24, 2020

Python mmap: Improved File I/O With Memory Mapping

The Zen of Python has a lot of wisdom to offer. One especially useful idea is that “There should be one—and preferably only one—obvious way to do it.” Yet there are multiple ways to do most things in Python, and often for good reason. For example, there are multiple ways to read a file in Python, including the rarely used mmap module

Python’s mmap provides memory-mapped file input and output (I/O). It allows you to take advantage of lower-level operating system functionality to read files as if they were one large string or array. This can provide significant performance improvements in code that requires a lot of file I/O.

In this tutorial, you’ll learn:

  • What kinds of computer memory exist
  • What problems you can solve with mmap
  • How use memory mapping to read large files faster
  • How to change a portion of a file without rewriting the entire file
  • How to use mmap to share information between multiple processes

Understanding Computer Memory

Memory mapping is a technique that uses lower-level operating system APIs to load a file directly into computer memory. It can dramatically improve file I/O performance in your program. To better understand how memory mapping improves performance, as well as how and when you can use the mmap module to take advantage of these performance benefits, it’s useful to first learn a bit about computer memory.

Computer memory is a big, complicated topic, but this tutorial focuses only on what you need to know to use the mmap module effectively. For the purposes of this tutorial, the term memory refers to random-access memory, or RAM.

There are several types of computer memory:

  1. Physical
  2. Virtual
  3. Shared

Each type of memory can come into play when you’re using memory mapping, so let’s review each one from a high level.

Physical Memory

Physical memory is the least complicated type of memory to understand because it’s often part of the marketing associated with your computer. (You might remember that when you bought your computer, it advertised something like 8 gigabytes of RAM.) Physical memory typically comes on cards that are connected to your computer’s motherboard.

Physical memory is the amount of volatile memory that’s available for your programs to use while running. Physical memory should not be confused with storage, such as your hard drive or solid-state disk.

Virtual Memory

Virtual memory is a way of handling memory management. The operating system uses virtual memory to make it appear that you have more memory than you do, allowing you to worry less about how much memory is available for your programs at any given time. Behind the scenes, your operating system uses parts of your nonvolatile storage, such as your solid-state disk, to simulate additional RAM.

In order to do this, your operating system must maintain a mapping between physical memory and virtual memory. Each operating system uses its own sophisticated algorithm to map virtual memory addresses to physical ones using a data structure called a page table.

Luckily, most of this complication is hidden from your programs. You don’t need to understand page tables or logical-to-physical mapping to write performant I/O code in Python. However, knowing a little bit about memory gives you a better understanding of what the computer and libraries are taking care of for you.

mmap uses virtual memory to make it appear that you’ve loaded a very large file into memory, even if the contents of the file are too big to fit in your physical memory.

Shared Memory

Shared memory is another technique provided by your operating system that allows multiple programs to access the same data simultaneously. Shared memory can be a very efficient way of handling data in a program that uses concurrency.

Python’s mmap uses shared memory to efficiently share large amounts of data between multiple Python processes, threads, and tasks that are happening concurrently.

Digging Deeper Into File I/O

Now that you have a high-level view of the different types of memory, it’s time to understand what memory mapping is and what problems it solves. Memory mapping is another way to perform file I/O that can result in better performance and memory efficiency.

In order to fully appreciate what memory mapping does, it’s useful to consider regular file I/O from a lower-level perspective. A lot of things happen behind the scenes when reading a file:

Read the full article at https://realpython.com/python-mmap/ »


[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]



from Real Python
read more

No comments:

Post a Comment

TestDriven.io: Working with Static and Media Files in Django

This article looks at how to work with static and media files in a Django project, locally and in production. from Planet Python via read...