Heaps and priority queues are little-known but surprisingly useful data structures. For many problems that involve finding the best element in a dataset, they offer a solution that’s easy to use and highly effective. The Python heapq module is part of the standard library. It implements all the low-level heap operations as well as some high-level common uses for heaps.
A priority queue is a powerful tool that can solve problems as varied as writing an email scheduler, finding the shortest path on a map, or merging log files. Programming is full of optimization problems in which the goal is to find the best element. Priority queues and the functions in the Python heapq module can often help with that.
In this tutorial, you’ll learn:
- What heaps and priority queues are and how they relate to each other
- What kinds of problems can be solved using a heap
- How to use the Python
heapqmodule to solve those problems
This tutorial is for Pythonistas who are comfortable with lists, dicts, sets, and generators and are looking for more sophisticated data structures.
Free Download: Get a sample chapter from Python Tricks: The Book that shows you Python's best practices with simple examples you can apply instantly to write more beautiful + Pythonic code.
What Are Heaps?
Heaps are concrete data structures, whereas priority queues are abstract data structures. An abstract data structure determines the interface, while a concrete data structure defines the implementation.
Heaps are commonly used to implement priority queues. They’re the most popular concrete data structure for implementing the priority queue abstract data structure.
Concrete data structures also specify performance guarantees. Performance guarantees define the relationship between the size of the structure and the time operations take. Understanding those guarantees allows you to predict how much time the program will take as the size of its inputs change.
Data Structures, Heaps, and Priority Queues
Abstract data structures specify operations and the relationships between them. The priority queue abstract data structure, for example, supports three operations:
- is_empty checks whether the queue is empty.
- add_element adds an element to the queue.
- pop_element pops the element with the highest priority.
Priority queues are commonly used for optimizing task execution, in which the goal is to work on the task with the highest priority. After a task is completed, its priority is lowered, and it’s returned to the queue.
There are two different conventions for determining the priority of an element:
- The largest element has the highest priority.
- The smallest element has the highest priority.
These two conventions are equivalent because you can always reverse the effective order. For example, if your elements consist of numbers, then using negative numbers will flip the conventions around.
The Python heapq module uses the second convention, which is generally the more common of the two. Under this convention, the smallest element has the highest priority. This might sound surprising, but it’s often quite useful. In the real-life examples you’ll see later, this convention will simplify your code.
Note: The Python heapq module, and the heap data structure in general, is not designed to allow finding any element except the smallest one. For retrieval of any element by size, a better option is a binary search tree.
Concrete data structures implement the operations defined in an abstract data structure and further specify performance guarantees.
The heap implementation of the priority queue guarantees that both pushing (adding) and popping (removing) elements are logarithmic time operations. This means that the time it takes to do push and pop is proportional to the base-2 logarithm of the number of elements.
Logarithms grow slowly. The base-2 logarithm of fifteen is about four, while the base-2 logarithm of a trillion is about forty. This means that if an algorithm is fast enough on fifteen elements, then it’s going to be only ten times slower on a trillion elements and will probably still be fast enough.
In any discussion of performance, the biggest caveat is that these abstract considerations are less meaningful than actually measuring a concrete program and learning where the bottlenecks are. General performance guarantees are still important for making useful predictions about program behavior, but those predictions should be confirmed.
Implementation of Heaps
A heap implements a priority queue as a complete binary tree. In a binary tree, each node will have at most two children. In a complete binary tree, all levels except possibly the deepest one are full at all times. If the deepest level is incomplete, then it will have the nodes as far to the left as possible.
Read the full article at https://realpython.com/python-heapq-module/ »
[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]
from Real Python
read more
No comments:
Post a Comment