Wednesday, June 30, 2021

Stack Abuse: Graphs in Python: Minimum Spanning Trees - Prim's Algorithm

Introduction

Graphs are a great tool for modeling relationships between objects. Graphs can be used to model complex relationships between any number of objects in a system, which makes them ideal for representing interactions between concepts/items in areas like physics, economics, chemistry.

Thanks to constantly increasing computational power, graphs can be manipulated easier than ever. The study of data structures and algorithms has given us better insight into the use of graphs, thanks to algorithms created by great minds over the years. One of these algorithms is Prim’s Algorithm. It was designed by Vojtech Jarnik in 1930, then redesigned by Robert C. Prim in 1957. This algorithm is used to find the Minimum Spanning Tree in a graph.

In this guide, you'll learn how to implement Prim's algorithm in Python - we'll cover the steps used in the algorithm as well as implement it through a functional example. Before that, we'll dive into the definitions of trees, graphs, adjacency matrtices and minimum spanning trees.

What is a Graph?

Graphs are used to illustrate complex systems and the relationships that exist between points of data. They simplify the visualization of these relationships and their relative importance, and have countless applications, but we commonly study them in discrete mathematics, where they can lead to optimal solutions for a wide variety of problems.

You have probably seen graphs when working with Git repositories. Consulting the commits log of a project often reveals a hierarchical graph.

The vertices and the edges are the principal components of the graph’s fundamental structure. We usually represent the vertices as circles or dots, while the edges are lines or curves that connect the vertices, forming relationships. According to the purpose of the graph, the edges and the vertices can have weights, which are numeric values assigned to these elements.

Weights are an abstraction and can be defined by various measures. On a map (graph) of bus stops (nodes), the streets (edges) can have a traffic jam. The amount of time it takes to go through this jam is the weight of that edge.

The edges that connect two or more vertices are called paths. Here's an example:

graph illustration

If a graph is undirected (all edges are bidirectional) and acyclic (isn't cyclic, doesn't run in a circle), we call it a tree - the graph above is a tree. Let's take a closer look at this.

What is a Tree?

Trees are a type of graph, though not all graphs are trees. The main feature of a tree is that every pair of vertices is connected by one and only one path, so unlike other types of graphs there cannot be cycles in the graph - they're acyclic.

Additionally, trees are undirected - there are no strict directions between nodes that you have to follow.

This is a tree:

tree illustration

Although this might not look much like a tree that you're used to seeing in the park - a lot of trees actually do resemble them, having a hierarchy from the root node with many branches and leaves.

What is an Adjacency Matrix?

A graph’s relationships can be represented in many ways, and one of them is an Adjacency Matrix. This structure shows how the vertices in a graph are connected to other vertices. An adjacency matrix is a square matrix with dimensions n x n, where n is the number of vertices in the graph. Once the matrix structure is defined, its fields define which vertices have paths connected to other vertices.

For example, the vertex N1 is connected to the vertex N2, so you fill the field (N1, N2) in the matrix with the number one, showing that a relationship exists between them. If there is no relationship between the two vertices, fill the field in with a zero. If you're working with a graph with weights, you can fill the fields with the weights of the edges instead of the number one when there is a relationship.

This is how an Adjacency Matrix can look like:

adjacency matrix

Minimum Spanning Trees

With the terminology sorted out, we can take a look at Minimum Spanning Trees (MST). These are a common occurrence in the real world where we'd like to find, say, the "cheapest" coverage of a certain area. For example, a pizza restaurant chain might want to know where to open up their restaurants to cover the largest area of delivery, within their guaranteed time, with the minimum number of open restaurants.

The time it takes to traverse a street is its weight, so the goal is to create a tree within the city, that connects all of the nodes (delivery areas/houses), without any cycles (inefficient), and have the sum of all the weights be the lowest it can be.

The result is a Minimum Spanning Tree of their restaurants.

For every graph with weights, you can calculate a Minimum Spanning Tree. Each of these trees fulfills the following conditions:

  • Is a subgraph (this means the MST contains some or all the relationships from the original graph, no more).
  • Is a tree.
  • The MST weight (sum of weights) is the minimum weight possible of the different potential spanning trees in the graph.
  • The general graph must be undirected.

A Spanning Tree (Maximum or Minimum) connects all of the general graph's vertices, but not necessarily all the edges of the graph - some are avoided to reduce cost and even if not, using all the edges may turn the path into a cyclic one, which then defeats the purpose of a tree.

With this summary, we can get into Prim's Algorithm.

Prim's Algorithm Intuition

In 1957 Robert C. Prim designed (or rather, redesigned) a sequence of steps to find a graph's Minimum Spanning Tree using path weights.

The algorithm's steps are these:

  1. Select a random vertex.
  2. Choose the path with the minimum weight connected to the chosen vertex.
  3. The path will lead you to a new vertex, position yourself there.
  4. Once you have formed/updated the initial tree, choose the path with the minimum weight that is connected to the whole tree. You must avoid creating cycles.
  5. Repeat the steps 3 and 4 until you have covered all the vertices.

Let's get some more intuition for how this works with this gif:

prim's algorithm gif

Implementing Prim's Algorithm in Python

Creating the Adjacency Matrix

We'll use two adjacency matrices - one for the general graph and another for the MST, and we'll define them using list comprehensions. Since we are creating a matrix, we'll create one list full of other lists. Every row will be another list and there we'll store the weight values.

Note: Every column and row of the matrix represents a vertex.

def primsAlgorithm(vertices):

    # Creating the adjacency Matrix with n x n dimensions filled with zeros, where n is the graph number of vertices:
    adjacencyMatrix = [[0 for column in range(vertices)] for row in range(vertices)]

    # Creating another adjacency Matrix for the Minimum Spanning Tree:
    mstMatrix = [[0 for column in range(vertices)] for row in range(vertices)]

One useful attribute of an undirected graph's adjacency matrix is that it is symmetric. This means the upper half (above the graph's diagonal) is a mirror of the lower half. Knowing this means we don't have to fill the whole matrix because there will be repeated values. Taking advantage of this property leads us to this:

    # Filling the adjacency matrix:
    for i in range(0,vertices):
        # Since the adjacency matrix is Symmetric we don't have to fill the whole matrix, only the upper half:
        for j in range(0+i,vertices):
            # Asking for the edges weights:
            adjacencyMatrix[i][j] = int(input('Enter the path weight between the vertices: ({}, {}):  '.format(i,j)))

            # Again, we use the Symmetric Matrix as an advantage:
            adjacencyMatrix[j][i] = adjacencyMatrix[i][j]

We've filled in the general graph adjacency matrix. Let's get to filling in the matrix's values.

As we use Prim's algorithm, we are going to be comparing the weights and looking for the minimum one, so we should define a really large number that works as a temporary minimum. Python will help us with that task:

    # Defining a really big number:
    positiveInf = float('inf')

We also have to track the selected vertices to learn which vertices are included in the MST. Once all the vertices are part of the subgraph, we can stop looking. To achieve this, we are going to create another comprehension list with Boolean values. Every column in this new comprehension list will represent a vertex and if the vertex was chosen as part of the MST the field will be True, and False otherwise:

    # This is a list showing which vertices are already selected, so we don't pick the same vertex twice and we can actually know when stop looking
    selectedVertices = [False for vertex in range(vertices)]

Now, let's see how all of the elements we've covered so far come together.

Prim's Algorithm

In the code below, we'll be repeating the searching process (using a while loop) until all the fields in the selectedVertices list are made true. As we said, the positiveInf values will act as a temporary minimum since every weight we intact with will be less than that value. We'll define two variables: start and end. These variables will act as the vertices to be connected and we are going to use them to fill in our MST matrix.

After defining these variables, we have to create two loops, so that we can move through the adjacency matrix of the general graph (the first loop is for the x-axis or the rows, and the second loop for the y-axis or columns). Before entering the second loop, we have to validate that the vertex given by the first loop is selected, which ensures it's part of the MST graph. We handle this with the if selectedVertices[i]: block in the code below.

When we start building the tree, none of the vertices will be selected, they are all False, so the loop will end before we even enter the second loop. For this reason, the variables start and end are initially defined as 0, and when we exit from the loop the Boolean value assigned to the end position will become True. As a result, one field of the mstMatrix will be filled with the existing minimum, and since the mstMatrix is symmetrical we can use the same trick on the adjacency matrix to fill in another field.

Now that we have a selected vertex, which is Step 1of Prim's Algorithm, let's discuss the second loop.

First, we'll use the symmetric matrix trick to move through the y-axis and examine the relationships between our selected vertex and other vertices. Our selected vertex will be compared with the other n vertices according to the following parameters:

  • The vertex given by i must have a path that connects it with vertex j (this means the weight in the (i,j) position of the adjacency matrix must be greater than zero).
  • The vertex j must be not selected (if it's already selected this can lead to a cycle, which is something we don't want since we're trying to construct a tree).

Given these two conditions, we can compare the edge weight of a given relationship with the general minimum of the MST. If the weight is less than the minimum, then it will become the new minimum and the variables start and end will receive the i and j values. If the weight is more than the minimum, then we keep searching through the remaining columns.

The start and end will populate the MST matrix, creating the tree we are looking for. We then have to repeat the process until all the vertices are selected.

Let's take a look at the code:

# While there are vertices that are not included in the MST, keep looking:
    while(False in selectedVertices):
        # We use the big number we created before as the possible minimum weight
        minimum = positiveInf

        # The starting vertex
        start = 0

        # The ending vertex
        end = 0

        for i in range(0,vertices):
            # If the vertex is part of the MST, look its relationships
            if selectedVertices[i]:
                # Again, we use the Symmetric Matrix as an advantage:
                for j in range(0+i,vertices):
                    # If the vertex analyzed have a path to the ending vertex AND its not included in the MST to avoid cycles)
                    if (not selectedVertices[j] and adjacencyMatrix[i][j]>0):  
                        # If the weight path analyzed is less than the minimum of the MST
                        if adjacencyMatrix[i][j] < minimum:
                            # Defines the new minimum weight, the starting vertex and the ending vertex
                            minimum = adjacencyMatrix[i][j]
                            start, end = i, j
        
        # Since we added the ending vertex to the MST, it's already selected:
        selectedVertices[end] = True

        # Filling the MST Adjacency Matrix fields:
        mstMatrix[start][end] = minimum
        
        # Initially, the minimum will be Inf if the first vertex is not connected with itself, but really it must be 0:
        if minimum == positiveInf:
            mstMatrix[start][end] = 0
            
        # Symmetric matrix, remember
        mstMatrix[end][start] = mstMatrix[start][end]

    # Show off:
    print(mstMatrix)
    
    # Here the function ends

Now, we just have to call the function we created. The line below prompts the user to enter a chosen number of vertices and then fills the adjacency matrix:

#Call to the function:
primsAlgorithm(int(input('Enter the vertices number: ')))

Let's test the code by running it, which will create our number of chosen vertices and fill in the graph adjacency matrix:

running prim's algorithm

We have 5 vertices and the edges weights, our input will look like this:

Enter the vertices number: 5
Enter the path weight between the vertices: (0, 0):  0
Enter the path weight between the vertices: (0, 1):  15
Enter the path weight between the vertices: (0, 2):  0
Enter the path weight between the vertices: (0, 3):  0
Enter the path weight between the vertices: (0, 4):  0
Enter the path weight between the vertices: (1, 1):  0
Enter the path weight between the vertices: (1, 2):  7
Enter the path weight between the vertices: (1, 3):  10
Enter the path weight between the vertices: (1, 4):  0
Enter the path weight between the vertices: (2, 2):  0
Enter the path weight between the vertices: (2, 3):  30
Enter the path weight between the vertices: (2, 4):  3
Enter the path weight between the vertices: (3, 3):  0
Enter the path weight between the vertices: (3, 4):  0
Enter the path weight between the vertices: (4, 4):  0

And for the output we got:

[[0, 15, 0, 0, 0], [15, 0, 7, 10, 0], [0, 7, 0, 0, 3], [0, 10, 0, 0, 0], [0, 0, 3, 0, 0]]

That's our adjacency matrix for the Minimum Spanning Tree. Producing the graph will give us this:

prim's algorithm output

And that's it! That's our MST for that graph. Remember that we can start with a random vertex, we don't necessarily need to start with the first one. If you want to challenge yourself, you can modify the code so that it takes a random number (in the correct range of course) as the starting vertex.

Conclusion

Prim's algorithm is not only efficient, but flexible when it comes to finding the Minimum Spanning Tree of a graph. The Python implementation is also really simple. MSTs are useful structures that can be applied in a wide variety of fields, making Prim's algorithm an incredibly important one.

You can access the source code from the guide on GitHub



from Planet Python
via read more

No comments:

Post a Comment

TestDriven.io: Working with Static and Media Files in Django

This article looks at how to work with static and media files in a Django project, locally and in production. from Planet Python via read...