Sunday, November 28, 2021

Stack Abuse: Graphs in Python: Breadth-First Search (BFS) Algorithm

Introduction

Graphs are one of the most useful data structures. They can be used to model practically everything - object relations and networks being the most common ones. An image can be represented as a grid-like graph of pixels, and sentences can be represented as graphs of words. Graphs are used in various fields, from cartography to social psychology even, and of course they are widely used in Computer Science.

Due to their widespread use, graph search and traversal play an important computational role. The two fundamental algorithms used for graph search and traversal are Depth-First Search (DFS) and Breadth-First Search (BFS).

If you'd like to read more about Depth-First Search, read our Graphs in Python: Depth-First Search (DFS) Algorithm!

In this article we will go over the theory behind the algorithm and the Python implementation of Breadth-First Search and Traversal. First, we'll be focusing on node search, before delving into graph traversal using the BFS algorithm, as the two main tasks you can employ it for.

Note: We're assuming an adjacency-list implemented graph in the guide.

Breadth-First Search - Theory

Breadth-First Search (BFS) traverses the graph systematically, level by level, forming a BFS tree along the way.

If we start our search from node v (the root node of our graph or tree data structure), the BFS algorithm will first visit all the neighbours of node v (it's child nodes, on level one), in the order that is given in the adjacency list. Next, it takes the child nodes of those neighbours (level two) into consideration , and so on.

This algorithm can be used for both graph traversal and search. When searching for a node that satisfies a certain condition (target node), the path with the shortest distance from the starting node to the target node. The distance is defined as the number of branches traversed.

BFS animated

Breadth-First Search can be used to solve many problems such as finding the shortest path between two nodes, determining the levels of each node, and even solving puzzle games and mazes.

While it's not the most efficient algorithm for solving large mazes and puzzles - and it's outshined by algorithms such as Dijkstra's Algorithm and A* - it still plays an important role in the bunch and depending on the problem at hand - DFS and BFS can outperform their heuristic cousins.

If you'd like to read more about Dijkstra's Algorithm or A* - read our Graphs in Python: Dijkstra's Algorithm and Graphs in Python: A* Search Algorithm!

Breadth-First Search - Algorithm

When implementing BFS, we usually use a FIFO structure like a Queue to store nodes that will be visited next.

Note: To use a Queue in Python, we need to import the corresponding Queue class from the queue module.

We need to pay attention to not fall into infinity loops by revisiting the same nodes over and over, which can easily happen with graphs that have cycles. Having that in mind, we'll be keeping track of the nodes that have been visited. That information doesn't have to be explicitly saved, we can simply keep track of the parent nodes, so we don't accidentally go back to one after it's been visited.

To sum up the logic, the BFS Algorithm steps look like this:

  1. Add the root/start node to the Queue.
  2. For every node, set that they don't have a defined parent node.
  3. Until the Queue is empty:
    • Extract the node from the beginning of the Queue.
    • Perform output processing.
    • For every neighbour of the current node that doesn't have a defined parent (is not visited), add it to the Queue, and set the current node as their parent.

Output processing is performed depending on the purpose behind the graph search. When searching for a target node, output processing is usually testing if the current node is equal to the target node. This is the step on which you can get creative!

Breadth-First Search Implementation - Target Node Search

Let's first start out with search - and search for a target node. Besides the target node, we'll need a start node as well. The expected output is a path that leads us from the start node to the target node.

With those in mind, and taking the steps of the algorithm into account, we can implement it:

from queue import Queue

def BFS(adj_list, start_node, target_node):
    # Set of visited nodes to prevent loops
    visited = set()
    queue = Queue()

    # Add the start_node to the queue and visited list
    queue.put(start_node)
    visited.add(start_node)
    
    # start_node has not parents
    parent = dict()
    parent[start_node] = None

    # Perform step 3
    path_found = False
    while not queue.empty():
        current_node = queue.get()
        if current_node == target_node:
            path_found = True
            break

        for next_node in adj_list[current_node]:
            if next_node not in visited:
                queue.put(next_node)
                parent[next_node] = current_node
                visited.add(next_node)
                
    # Path reconstruction
    path = []
    if path_found:
        path.append(target_node)
        while parent[target_node] is not None:
            path.append(parent[target_node]) 
            target_node = parent[target_node]
        path.reverse()
    return path

When we're reconstructing the path (if it is found), we're going backwards from the target node, through it's parents, retracing all the way to the start node. Additionally, we might want to reverse the path for our own intuition of going from the start_node towards the target_node.

On the other hand, if there is no path, the algorithm will return an empty list.

Let's construct a simple graph, as an adjacency list:

graph = {
  1 : [2, 3, 4, 5],
  2 : [1, 3],
  3 : [1, 2, 4, 6],
  4 : [1, 3, 5, 6],
  5 : [1, 4, 6],
  6 : [3, 4, 5]
}

Now, say we'd like to search for node 6 starting at node 1:

path = BFS(graph, 1, 6)
print(path)

Running this code results in:

[1, 3, 6]

Now, let's take a look at a visual representation of the graph itself:

search graph

The shortest path between 1 and 6 is indeed [1, 3, 6]. Though, you could also traverse [1, 4, 6] which appears to be shorter (due to a diagonal path), and [1, 5, 6]. These alternative paths are, fundementally, the same distance as [1, 3, 6] - however, consider how BFS compares nodes. It "scans" from left to right and 3 is the first node on the left-hand side of the adjacency list that leads to 6, so this path is taken instead of the others.

Note: There are cases in which a path between two nodes cannot be found. This scenario is typical for disconnected graphs, where there are at least two nodes that are not connected by a path.

Here's how a disconected graph looks like:

disconnected graph

If we were to try and perform a search for a path between nodes 0 and 3 in this graph, that search would be unsuccessful, and an empty path would be returned.

Breadth-First Implementation - Graph Traversal

Breadth-First Traversal, is a special case of Breadth-First Search that traverses the whole graph, instead of searching for a target node. The algorithm stays the same as we've defined it before, the difference being that we don't check for a target node and we don't need to find a path that leads to it.

This simplifies the implementation significantly - let's just print out each node being traversed to gain an intuition of how it passes through the nodes:

def traversal(adj_list, start_node):
    visited = set()
    queue = Queue()
    queue.put(start_node)
    visited.add(start_node)

    while not queue.empty():
        current_node = queue.get()
        print(current_node, end = " ")
        for next_node in adj_list[current_node]:
            if next_node not in visited:
                queue.put(next_node)
                visited.add(next_node)

Now, let's define a simple graph:

graph = {
    1 : [2, 3],
    2 : [1, 3, 5],
    3 : [1, 2, 4],
    4 : [3],
    5 : [2]
}
traversal(graph, 1)

Finally, let's run the code:

1 2 3 5 4

Step by step

Let's dive into this example a bit deeper and see how the algorithm works step by step. Here's a picture of the graph so we can easily follow the steps:

traversal graph

As we start the traversal from the start node 1, it is put into the visited set and into the queue as well. While we still have nodes in the queue, we extract the first one, print it, and check all of it's neighbours.

When going through the neigbours, we check if each of them is visited, and if not we add them to the queue and mark them as visited:

Steps Queue Visited
Add start node `1` [`1`] {`1`}
Visit `1`, add `2` & `3` to Queue [`2`, `3`] {`1`}
Visit `2`, add `5 `to Queue [`3`, `5`] {`1`, `2`}
Visit `3`, add `4` to Queue [`5`, `4`] {`1`, `2,` `3`}
Visit `5`, no unvisited neighbours [`4`] {`1`, `2,` `3`, `5`}
Visit `4`, no unvisited neighbours [ ] {`1`, `2,` `3`, `5`, `4`}

Time complexity

During Breadth-First Traversal, every node is visited exactly once, and every branch is also viewed once in case of a directed graph, that is, twice if the graph is undirected. Therefore, the time complexity of the BFS algorithm is O(|V| + |E|), where V is a set of the graph's nodes, and E is a set consisting of all of it's branches (edges).

Conclusion

In this guide, we've explained the theory behind the Breadth-First Search algorithm and defined it's steps.

We've depicted the Python implementation of both Breadth-First Search and Breadth-First Traversal, and tested them on example graphs to see how they work step by step. Finally, we've explained the time complexity of this algorithm.



from Planet Python
via read more

No comments:

Post a Comment

TestDriven.io: Working with Static and Media Files in Django

This article looks at how to work with static and media files in a Django project, locally and in production. from Planet Python via read...