Introduction
Graphs are one of the most useful data structures. They can be used to model practically everything - object relations and networks being the most common ones. An image can be represented as a grid-like graph of pixels, and sentences can be represented as graphs of words. Graphs are used in various fields, from cartography to social psychology even, and of course they are widely used in Computer Science.
Due to their widespread use, graph search and traversal play an important computational role. The two fundamental algorithms used for graph search and traversal are Depth-First Search (DFS) and Breadth-First Search (BFS).
If you'd like to read more about Depth-First Search, read our Graphs in Python: Depth-First Search (DFS) Algorithm!
In this article we will go over the theory behind the algorithm and the Python implementation of Breadth-First Search and Traversal. First, we'll be focusing on node search, before delving into graph traversal using the BFS algorithm, as the two main tasks you can employ it for.
Note: We're assuming an adjacency-list implemented graph in the guide.
Breadth-First Search - Theory
Breadth-First Search (BFS) traverses the graph systematically, level by level, forming a BFS tree along the way.
If we start our search from node v
(the root node of our graph or tree data structure), the BFS algorithm will first visit all the neighbours of node v
(it's child nodes, on level one), in the order that is given in the adjacency list. Next, it takes the child nodes of those neighbours (level two) into consideration , and so on.
This algorithm can be used for both graph traversal and search. When searching for a node that satisfies a certain condition (target node), the path with the shortest distance from the starting node to the target node. The distance is defined as the number of branches traversed.
Breadth-First Search can be used to solve many problems such as finding the shortest path between two nodes, determining the levels of each node, and even solving puzzle games and mazes.
While it's not the most efficient algorithm for solving large mazes and puzzles - and it's outshined by algorithms such as Dijkstra's Algorithm and A* - it still plays an important role in the bunch and depending on the problem at hand - DFS and BFS can outperform their heuristic cousins.
If you'd like to read more about Dijkstra's Algorithm or A* - read our Graphs in Python: Dijkstra's Algorithm and Graphs in Python: A* Search Algorithm!
Breadth-First Search - Algorithm
When implementing BFS, we usually use a FIFO structure like a Queue
to store nodes that will be visited next.
Note: To use a Queue
in Python, we need to import the corresponding Queue
class from the queue
module.
We need to pay attention to not fall into infinity loops by revisiting the same nodes over and over, which can easily happen with graphs that have cycles. Having that in mind, we'll be keeping track of the nodes that have been visited. That information doesn't have to be explicitly saved, we can simply keep track of the parent nodes, so we don't accidentally go back to one after it's been visited.
To sum up the logic, the BFS Algorithm steps look like this:
- Add the root/start node to the
Queue
. - For every node, set that they don't have a defined parent node.
- Until the
Queue
is empty:- Extract the node from the beginning of the
Queue
. - Perform output processing.
- For every neighbour of the current node that doesn't have a defined parent (is not visited), add it to the
Queue
, and set the current node as their parent.
- Extract the node from the beginning of the
Output processing is performed depending on the purpose behind the graph search. When searching for a target node, output processing is usually testing if the current node is equal to the target node. This is the step on which you can get creative!
Breadth-First Search Implementation - Target Node Search
Let's first start out with search - and search for a target node. Besides the target node, we'll need a start node as well. The expected output is a path that leads us from the start node to the target node.
With those in mind, and taking the steps of the algorithm into account, we can implement it:
from queue import Queue
def BFS(adj_list, start_node, target_node):
# Set of visited nodes to prevent loops
visited = set()
queue = Queue()
# Add the start_node to the queue and visited list
queue.put(start_node)
visited.add(start_node)
# start_node has not parents
parent = dict()
parent[start_node] = None
# Perform step 3
path_found = False
while not queue.empty():
current_node = queue.get()
if current_node == target_node:
path_found = True
break
for next_node in adj_list[current_node]:
if next_node not in visited:
queue.put(next_node)
parent[next_node] = current_node
visited.add(next_node)
# Path reconstruction
path = []
if path_found:
path.append(target_node)
while parent[target_node] is not None:
path.append(parent[target_node])
target_node = parent[target_node]
path.reverse()
return path
When we're reconstructing the path (if it is found), we're going backwards from the target node, through it's parents, retracing all the way to the start node. Additionally, we might want to reverse the path for our own intuition of going from the start_node
towards the target_node
.
On the other hand, if there is no path, the algorithm will return an empty list.
Let's construct a simple graph, as an adjacency list:
graph = {
1 : [2, 3, 4, 5],
2 : [1, 3],
3 : [1, 2, 4, 6],
4 : [1, 3, 5, 6],
5 : [1, 4, 6],
6 : [3, 4, 5]
}
Now, say we'd like to search for node 6 starting at node 1:
path = BFS(graph, 1, 6)
print(path)
Running this code results in:
[1, 3, 6]
Now, let's take a look at a visual representation of the graph itself:
The shortest path between 1
and 6
is indeed [1, 3, 6]
. Though, you could also traverse [1, 4, 6]
which appears to be shorter (due to a diagonal path), and [1, 5, 6]
. These alternative paths are, fundementally, the same distance as [1, 3, 6]
- however, consider how BFS compares nodes. It "scans" from left to right and 3
is the first node on the left-hand side of the adjacency list that leads to 6
, so this path is taken instead of the others.
Note: There are cases in which a path between two nodes cannot be found. This scenario is typical for disconnected graphs, where there are at least two nodes that are not connected by a path.
Here's how a disconected graph looks like:
If we were to try and perform a search for a path between nodes 0
and 3
in this graph, that search would be unsuccessful, and an empty path would be returned.
Breadth-First Implementation - Graph Traversal
Breadth-First Traversal, is a special case of Breadth-First Search that traverses the whole graph, instead of searching for a target node. The algorithm stays the same as we've defined it before, the difference being that we don't check for a target node and we don't need to find a path that leads to it.
This simplifies the implementation significantly - let's just print out each node being traversed to gain an intuition of how it passes through the nodes:
def traversal(adj_list, start_node):
visited = set()
queue = Queue()
queue.put(start_node)
visited.add(start_node)
while not queue.empty():
current_node = queue.get()
print(current_node, end = " ")
for next_node in adj_list[current_node]:
if next_node not in visited:
queue.put(next_node)
visited.add(next_node)
Now, let's define a simple graph:
graph = {
1 : [2, 3],
2 : [1, 3, 5],
3 : [1, 2, 4],
4 : [3],
5 : [2]
}
traversal(graph, 1)
Finally, let's run the code:
1 2 3 5 4
Step by step
Let's dive into this example a bit deeper and see how the algorithm works step by step. Here's a picture of the graph so we can easily follow the steps:
As we start the traversal from the start node 1
, it is put into the visited
set and into the queue
as well. While we still have nodes in the queue
, we extract the first one, print it, and check all of it's neighbours.
When going through the neigbours, we check if each of them is visited, and if not we add them to the queue
and mark them as visited:
Steps | Queue | Visited |
Add start node `1` | [`1`] | {`1`} |
Visit `1`, add `2` & `3` to Queue | [`2`, `3`] | {`1`} |
Visit `2`, add `5 `to Queue | [`3`, `5`] | {`1`, `2`} |
Visit `3`, add `4` to Queue | [`5`, `4`] | {`1`, `2,` `3`} |
Visit `5`, no unvisited neighbours | [`4`] | {`1`, `2,` `3`, `5`} |
Visit `4`, no unvisited neighbours | [ ] | {`1`, `2,` `3`, `5`, `4`} |
Time complexity
During Breadth-First Traversal, every node is visited exactly once, and every branch is also viewed once in case of a directed graph, that is, twice if the graph is undirected. Therefore, the time complexity of the BFS algorithm is O(|V| + |E|), where V is a set of the graph's nodes, and E is a set consisting of all of it's branches (edges).
Conclusion
In this guide, we've explained the theory behind the Breadth-First Search algorithm and defined it's steps.
We've depicted the Python implementation of both Breadth-First Search and Breadth-First Traversal, and tested them on example graphs to see how they work step by step. Finally, we've explained the time complexity of this algorithm.
from Planet Python
via read more
No comments:
Post a Comment