Hi, and welcome to this next lecture on data structures and algorithms. In this session, we will introduce you to graph traversal algorithms. We have already discussed graphs and different data structure representations for graphs. How do you traverse such graphs? How do you enumerate the vertices and edges of these graphs in a non-redundant and useful manner? Today, we will discuss one such useful graph traversal algorithm, which is BFS, or breadth-first search. There are two very well-known techniques for graph traversal when you begin at some source node S: breadth-first search and depth-first search. What we will discuss today, breadth-first search, as the name says, discovers all vertices at a particular distance D from S before discovering any vertices at distance D plus 1. So, the idea of breadth-first search is to discover nodes in the graph slice by slice. This is the node at depth 0, these are the nodes at depth 1, these are the nodes at depth 2, and so on. On the other hand, depth-first search searches deeper into the graph wherever possible. What does this translate to? Well, this means that I am going to first traverse this path as far as possible, then this path, then this path, and so on. So, let us look more closely at breadth-first search. The principal idea in breadth-first search is to explore the graph and classify nodes and edges. This is also the idea behind depth-first search, but the classification and the order in which the nodes are explored are different. In breadth-first search, you keep track of unexplored nodes and edges to begin with. Unexplored stands for any node or edge that has not been explored so far. So, the starting point in BFS is to traverse through all the vertices of the graph and label each vertex as unexplored. Do likewise for every edge in the graph: label every edge as unexplored. Then you start with an unexplored vertex, do a breadth-first search from it and, having completed that breadth-first search, go on to the next vertex that is still unexplored.
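As a rough sketch of this outer loop in Python: the function name `bfs_all_components`, the adjacency-dictionary representation and the label strings are assumptions made for illustration, not the lecture's notation. The inner traversal uses a plain queue as a stand-in for the call to BFS, and edge labels are left out to keep the sketch short.

```python
from collections import deque

def bfs_all_components(adj):
    """Start a BFS from every vertex that is still unexplored.

    adj maps each vertex to a list of its neighbours (undirected graph).
    Returns the list of vertices from which a BFS was started.
    """
    # Initial pass: label every vertex as unexplored.
    label = {v: "UNEXPLORED" for v in adj}
    starts = []
    for v in adj:
        if label[v] != "UNEXPLORED":
            continue  # already reached by an earlier BFS call
        starts.append(v)
        label[v] = "VISITED"
        queue = deque([v])
        # Stand-in for the BFS call: visit everything reachable from v.
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if label[w] == "UNEXPLORED":
                    label[w] = "VISITED"
                    queue.append(w)
    return starts
```

For a graph with two separate pieces, the loop starts exactly two traversals, one per piece.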
Now, the call to BFS on vertex V might itself end up marking many nodes and edges in G as explored. So, only after coming back from this call of BFS, when we look at another vertex, check whether it is explored or unexplored, and find that it is actually unexplored, would we call BFS again. The principal idea here is that your graph might have several disconnected components. So, this is component 1, this is component 2, and what you are doing is calling BFS on this node, marking the nodes it reaches, and then you will find that this node is still unexplored and call BFS here. Now, what does the call to BFS itself do? Well, it marks the nodes within this component as visited and the edges within this component as discovered. So, let us do some highlighting: within the call to BFS, you would expect to mark these nodes as visited and these edges as discovered. It is also possible that some edges inside a component lead to a node that has already been visited. So, it is possible that this particular node here is connected to some other node here. That we would call a cross edge: it is an edge to a sibling or to a child shared with a sibling. More about this in the next slide. So, this is BFS as applied to a single connected component; let us illustrate this with an example. When we begin the BFS on vertex V, let us call this the vertex V, you start with a new empty sequence S0. S0 is going to denote the set of nodes or vertices to expand next. Initially you just have V, so S0 contains V, and you let i be 0. S0, S1, S2 all denote sets of nodes or vertices to expand next, so I am going to more generally replace S0 with S i, where i is initialized to 0. Now, you are going to pick an element from S i, which is V, and in parallel start building up S i plus 1, which is the set of nodes or vertices to expand after that.
So, for each element of S i, and we have only one element, V, to begin with, you look at the edges that are incident on it; we are discussing undirected graphs to begin with. Now, you check the label of each of these edges and nodes. The nodes and edges that are colored black here are all unexplored. So, you look at an edge E which is unexplored and get the vertex at the other end of this edge; let us say this is W. If W is unexplored, which is the case here, you are going to set the label of E to discovered and set the label of W to visited. Let us do precisely that for E and W, and in fact you are going to add W to S i plus 1, which is now S 1. Let us just give the nodes the numbers 1, 2, 3, 4 and 5, so S 1 will now contain node number 2. We need to similarly check the other unexplored edges and vertices. This edge is unexplored, and so is this vertex, so we are going to add 3 to the list. Is there any other edge incident on V? Well, none; this completes our S 1. Now, we are going to explore the nodes in S 1 next. S 0 is done with; you go on to S 1 and iterate over all the nodes there. So, we go to node number 2. What do we do with node number 2? Well, it turns out that node number 2 has one incident edge to node 3, which is already labeled visited. If the label of the other vertex is not unexplored, then you set the label of the edge to cross, and that is what we will do now: we set the label of this edge to cross and proceed. We look at the other edge, and the corresponding node is unexplored. So, the edge is marked as discovered and the node as visited. What do we do next? Well, we look at the neighbors of 4 and find that 3 is already visited, so that edge is marked red, that is, cross. We can then move on to 3 and find that its neighbor is visited through a discovered edge. This is what you will get at the end of the execution of this algorithm.
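The level-by-level procedure walked through above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name `bfs_levels`, the adjacency-dictionary input, the use of `frozenset` to identify an undirected edge, and the label strings are all illustrative choices rather than the lecture's own pseudocode, and the graph is assumed to have no self-loops.

```python
def bfs_levels(adj, v):
    """Level-by-level BFS from v on an undirected graph without self-loops.

    adj maps each vertex to a list of its neighbours.
    Returns (levels, edge_label): levels[i] plays the role of S_i, and
    edge_label maps frozenset({a, b}) to "DISCOVERED" or "CROSS".
    """
    vertex_label = {u: "UNEXPLORED" for u in adj}
    edge_label = {}                 # an edge missing here is still unexplored
    vertex_label[v] = "VISITED"
    levels = [[v]]                  # S_0 contains only the start vertex
    while levels[-1]:
        next_level = []             # S_{i+1}, built while scanning S_i
        for u in levels[-1]:
            for w in adj[u]:
                e = frozenset((u, w))
                if e in edge_label:
                    continue        # edge was already labelled from its other end
                if vertex_label[w] == "UNEXPLORED":
                    edge_label[e] = "DISCOVERED"
                    vertex_label[w] = "VISITED"
                    next_level.append(w)
                else:
                    edge_label[e] = "CROSS"
        levels.append(next_level)
    levels.pop()                    # drop the trailing empty level
    return levels, edge_label
```

On a small hypothetical graph with edges 1-2, 1-3, 2-3, 2-4 and 3-4, starting from vertex 1, this yields the levels `[[1], [2, 3], [4]]`, with 2-3 and 3-4 labelled as cross edges and the remaining three edges labelled as discovered.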
Note that we have slightly different names for visited nodes and discovered edges. Edges that get discovered lead to newly visited nodes, but an edge can also lead to a node that was visited earlier. So, the idea of a blue or discovered edge is that it leads to an as yet unvisited node, whereas the meaning of a red or cross edge is that it leads to an already visited node. What does cross mean here? Well, basically a cross edge is an edge to a sibling or a descendant of a sibling, and that is precisely why the vertex at the other end of the cross edge turns out to be already visited. Of course, you can go on to complete and enumerate your S2, S3 and so on. Now, do you really need to instantiate so many lists? Is it possible to manage with fewer lists? In fact, can we actually do with a single queue? It turns out that we can. What I do now is present a simpler version of BFS, a BFS with less bookkeeping, and I will explain what less bookkeeping means. One big difference here is that we are going to manage with a single queue. So, S is initialized to be empty and we immediately add V to it. So far, in summary, it is the same as what we have seen before. The next thing we do is set the label of V to visited, because we know that we have visited it. But we are going to distinguish between visited nodes and expanded nodes. Yes, indeed we have visited V, and we will mark it so, but I will introduce one more color to denote nodes that are already expanded. V has not yet been expanded; it has only been added to the queue. Continuing, we are going to now iterate on the queue. So, while not S.isEmpty(), we take the vertex at the front of the queue, which I will again call V, and iterate over all the nodes which are adjacent to V. This can be done by iterating over all edges incident on V. This is the same as before and, like before, I am going to say W is E.getOtherVertex(V).
I am going to check the label as before: if getLabel(W) is unexplored, then I am going to set the label now. I will not bother much about edges. I will just set the label of W to visited, and what does that mean? It means I am going to add this node to the queue. But before I do that, I am going to also update some arrays. I am going to keep track of the depth of each node from the source. So, the depth of the source is 0, and I will set the depth of W to the depth of V plus 1. I will also keep track of where we got to each node from. So, pi of the source is nil, whereas pi of W is V. And finally, as promised, I am going to add my node W to the queue: after having set the depth and the parent, I say S.append(W). Having appended W to the queue and having iterated over all the edges of V, what I will do next is mark that V has now been explored. All the unexplored neighbors of V have been added to the queue, and their depth has been updated and so has their parent. This we do by introducing a new label and setting the label of V not to visited but to explored. This is a label that we have to introduce explicitly because we are now managing with one queue. So, this explored is a new label, and what is it saying? Let me explicitly state what explored means. For any vertex V, the label of V being explored means that all the neighbors of V have been either visited or explored. So, an explored node cannot have an unexplored neighbor. So, this is also a form of breadth-first search, but here we manage with a single queue and one additional piece of bookkeeping. Some of the new statements we have introduced include the statement about the depth and the statement about the parent pi. These will be useful when we make use of BFS for purposes such as finding the shortest paths from V, and in fact finding the distances along those shortest paths as well.
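The single-queue version described above might look like this in Python. It is a sketch, not the lecture's slide code: the name `bfs_queue`, the adjacency-dictionary input, the dictionaries `depth` and `pi`, and the label strings are assumptions for illustration, and `collections.deque` is used as the queue.

```python
from collections import deque

def bfs_queue(adj, s):
    """Single-queue BFS from source s on an undirected graph.

    adj maps each vertex to a list of its neighbours.
    Returns (depth, pi, label) for the vertices reachable from s;
    label also contains "UNEXPLORED" for vertices that were not reached.
    """
    label = {u: "UNEXPLORED" for u in adj}
    depth = {s: 0}          # depth of the source is 0
    pi = {s: None}          # the source has no parent
    label[s] = "VISITED"    # visited, but not yet expanded
    queue = deque([s])
    while queue:
        u = queue.popleft()             # vertex to expand next
        for w in adj[u]:
            if label[w] == "UNEXPLORED":
                label[w] = "VISITED"
                depth[w] = depth[u] + 1
                pi[w] = u
                queue.append(w)
        # Every neighbour of u is now visited or explored.
        label[u] = "EXPLORED"
    return depth, pi, label
```

A vertex is appended to the queue only at the moment its label changes from unexplored to visited, so it enters the queue at most once, which is what lets one queue replace the separate sequences S 0, S 1, S 2 and so on.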
Of course, you could have done all this bookkeeping even in the original BFS algorithm we presented here. So, I will leave that note and proceed. The optional steps here are as follows: you can set D of V to be 0 and pi of V to be nil. You could do the same thing here: D of W is D of V plus 1, and pi of W is V, or E if you want to keep track of the edge. Let us apply breadth-first search to a slightly more complex graph. This node here, labeled in blue, is the node we want to begin with, so it is already visited. You discover its edge and visit node A, discover another edge and visit node B, then node E and finally F. Now, the next node that you get from the queue, you basically get from S1. Note that A, B, E and F all belong to S1. So, you dequeue A from S1 and look at its edges. The only edge that is still unexplored is the one to B. D is already visited, and it is reached through the discovered edge here, whereas this edge to B is not yet discovered. However, the node B is already visited. That is why it was important to assign labels to edges: so as to avoid enumerating this discovered, or blue, edge again and restrict our attention to the undiscovered edge that goes to B. Now, we label this edge as cross because B has already been visited by another path. Continuing on, there is one more cross edge from B to E, another cross edge, a new discovered edge, another cross edge and another cross edge. So, there are exactly 5 cross edges. Now, these edge labels can be very useful in determining a spanning tree for this graph. We know by observation that the discovered edges help you span all the nodes in this graph. So, the blue discovered edges form a spanning tree. This is one spanning tree; there could be other spanning trees. And if you also do the optional step of keeping track of distances, we note that the distance of node D is 0, the distance of A is 1, and so on.
Let us consider a slightly far-off node here, say G, and its distance. The distance of G was obtained from the distance of E, because that is how G was reached: it is the distance of E plus 1. The distance of E in turn is the distance of D plus 1, and we know the distance of D is 0. So, the distance of E is 1 and therefore, substituting, we get the distance of G to be 2. Note that you could have approached G via B and E, which would have given you a path length of 3. However, that is not the path that BFS gives us, because a cross edge is never a part of the BFS tree. Thus, we see how BFS helps you find the shortest distance to any node in the graph from the source vertex, that is D. And in fact, you can find the shortest path as well. So, what is the shortest path to G? Well, the shortest path to G can be obtained by looking at the parents: pi of G is E, pi of E is D, and D does not have any parent. So, look at the parent of G, then the parent of E, and then read the sequence backwards. That gives you the path D, E, G, which has length D of G equals 2. And in case you want to be more explicit, you could actually write down this path as pi of E, pi of G, G. To summarize the properties: if G V is the connected component of a graph G containing vertex V, then BFS on G starting at V visits all the vertices and edges of G V. It labels some edges as discovered and the others as cross. If you put together all the edges labeled discovered, you get a spanning tree T V of G V, and in case your G happens to have several disconnected components G V 1, G V 2, then T V 1 union T V 2 will give you a spanning forest of G. So, the tree you get from each of these is basically a subgraph of the corresponding component. Now, consider each vertex U in a particular S i, where the S i were the sequences that we populated at the different levels. The path from V to U in T V has i edges. So, the S i basically correspond to different slices in the same graph: S 1, S 2, S 3, these are the slices in a single graph.
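The backwards walk over parents used for the path to G can be sketched as a small Python helper. The function name `path_from_source` and the dictionary form of pi are assumptions for illustration; the parent values in the usage line are the ones stated in the example (pi of G is E, pi of E is D, and D has no parent).

```python
def path_from_source(pi, target):
    """Follow parent pointers from target back to the source.

    pi maps each reached vertex to its BFS parent; the source maps to None.
    Returns the path from the source to target as a list of vertices.
    """
    path = []
    node = target
    while node is not None:
        path.append(node)   # collect target, its parent, and so on
        node = pi[node]
    path.reverse()          # the walk was backwards, so flip it
    return path

# Parents from the example: D is the source, E was reached from D, G from E.
example_pi = {"D": None, "E": "D", "G": "E"}
print(path_from_source(example_pi, "G"))  # ['D', 'E', 'G']
```

The number of edges on the returned path is one less than the number of vertices, which here is 2, matching the distance of G.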
And what we are saying here is that the path from V to any particular U in T V has i edges if U belongs to S i, and every path from U to V in G V has at least i edges. Well, what this means is that the path from V to U in T V is a shortest path in G V. The analysis is fairly straightforward. For most graph algorithms, we will basically resort to some kind of collective analysis. The kind of collective analysis we will talk about is called aggregate analysis, and it works by looking at the aggregate count of the number of accesses. Each vertex is labeled twice: initially it is labeled as unexplored, and the moment you visit a vertex, you label it as visited. In the alternative algorithm that we proposed, each vertex is labeled thrice: once as unexplored, once as visited and the third time as explored. That is only a constant factor. Coming back to the initial algorithm, each edge is also labeled twice, once as unexplored and the other time as discovered or as cross. An edge is never labeled as both discovered and cross; it is either one of them. And then each vertex is inserted once into S i for some i. It cannot be inserted into multiple S i's, merely because a vertex is inserted only when it moves from being unexplored to visited, and that happens exactly once. And for each vertex U you look at all its incident edges exactly once. In aggregate analysis this means that the cost of looking at incident edges is the sum, over all vertices U, of the number of edges incident on U. And this is nothing but twice the number of edges, because each edge is counted once from each of its two endpoints. So, the total time is order of V, which comes from the first three steps, plus the fourth, which is order of E. So, the total time is order of V plus E. Now, BFS has several interesting applications, and all of these applications run in order of V plus E time. The first is computing a spanning forest of G.
We have already seen that this is obtained as the union of all the discovered edges. So, all you need to do is run through all the edges once. And in fact, you can run through them in the order in which the vertices are explored to compute a spanning forest of G. The second is: given two vertices of G, find a path between them in G, if there exists one, with the minimum number of edges. This basically means that if you are looking at two vertices U and V, you just need to call BFS on G rooted at U, and D of V is the length of that shortest path. And the chain of parents, from U through pi of pi of V, then pi of V, and then V, is exactly the shortest path. To compute the connected components of G, recall that we iterated over all the vertices V, and for each vertex V that was still unexplored, we invoked BFS. So, the number of connected components is basically the number of nodes V for which BFS rooted at V was invoked. You could also determine whether G is a forest and, if it is not a forest, find a simple cycle in G. This is not very difficult if G is undirected, but it becomes messy with breadth-first search if G is directed. I leave this as a homework problem, but there are other ways to detect the existence of cycles in a graph. This we will discuss in the context of DFS, depth-first search, which is a natural choice for finding a cycle. BFS is also a very classic strategy for finding a solution to the Rubik's cube problem, where at any particular configuration you explore all the possible next configurations before diving deep into any one specific alternative. So, BFS is a natural choice for exploring solutions to the Rubik's cube problem. Thank you.