This is a map of Los Angeles. I want to route from Santa Monica to Griffith Observatory. Google estimates it'll take 39 minutes. But how does it do this? That's exactly what we'll be looking at today. This is going to be a three-pass explanation, where we start with the big picture and drill down into the details. So buckle up. Before we get started, though, please give this video a like and subscribe for more videos like this, because I want to grow an audience. Back to the video. Let's start pass one by defining our problem. We want to determine the time taken to go from Santa Monica to Griffith Observatory. There are many points of uncertainty that make it difficult to predict this directly. So we break the route down into road segments, find the time to traverse each road segment, then sum those times to get the ETA. Simple enough. Now, what kind of model do we want here? The most straightforward solution is a feedforward neural network. The input would be some encoded representation of a road segment, and the output would be a single neuron that gives the time to traverse it. Theoretically, I could pass in all the road segments along the path one at a time, get the corresponding traverse times one at a time, and then sum those times to get the ETA for the total trip. Sounds like a nice, plausible solution. Or is it? There's one problem here. Feedforward neural networks require the samples to be independent of each other, but with road segments, traffic on one road can easily influence the traffic and travel time on neighboring roads. So there is some dependence between the samples. A good remedy would be to use neural networks that can parse sequence data, like recurrent neural networks or even a transformer.
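To make the per-segment idea concrete, here's a minimal sketch of the baseline described above: a tiny feedforward net that maps one road segment's feature vector to a traverse time, applied one segment at a time and summed for the trip ETA. Everything here is hypothetical, including the segment features and the random, untrained weights; it only illustrates the data flow, not Google's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical: each road segment is encoded as a small feature vector
# (say, length, speed limit, lane count) -- 12 segments, 3 features each.
segments = rng.random((12, 3))

# A tiny feedforward net with one hidden layer and one output neuron
# (the time to traverse a single segment). Weights are random stand-ins.
W1, b1 = rng.random((3, 8)), np.zeros(8)
W2, b2 = rng.random(8), 0.0

def traverse_time(x):
    h = np.maximum(0.0, x @ W1 + b1)  # ReLU hidden layer
    return float(h @ W2 + b2)         # predicted seconds for one segment

# Feed segments through one at a time, then sum to get the trip ETA.
eta = sum(traverse_time(s) for s in segments)
print(f"predicted ETA: {eta:.1f} seconds")
```

Note this treats each segment as an independent sample, which is exactly the weakness the transcript goes on to point out.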
In the transformer case, we could potentially use the encoder: pass in the road segments simultaneously, get their corresponding embedding vectors simultaneously, and pass each vector through a feedforward neural network to get the traverse time for every road segment. Now, I haven't seen this information online; this is just me speculating, but I do think this is a possible implementation. The main downside is that we would need a lot of data. A transformer has no built-in understanding of how road segments are related to each other. It needs to learn this relationship from scratch with a ton of examples, because in this setup two road segments that are extremely far apart have just as much say in being related to each other as two segments that are right next to each other, which is not what we would expect. If a transformer needs to figure this out on its own, it's going to take a lot of compute, a lot of time, and a lot of data, and this quickly becomes infeasible for networks as large as road networks. The cool thing, though, is that we have a starting point. We know that roads adjacent to each other are more related than roads that are nowhere near each other. This is important prior information that we can potentially combine with transformers, and the most intuitive way to surface this knowledge about roads is with graphs. Graphs consist of vertices and edges. In the context of Google Maps, let's take every road segment to be a vertex, and if two road segments are connected to each other, their corresponding nodes are connected by an edge in the graph. So we can create a graph for all of Los Angeles, or for any section of any city you want. The input to the graph network is this road network. Once trained, we will have access to the embeddings of the road segments. Each road segment along the desired path is then piped to a simple feedforward neural network.
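The graph construction just described can be sketched in a few lines: every segment becomes a vertex, and physically connected segments get an edge. The segment names below are made up for illustration; a real road graph would use Google's own segment IDs.

```python
# Hypothetical road segments and which ones physically connect.
connections = [
    ("santa_monica_blvd_1", "santa_monica_blvd_2"),
    ("santa_monica_blvd_2", "sunset_blvd_1"),
    ("sunset_blvd_1", "los_feliz_blvd_1"),
]

# Build the graph as an adjacency list: each segment is a vertex,
# and connected segments share an (undirected, for simplicity) edge.
graph = {}
for a, b in connections:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

print(graph["santa_monica_blvd_2"])  # that segment's neighbors
```

A real implementation would likely use directed edges, since many roads are one-way; the undirected version just keeps the sketch short.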
And the output is a single neuron that gives the time to traverse each road segment. We sum up the times to get the trip ETA. So that's the big overview, and that's all for pass one, where we intuitively defined the problem and sketched a plausible model architecture. Now let's get into some details with pass two. For the graph network, we take every road segment as a node, and two nodes are connected by an edge if the roads are connected. The goal now is to learn the embeddings for the nodes, that is, for the road segments. This is done by an iterative process called message passing. Let's talk about that with a much simpler example. Consider a graph with just two nodes, A and B, that point to a third node, C. We're going to perform message passing for node C. There are two sub-steps: first, create the message, and second, update the state of node C. Creating the message for node C involves taking the nodes inbound to C and applying some operation to their vectors. It could be something as simple as summation: we would just sum the vectors A and B. Next, we take this message vector along with the current vector C and pass them into another function to get the new state of C. This could again be as simple as another addition. In the end, we just need an output vector that is representative of the embedding of C. But the summation operations we're using here might be a little too weak to actually get the embeddings we need. So instead of addition in both places, we could potentially use transformers. How this would work is that we take the two vectors for nodes A and B that are inbound to C, along with information about the connecting edges, which would be AC and BC. We throw all of this into the transformer encoder, which delivers the encodings of these vectors simultaneously.
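The two sub-steps of the simple summation version can be written out directly. This is a toy sketch of message passing for node C only, with made-up two-dimensional state vectors, using plain addition for both the message and the update, exactly the "weakest possible" variant the transcript describes before upgrading to transformers.

```python
import numpy as np

# Toy graph: nodes A and B point into node C.
state = {
    "A": np.array([1.0, 0.0]),
    "B": np.array([0.0, 2.0]),
    "C": np.array([0.5, 0.5]),
}
inbound = {"C": ["A", "B"]}

# Sub-step 1: create the message for C by summing the inbound node vectors.
message = sum(state[n] for n in inbound["C"])

# Sub-step 2: update C by combining the message with C's current vector
# (again just addition, the simplest possible update function).
state["C"] = state["C"] + message

print(state["C"])  # prints [1.5 2.5]
```

In the transformer variant, both the summation in sub-step 1 and the addition in sub-step 2 would be replaced by learned functions over the inbound vectors and edge information.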
Next, these encodings are used by the transformer decoder along with the current node vector C to get the next state of node vector C. This is most likely a better interim embedding of C than simple addition would give. And this entire flow I just described is message passing for one node, node C, and for only one time step. This process happens simultaneously for all nodes in the graph, because each node's next state depends only on the previous states of itself and its inbound neighbors. In this way, every node's neighbors are encoded into its embedding, and we can run this multiple times to get better and better embeddings. On the first round of message passing, every node knows about its neighbors. On the second iteration, every node knows about its neighbors' neighbors, and that's encoded into the embedding. On the third iteration, even their neighbors are encoded into every node's embedding, and so on. In the context of road networks, message passing lets each road vector contain information about its adjacent roads, which intuitively makes sense. After some number of rounds of message passing, we have the road network embeddings. Now, since the layout of the roads doesn't change much unless some major construction happens, we can store these road embeddings in a lookup dictionary that we use at inference time. These road vectors contain information about the location of the road itself and its relationship to the other roads. But we might need some real-time information to get more accurate results, such as traffic conditions, speed limits, and accidents. All of this can be incorporated too: we pass all of this information into a feedforward neural network to get the most up-to-date time to traverse the segment. I hope all of that was clear. This completes pass two, the explanation of the core of message passing.
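The inference-time combination just described can be sketched as: look up a precomputed segment embedding, concatenate it with real-time features, and pass the result through a small feedforward net. The segment ID, feature choices, and random weights here are all illustrative stand-ins, not the real system.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical store of precomputed 8-dim road embeddings from message passing.
embedding_store = {"sunset_blvd_1": rng.random(8)}

# Hypothetical real-time features queried at inference:
# current average speed (mph), accident flag, posted speed limit (mph).
realtime = np.array([12.5, 0.0, 35.0])

# Concatenate embedding + live features, then apply a small (random,
# untrained) feedforward layer to get the segment's traverse time.
x = np.concatenate([embedding_store["sunset_blvd_1"], realtime])
W = rng.random(x.size)
time_for_segment = float(np.maximum(0.0, x @ W))
print(time_for_segment)
```

The key design point is the split: the slow-changing graph structure is baked into the stored embedding, while the fast-changing signals arrive as extra inputs at query time.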
Now on to pass three. How do we train this entire behemoth of a network? What I mean is, how do we learn the parameters of the two major models used here? The first is the transformer, which produces the encodings of the road segments via message passing. And then how do we also train the network that uses the road segment embeddings to compute the time to traverse each segment? So how do we train all this? Training typically happens via backpropagation, and to train, we also need an objective to minimize. In this case, when a car actually traverses the route, we have the ground truth. So the loss for a sample like Santa Monica to Griffith would be the absolute difference between the actual time taken to traverse the route and the sum of the model outputs for all the road segments along that route: the absolute value of actual minus predicted. We want this number to be as low as possible. And look at that, we now have an objective function to minimize. We can minimize it via backpropagation, because this entire behemoth is still one connected network, so we can propagate gradients through the feedforward net all the way back to the transformer. Sound fun? Sweet. Now to wrap up, how exactly do we make ETA predictions at inference time? Step one would be to determine the start and end points on your map. Step two, use some algorithm to find a path. This doesn't have to be the shortest path; it just has to be some path. I haven't touched on this part much, but we could find the shortest path in a similar way to what we're doing here. Honestly, though, that might be a little involved and might warrant its own video; let me know if you want that video. For now, let's just say we have access to the path we want to take. Step three would be to determine the road segments along the path.
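The loss described above is short enough to write out in full: the absolute difference between the actual trip time and the sum of per-segment predictions. The numbers are made up for illustration.

```python
# Hypothetical predicted traverse times (seconds) for the segments on a route.
predicted_segment_times = [45.0, 120.0, 80.0, 95.0]

# Ground truth: the time an actual car took to drive the whole route.
actual_trip_time = 330.0

# Loss = | actual - sum of per-segment predictions |.
loss = abs(actual_trip_time - sum(predicted_segment_times))
print(loss)  # prints 10.0
```

Because every predicted segment time comes out of the same differentiable stack (embeddings plus feedforward net), minimizing this loss with backpropagation updates both models at once.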
Step four: during the training phase, we could have stored all the road segment embeddings in a dictionary, like I mentioned before. So at inference time, all we need to do is reference each road segment by ID in that dictionary and get the corresponding road segment embedding. Step five: query some real-time features, like traffic, accidents, and speed limits, and pass them along with the embedding vector into a feedforward neural network to get the expected time to traverse that segment. And step six: sum all of the times to get the ETA from start to finish. And that's it for the main algorithm. But wait, there's more. To get more accurate time estimates, we can subdivide these road segments into even smaller segments. Typically this isn't feasible, since the computation cost increases severalfold when you do, but it's Google; they can do it, and they mostly do. It's because of this that their ETAs are pretty stellar and on point. And that's it. Please do note, though, that a lot of the ideas presented in this video are pure speculation. Google doesn't really mention transformers being used anywhere in their networks, but I do think the ideas presented are a plausible way Google could actually come up with ETAs. The scant resources that are available, and that I referenced for this video, are linked in the description down below. Do let me know your thoughts. Do you think this is how Google Maps could work? I would love to know. Until next time, take it easy. Please do give this video a like, share, and subscribe, and I will see you very soon. Take care.