Okay, so hi everybody. We are going to have Niharika go on next, and she'll be speaking about congestion-aware routing for multi-class vehicles. If you have connectivity issues, please feel free to watch the video in a separate tab, but do leave your session open so that you can come back here for the live Q&A. So we'll get started. Okay.
Hello everyone, I am Niharika Srivastava, and thank you for joining me today. Today I am going to talk about how we can potentially prevent a traffic jam by providing congestion-aware routes to every passenger in the network, and basically avoid situations like these. First, a little background. The problem of congestion arises because every passenger inherently opts for the shortest-distance route from their source to their destination. So with an increase in the number of vehicles on the road over the years, and with the capacities of roads being approximately constant, this results in increased travel times and blockages at intersections in the network. It also results in underutilization of a city's intricately built road network.
Next, I would like to introduce a concept called multi-class fleets, which serves as a major crux of our framework for implementing congestion-aware routing for the entire system. The concept of multi-class fleets allows breaking a customer trip into three classes: the first mile, the middle mile and the last mile. The first and the last mile can use micromobility options such as walking, e-scooters and bikes, and the middle mile uses high-speed vehicles such as cars, private taxis or even public transport like buses. The middle mile is used to cover at least 90% of an entire customer trip. The benefit is that by using a multi-class setup like this in an optimal combination, a customer can easily maneuver through crowded paths and decrease their overall travel time.
This also gives a customer flexible options for their preferred mode of transport on the basis of accessibility and cost. So for example, a customer starts walking from their source for about 250 meters, then takes a bus for about 15 minutes, after which they bike to their destination for 500 meters, thereby completing their entire trip in three phases.
So let us formulate our problem in a more defined way. We essentially have to route customer trips in a congestion-aware manner using this previously explained multi-class fleet of vehicles. And we also want to provide optimal transit points for each trip. What this means is that, if you look at this figure, let's say a person wants to go from A to B and they have two possible route options: A, C, B and A, D, B. They can walk to the nearest bus station at C and then take a bus from C to B, making their total trip time 34 minutes, because this bus station is really popular since it is closer to A. Or they could walk a little bit further to the next nearest bus station at D and then take a bus from D to B, making their total trip time 30 minutes, thereby saving four minutes of overall travel time. So you see that in this network the optimal path from A to B is A, D, B, and the optimal transit point is D and not C. A transit point is a point where you stop taking one mode of transport and start taking another. So in this case, we have stopped walking for our first mile and started taking a bus for our middle mile.
We also want to implement this entire network in the form of a social model. The current state of the art basically provides congestion-aware routes to just one customer, and that is not a social model. It implements a user-selfish or user-centric method. For example, let's say you are checking Google Maps to look for a route to your destination which is absolutely congestion free.
But while you are checking for a route, so are about 100 other people, and you all decide on the same route, which shows as congestion free. When you actually take that route, the hundred other people take it simultaneously with you, creating a traffic jam on that route, which increases the overall travel time for every passenger on it. So the congestion information that was provided wasn't in real time and wasn't for the entire system. We want to create this framework and scale it to the entire system by creating a system equilibrium, which incorporates the request of every person in the network.
So let me introduce another concept called a geographic information system, or GIS. GIS is basically computer software that helps us to extract and analyze geographic elements for research purposes. As you may have guessed already, transportation analysis comes under this. There are various open-source libraries in Python that help us analyze this GIS information for our research work. The first library that I have used is called OSMnx. OSMnx helps us to retrieve street networks from the OpenStreetMap database and then further analyze elements of them using other libraries. One thing to note is that this entire framework was validated using the Singapore street network, and we therefore extracted the Singapore street network from OpenStreetMap using OSMnx. Over here we see all the nodes in the Singapore graph, and these are the edges in the Singapore graph. OSMnx helps us to retrieve networks which are tagged only for driving, or just pedestrian networks, or networks which are only used for bicycles. This graph is a drivable-only network, because we are calculating congestion for roads that can be driven upon. The different colors show whether an edge is a primary road, a highway, a residential road and so on.
The next library that we are using is called GeoPandas, and it is an extension of the data science library pandas. It helps us to view the previously shown nodes and edges in the form of data frames. So we have one data frame for nodes and one data frame for edges. Over here I have shown an edge data frame for the previously shown edge graph. You can see that u and v refer to the starting and ending nodes of an edge. Key represents which lane it is. It is possible that in complex road networks such as cities or countries there could be multiple lanes between two nodes, so if key is zero, that means it's the first lane between the nodes u and v. OSMID is just a primary ID needed to identify this road in the data frame. Name specifies the name of that particular edge, so in this case Tampines Avenue 8 is the actual name of this edge in Singapore. Highway shows whether this road is a primary road, a tertiary road, a residential road, etc. Maximum speed specifies the speed limit allowed for any vehicle on that road, and this is decided by government regulations. Length is in meters, and travel time refers to free-flow time, that is, the travel time required to go through this edge uv when there are absolutely no vehicles present on that road. And location consists of the latitude and longitude points of that particular edge.
So now you have your data frames of nodes and edges separately. You can combine them to make a graph for further analysis, and this is done by a library called NetworkX. NetworkX combines these two data frames, creates a graph G consisting of edges and nodes, and helps us to run algorithms such as shortest-path routing with Dijkstra or Bellman-Ford, or other things like nearest-neighbor search and so on. And last but not least, we are using Rtree. Rtree is a spatial indexing library.
What it does is that it basically creates a rectangle around every geographic element on the map. By using set operations like intersection or union, you can find neighboring elements for a certain element. So you could, for example, find the nearest node, or all the neighboring nodes, for a certain node given a certain radius.
All right, so let's start with actually implementing this congestion-aware routing framework. What are we actually doing? The first and foremost step is to extract data, and the first kind of data that we are extracting is network data. We use the OpenStreetMap database, as explained before, using all these GIS libraries, and we extract, manipulate and process the data. We extract the drivable road network because we want to calculate congestion on it, and we also extract pedestrian networks because we are trying to incorporate a multi-class setup.
The next kind of data that we are going to collect is called a traffic speed band data set, and it looks something like this. The traffic speed band data set is made available by Singapore's government, and it is freely available. It shows the real-time average observed speed for every edge in the Singapore graph. In this case you see that link ID refers to the OSM ID of the particular edge, then you have the latitude and longitude position of that edge, and this is the speed band. So the maximum observed speed was 39, whereas the minimum observed speed was 30. This speed band data set is a snapshot of the speeds observed at a particular point in time, and using it we can calculate congestion for the entire network.
How we do that, we will come to later. But after we calculate the congestion for the entire network, we have to propose an algorithm, and this algorithm's main aim is to minimize the overall travel time for one user in the entire network. Our constraints would obviously be the transit points for that user: at what point should this person stop walking and start taking a bus, and so on, to reach their destination. After we have minimized travel time for one user, we want to minimize overall travel time for every user in the system. This is achieved by formulating a linear programming problem wherein our cost function is the overall system's travel time, which we minimize, and our constraints become constant road capacities and continuity in the network.
So after we have proposed these and all our mathematical tools are in place, we use our above proposed algorithm to find optimal transit points for the middle mile. After we have found those, we solve this linear programming problem to find transit points for every person in the network. We solve this linear programming problem using a conditional gradient descent algorithm called Frank-Wolfe, which is specifically used for transportation-based problem statements. After we have found system-optimal flows for the entire network, these flows basically ensure that there is zero congestion at all times. We want to decompose these flows into dedicated routes for every passenger in the network, so that they can just freely go on that route without having to worry about congestion at all.
Okay, so after we have done all of that awesome magic, our final architecture looks something like this. Let's say there are n customers at one point requesting different source-destination trips. Our framework helps us to calculate middle mile transit nodes for all of these n requests using the algorithm that we proposed.
That algorithm is called modified hybrid search, which we will come to later. After we have calculated the middle mile transit nodes for all n requests, we use our conditional gradient descent and decompose these flows into dedicated routes for every customer in the network. And something to note is that these routes follow a multi-class setup, where every customer would be using a combination of these vehicles.
So let us actually start with implementation. How are we calculating congestion? We use a heuristic called the Bureau of Public Roads heuristic, or BPR for short. It tries to estimate the travel time required to go through an edge uv in the graph. Intuitively speaking, if your available road capacity is less, that means the number of vehicles on the road is more, and that means your travel time to cross that road would be more. So you're facing more congestion and traffic jams if the number of vehicles on the road is more. In this equation you can see that TD uv is the estimated travel time to cross an edge uv, and T uv refers to the free-flow time for uv. To reiterate, that is the time it would take a person to cross uv given that there are no vehicles on the road at all. In this equation alpha and beta are constants, F refers to the number of vehicles on the road at that point, and C is the capacity of that road. So you can see that if the number of vehicles on an edge uv is high, the estimated travel time would be really, really high.
All right, so after we've computed congestion for the entire network using this heuristic, we are going to compute our middle mile transit nodes. To put it in a very graphical way, an entire customer trip can be seen in this manner: X and Y become our source and destination, and A and B become our transit points for the middle mile.
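As a rough illustration of the BPR heuristic and the conditional gradient (Frank-Wolfe) step described above, here is a minimal Python sketch on a toy network of parallel routes. The talk does not state the values of alpha and beta; 0.15 and 4 are the conventional BPR defaults and are an assumption here. The function names, the toy network, the demand and the 2/(k+2) step size are likewise illustrative choices, not the speaker's implementation.

```python
ALPHA, BETA = 0.15, 4.0  # conventional BPR defaults; assumed, not stated in the talk


def bpr_travel_time(free_flow_time, flow, capacity, alpha=ALPHA, beta=BETA):
    """Estimated time to cross an edge: free-flow time inflated by flow/capacity."""
    return free_flow_time * (1.0 + alpha * (flow / capacity) ** beta)


def total_system_time(flows, free_flow_times, capacities):
    """System cost: sum over routes of (vehicles on route) x (BPR travel time)."""
    return sum(f * bpr_travel_time(t0, f, c)
               for f, t0, c in zip(flows, free_flow_times, capacities))


def marginal_cost(free_flow_time, flow, capacity, alpha=ALPHA, beta=BETA):
    """Derivative of flow * bpr_travel_time(flow) with respect to flow."""
    return free_flow_time * (1.0 + alpha * (beta + 1.0) * (flow / capacity) ** beta)


def frank_wolfe_parallel_routes(demand, free_flow_times, capacities, iterations=200):
    """Split `demand` vehicles over parallel routes to reduce total system time."""
    n = len(free_flow_times)
    flows = [float(demand)] + [0.0] * (n - 1)  # feasible start: everyone on route 0
    for k in range(iterations):
        grads = [marginal_cost(t0, f, c)
                 for t0, f, c in zip(free_flow_times, flows, capacities)]
        best = grads.index(min(grads))
        # Linearized subproblem: send all demand down the currently cheapest route.
        target = [float(demand) if i == best else 0.0 for i in range(n)]
        step = 2.0 / (k + 2.0)  # standard diminishing Frank-Wolfe step
        flows = [f + step * (y - f) for f, y in zip(flows, target)]
    return flows


# Toy example: 200 vehicles, two routes with free-flow times 10 and 12 minutes.
flows = frank_wolfe_parallel_routes(200, [10.0, 12.0], [100.0, 100.0])
```

In this toy case, spreading vehicles across both routes gives a lower total system time than sending everyone down the route with the shorter free-flow time, which is the intuition behind the system-optimal approach in the talk.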
So basically a person would be walking or using micromobility options from X to A as their first mile, then taking a vehicle from A to B as their middle mile, and then taking another vehicle or just walking again from B to Y as their last mile. X and Y are set, so we only need to find A and B, which happen to be our optimal transit points for the middle mile.
How this is done is actually very simple and very intuitive. Let's say this is the source and the person wants to go to their destination. We first take a radius of 500 meters around the person and around their destination, so in total it's 500 plus 500, up to one kilometer. And we check which nodes within this radius a person can walk to and then take a middle mile vehicle from, to cover the rest of their journey. So a person could walk to either A, B or C and then take a middle mile vehicle to either D, E or F, and after reaching any of these points they would walk from D, E or F to their destination and complete their entire journey. So already we are implementing a multi-class fleet setup. You can see that a person has three times three, so nine, combinations of possible routes to take, and the route selected is the one that results in the least overall travel time from source to destination. That means the pair, say A and F, has to be selected in a way that is optimal and provides minimum travel time from source to destination. How A and F are calculated is done by an algorithm that we proposed, called modified hybrid search.
All right, in this graph you see that there is a point X1, Y1 and another point X2, Y2. The blue line corresponds to the Euclidean distance, which is just a straight line between them, and the red line corresponds to the actual path that has to be taken to go from this point to this point.
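The candidate enumeration and the straight-line lower bound described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: coordinates are taken to be in projected meters, the constant speed of 10 m/s, the node names and positions, and the function names are all hypothetical, and the lower bound is computed for the middle mile between the two transit points.

```python
import math
from itertools import product


def euclidean_m(p, q):
    """Straight-line distance between two (x, y) points given in meters."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def candidate_pairs(source, destination, nodes, radius_m=500.0, speed_mps=10.0):
    """Return (lower_bound_seconds, first_node, last_node) sorted by lower bound.

    `nodes` maps a node name to its (x, y) position in meters (hypothetical data).
    """
    near_source = [n for n, xy in nodes.items() if euclidean_m(source, xy) <= radius_m]
    near_dest = [n for n, xy in nodes.items() if euclidean_m(destination, xy) <= radius_m]
    pairs = []
    for a, b in product(near_source, near_dest):
        # Straight-line distance at constant speed can never exceed the real road time.
        lower_bound = euclidean_m(nodes[a], nodes[b]) / speed_mps
        pairs.append((lower_bound, a, b))
    pairs.sort()  # increasing Euclidean time, as in the sorted array in the talk
    return pairs


# Hypothetical layout: three walkable nodes near the source, three near the destination.
nodes = {
    "A": (100.0, 0.0), "B": (0.0, 300.0), "C": (-400.0, 0.0),
    "D": (4900.0, 0.0), "E": (5000.0, 300.0), "F": (5400.0, 0.0),
}
pairs = candidate_pairs((0.0, 0.0), (5000.0, 0.0), nodes)
```

With three nodes on each side this yields the nine combinations mentioned in the talk, ordered so that the most promising pair by straight-line time comes first.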
So you see, theoretically speaking, Euclidean distances serve as a lower bound for every distance. And if we translate distances to time just by dividing by a constant speed, Euclidean times again serve as a lower bound for the time along the actual path from A to B.
All right, so over here what you see is that this array is a Euclidean array. Let's take an example to make all of this easier to understand. In the previous example we saw that there were nine possible routes. In this case, let's say there are 10 possible routes for the person to go from the source to the destination. So in the array we have Euclidean distances calculated for all 10 possible source-destination pairs, and they are then sorted in increasing order of their length. Again, if I divide all of them by a constant speed, they become times. So in this case these are Euclidean times sorted in increasing order. Ideally, if I want to find the source-destination pair having the shortest travel time, I would have to go through all these 10 elements, check which is my minimum pair and then output it. But that is not efficient at all, because n could be really huge. So we modify the search to return early and still provide an optimal answer.
All right, so after we have these sorted Euclidean times, which serve as a lower bound, we define a cutoff which is equal to the ceiling of n by e, where e is the base of the natural logarithm. So that is the fourth element, since 10 by e rounds up to four, and then we have a decider, which is cutoff plus one, or the fifth element. The significance of this cutoff is that after we have searched up to our cutoff, we are going to stop our search and return whatever answer we have gotten until now. Whether it is optimal or not, we will assume that the answer we got is optimal.
If until the cutoff we haven't received an answer, then we will compare with the decider and return our answer. This cutoff was selected using a concept from a problem called the secretary problem, which you can read about further. It basically claims that with the stopping criterion of n by e there is about a 37% chance of a win.
All right, so we're going to see how this works out for us. Let's say we go to our first element, which has a Euclidean time of 10 units, and we open up that element and see that the actual travel time is 29 units. Now this 29 units is greater than the next Euclidean time, which is 20, so we open the second element, which is 25. We see that 25 is less than 29, so until now our minimum pair happens to be 25. We also see that 25 is less than the next Euclidean time, which is 30, and we know that Euclidean distances or travel times serve as a lower bound, which means 25 is bound to be the lowest of all these 10 elements. So we have found our optimal transit points and it's time to return. That's it.
In case two we do the same thing and open up the first element. In this case we see that the actual travel time is 35 units, which is greater than the next Euclidean time. So we open the second element, which is 32 units. Until now 32 has been our minimum because it's less than 35, so it's the minimum pair so far, but 32 is still greater than the next Euclidean time, so we have to open up the third element as well. It happens to be 45, so 32 units is still the minimum pair, and it is still greater than the fourth Euclidean time, so we open that up, which is 55. Moving on we have 65, so we have reached the cutoff, and 65 happens to be our decider. So now we have to stop at our cutoff. We see 32 is still less than our decider, therefore we return our minimum pair to be 32 and we stop our search.
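One possible reading of the search just walked through, written as a minimal Python sketch. This is an interpretation of the spoken description, not the speaker's code: the function name, the `actual_time` callback and all numbers other than the first two elements of case one (Euclidean times 10, 20, 30 and actual times 29, 25) are hypothetical. Note that stopping at the cutoff is a heuristic, so the returned pair is only proven optimal when the early-exit test fires.

```python
import math


def modified_hybrid_search(lower_bounds, actual_time):
    """Return (index, travel_time, queries) for the chosen candidate pair.

    lower_bounds: Euclidean times sorted in increasing order.
    actual_time:  callable giving the real (expensive) travel time of candidate i.
    """
    n = len(lower_bounds)
    if n == 0:
        raise ValueError("need at least one candidate")
    cutoff = math.ceil(n / math.e)  # e.g. n = 10 gives a cutoff of 4
    best_index, best_time, queries = None, math.inf, 0

    for i in range(cutoff):
        t = actual_time(i)
        queries += 1
        if t < best_time:
            best_index, best_time = i, t
        # Early exit: no unopened candidate can beat the best time found so far.
        if i + 1 >= n or best_time <= lower_bounds[i + 1]:
            return best_index, best_time, queries

    # Cutoff reached without proof of optimality: open the decider and compare.
    if cutoff < n:
        t = actual_time(cutoff)
        queries += 1
        if t < best_time:
            best_index, best_time = cutoff, t
    return best_index, best_time, queries


# Case one from the talk; values past the second element are made up.
euclid = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
actual = [29, 25, 50, 60, 70, 80, 90, 100, 110, 120]
result = modified_hybrid_search(euclid, lambda i: actual[i])  # -> (1, 25, 2)
```

Here only two of the ten candidates are opened, which matches the 80% reduction in queries quoted for the first case.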
There could be an alternate case wherein we keep opening up our elements and we reach our decider again, and we see that in this case the decider is actually less than the minimum pair we had until now. In this case we would return the decider, because that has been the minimum so far. So we see that in all of these cases we haven't searched all 10 elements; we have returned the minimum pair by a certain cutoff. And something of interest to note is that this has helped us to reduce our number of queries, which translates to computational efficiency in all three cases. In the first case there has been an 80% reduction in the number of queries, and in the other two cases there has been about a 50% reduction in the number of queries. So we've gotten an answer that much faster.
So I proposed this algorithm, but is it even any good? We did a number of numerical experiments to see how it scales up to our actual Singapore network data. The first numerical experiment we did was to compare different kinds of searches. Blue represents the modified hybrid search that we just explained. Orange represents hybrid search, which is an earlier, more primitive version of the search approach we're using now. And green represents the optimal exhaustive search, wherein you search through every element to find your minimum pair. And you see that we found our optimal minimum pair in the least number of queries with modified hybrid search. Something to note over here is that congestion and traffic jams depend on the time of day. So in Singapore we took three slices of time to incorporate different congestion patterns throughout the day, so that we can validate that our framework doesn't only work at high peak or off peak.
So we have off peak as 12am, or midnight; we have moderate peak, which is 3pm, when it is lunch; and then we have high peak, which is 6pm, when people are finally starting to come back to their homes from their offices, so it's rush hour. In all three cases we see that the number of queries required to get an optimal transit point has been the least using our modified hybrid search, and there has been an average reduction of 70.07% in queries. This comes with a negligible average delay of just 4.67 seconds. What this means is that, let's say an algorithm or a framework would tell you that the estimated travel time from A to B was 30 minutes. Our framework would tell you that the estimated travel time from A to B is 30 minutes and 4.67 seconds. So you see that that kind of delay is absolutely negligible for a use case like transportation.
The second proof is that we've compared two kinds of framework. Our framework uses a multi-class fleet, which means it breaks a customer trip into three classes and routes using congestion-aware paths in a system-equilibrium way. We are comparing it against a framework which uses just a single vehicle from source to destination in a very user-centric approach, which is what we all generally do: we tend to take the shortest-distance route using just one vehicle from our source to our destination. In this graph we see that there has been about a 74.26% reduction in waiting queue size. What this basically means is that there has been an increase in network utilization of 74.26% by using our framework. It means that that many percent more people could be allotted a congestion-free path as opposed to the user equilibrium.
And all of these people in the network face zero congestion at all times, because all of these routes are being allotted in a congestion-aware fashion.
The third proof is the flow time cost, which is the overall travel time of the entire system multiplied by the number of vehicles that were actually present on the road at that time. It is simply the number of vehicles times hours of travel time. For all three peaks we see again that our framework, in blue, performs better than the single-class user-equilibrium framework, in orange. There is a reduction of 12.13% in overall average flow cost. This basically translates to a reduction of 33 minutes per trip in the system. So let's say on average a person takes one hour to go from A to B. On average there would be a reduction of 33 minutes per trip for every request in the network. So all of these numerical experiments show how powerful our framework is, just by increasing the time spent walking or using pedestrian networks, which most of us do not use because we're kind of lazy or we just want ease.
Okay, now this is our framework compared against a real-life example. We have used the state of the art, which is Google Maps, and let's say a person wants to go from this red spot to this destination over here. Google Maps gives us the brown path, which is on the right over here. It says that there is a total trip time of 53 minutes from your start to your destination, which also incorporates walking. So this is also recommending a multi-class setup. It says that the trip involves about 21 minutes of total walking including first and last mile, which is about 10 minutes here and 11 minutes here. The rest is middle mile, making the total time 53 minutes.
And then the path on the left is our framework, in pink, which says that you have to walk for a total time of 33 minutes, which is approximately 15 minutes here and approximately 15 minutes here. The pink part is the middle mile, making your overall trip time from start to finish 45 minutes. So there has been a reduction in the overall travel time just by increasing your time walking or bicycling by a few more minutes. This is us trying to explain how our framework performs against the state of the art: it is not only trying to reduce one user's overall trip time, it is trying to reduce every person's trip time by analyzing congestion throughout the network.
And so we are at the end of our presentation, and we have been able to achieve congestion-aware routes in a system-optimal way for our entire framework. Let's summarize, just in case. We first start by extracting all of our data using Python's open-source GIS libraries. Then we implement our algorithm by finding possible nodes that we could walk or cycle to, after which we could take middle mile vehicles and then complete our trip. So this incorporates a multi-class setup. We find these optimal transit points using our proposed algorithm, modified hybrid search, which is fast and computationally efficient and improves on the state of the art by about 70%. There is always a reduction in queries, there is increased network utilization, and overall trip time and travel time are also reduced. And we finally have this architecture in place, which provides system-optimal congestion-aware routing for every customer in the network.
So, well, this is the list of references that we used to make this framework a reality, and we gained inspiration from all of them. That is all. Thank you for listening to me, and I hope you all learned something and had fun on this journey with me. Thank you.