So I think I'll take this time to, you know, talk about what Niharika is going to talk about. Her topic's title is predicting the traffic jam — congestion-aware routing. I'll let her tell you whether it is about network traffic or actual traffic. Right now she is pursuing her BTech in Information Technology at IIIT Allahabad. She's also a software engineer at Gojek. She has presented talks at multiple conferences such as DevConf, Open Source Summit Europe, and Flock to Fedora in Budapest. And she is the chapter lead of PyLadies Mumbai. So if there is anyone who wishes to join PyLadies Mumbai, you should definitely reach out to her after this. She's been a finalist for the Red Hat Women in Open Source Academic Award 2020. And she was also an Outreachy intern at the Fedora Project. So if there are students here who want to know more about Outreachy, you should talk to her about that. And she has also mentored students in Google Code-in for the Fedora Project. That's a lot of, you know, awesome stuff to have done as a student. So I'm sure there'll just be more to come. Also, yeah, Niharika, you can share your screen and we'll have you on the stage. I don't think a three-minute delay should be much because we are at 3:18 and we'll start at 3:20 exactly. By the way, Wesley has provided the link to his serverless stuff. We'll be making it available as well, but you're hearing it directly from him on the chat so you can bookmark that right away. It is 3:19, so Niharika, please take the stage. Am I live? Yes, you are. All right. Hello, everyone. I am Niharika Srivastav and currently I am a software engineer at Gojek. I was also a research assistant at SUTD Singapore, where my mentor asked me to compose a framework that can potentially predict traffic jams by providing intelligent congestion-aware routes to every passenger in the network, so that you can basically avoid situations like these. So let's go to the talk outline.
I would first start by addressing the problem of congestion and why it should be solved. Then I will introduce the concept of multi-class fleets, which is potentially going to help us solve this problem of congestion. Then we will go over the architectural overview and what our final framework should look like. I will also go through the Python GIS libraries that I have used extensively in this framework and how they are helpful in analyzing transportation networks. Then we will talk briefly about the implementation of this framework. And finally, we will be comparing this framework with real-life use cases. Alright, so let's start with some background. Well, the problem of congestion arises because every passenger wants to take the shortest-distance route from their source to their destination, thereby causing a traffic jam like this at road intersections, which results in underutilization of certain other edges in the network. On top of that, over the years the number of vehicles has risen exponentially, while the capacity of roads has stayed approximately constant. So let's introduce the concept of multi-class fleets and how we're going to use it. Well, multi-class fleets allow breaking an entire customer journey into three classes: the first mile, the middle mile and the last mile. The first and last mile use micromobility options such as walking, scooters or bicycles, and the middle mile uses fast-speed cars, taxis or public transport. The middle mile is supposed to cover at least about 90% of the entire customer's journey. So for example, a person would start walking from their source for about 200 meters, then board a bus and ride it for 15 minutes, after which they would bike the rest of the way to their destination. Alright, so what are we finally going to do? We essentially want to route all customers in a congestion-aware manner using the previously explained multi-class fleet of vehicles.
And we also want to provide optimal transit points for each trip. So what that means is — if you see in this video... Sorry, apparently the mouse pointer is a bit distracting. Can it be changed or removed? Is it fine? No, it's distracting apparently. I won't use it. Okay, thank you so much. So we have to find optimal transit points for each trip. If you see in this figure, what this means is: let's say a person wants to go from A to B, and they have two route options, A-C-B and A-D-B. They could walk to their nearest bus station at C and then take a bus from C to B, making their total trip time 34 minutes. Usually people like to opt for walking less, so they go to the nearest bus station and just start. Or they could have actually walked a little bit further, to their next-nearest bus station at D, and then taken a bus from D to B, making their total trip time 30 minutes, thereby saving four minutes of travel time. So in this case you see that the optimal route from A to B is actually A-D-B, and the optimal transit point is D, not C. A transit point means a point in the network where you stop taking one mode of transport and start taking another — you're transitioning between modes of transport. We also want to create a social model out of this. What that means is that the usual implementations of congestion-aware routing — on, let's say, Google Maps — try to route a customer on the least congested path in a very user-centric, selfish, greedy manner. But we don't want to do that. We actually want to route every passenger in the network in a congestion-aware manner. So we want to take into account all the requests in the entire system and then scale this framework to work that way. The final architecture would be as follows: we would be processing customer requests in batches of N. They would have origin-destination requests.
We would use these requests and analyze our network, which has congestion information in the form of travel-time durations. Then we would find optimal transit points for each customer. We would do this by using a proposed hybrid search that we will come to later, and after finding these optimal transit points for the middle-mile journey — this is where you start taking your fast-speed cars — we will dedicate routes to every passenger in the network using that multi-class fleet setup. So let me introduce a concept called geographic information system, or GIS. GIS is actually a software framework that helps us extract and analyze geographical elements for research purposes. It helps us layer spatial data of the Earth and view it in a 3D way, so that we can extract patterns and then do further analysis. As you might have noticed, research work on transportation or path-planning strategies comes under GIS. There are several Python GIS libraries available — it's all open source — and I used many of them. The first one I used was OSMnx. What it does is help us extract entire street networks from the OpenStreetMap database and then further analyze them. One thing to note is that this framework we built was verified and validated on Singapore's street network. So we extracted it, and on the left we see all the nodes present in Singapore's graph, and on the right we see all the edges. The different colors of edges represent the different kinds of roads — some are highways, some are residential roads, some are service roads, etc. So the graph has been broken into nodes and edges. Next, GeoPandas. GeoPandas is a slight extension of the data science library pandas, and what it does is help you convert the previously obtained nodes and edges into separate data frames. So nodes are one data frame and edges are another.
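In OSMnx/GeoPandas terms, the step she describes is roughly `ox.graph_from_place("Singapore")` followed by `ox.graph_to_gdfs(G)`. Here is a dependency-free sketch of that node/edge split on toy data — the field names mirror the columns she shows, but the values are made up for illustration:

```python
# Toy stand-in for the OSMnx -> GeoPandas step: split a street graph into
# separate node and edge record tables, as the talk describes.
graph = {
    "nodes": {1: (1.30, 103.85), 2: (1.31, 103.86)},         # id -> (lat, lon)
    "edges": [(1, 2, {"highway": "residential", "length": 420.0})],
}

def graph_to_tables(graph):
    """Return (node_rows, edge_rows), one dict per row, like two DataFrames."""
    node_rows = [{"osmid": nid, "lat": lat, "lon": lon}
                 for nid, (lat, lon) in graph["nodes"].items()]
    edge_rows = [{"u": u, "v": v, **attrs} for u, v, attrs in graph["edges"]]
    return node_rows, edge_rows

nodes, edges = graph_to_tables(graph)
```

The real pipeline downloads the graph over the network, which is why this sketch stubs it out with a literal dict.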
In this figure we see Singapore's edge data frame. For example, the columns U and V refer to the starting and ending node of a certain edge. We have the name of the edge and what kind of road it is — tertiary, primary, secondary, service. The maximum speed allowed on that particular edge, which is regulated by government rules. And then we have the length of the edge, and the location corresponds to the exact latitude and longitude points. After we have our nodes and edges in separate data frames, we combine them to create a NetworkX graph. This enables us to perform several route-planning algorithms on our graph, such as Dijkstra, Bellman-Ford or A* search, so that you can go from one node to another and plan your route. Awesome. And next we have Rtree. Rtree is actually very cool. What it does is isolate every geographical element — it creates a bounding box over every geographical element, so each one is a single element. Then you can perform set operations such as union or intersection and see what other elements fall within the same radius. So for example, if I were to search for the nearest neighbors of a certain node in the graph, I would take a union of all the nodes within a certain radius and understand, okay, these are the nearest neighbors, and this is how many there are. That's what Rtree does. And finally, something I used for speed-up is Project OSRM's contraction hierarchies. One thing to note is that Project OSRM is not a Python library — it's actually written in C++, but it can be integrated with Python by using its REST API. It provides something called contraction hierarchies, which is a routing algorithm just like Dijkstra's algorithm. What it does is help us route between two nodes in the graph using the shortest-distance path or the shortest-travel-time path.
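In the talk, this routing step is done with NetworkX (`nx.shortest_path(G, u, v, weight="travel_time")` runs Dijkstra by default). Here is a stdlib-only Dijkstra sketch over a toy graph shaped like her A-C-B versus A-D-B transit example — the minute weights are assumed for illustration:

```python
import heapq

def dijkstra(adj, src, dst):
    """Shortest travel-time path in a weighted digraph adj[u] = [(v, w), ...]."""
    pq, seen = [(0, src, [src])], set()
    while pq:
        cost, node, path = heapq.heappop(pq)
        if node == dst:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, w in adj.get(node, []):
            if nxt not in seen:
                heapq.heappush(pq, (cost + w, nxt, path + [nxt]))
    return float("inf"), []

# Minutes per leg: walk A->C then bus C->B (34 total), or walk a bit further
# A->D then bus D->B (30 total) -- the second option wins, as in her figure.
adj = {"A": [("C", 10), ("D", 14)], "C": [("B", 24)], "D": [("B", 16)]}
cost, path = dijkstra(adj, "A", "B")   # -> (30, ["A", "D", "B"])
```

On a real city graph you would of course call NetworkX rather than hand-rolling this; the sketch just makes the algorithm she names concrete.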
Now, the reason why we're using contraction hierarchies and not Dijkstra is that when we're dealing with large networks of cities and countries — in this case Singapore is an entire city-state — Dijkstra becomes really inefficient to compute. For example, here we see that for an 80-mile Euclidean-distance route from A to B, Dijkstra would have to examine 500,000 nodes, whereas contraction hierarchies would have to examine only 600 nodes to compute the shortest-distance route from A to B. The reason this happens is that many of these stages are pre-computed, and that is a major speed-up for our library. Alright, so let's finally start implementing this framework. The first and foremost thing we need to do is compute congestion for the entire network. After we have gathered our data from OpenStreetMap using all the Python GIS libraries, there is something called the traffic speed bands dataset that is provided by Singapore's government in an open manner. What the speed bands dataset provides is the link ID, which is a primary key for the edge it represents in Singapore's entire graph. We have the location, which is latitude and longitude points. And, importantly, it gives you the maximum speed observed at that edge on the road at a particular time and the minimum speed observed at that edge at a particular time. So if you average them, you can understand, for a particular slice in time, the observed speeds at every edge in the network. Down below, in figure 8, we see a heuristic called the Bureau of Public Roads heuristic. Heuristics are basically estimates we use in order to make our computation easier. The estimate here is that congestion is being calculated in terms of travel time — estimated travel time, hence the heuristic is used.
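The Bureau of Public Roads formula she names is, in its textbook form, T = T0 · (1 + α·(V/C)^β), with the standard defaults α = 0.15 and β = 4. A small sketch — the free-flow time here is simply edge length over speed limit, and the constants are the textbook defaults, not necessarily the calibration her framework used:

```python
def bpr_travel_time(length_km, max_speed_kmh, volume, capacity,
                    alpha=0.15, beta=4.0):
    """Estimated congested travel time (hours) on one edge.

    Rises with the traffic volume already on the road and falls with the
    road's capacity -- exactly the proportionality described in the talk.
    """
    free_flow_h = length_km / max_speed_kmh
    return free_flow_h * (1.0 + alpha * (volume / capacity) ** beta)

# Empty road: just the free-flow time. At volume == capacity: 15% slower.
t_empty = bpr_travel_time(1.0, 60.0, volume=0, capacity=100)
t_full = bpr_travel_time(1.0, 60.0, volume=100, capacity=100)
```

In her setup the volume/capacity ratio is inferred from the observed versus maximum speeds in the speed bands data; that mapping is not spelled out in the talk, so it is left abstract here.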
And the main thing to note is that your estimated travel time for an edge should be directly proportional to the number of cars already present on the road, and inversely proportional to the capacity of the road. So let's say there is an edge on the road with a maximum speed limit of 60 km per hour, but the observed speed is just 30 km per hour on average. The intuitive reading is that the cars are not able to reach the maximum speed limit, hence they are facing some kind of congestion, hence their estimated travel time is going to increase. So this is how we calculate congestion for the entire network. The next step is to divide one entire customer journey from source to destination into three miles, because we want to eventually give them a multi-class setup. So let's say a person wants to go from source to destination. The first thing we do is take a radius of 500 meters around both the source and the destination, and then check how many potential transit points fall within that radius. Transit points could be bus stops or taxi pickups — whatever is suitable. In this example, we have three bus stops within the 500-meter radius of the source and of the destination respectively. The reason the radius is 500 meters is the thinking that a person wouldn't want to walk for more than one kilometer max in an entire trip, because we also want to take into account flexibility for the customer. So now, given this, the person has the option of going from any of these three bus stops at the source to any of the three bus stops at the destination. So there are possibly nine routes the person can take. Now, which route do we actually select? We select the route which gives us the least overall travel time from source to destination. So essentially, if you see, we only need to calculate which bus stop to take at the source and at the destination.
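The 500-meter candidate search can be sketched with a plain haversine filter (the real framework uses an Rtree spatial index for this; the coordinates below are made-up points near Singapore):

```python
import math

def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000 * math.asin(math.sqrt(h))

def stops_within(point, stops, radius_m=500):
    """Candidate transit points within walking radius of a point."""
    return [s for s in stops if haversine_m(point, s) <= radius_m]

source = (1.3000, 103.8500)
stops = [(1.3030, 103.8500),   # ~334 m north: inside the 500 m radius
         (1.3060, 103.8500)]   # ~667 m north: outside
near = stops_within(source, stops)
```

Running the same filter around the destination and crossing the two candidate sets yields the 3 × 3 = 9 route options she describes.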
So these are the transit points for the middle mile, and this is what we need to calculate. How this is calculated is using a proposed algorithm called modified hybrid search. Our main objective is to find the minimum travel time given a set of travel times. To understand this, the first thing to note is: in the previous example we had nine possible routes; here, let's say we have 10 possible routes. So we have 10 source-destination pairs, from A1, B1 up to A10, B10. And what we do is calculate the Euclidean distances between these node pairs. Why Euclidean distance? This is another heuristic, because we essentially want to estimate travel times, and our main objective is to compute our minimum travel time in a very cost-effective way. Euclidean distance serves as a lower bound for distance — it is the least distance you can possibly cover from point A to point B. If I were to divide this distance by a constant speed, I would get a Euclidean time, and that Euclidean time would in turn serve as a lower bound for the travel time between points A and B. So what we do is: we have 10 possible node pairs, we calculate the Euclidean times for all 10 pairs, and we sort them in increasing order. Now, if I were to give you an array of 10 elements and say that you need to find the minimum-travel-time pair, but you do not know the actual values of those 10 elements, you would essentially need to go through every element in the array and then output the minimum pair. That is an exhaustive search, and it is computationally inefficient because it is on the order of O(N), and it wouldn't scale well when we are talking about huge networks of countries. That is why we do modified hybrid search. So let's say we have 10 possible routes here, whose Euclidean travel times have been sorted in increasing order.
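The lower-bound construction is direct: straight-line distance divided by the fastest speed anything in the network can travel. A sketch — the 80 km/h top speed and the distances are assumptions chosen so the resulting minutes roughly match her 10/20/30... walkthrough:

```python
def euclidean_time_min(straight_line_km, top_speed_kmh=80.0):
    """Lower bound on travel time in minutes: no road is shorter than the
    straight line, and no vehicle exceeds the network's top speed."""
    return straight_line_km / top_speed_kmh * 60.0

# Ten candidate (source-stop, destination-stop) pairs with straight-line km.
pairs = {f"A{i}B{i}": km for i, km in enumerate(
    [13.3, 26.7, 40.0, 53.3, 66.7, 80.0, 93.3, 106.7, 120.0, 133.3], start=1)}
ranked = sorted(pairs, key=lambda p: euclidean_time_min(pairs[p]))
# ranked[0] == "A1B1": the pair with the smallest possible travel time
```

This sorted list of lower bounds is the input to the modified hybrid search she walks through next.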
We introduce a variable called cutoff, which is equal to ceil(N/e). With N being 10, ceil(N/e) is approximately 4, and 4 here represents the fourth element, or index 4. We have a decider which is equal to cutoff plus 1, which is 5 in this case — the fifth index or fifth element in our array. Indexing starts from 1 in this case, sorry about that. The main idea is to stop our search the moment we reach the cutoff and then do a final comparison. We do not want to go through all elements to return our least travel time, because that is inefficient for us. For case 1, we go to the first element. We see that the Euclidean travel time for pair A1, B1 is, let's say, 10 minutes. We look up the actual travel time at A1, B1 — taking into account the congestion information, how many vehicles there are, and so on — and the actual travel time is 29 minutes. We then compare this 29 with the second Euclidean travel time, which is 20 minutes. We see that 29 minutes is greater than 20 minutes, so we continue. We open up our second element, which is 25 minutes — this is the actual travel time for pair A2, B2. Then we reiterate the same process. We see that 25 minutes this time is less than the next Euclidean travel time, which is 30 minutes. So we stop and return 25 as our minimum travel time. And the reason is that we first established that Euclidean travel times serve as a lower bound. So it is obvious that 25, being less than a Euclidean travel time, is going to be less than every actual travel time that comes after it. Hence we stop and return 25, saying this was our minimum travel time for the entire array — we are done. In an alternate case, we again start by opening up our first element. This time it is 35 minutes, which is greater than the next Euclidean time, 20. So we open up our next element.
This is 32 minutes. It is again greater than 30 minutes, so we open the third element. But at the same time, 32 has been the minimum up till now — between 35 and 32, 32 is our minimum travel time so far, so we just want to make sure we remember that in a variable somewhere. We open the third element, which is 45 minutes, and it is still greater than the fourth Euclidean time, which is 40 minutes. So we open that up, and it turns out to be 55 minutes — still greater than the fifth, which is 50 minutes. And finally we open the fifth element, which comes out to be 65 minutes. Now you see that the red line shows we have reached our cutoff; we need to stop our search. This is a constraint we have set. By now we have all these five elements, and we need to see which one is the best — the minimum — answer so far. We see that 32 happens to be the least travel time we have seen. Hence we stop the search, say okay, we're going to go with 32 as our least travel time across all 10 possible routes, and we return. There could be another alternate case where we keep opening up our list the same way as before. Up to element 4, we see that 55 has been the minimum so far. Then at the fifth we get 52 minutes, which is actually the decider element. We compare our minimum-so-far to the decider, and we see that 52, our decider element, is less than the minimum so far. Hence 52 is returned as the minimum for the entire array. And this is how modified hybrid search works. One thing to note is that in all three cases we did not go through all 10 elements — there was a reduction in the number of queries. In the first case we had about an 80% reduction: we queried the database for only two elements. And in the other two cases we had a reduction of 50%. So we are making our search algorithm more computationally efficient.
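The whole walkthrough condenses into one small function. This is a sketch reconstructed from her description, not her actual code — one deliberate difference is that the early-stop check here compares the *running minimum* against the next lower bound (her walkthrough compares only the most recently opened element), which gives the same answers on all three of her cases:

```python
import math

def modified_hybrid_search(lb_times, actual_time):
    """Near-minimum of N candidate routes, opening at most ceil(N/e)+1 of them.

    lb_times: Euclidean lower-bound times, sorted ascending.
    actual_time(i): congestion-aware travel time of candidate i (a DB query).
    """
    n = len(lb_times)
    decider = min(math.ceil(n / math.e) + 1, n)   # cutoff + 1, capped at n
    best_i, best = None, float("inf")
    for i in range(decider):
        t = actual_time(i)
        if t < best:
            best_i, best = i, t
        # Every unopened candidate costs at least its Euclidean lower bound,
        # so once best <= the next lower bound, nothing later can win.
        if i + 1 < n and best <= lb_times[i + 1]:
            break
    return best_i, best

lbs = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
case1 = {0: 29, 1: 25}                                     # her first walkthrough
result = modified_hybrid_search(lbs, case1.__getitem__)    # -> (1, 25)
```

Only two database queries are issued in case 1, matching the ~80% reduction she reports.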
And we will see what the tradeoff is, but this is how we're doing it right now. So this is where we started to validate our framework against other state-of-the-art frameworks. The first thing: our framework uses a multi-class setup wherein a person breaks their entire trip into three parts, and every customer trip is allocated a system-optimal route. We compare it with the framework we currently use intuitively: you take a single vehicle from your start to your destination, and you choose a path which is either the shortest distance for you, or — if you check Google Maps and see, okay, this is the least-congested travel time for me — you go on that. So the blue bars correspond to our framework and the orange to the single-class user framework. And in all three cases here we have three peak times — off-peak, moderate peak and high peak — so that we can properly understand every pattern of congestion throughout the day. In all three cases we see that our framework's bar is drastically lower than the orange bar. What that actually translates to is that we are utilizing the entire network 74.26% more. Greater network utilization with our framework means that more people are being allotted a congestion-free route from their source to their destination at the same time, if we opt for the multi-class system-equilibrium framework. And something to note: since there is a system equilibrium and every trip request has been taken into consideration, every customer who follows their dedicated multi-class route faces zero congestion at all times. I also wanted to see how this works and scales in a real-case scenario, so we compared it with Google Maps.
So here we see that a person wants to go from the upper-left red mark to the lower-right red mark — that's your source and destination. Google Maps also provides multi-class recommendations, but again they are user-centric and greedy, and we're trying to do a system-centric one. So let's see how that compares. The brown line on the right corresponds to Google Maps' framework. What they suggest is that you should walk for a total of 21 minutes — 10 minutes starting from your source, and then 11 minutes from your transit point to your destination — and your travel time using a fast-speed car or vehicle is going to be 53 minus 21 minutes. So your total trip time from start to end is going to be 53 minutes, which corresponds to walking plus your middle mile, which is cars or buses. But our framework recommends the red path on the left, which says you should walk for a total of 33 minutes — about 15 minutes from your source to a certain point, from where you start taking a bus, and then about 15 minutes more to your destination — making your total trip time 45 minutes. So we see that we have saved time here: by using the pedestrian network slightly more, by increasing our walking time slightly, we have decreased our overall journey time. So this is our framework used in a real-life case, and we've reached the end of the talk. I'm just summarizing in case some of you lost track. We first started by collecting our data using open-source Python GIS libraries, which help us analyze our spatial data. Then we implemented our algorithm, wherein our aim is to divide a customer's journey into three miles in a certain way. Then we tried to find the optimal transit point for a certain person in the trip.
At what point should they stop walking and start taking a bus, and then start walking again to finish their entire journey — so as to save on the overall travel time, while also doing this in a computationally efficient way. And finally, this is the framework we get, where essentially a batch of N customers is allotted dedicated routes using this multi-class setup of fleets. So thank you, everyone. Thank you, Niharika. And again, the chat section has been abuzz, so you have a lot of questions for which we hope you will be able to provide answers. If not, you can also bring up the slide where people can contact you later on, so it's there for reference. But now I'll read out some of the questions. One question that struck me as generally useful is: it looks like everything depends on whether there's data provided by the government. Any alternative to implement in case such data isn't provided? Let me see who is asking this — it is Akshay. Yeah, awesome question — something I actually faced problems with myself. In Singapore's case, the government is very generous and provides traffic data openly, but I was also validating this framework on New York and Paris just to see how it scales, and the New York and Paris governments don't provide this data freely at all. So what I had to do was model our estimated travel time: there are various research papers and citations saying that congestion travel times and traffic patterns are represented very accurately by Gaussian distributions. So we modeled our travel time using a half-normal Gaussian distribution — it was basically another heuristic, estimating your travel times based on a probability distribution. So that works too. It is not ideal. It is not real life.
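The half-normal fallback she describes — a free-flow travel time plus a non-negative Gaussian delay — might be sketched like this. The 20-minute free-flow time and 5-minute scale parameter are assumptions for illustration, not fitted values from her New York or Paris experiments:

```python
import random

def sample_travel_time(free_flow_min, delay_scale_min, rng):
    """Draw one travel time: free-flow time plus a half-normal delay.

    |N(0, sigma)| is half-normally distributed, so the sample never dips
    below free flow -- the property that makes this a plausible stand-in
    for missing congestion data.
    """
    return free_flow_min + abs(rng.gauss(0.0, delay_scale_min))

rng = random.Random(42)
samples = [sample_travel_time(20.0, 5.0, rng) for _ in range(1000)]
# Every sample is at least the 20-minute free-flow time.
```

In practice the scale parameter would be fitted per edge or per time-of-day from whatever partial observations are available.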
But this is what you do when parts of the data are missing — data collection is a huge task in data science. I hope that answers the question. Akshay, you can ping if you have got your answer. Another question: is the speed-up of contraction hierarchies gained by sacrificing the optimal result? We know that Dijkstra produces the optimal result — can we say the same about contraction hierarchies? Yes, the question is really spot on. So contraction hierarchies also provide the optimal shortest path, the same as Dijkstra. The speed-up comes because — if you attended the keynote by Anand, he said that pre-computation is a very important thing in data science, wherein you pre-compute common data points and then store them. Contraction hierarchies have two phases. One is pre-computing data, and the second is what we use: after the pre-computation is done, all we need to do is query the path from A to B. In transportation graphs, nodes are constant — they will never change over time — and even roads rarely change over the years. Hence, you can pre-compute the shortest distance from A to B using the road lengths and then store it. All you need to do while querying your contraction hierarchy server is divide every edge length by the speed limit at that particular edge, which can obviously change. So the efficiency is gained because of the pre-computation step, and contraction hierarchies give optimal results just like Dijkstra. I hope that answers. I can chat more later. I think we will take one more question. Sravya asks — this was during the 70% mark — by the time the travel time between A1 and B1 is estimated, what if A1, B2 has become the optimal, counting all parameters? Yeah, okay.
So I want to point out something I kind of didn't add in my presentation: the modified hybrid search that was proposed is obviously not providing the optimal minimum travel time, because we are not going through all 10 possible routes. But it is providing something very close to optimal. In a use case like traffic network congestion, let's say the minimum travel time — due to changing times and changing congestion patterns — happens to be, for example, 20 minutes. So from A to B, your minimum travel time, taking all parameters into consideration, is 20 minutes. What modified hybrid search tends to do is give you accuracy up to 70%, with a deviation of, on average, less than five seconds. So you would get your estimated travel time as 20 minutes and five seconds. That trade-off is very negligible when you're talking about a transportation use case, right? Five seconds makes no difference — even a threshold or delay of two minutes doesn't matter too much to an individual customer. So that was the kind of speed-up slash trade-off we gained by using modified hybrid search. Sravya, does that answer your question? I'm not sure if she's gotten to the chat section, but I'm sure she will see it. Yeah, you can always contact me and then we can discuss this offline. Yep, I think we'll take one last question. Why is the cutoff chosen as N by e? Perfect question. So N/e is approximately 37% of N. This concept is called the stopping criterion of the secretary problem — the hiring-secretary stopping criterion. It says that there is a 37% win probability of finding the minimum — the most optimal value — among N data points when you don't know the actual values of those data points. You just know that the number of data points is N, and you want to find the minimum without having to look at all of them.
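The 37% claim is easy to check empirically with the plain secretary-problem strategy: reject the first ceil(N/e) candidates, then take the first one that beats them all. A small simulation (the trial count and seed are arbitrary; for N = 10 the win rate lands near 0.37-0.40, matching her explanation):

```python
import math
import random

def secretary_pick(values, cutoff):
    """Reject the first `cutoff` values, then take the first one better
    (smaller) than everything rejected; fall back to the last value."""
    threshold = min(values[:cutoff])
    for v in values[cutoff:]:
        if v < threshold:
            return v
    return values[-1]

def win_rate(n=10, trials=4000, seed=7):
    """Fraction of random orderings where the strategy finds the true minimum."""
    rng = random.Random(seed)
    cutoff = math.ceil(n / math.e)           # ~37% of n
    wins = 0
    for _ in range(trials):
        values = list(range(n))
        rng.shuffle(values)
        wins += secretary_pick(values, cutoff) == 0   # 0 is the true minimum
    return wins / trials

rate = win_rate()
```

Her modified version stops even earlier — at cutoff plus the one decider comparison — trading a little optimality for fewer database queries.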
So I'll repeat: it's called the secretary problem stopping criterion — you can check that on the wiki. What we used is a modified version of it, where again we didn't go through all N: there was a decider, if you saw, and the decider is just cutoff plus one. We basically wanted to stop our search as soon as possible and see what the tradeoff was and how computationally efficient this was. So we modified it further by setting the ceil(N/e) cutoff and then doing just one more comparison there, and that's it. So yeah. Awesome. On a related point, there's a book called Algorithms to Live By which also talks about when to stop in an algorithm, so I think it will be worth checking out if people haven't. Anyway, Niharika, thank you so much — you've kept perfectly to time. And thanks to the audience for, you know, asking valuable questions and participating in the talk, because as difficult as interactivity is to maintain in a session, I think we're pulling it off really nicely. So thank you again. Yeah, I think I'm going to bring you down from the stage, but thanks again. And you haven't put up your contact information — or maybe I missed that part. All right, I'll put it up on Hopin. No, you can put it up right here or on Hopin as well — yeah, people are all there. I will copy it and put it up on the Zulip chat, and it will be available there for taking questions later. Or if you have a Twitter handle or email ID, that will be great. Sure, I'll definitely do that. Thank you. Your slides will be made available, I suppose. All right, yeah. Thanks for having me. This is my first PyCon — an awesome experience till now. It was brilliant. Thank you so much.