Welcome to "Using PostgreSQL and Friends for a Street Sweeping Solver Project." I'm joined by speaker James E. Marca, founding member at Activimetrics LLC and project scientist at UC Irvine, who will discuss how to clean OpenStreetMap data using WITH RECURSIVE common table expressions, convert one-way and two-way streets into a one-way network of curbs, convert the street network into its line graph dual, save the solver output to a new table using Python, display the solver's routes using QGIS, and produce a smooth animation using PostGIS aggregate functions and windowing.

A little bit about your speaker: Dr. James Marca earned his PhD in transportation engineering from the University of California, Irvine in 2002. He started working with PostgreSQL and PostGIS at that time to stash and process GPS data collected from in-vehicle data collection units streaming data over CDPD wireless modems. He continues to use PostgreSQL as a data store for research projects and websites, and has connected to PostgreSQL from Perl, Java, JavaScript, Python, Elixir, and R.

My name is Lindsay Hooper, and I'm going to be your moderator for this webinar. And with that, I'll hand it off to James. Take it away.

Thank you, Lindsay. Hopefully the audio is coming through. Looks good to me. As Lindsay said, I'm going to talk about how we used PostgreSQL and friends for a street sweeping solver project. As you listened to the intro, you heard a lot of alphabet soup and acronyms, and hopefully I won't do that too much, though I do slip into it. So, I do want to talk about a case study; that's the focus of the talk.
I'm going to give practical examples of spatial processing, using recursion, using window functions and so on, and give a little bit of theory and background on spatial processing, since I don't think that's quite as common as other aspects of PostgreSQL.

Just to reiterate what Lindsay said, I've been programming for a very long time. I earned my bachelor's degree in 1989, worked on Space Station Freedom for a while as an engineer, went to UC Irvine to get my master's degree in transportation, worked in Boston for a while, and then came back to get my PhD. It was during the PhD research that I first started using PostgreSQL; as Lindsay said, we needed to store our GPS data somewhere. I was using PostgreSQL before that, but its geo support was lacking, so we switched over to PostGIS on top of PostgreSQL, and that worked great. We wrote a little pipeline to collect the data and save it to the database. After that we of course had to process the data, so I learned how to hook up R to the database, and we had to present the results, so I did that from a website. We did an accident-risk website that was a Java-based tool. We've done a huge Node-based project to save and store truck data. I really use PostgreSQL as my go-to back end any time I have geo-enabled data.

So this project is for street sweeping. You see the vehicles all the time; the goal for the city is to make sure each curb is swept, and they want to do that as efficiently as possible. Algorithmically this is an edge covering problem, and edge covering problems are NP-hard, which means you can't expect to ever find the provably optimal solution. All you can do is find solutions that are good enough, because the search space is just way too large. So what do we need to solve this kind of problem? Well, first we need geographic data: we need to know what the streets look like, otherwise we can't do anything.
We have that, because I'm using OpenStreetMap. We also need a good solver that can tackle these NP-hard problems, and I have that in Google's OR-Tools. So where does Postgres fit into this? Well, it acts sort of as the glue: it stores the raw data, it can clean the data, process the data so it looks good for the solver, save the solver's output, and then process that output to display it properly. As I said, Postgres is my data glue, and it's a smart glue.

Before talking about manipulating the data, I first want to talk a little bit about what the solver needs. As I said, I'm using Google's OR-Tools, an open-source operations research platform put out by Google. There are lots of different solvers in that suite of tools; we're using a routing solver to solve this as a routing problem. That raises the question: what is a routing problem? In a routing problem, the goal is to find the best path for one or more vehicles, depending on your problem, to visit every node in a network, so you're delivering packages, or in this case sweeping streets, and so on. But there are always practical constraints. The vehicles have a fixed capacity, for example, and when they hit that capacity they need to go back to the depot to unload their packages, or to pick up more packages, or whatever. And all the vehicles need to have about the same amount of work, because who wants to be the driver that has to do all the work while the other drivers are sitting back at the depot? The computer doesn't care; it will happily produce the most efficient routes. But drivers care; they want to have the same work. And of course the operator wants to minimize the number of vehicles, because vehicles cost money, and the drivers also have rules about when they can work: they have overtime hours, they have break times, and so on. All these constraints make it a really hard problem, on top of the fact that it's already an NP-hard problem.
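To make those routing-problem inputs concrete: everything the solver ultimately consumes boils down to a list of node indices and an all-to-all cost matrix. Here's a minimal sketch with a toy nearest-neighbor heuristic standing in for OR-Tools; the matrix values and the single-vehicle setup are invented for illustration, and a real solver would also handle the capacity and balance constraints described above.

```python
# A toy stand-in (NOT OR-Tools) showing the two core solver inputs the talk
# describes: a list of nodes to visit and an all-to-all travel-cost matrix.

def nearest_neighbor_tour(cost, depot=0):
    """Greedily visit every node once, starting and ending at the depot."""
    n = len(cost)
    unvisited = set(range(n)) - {depot}
    tour, current = [depot], depot
    while unvisited:
        # always move to the cheapest not-yet-visited node
        nxt = min(unvisited, key=lambda j: cost[current][j])
        tour.append(nxt)
        unvisited.remove(nxt)
        current = nxt
    tour.append(depot)  # return to the depot at the end
    return tour

# made-up 4x4 travel-time matrix (cost[i][j] = time from i to j)
cost = [
    [0, 2, 9, 10],
    [1, 0, 6, 4],
    [15, 7, 0, 8],
    [6, 3, 12, 0],
]
print(nearest_neighbor_tour(cost))  # [0, 1, 3, 2, 0]
```

A real routing solver searches far beyond this greedy answer, but the shape of the inputs is the same.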
Luckily, OR-Tools is designed around constraints; it's fundamentally based on constraint programming, so there's a flexible way to specify linear constraints, and adding constraints can sometimes improve the runtime because it limits the search space the solver has to inspect in order to find a good solution. So, recapping: the solver needs a list of nodes to visit, an all-to-all travel matrix between those nodes, and then the vehicles and their attributes, where the depots are, and so on. I'm going to focus just on the list of nodes to visit and the all-to-all travel matrix. The other stuff is important in the end, but for this talk it's minor.

Where does that data come from? Again, we're using OpenStreetMap. If you're not familiar with it, it's an awesome tool with data for all over the world. It's fairly consistent, and it's designed and put together by citizen geographers just like you and I; anybody can edit it, anybody can add to it, and so on. It's pretty great. OpenStreetMap has three core elements: nodes, ways, and relations. Nodes are points in space, for example an intersection. Ways are lines: a street between two intersections, power lines, bridges, and so on. And relations are groups of elements, so they collect together, let's say, a bunch of road segments to make a long roadway; you can use them for bus routes, you can use them to define highways, and so on. On top of the nodes, ways, and relations there are tags. A line on a screen has no meaning by itself; you have to tell the program, and tell the viewer, what that line is, so there's this tag system. For example, a tag might be highway=residential, which tells the program, or the viewer, that this is a residential street. There's also a fairly flexible approach to namespacing things. Take a movable bridge, for example.
There are lots of different kinds of movable bridges, so you can create a new tag where the key is bridge:movable and the value is the type, say drawbridge, and so on. These conventions evolve over time, and there's a huge list of the accepted ones on the OpenStreetMap wiki. And the important point here is that only nodes have location data.

Here's an example of a way. It has a tag, in this case a service road, and it has eight nodes making up the way. The first one and the last one are part of other ways, so conceptually those are intersections: this service road branches off from Harbor Boulevard and ends at Ponderosa Street. All the other nodes in between (remember, only nodes contain the geographic data) are like shape points along the middle of the road.

Here's an example of a relation. Again, a whole bunch of tags describing what the relation is, and in this case there are 700 members: Harbor Boulevard is a very long street, made up of 700 small ways along its length. And one important point, again, is that the nodes hold the location information. So fundamentally OpenStreetMap is kind of a relational database, in that everything refers back to nodes in order to get the spatial component.

Now that we've talked a little bit about the data, let's take a look at the meta-problem of how to model street sweeping as a routing problem. As I said before, a routing problem's inputs are a list of locations that need to be reached and a cost matrix for moving between all pairs. For street sweeping, you have to sweep a street, and that's kind of like visiting a location. We need to use a network in order to determine travel times, and streets need to be swept in a time window; everybody has seen those signs that say, you know, no parking on Wednesdays from nine to noon.
Sweepers actually pick up debris along the curbs, and when the hoppers fill up they need to go back to the dump site to unload. So this is almost exactly like a pickup and delivery problem with time windows. The trick here is getting the locations to visit, so that sweeping a street is the same as visiting a location. If you think about that enough, and if you remember back to any algorithms class you might have had: if you transform the regular network into its line graph dual, then every link becomes a node, and that's exactly what I want here. So that's what I'll have to do later.

Here's my strategy: take the OpenStreetMap data, turn it into a routable street network, generate a line graph from that network, and then generate travel times. Those will be my main inputs into the OR-Tools process.

As for what I mean by "friends": I've been talking for about ten minutes and I've just barely gotten to Postgres. I'm going to use PostGIS, which gives geographic data capabilities to PostgreSQL, and pgRouting, which builds on top of PostGIS to add some useful routing functions.

So the first step is to load the data. Don't grab the whole planet; it's terabytes in size. Instead, go to a download service and grab a little bit. I'm grabbing socal-latest, and as I showed before, that's still too big, so I use a tool called Osmium to cut it down. The inputs are the large OSM file and a polygon defining the city of Glendale, and the output is glendale-latest. Then the pgRouting project has a utility tool, osm2pgrouting, that will load OpenStreetMap data. Again, my input is glendale-latest, plus some flags to dictate how to connect to the database, and then a configuration file that tells it what to take. That configuration file is shown here; I'm showing it because it's not super well documented on the website.
The first thing to know: the maxspeed in the tags here is actually in kilometers per hour, if that's important to you. The way it works, it looks at the tags. Remember, I talked about the tags on nodes, ways, and relations. So it looks for tags that are named highway and have a particular set of values. I've commented out motorway, motorway_link, and motorway_junction, because I don't want those; street sweepers don't sweep those. They also don't sweep service roads, but they do sweep these kinds of links. I've also set the max speed at 50, which is about 35 miles an hour, which is about how fast a sweeper would move when it's driving between streets. When they're actually sweeping, they go about six miles an hour, but that's different.

This generates a big ways table, and here's some sample data from it. It generates a link length in meters, there's a source node and a target node, and it has a cost in seconds and a reverse cost in seconds as well. If the reverse cost is negative, that typically means you can't go that way on that street; in this case it's a one-way street, so it has a negative value. Down below here are two-way streets, so going backwards on those streets is allowed. I also made sure that the 50 km/h is being used properly: if you divide 39 meters by 50 kilometers per hour, you get about 2.85 seconds. So that works.

So what have we got? Basically, we've just gotten the data into the database. I'd like to move straight on to making a street network, but unfortunately we first have to clean up the OpenStreetMap data. This is the hardest part and the longest part of my talk, so I'm going to try to go through it as quickly as I can. But rest assured, if it feels like the hour is coming to a close and I still haven't finished this part, the other parts are pretty easy. It's always true that cleaning the data is the hardest task. The problem with OpenStreetMap is that there are too many roadway segments.
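Backing up for a second to that cost-column sanity check: the arithmetic is easy to reproduce. A quick sketch; the 39.6 m link length here is a stand-in I picked so the numbers land near the figure quoted above, not the actual value from the sample rows.

```python
# Convert a link length in meters and a speed in km/h into travel time in
# seconds: the same arithmetic behind the ways table's cost column.
def travel_time_seconds(length_m, speed_kmh):
    return length_m / (speed_kmh / 3.6)  # km/h -> m/s, then t = d / v

# 50 km/h is roughly 35 mph; a ~40 m link costs just under 3 seconds
print(round(travel_time_seconds(39.6, 50), 2))  # 2.85
```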
This problem is easier to show on a map. Right in the middle I've got North Isabel Street and Geneva Street and some other streets, and this is exactly what I want: this is one block, and it has one segment. The dots on this plot are the midpoints of each segment. All around it are streets with many segments on them. So I need to somehow examine each segment and determine whether it's an isolated segment on a block that needs to be joined with its neighbors, or whether it's already one long segment, and so on. To do that, I'm going to need recursive queries.

Postgres has WITH statements, which hopefully everybody's using, because they're super convenient for organizing your long SQL statements; they put them into logical tables and so on. WITH RECURSIVE statements I use less commonly, but when I do need them they're indispensable: they allow you to refer back to the results of your WITH statement so far, so in this case I can grow a sequence of segments into a longer line.

Here's a quick refresher on what WITH RECURSIVE does. First there's a non-recursive term that sets up the later steps, then a UNION or UNION ALL, and then the recursive term that selects from the virtual table being built in this WITH statement. Here's the example from the documentation. The initialization step is just VALUES (1), then there's a UNION ALL for the recursive step, and it selects n + 1 from t as long as n is less than 10, and then the final query sums all that up. Looking at this, I didn't know what it was doing; I couldn't figure it out. So I ran it, and it said 55. I still didn't know how it got there, so instead of doing the sum I popped in SELECT n FROM t, and then I could see each line as it was emitted from this recursive statement. The first initialization step emits 1, because it's just VALUES (1).
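The full iteration can be mimicked in a few lines of Python, which is how I'd summarize what the database is doing under the hood. This is a sketch of the semantics, not the actual executor:

```python
# Mimic: WITH RECURSIVE t(n) AS (VALUES (1) UNION ALL
#                                SELECT n + 1 FROM t WHERE n < 10)
#        SELECT sum(n) FROM t;
def recursive_cte():
    working = [1]            # non-recursive term: VALUES (1)
    emitted = list(working)
    while working:
        # recursive term: runs against the rows from the previous iteration
        working = [n + 1 for n in working if n < 10]
        emitted.extend(working)
    return emitted

t = recursive_cte()
print(t)       # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(sum(t))  # 55
```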
The second iteration takes the result of that statement, adds one to it, and emits that. The third takes the result of the previous iteration, in this case 2, adds one, and emits that, and so on. So you get the numbers one to ten, and if you sum all of those up, you get 55. So that's how it works.

Thinking about that, I looked back again at my data. Each segment has a name, a source node, a target node, and some other details. The names are not unique; here I've got Glenoaks Boulevard twice. The source nodes are not unique; in this case they're shared here. But the targets seem to be unique, and the IDs are absolutely always unique, so I can use the IDs to build unique identifiers.

What I'm going to do is look at a map, and you can see that the source nodes are sometimes shared at intersections. For example, this intersection here is a source node for one segment and a target for two. And this intersection here is a source node for three segments and a target for one: this one coming down from the top is the only one that ends at that node, and then it sprouts off three segments the other way. So interior nodes are used as a source exactly once and as a target exactly once, but they touch each other: one segment's target is another segment's source. That's what I want to use when I'm putting together a long segment out of these sub-segments.

So, with that count of sources and targets, I make two tables: the sources table goes through all of the ways and counts up how many times each source node is used, and the targets table does the same, counting up the targets. Then I find my potential interior segments by looking at all the records, joining with the targets table on the target and with the sources table on the source (that's these lines right here), and only keeping those where the target count is one and the source count is one.
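That counting-and-filtering step can be sketched outside the database. Here's a hypothetical mini ways table run through the same source-count and target-count tests; the street names, gids, and node ids are all made up:

```python
from collections import Counter

# (gid, name, source, target): node 10 is an intersection that sprouts
# three segments and receives one, like the example on the map.
ways = [
    (1, "Main St", 10, 11),   # leaves the intersection
    (2, "Main St", 11, 12),   # a true interior piece
    (3, "Side St", 10, 20),
    (4, "Side St", 10, 21),
    (5, "Other St", 9, 10),   # ends AT the intersection
]

src_count = Counter(w[2] for w in ways)
tgt_count = Counter(w[3] for w in ways)

# candidate interiors: source node used once AND target node used once
candidates = [gid for gid, _, s, t in ways
              if src_count[s] == 1 and tgt_count[t] == 1]

# candidate starts: target used once, but source shared at an intersection
starts = [gid for gid, _, s, t in ways
          if tgt_count[t] == 1 and src_count[s] > 1]

print(candidates)  # [2, 5] -- note gid 5 actually touches an intersection
print(starts)      # [1, 3, 4]
```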
That's not exactly the true interiors yet, because some intersections are actually the source for only one outgoing link while being the target for a bunch of incoming links, and so on, because it's a mixed two-way and one-way network, which complicates things a bit. But it's not so hard to back the actual interiors out of the total possible interiors. Here's what that looks like: the red points are all the segment midpoints again, the blue ones are the possible interior segments, and the green links here are the actual interior segments. So it dropped a few of the blue ones that are actually touching intersections.

With the interior segments settled, I want to look at the starts and the ends. I'm going to focus on the starts; the ends are the same as the starts, you just flip the sense of source and target, and it's the same code. A start, again, begins at an intersection and has to terminate at one of these interior segments. So here I've got this query that says where the target count is one, and then the actual starts are those possible starts where the source count is greater than one. If you squint, you can put these together as one query: you're selecting from the Glendale ways, joining on targets and sources, and keeping those where the target count is one and the source count is greater than one. Here's what that looks like. These are all the possible starts, which look very similar to the interior segments, and the green dots here are all of the actual starts.

But that initial cut is actually too narrow. Here are some start segments, all in green, and the initial cut has missed this red segment here, which is actually a start. As I said before, there are two-way links coming into this node here. So I need to account for the fact that sometimes a node is the source for only one segment, but it can be a target for a bunch of them.
That's what this UNION statement here is doing: when the source count is one, if the target count is greater than one for that node, then it's also a start. That adds these large green dots. Then sometimes the name changes, and I want those to be starts as well; that's these really big dots scattered throughout the map. And then there's another case called singletons, again another special case, not that important, and that makes these other large green dots. The moral of the story is that it's always messy. In this case I was thinking in terms of a directed network, but in fact it was a mixed network, and so on. You just have to handle all those edge cases.

So here's the full starts query; I'm going to just cruise through this because I just said it all and I'm running low on time. The ends query, again, is almost identical: you just flip source and target and use the same code.

Now I need to get to joining those segments. My strategy is to go from the end back to the start and recurse over those segments to grow them. To do that, I need to use PostGIS functions, so I want to talk a little bit about spatial processing. A quick caveat: I'm not a geographer, so if I say anything wrong, too bad.

Latitude and longitude: what are those? Everybody should know a little bit about this. The earth is a sphere; latitude is degrees up or down from the equator, and longitude is moving left and right around the earth. From Greenwich, London, which is on the prime meridian, east of Greenwich is positive and west of Greenwich is negative. Maps are flat and need projections. A projection is how you unroll the sphere onto a flat map, if you will; there are a bunch of different kinds, depending on the purpose of your end map. The Wikipedia article is pretty good on which projection you want to use. It's way more information than I ever wanted to read, but it's interesting.
There are some projections you should be aware of. There's the Mercator projection, which is used in tiling map libraries like Google Maps; it has an equal-angle property, so it works well for zooming. But it's not good for choropleth maps. So if you want to display something like the incidence of COVID-19 positivity rates across counties in a state, don't use the Mercator projection; you want an Albers projection, which has an equal-area property, so those percentages will visually make sense and be accurate. There's also the transverse Mercator; just be aware that it exists, because it's used in universal transverse Mercator.

Datums are the other component here, and there are a couple of different kinds. Most data these days is in WGS 84, because that's what's used by GPS receivers. And as I said a second ago, there's a thing called UTM, or universal transverse Mercator. I don't know whether it's a datum or a projection or a mix of both (again, I'm not a geographer), but I do know that it's incredibly useful. It slices the surface of the earth into 60 zones, and because each zone is so narrow, it can be mapped with low distortion to fairly high latitudes north and south. It's like unrolling an orange.

So, some important points. If you have GPS coordinates, you probably have WGS 84; think about it that way. Always write latitude, longitude: latitude again is north-south and longitude is east-west. But programs and functions tend to flip the order and use them more like Cartesian coordinates, so x then y. I hate that, but that's how it goes. There's a standard, Simple Features for SQL, put out by the Open Geospatial Consortium. They develop lots of things, but most important for me are the well-known text and well-known binary forms of Point, LineString, and so on. I only care about Point and LineString in this project. So how does PostGIS fit into this?
Well, PostGIS has implemented all of the Simple Features for SQL and a lot more besides. They've taken well-known text and well-known binary and extended them to include the spatial reference system for the object in question. The SRID lets you know what the projection and datum are for the object, and then you can transform into different projections. Here's a quick example of what these things look like. In EWKT you've got the SRID and then the point; plain WKT would drop the SRID part. And a LineString is just a comma-delimited list of longitude/latitude pairs. Notice again, this is x then y: they flipped it in the code.

I'm going to use a bunch of functions throughout this talk. I'm not going to go over them now, but they're listed here so you can look at them later if you want to review what I did. But there's also this one weird PostGIS trick that's super useful. When you measure geometries, if you just apply ST_Length to a geometry, you're going to get length in degrees, which is kind of useless. Instead, you need to project your geometry into UTM. In my case, I live in zone 11, so with my data in Glendale I transform it to SRID 32611, and then the ST_Length call will give you meters. That is so useful; I use it all the time. Which zone should you use? You can go to Wikipedia, look up your zone, and pick the correct SRID based on that.

And now back to the story. Remember, we had identified the interior segments along these roads, identified the starts, and identified the ends. I want to recurse from the end back to the start, and I also need to make sure that I stop at the right point. Recursion is two steps: initialization, and then the recursive part. So, excuse me, a quick drink of water. This is my initialization step: I'm making a virtual table here called search_graph.
I'm selecting a bunch of columns from that ends table, starting with the length and the geographic ID (the gid); I'm going to use that for the iteration stopping point. I'm collecting all of the unique IDs as a path, and initializing a cycle variable to false. And this little bit here is the geometry call.

So why do I use ST_AsEWKT? What I'm doing with that statement is turning a binary geometry into text, and that's not free to do. But it turns out that if I just make an array of geometries and then try to union them, the union step chokes and crashes, because it says, well, I don't know what kind of geometries these are. The problem is that as soon as you make an array of geometries, Postgres loses track of what type of element is inside it, so you have to keep casting it back. By keeping the geometries as strings, I preserve all the information, because it's just text.

Now here's the recursive bit. Again, my stopping guards: I am not going to go to any depth greater than 100, so I won't build any joined segments out of more than 100 pieces; that's much larger than I need. And I also make sure I don't have any cycles; I'll show how I do that in a second. So again, the initialization step started at the ends, and the looping step builds from the interiors. I'm joining against the previous step of this recursion where my current target is equal to the previous iteration's source. Think about it as Lego pieces that you're stacking together: I have a Lego piece in my hand, I'm fitting it to the growing stack, and it fits perfectly if the target equals the source and the name is the same. I'm collecting the same columns again, but I'm growing the length and adding one to the depth. I'm prepending the current gid to the growing path, and you can use that path to detect cycles by checking whether the gid is already inside the growing list. If it is, that means there's a cycle, and a cycle means you stop. So this is the geometry step.
Focusing in on the ST_MakeLine statement: it works perfectly well to mix binary and well-known text. What it does is take the binary geometry for the current observation and make a line out of it and the growing segment. The result is well-known binary, so I convert it back to text for the next iteration. An alternate version uses arrays; again, there's a little bit of gymnastics to cast the array to geometry(LineString). It works out to be about the same amount of time, but I think the text way is cleaner, so I use it.

Here are the results of that recursive step, and there are some wrinkles. For example, here I've shown that North Louise Street has a depth of 19, and it also has a depth of 18, so I have two results for this North Louise Street segment, one with 19 elements and one with 18. In fact, we actually have 19 results for this North Louise Street, and I want to keep only the longest one. That's what the next bit of SQL does: it looks through and keeps only the longest joined segments from that recursive step. It's a little bit tricky but not too complicated. Basically, I break up the array from the path and associate the longest value with each element in the array. The second block here, gid_max_depth, keeps only the longest depth for each gid. And the last bit rebuilds the paths, keeping only the longest one, so all those intermediate ones go away; North Louise Street with 18 observations has disappeared.

Then I make one record and add the starts to it; that's this here. I select from the search graph, join in the distinct paths to keep just the longest records, and join in the starts where the target of the start fits the source of the growing path. Then I select a bunch of things for my final output, and I have the geometry bit in here: again, I'm making a line as I did before and converting it back to well-known text.
As I worked on this talk, I realized that's kind of redundant: I'm taking my well-known text and converting it back again into binary, so I could probably delete those two conversions. But I'm not going to do that for this presentation, because maybe there's something I figured out back then that requires it; I don't know.

The last part here is an ST_SimplifyPreserveTopology call. If you have a whole bunch of segments in a straight line, I don't need those interior nodes, right? If I piece together 19 segments from North Louise Street and they're on a perfectly straight line, then I want to eliminate all those interior points and just keep the start and the end. That's what this does; if there's a turn or a curve that deviates more than 0.0001 degrees, then it keeps those extra nodes as needed. Note also that I'm using a tolerance here in degrees, which doesn't really mean anything physically, but it's good enough; it works. Then there's a little bit of bookkeeping to finish up, group things together, and dump to my table.

And here's the output. This is what I started with; this is what I end up with. This was great; I was super happy when this came out. There are still some problems here and there, like these nodes: there are two segments in this line rather than one, but it's good enough. I reduced my total number of segments by about 40%, which is a huge impact on the problem size. Again, this is NP-hard, so the smaller you can make the inputs, the much smaller the search space.

So I've taken the OpenStreetMap data and cleaned it. The next step is to make the street network. I'm going to try to go fast through this, and luckily it's not actually that hard. I'm going to make a routable street network of curbs.
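As a quick recap of the cleanup step just finished, the grow-from-the-ends recursion can be summarized in miniature. This sketch mirrors the Lego-stacking test (current target equals previous source, same street name) plus the two guards, a depth limit and a cycle check; the segment data is invented:

```python
# Each segment is a dict with gid, name, source, target.
def grow_path(end_seg, interiors, max_depth=100):
    """Grow backward from an end segment by prepending interior pieces."""
    # index interiors by (name, target) so the lookup matches name AND node
    by_target = {(s["name"], s["target"]): s for s in interiors}
    path = [end_seg]
    seen = {end_seg["gid"]}
    while len(path) < max_depth:
        head = path[0]
        # the next piece must END where the current path STARTS
        nxt = by_target.get((head["name"], head["source"]))
        if nxt is None or nxt["gid"] in seen:   # nothing fits, or a cycle
            break
        path.insert(0, nxt)                     # prepend, like the path array
        seen.add(nxt["gid"])
    return [s["gid"] for s in path]

interiors = [
    {"gid": 1, "name": "N Louise St", "source": 1, "target": 2},
    {"gid": 2, "name": "N Louise St", "source": 2, "target": 3},
]
end = {"gid": 3, "name": "N Louise St", "source": 3, "target": 4}
print(grow_path(end, interiors))  # [1, 2, 3]
```

The SQL version emits every intermediate path as well, which is why the dedup-to-longest step was needed.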
OSM data has information on one-way streets and two-way streets, and in theory pgRouting can analyze a mixed network, but in practice I wasn't good enough to make it work without hitting a whole bunch of bugs. So I said, let's just concentrate on the curbs and make myself a directed network.

I have a big SQL statement for all this; I'm going to quickly go over the details. First I set up a serial sequence so that each curb gets a unique ID. Then I use my one weird trick: I convert the geometry from the new Glendale ways table into SRID 32611, so again I've got it in UTM, universal transverse Mercator, and I get meters. To make the right-hand side: I know, because I'm a transportation engineer, that lanes are about 12 feet wide, so an offset of two meters to the right will get me close to the curb if it's a two-lane street. That's what I'm doing here. Then I cast it back to 4326, which is WGS 84, and then I have to reverse the sense of the line because of the way ST_OffsetCurve works; in order to keep the direction of travel correct, I need to reverse it on the right-hand side.

The left-hand side is different depending on whether the street is two-way or one-way. For a two-way street it's a positive two-meter offset; again, convert back to 4326, and again reverse the sense of it. I know it's a two-way street if the reverse cost is positive. The one-way case is similar, but you don't have to reverse the sense of it, because you're already moving in the correct direction at that point. A negative reverse cost tells me it's a one-way street, but I definitely want to be sweeping that left-hand curb too, so I do have to make it. All the left-hand sides are the union of those two kinds of left-hand side; I make a left-hand curb table, then I make a final union of left and right to make my curbs table, and I insert that into a new curbs graph. And I'm done.
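The core geometric move here can be shown outside the database. This is a toy version of the right-hand offset that ST_OffsetCurve performs, for a straight two-point line in a projected (meters) coordinate system, which is exactly why the talk transforms to UTM first; real curb geometry has many vertices and corner joins, which PostGIS handles for you.

```python
import math

def offset_right(p1, p2, dist=2.0):
    """Shift a straight segment `dist` meters to the right of travel."""
    (x1, y1), (x2, y2) = p1, p2
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    # unit normal to the right of travel: rotate the direction by -90 degrees
    nx, ny = dy / length, -dx / length
    return ((x1 + nx * dist, y1 + ny * dist),
            (x2 + nx * dist, y2 + ny * dist))

# a northbound street: the right-hand curb lands 2 m to the east
print(offset_right((0, 0), (0, 100)))  # ((2.0, 0.0), (2.0, 100.0))
```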
So that makes this, which is kind of underwhelming, but very useful: I've now made a routable network of directed links. So where are we? I've got my street map and I've got my street curb network; now I need to make a line graph, and then travel times. The line graph is actually one of the easiest things I did. To revisit: what is a line graph? In the normal network graph, the intersections are nodes and the streets are links; in a line graph, the streets become nodes, and the links represent legal movements between them. So thinking about that, if you're at a node sweeping the right-hand side of the street, you can move straight to the next curb, turn right, turn left, or make a U-turn and sweep the other side of the street. Putting all this together for a fairly simple intersection produces all of these links; these nodes have to have these links. Luckily, I don't have to build any of that by hand, because pgRouting has an excellent function called pgr_lineGraph. You pass it a little bit of SQL to tell it how to get the ID, source, and so on, and then it does the job. This is what it looks like: here are my links, and it converts my curb graph into a line graph. Zooming in a little to see what's going on for real, you can see that the turnaround links are in there too; it's producing the correct movements. So there, I've got my routable network, and I'm ready to go on the data pipeline. Next I need travel times. To do that I need to make an all-to-all matrix. It's again almost a one-liner, so close and yet so far. There's a function that takes a little bit of SQL as text and then a list of vertex IDs; the documentation shows this, and in my case I have this: I'm selecting my ID, the source, the target, the length in meters as the forward cost, and the reverse cost, which is just set to negative one so it'll never route backward along a link.
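The line-graph transformation itself is small enough to sketch in a few lines of Python. This is a minimal illustration of the idea behind pgr_lineGraph, not its actual implementation: each directed edge (a curb) becomes a node, and a link exists wherever one curb ends where another begins, i.e. a legal movement. Note this naive version also emits U-turn-style links if the graph contains a reverse edge; pgRouting exposes options for handling those:

```python
from collections import defaultdict

def line_graph(edges):
    """Build the line graph of a directed graph.

    `edges` maps edge_id -> (source_node, target_node). Returns the
    list of (edge_id, edge_id) movements: e1 -> e2 whenever e1's
    target is e2's source.
    """
    by_source = defaultdict(list)
    for eid, (src, _tgt) in edges.items():
        by_source[src].append(eid)
    links = []
    for e1, (_src, tgt) in edges.items():
        for e2 in by_source[tgt]:
            links.append((e1, e2))  # finish sweeping e1, start e2
    return links
```

For three curbs A->B, B->C, and B->A, the legal movements come out as (1,2), (1,3), and the turnaround (3,1).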
And for my source vertices I just picked all of them, but I quickly ran out of RAM, because 9,000 curbs makes an 84-million-cell matrix, and pgr_dijkstra (Dijkstra's algorithm) has to keep everything in memory as it goes. So it just runs out of memory. So I did a little bit of ugly hacking. I made a function, a standard function body with an initialization step, and added the id_block so that instead of picking all the vertices I picked just 3,000 of them. Then I have the same command as before: SELECT * FROM pgr_dijkstraCostMatrix of this sub-select, where quote_literal is just the query text from before, and I'm picking the ID and source from my curb line graph. For each iteration I pick the IDs out of that id_block of 3,000 limited vertices, then insert the results into the line graph matrix, and on any conflict, do nothing. That function repeats eight times: it starts at zero and increases each time by 1,000, and this call here, EXECUTE insert_sql USING start_id, means the start ID is used inside the id_block, so the source has to be greater than the start. If you think about it, I'm moving in steps of 1,000 each time while taking a window of 3,000, so there's some overlap. But you could also see there's a big bug here: the initial 3,000 will never overlap with the final 3,000. So instead of doing that, I thought about it a while, flailed about, and said: oh, I can just sample the under-represented vertices. I have a growing matrix of observations; I'll pick the vertices that are not well represented in that matrix and make sure they're in every call, and gradually I'll build up the full matrix. To do that, I used an ordered-set aggregate function called percentile_cont, the continuous percentile. Here's how that works: I have a count for each unique starting ID, and I sort by that count.
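The chunked-sources pattern described above can be sketched end to end in plain Python. This is a toy stand-in for repeated pgr_dijkstraCostMatrix calls, with a dict playing the role of the matrix table and `setdefault` standing in for INSERT ... ON CONFLICT DO NOTHING (the function names and chunk size here are illustrative, not from the talk's SQL):

```python
import heapq

def dijkstra(adj, src):
    """Single-source shortest-path costs. `adj` maps node -> [(nbr, cost)]."""
    dist = {src: 0.0}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

def cost_matrix_chunked(adj, vertices, chunk=3):
    """Build an all-to-all cost matrix a chunk of sources at a time,
    so only one chunk's worth of Dijkstra state is live at once.
    Existing entries are kept (ON CONFLICT DO NOTHING)."""
    matrix = {}
    for i in range(0, len(vertices), chunk):
        for src in vertices[i:i + chunk]:
            for tgt, cost in dijkstra(adj, src).items():
                matrix.setdefault((src, tgt), cost)
    return matrix
```

On a three-node cycle with costs 1, 2, 1, the chunked build recovers the same costs a single all-at-once run would.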
So in the low block I'm picking the 500 lowest, and then I have this percentile_cont call that tells me where the lowest seventh percentile falls based on count. The high block then takes everything from that sorted count matrix that's greater than the seventh percentile, randomizes it, and limits it to 2,500. So I'm still picking 3,000 nodes, but I'm making sure 500 come from the low block and 2,500 from the high block. Union those and run the call. After a while, that works. I ran it a bunch of times, and eventually every starting ID had, or should have had, 9,193 observations; most were within one or two of that. So then, finally, we can solve the street sweeping problem: pump all this data into OR-Tools. If you were here hoping to find out how to do that part, I can't talk about it, because it's really outside the scope. Some quick benchmarks: right now it takes about 20 minutes to generate the initial solution, so it churns away at the data and 20 minutes later you get your first cut, and then it can run for hours, improving that over time. This is okay, because you're not going to redo street sweeping every morning; you want to run it maybe once per season or per quarter. But it's also difficult to visualize the output, because OR-Tools just dumps a list for each truck that tells it where to go. So that's the last bit here. I deliberately save the geometry along with the results for each link: if the output says here's the list of links that street sweeper A has to cover, then I also include the geometry. I do that because then PostGIS, sorry, QGIS can display it directly. So here's a QGIS view of Glendale, and here's a view of all the routes my solver generated. That's nice, but it doesn't tell you much about how these routes are interleaved. If you zoom in on just one route, you can see it doing what it's doing, but it doesn't really give you a sense of the movement.
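The percentile-based resampling described above (low block plus randomized high block) can be sketched like this. The helper name and the small parameter values are illustrative; also note that percentile_cont interpolates between values, while this sketch uses a simple index-based cutoff, which is close enough to show the idea:

```python
import random

def pick_sources(counts, low_n, high_n, pct=0.07, seed=0):
    """Pick Dijkstra source vertices for the next matrix-building call.

    `counts` maps vertex -> number of observations so far. Always take
    the `low_n` least-observed vertices (the low block), then fill with
    a random sample of vertices above the `pct` quantile of counts
    (the high block), mimicking the percentile_cont query.
    """
    rng = random.Random(seed)
    ordered = sorted(counts, key=counts.get)       # least-observed first
    low = ordered[:low_n]
    cutoff = sorted(counts.values())[int(pct * (len(counts) - 1))]
    high_pool = [v for v in counts if counts[v] > cutoff and v not in low]
    high = rng.sample(high_pool, min(high_n, len(high_pool)))
    return low + high
```

Iterating this until every vertex reaches the full observation count (9,193 in the talk) gradually fills in the matrix without ever exceeding the per-call memory budget.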
Where's the vehicle coming from and going to, and so on. So then I moved on to animating. There are lots of blog posts about this; if you search for "geogiffery" you'll find some good ones. I like this post here. His approach was to use QGIS's Atlas functionality to make an image stack, sort of like a flip book. Atlas lets you control a print view, with each frame driven by the next street being swept, and then I just used ffmpeg to stitch the images together. And it actually looks terrible. If you have any epilepsy tendencies or don't like flashing lights, look away for a couple of seconds, and if you have a weak stomach, also look away. Yeah, my daughters were horrified and said that's not an animation, that's just horrible. The problem is that each frame moves one block, and the blocks aren't the same size. So I had to turn to PostGIS again to snip the roads up into equal sizes. There's a function called ST_LineSubstring that takes a geometry, a starting fraction, and an ending fraction, and snips the geometry down to the piece you want. The documentation for ST_LineSubstring also says you can use it to break a line into n parts, from zero to n minus one, and this is the code they show in the documentation. Here's the snipping bit, and down here they have generate_series from zero to 10,000, where 10,000 is just an arbitrary big number to make sure you cover the whole line. I didn't like the arbitrary big number. I figured: I know how long my roads are, and I know how long I want each piece to be, so I can figure out how long my series needs to be. So I reorganized the statement with WITH statements. Again I used that one weird trick: I transformed to SRID 32611 to find the length of the longest segment, got the max length from that, and then the max iteration value is just that maximum length divided by 25.
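The same equal-length snipping can be sketched in planar Python (coordinates assumed to be in meters, as after the SRID 32611 transform). The first helper is a toy stand-in for ST_LineSubstring's fraction-based extraction, and the second sizes the series from the actual length divided by 25, instead of an arbitrary big number:

```python
import math

def substring(points, f0, f1):
    """Endpoints of the piece of a polyline between fractions f0 and f1
    of its total length (a simplified ST_LineSubstring)."""
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]

    def interp(target):
        # walk the cumulative lengths and interpolate within the segment
        for i in range(1, len(cum)):
            if cum[i] >= target:
                (x0, y0), (x1, y1) = points[i - 1], points[i]
                t = (target - cum[i - 1]) / (cum[i] - cum[i - 1])
                return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
        return points[-1]

    return [interp(f0 * total), interp(f1 * total)]

def split_into_frames(points, step=25.0):
    """Snip a polyline into equal pieces of at most `step` meters:
    the number of pieces comes from the line's own length, like
    max-length-over-25 in the WITH statement."""
    total = sum(math.hypot(x1 - x0, y1 - y0)
                for (x0, y0), (x1, y1) in zip(points, points[1:]))
    n = max(1, math.ceil(total / step))
    return [substring(points, i / n, (i + 1) / n) for i in range(n)]
```

A 100-meter straight street at a 25-meter step yields exactly four equal frames, which is what keeps the animation's apparent speed constant.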
So then, using that max iteration, I can generate the necessary length of my series and use the series value inside my snipping statement. Here I'm also creating a frame number so that I know the order of the frames. If I take one road with some ID, I can just say: the IDs increment sequentially throughout the routes, so I add a small decimal value to that ID and I get my frame number, and I can put the pieces in order. Then the substring statement just follows along with what was in the documentation code: take the geometry, gradually move a window from the beginning to the end so I can piece together the street, and save it into a table. And here's the result. Hopefully you can all see this and it's not stressing the bandwidth too much, but it's much smoother. The vehicle is moving along, whereas before it was pretty jerky. Every time the vehicle changes direction, though, the screen still jerks around. As a bonus I get arrowheads: the previous animation was just the street, and now I've got an arrowhead moving along the street, because it's moving every 25 meters. The room for improvement is that it still sort of jumps around. So the last thing I did, and this is the last thing in my talk, was make a point-of-view table. The Atlas functionality can use a hidden layer that centers the picture, which you don't need to show. What I wanted to do was spatially average where the vehicle is coming from and going to, instead of following the vehicle exactly. For that I used ST_Collect, which is a spatial aggregate function, with a window function: OVER, ordered by the UID (the unique ID for each geometry), with the two preceding rows and ten following rows. If you think about that, the immediate past history will always be in the window, and I skew it toward where the vehicle is going to go, so the camera is sort of panning ahead of the vehicle just a little bit.
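This sliding-window smoothing can be sketched in Python. The centroid step that comes next in the talk is included here as a simple coordinate average over the collected window, which matches what a centroid of collected points amounts to in planar coordinates (the function name is illustrative):

```python
def camera_path(points, preceding=2, following=10):
    """Smoothed point of view for each animation frame.

    For frame i, average the frame geometries in a window of
    `preceding` rows back and `following` rows ahead, like
    ST_Collect over ROWS BETWEEN 2 PRECEDING AND 10 FOLLOWING
    ordered by frame id, followed by a centroid. Because the
    window reaches further ahead than behind, the camera pans
    slightly ahead of the vehicle.
    """
    out = []
    for i in range(len(points)):
        window = points[max(0, i - preceding): i + following + 1]
        xs = [p[0] for p in window]
        ys = [p[1] for p in window]
        out.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return out
```

On a straight run of 25-meter frames, the very first camera position already sits well ahead of the vehicle, which is the panning-ahead effect described above; widening the window (e.g. 40 before, 15 after) smooths more at the cost of the vehicle drifting toward the frame edge.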
And then, given that collection, I make the centroids and save them to a table, so every single frame of the animation has a unique point of view that will be the center for that frame. The results can be shown on a map; for the animation you'd hide this layer, but I want to show you what it looks like. This is a window with the two preceding and ten following (upcoming) nodes, and these are the centroid points. It smooths off the curves, as you can see; if you make a bigger window you get more smoothing, and an even bigger window gets even more smoothing. The problem with too big a window is that you could conceivably do 10,000 before and 10,000 after, but that would put the entire route in one picture, and if you're zoomed in on an area, the vehicle will drive off the edge of the frame. So there's a link between the amount of aggregation you do and the size of your viewing window. I'm actually developing a website to show this using JavaScript, where I set up a relationship between the size of the window on my screen and how much spatial aggregation to do for the point of view. So here's the smoother animation with the 2-preceding, 10-following window. It's a bit better, but the vehicle is still only sort of in the middle of the frame. And then this larger window, 40 nodes before and 15 nodes after (remember, each segment is 25 meters, so 40 times 25 meters is about the maximum space covered in each frame), is smoother still. This works well for this size of animation window, and you can see the vehicle moving around. And that's about it; that's the end of my talk. Are there any questions? So far there's only been one question, and that's: would you be able to share the scripts you used for cleaning the OSM data? Yeah, most of it is in this presentation, which will be made available as a video, and I'm also going to put up the raw slides on my website.
I think in the bio or synopsis for this talk there's a link to the draft slides; I'm going to put the actual slides in the same spot so they'll be findable. For the most part, it's all in there. If it's not, I can also make a GitHub repository. I can't just dump the whole project to GitHub, but without too much effort I can extract the scripts. So, long-winded, but yes. Thank you for that. Another question has come in: what problems did you have when analyzing with pgRouting? So, there are a couple of functions in pgRouting for solving shortest paths, and I used those a lot in this project; when you're making that matrix of distances, they're crucial. The other way to do it is to individually pick some origin and some destination and say, hey, find the best route from A to B. All those functions work well. What didn't work for me was when I dipped into the experimental stuff. Another way to think about street sweeping is as a Chinese Postman problem, and there's a Chinese Postman solver in there, but it didn't work on problems this large; it crashed. And on smaller problems I had some instabilities, where running it three times in a row, the third time would crash. Actually, I just got a notification on the GitHub issue I raised about that, saying it's been solved in the latest version. So the basic, fundamental aspects of pgRouting work really well; the solver aspects I don't think work as well, especially not given how many constraints I wanted to throw into this. I was reading up on the plans and the issues on GitHub for pgRouting, and there's some work to try to include a different routing library in it. In my opinion, that's the wrong way to go.
I think it's great for the basic stuff, like shortest path from A to B, or the five best routes from A to B; those basic algorithms work really well. But as soon as you start trying to solve larger problems, I think it's not the right tool for the job. Does that answer the question? I'd say so; the question asker is saying yes. Another one: did you run into geometry issues with the OSM data, as in connected things not actually being connected in the real world, and vice versa? Yes, actually, there was one spot where that happened. I don't remember exactly where; it was near Burbank, which is right across from Glendale, so I was kind of familiar with various things, and I just arbitrarily zoomed in to check things out. I came across a street that I knew had recently been reconfigured, and you could see that the OpenStreetMap data still had a through movement where there is now a forced right-hand turn; they blocked off the street. I think it's next to Toll Middle School, or a junior high or high school, I don't remember. Anyway, they've made it so the flow of traffic around the school only goes in one direction; you can't go straight through on the other side, so they've made a one-way street where there used to be a two-way street, and that wasn't in OpenStreetMap. So I just changed it; you correct it yourself. There aren't any missing links in this city, at least; it has been well edited over the years. A lot of the problems with OpenStreetMap are links that don't quite touch their neighbors. That seems to be fairly good here; in other cities, you'd probably want to take a look at that. And the great thing about OpenStreetMap is that if you find errors, it's fairly straightforward to go in and fix them using their tools. Great. For the sake of time I'm going to combine two questions into one, and then we have one more.
So: did you consider Valhalla routing, which Tesla uses, or MobilityDB? I don't know MobilityDB. I did look at Valhalla. I also looked at, I can't remember the name, it's from the same world as Valhalla, the Mapbox folks: OSM routing, something like that, my brain is just gapping out. Anyway, there's a project with a name like that which I've used a lot. With that one you load up the OpenStreetMap data and set up a service: you send it origin-destination information and it gets you back the shortest path. Valhalla does a similar thing. In this case, though, I needed the actual network so I could convert it into a line graph, and there's nothing in Valhalla or that OSM routing project, whatever it was called, that will give you a line graph. It was a funny thing; I was actually in contact with some of the developers about other questions, and it was sort of like: how do you get at the embedded C code that's the representation of that graph, in order to turn it into a line graph? And you can't. For my purposes, I really needed the line graph, so pgRouting worked great, because it had that line graph function that did all the work for me; I did almost nothing, and all I needed to do was make my routable network. So those are good projects, but they weren't good for this. The other advantage of pgRouting is that I could build that all-to-all network matrix once, save it in the database, do all that work once, and then just query it when I needed it. Whereas with the OSM routing project I've used, I can't cache it on the database side; I have to cache it on my program side. I hit the service a bunch of times to build something and then save it as a file, which is kind of hacky and gross.
I'd much rather have a table in the database that I can just query the next time I run my solver. I think the consensus in the chat is that the name of the project you're trying to remember is OSRM. Yeah, that's it: OSRM, the Open Source Routing Machine. And our final question here, which I think is a particularly good one because I had the same question: if you had to take into account traffic for driving cars, not a street sweeper, would you use a different set of data or a different API to get the route options and travel times, and then apply a similar approach? Well, no. A couple of things. First, the OR-Tools solver doesn't let you do dynamic travel times. There's been a lot of discussion back and forth over the years; if you're in the C++ API of OR-Tools, you can set up what's called a state-dependent travel time, so you can say, hey, it's now 10 o'clock, what is my travel time on this link? But according to the lead developer, that's a bad idea, because it's really, really slow. It slows everything down: as far as the solver is concerned, it can no longer optimize from first principles and now has to say, okay, I can't necessarily rule that link out, because if it's reached in a different state, it's a different choice. So OR-Tools, to begin with, doesn't support that aspect, and you can't put in dynamic traffic. On the other hand, and here the answer is sort of yes, the way to do it is to scale the travel times. Remember, the street sweepers have a certain period of time in which they're allowed to be sweeping a certain zone; if it's Monday from 10 to 1, the traffic on Monday from 10 to 1 is fairly consistent. If it's from 8 a.m. to 10 a.m.,
that's more rush-hour traffic. So you can look at the streets and decide, but at the same time, the cities know that too, so they don't schedule their busiest streets to be swept in the early morning; they usually put them at different times of day. Another feature of these NP-hard problems is that you definitely want to break them up into smaller chunks, so you can break the problem up by zone, know the time of day for each zone, and then tweak the travel times on the streets to match that time of day. The other issue is that street sweepers, when they're sweeping, move at about six miles an hour, and they're on the side of the road, out of the way. But when they're actually moving, deadheading, they will slow down traffic too. Another aspect of the call we were originally responding to is that the city wanted to make sure street sweepers weren't slowing down traffic, and it's similar for a garbage truck problem; garbage trucks definitely block traffic, so they want to minimize the impact of these vehicles on traffic by being able to change their routes as they see fit. They said, hey, this isn't working, we want to be able to change it, and that's why they want this sort of a tool. Wonderful. Well, those are all of our questions, and I think that's all of our time. Thank you, thank you, thank you, James. This was wonderful, a really fascinating way you solved a problem. Thank you to everyone who joined and participated. James is actually getting a ton of praise in the chat right now. Oh, good. I'll take a look after, to boost my ego. If anybody has any more questions, too: I don't know if my email is on that list, but I'm certainly open to answering questions offline if you want to shoot me an email, and I'll put it in the chat as soon as I get off this. Wonderful. So, regardless of where you are, whether it's morning, afternoon, or evening,
I hope you have a wonderful rest of your day, and I will see you back here next week. Cheers. Thank you. Thank you.