And a lovely welcome back to the Huck stage on the third day of this Congress. We are here with a talk, "A few quantitative thoughts on parking in Marburg", by Martin L. He's interested in data analytics and in infrastructure and traffic in general, and because of that he started scraping publicly available parking data in Marburg, analyzed it, and found a lot of interesting things, which he is going to present to you in this talk right now. In case you didn't know, there is an IRC client on live.huck.media where you can ask questions later, or use the hashtag #RC3Huck on Twitter.

Welcome to my talk, "A few quantitative thoughts on parking in Marburg". I'm delighted to speak at this Congress because I love the yearly conferences. Also, thank you to the organizing team for making all of this possible; you do an absolutely fabulous job. Now, the first question that you should ask is: why? This is a purely hobby project. I came up with the question because transportation is important, but unfortunately it's also difficult. The most popular vehicles these days are cars, and hence the question: how do people park in Marburg? Who am I? My name is Martin and I analyze publicly available data. I live close to Marburg, hence the parking in Marburg. Now a little bit of background regarding Marburg: it's a small, picturesque, vibrant university town. There are a few highlights such as the castle, the old town and the river, just to name a few. It has around 80,000 residents and a somewhat dense core around the old town. You can see a few pictures here of the castle, the old town and the river, respectively. At this point I would like to give my props to David Kriesel, because all this work was inspired by his amazing data science talks. You can find them on YouTube, and I absolutely encourage you to look for the BahnMining, SpiegelMining and Xerox scandal talks. Okay, so if you have questions, then please ask.
I will be there live during the Q&A of this conference, and you can also send me an email with whatever you like, essentially. Okay, so first of all I would like to give a quick introduction to the data source. The parking data from Marburg is published live on a system that is implemented by the city council, I believe. It's called Parkleitsystem Marburg, or PLS for short, and it publishes data such as the parking deck names, the number of free parking spots and the locations. The address is pls.marburg.de, so let's see how it looks. Obviously it's still online, and you can see the parking deck names listed along with the number of free parking spots. Color-coded is whether a deck is rather full or rather empty; you can see all of them are in the green color coding here. That's probably because it's close to Christmas, so nobody really wants to park in the city. The only one with some load on it is this one here, the Marktdreieck parking deck. Then there's also a button called "route". Whenever you click on this button, say for the Erlenring Center, we are redirected to Google Maps and can see the location of this parking deck, for example. Okay, let's go back. Last but not least, there's also the maximum allowed vehicle size and, of course, the timestamp of the data. Okay, back to the presentation. Now, this is a very simple website, so of course it's easy to scrape, and that's what I did. Regarding the scraper, I used a Linux computer and a Docker container; you can see a small sketch of the scraper here to the left. It simply visits the website every three minutes inside the Docker container and writes the data into, I believe it was, CSV files, which are subsequently used for the data analysis. All of it, the scraper and the analysis scripts, is written in Python. Okay, the data format is pretty simple. It's processed internally with data frames from the pandas package.
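The scraper setup described above could be sketched roughly like this. The URL is the one from the talk, but the HTML structure, the CSS class names and the CSV layout are assumptions for illustration, since the actual scraper code isn't shown:

```python
import csv
import re
import time
import urllib.request

PLS_URL = "https://pls.marburg.de"  # live Parkleitsystem page from the talk

# Hypothetical markup: the real page differs, so this regex is illustrative
# only -- it pulls (deck name, free spots) pairs out of the HTML.
ROW_RE = re.compile(
    r'<td class="name">(?P<name>[^<]+)</td>\s*<td class="free">(?P<free>\d+)</td>'
)

def parse_free_spots(html):
    """Extract {deck name: free spots} from one snapshot of the page."""
    return {m.group("name"): int(m.group("free")) for m in ROW_RE.finditer(html)}

def scrape_once(out_path="parking.csv"):
    """Fetch the page once and append one timestamped row to a CSV file."""
    html = urllib.request.urlopen(PLS_URL).read().decode("utf-8", "replace")
    spots = parse_free_spots(html)
    with open(out_path, "a", newline="") as fh:
        csv.writer(fh).writerow(
            [time.strftime("%Y-%m-%d %H:%M:%S")] + [spots[k] for k in sorted(spots)]
        )

def run_forever(interval_s=180):
    """Poll every three minutes, as described in the talk (never returns)."""
    while True:
        scrape_once()
        time.sleep(interval_s)
```

Appending one row per visit keeps the scraper stateless, so a crash (which happens later in the talk) loses at most one sample.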
Everybody who knows Python probably knows pandas. Anyway, the data format is as follows: each row corresponds to a time, each column corresponds to a specific parking deck, and each cell corresponds to the number of free parking spots of that parking deck at that time. In order to make the numbers a bit more usable, I transformed the number of free parking spots into the number of used parking spots by subtracting it from the maximum over time. Okay, now that the intro has gotten us used to the data, we'd like to take a look at the locations of the parking garages, or parking decks. This is a screenshot; there is an interactive version, let me open it here. It's an interactive map with two types of markers, red and green. The red ones are the locations encoded in the links of the PLS system, and they are actually wrong: when you click on, for instance, the Erlenring Center parking deck, as I've done before, the longitude and latitude are actually incorrect, and Google Maps corrects them on the fly. Therefore I've shown the incorrect ones given on the website in red and the corrected ones in green, so you can safely focus only on the green ones. A quick overview: here's the train station region, where there are two, and then they are scattered around the city. Sometimes there are two parking decks very close by, for instance these two and these two, and that's because it's essentially one parking deck with two parking sections, typically inside the building and on top of the building. Okay, let's go back to the presentation. With that in place, we take a look at the joint data, meaning I accumulate the number of used parking spots across all the parking decks. You can see that here now; it's quite a comprehensive picture. I started the data scraping in August 2019 and stopped it at the end of February 2020.
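The free-to-used transformation described above is a one-liner in pandas. The deck names and numbers below are invented, and approximating each deck's capacity by its maximum observed free count is the assumption the talk describes:

```python
import pandas as pd

# Toy snapshot of the scraped table: rows are timestamps, columns are decks,
# cells are FREE spots (the deck names and numbers here are invented).
free = pd.DataFrame(
    {"Erlenring Center": [380, 250, 300], "Oberstadt": [90, 40, 70]},
    index=pd.to_datetime(
        ["2019-08-01 08:00", "2019-08-01 12:00", "2019-08-01 18:00"]
    ),
)

# Approximate each deck's capacity by the maximum free count observed over
# time, then turn "free spots" into "used spots" by subtracting from it.
capacity = free.max(axis=0)
used = capacity - free
```

Subtracting a per-column Series from the DataFrame broadcasts the capacity across all timestamps automatically.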
This plot shows the original raw data resampled at different frequencies. Just a reminder: the true sampling frequency is three minutes. I first resampled to one hour, which is not very easy to read at this scale; then to one day, the orange curve now; and lastly to one week. We can learn different things from it. In particular, the orange one-day curve shows that there might be some periodicity in the signal, and the green one shows that there are times, or weeks, with particularly little parking demand, for instance here around Christmas 2019. Okay. So again, from the orange signal you can see that there's probably some periodicity, and in order to quantify that, I computed the autocorrelation function. The autocorrelation function essentially takes a time signal and computes the overlap between the signal and the same signal shifted by some time. Whenever there's a large overlap, that points towards periodicity, and here we can see that the first autocorrelation maximum corresponds to one week. Therefore the periodicity can safely be assumed to be seven days. Of course, when a signal has a periodicity of seven days, it also has periodicities at 14 days and 21 days, but the correlation coefficients typically decay. Okay. Now that we have the periodicity with respect to days in place, let's take a look at the day-and-hour demand. For that, I computed a two-dimensional histogram with the day, Monday to Sunday, on one axis and the hour on the other axis. Here we can clearly see that the majority of the parking demand is around the noon hours, starting from 11 to approximately, let's say, 5 PM or so. Interestingly, and this was a point where I was surprised, Sunday is a day with little parking demand in Marburg.
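The resampling and autocorrelation steps above can be sketched in a few lines of pandas. The series below is a synthetic stand-in with a clean daily rhythm, not the real scraped data:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the scraped signal: four weeks of 3-minute samples
# following a smooth daily cycle, so the periodicity is easy to see.
idx = pd.date_range("2019-08-05", periods=4 * 7 * 24 * 20, freq="3min")
hour_of_day = idx.hour + idx.minute / 60
demand = pd.Series(100 + 50 * np.sin(2 * np.pi * (hour_of_day - 6) / 24), index=idx)

# Resample the raw 3-minute signal to coarser views, as in the talk.
hourly = demand.resample("1h").mean()
daily = demand.resample("1D").mean()
weekly = demand.resample("1W").mean()

# Autocorrelation: correlation of the hourly signal with itself shifted by a
# lag; a maximum at a lag of 7 days (168 hours) indicates weekly periodicity.
acf_day = hourly.autocorr(lag=24)
acf_week = hourly.autocorr(lag=168)
```

For the purely daily synthetic signal both lags correlate strongly; on the real data the weekly lag stands out because weekdays and weekends differ.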
I would have guesstimated that on Sunday, when everybody has spare time, they typically rush into the city, but that's obviously not the case. Another interesting fact is that on Monday mornings it seems to be very difficult to get up, because you can see the parking demand is smaller than on other mornings. Okay. After that, I come to the separate analysis, where I take a look at the individual parking decks. First of all, again, the time series; it's a bit dense and very hard to read, but there are a few things to learn from the picture. First, the green signal corresponds to the Erlenring Center, which, as a reminder, I opened at the very beginning of this talk; it seems to be the dominant one. Then there are quite a few data gaps. Take, for instance, the violet one, the Furtstraße parking deck, this one here, where it's very apparent; but that's an extreme case, it obviously had some kind of problem and was open for some time and then closed at other times. Typically, parking garages or parking decks are either open 24/7 or they close overnight, and there are quite a few of each. Okay. Next, I was interested in the statistics of parking demand for individual parking decks. So I concentrated on, say, one parking deck and computed histograms of the used parking spots, also depending on the time of day. Let's focus here on the Oberstadt, the old town: you can see that the overall parking demand peaks at around, let's say, maybe 20 used parking spots. That's the average over all times. When we make that statement dependent on the time, for instance the morning, we can see it's approximately the same, but when we go towards noon, the number of used parking spots increases; there are even a few times when the deck is at its maximum around noon. When we go towards later hours, the maximum shifts towards smaller values again.
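Both the day-versus-hour table and the time-dependent per-deck histograms boil down to grouping by calendar fields. A sketch with synthetic data, where a noon peak is injected artificially so the pattern shows up:

```python
import numpy as np
import pandas as pd

# Two synthetic weeks of used-spot counts at hourly resolution.
idx = pd.date_range("2019-08-05", periods=14 * 24, freq="1h")
rng = np.random.default_rng(0)
used = pd.Series(rng.integers(0, 30, len(idx)).astype(float), index=idx)
used[idx.hour.isin(range(11, 17))] += 100  # inject a noon peak for the demo

# Two-dimensional demand profile: weekday (0 = Monday) against hour of day,
# each cell holding the mean number of used spots in that slot.
profile = used.groupby([used.index.dayofweek, used.index.hour]).mean().unstack()
profile.index.name, profile.columns.name = "weekday", "hour"
```

The resulting 7x24 table is exactly what a heatmap like the one in the talk visualizes.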
Now, this behavior of the maximum shifting so clearly depending on the hour is not apparent for all of the parking decks. For instance, the Marktdreieck here doesn't show the signal as clearly as the Oberstadt one. From all this we can also quantify the integral parking demand: simply, the total number of parking spots that each parking deck has provided. The picture here is normalized to the maximum. One can see very easily that the Erlenring Center, as we estimated or guessed previously, is the one that dominates the whole city: it provides the most parking spots, by a large margin actually. The next one is the Lahn Center, then maybe the Oberstadt, and the other ones follow after these. Another interesting point is the proportion of parking spots provided on weekends for the different parking decks. For instance, you can see this one here, the Erlenring Center, has quite a big weekend portion. In contrast, the Marktdreieck parking deck has only a very small portion of parking spots provided on weekends. It might be interesting to know that this particular parking station is the one you use if you want to go to a doctor; it's very close to many doctors' offices, and many doctors are not open on Saturdays and Sundays, so the weekend parking demand there is probably quite low. There's also a temporal version, a small video that I rendered and am opening now, where you can see essentially the same as in the previous graph, but against time. Again, the periodicity is very apparent; here my scraper crashed, and here it's back in business again. I found it interesting to see that there are parking decks that host cars even at night. For instance, the Erlenring Center again and the Lahn Center, the largest ones, offer parking also overnight, and there are probably some cars in there. Okay, let's close that again. Lastly, I come to the prediction part now.
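The weekend proportion per deck can be computed as sketched below. The capacities and the near-empty weekend behavior of the Marktdreieck deck are invented to mirror the observation from the talk:

```python
import pandas as pd

# Toy daily per-deck used-spot totals over two weeks (numbers invented).
idx = pd.date_range("2019-08-05", periods=14, freq="1D")  # starts on a Monday
used = pd.DataFrame({"Erlenring Center": 400.0, "Marktdreieck": 100.0}, index=idx)

# Mimic the observation from the talk: the Marktdreieck deck, next to many
# doctors' offices, is almost empty on weekends.
weekend = used.index.dayofweek >= 5
used.loc[weekend, "Marktdreieck"] = 5.0

# Integral demand per deck, and the share of it that falls on weekends.
total = used.sum()
weekend_share = used[weekend].sum() / total
```

Dividing the weekend sum by the total gives each deck's weekend fraction directly, which is what the stacked bars in the talk compare.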
The goal here is to measure the parking demand through the parking decks, but then to interpolate between them. Say I have the Oberstadt, the old town, and the Erlenring Center, which was the largest one, and I would like to know the parking demand in between, for instance. For doing so, I use a spatial fit, and I use a machine learning model to do that spatial fit. In this particular case it is a nonparametric model called Gaussian process regression, and the nice thing about it is that it also returns the uncertainty. Say, for instance, you would like to use these machine learning predictions to build some kind of parking deck or to get rid of one; all these derived actions would be very expensive, so you would like to know whether the uncertainty is large or small for whatever the machine learning model predicts. Just for the math-oriented people: if you're interested in that model, definitely take a look at what I would call the Gaussian process Bible by Rasmussen and Williams; it's amazing to read. There are two evaluations that I did. The first one is based on the whole data set, so there's no temporal resolution. I rendered a video, and I would like to explain the outcome to you while it is running. The top picture shows you the prediction by the machine learning model, and the bottom picture shows you the uncertainty. The training data, meaning the parking decks, is denoted by the black points. First of all, the uncertainty: you can see that wherever there is training data, the uncertainty goes down, so the model is certain about its prediction because, well, there's training data, and in between, the uncertainty rises again. In the prediction you can see a small hill; it's exactly the Erlenring Center, which was the largest one. What is shown in the video is that it's rotating.
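A minimal sketch of such a spatial Gaussian process fit, here with scikit-learn; the coordinates, demand values and kernel hyperparameters are all made up, and the talk does not say which GP implementation was actually used:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

# Toy training data: (longitude, latitude) of three "decks" and their mean
# demand -- the coordinates and values are invented, not the real Marburg data.
X_train = np.array([[8.770, 50.810], [8.775, 50.807], [8.766, 50.802]])
y_train = np.array([400.0, 150.0, 80.0])

# Nonparametric spatial fit; the RBF length scale (in degrees) controls how
# quickly predicted demand decays away from a deck. Kept fixed for the demo.
kernel = ConstantKernel(100.0) * RBF(length_scale=0.005)
gp = GaussianProcessRegressor(kernel=kernel, optimizer=None, normalize_y=True)
gp.fit(X_train, y_train)

# Predict demand AND its uncertainty at an in-between position.
X_test = np.array([[8.772, 50.808]])
mean, std = gp.predict(X_test, return_std=True)

# Near a training point the model is certain; far away the uncertainty grows,
# matching the uncertainty surface described in the talk.
_, std_at_deck = gp.predict(X_train[:1], return_std=True)
_, std_far_away = gp.predict(np.array([[8.900, 50.900]]), return_std=True)
```

The `return_std=True` flag is what makes the model useful for expensive downstream decisions: it tells you how much to trust each interpolated value.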
You can see the coordinates of Marburg on the bottom plane, and at some point the view rotates upwards and gives you a top-down perspective with the corresponding color map. So again, here's the maximum, the Erlenring Center. I did that because next we would like to finally measure the parking demand between stations. There's another small video, and now we start right from the top-down, color-coded view. Again, the black points are the training data, but now the red points are a kind of test data, meaning positions in between. I concentrated on the Mensa, because I have a special relation with the Mensa, the physics department, the university library, the train station and the cinema, just to demonstrate that from this spatial fit we can derive the parking demand at these positions as well. This yellow peak here is the Erlenring Center again. Now, that's only a qualitative result, of course; I don't want to derive anything quantitative at this point. It's just a proof of concept that it is possible to derive something like that from the publicly available data. Now, I forgot to mention at the beginning that there's a bonus, and I would like to come to the bonus now. It is about the corona crisis, or pandemic, of course.
What I did: the initial data acquisition phase is here in black; the whole talk so far was about that black portion. I stopped it around the end of February, and then restarted the whole data acquisition process in approximately April, just to capture something from the corona crisis as well. You can see the time series here again. I think the most interesting, most comprehensive bit about it is the mean: you can see the mean across the whole time, denoted by this dashed line, and you can see that the mean is smaller during the corona pandemic, so fewer people parked in Marburg, which is reasonable, I would say. But there are also times where the number of used parking spots decreased significantly, for instance right when the corona crisis started in April, and now in the second wave in October, November, December, it is visible that the parking demand decreased a lot. I went one step further and wanted to know the differences between pre-corona and during corona for each of the parking decks as well. That's what I did here: it's now not the normalized parking demand but the absolute parking demand, so you can also see the absolute numbers. The black bars you've seen previously already; the red bars are during the corona crisis. Then I defined the first wave and the second wave as serious corona times, so I also plotted a third set of bars. It's interesting to see that while most of the parking decks of course suffered, meaning most of them provided fewer parking spots, there are a few, like for instance the Marktdreieck parking deck here, where demand almost increased: during corona in general it increased a bit, and during the heavy corona times it increased even more. As I mentioned before, this is the parking deck that corresponds to a whole collection of doctors' offices, so I derived that during corona times the parking demand in front of doctors even increased a tiny bit. With that, I would like to come to my
conclusions. Thank you for sticking with me until now. I scraped publicly available data with a small scraper setup, analyzed it, for instance for day and hour patterns, and last but not least did some machine learning in order to quantify the demand in between the stations. There is an accompanying blog article; you can find it down here. It contains all the figures in higher resolution, and you can also play around with the interactive map if you like. To finally conclude the presentation, I would like to hear from you what you think about this analysis. I'd like to improve with these kinds of mini studies, and therefore I would be very interested in your critique regarding the content, the presentation, and general comments. Again, you can email me at this email address, or alternatively I set up a Google Form, a document that comprises exactly these questions, and you can simply type your answers in if you're interested. Thank you very much.

Alright, first of all, thank you for this amazing talk. I have a few questions that have been relayed to me, and I'm just going to ask them one after the other, so let's not waste any time and start with the first one: have you found parking decks that are usually heavily overloaded, or never completely used?
Given that there are only around, what was it, 8 or 9 or 10 in the data set, honestly, I never looked into that question. So the short answer is no; the long answer is yes, I could have, or I still could, I would say.

Okay. Have you tried prediction in time, so guessing which parking decks will be exhausted soon?

No. I would consider that something like the predictive-maintenance-of-traffic business, kind of. It's definitely a thing that people who have more time and are willing to invest more definitely should and could do, I would say. I mean, there's lots of additional data that might be of interest, like weather data, or for instance whether it is a public holiday, yes or no, and all that kind of stuff. So again, short answer no, long answer yes, it would be possible.

Okay, so if anyone watching has the time or energy to do it, they could?

Absolutely, yes.

Okay, and the last question I have right now: will the code, or especially the scraping part, be available publicly, like on GitHub or somewhere?

I could do that, yes. I was quite hesitant with that: obviously, publishing the data could be problematic, and I have no experience with the legal side of that, so I would probably not publish the data, which is old data anyway. But regarding the code, I was just waiting to see if anybody is interested, so given that somebody stated interest, I would probably publish it, yes.

Okay, I think that's it from the question side; they were all answered quite nicely, and judging by that, I don't get any more questions right now. So I would conclude this talk; maybe you can also have a last word, and from my side I'm done here.

Yes, well, thank you very much for watching the talk. I try to improve, I think I said it on the last slide, if I'm right; let me know if you have any doubts or things to improve on, essentially. Then, regarding maybe the last question of publishing: I believe I put a link there where you can find my blog, and I would probably just add
another blog post stating, well, there's a GitHub repository, you can go there and find the code and so on. So if you're interested, just, you know, find me, find my website; my name is Martin Lellep. It will probably be in 2021 only, in a few days, I guess, so I won't be able to publish it in the next two days, but then the code will be public, yes.

Okay, then have a great day, a great time at Congress, and bye-bye!