So first, a show of hands: who has a car? Who has a car that they drive? Alright, lots of people. Who's found themselves wanting to pull their hair out when they couldn't find parking in that car? Yeah, big issue, right? For me, I live in San Francisco, and to give you an idea, this is an example of a parking sign. I don't know about you, but I can't even tell when I'm allowed to park there, so I'm just going to keep looking for other spots. This is really common around San Francisco and around a lot of dense cities, where parking rules can change depending on the hour of the day or the day of the week. It's a real issue, and people get parking tickets all the time. I probably have way too many to be proud of, so I'm not going to tell you how many I actually have.

So it's a really difficult problem, and especially with autonomous cars coming to fruition, it's something we're going to have to handle automatically, without a person figuring out where or when they can park. What we want to do is detect where the parking signs are and then extract the rules from those signs, so that we have intelligent information and can avoid all those pesky parking tickets.

To give you an idea, there's actually been a lot of work done in this field. There have been pushes to redesign parking signs so they're more easily readable: you just look at the parking guide and see, okay, am I able to park here or not. And there's a variety of reasons why we want to speed this process up. For one, it makes the streets a lot safer. Sometimes I'm driving down the road and the person in front of me suddenly stops because they're reading six different parking signs on the side of the street, and that causes all kinds of traffic and pedestrian issues. We want to avoid a lot of this, and we can leverage deep learning and machine learning to help reduce these problems.

There are even apps out there trying to fix this issue. SpotAngels is a popular app in San Francisco that shows you regions around the city with available parking. But again, it isn't anything really intelligent: basically, they go around the city, index where all of the parking spots are, and then predict whether a spot will be available, but no one is actually doing real deep learning or AI on this problem. Once we have apps like this that are actually intelligent, you can start to imagine getting parking notifications that say, hey, your parking is running out; this section turns into street cleaning in the next 30 minutes, so you might want to move your car and avoid getting towed or getting a ticket. And this is applicable to a number of different areas.

So really, how can we leverage AI to help us with these parking issues: make sure we avoid tickets, improve the traffic situation, and help pedestrian safety? For one, I'm sure many of you have seen these Google cars driving around. So why don't we just use the information they've collected on their cameras?
You can see they have nine directional cameras on the car, and a number of companies are doing this: Google, Microsoft, HERE Technologies. They're all really trying to map the world and understand not only parking but buildings, streets, and signs. A lot of this data is available, so what we did is pull images from the Google Street View API to collect the data we needed to build these models (a rough sketch of that step follows below).

Then we have to actually structure this data. When we pull it from Street View, none of it is structured; it's just a bunch of images. So we went through a process, which I'll walk you through, of structuring the data and creating clean training data that we were able to train our model with, and we got a pretty accurate model.

There are a number of challenges we ran into. In San Francisco alone there's a wide variety of street signs: street cleaning, two-hour parking, one-hour parking, no-parking signs, et cetera. It keeps going. So that's a wide variety of cases we want to solve for. Beyond the variety of signs, there's also a lot of occlusion: if you have trees growing on the street, the parking signs are often occluded, and it's really difficult for a camera mounted on a car to see them. We want to handle all these cases as well, so that we have the most robust system, one that can accurately tell you whether or not you're able to park in a given location. And on top of that, it's a really messy world. You can imagine all sorts of fake parking signs or gags that people put up just so no one will park in front of their house, even though it's a public spot where you should be able to park. We want to filter out these sorts of signs that don't actually mean anything when it comes to parking in that location.

These are the steps we took to clean the data and get the right sorts of annotations. When we downloaded the data, we got panoramic images like these; I'll show you how we went about collecting them. We collected a lot of them and broke each one into segments of images so that they're more digestible by humans. Then we used our human annotation platform at Figure 8: first we ran the data through a first pass to remove any erroneous data, so that we could focus only on the images that do have parking signs. Once we'd done that, we created the next job, where annotators labeled all the signs in each of those images. By doing this we created all of our training data in a really streamlined way, without having to manually identify every single bounding box ourselves. That's one of the advantages of Figure 8 having a human platform: we can get a lot of training data quickly.

And this data is actually available online: you can go to figure8.com/datasets, where we have a number of free datasets to download, including others with audio and open images, and this is the dataset we actually used for training. It has almost 3,000 annotations of parking signs.
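To make that collection step concrete, here's a minimal sketch. One caveat: the talk used full panoramas, while the public Street View Static API serves perspective crops, so this approximates panoramic coverage by sampling several headings. The API key, headings, tile width, and overlap are illustrative placeholders.

```python
import io
import requests
from PIL import Image

API_KEY = "YOUR_KEY"  # placeholder: your own Street View Static API key
STREETVIEW_URL = "https://maps.googleapis.com/maps/api/streetview"

def fetch_views(lat, lng, headings=(0, 90, 180, 270)):
    """Approximate panoramic coverage by sampling several camera headings."""
    views = []
    for heading in headings:
        resp = requests.get(STREETVIEW_URL, params={
            "location": f"{lat},{lng}",
            "heading": heading,  # compass direction the camera faces
            "fov": 90,           # horizontal field of view, in degrees
            "size": "640x640",
            "key": API_KEY,
        }, timeout=10)
        resp.raise_for_status()
        views.append(Image.open(io.BytesIO(resp.content)))
    return views

def tile_panorama(pano, tile_width=640, overlap=0.25):
    """Split a wide panorama into overlapping tiles that are easier
    for a human annotator to scan for parking signs."""
    step = int(tile_width * (1 - overlap))
    width, height = pano.size
    return [pano.crop((x, 0, min(x + tile_width, width), height))
            for x in range(0, max(width - tile_width, 0) + step, step)]
```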
So how did we go about building the models for parking sign detection? This is a diagram of the sliding-window detector. You have your query image, and on it we compute gradients to pull out the feature vectors for the signs. Then we had an SVM trained on a single class for sign detection, followed by a number of post-processing steps to improve the model's accuracy, remove false positives, and avoid false negatives. Concretely, gradient feature extraction was done with a 96 by 96 window over these color spaces, the linear SVM was trained on a single class, parking signs, and we used OpenCV for all of our image processing (see the sketch below).

Then we took some additional steps to improve the classifier's results. One thing we wanted to ensure is that every box we produced was associated with a single sign, and each sign had only one box, so whenever two boxes overlapped by more than 70%, we removed one of them (also sketched below). We also mined for missed signs so that we could cut down on false negatives, and used the SVM to reduce the false positive rate.

On top of that, we wanted to localize the signs: detection tells you a sign is there, but we also want to know where the sign sits geographically and what part of the street it's associated with. The way we did this is that during data collection you actually capture the same sign from multiple viewpoints, so you can triangulate where each sign is on the road and use that information to identify which parts of the street the sign applies to (the geometry is sketched below). Here you can see some of the evaluations we did: we're able to identify where each of these parking signs is. Obviously it's not perfect; you can see it actually missed a sign.

As for how we collected the data: like I said, we selected one-mile-by-one-mile areas in San Francisco, used the Google API to pick random coordinates within each region, and pulled all the images within a 20-meter radius of each coordinate. Within any given region we downloaded about 2,000 images, and as I mentioned previously, we split those images up, extracted only the parts that had parking signs in them, and sent those to the crowd to get them labeled.

For the deep learning models, we actually used two, YOLO and SSD, in our tests. YOLO, "you only look once", is a really lightweight detector; the idea is that you can classify a lot of images really quickly, and it's pretty good at keeping false positives low. SSD, although a little slower, generally improves accuracy, and you'll see better detections and more accurate boxes come out of it. The YOLO architecture we used had 19 convolutional layers and 5 max-pooling layers. For those of you who don't know, a convolutional layer is essentially a small matrix that you slide over the image, running a convolution at each position to produce a new feature map, and a max-pooling layer then downsamples that map: you can vary the size of the pool, and within each pooling window you take the maximum value. That's how you build out each of the layers of the network.
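To make that classical pipeline concrete, here's a minimal sketch of the sliding-window detector, assuming OpenCV's HOG descriptor as the gradient feature and scikit-learn's linear SVM. Only the 96 by 96 window comes from the talk; the block and cell sizes, stride, and score threshold are illustrative guesses.

```python
import cv2
import numpy as np
from sklearn.svm import LinearSVC

WIN = 96  # 96x96 detection window, as in the talk

# HOG over the window: 16x16 blocks, 8x8 block stride and cells, 9 bins.
hog = cv2.HOGDescriptor((WIN, WIN), (16, 16), (8, 8), (8, 8), 9)

def features(patch):
    """Gradient (HOG) feature vector for one window."""
    gray = patch if patch.ndim == 2 else cv2.cvtColor(patch, cv2.COLOR_BGR2GRAY)
    return hog.compute(cv2.resize(gray, (WIN, WIN))).ravel()

def train(sign_patches, background_patches):
    """Single-class setup: parking-sign windows vs. background windows."""
    X = np.array([features(p) for p in sign_patches + background_patches])
    y = np.array([1] * len(sign_patches) + [0] * len(background_patches))
    return LinearSVC().fit(X, y)

def detect(image, clf, stride=16, threshold=0.5):
    """Slide the window over the image and keep high-scoring locations."""
    boxes = []
    h, w = image.shape[:2]
    for y in range(0, h - WIN + 1, stride):
        for x in range(0, w - WIN + 1, stride):
            score = clf.decision_function([features(image[y:y+WIN, x:x+WIN])])[0]
            if score > threshold:
                boxes.append((x, y, x + WIN, y + WIN, score))
    return boxes
```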
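And here's the de-duplication rule from above: whenever two boxes overlap by more than 70%, drop the lower-scoring one. Measuring overlap as intersection-over-union is my assumption; the talk doesn't specify the exact overlap measure.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / float(area(a) + area(b) - inter)

def dedupe(boxes, max_overlap=0.7):
    """Keep the highest-scoring box whenever two boxes overlap > 70%."""
    boxes = sorted(boxes, key=lambda b: b[4], reverse=True)  # by score
    kept = []
    for b in boxes:
        if all(iou(b[:4], k[:4]) <= max_overlap for k in kept):
            kept.append(b)
    return kept
```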
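For the localization step, the underlying geometry is that two sightings of the same sign from known camera positions give two bearing rays, and their intersection places the sign. Here's a minimal sketch in a flat local east/north frame; treating this as 2D ray intersection is a simplification of whatever the production pipeline did.

```python
import math

def triangulate(cam1, bearing1, cam2, bearing2):
    """Intersect two bearing rays (compass degrees) cast from two camera
    positions given in a local east/north metric frame."""
    # Unit direction vectors: compass bearing 0 = north, 90 = east.
    d1 = (math.sin(math.radians(bearing1)), math.cos(math.radians(bearing1)))
    d2 = (math.sin(math.radians(bearing2)), math.cos(math.radians(bearing2)))
    # Solve cam1 + t1*d1 = cam2 + t2*d2 for t1 via the 2D cross product.
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(denom) < 1e-9:
        return None  # rays are parallel; this pair can't triangulate the sign
    dx, dy = cam2[0] - cam1[0], cam2[1] - cam1[1]
    t1 = (dx * d2[1] - dy * d2[0]) / denom
    return (cam1[0] + t1 * d1[0], cam1[1] + t1 * d1[1])
```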
For the YOLO model, we started from a network pre-trained on ImageNet, and then used transfer learning and fine-tuning to adapt that model to this specific use case. What we did there is remove the last convolutional layer and replace it with a 3 by 3 convolution with 1024 filters. We then ran this model through 160 epochs on all of our training data, using our validation data to check progress. We went with a really low learning rate and reduced it throughout the epochs, and that way we were able to really fine-tune the model specifically for this parking sign use case (see the sketch below).

These are the preliminary results we got on our validation data, and they were pretty good. As you can see, YOLO has a really high sensitivity, meaning that when there is a parking sign in an image, it's actually good at detecting that the sign is there. That's what we were optimizing for, so that we don't miss any parking signs across all of the images. Here you can see some of the predictions, with a box around each sign. It's pretty robust: it works on signs that are far away as well as signs that are partially occluded, so we got some pretty good results there.

Then there are the challenging cases. Depending on your field of view, you might have a very, very acute view of the sign, and that's something we want to account for in our next round of data: collecting more of these cases so that we can detect those signs too. The thing about these cases is that in the real world you're going to get a lot of this sort of data, and even as humans it's hard. When I look at that second image, I may be able to tell it's a parking sign, but I can't read any of it. So what we want to do is figure out smarter ways of not only detecting the signs, but also using different fields of view and different camera placements to get the most out of the data that's on these signs.
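As a rough sketch of that fine-tuning recipe, here's what swapping the head and training with a low, decaying learning rate could look like. This uses Keras with a stand-in backbone, optimizer, and loss purely for illustration; the actual model was a YOLO-style detector with its own framework and composite loss. Only the 3 by 3, 1024-filter replacement head, the 160 epochs, and the low decaying learning rate come from the talk.

```python
import tensorflow as tf

# Stand-in for the ImageNet-pretrained network with its final layers removed.
backbone = tf.keras.applications.MobileNetV2(include_top=False, weights="imagenet")
backbone.trainable = True  # fine-tune the whole network, not just the head

model = tf.keras.Sequential([
    backbone,
    # Replacement head: a 3x3 convolution with 1024 filters, as in the talk.
    tf.keras.layers.Conv2D(1024, 3, padding="same", activation="relu"),
    # Illustrative output layer; a real YOLO head predicts boxes per grid cell.
    tf.keras.layers.Conv2D(5, 1),  # x, y, w, h, objectness
])

# Low initial learning rate, reduced over the course of the 160 epochs.
schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-4, decay_steps=1000, decay_rate=0.9)
model.compile(optimizer=tf.keras.optimizers.Adam(schedule),
              loss="mse")  # stand-in; YOLO uses a composite detection loss

# model.fit(train_ds, validation_data=val_ds, epochs=160)
```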
We also worked with SSD, the single-shot detector, and there we took a different, active learning approach. We did the same sort of normalization on the images initially: we got the panoramas, broke them up, and took only the images that had parking signs in them. But instead of labeling all of the data and sending it through the model, we selected only 1% of the images and had those labeled by the crowd. Then we used an active learning loop: we trained a preliminary model on that 1% of the data, fed all of the other data through the model each round, and as the data was classified, we took the high-confidence predictions directly and sent any low-confidence predictions back to the crowd for labeling (see the sketch below). The idea here is that not only did we want to test how to build a predictor for parking signs, we also wanted to test whether an active learning approach could reduce the amount of labeled data required.

We went through three rounds of this, and as you can see, the results weren't as accurate as with the YOLO detector, but we were able to use a lot less data and pay for less of that data. With active learning you can spend a lot less, and depending on your application, a lower accuracy might be fine if you can save a lot of money in labeling costs.

Then we tested this on signs outside of San Francisco. All of the data we trained on was from within SF, but we wanted to test in New York and LA, and as you can see, we were actually able to detect the signs there too. Moving forward, what we'd want to do is take the data from each of these cities individually and use a fine-tuning and transfer learning approach to build specialized models, because although the signs may look similar, they're actually quite different in their content: the types of street cleaning, the types of parking rules. So we'd want to specialize these models depending on the city and country you're in.

Some of the future work we're looking at is extracting the rules from the signs. What we did with this first set of data was just detect where the signs are; now that we can detect them, we want to extract the parking rules. This is essentially an OCR task: understand what the text on the signs is and how those rules correlate, so that on any given day of the week I can use my contextual information, what time it is, where I am, what the date is, and know whether or not I'm able to park in that location (a small sketch of that check follows below). And as you can see, there's a wide variety of text that can appear on signs; even parking at 90 degrees can get you a ticket if you're not parked in the correct position, so that's something we want to account for in future versions of this work.

So, a couple of closing points: we really want to improve the ability to detect these parking signs, extract the rules, and put this into applied AI: get this into cars, into Ubers driving around the city with lots of cameras recording the data around them, use that to improve the accuracy of these models, and actually deploy solutions. When we do have autonomous cars, this is something we're really going to want, so that we're making sure those streets are safe and we actually have clear paths.
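Here's a minimal sketch of that active learning loop: train on the small labeled seed, let the model label what it's confident about, and send only the uncertain images back to the crowd. The `crowd_label` and `train_detector` helpers, the confidence threshold, and the per-round batching are hypothetical placeholders, not Figure 8 API calls.

```python
def active_learning(seed, pool, rounds=3, threshold=0.8, batch=1000):
    """Sketch of the loop: train on what's labeled so far, let the model
    label confident images itself, route uncertain ones to human annotators."""
    labeled = crowd_label(seed)            # hypothetical: humans draw the boxes
    for _ in range(rounds):
        model = train_detector(labeled)    # hypothetical: fit the SSD on labels so far
        scored = [(image, *model.predict(image)) for image in pool[:batch]]
        pool = pool[batch:]
        # High-confidence predictions become training labels directly...
        labeled += [(img, boxes) for img, boxes, conf in scored
                    if conf >= threshold]
        # ...low-confidence images go back to the crowd for real labels.
        labeled += crowd_label([img for img, _, conf in scored
                                if conf < threshold])
    return train_detector(labeled)
```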
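And once the rules are OCR'd into some structured form, the contextual check described above could look as simple as this. The rule schema is entirely hypothetical; it just illustrates combining extracted sign text with the current time and day.

```python
from datetime import datetime

# Hypothetical structured form of an OCR'd sign, e.g.
# "NO PARKING 8AM-10AM TUESDAY (STREET CLEANING)".
rule = {"days": {"Tuesday"}, "start_hour": 8, "end_hour": 10,
        "kind": "street cleaning"}

def can_park(rule, when=None):
    """Return False if `when` falls inside the sign's restricted window."""
    when = when or datetime.now()
    restricted = (when.strftime("%A") in rule["days"]
                  and rule["start_hour"] <= when.hour < rule["end_hour"])
    return not restricted

# e.g. can_park(rule, datetime(2018, 5, 15, 9))  ->  False (a Tuesday, 9am)
```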
Additionally, we want to account for curbs. This is something that's really big in California: all the curbs are painted different colors, where white means a loading zone, yellow means commercial, and red means no parking. And again, this is really messy data as well. As you can see in that top picture, that's actually fake paint on the road; someone went in and painted it so people won't park in front of their house, and it's pretty difficult to distinguish between the two. We can use AI to do exactly that and make sure that we have the most robust system. So yeah, this is really a problem that we're looking to solve, and I'm happy to take any questions.