The PocketParker system uses these two events. A walking-to-driving transition indicates that I left a spot; a driving-to-walking transition indicates that I arrived. And I try to use those to maintain an accurate per-lot count of how many spots are taken or available. The problem with this is the presence of so-called hidden drivers.

So what are hidden drivers? Let's say I see two arrivals into this parking lot. If I assume that everybody's using the PocketParker system, then I would think that there are still available spots in this lot. But in reality, we can't assume that everybody is using this system. Even if you deployed it into some really common tool like Google Maps, you would still not have 100% adoption, because somebody doesn't have a smartphone or uses Waze instead of Google Maps or whatever. So not all of the drivers using these parking lots are going to be using our app. And what that means is that at the same time that these two cars showed up, and I might think that the lot still has spots in it, a bunch of other hidden drivers showed up and took a bunch of the other spots. So if enough hidden drivers arrive, drivers that I'm not monitoring, then this lot could actually be full, despite the fact that I only saw two monitored drivers show up. So I need to understand the uncertainty that's introduced by the hidden drivers and be able to incorporate it into the models that we use to determine lot capacity.

Doing this requires a couple of different steps. The first thing we have to do is figure out the actual capacity of the parking lot. We do this in a pretty simple way. We use the computed area of the parking lot based on the information in the mapping database. And then we assume a specific parking spot size and use that to produce the actual capacity. So I'll call that C. And this is not necessarily perfectly accurate.
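As a minimal sketch of that capacity estimate: the function name and the 30 m² default spot size below are illustrative assumptions, not values from the talk.

```python
def estimate_capacity(lot_area_m2: float, spot_area_m2: float = 30.0) -> int:
    """Estimate the lot capacity C as lot area divided by an assumed spot size.

    spot_area_m2 is a hypothetical per-spot footprint (stall plus a share of
    the driving aisle); the talk does not give the value actually used.
    """
    if lot_area_m2 < 0 or spot_area_m2 <= 0:
        raise ValueError("lot area must be non-negative and spot area positive")
    # Round down: a fraction of a spot cannot hold a car.
    return int(lot_area_m2 // spot_area_m2)


# A 3,000 m^2 lot under the assumed spot size.
C = estimate_capacity(3000.0)
```

The estimate is only as good as the mapped lot outline and the assumed spot size, which is why it is treated as a best guess.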
You could do better if the parking lots or the maps were annotated with this information, but this is a best guess, so it's good enough.

The next thing I want to do is compute the scaled capacity of the lot. Dividing the scaled capacity by the actual capacity gives me an estimate of the monitored fraction: what percentage of the drivers using a particular lot am I actually monitoring? The way I do this is I maintain a running count for the lot over time. And this count can go negative. It can go positive. It's totally free-form. Whenever I see a driver park in the lot, add one. Whenever I see a driver leave, subtract one. This count does not reflect the actual number of occupied spots in the lot, because of the presence of all these hidden drivers. But what I do is look for big swings in this count. So I look for a particular day or a particular period of time where I see a big shift in this count in one direction. And I try to find the largest such shift over a period of time. And I assume that this is the scaled capacity of the lot. Then, to get the monitored fraction, I take the scaled capacity and divide it by the true capacity. And that gives me this really important parameter, f_m. So f_m is the monitored fraction. As the monitored fraction goes towards one, I'm monitoring more drivers, I have to do less modeling to get the updates to the lot estimates to be accurate, and there's less uncertainty in the system. As the monitored fraction goes towards zero, there's more uncertainty in the system caused by all these hidden drivers, and my estimates of the lot are less accurate.

So OK, now I have my estimate of the monitored fraction. What I do next is use that estimate to create a probability distribution based on the number of drivers that I see arrive. So let's say that I see that two drivers arrived in the lot.
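Before continuing with that two-driver example, here is a minimal sketch of the monitored-fraction estimate just described. The function names, the +1/-1 event encoding, and the clamp to 1.0 are assumptions made for illustration; the talk only describes the running count, the largest swing, and the division by the true capacity.

```python
from typing import Iterable


def scaled_capacity(events: Iterable[int]) -> int:
    """Largest net swing of the running count over the observed period.

    events holds +1 for each monitored arrival and -1 for each monitored
    departure, in time order.
    """
    count = lowest = highest = 0
    for event in events:
        count += event  # free-form running count; it may go negative
        lowest = min(lowest, count)
        highest = max(highest, count)
    # The biggest one-directional shift between any two moments is the gap
    # between the lowest and highest values the count ever reached.
    return highest - lowest


def monitored_fraction(events: Iterable[int], capacity: int) -> float:
    """Estimate f_m as scaled capacity divided by the true capacity C."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    # Clamping to 1.0 is an added assumption, for lots where C was underestimated.
    return min(scaled_capacity(events) / capacity, 1.0)


# Five monitored departures followed by twelve monitored arrivals in a lot
# whose estimated capacity is 120 spots: the largest swing is 12.
events = [-1] * 5 + [+1] * 12
f_m = monitored_fraction(events, capacity=120)
```

In this example the swing of 12 against a capacity of 120 gives a monitored fraction of about 0.1.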
And I know that the monitored fraction, in this case, is 0.1. So you might assume that the most probable number of drivers that actually arrived, assuming my monitored drivers are representative, is 20. Because for every monitored driver, there are nine unmonitored drivers. But in reality, what I actually have is a probability distribution that we compute that's centered around 20, but that has a spread that's determined by this monitored fraction. And that makes a lot of sense. Because as the monitored fraction gets small, this distribution, despite the fact that it's still centered around 20, gets wider and wider and wider, because I have less certainty about the actual number of drivers that might have arrived. As the monitored fraction goes towards 1, this distribution gets narrow, because I have more and more certainty. And it's less and less likely, for example, that the number of drivers that actually arrived is like 40 or 200 or something. So this is intuitive. This means that the monitored fraction is accurately representing something about the amount of uncertainty in the system.

So given the number of drivers I actually see arrive, I compute this distribution. And I use this distribution to update a per-lot availability model. The per-lot availability model says, for a particular lot that I'm monitoring, what's the probability that the number of available spots in the lot is some number between 0 and C? Here C is the capacity of the lot, and 0 means the lot is full. So this is the probability that there are a certain number of spots in the lot. Now, all I really care about when I'm making recommendations to drivers is the probability that there are 0 spots in the lot. But if I wanted to make that a little bit stronger, I might not send a driver to a particular lot unless there was a high probability that several spots were available, not just one.

Now, what's interesting about this is that I use that distribution when I see arrivals.
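The talk does not name the distribution used for the arrivals. As a sketch only, one choice with the properties described is a negative binomial: assume each arriving driver is monitored independently with probability f_m, and ask how many drivers in total arrived by the time the observed number of monitored drivers had shown up. That modeling choice, the function names, and the truncation at `max_total` are all assumptions, not the paper's method.

```python
from math import sqrt


def total_arrivals_pmf(monitored: int, f_m: float, max_total: int) -> list:
    """P(total arrivals = n) for n = 0..max_total under the assumed model.

    The number of hidden drivers h follows a negative binomial:
    P(h) = C(h + m - 1, h) * f_m**m * (1 - f_m)**h, with m monitored arrivals.
    The result is truncated at max_total and renormalized.
    """
    if monitored < 1 or not 0 < f_m <= 1 or max_total < monitored:
        raise ValueError("need monitored >= 1, 0 < f_m <= 1, max_total >= monitored")
    pmf = [0.0] * (max_total + 1)
    p = f_m ** monitored  # probability of zero hidden drivers
    for hidden in range(max_total - monitored + 1):
        pmf[monitored + hidden] = p
        # Ratio of consecutive negative-binomial terms: P(h + 1) / P(h).
        p *= (hidden + monitored) / (hidden + 1) * (1 - f_m)
    total = sum(pmf)
    return [x / total for x in pmf]


def mean_and_std(pmf: list) -> tuple:
    """Mean and standard deviation of a distribution over 0..len(pmf) - 1."""
    mean = sum(n * p for n, p in enumerate(pmf))
    var = sum((n - mean) ** 2 * p for n, p in enumerate(pmf))
    return mean, sqrt(var)


# Both cases are centered on 20 total arrivals (monitored / f_m), but the
# smaller monitored fraction gives the wider distribution.
wide = mean_and_std(total_arrivals_pmf(monitored=2, f_m=0.1, max_total=2000))
narrow = mean_and_std(total_arrivals_pmf(monitored=10, f_m=0.5, max_total=2000))
```

Under this assumed model the mean is the monitored count divided by f_m, and the spread shrinks to zero as f_m approaches 1, which matches the intuition in the talk.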
I use a distribution computed using the monitored fraction to determine how to adjust this probability. But there's an interesting feature to this problem, which is that at certain points in time, I know that there's a spot in the lot. So at some point in time, the probabilities for the various numbers of spots might look like this. But when I see a driver leave, I know for certain that the probability that there are 0 spots is 0. And this is one of the ways in which we adjust this distribution. There are more details in the paper, but when we see a driver leave, we zero out the probability that there are no spots and renormalize this distribution accordingly.

As for the evolution of this distribution over time, what happens is, depending on how much uncertainty there is in the system, the probability that there are 0 spots will be basically 0 at some point in time, but then some of this mass will quickly start to move in there, depending on where the mass is located. So if more of it is over here, and I don't think that the lot has very many drivers parked in it right now, it takes longer. If a lot of it's right here, near zero, it gets there very quickly. And so the evolution of this distribution has an intuitive connection with how many drivers there are in the lot.

Now, if I don't see any events for a long time, what happens is what I would expect. This distribution flattens out. So at some point, if I haven't seen any events for a long period of time, PocketParker knows nothing about the state of the lot. And so the probability is pretty much the same that it has no spots or that it's empty. I just don't have any information about it. So this is the basics of how we update this particular probability distribution. There are more details in the paper, and there's some additional subtlety, particularly in terms of how we try to incorporate some soft information about lots being full.
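Setting that subtlety aside for a moment, the two updates described so far can be sketched as follows. The departure update follows the talk (zero out the no-spots probability, then renormalize). The flattening step is only described qualitatively, so the linear blend toward a uniform distribution, its `rate` parameter, and the fallback for a lot believed certainly full are assumptions.

```python
def on_departure(availability: list) -> list:
    """Update after a monitored driver leaves.

    availability[s] is the probability of exactly s free spots, s = 0..C.
    A departure proves at least one spot is free, so P(0 spots) becomes 0
    and the remaining mass is renormalized.
    """
    if len(availability) < 2:
        raise ValueError("need a capacity of at least one spot")
    updated = list(availability)
    updated[0] = 0.0
    total = sum(updated)
    if total == 0.0:
        # Assumption: if all mass was on "full", spread it evenly over 1..C.
        capacity = len(updated) - 1
        return [0.0] + [1.0 / capacity] * capacity
    return [p / total for p in updated]


def flatten(availability: list, rate: float = 0.1) -> list:
    """One step of decay toward 'no information' when no events are seen.

    Blending with a uniform distribution is an assumed mechanism; the talk
    only says that the distribution flattens out over time.
    """
    uniform = 1.0 / len(availability)
    return [(1.0 - rate) * p + rate * uniform for p in availability]


# A lot with C = 2 that is probably full, then a monitored driver leaves.
belief = on_departure([0.5, 0.25, 0.25])

# With no further events, the belief drifts toward uniform.
stale = belief
for _ in range(200):
    stale = flatten(stale)
```

After the departure the belief puts equal weight on one and two free spots; after a long quiet period every state is about equally likely.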
One of the weaknesses of this model is that when a lot fills, I no longer see any arrivals to that lot. But the absence of arrivals doesn't mean that the lot has spots; it just means that people are parking somewhere else. And so we do something where we look at the relationships between parking lots, and we use that information as a soft hint that there are no spots in lots that are more desirable based on their proximity to some point of interest. So that's another feature of the system.

But this is the basic idea. We compute the fraction of drivers that we're monitoring. We use that to compute probability distributions based on the number of arrivals that we actually see. Then we use that probability distribution to update the overall distribution of how many spots are available, and we use that to calculate the probability that the lot has a spot. And we can also use it to calculate the probability that the lot has more than a certain number of spots, if a driver wants more certainty before they try looking for a spot in a particular lot.
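The last two steps of that summary can be sketched as below. This is an illustration under assumptions rather than the paper's update rule: the function names are hypothetical, the arrival distribution is passed in as a plain list, and arrivals beyond the number of free spots are assumed to leave the lot full (those cars park elsewhere).

```python
def apply_arrivals(availability: list, arrivals_pmf: list) -> list:
    """Shift the availability distribution by an uncertain number of arrivals.

    availability[s] is the probability of exactly s free spots (s = 0..C).
    arrivals_pmf[n] is the probability that n cars actually arrived,
    monitored and hidden together.
    """
    updated = [0.0] * len(availability)
    for spots, p_spots in enumerate(availability):
        for arrived, p_arrived in enumerate(arrivals_pmf):
            # Assumption: the lot cannot go below zero free spots.
            remaining = max(spots - arrived, 0)
            updated[remaining] += p_spots * p_arrived
    return updated


def prob_at_least(availability: list, spots: int = 1) -> float:
    """Probability that the lot has at least `spots` free spots."""
    return sum(availability[spots:])


# A lot with C = 4 and no prior information, then exactly two cars arrive.
prior = [0.2] * 5
posterior = apply_arrivals(prior, [0.0, 0.0, 1.0])

p_any_spot = prob_at_least(posterior, spots=1)
p_three_spots = prob_at_least(posterior, spots=3)
```

A recommender could compare `p_any_spot` against a threshold, or use a larger `spots` value for a driver who wants more certainty before heading to a lot.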