Thanks so much. Well, it's a real pleasure to be here. Not only is Instacart located in San Francisco, but I live about three or four miles south of here, so it was a pretty easy commute for me. I'm going to talk a little quietly through this mic because I've just gotten over laryngitis and I want to see how long I can keep my voice. So if, at the end, I sound a little bit like Kermit the Frog, which is what my six-year-old says, just forgive that; it's not how I usually talk. I'm going to talk about data science at Instacart, and I'm going to go through a narrative up front that walks you through some of how we've used data science to drive profitability. Then, if we have time after that, I'll deep dive into a few interesting things we've been doing more recently that are fun, some related to optimization, some related to personalization. And then we'll take questions afterwards. So how many of you have used Instacart before? So maybe a quarter of the audience. Just to quickly give you an overview, the value prop for Instacart is pretty straightforward: it's groceries from stores you already know and love, delivered to your doorstep in as little as an hour. So that could be Whole Foods or Target, Berkeley Bowl or Andronico's. Personal shoppers go get the groceries and literally bring them to your doorstep. So what does it look like? Let's start on the consumer side. You download the app or you go online, and the first thing you'll do as a consumer is choose one of the stores that we partner with. Then, once you're inside of that storefront, you'll start to shop for groceries: you'll search or browse if you haven't used Instacart before, or you'll shop from Buy It Again, things you've purchased in the past. You'll then check out and select a specific delivery time. And then, after you've got that delivery time, the groceries come to your doorstep.
You take all of them out, you put them into your pantry or refrigerator and then your pet crawls into the bag and you take a picture and you put that on Twitter and it gets retweeted a lot and you become famous. So that's how you use Instacart. The shopper experience is arguably even more complicated. So there is a completely separate application for the shoppers. So a shopper is on shift and they're working and they will be given an opportunity to do an order and they will accept that order. They'll go to the grocery store and they will navigate through a list of groceries and they will find each of the items that you want. They'll scan the UPC code if there is one on the product in order to validate that's the exact thing that you want. And if it isn't there, they can interact with replacements that are suggested and text message you. So after they've scanned the barcode and checked out, they'll go out for delivery and then the whole process is repeated. You take a picture, you're famous, you get the idea. So that's what Instacart looks like if you're either a consumer or a shopper. And that's kind of what you see on the surface, this interaction between consumers on one side and shoppers on the other. And while that's a big part of Instacart, there are two other parties and it's really a four-sided marketplace. So we have the stores themselves, the retailers. Our shoppers are going into these stores and that's where our inventory is kept. So they're having to navigate those physical environments and we're having to manage what the inventory is in those stores. Consumers express a lot of loyalty to these brands and so we really want Instacart to be a kind of retailer-first experience. It's not a product marketplace, it's a retailer marketplace that once you're there, it's like you're shopping in the store but you're doing it online. And then the other interesting side of the marketplace is the product side. 
There are consumer packaged goods companies who spend hundreds of billions of dollars on advertising, and they would love to do that more effectively and efficiently. Instacart's a great platform for them to do that, to influence the consumer. And then we've got to help the shoppers find these individual items on the store shelves, so we need to maintain a tremendous amount of data about all of the products that we shop for. So all of these sides have interesting data science problems. But what I really want to talk about today is this burning question: can Instacart succeed, and how does data science help us get there? So what do we need to succeed? The first thing is we need to have a large market. Fortunately, groceries is really big, at $600 billion in the United States alone. So that's plenty large. We don't need a big share of that to be a large company. Second, consumers love us, so we've got a platform that people really like to use. So if it's a big market and consumers love you, the question then is, can you make money? And at the very basic level, that starts with unit economics. When we go out and do one of these deliveries, do we take in more revenue from our partners and from our customers than we pay out in expenses? That really determines whether, after these first two are taken care of, we'll be successful or be another Webvan. So let me double click into unit economics for you briefly. The first thing is to look at what comes in above the line: what are the sources of revenue? You start with the delivery fees. These are what the consumers are paying in order to have the groceries delivered. And then there are tips that consumers are paying that go directly to the shoppers. So all of the tips go to the shoppers. We also, though, get the product partnerships from the advertisers. They may be paying to promote specific products or placing advertisements on Instacart.
And then we also work with hundreds of retailers; not store locations, retailers. All of those retailers that we work with, that are partners, will have contracts in place with us in order to do rev share. And so we'll get money from them as well. So that's what comes in on top. As for what we spend to do a delivery, first there are transaction costs. These are credit card processing fees, insurance, et cetera. But the bulk of the money below the line is the time spent shopping for the groceries in the store and the time spent driving them to the customer addresses. So minimizing the time that our shoppers spend, while keeping the quality of the service high, is incredibly important for us to have good unit economics. And we've done that. We've gone from losing money on every single delivery to making money, and not just cents, but making dollars every time we do a delivery. And this just shows the minutes that it takes to do a delivery at Instacart and how that's declined over time. It's indexed so that we started out at 100%, because we don't want to disclose the exact number of minutes. But we got to profitable unit economics some time ago and have been continuing to push further efficiencies, and we have plans to keep pushing this. There are lots of things we can continue to do to make us more and more efficient. So just earlier this week, our CEO, Apoorva, talked at TechCrunch, and he said a number of things that I was really excited to have public. Since the beginning of last year, revenue's grown by 500%. 90% of our customers are repeat customers. People really love the product. If you look at the Instacart Express customers, those are people who have signed up to not pay delivery fees. They spend on average $500 a month on groceries, which is essentially their entire grocery bill. And we'll be cash flow positive in the next 12 months.
That means you take all of the unit economics that we're getting times the scale of deliveries and that starts to cover all of our expenses, which is pretty exciting. So data science at Instacart is pretty fun. Big part of why I joined Instacart was because of all of the opportunities and the complexities. And so some of the big complexities, the first one is this marketplace dynamic that it's a kind of complicated service involving these four different entities. And you have problems at the intersection of every connection point and data being collected and tracked at every point and a lot of influence over them all. Another fun problem is variance. It turns out that I spend a lot more time thinking about variance than mean because there's a lot of complexity in delivering physical goods quickly in the real world. We have things like the weather, but also things like the Pope visit that happened last year, where every city that the Pope visited would gridlock parts of the city and cause us a tremendous amount of problems, traffic patterns, et cetera. And then also the time dynamic. We're trying to do delivery in as little as an hour. And that's radically different from trying to do delivery the next day at some scheduled time. Everything has to change if you wanna do it in as little as an hour. And it's picking the groceries really quickly. It is standing in the checkout line and trying to avoid that if possible. It's also parking. In some cases, we actually shove our cars underneath other cars in order to deliver groceries. We don't actually do that. Somebody in the back nodded and he looked like he was really doing this. Maybe in the future, forklifts or something. So let's get into the meat of what we're actually doing. So we'll talk about optimizing minutes in kind of two angles to that. One, balancing supply and demand, which is first and foremost, how do you measure something like demand? But also how do you forecast, schedule, adapt? 
And then the optimization of the fulfillment problem itself. So we have all of these personal shoppers. We have all of these orders coming in. We have to be able to make predictions about them all. We have to plan them. We have to evaluate the plans. We have to dispatch. So I'll go into some topics on all of these. So let's start on the balancing supply and demand side. And the question is, what is demand for a service like Instacart? Well, it's easy to measure deliveries, right? We count those every minute, every second, every day or year. So deliveries are very easy, but when you come to the site, you see something like this. You'll see delivery windows for various times in the future. And if you haven't paid for Express and you're in New York, the delivery fees can be meaningful. Maybe there are some sales at some time slots, and at other time slots we're doing surge pricing, or busy pricing, where the price is increased. Sometimes slots might not be available at all. So there's really three outcomes. One, you go through and check out and pick a delivery slot. Two, you walk away, but you'd intended to order: you would have done it had there been no busy pricing and had everything been available. Or three, you walk away because you never really had intent to begin with. And when we're constantly controlling and changing busy pricing and availability, it's really important for us to distinguish between two and three. Because you can imagine a situation where I manage the supply such that I only have enough to cover, say, the 10th percentile of the distribution of expected demand. Most people then won't have availability. They'll have surge pricing. They'll walk away. And if I'm not measuring this, I won't know what demand could have been and what maybe I should have staffed to. And so we're constantly building models to predict, for every visit, what's the chance the user would have converted had they seen full availability?
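To make that measurement concrete, here is a minimal sketch, not Instacart's actual pipeline. It assumes a hypothetical model has already scored each visit with the probability it would have converted under full availability; summing those probabilities gives an expected delivery count, which can be compared with what actually happened. The `Visit` class, its field names, and the sample numbers are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    # Hypothetical model output: chance this visit converts if every
    # delivery window were available at normal pricing.
    p_convert_full_availability: float
    # What actually happened on this visit.
    converted: bool


def lost_delivery_rate(visits):
    """Share of potential deliveries that did not happen."""
    # Fractional deliveries we would expect with full availability.
    potential = sum(v.p_convert_full_availability for v in visits)
    actual = sum(1 for v in visits if v.converted)
    if potential <= 0:
        return 0.0
    # Clamp at zero: actuals can exceed the estimate on small samples.
    lost = max(0.0, potential - actual)
    return lost / potential


# Made-up visits: about 3.0 expected deliveries, 2 actual deliveries.
visits = [
    Visit(0.9, True),
    Visit(0.8, False),
    Visit(0.1, False),
    Visit(0.7, True),
    Visit(0.5, False),
]
rate = lost_delivery_rate(visits)
```

In this toy data roughly a third of the potential deliveries are lost, which is the kind of percentage one would then track over time and try to minimize.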
And then we go back through all of the visit data to make predictions, to estimate for every one of the visits what's the fractional number of deliveries we would have gotten had we had full availability. And then we take that and subtract off the total deliveries to get an estimate of what we call lost deliveries. So we're tracking that as a percent and trying to minimize it. And then we're also using this as the basis of what we want to forecast and staff against. So the next problem is, okay, how do we actually forecast demand? And it's kind of complicated because it's not national, it's regional. It's within a region, at a specific retailer, on a day, at an hour. And then we have to turn that into the actual hours for the shoppers to do deliveries at that hour, due in a specific window. And so if you look at this, this is a regional set of time series. If you dig into one of those regions, these are the time series for all of the warehouses, or the retail partners that we work with. I call them warehouses. It's an internal term. I'll try to use retailers. So these are all of the retailers. For one of those retailers, these are the days of the week. For one of those days, these are the hours of the day. And so you can imagine trying to make accurate predictions at all of these levels of granularity for a pretty big business, and it's millions of forecasts. So one of the first things that we've done is to improve outlier detection and be able to remove some of the exogenous shocks. They could be holidays where, you know, stores will have different hours. Consumer demand will change radically. Shopper behavior will change radically. It could be storms, where the same kind of things happen, or it could be Pope visits. So we're analyzing all of the time series data to figure out in what regions, at what times, we're having anomalous events.
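As an illustration of flagging anomalous days in a demand series, here is a small sketch. The talk doesn't say which detection method is used, so this swaps in a generic robust-statistics approach (a modified z-score based on the median absolute deviation); the function names, the threshold, and the sample series are assumptions, and a real version would also need to account for day-of-week and hour-of-day patterns.

```python
from statistics import median


def flag_outliers(series, threshold=3.5):
    """Return indices whose modified z-score exceeds the threshold."""
    med = median(series)
    # Median absolute deviation: a spread measure that a single
    # extreme day cannot inflate the way a standard deviation would.
    mad = median(abs(x - med) for x in series)
    if mad == 0:
        return []
    return [
        i for i, x in enumerate(series)
        if 0.6745 * abs(x - med) / mad > threshold
    ]


def strip_outliers(series, threshold=3.5):
    """Drop flagged points so they don't influence later forecasts."""
    flagged = set(flag_outliers(series, threshold))
    return [x for i, x in enumerate(series) if i not in flagged]


# Made-up daily delivery counts with one shock (say, a storm day).
daily_deliveries = [100, 104, 98, 101, 250, 99, 103, 97, 102, 100]
anomalies = flag_outliers(daily_deliveries)
cleaned = strip_outliers(daily_deliveries)
```

Here only the 250-delivery day is flagged, and `cleaned` keeps the other nine points.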
First and foremost, that's so we can just strip them out of all of the history of the data that we use, such that they don't influence everything else that we do. But increasingly what we're doing is looking forward into the next month for every one of our regions and trying to identify, with our local teams, all of the specific events, and then analyzing what happened at events like those in the past, or at those specific events, to make adjustments in advance. Once we have the outlier detection, the next piece is to back test different forecasting models. So this is just one experimental design where we're going back in time across a bunch of different regions and selecting specific days for those regions to use as a test data point for the region. So we'll build a time series forecast model on all of the data to the left of each of those blue points and test it on the blue dot. And these are about 10 different forecast methodologies and how they've changed in performance over time. And so you can see that all of the methodologies were really struggling around the holidays. Everything becomes very chaotic. People's patterns of behavior change. And then they've improved since, but there's been some divergence. And there are some of these methods that dominate others. So we're increasingly getting better at building these forecast models. There's a lot of things that we'd like to do to get even better. One is modeling that hierarchical structure explicitly: not having to just pick a level of the hierarchy and do traditional time series forecasting, but being able to leverage the fact that how demand at one warehouse or one retail partner changes may be similar to how another one changes in the same region. One interesting observation is just how accurate can our forecast be as we scale? And so each of these are different regions operating at different scale. So the x-axis is the size of those markets in number of deliveries, on a log scale.
And the y-axis is our forecast accuracy, measured as error. So what's encouraging is that the forecast error goes down as the market gets larger and we're better able to predict what's happening there. But it also asymptotes; it doesn't get down to zero. It remains pretty high. And it doesn't seem to get better after a certain level of penetration and scale. There are some outliers on the right side. Those are constrained markets where there are real limits on how we can operate. And so you have these constant shocks to supply and demand, and they become really difficult to predict. So I'll take questions at the end, if you don't mind. I'll try to hold them because otherwise we won't make it. So what this means, net, is we have to have shock absorbers for delivery. And one of those, which I talked a little bit about earlier, is these delivery options. So this is a screenshot for San Francisco yesterday, for a couple of the different store locations in San Francisco, from the internal tool we use. And you can see different delivery options, like at Safeway: one hour, two hour, seven to eight, eight to nine, et cetera. These numbers are actual estimates of how many deliveries we think we could take. And then these are color coded. Green means everything's fine. Yellow means we've got so much capacity we're gonna discount the delivery fee to try to drive additional demand, so that we don't have idle shoppers. Blue means we're busy pricing. We've only got capacity for two, so we wanna gradually increase the price with busy pricing such that, ideally, everyone pays just what they're willing to pay to get their delivery if we have supply side constraints. And then red means whoops, we've got nothing, so we've completely turned that one off. So these are all predictions that are constantly being reevaluated as we're watching our plan for what our shoppers are going to do and what we're seeing happening on all of our applications for consumer demand.
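As a rough sketch of how a delivery window might be mapped to one of those four color-coded states: the colors and their meanings come from the talk, but the inputs, the comparison rules, and the `surplus_ratio` threshold below are assumptions made purely for illustration, not the actual logic of the internal tool.

```python
from enum import Enum


class SlotState(Enum):
    GREEN = "normal pricing"
    YELLOW = "discounted delivery fee"
    BLUE = "busy pricing"
    RED = "unavailable"


def classify_slot(capacity, expected_demand, surplus_ratio=2.0):
    """Pick a state for one delivery window.

    capacity: estimated number of additional deliveries we could take.
    expected_demand: forecast of further orders for this window.
    surplus_ratio: hypothetical cutoff for "so much capacity".
    """
    if capacity <= 0:
        return SlotState.RED      # nothing left: turn the window off
    if capacity < expected_demand:
        return SlotState.BLUE     # supply constrained: raise the price
    if capacity >= surplus_ratio * expected_demand:
        return SlotState.YELLOW   # idle shoppers likely: discount the fee
    return SlotState.GREEN        # roughly balanced


# Made-up windows: (capacity, expected demand).
states = [classify_slot(c, d) for c, d in [(0, 5), (2, 6), (20, 5), (6, 5)]]
```

Because capacity and demand estimates keep changing, a function like this would be re-run for every window each time the underlying predictions are refreshed.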
So now let's switch to another problem, which is, okay, we've got a bunch of shoppers, they're in a city, they're in different role types, they're sitting at different locations. And we've agreed, through the delivery options, to have thousands of deliveries coming into that city. Now we've gotta make sure these shoppers can go do all of those deliveries. And so that's the fulfillment side of the problem. And the first thing is, well, we have to make sure the deliveries are on time. These deliveries have a time window, and it's usually an hour-long time window. And consumers really care about that. So this is happiness on the y-axis, measured by the feedback they give us on the rating for the order. And you can see that if we're on time, they're the happiest, but really they prefer us to be slightly early, even ahead of the delivery window. And if we show up like five or 10 minutes late, they start to get angry. Not five to 10 minutes late, I'm sorry: five to 10 minutes before the end of the window, they start to get angry. Because we've set an expectation, we've told you two to three, and we show up at 2:55. Like your friend that does that, like, I don't know, that's not cool, you just barely made it. So consumers treat us the same way: they really want us to be early in the window, and they get really upset if we're late. And we can be pretty early; we have to show up like an hour early for them to actually get angry at us, which is nice. So it's important that we understand, for any one of the possible combinations we're evaluating, how long is it gonna take the shopper to go in and shop for all the groceries at the store? How long is it going to take them to drive? And so this is just a comparison. On the y-axis is the actual delivery time for a bunch of shoppers. Here we're just correlating that with what Google Maps would give you as an estimate for how long it would take to do these different trips. And you can see that the R-squared's about 26%. For Instacart, our model gets up to about 50%.
So it's not a trivial problem, and it's kind of unique to Instacart. Why is Google Maps travel time not enough? A big part of it is parking. Google Maps doesn't assume that you have to park, but you also have to walk the groceries up to the location and take them upstairs. So there are a lot of other features besides just the distance and route traveled that we need to account for in these predictions. Part of why it's so important to predict the fulfillment times well is that we do something called batches. So suppose we only had three deliveries to do, delivery one, two, and three, and they were ordered at specific times and they're due at specific times in the future. Well, we might have shopper one in the store pick order number one, shop for order number one, and then shopper two shop for order number two and three in sequence. And these are in-store shoppers, so they're dedicated to just shopping orders over and over and over again. Then there's gonna be a handoff, and one driver is gonna pick up all three of those orders and deliver them in sequence. And so it's really important not just to know how long it's gonna take so we can forecast that to the customer, but we have to evaluate this combination and ask the question, what's the chance that delivery number two is going to be late if we deliver it in this batch as a whole? Which is a function not only of how long it will take to shop for delivery number two and drive for delivery number two, but also of the variance in those estimates and the variance of all the other orders in that batch. So we use quantile regression in order to estimate specific points in the distribution of all of the times that we use. Our models are gradient boosting machines, so we can put in complex time and space features. One interesting piece is we have to update the predictions frequently throughout the fulfillment process.
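To illustrate why the variance of every step in a batch matters, here is a minimal sketch of estimating the chance that one delivery in a batch is late. It is not the quantile-regression approach described in the talk: as a simplifying assumption, each stage that must finish before the delivery arrives is treated as an independent normal duration, so the means and variances simply add. The stage values and function name are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def prob_late(stages, minutes_until_due):
    """Chance the summed stage durations exceed the time remaining.

    stages: list of (mean_minutes, std_minutes) for every step that
    must finish before this delivery reaches the customer.
    """
    total_mean = sum(mean for mean, _ in stages)
    # Independent stages: variances add, standard deviations do not.
    total_std = sqrt(sum(std * std for _, std in stages))
    if total_std == 0:
        return 1.0 if total_mean > minutes_until_due else 0.0
    return 1.0 - NormalDist(total_mean, total_std).cdf(minutes_until_due)


# Hypothetical path to delivery two: shop order two, shop order three,
# drive to address one, then drive to address two.
stages = [(20, 5), (22, 6), (12, 4), (8, 3)]
p_late = prob_late(stages, minutes_until_due=75)

# Same expected times, but twice the uncertainty at every stage.
noisy_stages = [(mean, 2 * std) for mean, std in stages]
p_late_noisy = prob_late(noisy_stages, minutes_until_due=75)
```

With identical expected times, the noisier batch is noticeably more likely to be late, which is why a point estimate of travel time alone is not enough to evaluate a batch.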
So it's something that we need to continuously reevaluate and it has to be millions of predictions per minute, probably more than that now. The reason is we're evaluating all of these different combinations. And so for each of the combinations, in memory we need to be able to get a prediction so you can't call out to an API. It has to be done really quickly, co-located in the process that's doing the optimization for fulfillment. Okay, so we know the predictions. Now we need to decide how to route the shoppers. Suppose we only had four deliveries. This is an actual screenshot of a batch of these four deliveries. And so the first delivery was picked over this time period, the second over that time period, et cetera. And then the shopper who is gonna drive, they acknowledge the batch here, drive to the store, spend time in the store, getting the bags for all of the groceries, drive to the first address, the second address, the third address, and the fourth address. And fortunately in this case, everything worked out and everything's on time. You can see if you line this up, this one is this delivery. So this customer is probably not gonna be super happy with us. This is one of those last five minutes, which is something we'd like to avoid. And so this is a map of the trip that the shopper took. Yesterday when they did this delivery. And so if we only had to do four deliveries in a batch, we would solve little mini-traveling salesman problems and everything would be wonderful. But it's not so simple because we have thousands of shoppers on shift, many thousands of deliveries coming in. And so we actually have to solve for the assignment problem of which shoppers are gonna do which batches and which sequence. And this is a screenshot of the system that we use to track what's happening with the shoppers themselves. I won't talk you through the whole thing, but this is one shopper, this row. This is their shift. 
And then these are various points where they're accepting batches, picking groceries, driving, doing deliveries, et cetera. And so we watch all of this such that our happiness team can intervene and try to solve problems in real time if need be. But we also are tracking all of the data associated with all of this. The nature of this problem, then, is something called the vehicle routing problem with time windows. And it's an NP-hard problem, which basically means that nothing we do will ever solve it perfectly at scale. It's non-deterministic, right? Because we have these predictive models with a lot of variance in what's actually going to happen. We also have multiple roles, and those roles all have queues associated with them. So there's a role of people who are dedicated to picking the groceries, shopping for the groceries, in a specific store location. I might have six of those people, and it's not just a question of what work should they do, but I've got to figure out, am I keeping them busy? And if they're too busy, the work's going to have to overflow to someone else. And then it's also very dynamic. We're constantly getting new orders in, shoppers are constantly canceling, and so the quality of any one of our plans degrades pretty quickly over time. So every minute we're kicking off an entirely new planning operation to figure out what we want all of the shoppers to do over the next four to five hours. This is just an earlier map, and if you do the math, you've got 1,000 orders, say four orders per trip, 1,000 shoppers, that's 400 million combinations. We want to go through all of those possible combinations, or a subset of them, and we want to optimize for making sure that the number of items found is maximized. We've got lots of Whole Foods locations we could select: which one has the inventory that best aligns with the specific order, and are we willing to trade off distance and inventory quality?
We want to maximize the probability of being delivered on time, we've talked about that and we want to minimize the total time spent in the system. If we minimize the total time spent in the system that means we've got the most gaps in the plan that will increase capacity, we'll be able to get more orders in and then ultimately we'll be able to plan staffing more accurately. So some of the things that we've done to make significant improvements here, one is to remove as many constraints as possible and try to unify objectives. Early on we would try to optimize for minutes and we would have constraints around the delivery time and constraints around the item quality and it's much much better if instead you quantify well what are each one of those things worth to us? Unify that into a common objective and then let the optimization algorithm solve for that objective. We recompute it every minute in every market. We've really started with very greedy heuristics. It's such a hard problem that trying to solve it optimally would have taken us a year to launch. So start with very intuitive, greedy things and then iterate on those greedy heuristics. Increasingly now we're taking sub-problems here like what are the optimal combination of deliveries into batches? And that's a sub-problem that can be solved optimally offline for markets up to a certain scale. Or the sub-problem of given a set of batches what's the optimal paths for the shoppers to go through those batches? That can be solved offline at scale. In some cases we can put some of those things into production, in other cases maybe we can learn what's happening in those optimal solutions and try to replicate that into a real-time solution. The final piece is to wait to the very last minute to dispatch and this seems intuitive. It's something that we did very early on. 
You make all of these plans, you constantly reevaluate them, but don't tell anybody what to do until you have to, because it's constantly going to change and you wanna have the very best possible information. And so the shopper won't know exactly what bags they're gonna pick up at the store until they are at the store, ready to pick up the bags, because we may have been changing what orders they're going to do right up until they walk up to the staging area. And so we're continuing to make improvements on how quickly we can make those kinds of decisions at the last minute. Some changes are local. This is a map of Manhattan if you go back about three weeks. Each of these are Whole Foods locations, and you can see the different routes that shoppers were taking. In some cases they're doing batches, like this is two deliveries, and then going back in order to do all of the deliveries. This is what it looked like earlier this week on the same day. And so it's maybe a little bit tough to tell, but look at how many people are crossing Central Park: it's a lot fewer here than it is over here. And if you've ever lived in Manhattan or been a tourist in Manhattan, getting across Central Park can be a nightmare. There's only a few ways to get across Central Park, so it can take a lot of time. And so this was a specific change that we made in Manhattan, where we may be sacrificing utilization, how busy we keep a subset of our shoppers, in order to let another subset of our shoppers move faster. And we'll test to see whether or not this is a net improvement in the system efficiency as a whole. So, to give some metrics around some of the changes we've driven in the last year to help us get to profitability: on the customer side, we've been able to reduce late deliveries by 20% and keep lost deliveries, where people come and they can't get available windows, constant; keep it low.
On the shopper side, we've been able to increase the speed with which they move by about 20% and make them 15% busier. This is what it actually looks like in practice. These metrics trade off against each other, so utilization and lost deliveries are a great example. I can staff 2x the number of shoppers that I think I need in a market, and everyone on the consumer side will have all the availability for deliveries they want, but on average my shoppers are only going to be busy 50% of the time. So that's a really easy solution. Or I can staff half the number of shoppers and really tick off all the customers, but my shoppers are, like, constantly busy. And so a lot of changes are easy to go back and forth this way. There are lots of ways to do that and to trick yourself into thinking you're making progress, but really you're just trading off along an efficient frontier. And, excuse me, the really hard work is pushing that efficient frontier forward, and I think that's where data science really shines: improvements in the algorithms themselves lead to better, more optimal solutions that don't just move you along an efficient frontier but push it forward.
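The staffing trade-off above is simple arithmetic, sketched here in a deliberately simplified, deterministic form. The function name and the idea of measuring both demand and staffing in shopper-hours are assumptions for illustration; in practice demand is uncertain, so even staffing exactly to the forecast would not give perfect utilization with zero lost deliveries.

```python
def staffing_outcome(demand_hours, staffed_hours):
    """Return (utilization, lost_share) for one market and time block.

    demand_hours: shopper-hours of work customers want done.
    staffed_hours: shopper-hours actually scheduled.
    """
    if staffed_hours <= 0:
        return 0.0, (1.0 if demand_hours > 0 else 0.0)
    served = min(demand_hours, staffed_hours)
    utilization = served / staffed_hours
    lost_share = (demand_hours - served) / demand_hours if demand_hours > 0 else 0.0
    return utilization, lost_share


demand = 100
overstaffed = staffing_outcome(demand, 2 * demand)    # staff 2x the need
understaffed = staffing_outcome(demand, demand // 2)  # staff half the need
```

Overstaffing gives 50% utilization with nothing lost; understaffing gives full utilization but loses half the demand. Both points sit on the same frontier, and moving between them is just a staffing choice rather than an algorithmic improvement.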
So I'll talk a little bit about how we organize, and then I'm gonna go into a couple of other special topics. We're very mission driven in how we organize engineering and data science. What that means is we have a whole bunch of different teams. They focus on specific niche areas of our product portfolio, and they have a mission: they're gonna make the very best shopper app possible, or they're gonna make the best consumer experience possible, or help us to balance supply and demand in the best way possible. And within those teams we'll have dedicated product management, we'll have designers, we'll have an analyst, but the data scientist is integrated into that team. They report into the technical lead, who might themselves be a data scientist or could be an engineer, and they're working side by side with the engineers to solve the problems holistically. What that really means in practice is that the data scientists don't show up after the product's been created, and logging is set, and data is already streaming, and the roadmap is set for three months, and start trying to ideate to figure out how to interject themselves into the process. They're there from the beginning, and they're setting the scope of the product, and what data science is going to do, and what data needs to be collected, and how it's going to be integrated. And then they're there at the end. They get to see it survive or fail, and they deal with the implementation issues. They watch the A/B test results. Of the successful products I've put into production, probably one out of 10 works well on that first iteration, and it's really easy to put something into production, not rigorously measure and watch it, and just let those crappy releases sit if you're not there side by side with everyone. So it's something I'm pretty passionate about. And then sometimes we have to solve problems that really span the organization. You know, say we need to completely change how we're doing delivery. I can't do that in data science. I can't rent a
bunch of trucks and hire a new role type and train them and figure out what their compensation strategy should be. So when we do these really cross-organization, cross-team things, we bring together working groups, try to really empower them, and make them short-lived, so that we can solve these cross-platform product problems.

Some principles: we very much prize urgency, trying to move really, really quickly. We set clear goals, and we try to make sure we're uncomfortable. This is a lifestyle change. You have to be willing to get into this and really go after it and be comfortable with it, and we really screen people for whether or not they like that kind of environment, where you're going to release a product change in two or three days and you'll see what happens, and it's okay. Things don't have to take six months. We're very transparent; we try to share everything. We tend to have a policy that when we get a new platform, we set the permissions globally open in the organization so everyone can access and see everything. And then we really take ownership, and it starts at the top. Our CEO sends out an email every week with an update on the objectives that we're driving towards as a company and the progress that we're making on those objectives. So there's this real sense throughout the organization of accountability and ownership. We call it internally "this is your baby": you've created something, Instacart itself or your product, and you own its success.

So that's the main narrative thread, and I'm going to do a couple of asides now on some other topics. The first one is one I'm really excited about: Buy It Again. So this is a GIF of me shopping, and the first thing I do is start buying things on Buy It Again. I click in and there are hundreds of things to go buy, and I'm going in and adding these things to my cart. So in the matter of about 15 or 20 seconds, I add about eight things to my cart. And what we do is behind the
scenes, we're actually ordering the Buy It Again list using data science and algorithms. What we're doing is predicting, for every user and for every item they've ever purchased in the past, what's the chance that if they showed up today they would buy that item. We make those predictions for everyone and then order the items in Buy It Again according to those predictions. We're using gradient boosting machines built in Spark, with lots of other features feeding into the models. You can imagine a lot of it is the inter-arrival time patterns: I might buy yogurt every five days, bananas every week, protein powder every month, pasta sauce every two weeks. I might behave in a similar way to thousands of other users with respect to some of those inter-arrival times, but in different ways from other users depending upon whether they have families, etc. So there's a lot of richness and complexity that you can discover there. And I really love this experience. I'm a forgetful person, so I use this experience to remember what it is I need to buy. Instacart is generally better at this point at reminding me to buy things than I am, which is probably more about my failure as an individual than our success in algorithms. Your mileage may vary, but I really like it. Recently we started sending out emails if you haven't ordered after a certain period of time. It's a really helpful email: hey, don't forget to order on Instacart, and we suggest the items from your Buy It Again that you'll be most likely to buy. We want to structure the experience so that when you come back online they're there, and you can just deselect the ones you don't want, add all to cart, and check out. You can literally buy your groceries in the 30 seconds between meetings if you're a busy person, and you don't have to remember all of these things. That's increased order rates for the people that we've sent it to by 10% over the control in A/B tests, which is pretty exciting.
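To make the inter-arrival idea concrete, here is a minimal sketch of ranking a Buy It Again list. It is not the gradient boosting model described above. It is a hand-written stand-in that uses a single feature: how far along the user is in their usual repurchase cycle for each item. The names `reorder_score` and `rank_buy_it_again`, and the scoring rule itself, are assumptions made for illustration.

```python
from datetime import date


def reorder_score(purchase_dates, today):
    """Heuristic 'due-ness' score in [0, 1] for one user and one item.

    Illustrative stand-in for a learned probability: days since the last
    purchase divided by the user's mean gap between purchases, capped at 1.
    """
    dates = sorted(purchase_dates)
    if len(dates) < 2:
        return 0.0  # no inter-arrival pattern to learn from yet
    gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    mean_gap = sum(gaps) / len(gaps)
    if mean_gap <= 0:
        return 0.0  # all purchases on the same day
    days_since_last = (today - dates[-1]).days
    return min(days_since_last / mean_gap, 1.0)


def rank_buy_it_again(history, today):
    """Order item names by descending score; history maps item -> purchase dates."""
    return sorted(history, key=lambda item: reorder_score(history[item], today), reverse=True)


history = {
    "protein powder": [date(2023, 11, 12), date(2023, 12, 12), date(2024, 1, 11)],
    "yogurt": [date(2024, 1, 1), date(2024, 1, 6), date(2024, 1, 11)],
}
print(rank_buy_it_again(history, today=date(2024, 1, 16)))
```

Here yogurt (bought every five days, last bought five days ago) ranks above protein powder (bought monthly, last bought five days ago). A production model would combine many such features, including how similar users behave, rather than rely on one ratio.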
So let's talk about another one that's more geospatial. First and foremost, I'll say I am not at all an expert in anything geospatial, so this is what you get when you take a data scientist who plays with this data to try to do something useful. So I'm very open to your suggestions and ideas. What I wanted to do is go in and identify where shoppers park, and the easiest place to start is to look at the actual store locations, because we have many shoppers visiting these store locations. But why would I want to know about that? Why would Instacart care? There are a few reasons. One, we might want to understand the time they are spending looking for parking and figure out how to minimize it. The other thing is we might ultimately be able to make recommendations to our shoppers: park here or park there. So on the left is a map, and this is the journey of an individual shopper. The specific path they followed, with roughly one-minute lat-long updates from the phone, is highlighted, and the arrows show their direction of travel. Once they get inside the store the arrows are stripped off, because otherwise it's just very chaotic. The color here uses Viridis, which I think was invented here, which is super exciting, to map to how fast the shopper is traveling. And so you can see the actual time series that we're looking at. Up here is the accuracy of the measurements. This is in square meters, and so when the shopper is outside the store the accuracy is pretty good, within plus or minus five meters, but when you get into the store, all of a sudden the store obstructs the GPS signal and it's plus or minus twenty meters, and then it drops again on the outside. This is their distance to what we think is the center of the store, and then this is the speed as measured by how quickly they're moving between the lat-long updates. You can get the speed off the phone itself, which is, at a point in time, how fast is the phone
moving. That's not quite as interesting for my purposes because it's measured so infrequently; I want to understand their general speed. So the question then is, where did this specific shopper park? The heuristic that we're using to try to understand this is: look, we know when they're picking, because we log them doing the picking activities in the store. So wind back to before they start picking and look for the first point at which they slow down dramatically. In this case, this dot right here is the first point at which their speed suddenly changes, and the terminus of that point is a good estimate of where they may have parked. On the flip side, when they leave, look for the first point where they increase speed significantly, and that's this one. So either one of those dots could have been the spot that this shopper parked at. Then something you can do is look, for a specific trip, at how far apart these dots are. If they're co-located, then either we got really lucky or it's a pretty good estimate of where they parked. If they're far apart, then there's a pretty good chance that one of them must be wrong, and so you can discard them both. Sometimes it's really, really hard to tell. This is a trip where I just don't think we have the data needed. They come in quickly and then slowly go into the store, and then leave slowly and then accelerate here, and it's possible that they parked anywhere up here. Because of the time gap in this measurement, it's hard to say exactly where they left their car. But if you aggregate this over thousands, tens of thousands of shopper trips, you actually start to get, especially if you filter out the estimates that don't seem that good, a pretty decent heat map. This is a Target in San Francisco, and you can park in this lot or you can park in this lot, or you can actually park up on the roof and then go through the roof of the Target. But you can't park over here or over here; you'll only be able to get into Best Buy. And so
you might make that mistake once, but if you're an Instacart shopper you go there frequently enough that you figure this out. So those are the good spots to park at this Target. I think this is a Whole Foods location, maybe an Andronico's location, and there's a parking lot here. It's pretty easy to access, and most of the time they can park in that parking lot. But if you look at this over time, it can get really busy, and the parking will overflow onto the streets at certain hours. What we'd like to do with this is look at the delivery addresses and try to blend many of these estimates, because not everyone's going to the same location, but you can look at how far any delivery location is from a given location of interest and blend them together into an estimate for a location.

Another really interesting challenge: how do we optimize the shopping in the stores? One thing that we did is we went back and looked over all of the Whole Foods orders in San Francisco. How many orders did we have where two shoppers did the exact same set of groceries, with five or more items in the total order? These are all of those over a period of time, and there's not a ton of them. What's interesting is, first, these are both log scales. This is how long one shopper took, and then another shopper took this long to do the exact same order. They correlate, which is good, and generally as the number of items increases, the time it takes increases. But look at the outliers, where one shopper takes a thousand seconds to do a trip and another shopper takes just 200 or 300 seconds to do the trip. So the exact same items, the exact same store, and three times as long to shop for them. So there are incredible opportunities to continue to drive efficiency and consistency in the store. This is a transition matrix. If we look at when our shoppers have to go from one department and one aisle in the store to another department, how much longer does it take them to shop for an item in that department than if
they'd already been in the department to begin with? Some departments, like bulk, take a long time, to find it, to bag it. Other departments are really easy; something like produce can be quite quick. So the question is how much the transition from one department to another changes things. You can see that if you're going to go from produce as a department to pick dairy and eggs, that's generally not very good. The time multiple is oftentimes 2x or more, indicating those are probably far apart. But within the dairy and eggs department, specialty cheeses is pretty good; that's an aisle that's oftentimes, at Whole Foods, co-located with the produce. So you can imagine, and this isn't something that we've done in production yet, either using data like this, or, since we're also tracking the gyroscope and accelerometer data of the shoppers when they're in the store, starting to reconstruct the path the shoppers are taking and the distance between any pair of items, and optimally routing shoppers when they're picking items. In some cases we get that location data from partners, and this is an example where we have some of that location data and we can pop up a map and show the shopper where all the items are located and what they should go pick in what sequence. Another fun thing is just what's in the store. We get files from our shopping partners, but one thing we've tested is to just go down the aisles and take video of the aisle tags and then do OCR on the images from the video, and with 93% accuracy we can identify the product in our catalog associated with one of these items. Ultimately it would be wonderful if, when you're looking for half and half, we could show you a picture of the dairy section and highlight exactly where the half and half is located, doing something like image recognition on the products using images we've collected ourselves, or have collected through robotics or cameras that we've placed. Or, where is the baby
spinach? A fun experiment that two of our data scientists did: they each went and shopped two different orders. For the first one, they would shop the order, capture where everything is, and create these maps to guide the other person to do the exact same trip, with the whole map laid out and these images saying exactly where each product was. Then they switched and had the other one do the exact same order, but now with the benefit of the map, and they were able to reduce the time that it took them, in one case by half and in another case by two thirds, by having that kind of information. Now, data scientists are not good shoppers, so I don't think we'll get the same kind of lift if we put it into production with our really experienced shoppers, but it definitely shows some of the opportunity. So that's it. I am Jeremy.Stanley@instacart.com; you can reach out to me directly that way. You can follow me on Twitter. Obviously we're hiring; it's a big part of why I go out and talk a lot. And I'll stay around for some questions. So thank you.
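Returning to the parking heuristic from the geospatial part of the talk, here is a minimal sketch of how it could be written down. It assumes the phone updates have already been projected to planar coordinates in meters, that timestamps are strictly increasing, and that a single speed threshold separates driving from walking. The function names, the 2 m/s threshold, and the 30 m agreement distance are all hypothetical choices, not values from the talk.

```python
import math


def segment_speeds(track):
    """Speed in m/s between consecutive (t_seconds, x_m, y_m) points."""
    return [
        math.hypot(x1 - x0, y1 - y0) / (t1 - t0)
        for (t0, x0, y0), (t1, x1, y1) in zip(track, track[1:])
    ]


def estimate_parking(track, pick_start, pick_end, walk_speed=2.0, max_gap_m=30.0):
    """Return an (x, y) parking estimate, or None if the trip is ambiguous."""
    speeds = segment_speeds(track)
    arrive = leave = None

    # Wind back from the start of picking: the end of the last fast segment
    # is the first point at which the shopper had slowed down.
    for i in reversed(range(len(speeds))):
        if track[i + 1][0] <= pick_start and speeds[i] > walk_speed:
            arrive = track[i + 1][1:]
            break

    # After picking ends: the start of the first fast segment is the point
    # at which the shopper sped up again.
    for i in range(len(speeds)):
        if track[i][0] >= pick_end and speeds[i] > walk_speed:
            leave = track[i][1:]
            break

    if arrive is None or leave is None:
        return None
    # If the two estimates disagree, at least one is probably wrong: discard both.
    if math.hypot(arrive[0] - leave[0], arrive[1] - leave[1]) > max_gap_m:
        return None
    return ((arrive[0] + leave[0]) / 2, (arrive[1] + leave[1]) / 2)


# One-minute updates: drive in, walk to the store, pick, walk back, drive off.
trip = [
    (0, 0, 0), (60, 600, 0), (120, 1200, 0), (180, 1230, 0), (240, 1250, 0),
    (600, 1250, 0), (660, 1235, 0), (720, 1210, 0), (780, 600, 0),
]
print(estimate_parking(trip, pick_start=240, pick_end=600))
```

Estimates that survive the agreement check can then be aggregated over many trips to build the kind of heat map shown for the Target and Whole Foods locations.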