From theCUBE Studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR. This is Breaking Analysis with Dave Vellante.

Uber has one of the most amazing business models ever created. The company's mission is underpinned by technology that helps people go anywhere and get anything. The results have been stunning. In just over a decade, Uber has become a firm with more than $30 billion in annual sales and a market capitalization of nearly $90 billion as of today. Moreover, the company's productivity metrics, when you measure things like revenue per employee, are three to five times greater than what you'd expect to find in a typical technology company. In our view, Uber's technology stack represents the future of enterprise data apps, where organizations will essentially create real-time digital twins of their businesses and, in doing so, deliver enormous customer value.

Hello, and welcome to this week's Wikibon Cube Insights, powered by ETR. In this Breaking Analysis, Cube analyst George Gilbert and I will introduce you to one of the architects behind Uber's groundbreaking fulfillment platform. We're going to explore their objectives, the challenges they had to overcome, and how Uber did it. And we believe the company is a harbinger for the future of technology. Now, the technical team behind Uber's fulfillment platform went on a two-year journey to create what we see as the future of data apps. And it's our distinct pleasure to welcome to the program Uday Kiran Medisetty, who's a Distinguished Engineer at Uber. He has led, bootstrapped, and scaled major real-time platform initiatives in his time at Uber, and he has agreed to share how the team actually accomplished this impressive feat of software and networking engineering. Uday, welcome to the program. It's great to see you.

Hello, George.

All right, Uday. Start, if you would: tell us a little bit about yourself and your role at Uber.

Yeah, hi George. Hi Dave. Super nice to be here.
I joined Uber back in 2015, when we were primarily doing on-demand UberX and we were primarily in North America. And over the last eight years, I have witnessed Uber's tremendous growth: how we have expanded from on-demand mobility to all kinds of personal mobility, and how we have expanded from just mobility to all kinds of delivery. And the mission that you just said, go anywhere and get anything, the total addressable market of that is insane around the world. That's what drives us here, and that's what has kept me here with the same energy even after eight years. So I work on the core mobility business and a bunch of foundational business platforms that are leveraged across mobility and delivery. I also lead the Uber-wide senior engineering community, where we set best practices so that we can move at the same pace across all of the engineering teams at Uber. So yeah, that's my quick intro.

Yeah, I remember the first time I ever used the Uber app. I was stuck in the hinterlands outside of Milan. I couldn't get a cab. And I said, I'm going to try this Uber thing. This was the early part of last decade. And it was like my ChatGPT moment.

Now, back in March, and again just last week, George and I introduced to the audience this idea of Uber as the future of enterprise data apps. We put forth the premise that the future of digital business is going to manifest itself as a digital twin that represents people, places, and things; that increasingly, business logic is going to be embedded into data, versus the way it works today; and that applications are going to be built from this set of coherent data elements. So when we go back and look at the progression of enterprise apps throughout history, we think it's useful to share where we think we are on this journey. George put together this graphic to describe the history in simple terms, starting with 1.0, which was departmental and back-office automation.
And then in the orange is the ERP movement, where a company like Ford, for example, could integrate all its financials and supply chain and all its internal resources into a coherent set of data and activities that really drove productivity in the 90s. And then Web 2.0 for the enterprise: here we're talking about using data and machine intelligence in a custom platform to manage an internal value chain, using modern techniques, and we use the example of Amazon.com, not AWS, but the retail side of the operation. And then in the blue, we show enterprise ecosystem apps. This is where we place Uber today: really one of the first, if not the first, to build a custom platform to manage an external ecosystem, different of course from the gaming industry that we show there on the right-hand side. And our fundamental premise is that what Uber has built, and we're going to get into this because Uber is on its own journey even within that blue ellipse, but our premise is that eventually mainstream companies are going to want to use AI to orchestrate an Uber-like ecosystem experience using packaged, off-the-shelf software and services. You see, most organizations don't have a team of Udays; they can't afford it, they can't attract the talent. So we think this is where the industry is headed, and Uber is a harbinger example. And George, you have a burning question for Uday, so go ahead.

So Uday, it's a big-picture question, but it has to do with helping people understand not just the consumer experience of the app, but the architecture of an application that is trying to orchestrate an ecosystem, and how different that is from where we've been, which is these packaged apps that manage repeatable processes that were pretty much the same across different businesses, with maybe some room for customization.
It's so radical, and we are so accustomed to living in it out here in tech-bubble land, but help us understand, big picture, what a big transformation that is from the application's point of view.

Yeah, so one of the fascinating things about building any platform for Uber is how we need to interconnect what's happening in the real world and build large-scale, real-time applications that can orchestrate all of this at scale. You know, there is a real person waiting in the real world to get a response from our application on whether they can continue with the next step or not. If you think about our scale: during the last FIFA World Cup, we had 1.6 million concurrent consumers interacting with our platform at one point in time. This includes riders, eaters, merchants, drivers, couriers, and all of these different entities. They are trying to do things in the real world, and our applications have to be real-time, they need to be consistent, they need to be performant. And on top of all of this, we need to be cost-effective at scale, because if we are not performant, if we're not leveraging the right set of resources, then we can explode our overall cost of managing the infrastructure. So these are some unique challenges in building an Uber-like application, and we can go into more detail on various aspects, both in breadth and in depth.

Right. Yeah, so Uday, this vision that you laid out requires an incredible amount of data to be available, as you said, in real-time or near real-time. Uday's team has written a couple of key blogs that we'll put into the show notes. I've probably got seven hours into them and I'm still going back and trying to squint through them, so we really appreciate you up-leveling it here and helping our audience understand it. But what was it about the earlier 2014 architecture?
You described this in one of your blogs: what limited the realization of your mission at scale and catalyzed this architectural rewrite? We're particularly interested in the trade-off that you talked about in your blog, optimizing for availability over consistency. Why was that problematic? And let's talk about how you solved it.

Yeah, if you think about back in 2014 and the most production-ready databases that were available at that point, we could not have used traditional SQL-like systems because of the scale that we had even then. The only option that provided us some sort of scalable, real-time database was NoSQL-type systems. So we were leveraging Cassandra, and the entire application that drives the state of the online orders, the state of the driver sessions, all of the jobs, all of the waypoints, all of that was stored in Cassandra. And over the last eight years, the kinds of fulfillment use cases that we need to build have changed a lot. So whatever assumptions we had made in our core data models, and about which kinds of entities can interact, those have completely changed. We had to change our application for that reason, if for no other. The second issue: because the entire application was designed with availability as the main requirement, latency was more of a best effort, and consistency was more of a best-effort mechanism. Whenever things went wrong, it made it really hard to debug. For example, we don't want a scenario where, if you request a ride, two drivers show up at your pickup point because the system could not reconcile whether this trip was already assigned to a particular driver or wasn't assigned to anyone. Those were real problems that would happen if we didn't have a consistent system.
So there were three main areas of problems at the infrastructure layer at that point. One is consistency, which I mentioned already: because we didn't have any atomicity, we had to make sure the system automatically reconciles and patches the data when things go out of sync, based on what we expect the data to be. There were also a lot of scalability issues. Because we had only best-effort consistency, we were using a sort of hash ring at the application layer, and what we would do is say, let's get all of the updates for a given user routed to the same instance, and have a queue in that instance. So even if our database is not providing consistency, we have a queue of updates, and we make sure there's only one update in flight at any point in time. That works when you have updates on only two entities; then at least you can do application-level orchestration to ensure they might eventually get in sync, but it doesn't scale beyond that. And because we were using a hash ring, we could not scale our cluster beyond a vertical limit, and that added to our scale challenges. Especially for the large cities that we wanted to handle, we couldn't go beyond a certain scale. So these were the key infrastructure problems that we had to fix so that we could set ourselves up for the next decade or two.

Yeah, makes sense. So when the last update wins, it may not be the most accurate update. All right. And then George, when you and I were talking about this, you said, Dave, you know, it might not just be scale; it was Uber thinking about the future. Elaborate on that, George.

So Uday, what I wanted to know was, you had to think about a platform more broadly than just drivers and riders, because you had new verticals, new businesses that you wanted to support.
And, you know, while the application layer manages things, the database generally manages strings. But the new capabilities in the database allowed you, as you were describing, to think about consistency and latency differently. Can you also talk about how you generalized the platform to support new businesses?

Yeah, that's a great question. You know, one of the things we had to make sure of was that, as the kinds of entities change within our system and as we have to build new fulfillment flows, we need a modular, leverageable system at the application layer. At the end of the day, we want the engineers building core applications and core fulfillment flows abstracted away from all of the underlying complexities around infrastructure, scale, provisioning, latency, consistency. They should get all of this for free and not need to think about it. When they build something, they get the right experience out of the box. So at our programming layer, we had a modular architecture where every entity, let's say an order, has its own representation; there's a merchant representation; there's a user or an organization representation. We can store these objects as individual tables, and we can store the relationships between them in another table. So whenever new objects get into the system, and whenever we need to introduce new relationships, they are stored transactionally within our system. You can think of the core database as a transactional key-value store. At the database layer, we still only store the key columns that we need, and the rest of the data is stored as a serialized blob, so that we don't have to continuously update the database schema any time we add new attributes for a merchant or for a user.
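The storage model Uday describes, each entity in its own table with non-key attributes packed into a serialized blob, plus a single relationships table, can be sketched with SQLite standing in for the transactional store (all table and column names here are illustrative assumptions, not Uber's actual schema):

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Only the queried key columns are real columns; everything else is a blob.
    CREATE TABLE orders  (order_id  TEXT PRIMARY KEY, status TEXT, payload TEXT);
    CREATE TABLE drivers (driver_id TEXT PRIMARY KEY, status TEXT, payload TEXT);
    -- One table holds the edges between entities.
    CREATE TABLE relationships (from_id TEXT, to_id TEXT, kind TEXT);
""")

# New attributes go into the blob, so adding them never changes the schema.
order_attrs = {"pickup": "SoMa", "dropoff": "Mission", "tip": 3.50}

with db:  # single transaction: entities and their relationship commit together
    db.execute("INSERT INTO orders VALUES (?, ?, ?)",
               ("order-1", "ASSIGNED", json.dumps(order_attrs)))
    db.execute("INSERT INTO drivers VALUES (?, ?, ?)",
               ("driver-9", "ON_TRIP", json.dumps({"vehicle": "sedan"})))
    db.execute("INSERT INTO relationships VALUES (?, ?, ?)",
               ("order-1", "driver-9", "assigned_to"))
```

The point of the design is the last block: the two entity rows and the row linking them either all commit or none do.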
We want to reduce that operational overhead. But at a high level, every object is a table, every relationship is a row in another table, and whenever new objects or relationships get introduced, they are transactionally committed.

Dave, I just want to add that what's interesting is he just described an implementation of a semantic layer in the database.

Right, right. We've been talking about this for months, George, and the importance of it, and I want to come back to that. Let's help the audience understand, at a high level, the critical aspects and principles of the new architecture. What we're showing here is a chart from Uber engineering in one of the blogs, and we want to understand how your approach differs from your previous architecture; you touched on some of that. The way we understand this is: the green is the application layer, which on the left-hand side is intermixed, and on the right-hand side, you've separated the application services at the top from the data management below, and that's where Spanner comes in. So how should we understand this new architecture in terms of how it's different from the previous architecture?

Yeah, so for the previous architecture, we went through some of the details, right? The core data was stored in Cassandra, and because we wanted low-latency reads, we had a Redis cache as a backup for whenever Cassandra fails or whenever we want low-latency reads. And we went through Ringpop, which is the application-layer shard management, so that requests get routed to the instance we need. There was one pattern I didn't mention, which was the Saga pattern, which came from a paper a few decades ago. Ultimately, there was a point in time where the kind of transactions that we had to build evolved beyond just two objects.
Imagine a case where we want to have a concept of a batch offer, which means a single driver should accept multiple trips at the same time, or not. Now you don't have a one-to-one association: you have a single driver, maybe two trips, four trips, five trips, and you have some other object that is establishing this association. If we need to create a transaction across all of these objects, we tried using Saga as a pattern, extending our application-layer transaction coordination, but again, it became even more complex, because if things go wrong, we also have to write compensating actions so that the system is always in a state where it can proceed. We don't want users to get stuck and then not get new trips. So in the new architecture, the key foundations, as we mentioned, were strong consistency and linear scalability. The NewSQL kind of databases provide that, and we went through an exhaustive evaluation in 2018 across the multiple choices we had, and at that point in time we picked Spanner as the option. So we move all of the transaction coordination and scalability concerns to the database layer, and at the application layer we focus on building the right programming model for new fulfillment flows. The core transactional data is stored in Spanner. We limit the number of RPCs that we make from our on-prem data centers to Google Cloud, because it's a latency-sensitive operation and we don't want a lot of chatter between these two worlds. And we have an on-prem cache which will still provide point-in-time snapshot reads across multiple entities, so that they're consistent with each other. So for most use cases, applications can read from the cache, and Spanner is only used if I want strong reads for a particular object. If I want cached reads across multiple objects, I go to my cache.
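The read-routing rule Uday describes, the cache for consistent snapshot reads and Spanner only for strong reads, can be sketched as a small policy layer (the store and cache interfaces here are illustrative assumptions, not Uber's actual APIs):

```python
class ReadRouter:
    """Serve snapshot reads from an on-prem cache; hit the store only for strong reads."""

    def __init__(self, store, cache):
        self.store = store    # strongly consistent store (Spanner, in Uber's case)
        self.cache = cache    # point-in-time snapshot cache, keyed by entity id

    def read(self, keys, strong=False):
        if strong:
            # Strong read: must observe the latest committed write.
            return {k: self.store[k] for k in keys}
        # Snapshot read: cached entities come from the same point in time,
        # so they are consistent with each other, if slightly stale.
        # On a cache miss, fall back to the store.
        return {k: self.cache.get(k, self.store[k]) for k in keys}
```

With a high cache hit rate, most traffic never leaves the on-prem side, which is exactly the latency and cost argument Uday makes.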
If I want to search across multiple objects, then we have our own search system, which is indexed on the specific properties that we need, so that if I want to get all of the nearby orders that are currently not assigned to anyone, we can do that low-latency search at scale. And obviously, we also emit Kafka events within the Uber stack, so we can build all sorts of near-real-time or OLAP applications, and it also goes to raw tables, and you can build more derived tables using Spark jobs; all of those things happen within Uber's infrastructure. And we use Spanner for strong reads and the core transactions that we want to commit across all of the entities, establishing those relationships that I mentioned.

So George, coming back to the premise: this is how you've taken, Uday, these business entities, drivers, riders, routes, ETAs, orders, and reconciled the trade-offs between latency, availability, and consistency.

Would it be fair to say, Uday, that because you did such a good job matching the things in the application to the things in the database, you were able to inherit the transactional strengths of the database at both layers and simplify coordination at the application level? And that you also did something that people talk about but don't do much, which is a deep hybrid architecture, where you had part of the application on-prem and part using a Google service that you couldn't get elsewhere, on Google Cloud?

Yeah, absolutely. And I think one more interesting fact is that most engineers don't even need to understand that behind the scenes it's being powered by Spanner or any particular database. The guarantee that we provide to application developers who are building fulfillment flows is: they have a set of entities, and they say, hey, for this user action, these are the entities that need to be transactionally consistent, and these are the updates I want to make to them.
And then behind the scenes, our application layer leverages Spanner's transaction buffering, makes the updates to each and every entity, and then, once all the updates are made, we commit. Then all the updates are reflected in storage, so that the next strong read will see the latest update.

So the database decision obviously was very important. We're curious: what was it about Spanner that led you to that choice? It's a globally consistent database; what about it made it easier for all the applications' data elements to share their status? You said you did a detailed evaluation; how did you land on Spanner?

Yeah, any choice like this has a lot of dimensions that we evaluate. One is that we wanted to build using a NewSQL database, because we want the mix of the ACID guarantees that SQL systems provide and the horizontal scalability that NoSQL-type systems provide. And for building large-scale applications using NewSQL databases, at least around the time when we started, there were not that many examples to choose from; even within Uber, we were kind of the first application managing live orders using a NewSQL-based system. But the specific properties that we need are, first, external consistency, like I mentioned: Spanner provides the strictest concurrency-control guarantee for transactions, so that when transactions are committed in a certain order, any read after that sees the latest data. That is very important, because imagine we assigned a particular job to a specific driver or courier, and then the next moment, if we see that, oh, this driver is not assigned to anyone, we might make the wrong business decision and assign them one more trip, and that will lead to wrong outcomes. And then horizontal scalability: because Spanner automatically shards and rebalances the shards, we get that horizontal scalability.
In fact, we have our own auto-scaler that listens to our load and to Spanner signals and constantly adds and removes nodes, because Uber's traffic pattern changes based on the time of day, the hour of the day, and the day of the week; it's very curvy. So we can make sure we have the right number of nodes provisioned to handle the scale at that point in time. I've mentioned the server-side transaction buffering; that was very important for us, so that we can have a modular application in which each entity we represent can commit its update independently, with a layer above coordinating across all of these entities, and once all of the entities have updated their part, we commit the overall transaction. So the transaction buffering on the server side helped us make the application layer modular. Then all the things around stale reads, point-in-time reads, bounded-staleness reads: these helped us build the right caching layer, so that our cache hit rate is probably in the high 60s to 70 percent. For most reads, we can go to our on-prem cache, and only on a cache miss, or for strong reads, do we go to our storage system. So these were the key things we wanted from NewSQL, and Spanner was the one, partly because of time to market: it was already productionized and we could leverage that solution. But all of these interactions sit behind an ORM layer with the guarantees that we need. That will help us, over time, figure out whether we need to evaluate other options, but right now, most developers don't need to understand what is powering things behind the scenes.

Yeah, and the outcome for your customers is pretty remarkable.
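As a quick aside, the auto-scaler Uday described can be sketched as a simple control loop over a load signal (the thresholds and the CPU-utilization signal here are illustrative assumptions, not Uber's actual policy):

```python
def target_nodes(current_nodes, cpu_utilization,
                 high_water=0.65, low_water=0.35, min_nodes=3):
    """Pick a node count that keeps utilization inside the band.

    Traffic is "curvy" by hour and day of week, so a loop like this runs
    continuously: scale up aggressively, scale down conservatively.
    """
    if cpu_utilization > high_water:
        # Scale up proportionally to how far past the high-water mark we are.
        return max(min_nodes, int(current_nodes * cpu_utilization / high_water) + 1)
    if cpu_utilization < low_water:
        # Shed one node at a time to avoid flapping.
        return max(min_nodes, current_nodes - 1)
    return current_nodes
```

The asymmetry matters: under-provisioning hurts a latency-sensitive system immediately, while over-provisioning only costs money for a few control-loop iterations.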
I mean, George and I would say we're really interested in, and George was alluding to this before, the aspects of the system that enable this coherency across all the data elements the system has to manage. In other words, your ability to get agreement on the meaning of a driver, a rider, a price, et cetera, and how you designed and achieved that layer to enable that coherence. That is tech you had to develop, correct?

Yeah, absolutely. There are many objects. Also, we need to really think about which attributes of what a user sees in the app need to be coherent, and which can be somewhat stale without you necessarily noticing, because not everything needs the same guarantees, the same latency, and so on, right? So if you think about some of the attributes that we manage: we talked about the concept of orders. If a consumer places any intent, that is an order within the system, and a single intent might require us to decompose it into multiple sub-objects. For example, if you place an Uber Eats order, there is one job for the restaurant to prepare the food, and there is one job object for the courier to pick up and drop off. And within the courier's job object, we have many waypoints: the pickup waypoint, the drop-off waypoint. Each waypoint can have its own set of tasks to perform; for example, taking a signature, taking a photo, paying at the store, all sorts of tasks, right? And all of these are composable and leverageable, so I can build new things using the same set of objects. And in any kind of marketplace, we have supply and demand, and we need to ensure the right kinds of dispatching and matching paradigms. In some cases, we offer one job to one supply. In some cases, it could be many to one. In some cases, it is blasted to many supplies. In some cases, they might see some other surface with all of the nearby jobs that they can potentially handle.
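The decomposition Uday just walked through, one consumer intent fanning out into jobs, waypoints, and tasks, can be sketched with hypothetical types (the class names and task kinds are illustrative assumptions, not Uber's actual schema):

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    kind: str                      # e.g. "take_signature", "take_photo", "pay_at_store"

@dataclass
class Waypoint:
    place: str                     # e.g. "restaurant", "customer_door"
    tasks: list = field(default_factory=list)

@dataclass
class Job:
    assignee: str                  # "merchant" or "courier"
    waypoints: list = field(default_factory=list)

@dataclass
class Order:
    """One consumer intent, decomposed into composable sub-objects."""
    order_id: str
    jobs: list = field(default_factory=list)

# An Uber Eats order: one job for the restaurant, one for the courier.
order = Order("order-1", jobs=[
    Job("merchant", waypoints=[Waypoint("restaurant", tasks=[Task("prepare_food")])]),
    Job("courier",  waypoints=[
        Waypoint("restaurant",    tasks=[Task("pay_at_store")]),
        Waypoint("customer_door", tasks=[Task("take_photo")]),
    ]),
])
```

Because the pieces compose, new fulfillment flows reuse the same objects; a batch offer, for instance, is just an association of one driver with several jobs rather than a new kind of entity.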
So this is another set of objects which is super real-time, because when, say, a driver sees an offer card in the app, it goes away in 30 seconds; within 30, 40 seconds, they need to make a decision, and based on that, we have to figure out the next step, because with Uber's application, we have changed users' expectations of how quickly we can perform things. If you are off by a few seconds, they will start cancelling. Then, Uber is hyper-local. We have a lot of attributes around latitude, longitude, the route line, the driver's current location, our ETAs. These are probably some of the hardest to get right, because we constantly ingest the current driver location every four seconds. We have a lot of latitude-longitude updates; the throughput of this system alone is in the hundreds of thousands of updates per second. But not every update requires us to change the ETA: your ETA is not changing every four seconds, and your route line is not changing every four seconds. So we do some magic behind the scenes: have you crossed city boundaries? Only then might we require you to update something. Have you crossed some product boundaries? Only then do we require you to do some things. We make those inferences to limit the number of updates that we write to the core transactional system, and we only store the data that we need. And then there's a completely parallel system that manages the whole pipeline of how we receive the driver's side of the equation and generate navigation and such for drivers, and how we convert these updates and show them on the rider app. That stream is completely decoupled from the core orders and jobs. And if you think about the Uber system, it's not just about building the business platform layer; we have a lot of our own sync infrastructure at the edge API layer, because we need to make sure all of the applications' data is kept in sync.
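The boundary-crossing inference Uday mentioned, dropping location pings that don't change anything the core system cares about, might look roughly like this (the city and zone fields and the ETA threshold are illustrative assumptions):

```python
def should_write_update(prev, curr, eta_threshold_s=60):
    """Decide whether a four-second location ping warrants a transactional write.

    Most pings change neither the ETA nor any boundary, so they are dropped
    from the core store and handled only by the parallel location pipeline.
    """
    if prev is None:
        return True                            # first ping: nothing on record yet
    if curr["city"] != prev["city"]:
        return True                            # crossed a city boundary
    if curr["zone"] != prev["zone"]:
        return True                            # crossed a product/dispatch boundary
    if abs(curr["eta_s"] - prev["eta_s"]) >= eta_threshold_s:
        return True                            # ETA moved enough to matter
    return False
```

A filter like this is what turns hundreds of thousands of raw updates per second into a much smaller write load on the transactional core.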
The apps are going through choppy network conditions, which can be unreliable, and we need to make sure they get the updates as quickly as possible, with low latency, irrespective of what kind of network condition they are in. So there are a lot of engineering challenges at that layer as well. Ultimately, all of this works together to give you the visibility that, hey, I can see exactly what's going on, because if you're waiting for your driver and they don't move, you might cancel, assuming they might not show up. We need to make sure those updates flow, not just through our system, but also from our system back to the rider app, as quickly as possible.

So, George, you had a question?

Yeah, this is something new. We're in new territory, at least as far as what we've explored before, Dave. What I'm taking away is that you're not just managing this layer at the application, where you've got Uber's entities or things, and translating that down to the database and the database's transactional semantics, making it easier to manage and orchestrate those things. What you're describing is something where the data's liveliness is an attribute that makes managing it separate from just mapping it down to the database. You manage how each piece of data gets updated and how it gets communicated separately, based on properties specific to that data element. And by data element, I mean a property, not a driver or a courier. That is interesting because, Dave, just as a comment, Walmart talked about prioritizing data for communications from stores and the edge. And that may lead into a follow-on question; sorry for the long preamble, but the question I have today is: what happens when you are orchestrating an ecosystem with 10 or 100 times as many things as you have now, and more data on all those things than you have now?
Have you thought about what a world looks like where the centralized database may not be the central foundation?

See, I think that's where the trade-offs come in. We need to be really careful about not putting so much data into the core system that manages these entities and relationships that we overwhelm it; otherwise we'll end up hitting scale bottlenecks. For example, the fare that you see on the rider app or the driver app is made up of hundreds of line items, with different business rules specific to different geos, different localities, different tax items. We don't store all of that in the core object. But one attribute of a fare that we can leverage is that a fare only changes if the core properties of the rider's requirements change. So every time, say, you change your drop-off, we regenerate the fare. I have one fare UUID; every time we regenerate, we create a new version of that fare and store those two IDs along with the core order object. So I can store, in a completely different system, my fare UUID, my fare version, and all of the line items, along with all of the context we used to generate those line items, because what we need to save transactionally is just the fare UUID and version. When we save the order, we don't need to save all of the fare attributes along with it. So these are some of the design choices we make to limit the amount of data that we store for these entities. In some cases, we might store the data. In some cases, we might version the data and store a reference along with it. In some cases, if it is okay for that data to be stale and it doesn't need to be coherent with the core orders and jobs, it can be saved in a completely different online storage. And then, at the presentation layer, where we generate the UI screen, we can enrich this data and generate the screen that we need.
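The fare design Uday describes can be sketched concretely: the core system transactionally commits only a (fare UUID, version) reference, while the heavyweight line items live in a separate store keyed by that pair (all names here are illustrative assumptions):

```python
import uuid

fare_store = {}   # separate, non-transactional storage for full fare payloads
orders = {}       # stand-in for the core transactional system

def record_fare(fare_uuid, version, line_items):
    # Hundreds of line items, tax rules, locality context: none of it needs
    # to be coherent with the core order, so it lives out here.
    fare_store[(fare_uuid, version)] = line_items

def save_order(order_id, fare_uuid, fare_version):
    # The core system commits only the lightweight reference.
    orders[order_id] = {"fare_uuid": fare_uuid, "fare_version": fare_version}

def regenerate_fare(order_id, line_items):
    # The drop-off changed, so bump the version and re-point the order at it.
    ref = orders[order_id]
    new_version = ref["fare_version"] + 1
    record_fare(ref["fare_uuid"], new_version, line_items)
    ref["fare_version"] = new_version

fid = str(uuid.uuid4())
record_fare(fid, 1, [("base", 7.00), ("booking_fee", 2.50)])
save_order("order-1", fid, 1)
regenerate_fare("order-1", [("base", 9.00), ("booking_fee", 2.50)])
```

Old versions remain addressable for audit or receipts, but the transactional write path never carries more than the two small identifiers.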
So all of this makes sure that we limit the growth in scale of the core transactional system, and we leverage other systems that are better suited to the specific needs of those data attributes, but all of them still tie into the order object, and there's an association that we maintain.

So this is really important, and we're going to revisit this as a guide to the future. But I just want to take a pause and reset here, so that the audience understands that what Uber has built is different, of course, from conventional apps. We tried to put this together in a slide to describe the 3.0 apps; Alex, if you'd bring up the next one. Starting at the bottom, you have the platform resources, then the data layer that provides the single version of the truth, and then the application services that govern and orchestrate the digital representations of the real-world entities: drivers, riders, packages, et cetera. And all of that supports what the customer sees in the Uber app. So the big difference from the cloud stack that we all know and love is that Uber's not selling us compute or storage; we don't even see that. Uber's offering up things: access to drivers and merchants and services. And so Uday, where are the lines, in your thinking, between commercial off-the-shelf software that you were able to use and the IP that Uber had to develop itself to achieve these objectives? Can you describe that thinking and what went into build versus buy?

Yeah, in general, we rely on a lot of open-source technologies, commercial off-the-shelf software, and in some cases, in-house-developed solutions. Ultimately, it depends on, you know, the specific use case, time to market, whether we want to optimize for cost or optimize for maintainability.
All of these factors come into the picture. For the core orders and the core fulfillment system, we talked about Spanner and how we leverage it for some specific guarantees. We use Spanner even for our identity use cases, where, especially in large organizations, you want to make sure your business rules, your AD groups and so on are captured for our consumers and stay in sync. But there are a lot of other microservices across Uber that leverage Cassandra if their use case is high write throughput, and we leverage Redis for all kinds of caching needs. We leverage etcd and ZooKeeper for low-level infrastructure platform storage needs. And we also have a system built on top of MySQL with a Raft-based consensus algorithm, called Docstore. For the majority of use cases, that is our go-to solution: it provides shard-local transactions, and it's a multi-model database. So it's useful for most kinds of use cases, and it's optimized for cost because we manage the stateful layer ourselves and deploy it on our own nodes. For most applications, that gives us the balance of cost and efficiency. For applications that need the strongest guarantees, like fulfillment or identity, we use Spanner, and for high write throughput we use Cassandra. And beyond this, I could mention our metrics system, M3DB. It's open-source software, open-sourced by Uber and contributed to the community a few years ago. It's a time-series database; we ingest millions of metric data points per second, and we had to build something of our own, and now it's an active community, and there are a bunch of other companies leveraging M3DB for metric storage. So ultimately, in some cases we might have built something and open-sourced it, in some cases we leverage off-the-shelf software, and in some cases we use something completely open source and contribute new features.
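(A rough distillation of the storage choices Uday walks through above, expressed as a decision sketch. This is our hypothetical summary, not Uber's actual routing logic; as he notes, the real choice also weighs cost, maintainability and time to market.)

```python
def pick_datastore(needs: set) -> str:
    """Hypothetical decision sketch mapping use-case requirements to the
    storage technologies discussed in this episode."""
    if "strict-transactions" in needs:     # fulfillment, identity
        return "Spanner"
    if "high-write-throughput" in needs:   # heavy-ingest services
        return "Cassandra"
    if "caching" in needs:
        return "Redis"
    if "infra-coordination" in needs:      # low-level platform state
        return "etcd/ZooKeeper"
    if "metrics-time-series" in needs:     # millions of points per second
        return "M3DB"
    # Default: Docstore, sharded MySQL with Raft-based consensus,
    # shard-local transactions, multi-model, self-managed for cost.
    return "Docstore"
```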
For example, for the data lake, Uber pioneered Apache Hudi back in 2016 and contributed it to the community. So now we have one of the largest transactional data lakes, with maybe 200-plus petabytes of data that we manage. Got it. Okay, this next snippet that we're going to share comes from an ETR roundtable; ETR is our data partner, and they do these private roundtables. We'll pull it up, and I'll read the quote from a pretty famous technical guru who's going to remain unnamed, only because I'm not sure I have permission to name this individual. He says everybody in the world is thinking about real-time data, and whether it's Kafka specifically or something that looks like Kafka, real-time stream processing is fundamental. When people talk about data-driven businesses, they very quickly come to the realization that they need real-time, because that's where there's more value. Architectures built for batch don't do real-time well. The same person mentioned Cockroach, saying it's super exciting. "I feel weird endorsing a small startup," he said, "but Google Spanner is amazing, and Cockroach is the closest thing that you could actually buy off the shelf and run yourself, rather than be married to a managed service from a single cloud vendor." So Uday, a couple of questions here. I'm curious as to how you changed the engine in mid-flight, going from the previous architecture, pre-2014, to the new one. And George mentioned what happens when real-time overwhelms the centralized database's ability to manage all this data in real time; it sounds like you architected quite a runway to avoid that. But talk about those two questions: how'd you change the engine in mid-flight, and when do you see it running out of gas? Yeah, the first question.
Now, designing a new greenfield system is one thing, but moving from whatever you have to that greenfield system is 10x harder, and the hardest engineering challenge we had to solve was how we go from A to B without impacting any user. We don't have the luxury of downtime where, hey, we're going to shut off Uber for an hour and then do this migration behind the scenes. The previous system was using Cassandra with an in-memory queue, and the new system is strongly consistent. How do you migrate when the core database guarantees are different and the application APIs are different? So what we had to build was a proxy layer so that, for any user request, we have backward compatibility. We shadow what is going to the old system and the new system, but because the properties of which transactions get committed in old and new are also different, it's extremely hard to even shadow and get the right metrics to gain confidence. So that is the shadowing part. And then what we did was tag a particular driver and a particular order that gets created, whether it's created in the old system or the new system. And then we gradually migrate all of the drivers and orders from old to new. So at a point in time, you might see that the marketplace is kind of split, where half of those orders and earners are in the old system and half of them are in the new, and then once all of the orders are moved, we switch over the state of the remaining earners from old to new. So one, we had to solve a lot of unique challenges on shadowing, and two, we had to do a lot of unique tricks to make sure we give the perception of no downtime and then move that state without losing any context, without losing any jobs in flight and so on.
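(The shadowing approach Uday describes could be sketched as follows. The names are hypothetical; and, as he notes, because commit semantics differ between the two stacks, the comparison is only an approximate confidence signal, never something surfaced to the user.)

```python
class Metrics:
    """Records whether the shadow response matched the primary one."""
    def __init__(self):
        self.matches = []

    def record(self, match: bool):
        self.matches.append(match)

def shadow_proxy(request, old_stack, new_stack, metrics: Metrics):
    """Serve every request from the old stack; replay a copy against the
    new stack and compare, so migration confidence builds without any
    user-visible change. Stacks are modeled as simple callables here."""
    primary = old_stack(request)        # the user-visible response
    try:
        shadow = new_stack(request)     # fire-and-compare copy
        metrics.record(match=(shadow == primary))
    except Exception:
        # A failing shadow must never affect the user; just count it.
        metrics.record(match=False)
    return primary
```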
And then if there is a driver who's currently completing a trip in the old stack, we let that complete, and the moment they are done with that trip, we switch them to the new stack, so that their state is not transferred midway through a trip. So once you create new trips and new earners in the new stack, and switch the others after they complete a trip, we have a safe point to migrate. This is similar to 10 years ago, when I was at VMware and we used to work on vMotion, live virtual machine migration from one host to another. This was that kind of challenge: what is the point at which you can move the state without any application impact? So those are the kinds of tricks we had to do. And the second question, how do we make sure we don't run out of gas? We kind of went through that. One, obviously, we are doing our own scale testing and our own projected-growth testing to make sure that we are constantly ahead of our growth and that the system can scale. And then we are also very diligent about looking at the properties of the data and choosing the right technology, so that we limit the amount of data we store in that system and use specific kinds of systems catered to those use cases. For example, if the matching system wants to query all of the nearby jobs and nearby supply, we don't go to the transactional system to query that. We have our own in-house search platform, where we are doing real-time ingestion of all of this data using CDC, and then we have all kinds of rankers so that we can do real-time, on-the-fly generation of all of the jobs, because the more context you have, the better marketplace optimizations you can make. That gives you the kind of efficiency at scale; otherwise, we'll make imperfect decisions, which will hurt the overall marketplace efficiency.
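(The safe-point rule Uday describes, letting the in-flight trip finish before flipping the driver's home stack, could be sketched like this. The classes and method names are illustrative only, not Uber's actual code.)

```python
class Stack:
    """Minimal stand-in for an old or new fulfillment stack."""
    def __init__(self):
        self.drivers = {}          # driver id -> migratable state
        self.active_trips = set()  # drivers currently mid-trip

    def has_active_trip(self, driver) -> bool:
        return driver in self.active_trips

    def export_state(self, driver):
        return self.drivers[driver]

    def adopt(self, driver, state):
        self.drivers[driver] = state

    def release(self, driver):
        del self.drivers[driver]

def maybe_migrate(driver, old_stack: Stack, new_stack: Stack) -> bool:
    """Never move a driver mid-trip: wait for the in-flight trip on the
    old stack to complete, then flip the driver's home stack so all new
    trips are created in the new system."""
    if old_stack.has_active_trip(driver):
        return False  # not at a safe point yet; retry after the trip ends
    new_stack.adopt(driver, state=old_stack.export_state(driver))
    old_stack.release(driver)
    return True
```

The same shape applies to any stateful migration: define the safe point where application-visible state is quiescent, and only transfer there.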
Yeah, and in your blog post, you had said that you had to build this architecture to support your business for the next decade. So if I'm inferring correctly, you don't see, at least in the near term, all these data elements and all this real-time data overwhelming the system, because of the way you've architected it. Is that a fair assertion? Yeah, yeah, absolutely. I'm confident, at least for the foreseeable future. What we have is a stable foundation, and since then, you can see the kinds of new use cases we are building, right? Like Uber Reserve; now you can reserve 30 days in advance. Now we have entered into grocery. We are doing shopping, where a courier goes and shops for you. Recently you might have seen announcements on Party City, on PetSmart. So we want to make sure that we can go anywhere and get anything. We can unbundle every use case that you need a car for and provide an affordable, scalable transportation solution, so that we can handle all of your mobility needs on demand, at scale, at your fingertips. And we can capture every single merchant in the world in a system, every single catalog, every single item, and manage relationships across all of them. We have millions and millions of catalog items around the world, so that you can go and get anything that you need: whether it's food, whether it's alcohol, whether it's some party item, whether it's some pet food, whether it's convenience, whether it's pharmacy, everything is handled. So at least right now, I'm confident that we can scale to those needs and that we have a system that can scale to those needs. Right. You know, the last question is, George and I have been looking to the future, using Uber as an example of it.
So what do you see coming, or what do you hope to see, if you think about the broader industry with respect to commercial tools over the next three to five years, that might make it dramatically easier for a mainstream company, one that doesn't necessarily have Uber's technical bench and depth, to build this type of application? In particular, how might other companies that need to manage hundreds of thousands of digital twins design their applications using more off-the-shelf technology? Do you expect that will be possible in, let's call it, the midterm future? Yeah. You know, I think the whole landscape around developer tools and applications is a rapidly evolving space. What is possible now was not possible five years ago, and it's constantly changing. But what we see is, we need to provide value at the upper layers of the stack, right? And wherever there is some solution that can provide something off the shelf, we move to that, so that we can focus up the stack. It's not just taking off-the-shelf IaaS or PaaS solutions; it's the sheer complexity of representing configuration, representing the geo-diversity around the world, and then building something that can work for any use case in any country, adhering to those specific local rules. That is what I see as the core strength of Uber. We can manage any kind of payment or disbursement in the world; we have support for just about any payment method around the world. For earners, we are disbursing billions of payouts to whatever bank account and whatever payment method they need their money in. We have a risk system that can handle nuanced use cases around risk and fraud. Our system around fulfillment manages all of this. Our system around maps manages all of the ground truth: tolls, surcharges, navigation, all of that.
So we have probably one of the largest global map stacks, where we manage our own navigation while leveraging some data from external providers. So this is the core IP and core business strength of Uber, and that is what is allowing us to do many verticals. But again, the systems that one can use to build this, over time, absolutely, I see it becoming easier for many companies to leverage them. Fifteen years ago, we didn't have Spanner, so it was much harder to build this. Now, with Spanner or similar NewSQL off-the-shelf databases, one part of the challenge is solved, but then we need to think about the other layers of the challenge. George, I'm so excited that Uday was able to come on, because you and I have been talking about this as the future, and I think Uday just solidified it. And I think we set a new record for Breaking Analysis in terms of time. But George, what are your takeaways? Any last words you would add before we break? I think the takeaways are, this is one of those applications that people will look back on many years from now and say that really was the foundation for a new way of doing business, not just of building software but of doing business. Amazon was the first one to manage their own internal processes, orchestrating the people, places and things with an internal platform, but you guys did it for an external ecosystem and made it accessible to consumers in real time. And I think the biggest question I have, and it's not really one that you can answer, but one that we'll have to see the industry answer, is to what extent the industry will make technology that makes it possible for mainstream companies to start building their own Uber-like platforms to manage their own ecosystems. That's my takeaway and my question. Yeah, okay, we're going to leave it there. Uday, thanks so much.
I really appreciate your time and your insights, and we'd love to have you back. Yeah, absolutely. Anytime, ring me up, I'll be there. Anytime. Thanks so much. It was a pleasure talking to both of you today and being on Breaking Analysis. Fantastic. On behalf of George Gilbert, I want to thank Uday and his team for these amazing insights on the past, present and future of data-driven apps. I also want to thank Alex Myerson, who's on production and manages the podcast, and Ken Schiffman as well. Kristen Martin and Cheryl Knight helped get the word out on social media and in our newsletters, and Rob Hof is our editor-in-chief over at siliconangle.com. Thank you so much, everybody. Remember, all these episodes are available as podcasts. All you've got to do is search Breaking Analysis podcast. Pop in the headphones and go for a long walk on this one. I publish each week on wikibon.com and siliconangle.com. You can email me directly at david.vellante@siliconangle.com, DM me @dvellante, or comment on our LinkedIn posts, and check out etr.ai; they've got great survey data on enterprise tech. This is Dave Vellante for theCUBE Insights powered by ETR. Thanks for watching, and we'll see you next time on Breaking Analysis.