Good morning. Thank you to the Linux Foundation for inviting me back to speak here today. Many thanks also to my colleagues on the Uber infrastructure team who helped me with this presentation, and a special shout-out to the network and software platform teams. I'm here to talk about some of the great work our teams are doing to build and manage a software-defined infrastructure at Uber scale. Today, I will first say a few words about who we are and share some numbers to highlight Uber's growth and scale. I will then talk about our vision and some of our architectural principles for software-defined infrastructure, follow on with a few specific real-world examples of how we leverage software to automate the infrastructure, and conclude by providing an end-to-end view of our global network.

Who is Uber? Uber started in 2010 to solve a simple problem: how do you get a ride at the push of a button? A few years later, Uber was already operating in hundreds of cities around the world. By late 2015, we had completed our one billionth trip. About six months later, we completed the next billion, and less than a year after that, Uber had logged over 5 billion trips. In 2017 alone, we completed 4 billion trips. As of now, about 15 million trips happen each day across more than 600 cities in 78 countries. Users of our platform today include over 75 million monthly active riders and over 3 million active drivers globally. While these numbers speak to Uber's already significant scale, in terms of the overarching opportunity, ride-sharing today accounts for only a low single-digit percentage of total miles driven globally. In addition to our ride-sharing business, we have also been introducing new services that will continue to drive Uber's growth. These, and possibly new services in the future, will continue to set new requirements on our infrastructure in terms of scale, performance, resiliency, agility, and unit cost.

So how do we make sure that our infrastructure can keep pace with such rapid growth? The magic of the Uber app today is powered by a highly distributed software architecture that relies on a fault-tolerant and highly available infrastructure. For us, the only way to deliver the required levels of performance and availability is through software and automation. And in order to fully achieve the benefits of software-based automation, we must always strive to use open, standards-based technologies and avoid dependency on any single vendor across the entire infrastructure stack.

This framework summarizes much of the work that we do on software-defined infrastructure. For us, a key enabler is to first build real-time or near-real-time visibility into the infrastructure state. Leveraging that information, possibly augmented with additional insights from analytics and machine learning, we can then push the desired state of the infrastructure through programmatic interfaces. Now, let me spend a few minutes putting this into context with specific real-world examples of infrastructure automation. We use software throughout the entire infrastructure lifecycle at Uber to automate many things: developing forecasting models, capacity planning, provisioning infrastructure, managing all the changes that we perform, detecting incidents and alerting, and mitigating and remediating when things fail.
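To make that closed loop a little more concrete, here is a minimal sketch, in Python, of the observe-analyze-push pattern just described. The device inventory, the telemetry reader, and the push interface (collect_observed_state, render_desired_state, reconcile) are hypothetical names for illustration under assumed data shapes, not a description of Uber's actual platform.

```python
# Minimal sketch of a desired-state reconciliation loop: observe the
# infrastructure, compare against the intended state, and push changes
# through a programmatic interface. All names and data shapes are illustrative.

from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceState:
    hostname: str
    config_hash: str      # hash of the running configuration
    healthy: bool         # result of automated validation checks


def collect_observed_state(inventory: list[str]) -> dict[str, DeviceState]:
    """Gather near-real-time state, e.g. from streaming telemetry (stubbed)."""
    return {name: DeviceState(name, config_hash="unknown", healthy=True)
            for name in inventory}


def render_desired_state(source_of_truth: dict) -> dict[str, str]:
    """Derive the intended per-device config hash from vendor-agnostic data (stubbed)."""
    return {name: attrs["config_hash"] for name, attrs in source_of_truth.items()}


def reconcile(source_of_truth: dict, inventory: list[str]) -> list[str]:
    """Return the devices whose observed state diverges from the desired state."""
    observed = collect_observed_state(inventory)
    desired = render_desired_state(source_of_truth)
    drifted = []
    for name, want_hash in desired.items():
        have = observed.get(name)
        if have is None or have.config_hash != want_hash or not have.healthy:
            drifted.append(name)   # candidates for an automated config push
    return drifted
```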
In the interest of time, I will cover only four specific areas over the next few slides.

For provisioning across our server and network environments, we leverage a number of homegrown software platforms to automate and orchestrate the entire provisioning process. This includes capabilities such as auto-discovery, where, for example, on the network side, we push intelligence down to the devices to enable a truly distributed self-discovery model and zero-touch provisioning. It also includes auto-validation of the state of the hardware, for example to prevent bad devices from going into production.

For automated change management, we also leverage a homegrown platform that serves as the source of truth for the desired state of the infrastructure. This platform enables our engineers to define the infrastructure state in a vendor-agnostic way through configuration data, and it is that data that drives device configuration generation, requiring no direct human interaction with servers or network devices. The platform also provides infrastructure-as-code capabilities, so that changes are versioned with rollback and can be audited, reviewed, and ported across platforms, for example between on-prem and cloud if needed.

For auto-detection, we leverage a distributed and highly available platform. On the network side, we do both active and passive monitoring, leveraging streaming telemetry that gives us near-real-time visibility into the state of the network, including network reachability, network latency, packet loss, and link utilization. One area that we have found somewhat challenging in this space is the current vendor support for standards-based YANG models, as well as the lack of full support for OpenConfig.

Auto-mitigation and auto-remediation are also areas where we heavily leverage software to improve our operational efficiency. When hardware fails, not only do we have to ensure that the issue is mitigated quickly before it becomes a service-impacting incident, we also automate the back-end workflows to automatically generate troubleshooting and/or RMA tickets. If needed, we can also run auto-diagnostic tests, perform auto-remediation, and carry out failure prediction, for example by monitoring specific metrics or by running specific playbooks.

We do a lot more with software automation than what these four slides covered, and unfortunately I will not have time to cover all of it. But one other aspect I want to mention is that when you rely on software to automate your infrastructure, you have to be sure that the software is thoroughly tested. Our approach is to create a test environment that not only provides the capabilities needed for the traditional software test cycles, such as feature testing, regression testing, and integration testing, but also enables us to deploy and use the tested software to provision, monitor, and configure the test environment itself.
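As one illustration of the change-management idea above, where vendor-agnostic data drives device configuration generation, here is a minimal sketch. The data model and the per-vendor renderers are hypothetical and only meant to show the pattern; a production platform would cover far more than a single interface description.

```python
# Sketch: one vendor-agnostic description of an interface, rendered into
# two different CLI syntaxes. The data model and renderers are illustrative only.

INTERFACE = {
    "name": "Ethernet1",
    "description": "rack-sw-01 uplink",
    "ipv4": "10.0.0.1/31",
    "mtu": 9214,
}


def render_vendor_a(intf: dict) -> str:
    """Render the intent in an industry-standard, section-style syntax."""
    return "\n".join([
        f"interface {intf['name']}",
        f"  description {intf['description']}",
        f"  mtu {intf['mtu']}",
        f"  ip address {intf['ipv4']}",
    ])


def render_vendor_b(intf: dict) -> str:
    """Render the same intent in a set-command style syntax."""
    addr, prefix = intf["ipv4"].split("/")
    return "\n".join([
        f"set interfaces {intf['name']} description \"{intf['description']}\"",
        f"set interfaces {intf['name']} mtu {intf['mtu']}",
        f"set interfaces {intf['name']} unit 0 family inet address {addr}/{prefix}",
    ])


# The same source-of-truth data produces both configurations, so a vendor or
# platform swap does not require touching the intent itself.
if __name__ == "__main__":
    print(render_vendor_a(INTERFACE))
    print(render_vendor_b(INTERFACE))
```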
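The auto-mitigation flow described above can likewise be pictured as a small event-driven pipeline: detect a failure from telemetry, drain the affected element before it becomes service-impacting, then open the back-end troubleshooting or RMA ticket automatically. The event shape, threshold, and helper functions here are hypothetical stubs for illustration.

```python
# Sketch of an auto-mitigation pipeline: telemetry event in, traffic drained,
# RMA/troubleshooting ticket out. All functions are illustrative stubs.

from dataclasses import dataclass


@dataclass(frozen=True)
class LinkEvent:
    device: str
    interface: str
    loss_pct: float        # packet loss observed on the link


LOSS_THRESHOLD_PCT = 0.5   # assumed threshold; real values depend on the SLO


def drain_link(device: str, interface: str) -> None:
    """Shift traffic away from the faulty link (e.g. raise its IGP metric) -- stub."""
    print(f"draining {device}/{interface}")


def open_rma_ticket(device: str, interface: str, detail: str) -> str:
    """Create a troubleshooting/RMA ticket in the back-end system -- stub."""
    ticket_id = f"RMA-{device}-{interface}"
    print(f"opened {ticket_id}: {detail}")
    return ticket_id


def handle_event(event: LinkEvent) -> None:
    """Mitigate first, then kick off the remediation workflow."""
    if event.loss_pct < LOSS_THRESHOLD_PCT:
        return                                    # below threshold, no action
    drain_link(event.device, event.interface)     # mitigate before impact
    open_rma_ticket(event.device, event.interface,
                    detail=f"{event.loss_pct:.2f}% packet loss")
```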
So I will now talk for a few minutes about the global network, starting with the data center. For us, the network is a foundational layer of infrastructure and a key enabler for our business. As such, network resiliency, with a focus on deterministic failure behavior, is one of our top design principles. Operational efficiency is also a key objective, meaning that the network has to be simple to build while also being flexible and cost-effective. This is an example data center design, which in this case is based on a six-plane, multi-stage Clos topology.

Starting from the bottom: rack switches for server aggregation, pod switches to connect the racks, fabric switches for connectivity between pods, external fabric switches for connectivity toward a data center aggregation layer, and then an edge pod for connectivity toward the WAN and redundant POPs. This design is quite flexible in that its specific implementation can be fine-tuned and adjusted across many dimensions, for example depending on requirements for oversubscription, the size of the failure domain, bandwidth capacity, power, or the need to scale across multiple fabrics.

Here are some examples of how this design can be implemented in different ways. In the simplest scenario, you can imagine one campus with one building or data center and one fabric connecting to the external world. Or you can have multiple buildings or data centers on the same campus, and multiple campuses within a metro area. In this example, one edge domain is shared between two fabrics in two different buildings. Similarly, you can have additional fabrics deployed across different buildings and different campuses, connected through a data center aggregation layer with resilient, redundant, low-latency fiber. With this design, you can now have availability zones, where each zone is an independent failure domain. That means availability zones do not share a common physical failure domain, whether that is network hardware in the edge pod, racks, power, or cooling. And once you have availability zones, you can combine zones that are in nearby physical locations, typically a few milliseconds apart, into regions. From an availability perspective, these regions can effectively function like cloud providers' regions, and your software developers and software teams can deploy across these availability zones seamlessly, whether on-prem or in the cloud.

Our global network includes not only the data center network and POPs, but also the metro and backbone networks that connect POPs and data centers together. We also have an access network for our R&D sites. Across each of these domains, we are working on a technology roadmap, looking into next-generation technologies, and we would love to work with this community on solving some of these big challenges together. On the access side, we are looking into SD-WAN and next-generation capabilities to support full line-rate crypto at 100 gig at the right price point. In the metro, we are leveraging DWDM for more flexible and scalable connectivity at a lower unit cost. On the backbone side, and generally in the WAN space, we are moving away from static, long-term contract models toward a more flexible approach, preferably SDN-controlled, on-demand spectrum as a service. We are also exploring ideas and future models where regional and long-haul bandwidth could be more on-demand and usage-based, like cloud services, with carriers serving as spectrum brokers. One key enabler here will be the adoption of flex-grid open line systems by carriers. Finally, on the data center side, in addition to the software-defined capabilities that I mentioned, we are also looking into server ODM and a modular rack design to support multiple server types, for example across compute, storage, and AI/machine learning with GPUs and FPGAs. We are also looking at network disaggregation in the data center, which, as my last topic, I am going to cover over the next two or three slides.
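Before moving on to disaggregation, here is a minimal sketch of the availability-zone and region model described above: zones that do not share a physical failure domain are grouped into a region when they sit within a few milliseconds of one another. The zone names, latencies, and threshold are made up purely for illustration.

```python
# Sketch: group availability zones into regions by round-trip latency.
# Zone names, latencies, and the threshold below are illustrative only.

REGION_RTT_THRESHOLD_MS = 5.0   # "a few milliseconds apart"

# Symmetric RTT between pairs of availability zones, in milliseconds.
ZONE_RTT_MS = {
    ("az-a", "az-b"): 1.2,
    ("az-a", "az-c"): 2.8,
    ("az-b", "az-c"): 2.1,
    ("az-a", "az-d"): 24.0,   # too far away to join the same region
}


def rtt(zone1: str, zone2: str) -> float:
    """Look up RTT in either direction; unknown pairs are treated as far apart."""
    return ZONE_RTT_MS.get((zone1, zone2),
                           ZONE_RTT_MS.get((zone2, zone1), float("inf")))


def group_into_regions(zones: list[str]) -> list[list[str]]:
    """Greedy grouping: a zone joins a region only if it is close to every member."""
    regions: list[list[str]] = []
    for zone in zones:
        for region in regions:
            if all(rtt(zone, member) <= REGION_RTT_THRESHOLD_MS for member in region):
                region.append(zone)
                break
        else:
            regions.append([zone])
    return regions


# Example: az-a, az-b, and az-c form one region; az-d ends up on its own.
print(group_into_regions(["az-a", "az-b", "az-c", "az-d"]))
```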
So with merchant silicon taking over networking, there is a great opportunity, especially in the data center space, to look into a disaggregated model that separates network hardware from network software. With the great work happening in the industry, this model can not only bring significant cost savings, it can also enable a much faster pace of innovation, especially if you are considering building your own network OS rather than leveraging a third-party NOS. It would also allow you to quickly develop features in ways not currently possible with the OEM model.

There are certain considerations that have to be taken into account if you are considering adopting a third-party NOS or building your own. First and foremost, you have to ensure that you get feature parity with the NOS solution. You have to think about the hardware abstraction architecture and design that works best for you: should it be kernel-agnostic, where introducing changes and adding support for new hardware can arguably be less risky? You have to think about the code support model that works best for you: should it be user-managed or vendor-managed, and who should do the feature enhancements, the patches, and the vulnerability management work? You also have to ensure that the software has reached a level of maturity that allows it to be put into your production environment. And last, but probably most importantly, you have to make sure that this new solution is fully ready for operational deployment.

With that, thank you very much for your time. I'm now happy to take questions.

Justin, glad you were able to come back again this year as Uber rather than Visa. I was fascinated last year that you developed the plan for SD-WAN as a critical part of the Visa network structure, and now I see you considering SD-WAN again for the access side at Uber. Why? Can you give us more details? What kinds of options are you considering, and can you talk about the vendors you're considering? What are your decision criteria for choosing an SD-WAN approach?

Thank you, Michael. Great to see you again, and thank you for the great question. Yes, with SD-WAN the use case really remains the same: how do you leverage software to manage remote sites in an automated way, and how do you make sure that config changes, or bringing up these devices, can be managed through software? In this case, at Uber, we have some R&D sites that generate a lot of traffic. Some of the SD-WAN solutions that we have looked at cannot easily support 100 gig, if at all, or enable full line-rate crypto, and the current pricing models and cost structures for some of the SD-WAN solutions are not where we want them to be at this point. So we are working with the community and with vendors to make that happen. The use case, as I mentioned, remains the same: at Uber, we have a number of remote sites, and our teams are trying to find a way to bring up these new sites quickly, manage all the initial configuration through software, push subsequent changes through software, and manage the whole environment remotely, in a centralized way, through software. Thank you very much for your question.

Yeah, one more question. This is more along the lines of scalability of the applications. What have you learned as you've ramped up quickly about the ability of microservices to scale in terms of performance and those kinds of things?

I'm sorry, can you repeat the last sentence, please?
Yeah, what have you learned on the application side about the ability of microservices to scale, in both manageability and performance, as you've ramped up?

That's a great question. At Uber, as I mentioned, we have a highly distributed software architecture that is based on a very large number of microservices. What we have learned on the network and infrastructure side is that these microservices rely on a fault-tolerant and highly available infrastructure, and a lot of the traffic generated in the data center comes from the traffic exchanged between these microservices. From our perspective, the network and software platform perspective, we need to ensure that those requirements are well understood and that the infrastructure can actually support the growth of these microservices across different data centers as well as in the cloud. That's my quick answer to your question; I'm happy to discuss offline after my talk. Please?

Hi, Cliff Grossner, IHS Markit. Good to meet you. We just finished a study for the Open Compute Foundation, and I was wondering if you could give a little more color on some of the parameters and considerations around the ODMs you're looking at for servers, some of the decision-making criteria, and also for white box. For example, are you looking at an Open Compute design, or a certified design that you might modify for yourself?

That's a great question. On the white box side, I can tell you that right now we are looking at ways to first enable a disaggregated model. The roadmap would be to go from OEM to a network white box solution based on a third-party NOS and commodity hardware, and then to consider building an Uber NOS, possibly, depending on the use case and other factors. You can also imagine that the next and ultimate stage would be to do ODM on the network device, similar to the work that is happening on the server side. I hope I answered your question. Thank you.

Great. With that, thank you very much again, and thank you very much, Arpit. Thank you.