So, we just heard AT&T DirecTV talk about containers and how they're seeing more usage of Kubernetes, and that's certainly something we see in the user survey stats we showed earlier, and really all across OpenStack deployments. It's interesting to see how, as time moves along, the unit of consumption around compute keeps getting smaller and smaller: from full servers to virtual machines to containers, and even on to functions as a service, or serverless, however you want to phrase it. But what we see with OpenStack users who are combining these technologies is that all of those things still need hardware to run on, and that hardware needs to be dynamic and programmable. Up next, we get to hear from eBay, who runs one of the largest production OpenStack environments in the world, and also one of the largest production Kubernetes environments in the world. They're going to talk about how they combine these technologies and how they manage Kubernetes at scale. So, help me welcome from eBay, Suneet and Uday.

Good morning. I'm here to talk about eBay and how we're running Kubernetes at scale. First, a few obligatory slides on eBay, or just one, actually. eBay probably doesn't require any introduction for most of you: one of the largest online marketplaces in the world. But what you may not know is that the eBay family also includes the Classifieds Group and StubHub, that great site where you can score great deals on event tickets. eBay has a pretty large business: $2.2 billion in quarterly revenue, and at any point in time we have about a billion listings live on the site and 169 million global users. So, fairly significant. And as you can imagine, to run a significant online business, you need a pretty large infrastructure. So around 2012, we started a transformation at eBay and moved to an internal cloud.
What we did was basically double down on OpenStack at that time. Fast forward to 2017, and this is what our cloud looks like today. The OpenStack Foundation folks tell me it's one of the largest ones out there. I'll admit it hasn't been without bumps and bruises; getting to this scale took a lot of learnings, and it wasn't easy. The good news is that with all the investment we've made to get OpenStack running at this scale, we're seeing a lot of benefits in the business. It's been fairly transformative for eBay. It's a multi-tenant cloud; we run all of our dev, QA, and production environments on it. 95% of all eBay traffic, when you go to ebay.com, is served from our internal OpenStack-based cloud, and we also run all of our mid-tier services on it. It hosts 4,000 applications and manages a bunch of storage. So it's a fairly significant cloud.

But it wasn't enough. Developers are pretty demanding, and ours are no different: they keep looking for more and more, and we were fairly infrastructure-centric. To respond to our developers, we started looking at Kubernetes in 2015. There's a lot to like about Kubernetes, and these are some of the things that were very interesting to us. One was that it's app-centric. OpenStack, with all its goodness, still has a very infrastructure-centric view; it's an IaaS offering, after all. We love the fact that Kubernetes is app-centric, and that it's declarative. Resilience was a big deal; our applications didn't have that much resilience. Open source: we love open source. We had a great experience with OpenStack; we contributed, we benefited, and it's really important to eBay that any major platform we build is built around open source. And then, of course, Docker container support; our developers wanting to run microservices was a fairly significant driver.
And recently Kubernetes also introduced federation, which is awesome, because as a large enterprise we run everything multi-data-center, and with federation capabilities we'll be able to do that here too. This is what Kubernetes looks like at eBay today; this is our installation. Now, this may be significant for some companies, but given the size we're at, I'd say we're still at the very beginning of the journey. I don't think I can say it's widely deployed yet, but we're on a very fast on-ramp. One thing we see is that our customers are asking for Kubernetes not just on VMs, but also on physical servers, on bare metal, and on GPUs. Our Kubernetes is actually powered by OpenStack right now, and in response to our customers, we're developing our Kubernetes offering to support all of those platforms.

Just a few sample workloads on what we're running on Kubernetes. eBay is investing a lot in building our own AI platform. We're building an ad services stack, internal platform automation, and Elasticsearch clusters. So there's a variety of workloads running on Kubernetes, and it's great because we're able to run both stateful and stateless workloads. One of the big issues we struggled with, as I mentioned before, was resilience of applications, and making stateful applications resilient is especially hard. So we're investing a lot in making these platforms run on Kubernetes, and as Kubernetes matures at eBay, we're building these platforms in conjunction.

As I mentioned, getting OpenStack from where it was to running at eBay scale was quite non-trivial; we had a lot of bumps and bruises along the way. And what it takes to take something like Kubernetes and run it at our scale is also non-trivial. These are examples of some of the things we have to do: you have to take Kubernetes and integrate it with OpenStack.
You want to make sure it meets our security standards. You need a whole ecosystem of services around Kubernetes: a global registry, a service registry. We need to make sure our applications are secure. We want to run Kubernetes in a highly available manner. And everything we run on Kubernetes, including Kubernetes itself, has to be in our CMDB model. It needs to be multi-tenant; we run Kubernetes with multiple tenants on the same cluster. Logging and monitoring integration, and so on and so forth. So there's a lot that goes into it at a larger enterprise to take something from software to running it at scale.

And this time we did something differently. When we built OpenStack, we built a lot of point tools to handle capacity management, monitoring, all kinds of things. What we learned was that while they worked, it was very hard to make them work seamlessly in an integrated manner. With Kubernetes, we took a different approach: we took all of our learnings from OpenStack and said, we're going to look at this differently. And we did.

So today I want to introduce TessMaster. We call our internal Kubernetes distribution Tess.io, and TessMaster is a tool for managing the full lifecycle of Kubernetes across multiple providers. A little more color on that: we took a lot of the principles of Kubernetes itself and used them to develop TessMaster. It's model-driven, it's declarative; it's built on the same principles as Kubernetes. Most importantly, for those of you who do distributed systems, you know drift is a huge, huge problem: you can say "this is my desired state," and what you actually see in your environment is completely different. We designed it to be drift-proof. So when we say my Kubernetes cluster needs to have 15 nodes, it makes sure of that, and it's self-healing as well: if any node goes down, it will actually reprovision those nodes.
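The drift-proof, self-healing behavior described here is essentially a desired-state reconciliation loop. Here's a toy sketch of the idea, not the tool's actual code; `NodeProvider` and `reconcile` are invented stand-ins for a real provider such as OpenStack.

```python
# Toy desired-state reconciliation loop. All names here are hypothetical
# illustrations, not the real API of the tool described in the talk.

class NodeProvider:
    """Stand-in provider that tracks a set of node names (e.g. OpenStack VMs)."""
    def __init__(self):
        self.nodes = set()
        self._counter = 0

    def provision(self):
        self._counter += 1
        name = f"node-{self._counter}"
        self.nodes.add(name)
        return name

    def deprovision(self, name):
        self.nodes.discard(name)


def reconcile(provider, desired_count):
    """Move actual state toward desired state; returns (added, removed) nodes."""
    added, removed = [], []
    # Provision until we reach the desired replica count.
    while len(provider.nodes) < desired_count:
        added.append(provider.provision())
    # Drain nodes if we are over the desired count.
    while len(provider.nodes) > desired_count:
        victim = next(iter(provider.nodes))
        provider.deprovision(victim)
        removed.append(victim)
    return added, removed


# Desired state: 15 nodes. Simulate drift by "losing" two nodes, then run
# the loop again: it heals the cluster back to 15.
provider = NodeProvider()
reconcile(provider, 15)
assert len(provider.nodes) == 15
provider.nodes.pop(); provider.nodes.pop()     # simulated node failures
added, _ = reconcile(provider, 15)
assert len(provider.nodes) == 15 and len(added) == 2
```

The key property is that each pass compares actual state against the declared spec and only acts on the difference, so repeated runs converge rather than accumulate drift.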
So TessMaster uses a lot of the things Kubernetes gives its applications, but this time TessMaster runs on Kubernetes to manage Kubernetes. I'm really excited to have Uday here, a lead developer on TessMaster, and he'll be doing a live demo from the eBay data centers for you. Over to you, Uday.

Sure. Thank you, Suneet. So let's get started. Today I'm going to show you a live demo of how we use TessMaster to manage Kubernetes clusters deployed across different availability zones and geographical regions across the globe. eBay, as a global e-commerce website, has to serve consumers all over the world. Over here, you can see that TessMaster has a view into all the different geographical regions and availability zones we have our data centers in; a few of these availability zones can be in a public cloud as well as in a private cloud. And here you can see it also has a view into all the different Kubernetes clusters, and possibly other clusters, that TessMaster has created and is currently managing.

As an operator, I'm going to take you through a flow where I add capacity to a cluster that's running short on resources. If you scroll down this page, you can see the resource utilization of each cluster, so let's pick a cluster that's running hot. This one, for example, is currently 96% used on CPU, so let me log into this cluster and add some capacity to it. This is a more detailed and more interesting view of the cluster and the different compute nodes that make it up. It also shows you the cluster endpoint and the version of Kubernetes currently running on this cluster. You see two different things here. One is a compute node: a compute node is a representation of a physical or virtual machine that's running an operating system and sitting in some L2 domain.
You can also see another object called a node pool. As an operator, by defining a node pool, I'm telling TessMaster that I want n compute node replicas to always be up, running, and alive in the cluster. For people who know Kubernetes, a node pool is to compute nodes what a replica set or a replication controller is to pods. A cluster can be made up of one or more node pools offering different operating systems and flavors of compute nodes. For example, in this cluster you can see three node pools: one running CentOS, one running Fedora, and one offering GPUs as resources for our machine learning folks. Machine learning is probably the biggest use case for us right now.

So let me show you how we can add capacity to the GPU node pool. All you do is edit the node pool specification and change its desired state from the four compute nodes we asked for previously to something bigger; let's make it 10. When you hit Save, the node pool object is updated. TessMaster has a reconciliation loop on the back end which looks at the current state of the cluster and moves it toward the desired state. The previous state we asked for was four compute nodes, and now we've updated the desired state to 10 compute nodes, so TessMaster reconciles by provisioning the remaining six. And that's pretty much it; that's how easily you add capacity.

So, Uday, I have a question for you. What if one or more of the nodes in your cluster fail? How does TessMaster handle that? OK, so TessMaster is designed to be state-aware and can actually remediate compute nodes without you having to do anything as an operator. Let me quickly show you a demo of how it does that.
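The scale-up step above boils down to editing a declarative spec and letting the back end compute the delta. Here's a hypothetical, much-simplified sketch of that flow; the node-pool field names and pool names are invented for illustration and are not the tool's real schema.

```python
# Hypothetical node-pool specs in the declarative style shown in the demo.
# Field names and pool names are illustrative inventions.
node_pools = [
    {"name": "centos-pool", "os": "centos", "replicas": 4},
    {"name": "fedora-pool", "os": "fedora", "replicas": 4},
    {"name": "gpu-pool",    "os": "centos", "gpu": True, "replicas": 4},
]

def provisioning_delta(desired_replicas, running_nodes):
    """Nodes the reconciler must add (positive) or drain (negative)."""
    return desired_replicas - running_nodes

# The operator edits the GPU pool from 4 replicas to 10 and hits Save;
# on the next reconciliation pass, the back end sees 4 running nodes
# and provisions the remaining 6.
gpu = next(p for p in node_pools if p["name"] == "gpu-pool")
gpu["replicas"] = 10
print(provisioning_delta(gpu["replicas"], 4))  # → 6
```

The operator only ever states the desired count; the arithmetic and the provisioning work stay inside the control loop.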
What I'm going to do is log into the Horizon dashboard and inject a synthetic failure onto one of our compute nodes to see how TessMaster remediates it. I'm going to take this one and pause the instance, which brings it down for a little while. Let's keep an eye on it. OK, so it's still happening. In a few seconds, you should see TessMaster detect that there's a failure with the compute node and start a remediation flow. There you go; it's almost instant. In the next cycle, you'll see TessMaster replace this compute node with a new node in no time. For the purposes of the demo, I've reduced the grace period of this controller to make it nearly instant. And there you go: the replacement compute node has been added back into the cluster's capacity, and you had almost nothing to do with the failed compute node.

That's pretty hands-off. And on top of that, TessMaster can also do additional operations at the cluster level, like managing operating system upgrades across a fleet, which is a pretty big problem, as we all know. It can also do Kubernetes upgrades in a very strategic, rolling-update way, where you don't do everything at once, decreasing the risk of upgrades. And a lot more. Very cool. Thank you. Thanks, Uday.

Can we flip back to the presentation? So we designed TessMaster to be multi-provider. We implemented an OpenStack provider, and we also have a VirtualBox provider. But what's very exciting is that it's pluggable: you could theoretically extend it even to a public cloud provider and manage Kubernetes on a public cloud. In closing, I'm very excited to announce that, since this is not a problem unique to eBay, we're going to be open sourcing TessMaster in the next couple of months.
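The remediation flow in the demo, detect an unhealthy node, wait out a grace period, then replace it, can be sketched roughly like this. This is an invented illustration under simple assumptions (a boolean health check, wall-clock grace period); none of the names or timings are the tool's real ones.

```python
# Sketch of grace-period-based node remediation, as described in the demo.
# All names, states, and timings are invented for illustration.

def remediate(cluster, unhealthy_since, is_healthy, replace, now, grace=30.0):
    """Replace any node that has been unhealthy for longer than `grace` seconds.

    `unhealthy_since` maps node -> timestamp when it was first seen unhealthy,
    persisting between controller passes. Returns the nodes that were replaced.
    """
    replaced = []
    for node in list(cluster):
        if is_healthy(node):
            unhealthy_since.pop(node, None)   # node recovered: clear its timer
            continue
        first_seen = unhealthy_since.setdefault(node, now)
        if now - first_seen >= grace:         # grace period expired: remediate
            cluster.remove(node)
            cluster.append(replace(node))
            unhealthy_since.pop(node, None)
            replaced.append(node)
    return replaced

# "Pause" node-2, like pausing the instance in Horizon. With the grace
# period reduced to 0, as in the demo, remediation is effectively instant.
cluster = ["node-1", "node-2", "node-3"]
down = {"node-2"}
replaced = remediate(cluster, {}, lambda n: n not in down,
                     lambda n: n + "-replacement", now=100.0, grace=0.0)
assert replaced == ["node-2"]
assert "node-2-replacement" in cluster and "node-2" not in cluster
```

With a non-zero grace period, the first pass only starts the timer and a later pass does the replacement, which is why the demo shortened it to make the fix look instant.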
This is something a lot of people and a lot of companies have approached us about; it's a common theme in the industry. We really hope you'll join us in making TessMaster something the whole community can benefit from. We have a TessMaster session today. Sorry, on Tuesday at 11:45; Uday and other folks from our team will be hosting it, so please do join in. I would love for the community to get a lot more involved and take TessMaster to the next level. Thank you very much.