Our next presenter is my fellow co-chair, Aparna Subramanian. Aparna is the Director of Production Engineering at Shopify, and today she's going to answer the question we all want answered: is Kubernetes delivering on its promise? Please welcome Aparna Subramanian.

Hello again. My name is Aparna Subramanian. I'm a long-time KubeCon + CloudNativeCon attendee, and this is my first time being a co-chair. I'm truly excited to be serving in this role alongside my wonderful co-chairs, Emily and Frederick. I'm also the co-chair for the CNCF End User Developer Experience SIG. In CNCF parlance, "end user" means a very specific thing: end users are members who use cloud-native technologies but do not sell cloud-native services. So we are basically non-vendors. The CNCF End User Developer Experience SIG is a community of end users, and we come together to discuss all things related to building and operating a Kubernetes platform. In this talk, I will present a summary of our experience and attempt to answer the question objectively: is Kubernetes delivering on its promise?

These are the key topics I'll be covering in this presentation. I'll start off by discussing why we need an internal platform, and then I'll talk about the promise of Kubernetes and how it was a perfect match for building an internal platform. Then I'll provide an evaluation summary from the perspective of platform engineering teams.

To understand why we need an internal platform, let's look at the key stakeholders and their expectations. We have application developers. They don't want to deal with the complexity of infrastructure. They want simple building blocks that they can self-serve. They want a well-established golden path to production for each of their different applications. And they want full integration with the platform from within their development workflows. And then we have our business stakeholders. They care about fast time-to-market.
They want to make sure we avoid costly production issues that have financial implications for the business or can erode customer trust. And they want uniform enforcement of security and compliance, which, if not done correctly, could expose the business to a lot of risk.

And then we have our platform engineers. They care about scalability, reliability, and resiliency, but most importantly, they care about extensibility, because that is what helps them build a platform that serves the very specific needs of their internal customers.

So let's see what it takes to build an internal platform. Let's imagine this was a time before Kubernetes. I know that feels like eons ago, but it's a really helpful comparison to put things in context. I'm going to highlight some key components.

Most platforms back in the day had an orchestrator. At Shopify, we had our own home-grown orchestrator, which was basically a collection of bash scripts that would SSH into each host, restart the container, and roll out the application changes. We did not have anything like auto-scaling. We couldn't even restart a failed container automatically or move it to another node. Most platforms also had a central configuration management database. There was no GitOps back in the day, and there was no reconciliation or desired state. And there were tools for logging, security scanning, CI/CD, et cetera. If it was an advanced platform, developers could self-serve all of this using a service catalog.

You probably know where I'm going with this. This platform was not great. It needed an army of experts, and it took many, many months to build such an internal platform before you could actually start running a business on top of it. Compared to the platforms that we have today, this was mostly infrastructure as a service, plus some collection of tools on top, and then some best-practice guidelines for the application developers, hoping that they would do the right thing.
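
To make the contrast concrete, here is a minimal sketch, not taken from the talk, of the desired-state reconciliation idea that a script-based orchestrator of this kind lacked. Every name in it (`Container`, `reconcile`, the app and host names, the action strings) is hypothetical, and a real Kubernetes controller is considerably more involved. It only illustrates the loop of comparing what should be running with what is observed and computing the steps needed to converge.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Container:
    """One observed container instance (hypothetical model, not a Kubernetes type)."""
    app: str
    host: str
    healthy: bool


def reconcile(desired: dict[str, int], observed: list[Container]) -> list[str]:
    """Compare desired replica counts with observed containers and return
    the actions needed to converge. Action strings are illustrative only."""
    actions: list[str] = []
    for app, want in sorted(desired.items()):
        running = [c for c in observed if c.app == app]
        healthy = [c for c in running if c.healthy]
        # Failed containers are treated as replaceable: remove, do not repair.
        for c in running:
            if not c.healthy:
                actions.append(f"remove {app} on {c.host}")
        if len(healthy) < want:
            # Too few healthy replicas: start the missing ones.
            actions.extend(f"start {app}" for _ in range(want - len(healthy)))
        elif len(healthy) > want:
            # Too many replicas: remove the surplus.
            for c in healthy[: len(healthy) - want]:
                actions.append(f"remove {app} on {c.host}")
    return actions


# Desired state is declared once; the loop works out what has to change.
desired = {"checkout": 3, "search": 1}
observed = [
    Container("checkout", "host-a", healthy=True),
    Container("checkout", "host-b", healthy=False),
    Container("search", "host-a", healthy=True),
    Container("search", "host-c", healthy=True),
]
actions = reconcile(desired, observed)
print(actions)
```

Running the same function again once the observed state matches the desired state yields no actions, which is the property that made automatic restarts and rescheduling possible without anyone running a script by hand.
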
So this was the landscape before Kubernetes. And then Kubernetes came in with the promise of a scalable container management platform that would enable us to run containers anywhere and everywhere. Several end users were really hungry for a solution like this and became early adopters. Fast forward to what they have today, and Kubernetes has given them such a head start in building these internal platforms. You can see the power of Kubernetes and its rich ecosystem of tools, because that's what they've used to build every aspect of their platform. Kubernetes has a well-structured API and a well-understood extensibility mechanism that we can use to model, configure, and deploy all of these resources consistently inside the platform. It has truly enabled platform teams to operate with greater velocity.

So, a big-picture takeaway: if you're a new business and you want an internal platform, you can get started in a matter of weeks. And this is not just infrastructure as a service. It enables platform teams to truly operate a platform as a service, and in turn helps their application developers operate with greater velocity and get that fast time-to-market that the business stakeholders need.

So now for the objective evaluation of Kubernetes.

First is portability. I would give portability five stars. Clearly Kubernetes is portable, and it has created a layer of standardization across infrastructure and multiple cloud providers. More importantly, as a platform team, our skills are also portable.

Next is reliability. Reliability is also five stars. Platform teams have learned to treat everything as replaceable, as things that can come and go. We no longer treat things as sacred and in need of a whole lot of protection. This has helped the overall reliability of the platform.

Next is extensibility. This is what makes Kubernetes really powerful and much more than a container orchestration system.
By leveraging the power of custom resources and operators, we can model and lifecycle-manage every aspect of our infrastructure.

Scalability. Kubernetes is good at scale, but end users like Shopify and others that I've talked to have really stretched the limits of Kubernetes. Although we've been able to work around most of these issues by just creating more clusters, I would give scalability only three stars, because we are constantly raising the bar on scalability.

And finally, simplicity. Kubernetes is a complex system, and it is not an out-of-the-box solution for application developers to use. It is powerful and flexible enough for platform teams to use, and we don't mind the complexity. Many successful businesses, including Shopify, have built their entire platform on top of this technology.

I'd like to outline some challenges that platform teams are still grappling with.

Number one is YAML configuration complexity. There are trade-offs between simple defaults and exposing enough flexibility for applications to use, and when you're doing this at a scale of thousands of apps across a global fleet, this problem gets so much harder.

Second, Kubernetes upgrades are painful. It has gotten better than before, but it is still a work in progress. Platform teams are happier to deal with one fewer upgrade each year. It's now three per year, compared to four per year in the past.

The third thing is multi-cluster management. I mentioned how, to deal with the limitations in scalability, we've ended up creating more clusters. That's not the only reason why we've ended up with cluster sprawl. When you operate across regions and multiple cloud providers, creating more clusters is almost unavoidable. There are also organizational reasons why we create more clusters. Shared ownership is really hard, and when most things are only configurable at the cluster level, you almost have to create one cluster or a set of clusters per organization.
And there are operational considerations too, such as creating more clusters to separate our stateless and stateful workloads. Combine all of this, and we've now ended up in a situation where we have just too many clusters, and there's a new set of challenges around multi-cluster management. We always look for solutions in the ecosystem before we build in-house solutions. I would say this is a great business opportunity in the ecosystem, because most end users that I talk to are building heavy in-house solutions for multi-cluster management.

Before the final verdict, I would like to talk about the different options that you have when you're building a platform, because you may be wondering: if I can only leverage the power of Kubernetes with a large platform team, what do I do when I don't have that luxury? This is how I think about the various options for building a platform. I think of this as a four-layer cake, and let's start from the bottom.

You can build your own Kubernetes platform in your own data center. This will probably give you the most flexibility and the most freedom for innovation, but it also takes the most effort.

Let's say that's too much, but you have existing capital investments in your data center and you're still working on your Kubernetes skills. The next best option would be to use a vendor platform, like OpenShift or Tanzu or similar, and leverage that platform.

If you don't want to deal with infrastructure at all, or if you were born in the cloud, the next best option would be to leverage a cloud platform. You use a cloud provider, get their VMs, and build out your Kubernetes platform on top of their VMs.

The other option is to use all of the benefits of the cloud, including the managed Kubernetes services that the cloud vendor offers. This is definitely less effort, but it will also give you less flexibility.
So, when making these decisions, the trade-off is obviously going to be between simplicity and flexibility. Cost is another dimension, but it's hard to generalize and say which of these would be cheaper. You can also mix and match your approach based on the problem that you're trying to solve, and I'll give you a few examples. I'll go back to the previous slide.

Let's say you want to build a multi-tenant platform for a large number of stateless applications, and you don't have a large platform team. You may want to use the cloud and the managed service, because it's fairly straightforward. We all know Kubernetes works really well for stateless apps.

Now let's say you want to build an AI/ML platform or a database platform on top of Kubernetes, and you want the flexibility to turn all the knobs, optimize, and tweak it continuously, because we know that this problem is hard and it won't just work out of the box. You may want to consider building a platform on the cloud or using a vendor platform, so that you have the flexibility to control the control plane and the worker nodes and continuously optimize them.

And yesterday, I was talking to an end user in the telecom industry who has very strict latency requirements. They are using the first layer of the cake: they built their own Kubernetes platform in their data center. They're actively considering the cloud, but it's going to be a while before the cloud providers can meet their latency requirements.

So that's how I think about the various options. You should definitely consider your own situation: the size of the platform team that you have, the workload itself, the use case, and the differentiated value that you're able to add to your business by going with each of these options.

So, here's the final verdict, and I'll start off by saying, platform teams love Kubernetes.
I can't imagine any of us wanting to go back to using homegrown scripts to orchestrate containers in production. Given the growth in scale and complexity over the past years, Kubernetes has truly been the engine of progress for businesses, and it is the engine of productivity for platform teams. Overall, Kubernetes has more than fulfilled its promise, and with its impressive ecosystem of tools, the possibilities for innovation are truly endless.

I'll use this opportunity to invite you to join the Developer Experience SIG. We are a community of learners, and we grow together by learning from each other. Our next meeting will be on May 4th, and I'll share the QR codes shortly. And here's a plug for a great end user panel discussion that's happening at 11 a.m., where you'll be able to hear directly from several of our end users about how they are dealing with this particular challenge of platform efficiency. If you're interested, we would love to have you.

Thank you for your time. Please scan the QR code on your left to join the Developer Experience SIG, and the QR code on the right to provide session feedback. Thank you.