Welcome everyone. Welcome. I hope you're having a really good first day of KubeCon. This talk is going to be about multi-arch infrastructure from the ground up. A little bit about me. My name is Cheryl. I have been part of the Cloud Native community for a little while. Now I work at ARM, where I'm the Senior Director of the Infrastructure Ecosystem. But generally I say my background is software, Cloud Native and Open Source. So I've built software and built software teams at Google and Apple. I started working in the Open Source community through Meetups. I run Cloud Native London, which has about 7,000 members. And I worked for the CNCF as well, building up the end-user community around Cloud Native and Kubernetes. When I say the ARM Infrastructure Ecosystem, this is what I mostly do. So I have a team of five. We split across Cloud, 5G, Telco, networking. And generally we try and make it as easy and painless as possible to adopt ARM. So we do three things. We do developer outreach. We try and make people aware of what ARM is doing. We try and encourage as much software and hardware support as possible, so make it really easy to get hold of ARM hardware, and then all the software that you want on top of it should support ARM as well. And then making sure that ARM is in all of the standards bodies and ARM's customers are also represented. So CNCF, LF Networking, OPI, and a bunch of other Open Source foundations. But overall, my goal, my mission, is just to make it as easy and painless as possible to adopt ARM. So when I talk about multi-arch, mostly I talk about x86 and ARM, but there's RISC-V and there are other options as well. But just because of my background, I'm going to talk mostly about x86 and ARM. Oh, one thing to point out here. I don't do anything to do with mobile phones. So I know ARM is in every mobile device ever. I don't do anything to do with mobile phones. And I don't do anything to do with cars either. So objectives of today's 30, 35 minutes. I want to answer the following questions. 
So why is multi-arch infrastructure tricky? Why do we even want to do it? Secondly, if you are running Kubernetes, what are some things that you should look out for? Where do you get started? And then I'm going to look at a couple of case studies. First up, why is multi-arch tricky? Quick poll for the audience before I start. So does your infrastructure support multi-arch today? I'm going to give you five options to just raise your hand so I want to see where everybody's at right now. So first one is, yes, 100% completely support multi-arch. Nobody, okay. Second, partially, like bits of it. Okay, so probably like a quarter, maybe less than a quarter. We're experimenting right now. Okay, maybe like 10, 15 people. No, but we're maybe interested in trying it out. Probably quarter, and then no, not yet, not at all. Okay, one. Okay, thank you for coming along, even though you're not interested in running multi-arch. But thank you for being here. All right, so why do people look at multi-arch infrastructure? Typically, there's two main reasons when I talk to people. The first one is basically price performance. And the second one is a certain secretive fruit tech company that I'm not allowed to say the name because they're so secretive, but I think you can guess who that is. So reason number one is usually because either they've gone, oh, it looks like it's cheaper for us to run some stuff on ARM or maybe our customers are running stuff on ARM and they've asked us to offer ARM support. And like, does anyone feel this? You know, like the price for cloud resources is too damn high. All right, and the number just keeps going up and up and up and up. And particularly in the current climate that we're at, lots of companies are saying, okay, we need to find a way to rein in our costs. And if we can do a bit of work up front and then save on our cost over time, then that's actually a good outcome for us. A second one, the fruit computer. 
And this is where I'm gonna talk explicitly about ARM. So ARM historically has targeted power efficiency. So ARM's been around as a company since 1985. And in the early days it was kind of spun off from Acorn Computers, but it turns out power efficiency was really good for mobile devices, and Raspberry Pis came along. And then a few years ago, Amazon started putting ARM instances in AWS because, if you're running huge data centers and you're consuming tons of power, good power efficiency is actually really important. And then a couple of years after that, the fruit machines came out, fruit computers. And then what happened is people go, oh, hold on, I can no longer compile and build on my day-to-day developer machine. Like, now this is actually painful for me. So now this is a good time for us to look at our whole infrastructure and see what we can do about it. So more explicitly, the goal of multi-arch infrastructure is for workloads to run on the best hardware for their price performance needs, without the developers having to consider what the underlying architecture is. That's the whole goal of why we want to do multi-arch. The problem is multi-arch is not just one thing, it touches absolutely everything. So this is a sort of sample e-commerce kind of website. So okay, you have Kubernetes at the bottom. You also have CI/CD, you have monitoring, you have a bunch of microservices that are containers. They might have some storage, they might be stateless, or they might be attached to some storage. And so you're not just saying, okay, we have a Kubernetes instance, we upgrade it and we're done. You actually need to think about many, many things. So you need to think about, well, your infrastructure, do you do infrastructure as code at the moment? What does your CI/CD look like? What do your packaging and your binary images look like? 
Do you have good testing practices when you do rollouts, when you schedule? Are you doing performance testing? With Kubernetes, you need to do Kubernetes upgrades, but you also need to do everything-else upgrades. So that's kind of the problem, the state where we are. So on one hand, yes, good price performance. On the other hand, a huge wave of things that you could potentially touch. So in the next section of the talk, I'm gonna give you a high-level framework to think about where you would get started with this. Because the actual mechanics of how you do this are gonna be really different depending on how you run Kubernetes and how you run your CI/CD. So assumption one is this: you have roughly a cloud native infrastructure. You don't have to have all of this, but you have something along these lines. Number two, you're running this on public cloud. Good news is ARM is actually available in all of these public clouds already, but if you are running on prem and you actually need to go and buy ARM servers or something, then that would again impact how you actually do the transition. So I'm gonna take the assumption that you're running on public cloud. And I'm gonna borrow the FinOps model, which is inform, optimize and operate, for your three stages. Your first one is to inform: inventory your entire software stack. So what operating system are you running? What images are you running? What do they rely on? What libraries, what frameworks? What do you use to build and deploy and test? What do you use to monitor or manage, for example security? And just make a huge long list of everything that you use and check each and every one of those for ARM support. So you'll see this listed in two different ways. In some places it will be called aarch64. In other places you'll find it listed as arm64. And then identify what your hotspots are. So what is actually your most expensive compute? 
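As a quick sketch of that inventory check, here is a small shell helper (the function name and example image are mine, not from the talk) that normalizes the machine name reported by `uname -m` to the platform string container registries use, since some tools report aarch64 and others arm64:

```shell
# Hypothetical helper: map a machine name (as reported by `uname -m`
# or a registry manifest) to a container platform string. "aarch64"
# and "arm64" are two names for the same architecture.
arch_to_platform() {
  case "$1" in
    x86_64 | amd64)  echo "linux/amd64" ;;
    aarch64 | arm64) echo "linux/arm64" ;;
    *)               echo "unknown" ;;
  esac
}

# Platform of the machine this runs on:
arch_to_platform "$(uname -m)"

# To see which architectures a published image already supports, you
# can inspect its manifest list, for example:
#   docker manifest inspect nginx:latest | grep architecture
```

The manifest-inspect step is the quick way to tick items off that long list: if an image has no linux/arm64 entry, it goes on the to-fix pile.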
Where do you spend the most in terms of your compute at the moment? The second phase here is optimize. So it's pretty easy to provision a test ARM environment. You can just spin it up on a public cloud. And then you start working on all these little upgrades and making all the changes and adding in all the if statements for where stuff runs on different architectures. Hopefully you also have an idea of what metrics you care about, so you can do some performance testing on it. And then you can also upgrade your CI/CD kind of in parallel. The important thing to notice here is this is not a one-off stage. This is probably going to be a couple of iterations. And also the important thing to notice is you do not need to upgrade Kubernetes first, or upgrade everything in your environment first, before you pick a workload to migrate over. Okay, then the third one is where it gets interesting, because now we actually hit Kubernetes. So decide on how you're going to build your Kubernetes cluster. You're probably not going to go from all x86 to all ARM in one go. So are you going to move your control plane nodes over first? Are you going to move your worker nodes over first? Are you going to try and do a mixture of it? And these all have different trade-offs based on what your software stack looks like, what the availability is, and what you choose your initial workloads to look like. Once you've decided that, that's going to affect how you create your clusters. So then you need to go through all your cluster creation scripts and add in where that changes. Once you have a mixture of, probably, x86 and ARM, that's going to affect anything that runs in a DaemonSet. So now you go and check everything that runs in a DaemonSet and make sure that it's putting down the correct flavor of image for whichever architecture you're running on. 
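To make the mixed-cluster mechanics concrete, here is a minimal sketch, with placeholder image and resource names, of two pieces: publishing a single multi-arch tag so every node (including DaemonSet pods) pulls its own flavor, and a small canary manifest pinned to ARM nodes via the standard `kubernetes.io/arch` node label while you compare the two:

```shell
# (1) Publish the image as a multi-arch manifest list, so one tag
#     resolves to the right flavor on both x86 and ARM nodes
#     (assumes Docker with the buildx plugin; the registry is a placeholder):
#
#       docker buildx build \
#         --platform linux/amd64,linux/arm64 \
#         --tag registry.example.com/myapp:1.0 --push .

# (2) A small canary Deployment pinned to ARM nodes. Held in a variable
#     and printed for review; pipe into `kubectl apply -f -` to apply.
manifest=$(cat <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-arm-canary            # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels: {app: myapp-arm-canary}
  template:
    metadata:
      labels: {app: myapp-arm-canary}
    spec:
      nodeSelector:
        kubernetes.io/arch: arm64   # standard well-known node label
      containers:
      - name: myapp
        image: registry.example.com/myapp:1.0   # the multi-arch tag from (1)
EOF
)
echo "$manifest"
```

The same `kubernetes.io/arch` label is what node affinity rules and taints/tolerations match on once the rollout grows beyond a canary.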
And then, and then, and then we get to actually deploying. So again, I hope you have good canary or blue-green deployment practices already in place. So you start by rolling out, you know, 1%, monitor, see what's going on. Make sure that you're actually scheduling the right workloads on the right nodes. So you'll use node affinity for this. You'll use taints and tolerations. And then as you slowly migrate over, each architecture will be able to handle different numbers of requests. So then you may have to adjust the limits that you're putting on each architecture. So that's basically it. One, inform: figure out what you're running. Two, pick a couple of example workloads and try and do the upgrade, try and do the test first in some kind of test environment. And then three is where you actually do everything else that you need to upgrade in your Kubernetes cluster. So having given you this framework for roughly how it works, I'm gonna look at a couple of case studies now. And I'm gonna look at AWS Graviton because it's been out the longest, so it's easiest to find case studies for this. Actually, according to AWS themselves, 48 out of the top 50 EC2 customers use Graviton in some way. So the first case study I want to look at is called FusionAuth. They are a developer-focused, API-first authentication provider. So it started in 2020 because someone on their forum was typing and saying, hey, I'm playing with a Raspberry Pi and I want to try and get FusionAuth running on a Raspberry Pi. And so they started asking for ARM support. A year later, they started doing actual load tests. By March after that, they were officially supported in the downloadable packages, rolled out to their SaaS platform a few months later. And by March of this year, more than 70% of their SaaS instances run on ARM. In terms of the technical timeline and what they had to do, first up, they're a Java shop. 
So they had to find the right JVM that supported ARM, which was Java 17. So they spent a couple of months just migrating stuff over to Java 17. Then they had to redo all their Docker scripts and update Docker as well. At the time, I think ARM was not available, and I think it's still not available, in all of the public cloud regions. So they had to make sure that the regions they were running in supported ARM. And then the very last piece was the actual application itself. And for their load tests, they chose logins, because logins are particularly CPU intensive. They tested 50,000 logins and found that ARM handled between 26% and 49% more logins per second and was a bit cheaper. And just some nice quotes from them. Their CTO said, oh, our lift was actually pretty small. Once we got started, we didn't actually have to make too many changes. And one of their users said, oh, I didn't even notice the difference between Intel and ARM. Next one up is called Honeycomb. They're an observability provider. So they started doing experiments in March 2020. They moved their first workloads over to production in 2021. By the end of the year, not even six months later, virtually everything was on ARM. And then in April 2022, they turned off the last of their x86 EC2 instances and were basically 100% ARM from then on. So, the technical changes that they had to make. They've actually written a set of really good blog posts about this. I do recommend reading those. And they thought very carefully about which workloads they wanted to migrate in which order. So the first ones they picked were their ingest workers, because they're stateless, they were performance critical, and they scale out horizontally. They do everything in Go. So again, it was pretty easy to recompile for ARM. They had multiple environments already. They had a dogfood environment. So they just spun it up in the dogfood environment. They didn't use Kubernetes or containers initially. 
Everything was in Terraform and Chef. So they had to update those scripts. And then next up were the workloads that they wrote themselves, because those are easiest to control, and Kafka. And then over time, they basically just went through that long list. And then the last ones they did were just the ad hoc one-off services. And then once they got to that point, they decided, if we're gonna go this far, are we ever gonna go back to x86? Because it does add complexity to have to deal with both. And they decided no. So they said, okay, we're gonna switch 100% to ARM and just focus on that. And again, a couple of things that they've said about it: basically it allows them to scale up really easily and it makes it much cheaper for them. So they save 40% on their EC2 instances. And this is actually a really good competitive advantage for them as well. They are cheaper than the alternatives in their space because their operating costs are that much cheaper. Also, I think it's worth noting that in this second point, they said this took a few spare afternoons. So those first two phases, figuring out your workloads, spinning up a test environment, and then doing a bit of upgrades, actually don't take very long. And I would suggest that you consider that and give it a go, even if you don't actually want to do anything else. The initial phase to set up is actually much easier today than it was a few years ago. And this is just a bunch of other quotes that I happened to pull, which is basically all about other companies and other case studies that you can find out on the internet, and why they chose to use Graviton. And of course, you wouldn't be a good tech company if you weren't dog-fooding your own stuff. So in 2019, ARM was using EDA, which is electronic design automation, very compute-heavy workloads that are used to design ARM chips. And that was running on x86, and they decided to move those to Graviton as well. 
So they found like 60% better performance, cut the cost in half, and saved a bunch of power, which is good for the environment, good for the world. And that's it. So I have a couple of takeaways for you before we finish up and then I go to questions. So why multi-arch? Multi-arch because of better price performance, and because fruit machines mean that your daily developer machines may no longer match your servers, and you want to be able to continue working on those developer machines. How are you gonna do it? Start by informing yourself about what sort of software stack you're running and what the hotspots are that are gonna be the most critical for you to move first. Optimize: set up a couple of test environments, do some very small-scale testing, and this shouldn't take you more than a couple of days. And then the actual operate part is where you need to actually upgrade Kubernetes and do all of the slow rollout. For that one, I particularly recommend a talk from Airbnb at the last KubeCon, where they go into a lot of detail about the different changes that they had to make. And then lastly, please do come and talk to me if this is something that interests you. So ARM doesn't do the ports itself, like it's not gonna help you port your software and it's not gonna pay you to port software, but if you're running into problems, then come and chat to me and I can connect you with the right folks and we can give advice. And maybe if you're having difficulties doing performance testing, then come and chat to me. If you run an open source project, then potentially we can offer you some credits to run CI/CD on ARM as well. Or if you're doing something really cool and you think this is something that the world should know more about and would be fun to share, again, let me know, come and chat to me. So I wanna say thank you very much. These slides are gonna be available by the end of the day on my blog at loyshow.com. And thank you. Any questions? Vex, do you wanna? 
Thank you very much for your talk. My question is, what guidance would you give regarding the steps before, so for the developers? If you're not completely on fruit computers and your developers are still on a mix of machines, how would you recommend that people build, people work? Are there any case studies on that one as well? Then you probably need to move your CI/CD work up ahead. So if you need your developers to have access to an environment, it's not necessarily fruit machines, fruit computers. Yeah. Thank you. Anybody else wanna ask? Is there someone in the back? Okay, awesome. Thank you so much. Have a good day.