Yeah, this is working. So welcome everyone, and thank you again for giving me the opportunity to talk about what I call a cloud native Swiss knife. My name is Ricardo, and I'm a computing engineer at CERN. Let's jump right into it. This works? Yeah, it does.

All right. I'm from Switzerland, where CERN is based, and one common thing you see is that everyone often carries a Swiss knife. It's a handy tool, actually created more than a hundred years ago. The description I found said it was a tool that allowed people, or soldiers in this case, to open canned food and maintain the Swiss service rifle. Two very different things, but two things that a group of people had to do often while in service. It became really popular, to the point that the company building them started adding new tools as people had ideas: let's put more things in. They started producing different variations of the Swiss knife, variations that would suit different types of individual users.

You can kind of see where I'm going with this: you have a lot of tools available, but you also have users with different needs, and they will need to package things differently. On the left here we have the original Swiss knife. In the middle we have a modern version, which is pretty similar to the first one. But on the right we have a very different Swiss knife: instead of the blades and the tools to open canned food, suddenly we have a laser pointer, a USB stick, and some sort of scissors. Someone actually thought it was a good idea and practical to carry that, so they built it, and you can get it. Of course, some people are a bit more demanding, so here is another Swiss knife, and suddenly you have a lot of tools. Looking at them, I actually don't know what they are all for, but I recognize that someone thought they needed these tools, and the company
started building it. You can maybe see a magnifying lens, some scissors, and some sort of cutting tool, and someone needs it. It's not the knife I would carry, but it's something someone made. Now, this can go a bit too far. This is the current Guinness world record for the largest Swiss knife, and it's definitely something I would not carry; I doubt anyone actually does. But it shows how far you can push a good concept into something that is entertaining but a bit silly.

Coming back to the point, what I would like to highlight today is the comparison between the idea of a Swiss knife, small tools packaged together to suit individual users, and what we do in cloud native. So I'll tell you a bit about what I would put in my own Swiss knife.

The first thing I would need for cloud native is something to do logging, because logging is really important. Kubernetes actually gets you really, really far on its own, but you also need projects like Fluentd that aggregate the logs and push them centrally. For metrics it's very similar: you would need a Prometheus. Now I'll do a really simple demo, just to show you how far you can get with something that is, I hope we can see it, yeah, a fairly simple tool. In this case I have the traditional NGINX deployment that everyone uses, and I start getting my logs. Everyone has done this, and it's pretty simple. But if you think about it, I'm not even talking about nodes, or about where the resources are being deployed. I just run a command and the system streams the logs back to me; I don't have to care about anything in the infrastructure.
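As a minimal sketch of that demo (the deployment name and the `app=nginx` label are assumptions; use whatever names and labels your workloads carry):

```shell
# Create the traditional NGINX deployment used in the demo
kubectl create deployment nginx --image=nginx --replicas=3

# Stream its logs without knowing which nodes or pods are involved
kubectl logs -f deployment/nginx

# Or stream logs from every pod matching a label, with each line
# prefixed by the pod it came from
kubectl logs -f -l app=nginx --prefix
```

These commands need a running cluster and a configured kubeconfig; nothing here depends on where the pods are actually scheduled, which is the point of the demo.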
So that's really powerful. But one thing that is even more powerful is what's actually possible with this command: you don't even need to specify a pod or resource. You can say what kind of resource you want to stream logs for. In the example here I'm using a label instead of specifying the pod I want to get logs from, and suddenly you start seeing quite a lot more logs coming in. If you think about it, the system is hiding all the complexity for you. It's streaming the logs, and you don't have to give any details about the infrastructure. This is incredibly powerful, and every time I explain this command to someone, it comes back to me how much of a simplification it is over the previous systems I had to deal with.

Now, the second thing I would put in my Swiss knife is debugging tools, and again Kubernetes gets you a long way. Debugging is kind of hard in Kubernetes, but I think the reason for that is one of the best things about containers, Kubernetes, and cloud native: reproducibility. When we deploy our applications we create container images, which are immutable, and when we deploy our clusters we also use immutable nodes. The fact that all of this is reproducible, and that you can get really far in achieving reproducibility, is one of the key things about cloud native tools. When I started using Kubernetes this was actually quite hard to handle, but I wanted to show how things have also gotten better. So I'll do another very simple demo and come back to my pods here. If you tried to debug a pod, you would probably start by trying to get a shell into one of them.
So let's say I want to get a bash shell. Yesterday we had a nice talk about public container images and best practices, and this is an image that actually follows best practice: I try to get a bash shell, and the image doesn't have bash, because it doesn't need it to run its tool, so it doesn't provide it. So what do I do? I need to access it, and there's a tool I need that isn't there. The temptation is to add stuff to the image, but then images get bigger and more complex, and this can escalate. This specific image actually has sh instead of bash, so I can do that. But then I want to debug some network thing with tcpdump, and there is no tcpdump, so I'm kind of stuck. Again, you could start putting all these things in the image, but that's not a good idea.

Now, this wasn't there when I started, but it has since been added and is actually stable in Kubernetes 1.25: the ability to deploy ephemeral debug containers. This is an amazingly nice feature. You can attach a container with a different image to an existing container, and it will be in the same namespaces and see the same resources. So I have an image that I conveniently call my own Swiss knife, and I can attach it to the same container I had before. You can see that these are the same processes I was seeing, which is pretty cool. But now I can run tcpdump, and if some traffic comes in I can actually go and see, okay, there are some HTTP calls going on here. It's so easy to do this kind of thing now that I thought I should highlight it. Things like top are actually available in the original image, but I'm fancy and cool, so I use htop, and since I'm fancy and cool I suddenly have vi as well. So you can extend this to have all your usual debugging tools available, which is really, really handy. Now the other part I mentioned here is hot reloading.
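The ephemeral debug container flow just described looks roughly like this (the pod name, target container name, and the "Swiss knife" image are assumptions):

```shell
# Attach an ephemeral container with a tool-laden image to a running pod.
# --target shares the process namespace with the named container, so
# tools like tcpdump and htop can see its processes and traffic.
kubectl debug -it nginx-7c5ddbdf54-abcde \
  --image=registry.example.com/my-swiss-knife:latest \
  --target=nginx -- sh
```

The original container image stays untouched: the debugging tools live only in the attached ephemeral container, which is exactly why you no longer need to bloat production images with bash, tcpdump, and friends.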
This is the other complex part: the development cycle people are used to is not necessarily easy with cloud native deployments, or wasn't, because when you make changes you have to rebuild images and redeploy, and this can be a lengthy process. There are two things I would like to highlight that are really interesting, though I won't have time to demo them. One is the Bridge to Kubernetes plugin in VS Code, and the other is Telepresence. This is from the magic realm of cloud native, where suddenly your local development environment becomes part of the production cluster. They do all the necessary tunneling so that traffic is redirected to your local machine, and you can use your IDE, set breakpoints, everything, as if you were in the production cluster. This is something we demo often internally as well, and it's kind of mind-blowing.

Now, the next thing. Again I'll mention the talk yesterday about public container images, which I really liked. This is one of the issues we've had in the past couple of years: image handling. We heard yesterday how much work and how many challenges there are to improve this. I list here two of the common popular images, Alpine and Ubuntu. They are 30 megabytes, maybe let's say a hundred megabytes. Who in here, raise your hand, has had to handle an image that is a hundred megabytes or more? All right, pretty much everyone; I was expecting that. So in reality we have images that are quite a lot bigger. As an example, we have this admin image that we use, again, for handy tools that we need, and we started growing it.
So we are now close to one gigabyte, and it's kind of time to think: should we make it smaller, should we split it? These are all good questions. But who in here has had to use an image that's one gigabyte or more? Raise your hand. All right, still quite a lot of people. Now I'll jump right ahead: who in here has had to handle an image that is 19 gigabytes or more? One, two... all right, a lot less. We actually have this image running on our clusters, and there is a reason for that. The example here is an image from an experiment called ATLAS, which is one of the LHC experiments. The way they used to deploy their software is with a central repository where they build and make everything available, all the releases, and they have a very efficient way to distribute code across the nodes in distributed clusters. When we moved to containers, the obvious option was just to containerize all of this, and there's actually no easy way to split it, because the jobs got used to having one single place where all the software is available, accessing just what they need. If you move to containers, there has been no obvious way, up to now, to attach one job to one image. So it's kind of a challenge to deal with this. We kept growing, we got to 19 gigabytes, and we kind of stopped there. Someone had the idea of creating a ranking of the biggest images used at CERN, but that would be a really bad idea, so we didn't start that. So this is where we are. Well, actually it's not: we have a new record now. About two weeks ago someone started using an image that is 76 gigabytes compressed, 125 gigabytes uncompressed. So, who in here has an image that is 76 gigabytes? All right, that's good, that's good. You don't want to do this. But I actually want to highlight this because there is a use case for it: cloud native is not about microservices only anymore.
There are so many different use cases for cloud native that people will come up with these ideas. The problem here is really that if you have these images deployed on hundreds or thousands of nodes and you start pulling them, you're basically doing a denial of service on your own registry, and you're putting the network under a lot of pressure. So can we fix this? In most cases we can, and one suggestion we have is from our head of computing security at CERN, Stefan, who is here in the picture. The way he deals with security and users involves three tools. The first one is the baseball bat, and the baseball bat is for negotiation tactics. The second one is a water pistol, and the water pistol is to kill servers in the data center, because a dead server is pretty secure. The third one is a mop, and the mop is to clean up the mess people leave behind when dealing with security. So I think this process is probably something we can learn from when dealing with users and the way they package their images. It also helps to have a big smile and Star Wars characters, which you can see in the background, when dealing with users.

All right. The way we deal with this today is technical, so it's not really solving the packaging problem we want to solve, but we do have a technical solution. One option is to use projects like Dragonfly, which does peer-to-peer distribution of container images: you pull once, distribute across the nodes, and greatly reduce the load on the registry. You still deal with big images, though. The other is containerd and its support for remote snapshotters, where instead of downloading the image you actually mount it remotely, meaning you can start your container immediately and it will only fetch the files it actually needs.
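As a hedged sketch of how that remote mounting is typically wired up: one implementation of containerd's remote snapshotter support is the stargz snapshotter (the talk doesn't name a specific one), which you register as a proxy plugin in containerd's configuration. The socket path below is the snapshotter's conventional default; treat the exact values as assumptions for your own setup.

```toml
# /etc/containerd/config.toml
# Register the stargz snapshotter so containerd can lazily mount
# eStargz-formatted images instead of pulling them in full.
[proxy_plugins]
  [proxy_plugins.stargz]
    type = "snapshot"
    address = "/run/containerd-stargz-grpc/containerd-stargz-grpc.sock"
```

With this in place, containers start against a remotely mounted image and individual files are fetched on demand.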
So this is something we observed: our jobs actually only use around 6% of these images when they are 19 gigabytes. You can see the improvements on the right: container startup goes from something like four and a half minutes to 15 seconds, and we reduce the network traffic dramatically.

So the last tool I will talk about in my Swiss knife is GitOps, and this is something we promote heavily internally. There are tools like Flux and Argo CD that a lot of people are using already. The principle of GitOps is really that you have a central Git repository where you declare all your resources, all your custom resources, everything, and you can use branches and do pull requests and reviews. Then you attach those definitions to multiple clusters. We have clusters on premises and in the public cloud, and basically you just have to say that each cluster should get this set of resources from this branch. You can even have development, staging, and production environments using this, so it's extremely powerful, and it's something we do all the time. It also allows us to do what we call clusters as cattle: instead of having one big cluster where you put everything, it becomes really easy to have multiple clusters and spread applications across them, and this is really beneficial. So I'll give you an example of how this is beneficial; I hope this is readable. Back earlier this year, just before KubeCon Valencia, we actually had an incident at CERN. Someone posted in our internal chat asking, is there anything going wrong with the registry? The UI is failing on me. Which is, okay.
Let's see. Then a couple of minutes later someone else came and said, okay, I see the same, it looks down. So we went to check. I was around, so we were looking, and what we realized is that about half an hour before, we had updated a maintenance script that had started slowly deleting all our production clusters. And these are not production clusters where we manage everything; these are users' production clusters. Anything can be running there, and we have no control. We got really scared, and by the time we stopped it, it had deleted a hundred-something clusters, about a third of our capacity. It's a pretty serious incident. Now, we didn't know how far we had gotten with this idea of GitOps with our users, but we actually realized that even though we had deleted a third of our production capacity, we had no downtime in any service. We had degradation, but not only that: when we got things back together, people were able to get their clusters back up and running in around 30 minutes. This is really thanks to this idea of clusters as cattle, deploying across multiple clusters instead of having pet clusters, and also to the fact that you can really dynamically deploy your applications very quickly. In the end we had one user saying: no worries, in the end it was a chaos monkey test, and we passed it. It's not something we want to do again, but it was really a demonstration of the potential of all of this.

So I'll finish by saying that this is my own Swiss knife for cloud native. As we mentioned, everyone will have a different one. Not only that: if you come up with ideas for new tools that you would like to see, just propose them, join forces, get together with other people, and build them, and we'll all be able to benefit from that work. So thank you very much, and enjoy.