All right, so first of all, welcome to this talk, and thank you for being here. We are going to talk about a lot of things in 25 minutes, but we want you to keep two main things in mind. One: if you want to migrate an existing product to cloud native, then this is your talk. And two: we are also going to share our experience, what we did well and what we did wrong, so you can improve on it and make it better.

My name is Natalia. I work at Adobe as a software development engineer, and I love math and coding in my spare time. And I'm Carlos, a principal scientist, also at Adobe, on Adobe Experience Manager. My background is very much open source; one of the things I did was to start the Kubernetes plugin.

A quick introduction to Adobe Experience Manager (AEM), so you have a better understanding of why we took some decisions. It's an existing, distributed Java OSGi application. There's a bunch of Java developers out here, so you will understand a little better what it takes to move Java to Kubernetes, or to cloud native. It uses a lot of open source components from the ASF, and we also contribute back to a lot of the open source projects that we use. It has a huge market of extension developers who write code on top of AEM, and that somewhat limits what we can do: how we can change APIs, how we can change implementations and things like that, so we don't break people.

So of course this product, AEM, is running on Kubernetes. It's quite a big product. We run on Azure, and we have 37 or more clusters in production. We are present in multiple regions: the States, Europe, Australia, Singapore, Japan, India, and even more availability zones are coming. And we have a dedicated team that builds and manages the infrastructure for us. This is quite important for us, because then we don't need to care about the low-level stuff; we just use Kubernetes.

Now, the AEM environments. Customers can have multiple AEM environments in our system: they just go to the UI, click a button, and they get as many environments as they want. Usually each customer has three or more Kubernetes namespaces, because we use namespaces to divide customers and get customer isolation. Each environment, in our case, is what we call a micro-monolith. It's like a hybrid, right? We are always on trend: if microservices are trending, we are there; if monoliths are trending, we are also there. Monoliths are trending more now. Yeah, of course, it depends on the day.

So we use namespaces to provide a scope, as I told you. Basically we have network isolation, we have control over quotas, and also permissions. And permissions are really related to our main topic, which is security, because the fact is that we are running customer code on our clusters, so we need to take care of security. This is very important stuff.

Services. We are a big company, and this is a big product, so we have multiple teams building services here, with different requirements and different languages: Go, Java, Bash, a lot of things. The rule is that if you build it, then you run it and you drive it. And of course we also use some internal contracts, like APIs or Kubernetes operator patterns, so we have an internal agreement.
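To make the namespace isolation part a bit more concrete, here is a minimal sketch of what a per-customer environment namespace can look like, with a quota and a network policy. The names and numbers are purely illustrative assumptions, not the actual configuration described in the talk.

```yaml
# Hypothetical customer environment namespace; names and numbers are illustrative only.
apiVersion: v1
kind: Namespace
metadata:
  name: customer-a-env-prod
  labels:
    tenant: customer-a
---
# Cap how much CPU and memory this environment can request in total.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: env-quota
  namespace: customer-a-env-prod
spec:
  hard:
    requests.cpu: "16"
    requests.memory: 64Gi
    limits.cpu: "32"
    limits.memory: 96Gi
---
# Only allow ingress from pods in the same namespace, so tenants are isolated
# from each other on the network.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: same-namespace-only
  namespace: customer-a-env-prod
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}
```

The empty `podSelector` applies the policy to every pod in the namespace, and the single ingress rule only admits traffic from pods in that same namespace, plus whatever you explicitly allow on top.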
So one thing that we use is init containers and sidecar containers, in order to divide responsibilities and not have conflicts, because that's the main problem with monoliths. Yeah, the sidecar is an interesting pattern in Kubernetes, which is very useful for breaking the monolith into separate services, but you're running those services in the same box, in the same space of cgroups, networking, and all. So we took this approach to start moving some parts of the application so they can be separate: the code can be deployed separately, different Docker images, different containers, isolation, and so on, but they run together with the main application.

We have a bunch of them: service warm-up, storage initialization, an HTTP service in front of the Java application (this also comes from the time when people were running this on-premise; it's the typical setup), a sidecar to send metrics to our analytics systems, Fluent Bit as a sidecar to send the logs, thread dump collection for Java (when the JVM writes a thread dump, we automatically get it and ship it), Envoy proxying, and the auto-updater, which I'll talk a bit about.

The service warm-up exists because a big Java application takes a while to start. It makes sure that the paths being hit by users already have the cache initialized, because if you don't do this, when the pod comes up and the Java application comes up and you start getting requests, those requests may take a long time to be served, since the cache is not initialized; it does lazy caching. So the warm-up goes through the most requested paths and caches those, and this way we don't need expensive starts. The other option would be to say at startup, OK, go and load everything into the cache, but then startup takes a lot longer.

Fluent Bit is the open source solution we use to send the logs. We use a shared volume in the pod, but a separate container. The application writes its logs to a specific location, and Fluent Bit, which we can configure and deploy separately, picks those logs up and ships them.

Envoy is a very popular cloud-native proxy. We use it for traffic tunneling and routing. In our use case, this allows us to have dedicated IPs per tenant and VPN connectivity from each customer pod to each customer VPN endpoint, or to give them a specific IP, without having to have separate hardware or virtual hardware for them. We set up a sidecar, we configure the JVM to send the traffic to that sidecar, and that sidecar sends it to another Envoy outside the Kubernetes cluster, and from there it goes out to the customer's internal network.

The auto-updater is a solution that we came up with. You may have heard of some issue with Log4j some time ago. Not very well known. If you work with Java, then you definitely heard about it. So how can we update Log4j across a fleet of thousands or tens of thousands of containers? One way is to rebuild all the Docker or container images and redeploy everything; that would be one way. The other way is to run an init container, and this allows us to patch the main container live without having to touch the monolith. So when the monolith starts, Log4j is already replaced. And this is extensible, so we can replace any file we want.
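As a rough sketch of how these pieces fit together in one pod, here is what an init-container patcher plus a Fluent Bit logging sidecar can look like. The image names, paths, and volume layout are hypothetical assumptions for illustration, not the actual manifests behind the talk.

```yaml
# Illustrative only: images, paths and volumes are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: aem-example
spec:
  initContainers:
    - name: updater                        # hypothetical patcher image
      image: example.com/updater:1.0
      command: ["sh", "-c", "cp /patches/log4j-core.jar /patched-libs/"]
      volumeMounts:
        - name: patched-libs
          mountPath: /patched-libs
  containers:
    - name: aem                            # the main Java application
      image: example.com/aem:latest
      volumeMounts:
        - name: patched-libs               # picked up at startup, replacing the bundled jar
          mountPath: /opt/aem/patched-libs
        - name: logs
          mountPath: /var/log/aem
    - name: fluent-bit                     # logging sidecar, built and configured separately
      image: fluent/fluent-bit:2.2
      volumeMounts:
        - name: logs
          mountPath: /var/log/aem
          readOnly: true
  volumes:
    - name: patched-libs
      emptyDir: {}
    - name: logs
      emptyDir: {}
```

The point of the pattern is that the patcher and the log shipper are separate images with their own lifecycle and configuration, but they share the pod's volumes and network with the main application.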
So, operators. We use Kubernetes operators. We have a main operator, which is actually not open source, it's an internal one, but we want to introduce it because it's basically the main architecture that we have. This operator manages the lifecycle of the environments; it's the one operator to run them all, and then we use other ones that I'll explain later. What this operator does is launch some pre and post jobs, and it also reconciles all the internal operators and the environments.

This is one of the operators that we reconcile: we use Flux, the Helm operator. This operator allows us to manage environment creation and environment upgrades using Helm. It gives us a declarative way of having all that information in a Kubernetes custom resource, and we can quickly use this information for debugging: developers can just go, we have some internal tooling to get this information, so it's a pretty straightforward way to get the status of the Helm release. It's also very important because it gives us a way to automatically manage the status from the main operator.

We also use, or started to use, Argo Rollouts. This is something we only recently started with. We use it to provide more advanced deployment strategies, which is pretty awesome: we get access to canary, blue-green, A/B testing and more. It also allows us to have automated rollbacks. And part of the reason we have the main operator is also to have an API that other services can use, so if in the future we replace the Helm operator with something else, we can just do it.

On the scaling and resource optimization part: we talked about how this is a micro-monolith, as we like to call it. We have more than 17,000 of these environments, the main Java application, pods with sidecars and all. Because we have these multiple teams building services, we are looking for ways to scale that are orthogonal, so that each service doesn't need to be aware of very specific things to do, and we can apply them across the whole fleet of clusters.

On Kubernetes, you can have resource requests and limits. The request is basically how many resources are guaranteed; the limit is how many resources you can consume, and it can be higher than the request. Requests and limits apply to CPU, memory, and ephemeral storage. On the memory side, the limit is enforced, and it results in a kernel OOM kill. For all of you Java developers: this is not related to, it is separate from, the JVM OutOfMemoryError exceptions that you may get. Basically, if you go over the amount of memory that your container is limited to, the kernel is going to kill your process. On the ephemeral storage side, the limit is also enforced, and if you go over the limit of the storage you're using, your pod is going to get evicted from that node, and Kubernetes is going to schedule it somewhere else.
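To put a shape on the requests and limits part, this is roughly what it looks like on a container spec. The values here are only an example, not what the talk's environments actually run with.

```yaml
# Example values only. Requests are what the scheduler guarantees;
# limits are the caps enforced while the container runs.
containers:
  - name: app
    image: example.com/app:latest       # hypothetical image
    resources:
      requests:
        cpu: "2"
        memory: 4Gi
        ephemeral-storage: 2Gi
      limits:
        cpu: "4"                        # limits can be higher than requests
        memory: 4Gi                     # exceeding this: kernel OOM kill
        ephemeral-storage: 8Gi          # exceeding this: pod eviction
```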
An interesting part of this is how CPU requests and limits work. For requests, yes, they are used for scheduling at the very beginning: you say, oh, I want to deploy this and it should have two CPUs. But then, once the pod or the container is running on the node, it's not really the number of CPUs that can be used; it's a relative weight between all the containers running on that node. So requesting one CPU roughly means one CPU's worth of cycles per CPU period. If you have two containers running on the same node, they each only request 0.1 CPUs, and there's nothing else running on that node, they can each use up to 50%. Well, they can each use up to 100%, but if both are using all the CPU they can, they will split the CPU 50-50. So the request, once the container is running, is just a weight relative to the other containers.

For limits, on Kubernetes this maps to the cgroups quota and period. The period in the kernel is, by default, 100 milliseconds, and the limit you set in Kubernetes is how much CPU time can be used in each period. If you go over that limit, the container is throttled. This is important for applications that are multi-threaded, like Java typically is, and you'll see it there more often. So if your Java application has four threads and they are all using all the CPU they can, with a limit of one CPU and a period of 100 milliseconds, then after 25 milliseconds you have already used all the CPU you could use in those 100 milliseconds. So your container is going to run for 25 milliseconds and then be throttled, doing nothing, for 75 milliseconds. Again, this is very important for heavily multi-threaded applications like Java.

The other thing we're using for scaling and cost savings is switching to ARM. Anybody here using ARM? And the people that are not using ARM — why not? For our numbers, we are getting 15% to 25% cost savings for the same performance. And in Java, for instance, it's very easy, because you have JVMs built for ARM; you just have to change the base image for your containers, and that's all.

On the specific case of Java: Java was a bit picky about running in containers for a long time. Now it's a lot better. But if you look at the defaults: if you have a JVM in a container running with more than 512 megabytes of memory, the JVM by default is only going to use 25% of that as the heap size. So you're wasting 75% of the memory of the container if you just use the defaults. We typically use, and I think in most cases you can use, 75% of the container memory, unless you have things that are off-heap, like Elasticsearch, Spark, or any native code that you're calling. And on Kubernetes, when the JVM creates the heap, it basically takes that memory — Kubernetes doesn't know how much of the heap is actually in use — so you have to consider that. For now, for Java applications, you would typically set requests and limits to the same value, because, again, the JVM takes the heap size, and that is going to stay constant for the whole time the container is running.
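Putting that Java-specific advice together, a container spec for a JVM workload can look roughly like this. The image and numbers are made up, and the flags assume a reasonably recent JDK (both MaxRAMPercentage and ActiveProcessorCount need JDK 10+ or 8u191+).

```yaml
# Illustrative sketch: heap sized to 75% of container memory, the processor count
# the JVM sees pinned to the CPU limit, and requests set equal to limits.
containers:
  - name: aem
    image: example.com/aem:latest       # hypothetical image
    env:
      - name: JAVA_TOOL_OPTIONS         # picked up automatically by the JVM
        value: "-XX:MaxRAMPercentage=75.0 -XX:ActiveProcessorCount=2"
    resources:
      requests:
        cpu: "2"
        memory: 8Gi
      limits:
        cpu: "2"
        memory: 8Gi
```

If the CPU limit changes, for instance through autoscaling, the ActiveProcessorCount value has to follow it rather than stay static, which is the point that comes up again in the Q&A at the end.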
So, perfect timing: Kubernetes autoscaling. We use autoscalers. Which ones? We use the Kubernetes cluster autoscaler, the horizontal pod autoscaler, and also the vertical one. So let's talk a little bit about them.

The cluster autoscaler. You might know it already: this is what increases and reduces the cluster size. We base this autoscaling on the CPU and memory requests. That sentence is actually going to be quite important, and I'll show you an example of why. Don't forget to set a maximum number of nodes when you are using the cluster autoscaler — you'll see why. Using the cluster autoscaler, we get savings of around 30% to 50%.

And here is the example of why you should set that maximum number of nodes. Here you can see a normal behavior: we go from 50 instances up to 100. Then you can wake up one day and see this kind of bug: suddenly, the number of instances went straight up to 150. This was because we had introduced a bug. Then we realized it — hey, you are hitting the maximum number of nodes — we fixed it, and you can see the numbers start going down again. Otherwise, the bill for that month would have been very funny.

We also use the vertical pod autoscaler, but just a little bit, and I'll explain why. This one is about increasing and decreasing the resources of each pod, making it bigger or smaller. The thing is that right now, at least in the Kubernetes version we are using, it requires restarting the pods. That's actually one good thing that is going to change in future Kubernetes versions, which will avoid that and make it more interesting for us. So we only use the vertical pod autoscaler in dev environments, to scale them down when they are unused. That's why we only get small savings from it, around 5% to 15%.

And the horizontal pod autoscaler. This is actually one of the most interesting ones for us: we create more pods when they are needed. If we need more pods, we create more; if we have less traffic, we remove some. We base this autoscaler on CPU and on HTTP requests per minute. The thing is: don't use the same metric for the horizontal and the vertical autoscaler, because otherwise you are going to have trouble. And autoscaling on CPU only is problematic. Just think about this case: what happens with periodic tasks that have a high CPU consumption at startup, or just because you have a spike? If you base the horizontal pod autoscaler only on the CPU metric, you can end up with the same trouble, but with a lot of pods that are unused — and that's not the behavior you want. So we recommend adding something else, not only CPU: requests per minute, or any other metric. This is giving us savings of around 50% to 75%, which is a lot.
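A sketch of combining CPU with a request-rate metric on the horizontal pod autoscaler could look like this. The `http_requests_per_minute` metric name and the Deployment target are assumptions for illustration; in practice a pods metric like this needs a custom metrics adapter (for example the Prometheus adapter) wired up to expose it.

```yaml
# Illustrative HPA combining CPU with a traffic metric; the pods metric below is
# hypothetical and assumes a custom metrics adapter is installed in the cluster.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: aem-publish
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: aem-publish
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Pods
      pods:
        metric:
          name: http_requests_per_minute
        target:
          type: AverageValue
          averageValue: "1000"
```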
Yeah, so to wrap it up, or just to give you the example of our experience: I think it's very easy to start on Kubernetes and then optimize. It's very easy to do a lift and shift — bring your monolith or whatever application you have running somewhere else, bring it to Kubernetes — and then optimize with these patterns, like sidecars, or start adding microservices as you need them. Use these patterns to decompose the application over time: sidecars, init containers, new services, microservices, whatever you want. And it's also important to consider resource optimization afterwards: how do you tune the JVM if you're doing Java, how do you set the CPU requests, the limits, the memory, the garbage collector on the JVM. All these things you can do afterwards.

So if you have questions, we have three minutes for questions; otherwise I'll bring Oleg here to give us more jokes. One question over there.

Okay, so the question is — I'll have to repeat it — summarizing: how do we manage CPU autoscaling with thread pools, especially on the JVM, right? Yeah, so I think we didn't go into detail there, but we had this issue, for instance, with the throttling. You create multiple threads because your JVM, especially on older versions of Java, assumes it's running with the same CPUs that the host has, or something like that. If you don't narrow it down to the limits you set on the container, it's going to think, oh, I have 32 CPUs, I'm going to create all these threads. And then your container doesn't have them; maybe the limits are set to one CPU or two CPUs. And then the JVM is going to create very big thread pools, it's going to start using a lot of CPU at the same time, and you're going to see that as throttling in the Kubernetes metrics. We can go into detail later, but you have to consider how you set the number of CPUs the JVM can see versus the CPUs you limit the container to, and you have to figure out the right numbers so those things match. Especially when you let autoscaling kick in: if you have a container that can run with 10 CPUs or two CPUs based on dynamic metrics, you have to figure it out at startup — don't have a static number for how many CPUs the JVM can see, but a dynamic number based on those requests and limits.

Any other question, quick? Yes? Yeah, okay. So, do we depend totally on the autoscaler, or do we also look at other metrics? That was the question, about whether we create more clusters. We look at how busy the clusters are, and when we reach certain levels in a region, we create new clusters. So we depend totally on the autoscaler within each cluster, and when we reach what we consider the capacity of a cluster, we create more clusters.

Well, thank you very much for having us. Thank you.