We are going to start. First of all, welcome to the talk, "Lessons Learned: How to Migrate an Existing Product to a Multi-Tenant Cloud-Native Environment". Thank you for being here. In this talk we are going to show you a lot of things, but we want you to keep in mind three things that you are going to learn. One of them is how to migrate your existing product to cloud native. Another is, in particular, how to run Java in Kubernetes, which is not an easy thing. And the last one is how we dealt with the troubles, what went well and what went wrong in our case, so you can learn and do it better, of course.

So, let's start. This is Natalia. I'm working at Adobe as a software development engineer, and in my spare time I like maths and coding. And I'm Carlos. I'm a principal scientist at Adobe Experience Manager. We both work on AEM as a Cloud Service. I'm a long-time open source contributor, and one of the things I've done was starting the Jenkins Kubernetes plugin many years ago.

So, who's here using Kubernetes? OK. I'm not going to ask who has heard about Kubernetes, because if you haven't heard about Kubernetes then we have a problem here.

A short introduction about Adobe Experience Manager, to understand our case better. It's a content management system with asset management and forms. It's not as well known as other Adobe products that you've probably heard of, but a lot of large enterprises use it. From the stack point of view, it's an existing Java OSGi application using modules, and it uses a lot of open source components from the Apache Software Foundation. It also has, and this is interesting later on, a huge market for extension developers: people that write plug-ins and components for AEM. We contribute to and build on top of Apache projects like Sling, Felix, and Maven, so we use and we contribute back to the open source community.

So of course we use Kubernetes, as we were asking you who uses it. We are running on Azure, and we have more than 37 clusters right now, and we keep growing. We are present in multiple regions because we have customers across the world, so we need different availability zones: the States, Europe, Australia, Singapore, Japan, India, and more are coming. At Adobe we have a dedicated team who manages the clusters and provides them to us; you can imagine it like an internal provider. Then we just use them.

In our case, customers can run their own code on our clusters. You can imagine that this might be a bit insecure for us if we don't have any control. So we limit cluster permissions, we have a lot of restrictions, and we have security controls for deploying code to production. One easy example: we always encrypt the traffic that leaves the clusters.

For the environments in our product, customers can go to our platform and directly self-serve the environments: they just click a button and the environments they need are created automatically. Each customer usually has at least three Kubernetes namespaces, or more. We create one environment per namespace, so it is fully isolated from the rest of the customers. Each environment is actually a micro-monolith; it's kind of a trend, right? We want to choose between microservices and monoliths, so we have a hybrid. Since we run a customer environment in each namespace, we need them to be fully isolated, which is quite important to keep the privacy and the restrictions for each customer. So we have network isolation, we have resource quotas so everything is under control, and of course the permissions I mentioned before.
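To make the per-namespace isolation concrete, here is a minimal sketch of how an environment namespace with a resource quota could be provisioned using the fabric8 Kubernetes client. Our actual provisioning code is internal, so the names, labels, and quota values here are hypothetical:

```java
import io.fabric8.kubernetes.api.model.NamespaceBuilder;
import io.fabric8.kubernetes.api.model.Quantity;
import io.fabric8.kubernetes.api.model.ResourceQuotaBuilder;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;

public class EnvironmentProvisioner {
    public static void main(String[] args) {
        try (KubernetesClient client = new KubernetesClientBuilder().build()) {
            String ns = "customer-a-dev"; // hypothetical naming scheme

            // One namespace per environment keeps each customer fully isolated.
            client.namespaces().resource(new NamespaceBuilder()
                    .withNewMetadata()
                        .withName(ns)
                        .addToLabels("tenant", "customer-a")
                    .endMetadata()
                    .build()).create();

            // A quota per namespace keeps resource usage under control.
            client.resourceQuotas().inNamespace(ns).resource(new ResourceQuotaBuilder()
                    .withNewMetadata().withName("env-quota").endMetadata()
                    .withNewSpec()
                        .addToHard("requests.cpu", new Quantity("8"))
                        .addToHard("requests.memory", new Quantity("16Gi"))
                        .addToHard("limits.cpu", new Quantity("16"))
                        .addToHard("limits.memory", new Quantity("32Gi"))
                    .endSpec()
                    .build()).create();
        }
    }
}
```

In the same spirit, a default-deny NetworkPolicy per namespace would give the network isolation mentioned above.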
But again, Adobe is a big company and this is a big product, so we have a lot of different teams building services here, with different requirements and different languages: we use Java, Go, a lot of things under the hood. The rule we use is "you build it, you run it", so a team drives and owns a dedicated service that they run. We also have APIs so the services have contracts; everything is behind a contract, and we work API-first. We also use the Kubernetes operator pattern, and we will explain a little more about operators later on.

To divide these responsibilities between teams, what we have been doing is creating init containers and sidecars. The idea is: hey, you're going to build a new feature for this micro-monolith? Then here you have an init container or a sidecar, you are fully isolated, and you don't conflict with the others and other versions and so on. Yeah, we use the sidecar pattern a lot, because it allows us to not introduce new features in the existing Java application, and people can move faster by creating sidecars or external services. Sidecars are convenient because you run next to the application but without having to put your code in the application or follow the same release process.

Examples of sidecars and init containers we use: service warmup and storage initialization. We also have httpd in front of the Java application; this comes from the on-premise times, where httpd does some caching on top of Java. Sidecars to export metrics; Fluent Bit to send the logs to our central location; in Java you can take thread dumps, and we have a sidecar that collects those thread dumps and stores them in a central place; Envoy for proxying and routing network requests; and also a sidecar for auto-updates.

An example: the service warmup ensures that the service is ready to serve traffic when the Java application is starting. Requests can be expensive: if the Java service comes up and it has not cached the top requests, it may take a while to process them. So in order for customers to have a good experience, we warm up the top requested paths before the lazy caching kicks in. This gives us some caching without expensive starts, and especially in the Kubernetes world, where pods come and go all the time, it prevents a bad experience for the end user when the requested page is not cached.
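As an illustration, a warmup container can be as simple as replaying the top requested paths against the main container before it receives traffic. This is a minimal sketch, assuming a localhost endpoint and a hypothetical list of top paths:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.List;

public class ServiceWarmup {
    public static void main(String[] args) throws Exception {
        // The Java application runs in another container of the same pod,
        // so it is reachable on localhost.
        String base = "http://localhost:8080";
        // Hypothetical: in practice this list would come from access-log analytics.
        List<String> topPaths = List.of("/", "/content/home.html", "/content/products.html");

        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(5))
                .build();

        for (String path : topPaths) {
            HttpRequest request = HttpRequest.newBuilder(URI.create(base + path))
                    .timeout(Duration.ofSeconds(30))
                    .GET()
                    .build();
            // Each request populates the application cache so the first real
            // user request does not pay the cost of a cold start.
            HttpResponse<Void> response =
                    client.send(request, HttpResponse.BodyHandlers.discarding());
            System.out.println("warmed " + path + " -> " + response.statusCode());
        }
    }
}
```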
We also have a sidecar to export OS-level metrics, like disk size and disk usage, or whether the network is reachable or not, and we have alerts based on these. Fluent Bit, an open source project, collects the logs from all the containers running in the pod. They all write to a shared file location, a shared directory across all the containers, and the sidecar just looks at those files and sends them to our central location. Having this in a sidecar instead of in each application allows us to control it centrally and make changes without affecting the application. Thread dump collection on the JVM is the same thing: when we want a thread dump, or periodically, we take thread dumps, the files are stored in a shared location, and we pick up the files and upload them.

Envoy is an open source proxy, and we use it for traffic tunneling and routing. We have a use case where, in these shared multi-tenant clusters that we run, some customers are asking for dedicated IPs or to be able to connect to their internal VPN. So we use Envoy as a sidecar, a bit like the service mesh pattern: it proxies the traffic from the Java application and gets it out of the cluster through dedicated infrastructure that we set up for those customers. Envoy is a very powerful and very popular open source proxy. It can be used as a load balancer or reverse proxy, and it has rate limiting, circuit breaking, retries, all of this; I mean, it's used by Istio and a lot of the service meshes out there.

The auto-updater: did anybody hear about the Log4Shell vulnerability? Probably. So we figured out we need a way to patch everything running on the cluster without requiring the customers to do anything on their side. We set up an init container that does this auto-updating. It checks the version that the customer is running, or any of the components the customer is running, and whether we need to make any change, like updating Log4j. And this runs as an init container, without having to touch the main application. So it's transparent for customers, they don't have to do anything, it makes their life easier. And it allows us to patch the whole cluster fleet live.

All right, so, as I mentioned before, the operators. We have multiple operators running on our Kubernetes clusters. First of all, we are going to introduce a non-open-source operator, just because it is our main operator and it connects the pieces for the open source ones that we are using. This main operator manages the life cycle of the environments we have; it's one operator to run them all. We have pre and post jobs around environment creation: we launch and trigger the jobs we need to prepare the environment and to process post operations. But the main thing here is that we also reconcile some internal operators. Again, we provide APIs for the contracts I mentioned before, so all our internal systems can agree and work together.

So let's talk about the internal operators that we trigger and reconcile from this one. One of them is the Flux CD Helm operator. This open source operator is pretty useful for us because it allows us to manage the Helm charts for our environments declaratively. It lets us grab and gather the status of the custom resource that we create for this operator, so we can just read the status from the main operator and say, all right, the Helm operator was successful. So we have all the statuses in this main operator. This is also useful when engineers need to debug: they just go to one central place and see all the statuses and all the debugging information, what went well, what went wrong, and so on.
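To sketch the idea of surfacing the Helm operator's status from a main operator, here is roughly how the status of a Flux HelmRelease custom resource could be read with the fabric8 client. The API group and version follow the public Flux CRDs, but treat the details as assumptions, since our reconciler is internal:

```java
import io.fabric8.kubernetes.api.model.GenericKubernetesResource;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;
import io.fabric8.kubernetes.client.dsl.base.ResourceDefinitionContext;

public class HelmReleaseStatusCheck {
    public static void main(String[] args) {
        // Identifies the Flux HelmRelease CRD; the group/version may differ
        // depending on the Flux version installed in the cluster.
        ResourceDefinitionContext helmRelease = new ResourceDefinitionContext.Builder()
                .withGroup("helm.toolkit.fluxcd.io")
                .withVersion("v2beta1")
                .withPlural("helmreleases")
                .withNamespaced(true)
                .build();

        try (KubernetesClient client = new KubernetesClientBuilder().build()) {
            GenericKubernetesResource release = client
                    .genericKubernetesResources(helmRelease)
                    .inNamespace("customer-a-dev")   // hypothetical environment namespace
                    .withName("aem-environment")     // hypothetical release name
                    .get();

            // The status subresource ends up in the untyped additional properties;
            // a main operator can copy it into its own custom resource status.
            Object status = release.getAdditionalProperties().get("status");
            System.out.println("HelmRelease status: " + status);
        }
    }
}
```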
We are also starting to use the Argo Rollouts operator, which is also pretty interesting because it provides advanced deployment strategies like canary or blue-green deployments, and it also gives us automatic rollbacks. We are just starting to use it, but it sounds amazing.

And again, I mentioned this topic before: security. Keep in mind, we are running customer code on our clusters, so we need to be aware of security. In order to control the security profiles of the pods, we are using the Kubernetes Security Profiles Operator, which is also open source. It allows us to define security profiles, that is, which capabilities a pod can use, so we have everything under control. It can be integrated with seccomp, SELinux, or AppArmor, and profiles can be installed, recorded, and distributed via OCI images.

The other important part is how we scale this: all the thousands of environments we have for each customer, because we have this micro-monolith setup. We have over 17,000 environments across the clusters, and we have multiple teams building services, so we need a way to scale that. Ideally, different teams don't have to be concerned about it. If we can find an approach that applies to all teams, or that we can apply centrally, that's much better than having to go to each team and say, you know, this is the best way of doing this and you should adhere to these processes. If we can do it cluster-wide, it's a lot easier.

If you run Kubernetes workloads, there are two critical things on the resource side that you need to be aware of: the requests and the limits. The request is basically how many resources are guaranteed; the limit is how many resources you can consume; and they can be different. There's a fundamental difference there. These requests and limits can be applied to CPU, memory, and ephemeral storage. What can happen when your workload is not complying with the requests and limits? On the CPU side, you can end up with CPU throttling, and you just realize that your pods, your containers, are not giving the performance that you expected. On the memory side, the limit is enforced by the kernel: if your workload goes over the memory limit, the kernel is going to kill your process, or some process inside that container. And for ephemeral storage, the limit is also enforced: if you use more ephemeral storage than what you are set up for, the kubelet is going to evict your pod from that node and Kubernetes is going to schedule it somewhere else. You're going to see your pod disappear and later on reappear, and that may be a problem depending on your use case.

We're switching now to the ARM architecture. Anybody looking into ARM? Yeah, the rest of you, you should. Because, in our case, we estimate 15 to 25% savings just by switching the images we use and the nodes we run on to ARM. In the Java case, this is very simple, because unless you're doing something really special, your code is going to run on a JVM on ARM the same way it runs on a JVM on Intel. You just have to switch the base Docker image and you're done. There's nothing else that you need to do for your application.
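Because requests, limits, and even the node architecture all influence what the JVM sees, one practical way to check is to print what the runtime actually detected inside the container. A tiny sketch:

```java
public class ContainerProbe {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // On ARM nodes this prints "aarch64" instead of "amd64";
        // the application code itself does not change.
        System.out.println("os.arch             = " + System.getProperty("os.arch"));
        // How many CPUs the JVM thinks it can use; thread pools are sized from this.
        System.out.println("availableProcessors = " + rt.availableProcessors());
        // The maximum heap the JVM decided on (its "ergonomics"),
        // derived from the container memory limit unless overridden.
        System.out.println("maxMemory (MB)      = " + rt.maxMemory() / (1024 * 1024));
    }
}
```

Running this with different requests and limits quickly shows the surprises the next part of the talk is about.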
Now, Java in Kubernetes. Who is using Java here? All right, about a third to a half. So I don't know if this is going to be interesting for all of you, but I'm going to go through it a bit quicker than in Java conferences. Let's assume that we have a typical Java workload running on a recent JVM, one of the latest releases, not something from 10 or 15 years ago, with a, well, not-so-big memory of four gigabytes and two or more CPUs. So a typical Java workload, not one of the new microservice Java ones.

So, what is the default JVM heap size? The JVM is going to allocate a heap when it starts, and you would think: OK, I'm running Java in a container, I want to use as much memory of my container as possible. Do you think it's going to be 75% of the container memory, 75% of the host memory, 25% of the container memory, 25% of the host memory, or the magic number 127? Not 126, not 128, but 127 megabytes. The answer: it depends. For a typical container, anything with more than 512 megabytes of memory, the JVM is only going to use 25% of the container memory. 127 megabytes is a magic number that happens if your container has between 256 and 512 megabytes. And this has changed over time and has now kind of stabilized, because before, the JVM would detect the memory available on the host, not in the container, and as you can imagine, that was a problem when you were running Java inside a container.

And it uses the memory limits to decide how big this default heap is going to be. So there's really no guarantee that you're going to have that physical memory available, because it's using the limits, not the requests. The requests you have guaranteed, the limits you don't; you can oversubscribe your nodes on limits.

So, advice: do not trust the JVM ergonomics. Configure memory explicitly. You can pass the JVM flags -XX:InitialRAMPercentage and -XX:MaxRAMPercentage. Not MinRAMPercentage, because that means something totally different. In our case, we use 75% of the container memory; we explicitly override it, because otherwise you are wasting memory, you're wasting money, you're burning CO2 for no reason. The only special cases, I mean, you have to consider your own case, but some special cases are off-heap memory, that is, if you use native code inside Java; things like Elasticsearch and Spark do that. From the Kubernetes, or the kernel, point of view, the JVM is managing that memory, so that memory is used. You have to be careful about how you deal with this, because Kubernetes is not going to show your memory going up and down; the heap is always going to be taken by the JVM. And the advice here is to set the memory request and limit to the same value, because you're not going to have flexibility anyway, unless you use off-heap memory.

On the JVM we also have the garbage collector, and there are different types of garbage collectors; now we have five across the JVM versions: Serial, Parallel, G1, ZGC, and Shenandoah. The default changes depending on how many CPUs and how much memory are available in the container. So you might find yourself running a container with a Java application somewhere with some requests and limits, and if you increase or reduce those requests and limits, your performance may change, because the garbage collector being used is different. You make no changes to your Java application, you make no changes to your JVM arguments, but because you changed the container memory or CPUs, you get different results, because a different garbage collector is picked automatically. And the consequence is that if you don't have the right garbage collector in Java, you could have pauses in your application. Again, do not trust the JVM ergonomics; you can use the different flags to configure the different garbage collectors.
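As a sketch of what being explicit can look like, the launch flags below pin the heap to 75% of the container memory and select a garbage collector, and the small program verifies what actually took effect. The values are illustrative, not a recommendation:

```java
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

public class JvmErgonomicsCheck {
    public static void main(String[] args) {
        // Example launch command (illustrative values):
        //   java -XX:InitialRAMPercentage=75 -XX:MaxRAMPercentage=75 \
        //        -XX:+UseG1GC JvmErgonomicsCheck
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);

        // Print the effective values so you are not guessing what ergonomics chose.
        for (String flag : new String[] {
                "InitialRAMPercentage", "MaxRAMPercentage", "UseG1GC", "UseSerialGC"}) {
            System.out.println(flag + " = " + hotspot.getVMOption(flag).getValue());
        }
        System.out.println("Max heap (MB) = "
                + Runtime.getRuntime().maxMemory() / (1024 * 1024));
    }
}
```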
One example from Microsoft: they have a table, and you may agree with it or not, but there are different cases where different garbage collectors make sense. If you only have one CPU, the Serial garbage collector may be OK. If you have more than two, you can go to the Parallel garbage collector, and with more or less memory you can go to a different one that gives you different performance depending on memory and CPUs.

When you run Java in a container, how many CPUs do you think the JVM is going to use? Do you think it's going to be the CPU request you set on Kubernetes? The CPU limit? Or as many CPUs as the OS allows? Again, this is something that changed last October. In the versions before last October, the JVM would rely on the CPU shares to figure out how many CPUs it thinks are available, and this was changed in October to look at the CPUs at the OS level. Before, it would do some mathematics that makes sense up to a point: as long as you have less than 1024 CPU shares, it thinks it's one CPU; if you have 2048, it's two CPUs, and so on. But if you set exactly 1024 millicpus, it thinks it can use all the CPUs.

And why is this important in the JVM? The number of CPUs, or what the JVM calls the active processor count, is used to compute the number of threads that the JVM can use in different subsystems. For instance, when you start a thread pool in Java, the JVM is going to look at how many CPUs are available, or how many CPUs it thinks it can use, and based on that it's going to size the thread pool to one number or another. If you set your requests to, I think it was, 1024, it's going to think: oh, I can use all the CPUs, so your thread pool can be bigger. If you run on a machine with a lot of CPUs, that becomes a problem. So this becomes a problem when you have extremes, right? Huge, beefy VMs. Maybe your cluster has very beefy VMs mixed with smaller VMs, or you are testing in a staging cluster with smaller VMs and then in production you run with bigger VMs, and you're going to see differences and wonder what happened. That's the bug that was fixed in October: before, the JDK was using the CPU shares, and Kubernetes takes the CPU requests and translates them to CPU shares in the container runtime, so the JDK looked at that to see how many CPUs could be used. And this was changed. And again, the follow-up from here: do not trust the JVM ergonomics; you can explicitly set how many CPUs you want the JVM to consider.

Another case that is a bit more complex. Let's say you have a 32-CPU host and you have two JVMs, and this applies to everything, not just JVMs: you have two processes, one of them is idle, the other one is trying to use as much CPU as possible. If you set them both with eight CPUs of requests and 16 CPUs of limits, the maximum CPU that is going to be used in this case is 16, because you set the limit. So this kind of makes sense: I set the limit to 16, my container cannot use more than 16, even though the rest of the host is doing nothing. If you don't set limits, the same case but with no limits, you allow your containers to use as many CPUs as possible. And this is interesting, because people think that they should put limits in to protect the workloads, but that's not always the case. If you put limits in, you are going to have a lot of unused resources, or you could, depending on your use case.
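Going back to the thread-pool effect for a moment: several JDK pools size themselves from the detected CPU count, so the same code behaves differently depending on what the JVM sees, and -XX:ActiveProcessorCount pins it explicitly. A small sketch:

```java
import java.util.concurrent.ForkJoinPool;

public class PoolSizing {
    public static void main(String[] args) {
        // What the JVM detected (or what -XX:ActiveProcessorCount forced).
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("active processors      = " + cpus);

        // The common pool (used by parallel streams and CompletableFuture
        // defaults) sizes itself from the detected CPU count.
        System.out.println("commonPool parallelism = "
                + ForkJoinPool.commonPool().getParallelism());

        // Run with: java -XX:ActiveProcessorCount=2 PoolSizing
        // and the numbers above change, with no code changes needed.
    }
}
```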
The specific thing about CPU requests and limits in Kubernetes is that the CPU request is used for scheduling. Kubernetes is going to say: oh, you're requesting two CPUs, I'm going to give you a node that has two CPUs available, and they can be used by you. But after scheduling happens, the request is also used as a relative weight. It's not the number of CPUs that can be used; it's the relative weight between the workloads that you have on the same node. So even if you set a request of 0.1 CPU, the minimum you want, and you have two processes with the same request, each of them can use up to 50% of the CPU time. The request is a relative weight: if you have two workloads with the same number, it doesn't matter whether it's 0.5 CPUs or 100 CPUs. If they both want to use as much CPU as they can, they each get 50%.

So that was the request. On the limit side, this translates to cgroups quotas and periods. These are kernel-level, cgroups-level limits. For CPU, the period by default is 100 milliseconds, and the limit is the amount of CPU time that can be used in that period. If you use up your time in the period, your process is going to get throttled. And this is not just containers; this is how regular processes work in Linux. So if you have a single-threaded process running on one core, you are fine: if you assign it one CPU in Kubernetes, this becomes a cgroups quota, and your process is going to be running on that one core for the whole 100 milliseconds. Everything is fine.

But this is challenging for Java, because Java uses multiple threads, it uses a lot of threads, and the same goes for any other language that uses multi-threading. An example: if you have a Java process, or any process, that has four threads but only a one-CPU limit, you could consume your CPU time in 25 milliseconds. You consume your time in one quarter of the period and then get throttled for three quarters of the period. I don't know if this example makes it better or more confusing: you have four threads running, you have a 100-millisecond period, and you assign a one-CPU limit. If your threads are using all the CPU they can, then in 25 milliseconds they have consumed 25 milliseconds times four threads, which is 100 milliseconds. So for the remaining 75 milliseconds, your process is going to get throttled. And this hit us a lot of times, because for web pages the request response time is very important, and at some point you ask: why is the response time increasing? Why is the response time going crazy at this point in time? It's because you are hitting the limits. Some requests may go faster and some may go slower, because your process is getting throttled. And this is interesting for Java applications because they are typically very multi-threaded, but any multi-threaded process is going to have the same problem.
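The arithmetic is worth writing down once. A sketch of the quota math under the stated assumptions (100 ms period, 1 CPU limit, 4 busy threads):

```java
public class ThrottleMath {
    public static void main(String[] args) {
        double periodMs = 100;   // default cgroups CFS period
        double cpuLimit = 1.0;   // Kubernetes CPU limit
        int busyThreads = 4;     // threads all trying to run at full speed

        // The quota is the CPU time the container may consume per period.
        double quotaMs = cpuLimit * periodMs;      // 100 ms
        // With 4 threads burning CPU in parallel, the quota is gone early.
        double runningMs = quotaMs / busyThreads;  // 25 ms
        double throttledMs = periodMs - runningMs; // 75 ms

        System.out.printf("runs %.0f ms, throttled %.0f ms of every %.0f ms period%n",
                runningMs, throttledMs, periodMs);
        // Any request arriving during those 75 ms just waits,
        // which is why response times spike under CPU limits.
    }
}
```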
So now we are going to talk, very quickly, about the autoscaling that we have configured; it's going to be just an overview. We have three different autoscalers set up in our Kubernetes clusters: the cluster autoscaler, the horizontal pod autoscaler, and the vertical pod autoscaler.

The cluster autoscaler, as you probably know, allows us to automatically increase and decrease the cluster size. How do we decide? In our case, we base it on CPU and memory requests. And of course we also leave some headroom for spikes, because adding a new instance is not a quick process; it's not just spinning up a pod. So we need some room to grow very quickly, although we try to keep it as restrictive as we can, just to save resources. But again, we keep some headroom. We also have different availability zones, and we need to scale in all of them. It is also important to set up a maximum number of nodes for the cluster autoscaler, and you will see why, because I will show you some graphs. The cluster autoscaler is saving us around 30 to 50%, so it is a lot. Here you can see a graph of normal usage of the cluster autoscaler: it goes up and down between 50 and 100 cluster instances. And here you can see a bug we introduced: a spike from 100 up to the maximum number of nodes, which in our case was 150. This is why it is important to set up a maximum, because otherwise the bill at the end of the month is going to be very funny when you introduce this kind of bug. Then you can see that the bug was fixed and we were back to normal usage again.

The vertical pod autoscaler: this is great. I will explain the differences later on, but this is about resizing the pod, making its resources bigger or smaller. It is good because you can just scale a deployment up and down, but it has a big disadvantage as it is right now in Kubernetes, which is that you need to restart the pod to make it effective. You can either do it automatically or on the next start, but it is not like the horizontal one that we will see later on. This disadvantage exists right now; in upcoming Kubernetes versions the restart will be avoided, so that is going to be a huge change for the vertical pod autoscaler. And one recommendation we give you here: don't set the vertical pod autoscaler to auto, because otherwise you will start seeing seemingly random restarts, and it's going to be a very weird state. We only use it so far for development environments, and we scale them down if they are unused, so we just use it for saving resources. The JVM footprint is hard to reduce because, again, you need a lot of resources for Java to run properly. The savings from this are between 5 and 15%.

And the horizontal pod autoscaler, which scales horizontally, so you create more pods when you need them, is for me personally the most interesting one. What we do is scale based on two metrics: CPU and HTTP requests per minute.
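Roughly, and as a sketch only, this is how such an HPA could be declared with the fabric8 Java client. The custom metric assumes a metrics adapter exposing it to the autoscaling API, and the names and values are hypothetical:

```java
import io.fabric8.kubernetes.api.model.Quantity;
import io.fabric8.kubernetes.api.model.autoscaling.v2.HorizontalPodAutoscaler;
import io.fabric8.kubernetes.api.model.autoscaling.v2.HorizontalPodAutoscalerBuilder;

public class HpaExample {
    public static void main(String[] args) {
        HorizontalPodAutoscaler hpa = new HorizontalPodAutoscalerBuilder()
                .withNewMetadata().withName("env-hpa").endMetadata()
                .withNewSpec()
                    .withNewScaleTargetRef()
                        .withApiVersion("apps/v1")
                        .withKind("Deployment")
                        .withName("aem-publish")   // hypothetical deployment name
                    .endScaleTargetRef()
                    .withMinReplicas(2)
                    .withMaxReplicas(10)
                    // Metric 1: average CPU utilization across pods.
                    .addNewMetric()
                        .withType("Resource")
                        .withNewResource()
                            .withName("cpu")
                            .withNewTarget()
                                .withType("Utilization")
                                .withAverageUtilization(70)
                            .endTarget()
                        .endResource()
                    .endMetric()
                    // Metric 2: a custom per-pod metric, e.g. HTTP requests per
                    // minute (metric name is hypothetical).
                    .addNewMetric()
                        .withType("Pods")
                        .withNewPods()
                            .withNewMetric().withName("http_requests_per_minute").endMetric()
                            .withNewTarget()
                                .withType("AverageValue")
                                .withAverageValue(new Quantity("1000"))
                            .endTarget()
                        .endPods()
                    .endMetric()
                .endSpec()
                .build();

        System.out.println(hpa.getSpec().getMetrics().size() + " metrics configured");
    }
}
```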
We don't use, and we don't recommend using, the same metrics as for the vertical pod autoscaler. And in our case, and probably in most general cases, scaling on CPU only might be problematic. Why? Well, I'm sure you've all seen it: when you run multiple things on the cluster, you can see spikes at startup, like, hey, I'm starting 10 or 20 processes at the same time on the cluster. We have capacity, but since you are starting all of them at the same time, you are consuming and boosting the CPU, and after some minutes or hours it is OK again. This can trigger a cascading effect if you base scaling on CPU only. And it's not only at startup: there are also periodic tasks running on the cluster that show up as a CPU spike, and if you base your horizontal autoscaler only on that CPU metric, you might be spinning up more pods, and that will not solve anything, because the spike comes from a heavy task, not from traffic.

All right, let's go to the end; we talked too much. Three things to finish, three things that we hope you take away from this talk. Kubernetes is really easy to start with: lift and shift, and then optimize over time. That's what we did. We used patterns to decompose and extend the application, with sidecar containers and new services, and over time you can keep doing this if you bring a monolith into Kubernetes. It's also very important to optimize the resources that you are using, and over time you can improve this too: you can tune the JVM CPU, memory, and garbage collector, and the same applies to other processes and runtimes.

All right, thank you. If you have any questions, we'll be around. Thank you very much.