All right, we're going to go ahead and get started. I hope everyone can hear me. I'd like to thank everyone who's joining us today, and welcome to today's CNCF webinar on application modernization and portability across clouds. I'm Karen Chu, CNCF ambassador and community program manager at Microsoft, and I'll be moderating today's webinar. We'd like to welcome our presenter today, Ravi Alubuena, a senior architect at robin.io. Before we start, a few housekeeping items. During the webinar you won't be able to talk as an attendee, so there's a Q&A box at the bottom of your screen; if you have any questions, please feel free to drop them in and we'll get to as many of them as we can at the end. With that, I'll hand it over to Ravi to kick off today's presentation.

This is Ravi Kumar Alubuena. I'm a senior product architect at robin.io, and today I'll give you a brief introduction to application modernization and portability across clouds. Before we talk about modernization, I want to establish the kinds of applications that fall into our spectrum. We start with the smaller bucket, the applications typically built as microservices. These are mostly web apps, but they could also be application services. The second bucket is our traditional SQL databases, which could be MariaDB, Oracle, SAP, and so on. The third part of the spectrum is the databases built in the last decade, which fall into the NoSQL bucket; these are typically the distributed databases. And the big bucket here is the big data applications: the Hadoops, the Kafkas, Elasticsearch and the ELK stacks.

That viewpoint is from the applications' side: how are applications classified? They could be microservices, or monolithic applications, databases, NoSQL. That is an application classification. But when you look at it from the infrastructure point of view, say you're building an infrastructure component, you don't want to be doing this fine-grained classification of applications. You do a broad classification, and that traditionally has been stateless applications versus stateful applications. As you can notice, the big bucket here is the stateful applications, and that is where Kubernetes is moving: it is trying to solve the problem of stateful applications.

So once we have established our applications, let's see what the challenges are in bringing stateful applications onto Kubernetes: what modernization is, and why it is a prerequisite for bringing these applications onto Kubernetes. At the end of the talk, hopefully, we will walk away with the what, why, and how of modernization.

If you think about modernization today, it has become a trend, and there are several keywords attached to it. We talk about LXC, Kubernetes, OpenShift. If you take Docker, there are namespaces and cgroups. So how do you even formulate the concept of modernization? And there is one interesting question: when virtualization came about, nobody talked about modernization. Why now? What changed? When we're talking about Kubernetes, why is there this boom all of a sudden around cloud native and modernization? That's a question to think about.

Now let's go to the what. There could be several definitions of a modern application, but here is my definition: an application that can adapt to any environment and perform equally well.
When I say any environment, it could be bare metal, virtual machines, containers, or cloud. That mandates that these modern applications come in different distribution formats: RPM packages, Debian packages, virtual machine images, Docker containers, or an OCI image. Another key ingredient of a modern application is that it should be able to plug into infrastructure components like security, performance monitoring, logging, backup, and so on. When we drop this application onto the infrastructure, all the pluggable infrastructure components should work seamlessly with the app. That is one of the major requirements. The other two really depend on the application design itself: they have to be elastic, and they have to be agile and portable. The last point is really important. You really, really have to liberate these applications from the underlying infrastructure, be it compute, storage, network, and maybe even the authentication and authorization libraries, so that the application can run anywhere. Let's go with that definition for the rest of the presentation.

Why? What changed, and why should I modernize my applications now? Traditionally these applications, at least the data-heavy applications, were built for bare metal, and they graduated from bare metal onto virtual machines. That was a rather smooth transition for the application, though not for the infrastructure. It took maybe a decade or more for enterprises to really adopt virtual machines, but for the applications it was a smooth transition, because applications didn't need to change: if my application was running on bare metal, it more or less ran equally well on virtual machines. Moving to Kubernetes and containerization, on the other hand, is a steep climb. Why is it a steep climb? The reason is that containers and Kubernetes are very opinionated frameworks, especially Docker and Kubernetes, which are the prominent players in this area. Applications have to be redesigned to fit into these ecosystems, and that is a big ask of the application developers. That is the reason we have to modernize our applications.

If you come to the how of modernization, there are multiple stages. If you take a traditional application, the first thing you have to do is containerize it. There are several tools out there, from Docker and rkt to Podman and Buildah from Red Hat, that allow you to containerize any application. Once you containerize, the second step is to Kubernetize these applications, assuming Kubernetes is the de facto standard we are picking here. What does Kubernetizing really mean? It means stitching multiple Docker containers together to deliver a service or an application. The most prominent tools in this area are Helm charts, Ksonnet, and, the current trend, operators. All of these tools will help you Kubernetize the application. But the biggest challenge among all three stages is productizing the application. You have a Docker image, you have a Helm chart, but when you run that Helm chart on a Kubernetes platform, is it production ready? That is a big question to ask, and we will see why, and what the challenges are in productizing the application.

Let's briefly look at the different stages and some challenges in each one of them. First, containerization: what is the process?
I have a database, a traditional application, and I want to Dockerize, or containerize, it. The stages involved are roughly these. First, process separation: the theory behind Docker, or any container, is the microservice pattern. As much as possible, you isolate the processes and run each in its own namespace, its own bubble. The second stage is configuration separation. Because the root filesystem of a Docker container is not portable across machines, you have to separate out the configuration and the data and put them on a volume that can be moved across hosts. So there is configuration separation, there is data separation, and then there is the entrypoint. Especially with Docker, the entrypoint is the single window into the container. The entrypoint doesn't recognize whether this is the first start, a subsequent start, or an upgrade. Why is this important? Because all these traditional applications do some initialization, some setup, the very first time they start. The second time, you don't run the setup; you just start the database engine: service mysql start, service postgres start. But if you have to bring this application into Docker, there's only one script that runs, the entrypoint, which means you have to build that logic into the entrypoint: if this is the first start, run the setup; if this is a subsequent start, just start the service; if this is an upgrade, run the additional logic. All of this has to be coded into the entrypoint, and that becomes tricky when you're containerizing applications. (There is a small sketch of this pattern below.)

Let's say we do all of these steps. Is it really Dockerized, really containerized? Unfortunately, not yet. There are some issues with Docker containers today, partly because of incomplete virtualization of the system interfaces. To give you some examples: some applications need to use CPU IDs because they may be doing CPU pinning, and in some cases the CPU count and the memory size have to be presented accurately to the containers. It is not just /proc/cpuinfo or /proc/meminfo virtualization; it is more than that. Some applications derive this information from sysfs, and some derive it from a system call. All of these entry points, procfs, sysfs, and system calls, have to be virtualized for the application to use the right set of resources.

When it comes to storage, some of these applications need raw block access, Oracle to be specific, among others, and current Kubernetes doesn't allow you to do that. It is evolving, but the state of the art, the GA version of Kubernetes, doesn't allow it. The other part about storage is IOPS control. With applications like these, there are multiple volumes involved, and you cannot go and set the IOPS volume by volume. There has to be a higher-level construct that says this application, or this role, needs X IOPS, or is capped at X IOPS. There have to be higher-level constructs at the application level that make the administrator's life easy when controlling these applications.
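To make that entrypoint discussion concrete, here is a minimal sketch of the kind of logic a containerized database ends up carrying. This is an illustration under assumptions, not any vendor's actual image: the marker-file convention and the run_upgrade_steps helper are invented for the sketch.

```sh
#!/bin/sh
# Minimal entrypoint sketch: one script has to cover the first start, normal
# restarts, and upgrades, because Docker runs a single entrypoint.
set -e

DATADIR=/var/lib/mysql                 # data lives on an externally mounted volume
VERSION_FILE="$DATADIR/.installed_version"
IMAGE_VERSION="8.0"                    # version baked into this image

if [ ! -f "$VERSION_FILE" ]; then
    # First start: one-time initialization of the data directory.
    mysqld --initialize-insecure --datadir="$DATADIR"
    echo "$IMAGE_VERSION" > "$VERSION_FILE"
elif [ "$(cat "$VERSION_FILE")" != "$IMAGE_VERSION" ]; then
    # Upgrade: the data was written by an older version; run migration logic.
    run_upgrade_steps "$DATADIR"       # hypothetical helper for this sketch
    echo "$IMAGE_VERSION" > "$VERSION_FILE"
fi

# Every start falls through to here: run the engine in the foreground so the
# container stays alive and receives signals.
exec mysqld --datadir="$DATADIR"
```

The point is not the specific commands but the shape: state detection up front, branching for setup and migration, and an exec of the real service at the end.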
Now let's move on to the second phase of modernization, Kubernetizing. Like I said, Kubernetes is a very opinionated framework. What do I mean by that? Traditionally, these applications have a storage component, a network component, and a process component, which is the compute. Ignore the network component for a minute: there is storage and there is compute. But that's not enough in a Kubernetes ecosystem. There are many more components that make up a fully functioning MySQL application. There is a PVC and there is a Pod, of course, but there is also a Deployment, and a ReplicaSet to manage the application and its upgrade cycles. There is a Service, which is required to expose the application to the outside world, to the consumers of this application. There are a ConfigMap and a Secret, which store the configuration and the passwords or any other secrets associated with the application. You have to follow these rules to really bring an application into the Kubernetes ecosystem.

If MySQL is this complex, with this many components, look at the other applications. MariaDB has these components, then MongoDB, Elasticsearch; the complexity only grows as you move from one end of the spectrum to the other. There are several components to manage. Is it that bad? No. We have operators, and they are trying to solve this problem. It is relatively easy to come up with this structure for the applications, because end users are not the ones building it: the recent trend is that the companies backing these applications are coming up with operators for them. So it's not all that bad. You now have something like an application-as-code format; all of these elements are codified.

But what is the problem? I have a Docker container. I have a Helm chart published by, say, MariaDB or MongoDB. That should be the end of it, right? I have everything here. Unfortunately, not. There is a third stage, productization. What does this mean? Given a Helm chart and a Kubernetes deployment, which could be on-prem or OpenShift or a cloud deployment, if I just take that Helm chart and run it on this Kubernetes environment, can I claim the SLA? Do I get the SLA in terms of availability and performance? Let's look at those aspects. In the current Kubernetes ecosystem there are several storage vendors and network vendors, yet people still claim that storage and networking are the biggest problems. Why is that? We are clearly missing something. Storage is not just limited to providing a block device to the application; it is much more than that.

So let's go back to our MySQL application. We have this nice layout of the application, and we start running it: helm install gives us a running MySQL application. But after running it for a few days, maybe, you start noticing some performance issues, and the first place to start debugging or troubleshooting is the application itself. We go to the MySQL documentation and see that MySQL runs on the InnoDB storage engine, which roughly has two IO patterns: an append-only workload and a batch write. Traditionally, all of these ACID-compliant databases have a transaction log, which is append-only; every transaction has to be written to the disk. Then a batch process comes along, accumulates the transactions, and writes them to the data volumes. The idea is that you separate the spindles for these two workloads to gain performance. Now let's fix it; let's follow the recommendation. That means we are going to add a new volume, so we can dedicate one volume to the redo and undo logs and the other volume to the data blocks (there is a sketch of this layout below). So we are running happily after this change.
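As a concrete illustration of that two-volume layout, here is a minimal sketch of the two claims, one per IO pattern. The claim names, sizes, and StorageClass names (fast-log, bulk-data) are assumptions for the sketch; whether the two classes actually land on different spindles is entirely up to the storage backend, which is exactly the problem we run into next.

```yaml
# Sketch: one PVC for the append-only redo/undo logs, another for the
# batch-write data files. StorageClass names are hypothetical.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-redo
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: fast-log      # assumed class for the append-only log volume
  resources:
    requests:
      storage: 50Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: bulk-data     # assumed class for the batch-write data volume
  resources:
    requests:
      storage: 500Gi
```

The pod then mounts mysql-redo at the InnoDB log directory and mysql-data at the data directory, so each IO pattern gets its own volume.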
But after a while we still see some performance issues. Why is this? Now comes the bigger challenge. Any guesses? Since there are no questions coming in, I'll keep going. We start the process of troubleshooting. Here is our pod, the MySQL pod, and there are two PVCs, which map to the redo and the data volumes. This pod is running in a Kubernetes environment. Underneath Kubernetes there is a CSI component, which drives a software-defined storage stack, or any storage stack: Kubernetes goes to the CSI driver and says, I need a volume, and the CSI stack drives the underlying storage stack to get one. Now let's see how that worked out here. What we notice is that the redo volume and the data volume are coming from the same spindle. And that is the problem. Naturally, the solution is to separate these volumes out onto two different spindles.

If we do that, is the problem solved? No, not yet. There are more problems. Kubernetes allows us to run hundreds, maybe thousands, of these applications on a single cluster. Some other department comes in, wants to run MySQL on the same infrastructure, and we run into the same problem again. We fixed the problem of one application's two volumes competing for a spindle, but with multiple applications we can end up with something called the IO blender effect: multiple applications competing for the same spindle. The natural way to fix it is to separate the traffic out. If two applications are already that complex, meaning you have to understand what each application needs and how the bits are laid out inside the storage stack, think about many applications coming in, applications of a different nature: MySQL, Cassandra, and Hadoop running on the same cluster. Managing all those PVCs and PVs by hand is a nightmare. What we have generally seen is that the storage stack is not aware of what the compute needs.

Let's inspect further. Our MySQL admin is happy, but our Cassandra admin is not. Why is that? Let's look at Cassandra. With Cassandra, it's a much, much bigger problem. With MySQL, it's just performance; worst case, there are fewer transactions per second. With Cassandra, it's near-death. Why? Cassandra is a replicated system; it takes upon itself the responsibility of data availability. The application makes three copies (it's configurable, but say three copies), and in the Kubernetes world Cassandra interfaces with three PVCs, three volumes. Now, what if all three volumes are coming from the same disk, or the same host, or the same rack? That is a single point of failure. If the disk were to fail, it doesn't matter that Cassandra is making three copies. What the software-defined stack, the CSI part of Kubernetes, has done is negate the assumptions made by the application. Cassandra thinks it is making three copies, but the storage stack nullified that by placing all three replicas on the same spindle. In this scenario the deployment is definitely not resilient to faults. The natural way to fix it, of course, is to separate these volumes onto different spindles, different nodes, and different racks. I keep saying, let's fix it, let's fix it. What does that really mean? You cannot just go and manually separate these blocks out for every application. We'll come to the answer at the end. So far we've talked about MySQL and Cassandra.
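For reference, here is roughly how far stock Kubernetes lets you go toward that fix: pod anti-affinity spreads the Cassandra replicas across hosts, and if the StorageClass uses WaitForFirstConsumer volume binding, volume placement follows the pods. This is a hedged sketch with assumed names (the local-wffc class, the image tag, the sizes), and note what it does not express: nothing about spindles, and rack-level spread only if a rack topology label exists on the nodes.

```yaml
# Sketch: spread three Cassandra replicas across hosts so the three data
# copies never share a node. Volume placement follows the pod only when the
# StorageClass uses volumeBindingMode: WaitForFirstConsumer.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cassandra
spec:
  serviceName: cassandra
  replicas: 3
  selector:
    matchLabels:
      app: cassandra
  template:
    metadata:
      labels:
        app: cassandra
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: cassandra
            topologyKey: kubernetes.io/hostname  # a rack label here gives rack-level spread
      containers:
      - name: cassandra
        image: cassandra:3.11
        volumeMounts:
        - name: data
          mountPath: /var/lib/cassandra
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: local-wffc    # assumed WaitForFirstConsumer class
      resources:
        requests:
          storage: 100Gi
```

Even this only encodes one application's assumption by hand, which is the speaker's point: the knowledge has to live in the storage layer, not in every chart.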
Now let's talk about Hadoop, which has many, many components. There are data nodes and their data volumes, there are ZooKeepers, there is HDFS, there is the resource manager, and there are name nodes, primary and secondary. There are several components, and every component has a different ask of the storage stack. This is the most complex application we have seen when orchestrating on Kubernetes.

So what are we saying? We have talked about MySQL, Cassandra, and Hadoop, and we have established that each brings a different set of challenges. If I were to design a storage stack for Kubernetes, it would need all of these capabilities to really host a data-heavy application. It needs to understand the IO patterns. It needs to understand the assumptions built into the applications, like Cassandra's data nodes keeping three-way replicas. It needs storage and compute affinity. It needs location awareness. The storage has to be highly available, maybe by using replicas. And it also needs data protection, snapshots, and all of that. These are very, very critical for running a data-heavy application, a SQL or NoSQL database, on Kubernetes in production.

To recap: data-heavy applications use multiple volumes; those volumes have different IO characteristics; consolidation makes the problem much harder; and application-level replication makes allocation tricky. So what are we looking for? The answer is application-aware storage provisioning. The storage layer has to be aware of what the application needs. What does that mean? With Cassandra, we talked about three volumes. We have to tell the storage stack: look, these three volumes, these three PVCs, are related, so allocate them in a certain way, maybe using an affinity policy, an anti-affinity policy, or a storage-and-compute affinity policy. That knowledge has to trickle down into the storage stack for the correct placement of these blocks.

So we are done with placement. Assume we have a storage stack that is application aware and gives us the right placement. That's not the end. That solves our day-zero and day-one problem, which is deploying and starting these applications. After that comes the bigger challenge, where we have to worry about protecting the applications: guarding against user errors, guarding against disk failures and node failures, and maybe extending to the cloud. Those are the next set of challenges in managing these applications.

Let's quickly talk about data protection. One way of protecting databases is by using database checkpoints; a checkpoint is, in a way, just a bookmark in the transaction stream. The other way of protecting data for these applications is by leveraging block snapshots: volume manager snapshots or LUN snapshots. Both of these are insufficient for protecting applications, especially on Kubernetes, and here is why. Let's walk through the volume snapshot flow. Say we have created a MySQL application, and right after creating it we take the first snapshot. We add some tables, ingest some data, and take the second snapshot; this is after the data changes. Now let's change the password of the MySQL application. Notice that the password changes in two places: once in the database, in the system tables, and once in the Kubernetes Secret. We change the password and take a third snapshot, and notice that what we captured is the PVC, the volume, and nothing else.
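To pin down what that snapshot actually contains, here is the shape of the stock CSI snapshot API. A VolumeSnapshot references exactly one PVC, so the Secret and the ConfigMap that complete the application are simply outside its scope. Names reuse the earlier sketches, and the snapshot class is assumed to be installed.

```yaml
# The stock snapshot object: it captures one PVC and nothing else.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: mysql-data-snap-3
spec:
  volumeSnapshotClassName: csi-snapclass    # assumed installed snapshot class
  source:
    persistentVolumeClaimName: mysql-data   # the ConfigMap and Secret are not covered
```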
Now let's change the configuration settings. Configuration changes can go to the ConfigMap, and it is also possible for config changes to land on the PVC. We capture that too. Now let's roll back to the initial snapshot, back to this point. If you watch the PVC, it went back to its initial state. So what is the problem? Look at the final picture after the rollback to the initial snapshot. There are three pieces we are interested in: the volume, the ConfigMap, and the Secret, and the problem is fairly obvious: only the volume went back. In traditional systems, on bare metal or VMs, this is called config drift, and there are tools built to address the config drift problem. The configuration of the application has changed, the passwords associated with the application have changed, and the consumer of this application can no longer connect to it after you roll back. That is the config drift problem. If you really have to address it, the better strategy is to capture all the components that make up this MySQL application. So: after the initial snapshot, we change the data, we change the password in the Secret, we change the ConfigMap, and if every snapshot captures all of the components, you can clearly see that you can roll back to any point in time and get the exact state of the application back.

Data protection is not just capturing volume snapshots; it is more than that, especially on Kubernetes. And not all applications provide their own checkpoints or snapshots: MongoDB, for example, relies on filesystem snapshots or volume manager snapshots, and plain volume snapshots are prone to the config drift issues we have just seen. There is also a very important concept called a consistency group. When you have to snapshot multiple volumes, you need a consistency group construct. You have to snapshot all of these volumes in one shot, which means you flush the application buffers and the file system buffers and tell the storage stack that these volumes are related: these three volumes make a group; snapshot them in one shot. So again, what are we looking for? We are looking for application-level snapshots that are consistent.
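To see why a consistency group has to be a first-class construct, consider what the stock API forces on you: three independent snapshot objects, as sketched below, with nothing making them atomic or mutually consistent, and any quiescing of application and filesystem buffers left to happen out of band. The PVC names follow the earlier Cassandra sketch.

```yaml
# Without a consistency-group construct: three separate VolumeSnapshots.
# Nothing coordinates them, so they can capture three different moments.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: cassandra-snap-0
spec:
  volumeSnapshotClassName: csi-snapclass
  source:
    persistentVolumeClaimName: data-cassandra-0
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: cassandra-snap-1
spec:
  volumeSnapshotClassName: csi-snapclass
  source:
    persistentVolumeClaimName: data-cassandra-1
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: cassandra-snap-2
spec:
  volumeSnapshotClassName: csi-snapclass
  source:
    persistentVolumeClaimName: data-cassandra-2
```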
Any other challenges? We have addressed snapshots as a data protection strategy, but that is not the end of it, because snapshots will not protect you from hardware failures. If the disk fails, your snapshots are useless. For that, you have to take the snapshot and move it to a different medium; that is traditionally called a backup, and there are derivatives of backup. Backup is also one way to make applications portable. Before we talk about how the backups are done, let's talk about the use cases for portability. Why do we need portability? For several reasons. It could be a hardware refresh. It could be testing my upgrades. I may want to spin up my application in the cloud. Or I don't want vendor lock-in: today I'm on AWS and I want to move to Azure, because that is the direction the company is going. There's the data center migration use case. There are several reasons I might want my application to be truly portable, meaning it is not attached to the infrastructure.

One way to accomplish this is by leveraging backups. We have seen snapshots that capture all the components of the application in Kubernetes. We can move these snapshots to the cloud, to cheaper storage like object storage, and move only the increments of changes. With backup, you have RPO and RTO on one side, and time and cost on the other. We optimize cost by moving the backups to object storage systems, which are cheaper than block storage systems, and we optimize time by moving only the incrementals.

When backing up to the cloud and then rehydrating in the cloud, meaning I take a backup and create the application in the cloud from that backup set, so that I get a fully functional running application in the cloud, even there we have to optimize for cost. If I back up a four-terabyte database, I don't want to rehydrate the whole four terabytes from object storage into block storage. That is cost, and it is also time: it takes time to rehydrate four terabytes from object storage into block storage. Instead, the better way to rehydrate is on demand. Think of it like a page fault in the Linux kernel. The application can come up instantaneously by tiering from the object storage. For MySQL to come online in the cloud, it will probably need 50 or 100 megabytes, not the entire four terabytes, so we rehydrate only those 50 or 100 megabytes, and you save on both time and cost.

Let me give you a brief look at how it should be done. Say you have an on-prem cluster running. You capture the entire application state as part of a snapshot. You can create local clones from that snapshot; these are thin clones (a minimal stock-Kubernetes version of this appears below). You can register a cloud object storage system, like S3, GCS, or Azure Blob, and push the snapshot to the cloud. Then, from any other Kubernetes deployment, cloud or on-prem, you can pull that application in, which means you have your application up and running in the cloud. Think of VMware and the VMDK file. What VMware, what virtualization, gave us is a VMDK file that is portable and can be snapshotted; it really liberated the application from the infrastructure. We want to elevate that experience to the application level: here is my MongoDB as an application blob, and I can take it anywhere and start it anywhere. That is the experience we are looking for. We have to elevate deployment, protection, backup, cloning, everything, to the application level.
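For the volume piece alone, stock Kubernetes can express a clone or restore as a new PVC whose dataSource points at a snapshot. This is a minimal sketch reusing names from the earlier examples; whether the clone is thin, and whether it hydrates lazily, is up to the CSI driver, and as argued above this still covers only the PVC, not the whole application.

```yaml
# Restore/clone sketch: a new PVC hydrated from an existing VolumeSnapshot.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-data-clone
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: bulk-data             # assumed class from the earlier sketch
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: mysql-data-snap-3
  resources:
    requests:
      storage: 500Gi
```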
So with that, what is Robin? Robin is an application-aware storage stack: a distributed volume stack with the enterprise-grade features built in, snapshots, clones, QoS, replication, and backups. We run on top of any Kubernetes framework, whether that is OpenShift, a managed cloud Kubernetes, or open-source upstream Kubernetes. We also have flexible networking options via CNI. And there is a meta-orchestrator called the application workflow manager, which helps with the flows we discussed: it can capture an entire application, all the components of the application, and it gives you one-click snapshots, clones, scaling, backup, and upgrade. We have many large customer deployments: Elasticsearch, Logstash, Kibana, and Kafka at huge ingest rates; multiple Oracle RAC clusters running on a single Robin platform, a Kubernetes cluster with Robin in it; and a deployment with 400 Oracle RAC instances managed by Robin.

Awesome. Thanks, Ravi, for the great presentation. We'll now have some time for questions. If you have a question you'd like to ask, a reminder to please drop it in the Q&A tab at the bottom of your screen, and we'll get to as many as we have time for. While people type in their questions, I'll bring up one question that came up in the chat. It said: shouldn't the admin be given the choice to restore just the config, or just the PVC, et cetera? I can't remember the context of this, but Ravi, do you have an answer for that?

My take is that it is error-prone. You can always restore bits and pieces: you can restore a ConfigMap, or a Secret, or a PVC. But in my view it's mostly at troubleshooting time that you want to restore bits and pieces; otherwise, you don't want to restore an application piecemeal.

Great. The next question is: will any custom storage class get created? Yes, as part of the installation process there will be a Robin storage class.

Perfect. There's a question about whether serverless and functions fit these criteria: not today.

We'll give it a few more minutes for questions; I'm still waiting for more to come in. Again, a reminder: if you have questions, put them in the Q&A tab at the bottom of the screen. I'd also like to mention that you can try Robin by visiting get.robin.io; that's a free download you can use. It looks like there aren't many more questions coming in. I guess you did a really good job answering everything early on. Great. Thank you, Ravi, for a great presentation. That's all the questions we have time for. Thanks for joining us today. The webinar recording and the slides will be online later today. We're looking forward to seeing you at a future CNCF webinar. Have a great day. Thank you.