Hello there. My name is James Spurin, and welcome to this tour on why Kubernetes clusters need persistent data. I'm the product evangelist for StorageOS, and I have a mixed background that covers a variety of different areas: enterprise storage, DevOps, and software development. I previously worked for Nomura, Goldman Sachs, and Dell EMC. Lastly, I'm the author of Dive Into Ansible.

Here's an overview of what we're going to be covering in this session.

Okay, so jumping into this: how are organizations adopting and using Kubernetes? With our customers we see a mixture of different approaches and strategies. We have what I'd call the hybrid solutions, ones that can be deployed both on-prem in your own data center and in the cloud. These are clusters that you build and manage yourself using the likes of kubeadm. You've also got those being built using Rancher, where Rancher deploys and manages the Kubernetes environment for your target locations. Then you've got the likes of OpenShift, which uses Kubernetes at its core but also acts as a platform as a service. Next we have the traditional cloud-based offerings of Kubernetes, for example Amazon EKS, Azure AKS, and Google GKE, just to name a few. We also have environments that may be used by developers more locally, on their laptops or local systems. This could include kubeadm.
There's minikube, there's kind, and there are other awesome projects like kubefire. As most of you will know, this is still only a small subsection of the options that are available, and I highlight these to show a consideration I recommend when choosing a persistent storage solution. This presentation is provided to you by the CNCF, the Cloud Native Computing Foundation, and choosing a data plane should follow the same principles: the solution should be cloud native and able to cover all of the circumstances that you require, i.e. on-prem, bare metal, and in the cloud.

If we take a look at the market utilization of Kubernetes and container technologies, there is mostly an emphasis on ephemeral workloads, where the container can be started and stopped and there's often no concern for the underlying data. If we look at the top two technologies running on Docker according to Datadog, nginx and Redis are first and second. From a Kubernetes viewpoint they're great apps for running as ephemeral workloads. For example, nginx is very easy to run and deploy, and technically you could do this without storage: you may use an init container to initialize the content that nginx serves, and when you want to update, you destroy and recreate. Redis is similar, and again you can use it without storage; where required, the instances can sync with other nodes to get shared information.

So whilst these ephemeral workloads are often great use cases and convenient, treating a cluster as ephemeral-first can overall be counterproductive. What we see in the industry is that whilst many organizations are adopting Kubernetes in production for ephemeral workloads, they continue to support legacy environments for persistent workloads. As we know, Kubernetes as an operating system provides many benefits: you've got your scheduling, your resource utilization, your recovery. And if you're having to maintain a legacy environment to support the workloads that you can't run on Kubernetes, then essentially
you're doubling the workload of the team. A lot of the cases for retaining a legacy environment unfortunately do relate to persistent storage, and they can actually be easily addressed with the use of a persistent data layer. As well as this, applications may be running in less than optimal ways. If we go ephemeral-first, then yes, we can easily run both Redis and nginx as per the examples, but having a better tool set in your Kubernetes cluster gives you better options. As we go through this, I want to show how both of these can be targeted in different ways with an effective data plane.

Lastly, you don't need to create scaffolding to work around the problems of non-persistent storage. Really, people do this, and it's painful and ineffective to see. For example, this is one I found on the internet where, to work around the problems of storage persistence, node affinity (at the bottom there) is being used to restrict a volume so that it can only ever run on k8s-node-1. Now, it's easy to see why this is a bad idea. What happens if k8s-node-1 fails? Well, we lose our data, and we potentially lose our application. With an effective data plane you don't need workarounds like this and can work in the Kubernetes way you'd expect, so it just works as desired.

For me, when I set up a Kubernetes cluster myself, one of the generic steps I often perform is the deployment of a networking technology. I'm a big fan of Weaveworks, and when I use kubeadm, installing the network stack is part of my general workflow. In the very same way, I encourage you, as part of your standard Kubernetes deployment, to look at solutions that install a data plane in the same manner. StorageOS, for example, installs as an operator via a quick one-line installation script, and once installed, it runs as a DaemonSet. So why would you want to do this? Essentially, an effective data plane is like a power-up for Kubernetes.
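As a reference point, the node-affinity scaffolding I showed a moment ago typically looks something like the sketch below. The names here are hypothetical, but the shape is the anti-pattern to avoid: a local volume hard-pinned to one node.

```yaml
# Anti-pattern: a local PersistentVolume pinned to a single node.
# If k8s-node-1 fails, the data (and potentially the app) goes with it.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv            # hypothetical name
spec:
  capacity:
    storage: 2Gi
  accessModes:
    - ReadWriteOnce
  local:
    path: /mnt/data         # data lives only on this node's disk
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - k8s-node-1   # volume can only ever be used here
```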
You can see the idea here: we've got Mario as Kubernetes, we have the flower as the data plane, and when these are combined, you're essentially running a powered-up cluster. When you increase the cluster's tools and functionality, you also increase productivity.

To understand this, let's look a little bit closer. A storage class is a standard Kubernetes component that acts as the gateway for StorageOS interaction, and we use the native Kubernetes CSI driver, so when you're using StorageOS you're using the declarative language you'd expect to use with Kubernetes. With storage classes acting as your gateway, however, you can use them to promote multi-tenancy and agility in the environment that you're working in. After installing, you may for example set up different storage classes. We have a development example here: in this case we're not really concerned about copies of the data, but we wish for it to be highly available, so if a pod is killed on one node and started on another, that data is still available. For production we might have a requirement to take this a little bit further and create two replicas. Taking this further again, for top-secret you could have your two replicas and, in addition, data encryption at rest. Lastly, you can have something like archive, where we could utilize compression with a single replica. By doing this you've essentially created four named components that greatly simplify use: your users just make reference to these and ultimately get usable storage, which most of the time is what they care about the most, while the specifications on the back end meet the requirements that align to your application and organization.

Setting up a storage class is very simple; the following is a template with just some minor changes to add parameters. This is the production example, and you can see here that we have two replicas. Once set up, you utilize this in Kubernetes as you'd expect, and this next example is literally taken from the Kubernetes docs.
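Before we get to that, here's roughly what the production storage class template just described might look like. The provisioner and parameter key are my assumptions rather than taken from the slide, so check the StorageOS documentation for the exact names your version supports:

```yaml
# Sketch of a "production" storage class: every volume created through
# it gets two additional replicas. Parameter key is assumed.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: production
provisioner: csi.storageos.com        # StorageOS CSI driver
parameters:
  storageos.com/replicas: "2"         # one primary plus two replicas
```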
The only change is to the storage class name, and there's an added benefit with this: you can remove the need to create persistent volumes yourself. You just make the claim for what you need, and StorageOS will manage the persistent volume and its relationship to the claim, increasing productivity and Kubernetes usage.

Many organizations won't move their applications to Kubernetes or the cloud owing to compliance reasons for data at rest; this is especially prevalent with financial and healthcare organizations. With StorageOS, encrypted data at rest is a single parameter change to the storage class. When we use this, all of our data is encrypted. We actually have a full blog post showing this functionality in detail, where we look at a low level and prove that the data is encrypted on the underlying node, so please take a look at that if you're interested.

With an effective data plane, running persistent applications is literally as you would expect Kubernetes to behave. You can see in this example we have a mysql pod, and all we're doing is specifying the volume; data persistence works as you'd expect it to.

I mentioned earlier how we may architect an application when we're looking at it from an ephemeral viewpoint. With StorageOS, however, we can look at this in different ways. For example, we have the opportunity to use Kubernetes read-write-many volumes, so all nginx pods could access and share the same volume, and an update to the data within that volume means that all nginx containers are also updated, removing the design of having to destroy and recreate pods to roll out an update for a website. Redis natively supports persistent storage, which rapidly improves recovery time. With a persistent data layer you could kill a container running Redis, it could be redeployed elsewhere, and the in-memory keys are repopulated from the persistent storage. Again, we have a detailed reference showing this for those who are interested. Lastly, other areas,
such as GitOps, are easier to implement with the correct framework. Take for example where we spoke about storage classes: I mentioned production, development, top-secret, and archive. There is, however, nothing preventing us from having a storage class per application. With this approach the application declaration stays exactly the same, but we can vary the environment to suit the needs. Let's say, for example, it's a financial application that, owing to compliance, has to have encryption when running in the cloud. Here the deployment can be managed with GitOps, and the deployment adapts accordingly to the environment. In this case we would have the storage class my-app-1 both on-prem and in the cloud. The on-prem version would be running with two replicas; the cloud version, whilst it has the same name, could have two replicas and encryption. From a declaration perspective the application declaration stays exactly the same, and you can then configure your CI/CD execution jobs to execute accordingly.

Okay, so with that covered, let's have a look at a demo.

Okay, so I'm here in my terminal, and if we take a look, I've got a six-node Kubernetes cluster. This is just a standard Kubernetes cluster built with kubeadm. We have the control-plane master as the first node, then we have five worker nodes, and if we just take a look, there's actually not that much running here. This is, as I say, a base kubeadm cluster, and as I mentioned earlier, I've used Weaveworks there for the network. In this I have some examples that we're going to look through.

The first thing we're going to do is install etcd. StorageOS uses an independent etcd for its configuration data, and here I'm just using the convenient etcd operator. When running this in production, we recommend that you configure a high-availability etcd cluster, but for evaluation purposes this is absolutely fine, and you can see at the bottom
there we have the endpoint for etcd access; I'm going to be making reference to this as we go through.

Next, I'm going to install the StorageOS operator. This is quite simple: as per other examples, like I said with Weaveworks, this is just a kubectl create, and we're doing it straight from the GitHub repository for StorageOS. That will go through in the background, and that one execution will set up the infrastructure that's required for StorageOS.

Okay, and next we're setting up the StorageOS credentials. Here we're just using, and it's nothing special, a Kubernetes secret, and within this we've got the API username, API password, and other settings that are used by StorageOS. Now, you'll notice that in this case all of the values are the same: if we just take a look, all of these are set to the value "storageos". If you're using this in evaluation, by all means keep this as it is, but when you use this in production, set it to something a little bit more secure.

Next, we're setting up the StorageOS cluster.
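For reference, the credentials secret from the previous step is plain Kubernetes. A sketch, with the evaluation-only values just mentioned, might look like this; the secret's name, namespace, and key names are assumptions on my part:

```yaml
# Evaluation-only StorageOS credentials: every value is literally
# "storageos". Use stronger values in production.
apiVersion: v1
kind: Secret
metadata:
  name: storageos-api            # assumed name
  namespace: storageos-operator  # assumed namespace
type: Opaque
stringData:
  apiUsername: storageos
  apiPassword: storageos
```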
For this we're using the StorageOSCluster custom resource definition that was set up, and within this we have a reference to the version of StorageOS that we're installing. There in the KV back end, you can see the address is the etcd endpoint that was set up in the very first step. From a resources perspective we have a very, very low footprint; you can see that, quite simply, for this we're just requesting 712 megabytes and the use of one CPU core.

Okay, this is a convenience script that I have which just literally waits for those components to be available.

Okay, so now we've created our storage class. As per what I showed as we were going through the presentation, much of this is boilerplate, and then within the parameters section we have the reference for the number of replicas, which we've set to two. That means one primary copy of data plus two additional replicas, so you've actually got three copies of the data. Encryption is set to true.

If we take a look at the storage classes, we have two listed. We have fast, which is the default storage class set up by StorageOS; of the different classes I was going through, the development, production, top-secret, and archive ones, this is equivalent to the development storage class that I mentioned. Then underneath this we have top-secret.

Okay, so we create a mysql PVC and persistent volume. Now, you'll notice that I'm only actually specifying the persistent volume claim, and for the storage class name I'm making reference to top-secret. Again, this is the example that has just been taken from the official Kubernetes documentation.
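That claim, as described, would be roughly the following; the claim's name is an assumption, and everything apart from the storage class name is boilerplate from the Kubernetes docs:

```yaml
# PVC bound to the "top-secret" class: two replicas plus encryption at
# rest, all driven by the storage class rather than the claim itself.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pvc            # assumed name
spec:
  storageClassName: top-secret
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
```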
There's nothing really to this; it has a request at the bottom for 2Gi of storage. Now, if we take a look, you can see that when we do our kubectl get pv and get pvc, we have both a PVC and the persistent volume, and it's connected to top-secret as desired. The UUID which has been generated by StorageOS matches for both the persistent volume and the persistent volume claim, so it's a nice and convenient way of tallying those up.

This is an encrypted volume. If you take a look at the persistent volume, and here we're just doing a kubectl describe on it, you'll notice in the annotations section that we have a reference to a secret name. This is just a standard Kubernetes secret, and if you wished, you could remove that secret, and that would render the data inaccessible. So it's a great way of managing and maintaining encryption within a Kubernetes environment.

Okay, so what I'm doing here, as I mentioned at the start, is this is a six-node cluster. The first is the control node, so that's not schedulable; then we have five worker nodes. For the example here, what I'm doing is untainting one of the nodes, and I'm going to taint the others. This will force the Kubernetes scheduler to schedule any workload that I run to k8s-2, and I'm doing this deliberately so I can show high availability of data.

So let's create a MySQL pod.
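A minimal manifest for that pod might look like the sketch below; the claim name is an assumption, but the fields match what I describe next:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mysql
spec:
  containers:
    - name: mysql
      image: mysql:5.7
      env:
        - name: MYSQL_ALLOW_EMPTY_PASSWORD
          value: "true"
      ports:
        - containerPort: 3306        # standard MySQL port
      volumeMounts:
        - name: data
          mountPath: /var/lib/mysql  # standard MySQL data directory
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: mysql-pvc         # assumed PVC name
```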
This one is pretty straightforward. Within the container section you can see we're using the image mysql, version 5.7. We've got an environment block with MYSQL_ALLOW_EMPTY_PASSWORD, there's a ports section where it's just listening on the standard MySQL port, 3306, and we have volume mounts which reference /var/lib/mysql, the standard location for MySQL data. At the bottom there, we've got the volumes section where we're pulling in that persistent volume claim, which is running with StorageOS.

If we take a look: okay, great, you can see that that is running, and as desired, this is running on k8s-2. What I'm going to do is just check the logs on that. Great, that's what I was looking for, at the bottom there: mysqld ready for connections, and it's showing the socket.

Okay, so there we're just using kubectl exec, and we're passing in a chunk of SQL. We're creating a database called shop; using that shop database, we create a table called fruit; within that, we populate the table with some of my favorite fruits; and right at the end, we're doing a query to prove that the data is there.

Okay, so now we delete that pod, and what I'm doing next is adjusting those taints that we used earlier. I'm tainting k8s-2, untainting k8s-3, and then tainting k8s-4, 5, and 6, so the next time the scheduler runs, the only node it can schedule to is k8s-3, which is a different node to where we executed previously. We recreate that mysql pod, and we just give it a moment to start. Okay, so if we check now, we can see that it's running again.
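For reference, that delete / re-taint / recreate sequence can be sketched in plain kubectl. The taint key `demo` and the manifest filename are hypothetical stand-ins; the node names are this demo's:

```shell
# Force the scheduler away from k8s-2 and onto k8s-3, then bring the
# pod back and confirm the data survived the move.
kubectl delete pod mysql
kubectl taint nodes k8s-2 demo=true:NoSchedule
kubectl taint nodes k8s-3 demo=true:NoSchedule-   # trailing '-' removes a taint
kubectl taint nodes k8s-4 k8s-5 k8s-6 demo=true:NoSchedule
kubectl apply -f mysql-pod.yaml                   # assumed filename
kubectl exec mysql -- mysql -e "SELECT * FROM shop.fruit;"
```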
I'm going to follow the logs, and great, that's ready for connections. Let's check if our data is still available.

Okay, so I'm running a check against that, and you can see it's run a kubectl exec against mysql: we're running a RESET QUERY CACHE, and we're doing a SELECT from that shop database and the fruit table, and you can see that all of our data is still available, as expected. This really highlights how simple it should be, and whilst this is an example for MySQL, you can apply the same kind of approach to legacy applications, or to other applications that use operators and have those storage class references.

What I'm going to do now, given we've spent a bit of time here looking at the CLI and working at a Kubernetes level, is take a look behind the scenes within StorageOS. I've used the convenient kubectl port-forward, I love this feature, and I've forwarded the UI, on port 5705, from k8s-1 to the system that I'm working on. So if we take a look at this in a browser, and if you recall, I set up the credentials here as storageos, and the password is storageos.

Here we can see that we have the volume that we created. This is a two-gigabyte volume, and it has a replication target of two. The primary node is k8s-2, whilst it is attached to k8s-3; in some instances these will be the same, but with StorageOS the volume can run anywhere and can attach to any node. We use labels in much the same way as they're used in Kubernetes, so we follow those standards, and you can see that we have these labels: encryption is true, and replicas is set to two. If we click into this, you can see again that this is attached, this is the primary node, and then we have our two replicas: the first replica is running on k8s-3 and the second replica is running on k8s-6.

Okay, great.
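For completeness, the port-forward used to reach that UI would be along these lines; the namespace and service name here are assumptions, not taken from the demo:

```shell
# Forward the StorageOS UI (port 5705) to localhost, then browse to
# http://localhost:5705 and log in (storageos / storageos in this demo).
kubectl -n storageos port-forward svc/storageos 5705:5705
```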
So I hope you enjoyed that, and as you go forward and deploy your Kubernetes clusters, I hope after seeing this you consider a persistent data layer as part of your standard Kubernetes deployment. As you can see here, it really makes life a lot easier in different respects, and it will really help your organization to grow its Kubernetes usage.

Some further reading here. We have "Performance Benchmarking Cloud Native Storage Solutions"; this was a benchmarking test of ourselves and other persistent storage solutions, and it's a very good read. We have Civo's lightning-fast managed Kubernetes development and deployment resource; Civo are another provider who offer a cloud-managed Kubernetes solution. It's really, really cool, and they actually use StorageOS en masse, so there are some great resources on the internet where they show that running on bare metal on huge clusters. Definitely worth a look. Some other areas that we've got there: we've got the platform architecture overview, and there's the documentation. If you want to reach out and chat to us a bit further, we have a dedicated Slack group, so by all means please come and speak to myself or our other engineers, and we'd be happy to get you started with persistent storage on Kubernetes.