First, thank you for joining us. The whole premise of this session is a workshop. It's going to be hands-on: we've got some labs out there in the cloud that will give us the ability to walk through some storage fundamentals and then some storage troubleshooting. Obviously the name of the session is "What went wrong with my persistent data?"

First of all, introductions. I'm Michael Cade. I'm a technologist at Veeam Software, more focused on the cloud-native world. I've been at Veeam looking after data protection and storage for the last eight years, and before that I was an infrastructure admin focusing on things like NetApp storage and virtualisation. So I'm hoping I can bring some of that storage operations experience into the session. And I'm joined by Lee.

Yes, hi. My name is Lee. I'm part of the engineering team, and I'm pretty new to Kubernetes and cloud-native in general. I worked at Bosch before, in the automotive sector, and I joined Kasten a year and a half ago. So I'm in the same boat as a lot of folks here, I think.

I think that's a trend among everyone we've been speaking to this week: we have some developers, we have some DevOps-focused engineers, and we have some operations people here. Going back, KubeCon + CloudNativeCon was probably more focused around developers to begin with, and now we're seeing that trend of people having to look after storage and provision storage. That's why we wanted to bring this workshop to everyone, get people hands-on, and look at it from a 101 perspective. So if you're a storage admin in a Kubernetes world and you're running in production, it's probably going to be stuff you already know, but we want to get people hands-on and understanding what volumes are and everything around the Kubernetes storage space.

The other call-out that I want to make is Matt Bator. He's worked tirelessly on creating the labs that we've got built, so he's a huge, huge asset to what we're delivering today as well.

OK, so I put this tweet out a couple of weeks ago: Kubernetes storage is easy, right? How many people think Kubernetes storage is easy? Cool. There are probably other sessions you need to be in, then. But when it comes to Kubernetes storage, you can see from the word cloud surrounding this that there are lots of different factors beyond the storage that maybe started with Kubernetes and its evolution. There are a lot of different terminologies, constructs, and objects that come with that storage story. So it becomes a little bit overwhelming, and that's what we're hopefully going to break down over the next 90 minutes with some hands-on work, but also some theory.

Now, there's a lot of information in this presentation; there are a lot of wordy slides. I'm not going to bore everyone; everyone can read these. It's actually on Sched, so you can go and find the presentation now and use it afterwards. The whole point is we want to move through it quickly, but not so quickly that we skip the fundamentals of storage, and then we're going to get into some troubleshooting around that.

So to begin with, the session flow: let's talk about volumes within Kubernetes, the different types of volumes, what they do, and what the use case is for them. Then we're going to talk about persistent volume claims.
We're going to talk about storage classes, provisioners, and volume plugins, and set the stage: these are the fundamentals when it comes to Kubernetes storage. Then we're going to get hands-on with those fundamentals, and then we're going to come back and Lee's going to cover pulling them all together and deploying an application within a Kubernetes cluster. And then, spoiler alert, there might be some issues with that, so we'll have to go and find out what those issues are, and together we can walk through that and some of the troubleshooting techniques that both Lee and myself would go through from an ops and a developer point of view.

And then, if we've got time (and again, these slides and the labs we've built around them are available), we want to start looking at storage performance and the protection of storage, because where there's storage, there's generally data. How do we look at protecting that data? If we get there, great. If we don't, that's fine as well. Everyone will get access to the slides afterwards, plus the labs that we've created in Instruqt. We've also made them available in a GitHub repo that we'll share, written so you can run through them using minikube or basically any Kubernetes cluster that you have access to.

So to begin with, what is the requirement around Kubernetes storage? Why do we need state, or what is state, when it comes to Kubernetes or any data service within Kubernetes? Like I said, there are a lot of words on here because these slides should be able to stand alone. But what is state? It's basically something that's relied on when that pod comes up. We want that database, for example, to be consistent whenever the next pod comes up, or whatever that lifecycle looks like.

So we have the concept of stateless applications, and that's generally where Kubernetes began: web servers, web farms, spinning up and giving us availability. A container has everything it needs inside its own constructs. It doesn't have a database associated with it, or maybe that database sits outside the Kubernetes cluster. And now we're seeing more of the trend of people bringing databases in. You can see on the show floor the number of database providers that are out there; people are now putting stateful workloads within Kubernetes.

So, as I mentioned, stateless workloads are where we started, because there wasn't the API, there wasn't the construct within Kubernetes to allow us to store stateful workloads, or at least not in a very distributed-systems manner. So what we wanted to highlight, from a stateful point of view, is the lifecycle of that pod: the upgrades, the updates, the pod being refreshed. How do we make sure that the database is the same database when it comes back, and that the data is still accessible?

So, a short demo, and hopefully it auto-plays. The whole idea is that we can create a pod with a Postgres database in a non-persistent fashion, and we can write some data to that database. If you think about a database, that's storing our important data; in most cases this is going to be mission-critical data within our environment. So we can go and create that Postgres database. You can see here that this is a deployment; it's a pod that we're going to spin up. All good stuff.
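For anyone following along later, the manifest on screen is roughly this shape. This is a minimal sketch, assuming the image tag and names rather than the exact lab file; the key point is that the data volume is an emptyDir, which lives and dies with the pod:

```yaml
# Hypothetical sketch: Postgres with NO persistent storage.
# emptyDir is tied to the pod's lifecycle, so any tables vanish
# as soon as the pod is deleted and recreated.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:15          # assumed image tag
        envFrom:
        - configMapRef:
            name: postgres-config   # assumed ConfigMap holding POSTGRES_* settings
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
      volumes:
      - name: data
        emptyDir: {}                # ephemeral: gone when the pod is replaced
```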
It's going to use a config map, and I'll get on to those as storage types and volumes shortly, and we're going to create a stateful set that gives us the ability to spin up Postgres in Kubernetes. Great stuff. All good, except for when we delete that pod, or make a change to it and it refreshes. Spoiler alert: the database, or the tables that we've created within that database, are not going to be there. So that's why we need the capability to store stateful workloads within our cluster.

So I've jumped ahead pretty quickly here. We've got our Postgres pod up and running. Great stuff. We can then exec into that pod and start interacting with Postgres. We can create our tables, we can insert data; all the stuff we do with a database in terms of creating and storing data. So let's create a table, kubecon2023. Great stuff. Now we can start adding our data to it. If we exit out of that pod and then go and delete the pod, though, when the new pod comes up in that desired state, it's not going to have access to that data. That data is gone. It's a brand new Postgres pod, and that table is not going to be there.

So, in a long-winded way, this is just showing that you can run any application in Kubernetes that requires state, but you have to know how that data is stored within your cluster. The pod has been re-provisioned, a new pod has come alive, and there are no tables in there. That's what you'd expect: you can reproduce this from a Docker or container point of view. If you haven't assigned a volume, a storage unit for that stateful data, then that's what's going to happen.

So, different types of volumes within Kubernetes. We'll generally spend more time on the persistent volume type, but it's important to note that there are other volume types in a Kubernetes cluster. When we look at ephemeral volumes, they give us the ability to still run a stateless workload. It might be a web scraper that's pulling images, or some other data, and storing it just inside that container; the lifecycle of the container will refresh it. We won't have the scraped data any more, but that's fine, depending on the use case. It might just be that we're populating images from another source and then building out our application that way. So if we think about a web server where the images are stored elsewhere: we want to scrape them and bring them into our pod, and every time we do that we want them available to it. But they're still ephemeral; we don't really care whether they live or die, because we still have access to the source data, we might just have to scrape it again.

The next one is projected volumes, so think about secrets and config maps. A secret being: I need the keys to get into something, or I need the username and password to get into my Postgres database, so we might store that in a Secret within Kubernetes. And config maps give us the ability to look somewhere else: how do we get to another service, and have the capability of accessing that from within the cluster?
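A rough sketch of what that looks like in practice, with a Secret and a ConfigMap surfaced together through one projected volume. The names here are illustrative, not from the lab:

```yaml
# Illustrative projected volume: a Secret and a ConfigMap exposed
# together as files under a single mount point inside the container.
apiVersion: v1
kind: Pod
metadata:
  name: projected-demo
spec:
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: config
      mountPath: /etc/app
      readOnly: true
  volumes:
  - name: config
    projected:
      sources:
      - secret:
          name: db-credentials      # assumed Secret, e.g. username/password
      - configMap:
          name: db-connection       # assumed ConfigMap, e.g. an external RDS URL
```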
And then the one we want to spend more time on is persistent volumes. If we think about persistent volumes from a storage and operations point of view: let's say we've got our shared storage, our NAS device, our SAN device, and within that we're going to create LUNs, some sort of block or file-based storage. A persistent volume is a pool of that storage that we expose to our Kubernetes cluster, which then enables us to leverage it within our application. That could be a different pool of storage for different applications, and I'll get on to storage classes later as well.

There are different types of persistent volumes. We're now more commonly seeing CSI, but we still have to remember that there are other provisioners, other persistent volume types, available to us. One that we're going to be using, and this is never going to be a best practice, is hostPath, and Lee's going to touch on this later as she gets into the troubleshooting, because a hostPath volume is tied to one particular node in your cluster. If you're using the raw disk of a Kubernetes node, the pod can't be anywhere else: if the pod is on node two, it can't access the data on node one. So hostPath is great for demos and workshops and the like, but in production you definitely don't want to be using it; you want one of the others, whether that's built into Kubernetes or uses CSI, which I'll touch on shortly as well.

Okay, so the operations side is: we look after the storage, we create a persistent volume, and this is a pool of storage that's available to use. Then it comes over to the developer, who creates a persistent volume claim: my application needs a database of x, y, z size, and I need that on a specific storage type. Or maybe I don't; maybe as a developer I don't care, I just want my database up and running for my application, which is generally the case, right? That claim will then go and look for those persistent volumes and take a chunk of one, or it'll find a persistent volume that satisfies the needs, the desired state, of the persistent volume claim.

And then, quickly moving on to storage classes: think of these as the types of storage you're exposing. I'm using gold, silver, and bronze here, so that could be fast, medium, and slow storage for different workloads, and the persistent volume claim can tie into one of those. A developer might always say gold, or they might say silver, but this is where we start talking about things like storage policy-based management: how we define what each of those storage classes gives you in terms of functionality.
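To make the tiering idea concrete, a "gold" storage class might look something like this. The provisioner name and parameters here are hypothetical, not from the lab:

```yaml
# Illustrative "gold" tier. "gold" is just a label; what the tier means
# is defined by the (hypothetical) driver and its parameters.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gold
provisioner: csi.example.com        # hypothetical CSI driver
parameters:
  type: fast-ssd                    # hypothetical driver-specific parameter
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer   # binding modes come up again later
```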
And then provisioners tie in with persistent volumes: how do we actually create them? The history lesson here is that before we had CSI we had in-tree provisioners, with something called FlexVolume in between, but ultimately a provisioner gives us the ability to provision that persistent volume, that storage underpinning the cluster. So in-tree came first. The downside of in-tree was that, although Kubernetes ships three releases a year, storage vendors had to land the PRs for their storage and its supportability inside each Kubernetes release. So every time Kubernetes was released you'd get your new bit of code, but Kubernetes is not just about storage; there are plenty of other PRs going in, and a lot of QA on top. That made it slow for any storage vendor to deliver the functionality they needed through an in-tree provisioner.

So then along came CSI, which has matured a lot over the last couple of years. What it gives us is a consistent API across the Kubernetes cluster: storage vendors write their drivers, out of band, against the CSI set of APIs, so as a storage provider you're no longer reliant on the Kubernetes release cycle. A storage vendor can release out of band from Kubernetes and take advantage of the CSI API. That's a much easier way of leveraging storage than in-tree provisioners. But because we started with in-tree, anyone new to the Kubernetes storage space is probably still going to see some in-tree provisioners out there, notably in the public clouds, which still offer both in-tree storage types and provisioners as well as their own CSI-based drivers. If we get to Lab 3, you'll see that the CSI API is ever-evolving, with volume snapshots and other functionality like resizing persistent volumes on the fly within the cluster.

The other thing I used in that first demonstration was a StatefulSet. StatefulSets give us the ability to apply a bit more of a "pet" naming convention to our pods. You'll see that Postgres was postgres-0, and if we were to scale up we'd have postgres-1, -2, -3. And if we're making a change, updating our pods or our StatefulSet, it does so in a uniform way: get rid of 3 first, then 2, then 1. Whereas from a Deployment perspective you get a long random naming convention, and really we don't care: any pod can be terminated at any time, in no particular order. Which is why the best practice is to use a StatefulSet when you're running data services. There's a table here that goes through the commonalities and differences between a StatefulSet and a Deployment. Now, I've seen people using Deployments to look after stateful workloads. It's possible; it just means things get quite erratic when you start doing anything to those pods in terms of updates.
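As a reference for that, here's a minimal sketch of the StatefulSet pattern: stable pod names plus a PVC stamped out per replica via volumeClaimTemplates. Names and sizes are assumptions:

```yaml
# Sketch: stable identities (postgres-0, postgres-1, ...) and one PVC
# per replica (data-postgres-0, data-postgres-1, ...), created from the
# template below. Scale-down and updates happen in reverse ordinal order.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:15          # assumed image tag
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi             # assumed size
```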
Okay, so let's check on time. Perfect timing. What we want to do first is get you onto the lab. A big shout-out here as well: we didn't really want however many people are in this room all downloading container images, because that would only end badly. So we're leveraging Instruqt for isolated sandbox environments, and we've built out the steps to walk through some of the things I've just mentioned. What we're going to cover in this first lab is those different volume types and storage classes, and then we'll go into lab two, which is focused on troubleshooting.

So the QR code on the left-hand side is what's going to take you to the platform, and the lab instructions are on the right-hand side. We'll walk through it as well, but I want to give you time to take it in, get hands-on, and start building out some of those scenarios. I know this is being recorded too, but the link on the right-hand side is a GitHub repo where you don't have to use Instruqt: you can use minikube on your local machine, or really any Kubernetes cluster that has some storage associated with it, so anyone can walk through it at their own pace. So with that, if everyone wants to dial in; if anyone hits any issues, raise your hand and we'll come over and try to help you get onto the platform. We're going to leave you for a few minutes to see how that pans out, and then we'll get back into the troubleshooting.

Yeah, a bit of housekeeping as well: there's a bunch of seats at the front here on the left with chargers, so if you're sitting at the back with no table, feel free to move around so you have space for your laptop.

So hopefully everyone's at a similar point if you're working along. You'll have a list on the right-hand side of the screen with a few scenarios that we're going to walk through, based on the volumes and storage classes we just covered in the slides. If anyone has any issues, just raise your hand and we'll try to help; same for the folks who just got seats at the front with the table and power. Also, if there are any questions about what we just covered while you're walking through the lab, there are two mics down the aisles if you want to go up and ask.

When I was talking about projected volumes, I specifically mentioned config maps and secrets, but you'll see in the text that we also talk about two others. One is the downward API, which is used to provide cluster and environment details: anything about the pods or containers, such as names or annotations, can be surfaced that way (there's a small sketch of this below); the other is service account tokens. There's a bit more detail in the lab text than you saw on the slides; we wanted to keep the slides high-level rather than going into every volume type.

So, hopefully everyone's following along, but basically what I'm doing here is following the steps. I've just created that ephemeral volume, which had a scraper, by all accounts, that brought data into our container. Then I moved on to the projected volume, where I created a secret and a config map, and now we're creating a volume that consumes that secret and that config map. That might be a connection to an external database: the secret to authenticate to it, and the config map giving you, say, a URL out to an Amazon RDS-type database that obviously isn't stored within the Kubernetes environment, but which the application is made aware of and can use from within Kubernetes. And what we're running in this command: we're exec'ing into the pod we've just deployed, and we want to see what that secret is and how we can consume it inside the pod.
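Going back to the downward API mentioned a moment ago, a minimal, purely illustrative sketch of that volume type looks like this:

```yaml
# Illustrative downward API volume: the pod's own name and annotations
# are exposed to the container as files under /etc/podinfo.
apiVersion: v1
kind: Pod
metadata:
  name: downward-demo
  annotations:
    team: storage-workshop        # assumed annotation, just for the demo
spec:
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: podinfo
      mountPath: /etc/podinfo
  volumes:
  - name: podinfo
    downwardAPI:
      items:
      - path: name
        fieldRef:
          fieldPath: metadata.name
      - path: annotations
        fieldRef:
          fieldPath: metadata.annotations
```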
So start thinking about how that secret is pulled from Kubernetes into the pod and then used to authenticate, maybe into our database; and we're doing the same with the config map, to know where we're going to connect to. You can see here "true super secret lab" and "the sky equals blue": just different data points that we might need in order to access other resources. Obviously we're not actually connecting to an external database in AWS, but that's a scenario it could be.

And then the third volume type is the persistent volume. We're going to create that persistent volume manually using this YAML. It's called my-volume, it has a capacity of 10Gi, and it's actually using hostPath. Remember what I said: in a lab environment this is all fine, but if we're going to share storage across multiple Kubernetes nodes, we want to consider one of the other persistent volume types. Otherwise you're in for a shock when your pod spins up on a different node and can't access the underpinning persistent volume. So that persistent volume is the pool of storage available to us, from hostPath here, but it could really be any storage type. If we describe it, you see all the details of where it is and what it allows us to do: the access modes (ReadWriteOnce, ReadWriteMany), that it's a filesystem, that it has a capacity of 10Gi, that it's using the hostPath type, and the mount point you see there.

So then we're going to create a persistent volume claim. Let me copy it first and walk through what the persistent volume claim is asking for. We're going to call it my-pvc and deploy it. Another thing I didn't mention: a persistent volume is a cluster-wide resource, but a PVC is specific to a namespace. Now, this particular PVC is 4Gi in size, using that persistent volume we created called my-volume, which leaves 6Gi free in that pool of storage. If we wanted to create another application that leveraged that same persistent volume, for whatever reason, we could go and consume another 2, 4, 5Gi of it. If a claim asks for more than what's there, the persistent volume won't satisfy the desired state of the persistent volume claim, so the claim will be stuck, unable to bind to a PV.

So if we now look at the PVC within lab 1, you see that it's bound. In fact, there were two commands I just ran: the first on my-volume, where you can see that it's bound, and what it's being used by, via the claim.
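For reference, the two objects in play look roughly like this. Names and paths follow the narration and should be treated as assumptions, not the exact lab manifests:

```yaml
# A 10Gi hostPath PV (node-local: fine for a lab, not for production)...
apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-volume
spec:
  capacity:
    storage: 10Gi
  accessModes: ["ReadWriteOnce"]
  hostPath:
    path: /foo                  # directory on the node itself
---
# ...and a 4Gi claim that binds to that specific PV. PVCs are namespaced,
# unlike PVs.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  volumeName: my-volume         # bind to that exact PV
  storageClassName: ""          # pin to "no class" so a default class isn't injected
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 4Gi              # leaves 6Gi of the PV's capacity unused
```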
Notice that the PV has nothing to do with a namespace; a persistent volume isn't assigned to a particular namespace. But you can see here as well that it's also bound, so both of them are married up now, and we have the ability to use that within a pod to consume the storage.

So next up we're going to create that pod. It doesn't take long, we probably don't need to watch it. We can jump into that pod now, and we're going to copy a file into our persistent volume claim, a simple file operation. Then, on our node, we go and see what we've actually copied: you can see the detail we copied into that file on the persistent volume claim. One thing to note is that the hostPath on the PV is /foo, and the mountPath on the pod is /bar. Even though the data is the same, the mountPath is the directory from the perspective of the pod, so it's a different directory from what you have on the host system. We purposely distinguished those to make clear that it's the same data in two different directories, depending on whether you're coming from the pod's perspective or accessing the data from the host.

Yeah, exactly that. So then we delete that first busybox pod we created, and we create a new pod that has access to the same persistent volume claim, and the spoiler alert is that the detail should still be there in the new pod. So if we exec into that, let me clear the screen, and cat that file out: there it is.

Then it goes on to storage classes. A storage class, remember, is that different type of storage, and I use gold, silver, bronze because that's an easy way of describing fast, medium, slow, or something along those lines. But it could be about capability: I only want my databases on this particular storage class; I want my messaging queue on a different one; I don't want it consuming my most expensive storage, for example. If we look at the storage classes available to us, we have two: local-path, which is what we've used so far, and a CSI hostpath class, which is still host path, but leverages some of the CSI functionality I mentioned. I'm pretty sure that wall of text tells you that.

So what we're going to do is create another persistent volume claim, and with this one we're defining the storage class name, and we'll touch on dynamic provisioning in a bit. I'll leave that one to you, Lee, because you're going to use it. Let's send a pod in there to consume that PVC, again referencing that storage class as part of it. So you can see there are lots of different options for how we consume the underpinning storage.

So what Michael is doing here is called dynamic provisioning. In the previous steps we created the PV, bound the PV to the PVC, and then used the PVC to mount into the pod. What he's doing here is dynamically requesting the volume: the PVC request talks to the storage class, and a driver creates the volume for you, so we don't have to worry about creating the volume ourselves.
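A sketch of the difference: with dynamic provisioning, the claim names a class and no PV is hand-made. The claim name and size here are assumptions; the class name matches the lab's csi-hostpath-sc:

```yaml
# Dynamic provisioning: no volumeName, no pre-created PV. The CSI driver
# behind the named storage class provisions a PV to satisfy this claim.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dynamic-pvc             # assumed name
spec:
  storageClassName: csi-hostpath-sc
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi              # assumed size
```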
So now, if you remember, we created a persistent volume manually using that YAML file, and now we've dynamically created this next one. You can't really see the name here, but if you're running through it you'll see that we've dynamically created a differently named PV called pvc-<something>. That's the dynamic approach to deploying, or leveraging, storage.

And something we didn't touch on, but can talk about now, is that the storage class has a lot of different specs for how you'd like your volume to be provisioned. One thing we did differently in this storage class is the binding mode: WaitForFirstConsumer versus Immediate. In the previous step, if we took a look at the PVC, it was Pending instead of Bound before our pod was up, because the consumer, the pod that's supposed to consume the volume, wasn't up yet. So with the storage class you can specify more than just which driver to use: you can also specify, sorry, which volume binding mode you want, and then the request is fulfilled dynamically as you create the PVC. Say you're on a managed Kubernetes service that maybe only has one storage class available: you've got free rein to deploy your persistent volume claims at any time, and the volumes are dynamically created against that class.

So, has everyone walked through that lab? A couple of thumbs up. Hopefully, if everything's done correctly, you'll hit next and get a big green thumbs-up saying well done, and then I'll hand over to Lee to talk about the troubleshooting side, and we can hopefully get into playing some games.

Yes, so this is where the fun starts. In the previous section Michael walked us through some of the fundamentals; we walked through all the labs and they just worked. All the configuration was correct; you shouldn't really have had any problems spinning everything up. Now, in the next step, we're going to take all of that and create a full-on application. The application we're going to deploy today is Pac-Man, and if everything works we should be seeing this, either in the Instruqt lab or when you do a port-forward, which I'll walk through in a bit.

Before diving into getting hands-on, I want to give a little bit of application topology, to understand what it is we're working with. This is the high-level topology of the application. We have the front-end Pac-Man app, written in Node.js, which talks to a backend database, and the backend database is MongoDB. There are some startup scripts for the pod that are injected in the form of a config map, and there's obviously some data, so there's going to be a PVC and a PV. The database version is a little bit outdated, but the database is going to be a StatefulSet, the Pac-Man app itself is going to be a Deployment, and there's communication between the two: a database Service of type ClusterIP, because the two only need to talk to each other within the cluster, and then we need to access the app from outside, so Pac-Man has a Service of type NodePort.

So our task is to deploy the application. The premise is that all of the resource manifests for the topology I just described should be available in lab 2.
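A rough sketch of those two Services, to make the internal-versus-external split concrete. The names, selectors, and ports are assumptions, not the lab manifests:

```yaml
# Database service: ClusterIP, reachable only from inside the cluster
# (app -> database traffic).
apiVersion: v1
kind: Service
metadata:
  name: mongo
spec:
  type: ClusterIP
  selector:
    app: pacman-mongodb         # assumed pod label
  ports:
  - port: 27017
    targetPort: 27017
---
# Game service: NodePort, reachable from outside the cluster on a
# high port of each node.
apiVersion: v1
kind: Service
metadata:
  name: pacman
spec:
  type: NodePort
  selector:
    app: pacman                 # assumed pod label
  ports:
  - port: 80
    targetPort: 8080            # assumed container port
```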
If you're using the GitHub repo, they're in the lab 2 folder; if you're using Instruqt, they'll be in the Instruqt lab automatically for you. This is a bit of a mapping of the YAML files you'll have against the topology, and the same table should be in the Instruqt lab and in the minikube instructions too.

So our task is pretty simple: deploy them, then access the game. If you're using Instruqt, the game will be available in a tab in Instruqt already; you don't have to do anything, Matt has taken care of that for us. If you're using minikube, you do have to do a little bit of port-forwarding; these are the commands you'll use, and then the game will be available in your browser.

I think we've alluded to this already, but these manifests are not going to work out of the box. This is why we're here: we're going to troubleshoot, and the troubleshooting part is going to be on your own. We will walk through it, but probably not immediately. The premise is that we'll give you maybe 15 minutes to play around. If you have a particular problem you can raise your hand and we'll walk over, but it's a learning space and we want you to have the chance to play with it and figure it out. There's more than one solution, so we'll discuss after 15 minutes: I'll walk through what the problems are, what the solutions are, and we'll go back and forth on the implications of each solution.

One more thing: we're taking this with the premise that it's a 101 lab, so as we go through, I'll talk a little bit more about the commands we're using for debugging as well. If you just look at the application not working and you're like, oh my god, what did I do, we'll talk from that perspective too. There is a solutions table available for you, and also the kubectl cheat sheet from the Kubernetes documentation; it's in the repo and, I think, in Instruqt as well.

Okay, so now choose your platform of choice, and I think we'll give it 15-20 minutes from now. I'll assess and see where everyone is in the room. If you finish early, congratulations: you can play the game, or move on to the next lab if you want. And again, if you have any questions, anything at all, raise your hand and we'll come to you. Okay, let the fun begin.

Does anyone have any problems getting into the lab? I'm just going to put this table back up so everyone has it as a reference point. If you go to the Instruqt lab there should be a text editor; you can save a file by hitting the floppy disk, can you believe it. Ctrl+S won't work. And if you have a lot of tabs open in the text editor, you might not see your files; just close some tabs and you'll be able to see them.

How's everyone doing? Any questions at all? Deep in thought? Anyone playing Pac-Man? How's it going, Julia, have you fixed it? Are you playing Pac-Man? Okay, by a show of hands, who has completely solved the problem? Awesome. Also, by a show of hands, who would like 5 more minutes to work on this? Okay, we'll give you 5 more minutes of silence before I jump in. We should have definitely thought of a prize for the highest score on Pac-Man, shouldn't we? We should.

Okay, so I think I'm going to go ahead and start diving in. Feel free to keep working on it if you're really, really close; if not, feel free to follow along. Again, this is a 101 lab, so I'm going to take a little bit of time to talk through the commands that I'm using as well.
So we already know that it's not working. I'm going to talk from the perspective of what I would usually do when I come to a cluster and something isn't running: what kind of debug commands I would use. Generally, I like to see the status of my cluster, see how my pods are doing, what's going on. I usually do that with watch kubectl get all, giving it the namespace I'm working in, which is lab2. watch re-runs the command every 2 seconds, so I can keep the status live as I work through. And get all doesn't really get you everything: it gives you Deployments, ReplicaSets, Services, and pods, but it doesn't give you PVCs or PVs or other resources, so that's something to be mindful of. But for a high-level view, I'll start with this.

And I see that my pods are not up and running: pacman-mongodb is Pending, and the pacman pod is in CrashLoopBackOff. So I'm going to check out the pacman-mongodb pod and see what's going on with it, and the command I'll use for that is kubectl describe. The describe command gives you a human-readable status: all the specs of the pod, as well as the events happening to it, so it's really useful to have all of that in a human-readable format. I'll go ahead and do that; I have to say which resource I'm describing, and the namespace.

Okay, I see that the pod has FailedScheduling, and the reason is that the pod has an unbound immediate persistent volume claim. So the next step is to follow that path and check what's happening with my PVC, and again, get all doesn't show it, so I'll check manually. PVCs are namespaced resources, so when we do a get pvc we do have to provide a namespace. I see that my PVC, static-pvc, is Pending, so I'll go down this path and check what's happening with it, and I see a volume mismatch: the controller is telling me it cannot bind to the requested volume, because the requested PV is too small, and the volume being pinpointed is static-pv.

So I'm going to check the spec of my PV. Another way of getting the spec is the get command, so I'll get my PV, static-pv. A PV is a non-namespaced resource, so I don't have to provide a namespace, and I can specify the output format I want, so I can say -o yaml. Then I see my spec here: this is my persistent volume, and it has a storage capacity of 1Gi. So I'll double-check my PVC and see what capacity was requested, and it's 2Gi. So we have a mismatch in volume capacity, and I'll have to re-provision my volume to match my PVC. Rather than trying to edit it in place, I'm going to delete the PV, change the spec, and recreate it.

Let me delete it first. Okay, I'll change my spec here, make that two, save it, and then apply. Okay, my PV is created; I'll check: I have my static-pv, the capacity is 2Gi, but it's still Available. So I'll go back and check: my pod is still not up and running, and my static-pvc is still not bound.
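For reference, the recreated PV looks roughly like this. The name and host path are assumed from the narration:

```yaml
# The shape of the fix: the PV's capacity is raised to match the
# 2Gi request in static-pvc.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: static-pv
spec:
  capacity:
    storage: 2Gi                # was 1Gi: too small for the 2Gi claim
  accessModes: ["ReadWriteOnce"]
  hostPath:
    path: /data/static-pv       # assumed host path
```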
So I'm going to check the PVC again and see what it's doing. Oh, by a show of hands, did everyone get through that first problem? Awesome. All right, now I want to know what events have happened to my PVC, so I'll describe it instead of getting the spec. Okay, we see a new event from the controller: cannot bind to the requested volume, the storage class name does not match.

All right, let me check. I'll double-check the specs of the two again. For my static-pv, I see here, and you can't really see this so I'm trying to find the best way to show you, that the storage class on the PV is actually empty, and that's because we manually provisioned the volume. Then I check the PVC, and I see that the storage class name there is csi-hostpath-sc, so there's a mismatch. One thing I want to point out: if we double-check what we used for the static PVC, we actually did not define a storage class in the manifest. This is to point out that if you leave out the storage class name on a PVC, the default is used: it's injected as you create the PVC request. And if we check what storage classes we have in the cluster, we see that csi-hostpath-sc carries a default annotation. It's an annotation on the storage class itself: if you look at the SC, you can see the annotation storageclass.kubernetes.io/is-default-class: true, and it marks this as the default storage class, which gets used whenever the storage class name is left empty. So be mindful of defaults and the automatic injection that can happen on your behalf; in this case, be mindful of what the default storage class of your cluster is.

To remedy this, we're going to specify a storage class on our PVC and make it empty, because we're doing manual provisioning. And the spec of a PVC is immutable, so we have to delete the PVC and then recreate it. Also, by a show of hands, has anyone gotten through this step? Awesome. Okay, we'll delete that, and then make sure I have a storage class name in here. You can never type when people are watching you. Okay, I'll double-check my PVC: it's Bound. Yes, this is what you want to see when you work with PVs and PVCs: you want to see them Bound.

And I'll double-check the status of the pods, and I still see that MongoDB is running into an error: it restarted a second ago, and it's in CrashLoopBackOff. So I'll check again what's going on with my pod; again, I'll use describe. Okay, well, all I'm seeing here is that for some reason the pod failed to restart, and I'm not seeing any more information about what could be wrong. So I'll switch to a different debugging command: the logs. I'll get the logs of my pod, and hopefully there's some kind of debug output that tells us what's going on. Okay, I see here a permission denied issue, for creating the directory /bitnami/mongodb/data.

By a show of hands, who has gotten to this point? Okay, and by a show of hands, who has gotten past this point? Awesome. So actually, I want this to be a bit of an interactive session at this point, because I think there are multiple ways of solving this.
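To help reason about the error, it's worth knowing where the pod's user comes from. A sketch of the relevant shape of the StatefulSet (image, names, and paths assumed; env and probes trimmed), with the security context that comes up again in a moment:

```yaml
# Sketch: the pod runs as UID/GID 1001 (per the later discussion), so
# the hostPath directory owned by root:root is not writable by it.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: pacman-mongodb
spec:
  serviceName: mongo
  selector:
    matchLabels:
      app: pacman-mongodb
  template:
    metadata:
      labels:
        app: pacman-mongodb
    spec:
      securityContext:
        runAsUser: 1001         # the UID that must be able to write the volume
        runAsGroup: 1001
      containers:
      - name: mongodb
        image: bitnami/mongodb  # assumed image
        volumeMounts:
        - name: data
          mountPath: /bitnami/mongodb/data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: static-pvc
```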
So, for folks who have gotten past this, would you mind sharing what you've done and how you got over the permission denied issue, if you'd like to? If not, that's fine too. Can we turn on the mic?

Yeah, okay, thank you. "The easy fix for me was to check where the local data is stored, which was in the /data path, and to quickly check what the permissions were and what user was running. I just did chmod 777, restarted the pod, then changed the ownership from there, and the pod was running and I played Pac-Man."

Awesome, thank you. Did anyone have a different solution? Oh, did everyone hear the solution? One over there.

"I changed the storage class name to the default storage class."

Okay, does anyone have a different solution than that? Let me repeat the two solutions we've had from the crowd so far. One is the manual way: he went into the local directory and changed the permissions on the data path to 777, which is a free-for-all, anyone can access it, and then the pod was able to access it. And the gentleman at the back had a different solution: he changed the storage class name to the default and went the dynamic provisioning route. And I think we had two more solutions. Michael, what did you do?

"Yeah, so I went into the pod spec and just changed it to run as user whatever; I think it could be any number at that point. Which is kind of hacking your way in, probably not best practice, as it opens the door to anyone being able to access that file."

Yeah, so the third solution is that he escalated the pod to run with root privileges, and then you can just go in. Also, let me show you again: the PV holds the information about where on the host the data path is, so what I need to check is the static-pv folder in that data path, and we see static-pv here. So let's check the permissions. When I first saw this I thought, okay, what does this mean? Some folks here might remember it by heart; I don't. Okay, so this means everyone can read the directory, but its contents can only be changed by the owner, and we can see that the owner here is the user root, and it's owned by the group root as well. So his approach would have worked, because he changed the user the pod itself runs as, but it might not be the best solution, because now you're allowing a pod escalated privileges on your underlying storage system. Changing the permissions of the pod, sorry, not the pod, the PV's directory, to 777 is slightly better, because the pod isn't escalated in privilege at all, but if you have mission-critical data in the PV, you might not want to make it a free-for-all.

So what I did was actually change the owner of the volume's directory to match the UID of the pod. How do I find that? I can see it in the YAML file for my StatefulSet: if we check out the security context here, we see that the user for our pod is 1001, and the group is also 1001. So I'm going to change the owner of the volume to match my pod. Okay, let's check it again. All right: don't ask me who they are, but it's a different owner now.

So I'm going to double-check my pod status. One thing about CrashLoopBackOff: Kubernetes has an exponentially increasing wait time between restarts. It starts at around ten seconds and doubles each time, until it caps out at five minutes.
So sometimes, even if you've already fixed something, the pod is in the middle of waiting for a restart and it's going to take time. What I can do is manually restart it, so I'm going to restart my pacman-mongodb: I'll do a rollout restart on my StatefulSet, with the namespace. Thank you. Did I mention you can never type when people are watching you? Okay, we'll give it a second, and then we'll also explore the dynamic provisioning solution once this is up and running. I'm actually curious whether anyone has looked at the permissions of the volume when it's dynamically provisioned by the driver; we can explore that as well. But I can go ahead and create the PVC already, and I'm going to restart the other one too.

Oh, okay. MongoDB is... yeah, okay, MongoDB is running. We'll double-check by looking at the logs as well, and then pacman is also running. What a relief.

Now I do want to explore the option of using dynamic provisioning, so I'm going to go ahead and delete the PVC. But I can't delete the PVC. Does anyone know why I can't delete the PVC in this case? It's bound and has a finalizer, correct. So let me double-check the spec of my PVC here. It's going to be stuck in Terminating for a while, and that's because of this finalizer: kubernetes.io/pvc-protection. What this finalizer does is prevent the PVC from being deleted until all users of the PVC have also terminated. In this case we have a running pod that is actively using the PVC, so it might not be a good idea to go and delete your underlying data; this is a way of protecting your data while it's in use.

So I'm going to create a different PVC. In this case I'm not going to set the volume name, I'm going to leave the storage class name empty so the default one gets used, and I'm going to change the PVC name to dynamic-claim. Okay, good to go. PVC created. Again, I'll double-check, just to make sure it's bound to something. Okay, I see that my PVC is bound.

Now the next step is to change my StatefulSet to actually use this new PVC instead of the static one, and for this I can again do a local edit and then apply, or I can do a live edit with kubectl edit. What that does is pull the YAML into your editor of choice; you edit, save, the change is pushed back up, and the configuration is applied. I'll do that here, since I think it's a little easier to see, and instead of the static claim I'll use the dynamic claim, checking that my claim name is correct. Okay. Now we watch.

So if we take a look at the PV that was dynamically created for us, this is the claim, and this is the PV I'm interested in, we see the path... sorry, what I'm trying to do here, and we're running out of time, is find the path on the host where the volume was created, and show you the permissions that come with the volume as created. And for this CSI driver, the permission on the volume is also 777, which makes the volume a free-for-all, and the note on