You probably deserve an extra beer tonight. Plus it's very hot. I mean, inside here it's not too hot. So the session today is going to be about Kubernetes CI/CD pipelines for containerized databases. The topics we're going to touch on: Kubernetes, first. So what is a Kubernetes pipeline? It's a pipeline that runs entirely inside Kubernetes. And for this, we're going to talk about Tekton. Who's familiar with Tekton? No one, or a little bit. So we're going to talk about it. We're going to see two types of pipeline. Who here is a developer? OK, some of you. So first, more of a dev pipeline: how do I work with databases in this pipeline if I want to change things? How can I combine this with the rest of my application? And the rest is Tekton and, as I said, containerized databases. Who here is a DBA? I don't know if that job exists anymore; I'm always asking, and I never have any hands up. So we're going to see whether it's a good idea or not to run containerized databases, and we're going to talk a little bit about the differences between a Deployment and a StatefulSet. Those of you who know Kubernetes are already familiar with these concepts? A little bit. We're going to touch on this at the beginning just to set the stage.

My name is Nick. I'm heading the DevRel team at Spectro Cloud. I've been working with Kubernetes for the last six years, which doesn't make me younger. Both on the CNI side, so more networking, when I was at Cisco, and on the CSI side: I worked for a company called Ondat, formerly StorageOS, if anyone is familiar with it, which is essentially distributed storage for Kubernetes.

So let's get started. First, just to set the stage: if you were to run a database, what are the basic constructs you would use in Kubernetes? You have two choices. Typically in Kubernetes you run pods. Oh, I'm happy to have this clicker now, I don't have to point; this is magic. So pods can have one or more containers, and those pods are controlled by a controller. The role of the controller is that, you know, if a pod fails, it's reprovisioned; it makes sure the desired number of pods is running. We have two types: we have the Deployment, and we have the StatefulSet. A Deployment is for stateless containers, often referred to as, you know, cattle. You don't care about them. They don't have any stable identity: if one pod dies, another one is restarted with another DNS name, another IP address, all of that. Deployments have an interesting property in the sense that when you define a PVC in Kubernetes (this is "I want to attach a volume to my pods, I want to store data in my pod") and you have more than one pod, they all share that single definition: you define the Deployment, you define one PVC, and they all share it. Which means that if your PVC uses the traditional access mode, which is ReadWriteOnce (not an NFS share or anything, something local to your disk), the first pod will claim it and own it, and the others will try to write to it, but it won't be possible. Only the first one will be able to write to this volume. So if you need multiple pods to get access to the same volume, then you need something like an NFS share running in the ReadWriteMany access mode, right? That's the only way to make a Deployment work with multiple pods that need to access the same data.
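Just to make that concrete, a minimal sketch of such a claim (the storage class name is a placeholder; it depends on your CSI driver):

```yaml
# A single PVC shared by every pod in a Deployment.
# With ReadWriteOnce, the volume can only be mounted read-write
# in one place, so additional replicas will fail to write to it.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
spec:
  accessModes:
    - ReadWriteOnce   # would need ReadWriteMany (e.g. NFS) for multiple writers
  storageClassName: standard   # placeholder; depends on your CSI driver
  resources:
    requests:
      storage: 1Gi
```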
So that's the stateless side, and it means a PVC is, first off, the only way for the data to persist: if you create a Deployment and want the pod to keep stateful data, you have to use a PVC. And if it's a Deployment with one PVC and multiple pods, then you need an NFS share, which is probably out of the question for a database. That's the point I'm trying to make here: if you're running a database, it's probably a better idea to use a StatefulSet, because what you need is a stable identity, especially if you're building a database cluster. Imagine this is a database cluster where every pod is one member of that cluster. The difference between a StatefulSet and a Deployment is in the declaration of your configuration, the YAML file: inside a StatefulSet you have, on a per-pod basis, one instance of a PVC, as opposed to a single shared one. So every pod has its own storage volume, its own storage capacity, and can write exclusively, RWO or ReadWriteOnce, into that particular PVC. That's one reason you want to run a StatefulSet. The second, as I said, is that when a pod dies, the StatefulSet controller will reprovision it with exactly the same ID, exactly the same DNS name, and exactly the same pod name, right? So it's really the way to provide a stable identity inside Kubernetes. Anything stateful in Kubernetes should run in a StatefulSet; that's the basic rule. So bear this in mind for the rest of the conversation: stateless means Deployment, stateful means StatefulSet, hence the name.

OK, but at the same time, when you move a database into the new world of cloud native, you inherit properties like scalability, elasticity, self-healing, observability. All of that is great: new qualities, new features for your database, because it inherits them from Kubernetes, because that's what the platform provides. But when you move a database off a SAN or a big array, you still need the enterprise features that were provided by that array. This is why I asked who here is a DBA: because DBAs know you need replication, you need the system to be distributed, you may need encryption. And eventually, because we're in cloud native, you may want self-provisioning and to be DevOps friendly, right? Those features, at least the first three, are not inside Kubernetes. That's the other part of running a StatefulSet: you have to ask yourself, how do I provide this inside Kubernetes? And the answer is via the CSI, the Container Storage Interface. Again, Kubernetes just provides the interface for your storage features. So you have StorageOS, Ceph, Portworx: all these companies provide those extra features through their particular CSI driver. So be careful which CSI you use, especially in production, when you need things like encryption for the database, replication, all of that. Of course, you may end up with duplicated functions, in the sense that the database itself can provide replication. But remember: if you lose your pod, you're going to start from an empty PVC, so you'll need to replicate from nothing all the way back up to all your data. If you do storage-based replication, when you restart your failed pod, you only have to replicate the delta, right? So that's the main difference between storage-based replication and, let's say, software-based replication, replication from the database itself.
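Putting the two earlier points together (per-pod volumes via volumeClaimTemplates, and the storage class as the hook where CSI features get selected), a minimal sketch with placeholder names:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongo
spec:
  serviceName: mongo           # headless Service that gives each pod a stable DNS name
  replicas: 3
  selector:
    matchLabels:
      app: mongo
  template:
    metadata:
      labels:
        app: mongo
    spec:
      containers:
        - name: mongo
          image: mongo:6.0     # placeholder image/tag
          volumeMounts:
            - name: data
              mountPath: /data/db
  volumeClaimTemplates:         # one PVC per pod: data-mongo-0, data-mongo-1, ...
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: fast-replicated   # placeholder; this is where CSI features come in
        resources:
          requests:
            storage: 1Gi
```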
OK, another reason, for people who are asking "should I run a database inside Kubernetes?": this is from a Datadog report listing the top technologies running on containers. And you can see that Postgres, Redis, Elastic, Kafka, RabbitMQ, Mongo, MySQL, all of these are stateful workloads, right? And obviously they are running in containers. So even before Kubernetes, these technologies were already popular in containers. And in the top container images running in Kubernetes StatefulSets, you find exactly the same ones. So the people running those technologies do it in a StatefulSet. Definitely, StatefulSet is the way to go.

But it's not the only thing. What we've seen so far is just the storage: the capacity, the features. How about the application layer? How do you install the database? Because if you have your database pods controlled by a StatefulSet, it's going to maintain a certain number of pods, it can do rolling updates, all those things. But what about what's running inside those containers? How do you make sure the database is installed properly, the cluster is configured properly? How do you monitor this? Well, by using operators.

So who is familiar with operators? Some of you. Let me explain for the others what an operator is inside Kubernetes. Kubernetes is basically an extensible API. Every object inside Kubernetes lives within this API space: you will find things like Pods, DaemonSets, StatefulSets, and all of these exist as first-class citizens inside Kubernetes. But there's no such thing as a MongoDB object inside Kubernetes. The first idea of the operator is to create what we call a custom resource definition, a CRD, which makes MongoDB a first-class citizen inside Kubernetes, so that Kubernetes now knows what a MongoDB is. That's the first part, the API definition: I'm extending the Kubernetes API to make Kubernetes aware of what a MongoDB is. The second part is the custom resource. It's an instance of that particular object definition, so it's basically a YAML file. The same way you create a Pod or a Service or a Secret, you can create a MongoDB cluster, right, by respecting the particular syntax defined in the CRD. And the third part is the custom controller. You can think of the custom controller as a runtime environment that manages your custom application. In our particular case, we've extended Kubernetes with a MongoDB CRD. The role of this runtime is to watch for create, update, and delete operations on the custom resources and take the appropriate action to make your desired state live inside Kubernetes. So if I create a MongoDB custom resource here as a YAML file, the custom controller makes sure that everything I declared in this custom resource will be running in the cluster: creating the application layer, starting MongoDB, getting the right number of nodes inside the cluster, sometimes providing backup capabilities. All of that is encapsulated in the custom resource definition and in the custom controller logic that performs all the automation, right? And all of that together is what we call an operator.
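To make that concrete: with a MongoDB operator installed, creating a database cluster is just applying a custom resource. A rough, abridged sketch, following the general shape of the MongoDB community operator's schema (check the operator's docs for the exact fields):

```yaml
# Abridged example of a MongoDB custom resource; field names follow
# the MongoDB community operator's general shape, but the exact schema
# depends on the operator version you install.
apiVersion: mongodbcommunity.mongodb.com/v1
kind: MongoDBCommunity
metadata:
  name: marvel-db
spec:
  type: ReplicaSet
  members: 3            # the custom controller keeps three members running
  version: "6.0.5"      # desired MongoDB version
  security:
    authentication:
      modes: ["SCRAM"]
```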
So it's the custom controller, the custom resource, and the custom resource definition. Now we have three things, right? We have the operator, we have the StatefulSet, and we have the CSI. Those are the three components you need to think about when deploying databases inside Kubernetes. And you don't have to build this yourself. For the operator, you can find your preferred vendor: Mongo has an enterprise operator and a community operator, and there are companies out there specialized in building operators. For etcd, there's the Improbable operator. If you go on operatorhub.io, you will find all the possible operators to manage pretty much everything, from Redis to MongoDB, and even non-stateful applications.

So today we're going to see the Marvel app, which is an app I've built to demonstrate MongoDB, to build a full application on top of it, and to create some CI pipelines. If you go to this link, you'll see I have a full three-part blog if you want to replicate the same configuration at home. OK, so the application architecture is this one. If we start at the top, we have multiple front ends. You might know Flask is supposed to be a backend framework, but here it's serving HTML using the Bootstrap plugin for Flask. Then I have a three-node MongoDB cluster, deployed as a StatefulSet, claiming storage from the storage class. The storage class is basically how you map the CSI to your StatefulSet: the CSI layer, all those features, are controlled via the storage class. So if you want to enable things like replication and encryption, you use a storage class that has them enabled, and then you tell the StatefulSet: by the way, use that particular storage class. This is how you make the link between your PVCs, your volumes, and the storage feature set you want to expose for your database, right? And this is typically what you do in a StatefulSet: you configure a volume claim template, you associate a storage class, and then every pod controlled by the StatefulSet gets an individual PVC, as we've seen at the beginning, as opposed to the Deployment. One per pod, individual PVCs, all enabled with the features sitting in that storage class, right?

And the role of this application: the front-end Flask application is going to pull Marvel characters from the Marvel APIs, store them in a MongoDB database, and display some random cards containing the Marvel character information, right?
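Before moving on to the pipelines, here's what that storage class wiring might look like. A sketch only: the provisioner and parameter names are hypothetical placeholders, since every CSI driver names these differently.

```yaml
# Hypothetical StorageClass: the provisioner and parameters are
# vendor-specific placeholders, not a real driver's API.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-replicated
provisioner: csi.example.com   # your CSI driver's provisioner name
parameters:
  replicas: "2"                # storage-level replication (driver-specific)
  encryption: "true"           # at-rest encryption (driver-specific)
```

The StatefulSet's volumeClaimTemplates then reference it via storageClassName, as in the earlier sketch; that single line is what turns the driver's features on for the database's volumes.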
So this is the pipeline we're going to run, but essentially it's two different pipelines. The first one I will show live. The second one, as I said, I messed up, so I'll show you the video. The first one is on my laptop: how do I run the pipeline on my laptop if I want to develop my application and change code live, for example? So we have our application code stored in a Git repository. Then, of course, we have Docker Hub: we build images and publish them there. And we're going to use Kustomize as a way to deploy the manifests into our cluster. Who is familiar with Kustomize? A little bit. Kustomize basically takes raw manifests and applies some customization. For example, you could say: I'm in my dev environment, and because Mongo is now a native resource in Kubernetes, you can control things like the size, right? The number of nodes. So you could say: locally, please deploy MongoDB on only one node, and I just need one gig of space, right? For the dev pipeline we're going to be using a tool called Skaffold to build our images. You can think about it as Docker Compose, but for Kubernetes: it helps with building the application, building the containers, and running custom Docker build commands.

Now, when you go to the production pipeline, let's say, things get a bit more complicated, right? For production, we push our code into a Git repository. Then a Tekton pipeline should be triggered, though I'm going to do it manually in the video. Tekton is a pipeline tool: the same way Skaffold does the pipeline bits locally on your laptop, Tekton does it inside Kubernetes. And every step of the Tekton pipeline is a task that runs as a container, so everything runs in Kubernetes. The Tekton tasks will again build the image, build the manifests with Kustomize, and publish the customized manifests to the Git repository that is monitored by Flux. So it's coupled with Flux. Flux is a GitOps tool that lets you monitor a Git repository, and anything published in that repository gets deployed into Kubernetes. Why should you do that? Because it's more secure. You don't want to deploy to production using kubectl, like here; I mean, Skaffold will be doing the deployment using kubectl. In production, don't use kubectl directly, it's bad, right? I don't know if you have any experience with this. Typically you publish the manifests somewhere, maybe as a Helm chart; there's a variety of things you can do, but for the sake of simplicity today we're just going to publish the raw manifests rendered by Kustomize, run as a task in Tekton. Flux is monitoring the repo, and as soon as the manifests get uploaded into it, Flux provisions all of them into the cluster, but this time with different properties, right? We will have a three-node database, we may need more storage, and we may enable encryption, you know, replication, deduplication, all those kinds of things. Just by changing things with Kustomize, here and here: you have the same base manifests and then you can patch them a little, saying I want this amount of storage, or that many nodes, or replication and encryption, or only one replica, or no replica at all, right? Just by using Kustomize with different patches.

OK, so let's go for the demo; it's been 20 minutes, so we have 10 minutes, that should be enough. Let's go here. I'm going to show you the Skaffold part live. Skaffold, again, very simple syntax. Here I have a name, the Docker image I want to build, the name for the artifact, the Docker build context, and the build command. Typically you don't need a build command, but I'm on an M1, so I'm on ARM, and I need to use Docker buildx to cross-compile, right? That's why I need this script; you can just use native Docker if you're running on x86. Local push is true, in that I want to push the image from my local Docker daemon as well. And here, this is the path for Kustomize, so you can see I have my dev overlay. In Kustomize, the overlay is the patching: what do I need to change to apply this in my environment? For example, for Mongo, this is my overlay, and I say I need only one member, right?
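Something roughly like this; the file paths and resource names are placeholders for this repo's layout:

```yaml
# k8s/overlays/dev/kustomization.yaml (sketch; paths and names are placeholders)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: dev
resources:
  - ../../base                 # the shared raw manifests
patches:
  - target:
      kind: MongoDBCommunity   # the custom resource managed by the operator
      name: marvel-db
    patch: |-
      - op: replace
        path: /spec/members
        value: 1               # dev: a single MongoDB member
  - target:
      kind: Deployment
      name: marvel-frontend
    patch: |-
      - op: replace
        path: /spec/replicas
        value: 3               # dev: three front ends instead of five
```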
And I think the storage here is too big as well; in production I will have something different. I don't have the production overlay here because it will be built by my Tekton pipeline; my Tekton pipeline is going to build my Kustomize manifests as well. It's just to show you that with Kustomize, depending on the environment, you can easily deploy according to your needs.

OK, so this is basically what you need, and then Skaffold can run in multiple modes: you can do skaffold build, skaffold run. Here I'm just making sure I'm in my development environment, which is this Kubernetes cluster. We're going to be using K9s. I don't know if you're familiar with K9s, but it's probably one of the best tools to monitor and do any sort of operation in Kubernetes. So this is my current cluster, let's say my development cluster, and Skaffold is going to create a new namespace called dev and deploy everything into that namespace. For this I could do skaffold build, but what I also want is to change my code live: as soon as I change the code and hit save, everything gets deployed again into my cluster with the updated code. I don't have to go through the whole Docker build manually, update the image, and redeploy. That's the feature I want to show you today.

So if I do skaffold dev, it's going to build the image, then start the deployment, and it's going to tail all the container logs, so you can see the logs live from here. This is my Flask Marvel app, this is my front end, and if you look in K9s now, there's a new namespace called dev and you can see the Marvel front end deploying. In Kustomize I've chosen three front ends (in production I think I have five) and MongoDB as a single instance. The operator, remember, is controlling how MongoDB should be deployed, so the operator is now deploying the MongoDB database. And the add-data job is simply the Job that is going to pull all the character information from the Marvel API and store it in Mongo. It should fail at first; it's a Job, so it's going to retry until the DB is ready and it succeeds. OK, this one is ready, so our application should already be working. And yeah, you can see here, these are all the requests against the Marvel API.

So now I should be able to test my application. Still using K9s, I go to my services, I've got my front-end service, and I just press Shift-F for port forward: I want to redirect this service to my localhost on 8080. And there it is, this is my Marvel application. It's running. Now I'm going to make a quick change to show you the difference: it says "commit", and I want to add an s. So I'm going to go back here, go to my port forwards, and delete the one I have. OK. Now into my code. You can see, if I go back here, I still see the logs, right? I go to my app, templates, the page, I look for "commit", and I replace it with "commits" for all the instances, right? And then, look, I hit Command-S, or Ctrl-S. And if it works... yeah, you can see: it's now rebuilding everything and redeploying everything. So now I should see new containers coming. You can see, two seconds: it's deleting the old ones, creating the new ones. Now if I go to my service again, I recreate the port forward, I go back here, and my application has been updated. So the value here is that you don't have to rebuild.
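For reference, the skaffold.yaml driving that whole dev loop is roughly this shape; a sketch, with the image name, script path, and overlay path as placeholders:

```yaml
# Rough sketch of the skaffold.yaml being described; names and paths
# are placeholders for this talk's repo layout.
apiVersion: skaffold/v2beta29
kind: Config
build:
  artifacts:
    - image: docker.io/example/marvel-frontend   # placeholder image name
      context: app/
      custom:
        buildCommand: ./buildx.sh   # cross-compiles with `docker buildx` on ARM
  local:
    push: true
deploy:
  kustomize:
    paths:
      - k8s/overlays/dev            # the dev overlay with its patches
```

With this in place, `skaffold dev` watches the source tree, rebuilds on save, and redeploys through the dev overlay, which is exactly the live-reload loop in the demo.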
I've probably saved, I don't know, five minutes compared to rebuilding everything. Imagine if you have to change the code a hundred times a day: that's a hundred times ten minutes you're saving, right? So that's for your local laptop. Now I'm going to move to the second part. Five minutes left, right? Five minutes. OK, that's perfect, right on time.

So, sorry, I messed up my clusters; I'm missing a storage class, so I have to use my backup video. But I'm going to walk you through what I'm doing here. Same principle, but now with Tekton. I'm not going to explain every concept, but with Tekton, you run pipelines. Pipelines are composed of tasks, and here I have three tasks: one that builds the Docker image; one that uses Kustomize to render my customized manifests; and a Git task to upload the customized manifests into my repository. I didn't have to write that Git task myself, because Tekton has a marketplace where you can take prebuilt tasks as YAML files and just apply them to your cluster. The only thing you have to do is give them some parameters. For example, here I'm using the Git task, and these are my Git commands; Tekton just acts as a wrapper around my commands. So the point is there's a marketplace, and it's very easy to use. The structure is not that complicated, but as with everything, you have to get your head around it. For every task, you can either build your own or use one from the marketplace.

So pipelines are composed of tasks, tasks use resources, and then you have the PipelineRun. As with everything in Kubernetes, you trigger the Tekton pipeline by applying the PipelineRun manifest, which is what I'm doing here: kubectl create the PipelineRun, and then I just monitor what's happening in the logs. Just by doing this, you'll see it triggers my first task as a container. Every task is going to be a container, and you can see within that container what's happening. So it's building the image, the same thing I did with Skaffold. And by using the tkn pipeline logs command, it displays the same thing as what's happening here. So you have two choices: either you use the tkn command to see the logs of all the tasks, or, for every task, you go into its container and display the logs there. The difference is that here, everything is in a single place. So it's still building the image, installing the pip requirements from requirements.txt, really building the image from scratch, and pushing it to my upstream registry, sorry, not repository. And now you can see this task is done, and it's moving to the next one, which builds the Kustomize manifests. Again, I'm looking at that container, and what's in here is exactly the same as here. Now it's done with that, I believe. So, customized manifests: things should be done very soon. The third one is basically pushing the manifests, using my Git task from the marketplace, and uploading them into my manifest repository, the one Flux is monitoring. As soon as Flux sees that there are new manifests on the repo, you will see it reconciling, and as soon as it reconciles, the manifests get deployed into my Kubernetes cluster, but this time with the customization I've set up for my production environment.
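Structurally, the pipeline and the run that triggers it look something like this; a sketch with hypothetical task, parameter, and workspace names, not the talk's exact manifests:

```yaml
# Sketch of the three-task pipeline; task names and workspaces are
# hypothetical stand-ins for the real manifests.
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: marvel-pipeline
spec:
  workspaces:
    - name: source
  tasks:
    - name: build-image
      taskRef:
        name: build-docker-image      # e.g. a kaniko/buildah task from the hub
      workspaces:
        - name: source
          workspace: source
    - name: render-manifests
      runAfter: [build-image]
      taskRef:
        name: kustomize-render        # renders the production overlay
      workspaces:
        - name: source
          workspace: source
    - name: push-manifests
      runAfter: [render-manifests]
      taskRef:
        name: git-cli                 # the Git task from the marketplace
      workspaces:
        - name: source
          workspace: source
---
# Applying a PipelineRun is what actually triggers the pipeline.
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  name: marvel-pipeline-run-1
spec:
  pipelineRef:
    name: marvel-pipeline
  workspaces:
    - name: source
      persistentVolumeClaim:
        claimName: pipeline-ws        # placeholder PVC shared across tasks
```

You'd then tail it with something like `tkn pipelinerun logs marvel-pipeline-run-1 -f`.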
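And the Flux side of the handshake is just two small objects; again a sketch, with the repo URL and path as placeholders:

```yaml
# Sketch of the Flux objects doing the watching; URL and path are placeholders.
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: marvel-manifests
  namespace: flux-system
spec:
  interval: 1m                       # how often to poll the repo
  url: https://github.com/example/marvel-manifests   # placeholder URL
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: marvel-prod
  namespace: flux-system
spec:
  interval: 5m
  sourceRef:
    kind: GitRepository
    name: marvel-manifests
  path: ./prod                       # the rendered production manifests
  prune: true                        # remove objects deleted from the repo
```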
So that will be a three-node MongoDB database, and I think I just put one gig of storage, so I have more storage in dev than in production. It doesn't make sense, but it's just to show you that it's possible. You can see I have more front-end pods; I think I have five. If I go back here, you can see the new revision, the new manifests on the repo, have been applied: one, two, three, four, five, and the three tasks from Tekton are now completed. So the deployment of my application is basically done. You can see the MongoDB cluster is still deploying; three nodes this time, so it's taking a bit more time, and my add-data job is probably going to fail once or twice. You can also see the PVCs have been deployed with one gig of storage, the six of them: a data volume as well as a log volume per node. I think that's the last bit. So yeah, you can go to the Marvel front end now, and it's the same thing; this time, because it's running in a cloud, it has an external IP, so I don't have to redirect to my localhost. The only thing that remains is testing that the application is running and working on this public IP address on 8080, and yes, it is now running. OK, it looked easy, but it was a video.

So that was it for the demo, and that's pretty much the end of it. Key takeaways for today. Databases can happily run as containers, provided you have the right tooling in place. You have to figure out the enterprise data features when migrating from a SAN or NAS to Kubernetes: we've seen you have to think about the CSI, you have to think about StatefulSet versus Deployment (of course, it's going to be a StatefulSet), and you have to choose your CSI wisely, and your operator wisely as well. And make sure you're using the right pipeline tooling. Skaffold is one example; I just did another lab on DevSpace, which is also a good alternative to Skaffold. For your main CI pipeline, it's up to you. I've shown Tekton because Tekton is Kubernetes, and we had Kubernetes in the title. It also allows you to repeat the same principle everywhere: you can be on-prem, in Google, in Azure, in Oracle Cloud. As long as you have a Kubernetes cluster, you'll be able to use Tekton rather than the individual cloud-specific, vendor-specific pipelines.

And as I said, I'm part of Spectro Cloud. If you want to run stateful applications in Spectro, or really in any cluster, what we provide is layers that you can build once and repeat in any cloud. Once you have your cluster profile defined, we provide things like backup and scanning, using open source tools like Velero. So that's another option; there's Spectro, but not only Spectro. If you run a cluster somewhere, you have to think about backup. That's another thing I didn't mention: there is replication, but you also need backups. Whether you use Velero (Spectro's platform uses Velero) or something custom, and there are also operators that provide backup solutions, do think about your backups. And that's pretty much it for today.