Hi everyone, welcome to today's CNCF webinar on Crius. Crius is a CLI tool to install or set up an observability stack on your multi-cluster setup, to observe and collect metrics for your deployed applications. Quick introduction: my name is Yachika. I'm a Senior Product Engineer at InfraCloud Technologies. My experience over the last couple of years has been in Golang development around operators, controllers, and related tooling. Hey everyone, this is Rishikesh. I'm the VP of Delivery at InfraCloud. I typically look after the pre-sales and growth part of the company, along with general project execution. Let's talk briefly about the agenda for today. We'll discuss the reason we built Crius in the first place: the factors that drove us to write a utility like Crius, which can set up an observability stack across multiple clusters in an easier manner. We'll talk briefly about the solution that Crius is, showcase a sample topology that can be set up using Crius, and then Yachika will run you through a demonstration of Crius and how to use it. So, moving on: the state of monitoring today, and some salient points that drove us to write something like Crius. In our experience talking to our customers, they are moving towards offering their products as a service to their end customers. Essentially, SaaSification programs, where existing products are re-offered as a SaaS service, are underway in a lot of spaces. These are typically microservices: multiple applications that are cloud native in nature. As a result, the deployment topology of these applications, when you have to offer them as a service to multiple customers while focusing on tenancy and isolation on the same shared infrastructure, becomes very complicated.
With the deployment topology becoming complicated, so does the way of monitoring or observing these applications. Add a multi-cloud and multi-cluster deployment scenario into the mix, and this becomes even harder to achieve. The applications or products offered to end customers need to follow tenancy and isolation requirements in any case, and the metric data you collect for those applications is no different. It needs to be isolated and preserved for a longer duration, so that you can run analytics, identify patterns, reduce alert fatigue, drill down into monitoring what really matters, and constantly iterate on which signals you observe so that you stay proactive. With this in mind, companies with a complicated deployment topology such as this, along with the tenancy, isolation, and high-availability requirements that come with it, are increasingly finding the need to adopt monitoring solutions such as Thanos, Cortex, VictoriaMetrics, and so forth. And that is exactly what Crius allows you to do: it bootstraps these highly available monitoring solutions, which are purpose-built and modular in nature and can run across multiple clusters. Yachika will now talk about Crius and the Crius UI, and then walk through the demonstration for today. Over to you, Yachika. Yeah, thank you, Rishi. So Crius is a CLI tool to easily install your observability stack on multiple clusters, as mentioned. So far, we have added support for Prometheus and Thanos. Using Crius, you don't have to worry about wiring all these components together. You just use the Crius CLI to generate a spec file, which is the single source of truth on which Crius depends, and then you apply it.
Now, what is that config file? It's just a declarative file, like any declarative file in Kubernetes. For example, a pod template declares which image or container you need; similarly, in your config file you declare that a stack like Prometheus or Thanos is needed in cluster one, cluster two, or cluster three. Clusters here are just Kubernetes clusters. Then you apply that spec file using Crius, and it is Crius's responsibility to bring your clusters to that desired state. You can generate fairly complicated deployment topologies using Crius. I'll show you a sample deployment topology, which is this one, where you can see there are multiple Kubernetes clusters with Prometheus running on them. It's a federation-like system: multiple Prometheus instances are either remote-writing to Thanos, or Thanos is pulling metrics from Prometheus via the Thanos sidecar running alongside Prometheus in each cluster. You can query all of these metrics for your applications in the Thanos Querier, which is on the left side of the diagram, and you can visualize all of your metrics in Grafana dashboards as well, by adding the Querier's endpoint as a data source. You can also view your metrics through the Query Frontend component, or through the Querier itself. And to achieve long-term storage capabilities, there is an object store as well, on the right side of the screen, where Prometheus and Thanos push metrics; this can be AWS S3, GCS, Azure Blob Storage, or even a local MinIO setup. Setting up such a complex deployment topology by hand is very hard, or at least not as smooth as it could be.
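For reference, the object-store piece of this diagram is what Thanos calls a bucket configuration, which the sidecar, Store Gateway, and other components all consume. A minimal sketch in the Thanos objstore format, where the bucket name, endpoint, and credentials are placeholders, looks like:

```yaml
# Thanos object storage ("objstore") configuration for an S3-compatible bucket.
# Bucket name, endpoint, and credentials below are placeholders.
type: S3
config:
  bucket: my-metrics-bucket
  endpoint: s3.us-east-1.amazonaws.com
  access_key: <AWS_ACCESS_KEY_ID>
  secret_key: <AWS_SECRET_ACCESS_KEY>
```

The same file shape works for a local MinIO setup by pointing `endpoint` at the MinIO service instead.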
And that is why we built Crius: it makes this kind of complex setup really easy with a single CLI tool. Moving on to the next slide. On top of the Crius CLI, we have built Crius UI, a UI tool to design your monitoring deployment topologies. You can add multiple clusters. On the left side of the screen, you can see there are multiple draggable components. You drag these components onto the canvas, put them inside clusters, configure them, attach them to each other, add some object stores, and then export the values and apply them. The UI is highly extensible and customizable: you can design anything you want and configure any value. If you click on any component, you get a popup where you answer a few questions, and based on those values you get a configuration file. In the demo, I'll showcase the UI: I'll create a sample topology, generate a spec out of it, apply that spec, and we'll see the actual setup happen. Let me open the Crius UI. Yeah, so this is the page, and this is the product tour we're seeing here; I'll just walk you through it. You can add a pre-baked template here, or you can build your deployment topology from scratch by dragging these components. As I mentioned, I'll create a deployment topology. I think it's not added... ah, it's added, but somewhere outside the canvas. Here it is. I'll rename this cluster. What I'll create is three clusters: two for Prometheus and one for Thanos. I'll drag these components here. If you drop a component outside a cluster, it just says "put me inside a cluster", so there's no use placing it outside one. I'll also need object storage; I'll use AWS S3 and name it something like my-aws-bucket. I won't fill in these details for now.
Next, I'll configure the Prometheus server. I have to answer a few questions, such as whether I need a fresh installation, yes or no. "No" means there's already a Prometheus server running, and Crius just adds the Thanos sidecar alongside it. Then I'll give it a name; it's just a simple label, you can use anything. For the namespace, I want everything in the monitoring namespace. Then: sidecar mode or receiver mode? Sidecar mode is where the Prometheus server runs a Thanos sidecar alongside it, and the Thanos side pulls metrics through that sidecar. I'm designing a sidecar topology, so I'll pick sidecar, and if you notice, a sidecar gets added to this Prometheus server. I'll do the same for the other Prometheus running in the second cluster: same namespace, sidecar mode. Similarly, we answer a few questions for Thanos; I'll skip most of them for now and save. I'll connect Prometheus with Thanos, for both clusters, because Thanos needs to access the Prometheus servers. Then, to achieve long-term storage capabilities, I'll also connect each Prometheus server to AWS S3, because the sidecar will upload metrics to that bucket. The Thanos server also needs access to S3, because there is a Store Gateway in front of the Querier to query the long-term stored metrics. Now the deployment topology is ready, and we can export these values, which are generated from the questions we answered and the topology we designed. The export says there are three clusters: cluster one, cluster two, and cluster three. Cluster one's type is Prometheus, and it says installation: yes, we have to do a new installation; then the name, the namespace, the mode, which is sidecar, and the object store config this Prometheus server is connected to. Then cluster two, which is also Prometheus.
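For context on what sidecar mode means at the pod level, here is a rough sketch of the extra container that ends up next to Prometheus. The flags are standard Thanos sidecar flags; the image tag, volume paths, and port are illustrative placeholders, not Crius's exact output:

```yaml
# Illustrative sidecar container added to the Prometheus pod spec.
- name: thanos-sidecar
  image: quay.io/thanos/thanos:v0.28.0             # version is a placeholder
  args:
    - sidecar
    - --tsdb.path=/prometheus                      # Prometheus TSDB volume, shared with the server
    - --prometheus.url=http://localhost:9090       # local Prometheus instance to read from
    - --objstore.config-file=/etc/thanos/objstore.yaml  # bucket config for long-term storage uploads
    - --grpc-address=0.0.0.0:10901                 # Store API endpoint the Thanos Querier connects to
```

This is the endpoint ("the sidecar endpoint exposed to the Querier") that gets wired into the Querier's store list in the topology above.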
And then there's cluster three, which is of type Thanos, with some Thanos-specific values. At the end, we have the object store config list. We can add multiple object stores here, but for now I only have S3. Each object store gets a unique name, and that name is what I'm referencing from my monitoring components. I also wanted to show this: we can add a pre-baked template, in either receiver or sidecar mode. We designed a sidecar topology; let me show you a receiver one too. It's not very different, but the values differ: it has a receiver config, where the Prometheus server is remote-writing to the Thanos server. There is no sidecar running, and Thanos is not pulling the metrics; it's Prometheus that pushes them, using the remote-write API, to the Thanos receiver. But you still get the same long-term storage capabilities on the Thanos side. I can show you the values, which are pretty much the same. By adding a pre-baked template, you get a topology along with some pre-populated values. Now I'll show you how to apply this config file. I actually applied it before starting the demo, because it takes some time. You just run `crius spec apply` with the config file name. What it does is validate the YAML file and run pre-flight checks in all of your clusters, verifying the spec against the schema; you cannot add any extra values to your spec file, because it runs all the schema checks. Then it installs Prometheus and Thanos, or whatever is declared for each cluster. We'll verify that all the pods are in the Running state. And I'll port-forward this Thanos Query service and show you metrics from multiple Prometheus instances. Okay, you can see these endpoints, Prometheus server one and Prometheus server two, which we gave our Querier access to. We can also verify that.
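In receiver mode, the wiring on the Prometheus side boils down to a standard `remote_write` section in the Prometheus configuration. A sketch, where the service name and namespace are assumptions, while port 19291 and the `/api/v1/receive` path are the conventional Thanos Receive defaults:

```yaml
# prometheus.yml fragment: push samples to Thanos Receive instead of running a sidecar.
remote_write:
  - url: http://thanos-receive.monitoring.svc.cluster.local:19291/api/v1/receive
```

Thanos Receive then writes those samples to object storage itself, which is why the long-term storage story is the same in both modes.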
I'll go to cluster one, which runs Prometheus, and get the service. This is an external service of type LoadBalancer, and this is its external IP. We can verify it: 34.83.116.240 is one endpoint, and 35.97.74.62 is the other. So both clusters, Prometheus one and Prometheus two, are accessible to the Querier. Then I can run a sample query and show you some metrics. So yeah, here are the metrics coming in: you can see some nodes are from cluster one and a few nodes are from cluster two, so the metrics are coming from multiple clusters. Yeah, Rishi, anything you want to add here about the Crius UI or the config file? Yeah, so the general idea, or the general user experience we had in mind, which Yachika covered, is that you model the deployment topology of your observability stack, be it receiver mode, sidecar mode, or a hybrid of both, along with long-term storage and so forth, and configure it entirely. You then get a single source of truth, which is the YAML spec that Yachika walked you through. Now, the YAML schema is basically a subset of all the configuration values that Thanos and Prometheus themselves allow you to configure. The reason is that we felt these are the values you most tinker with when wiring up this kind of multi-cluster Prometheus and Thanos setup: the endpoints, the sidecar endpoint exposed to the Querier, the Store Gateway, the frontend, and the several other components you have to configure. As we move along, we will keep adding more and more configuration options to the YAML specification. And you can create any sort of topology with the Crius UI.
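Pieced together from the walkthrough above, the exported spec is roughly shaped like this. Treat it as a hypothetical reconstruction: the exact key names belong to Crius's schema and may differ from what is shown here.

```yaml
# Hypothetical reconstruction of the exported Crius spec; real field names may differ.
clusters:
  - name: cluster-1
    type: prometheus
    data:
      install: true                   # fresh install; false attaches a sidecar to an existing server
      name: prometheus-one
      namespace: monitoring
      mode: sidecar                   # or "receiver"
      objStoreConfig: my-aws-bucket   # reference into the object store list below
  - name: cluster-2
    type: prometheus
    data:
      install: true
      name: prometheus-two
      namespace: monitoring
      mode: sidecar
      objStoreConfig: my-aws-bucket
  - name: cluster-3
    type: thanos
    data:
      name: thanos
      namespace: monitoring
objStoreConfigslist:
  - name: my-aws-bucket               # unique name referenced by the clusters above
    type: S3
    config:
      bucket: my-metrics-bucket
```

The point of the format is exactly what the speakers describe: one declarative file that captures which stack runs in which cluster, which Crius validates and reconciles against.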
At the end of the day, you export these values, which serve as a single source of truth, so you know exactly what is deployed in production when you apply that spec, and the entire orchestration happens through Crius. We will eventually be adding more and more features; for instance, adopting a GitOps pattern around the single source of truth, the YAML spec you have written, so that we can keep reconciling against it. But we encourage anybody and everybody to get involved: this is a completely open source tool. You'll find it in the InfraCloud GitHub organization, with crius as the repository name. We welcome any sort of help: feature requests, code contributions, documentation, and so on. That is essentially the user journey as we visualize it. Yeah, that's all I wanted to add. Yeah, thank you so much. So this was the demo: you get an aggregated view across all of your clusters, for all of the Prometheus instances running on multiple clusters. Please check out the Crius repo; it's at github.com/infracloudio/crius. We have also published a blog post; please go through it, try the tool, and report issues. Thanks a lot. Thank you, everyone.