Hi everyone, and welcome to this session on Red Hat Quay. In this presentation I will focus on all the aspects you might want to consider before you finally deploy and start to use Quay. Quay is a very powerful product. It provides a ton of powerful features, and it also provides a lot of choices, so different problems can be solved in different ways. That's why Quay is sometimes perceived as complicated. At the end of the day, it's more flexible than complicated. However, in order to successfully deploy Quay and use it the right way, or the recommended way, there are a couple of questions you need to ask yourself and decisions you need to make in order to have a setup which is sustainable for many years and effectively gets the best out of Quay as a product.

Before we jump into those different patterns and the questions associated with them, let's start with a very high-level view of the Quay architecture. Quay is a containerized product, which means it can run on nearly any container infrastructure. It can run on a standalone host with a container runtime, but of course it runs better on an orchestration platform. It effectively consists of Quay itself as a containerized application. Optionally, you can add Clair, the vulnerability scanner; the mirroring worker for repository mirroring; and the Quay builders for the Git build triggers powered by Quay. Typically you run a load balancer in front of Quay and Clair, because you typically run more than one pod of each. And then you have your clients and your customers, the UI and API commands, which connect to Quay and Clair via the load balancer. You source content from the outside, such as container images from the Red Hat Container Catalog and Quay.io, operators from OperatorHub.io, your own content, and supplier content. We also need to get CVE metadata into your environment, because Clair uses it. One of the typical clients Quay serves content to is obviously OpenShift or Kubernetes, and there are different operators running on OpenShift that help with the integration of Quay into the Kubernetes platform, such as the Container Security Operator and the Quay Bridge Operator. So this is a very high-level overview; we will dive a little deeper into all those details in a few minutes.

Let's start with a couple of questions our customers typically ask us, or that we ask them, when they want to know the best way to deploy and then run and use Quay. One of the first questions is: which infrastructure is Quay supposed to run on? Is it on-prem? Is it public cloud? Obviously this has an impact on a couple of things, such as which storage backend or database service you can use. Another important question is: should I use a distinct registry for each lifecycle environment, or should I start with a shared one which is used in both development and production? And then there are a couple of other scenarios which might have an impact on the overall design, such as disconnected or air-gapped environments, and whether Clair or the builders are supposed to be used or not.

Let me start with the infrastructure. Technically, Quay runs on any physical or virtual infrastructure, both on-prem and public cloud. It doesn't matter: it's a containerized application, it runs everywhere, and it scales from a developer laptop up to a very massive registry like the one we run at Quay.io.
There is no difference from a code perspective between a very small setup on a developer laptop and a massive-scale setup on public cloud.

If the infrastructure is public cloud, then do us a favor and use the public cloud services for the backend services such as database and storage. So if you run, for example, on AWS, you can use the AWS services for the load balancer, the storage, the database itself, and the Redis cache, and we also added a recommendation on the EC2 virtual machine sizing there: at least m3.large, probably better m4.xlarge. Nearly the same applies to all other infrastructures; we just picked two of them. The other example is Azure, and again: use the public cloud services for database and storage if you run on public cloud. There is a very detailed overview of the different components, infrastructures, backends, and so on which we test against, and which we therefore also support as part of production support. All of those items are explicitly called out in the Quay tested configurations matrix on the Red Hat Customer Portal.
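To make that recommendation a bit more tangible, here is a minimal sketch of the relevant config.yaml fragments when pointing Quay at AWS managed services. The field names follow Quay's documented configuration schema as I know it; all endpoints, credentials, and bucket names are placeholders, not real values:

```yaml
# Sketch: Quay config.yaml fragments for AWS managed backend services.
# All hostnames, keys, and bucket names below are placeholders.
DB_URI: postgresql://quay:password@quay-db.example.us-east-1.rds.amazonaws.com:5432/quay
BUILDLOGS_REDIS:
  host: quay-redis.example.cache.amazonaws.com   # ElastiCache endpoint
  port: 6379
USER_EVENTS_REDIS:
  host: quay-redis.example.cache.amazonaws.com
  port: 6379
DISTRIBUTED_STORAGE_CONFIG:
  default:
    - S3Storage                                  # S3 as the blob store
    - s3_bucket: my-quay-blobs
      storage_path: /quay
      s3_access_key: <access-key>
      s3_secret_key: <secret-key>
DISTRIBUTED_STORAGE_PREFERENCE:
  - default
```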
The other question is whether you should run Quay on a standalone host or on OpenShift. It runs perfectly fine on both. We have many customers who deploy Quay on standalone container hosts. It's a little more tricky to run Quay there as an HA setup with multiple hosts involved, because you need to take care, manually or semi-automatically, of all the things Kubernetes offers out of the box. And it gets a little more complicated if you look at all the recent changes, extensions, and features we added to Quay, primarily in the operator space, because Kubernetes operators obviously only work with Kubernetes, which means you can't use them on a standalone container host. It's also important to know that if you run Quay on a host, this host needs to be properly subscribed from a subscription perspective.

So the recommended way is to run Quay on OpenShift. There are a couple of benefits, and most of them come directly from the out-of-the-box capabilities of Kubernetes: Kubernetes takes care of all the important aspects of running a containerized application, and you can leverage that. And OpenShift goes far beyond plain Kubernetes orchestration, which is why you get a couple of additional pieces such as Operator Lifecycle Manager, monitoring dashboards, and all the great things OpenShift offers out of the box; you can leverage them with Quay as well. So effectively Quay runs everywhere, but it probably runs best on OpenShift.

This has a lot to do with what we added on the operator side. The Quay Operator itself ensures that you can seamlessly deploy Quay on OpenShift, and in future versions of the operator we will take care of the entire day-two management, across all the different maturity levels which are critical for operators and the reason why CoreOS invented operators in the first place; we now use them everywhere, especially on the platform layer. The second operator is the Container Security Operator, which brings the Quay and Clair vulnerability information into the Kubernetes platform. From there it's exposed in the OpenShift console, and therefore visualized for developers and cluster admins from within OpenShift. And the Quay Bridge Operator is a new operator we just introduced with Quay 3.3, which ensures a seamless user experience when Quay and OpenShift are used together.

As I showed on the other slide, there are a couple of benefits if Quay runs on OpenShift. Probably the most important one: it's very easy to deploy Quay on OpenShift, because this is of course the target platform we develop against. This is something we invest in heavily, and it's the platform we know best, because we own it. There are a couple of great benefits coming out of this; you can see them on the slide. I don't want to dive too much into the OpenShift specifics, because we will do a dedicated recording for Quay and OpenShift where we dive a little deeper into the specifics of all three operators, how to use Quay with OpenShift, and how to run Quay on OpenShift.

Let's have a look at the database backend. The most critical backend dependency for Quay is the database: all metadata is stored in the database. Only the physical binary blobs are stored in the storage backend; everything that is shown in the console is stored in the database. That's why the database is really critical, and since it is the most critical piece, we definitely recommend running the database in HA mode. We recommend PostgreSQL, simply because PostgreSQL is required by Clair. If you only run Quay, without Clair, then you can also use other databases such as MySQL or MariaDB. And again, if you run on public cloud infrastructure, we recommend the PostgreSQL service provided by the cloud provider. Since the database is a stateful application, we recommend not running it on the Kubernetes cluster without a database operator. We at Red Hat do not provide such an operator, and that's why we partnered with our ecosystem partner Crunchy Data: they have a certified PostgreSQL operator, and we test against this operator as part of our QE. Since Quay 3.3, we also allow you to push the logs into an Elasticsearch stack instead of storing them in the database. This doesn't make the database less critical; it just eases the requirements on scalability and performance on the database side.

The next prerequisite is the storage backend. And by the way, prerequisite really means it needs to exist before you start deploying Quay. As I already mentioned, all metadata is stored in the database; storage is really there to store all the binary blobs. Quay HA requires, similar to the database, an HA setup for storage, and geo-replication has a hard requirement for object storage. We support neither local storage nor NFS nor any other local disk mounted into the container for production setups. There are a couple of choices for on-prem storage types: we support Ceph RADOS Gateway, we support OpenStack Swift, and we fully support OpenShift Container Storage version 4. OpenShift Container Storage version 3 remains in tech preview, because the NooBaa part remains in tech preview as well, and we effectively inherit the support status of the NooBaa Multi-Cloud Object Gateway. On public cloud, nearly all public cloud storage backends are supported, such as AWS S3, Google Cloud Storage, or Azure Blob Storage. It's important to call out here that only hot storage is supposed to be used as the storage backend, not the cold storage options.
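As an illustration of the on-prem options, a Ceph RADOS Gateway backend would be wired into config.yaml roughly like this; the driver name comes from Quay's storage documentation, and all endpoint values are placeholders:

```yaml
# Sketch: Ceph RADOS Gateway (S3-compatible object storage) as the blob store.
DISTRIBUTED_STORAGE_CONFIG:
  default:
    - RadosGWStorage
    - hostname: ceph-rgw.example.com   # placeholder RGW endpoint
      port: 443
      is_secure: true
      bucket_name: quay
      storage_path: /quay
      access_key: <access-key>
      secret_key: <secret-key>
DISTRIBUTED_STORAGE_PREFERENCE:
  - default
```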
The Redis cache is a third component which is kind of stateful, but it's less critical compared with the database and the storage backend. It's primarily used to store the builder logs and the Quay tutorial. Let's assume you've already watched the tutorial; then the builder logs are the only part that is eventually important in there. So you need to decide whether it needs to be HA or not. Typically it's not done in an HA fashion, because the risk that Redis goes down is pretty low, and the associated impact is also low.

The next decision you need to make is whether you want to run a dedicated registry, for example one for development and another one for production. I've met a bunch of customers in the past, and most of them insisted on keeping those registries separated: "we want to ensure that production workflows are protected, and that's why we want to ensure that the content which is produced and used in the development environment is not exposed to production at all." However, if that's the goal or the requirement, you can easily achieve the same thing using organizations and repositories and the corresponding RBAC permissions. There isn't any need to really split things up, or to run two distinct registries. Effectively, you would give up all the advantages a registry brings out of the box, such as deduplication and compression: if the same image is used in dev and prod and you run two distinct registries, you really have a copy of the same binary blob in each storage backend. So you are nearly doubling the cost over time, and looking at the amount of binary data that is stored in a registry, this can become pretty expensive.

Effectively, the same applies to "we want to separate the content; we want to clearly distinguish between the content sourced from an external source, such as the Red Hat Container Catalog or our suppliers, versus the content we produce internally, so we don't want to mix it in the same tool." Again, this is something you can easily achieve with organizations, repositories, and the RBAC permissions.

The same applies to the upgrade and update experience. You shouldn't really be concerned that an upgrade breaks the registry and a critical component isn't available anymore, potentially breaking workloads or even the cluster. This is something where we hope, and believe, we are doing a great job testing all the features before we ship them. In case you don't know the release and deployment model of Quay: we develop those features, we test them intensively in our internal QE, then we push them to Quay.io and make them available to selected namespaces. After that has stabilized, we open the door and make them available globally on Quay.io, and after they have stabilized again, we finally build the product packages and images and ship them to our customers. So it should be stable when it arrives in your environment, and this shouldn't be the main reason to run two distinct registries.

The same applies to another pattern: I've met a couple of customers who, for whatever reason, wanted to run a registry within each of their data centers. The default use case is that Quay can easily serve content to multiple data centers. An HA setup can stretch across different data centers, simply because HA is primarily achieved on the backend side, and storage typically runs in more than one data center anyway.
Just look at the use case of Quay.io, where we serve billions of images to thousands of clients dispersed across the globe; there's no reason why this shouldn't apply to you as well. So this shouldn't be an issue.

The other concern is scalability. Again, the same code base is used for Red Hat Quay and Quay.io. So if you run into performance issues with Red Hat Quay, it means that either something went wrong, or you run a registry at the same scale as we do with Quay.io, which is one of the five biggest registries out there. The scalability of Quay shouldn't be the concern.

The only, let's say, valid reason why it might make sense to have two distinct registries between dev and prod is if you really need distinct registry-wide configurations. For example, if you want to ensure that the Quay builders are only enabled and used in the development environment but not in production, things like that. This also applies if you have different ownership, with different teams and different users supposed to act as the superusers of the registry. Then it makes sense, because obviously, if a registry-wide configuration differs, you can't use one shared registry across all those lifecycle environments. But in most cases, the recommended way is really to run one shared registry instead of multiple distinct ones that you need to operate and maintain as well.

A quick word about disconnected and air-gapped environments. While Quay runs perfectly fine in an air-gapped environment, Clair does not. Clair needs to fetch the CVE and vulnerability metadata, and this requires that Clair is at least connected to the internet, as of today. You can use a proxy, so that's not an issue. This still means that the clusters Quay is serving content to can be disconnected; as long as Quay and Clair are connected, this is not an issue. Future versions of Quay will bring a feature (this is a slide from the Red Hat Quay roadmap deck describing our future vision of enhanced support for air-gapped environments) where we will allow you to run both Quay and Clair entirely air-gapped, and then of course all the clusters Quay is serving content to can run air-gapped as well. But as of today, Clair needs to run in connected mode. Hopefully, with the upcoming release 3.4, we will get rid of this limitation.

From a network access or firewall perspective, it's fairly easy: obviously, all your clients, all your clusters, and all the nodes need to be able to access the registry itself. This is typically the SSL port, 443, assuming you run only encrypted communication to the registry; port 80 is typically not needed. Then there are two optional ports, for the config app and for the Prometheus endpoint, which are typically not exposed to the outside world, but maybe to a broader internal community of clients who need access to them. All the other services, so PostgreSQL, Redis, and Clair, are not supposed to be exposed to the outside world. Those services of course need to be accessible by Quay, but not by the clients; all the connections happen between the client and Quay as the registry. Only the storage backend needs to be accessible by the clients, and only if you do not enable the storage proxy option. If client access to the storage backend is not feasible for whatever reason, you can use the storage proxy to work around this; the Quay container then serves the binary blobs to the clients.
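As far as the configuration goes, that proxy behavior is a single feature flag in config.yaml; a minimal sketch:

```yaml
# Sketch: serve blobs through the Quay pods instead of redirecting
# clients to the storage backend directly.
FEATURE_PROXY_STORAGE: true
```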
One of the last questions on my slide was simply whether you want to use Clair and the Quay build automation. Those are optional components; you don't have to use them. We of course strongly recommend them, because we believe they're quite powerful and do a couple of great things for you. Just a very brief introduction: Clair is the vulnerability scanning tool which has been developed for Quay, but it is also used by third-party products. You might have seen the announcement from last year where AWS ECR started to use Clair as its scanning backend as well; Harbor has been using it, and other tools are using it too. It's a 100% upstream component, similar to Quay. It's pretty powerful, and we just introduced a new version of Clair with Quay 3.3, which adds support for programming languages, initially limited to Python. So you need to ask yourself whether you want to use it. We recommend it, because we believe scanning at the registry level is the place where it makes the most sense and where it scales best. This does not automatically mean that you should not use any additional scanner or other security management tools; of course you can, and we strongly encourage you to do so, but you can still use Clair as a second view on the same things.

Another optional component is the Quay build triggers. What is it? Effectively, it means that Quay can take care of automatically building images, triggered by actions that happen in any of the Git tools we support, such as GitHub, Bitbucket, GitLab, and of course custom Git. As long as the Dockerfile is stored in the repository, we can automatically trigger a build, and the resulting image is pushed into Quay; we also just introduced a very powerful feature for better customization of the tagging. So this is a feature you just need to decide whether you want to use or not, and you can make or change that decision at any point in time. I just want to call it out, because in some environments it might have an impact on what the underlying infrastructure should look like.
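Both optional components are switched on via feature flags in config.yaml. A rough sketch, with the Clair endpoint being a placeholder value:

```yaml
# Sketch: enable the optional Clair scanner and the build system.
FEATURE_SECURITY_SCANNER: true
SECURITY_SCANNER_ENDPOINT: http://clair.example.com:6060   # placeholder endpoint
FEATURE_BUILD_SUPPORT: true
```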
On to the deployment patterns, the second part of this presentation. There are a couple of options and choices you have here. For example, we already briefly touched the question: should I run it on a standalone host or on Kubernetes? What are my target destinations? How many? Where are they? What technology is used there? Should I use geo-replication, or is repository mirroring what I want? What about sizing guidance? What about the subscriptions I need? And what about HA: how do I achieve it? Let's go through those points.

Let me start with the deployment example. As I mentioned, Quay runs perfectly fine on a developer laptop. It runs in a data center. It also runs stretched across multiple data centers. It runs on effectively any infrastructure, as long as it has a container runtime. The important point here is, again: Quay, Clair, and the repository mirroring worker are all stateless containers. The critical components of a Quay deployment are still the database and the storage backend. How you achieve HA for those is entirely up to you, and you probably won't change it just for container workloads; you will probably continue to use all the services you've used in the past to ensure HA for these mission-critical services, which, I believe, are probably not only used by Quay.

The question of which environments or destination targets Quay serves content to is fairly easy to answer: Quay can serve content to any OCI-compliant target. As long as the client speaks the standards and specifications, such as the Docker Registry API or the upcoming OCI distribution spec, you are totally fine. We also still have long-term spec support: we deprecated Docker V1 pushes, but we still support Docker V1 pulls in the Quay registry as of today. This means you can serve content to any container host, to plain vanilla Kubernetes, or to OpenShift clusters, and it doesn't matter whether it's one client or a hundred or thousands, up to millions. It also doesn't matter whether the clients run in the same data center or in a different one, even in a different region. If the clients run in a different region, then you might need to come back to the geo-replication question, which we will answer in a minute. But otherwise, where the clients are and how many there are is really not an important question you need to answer.

Let's quickly look at the question of whether you want to use geo-replication or repository mirroring. Since Quay is the only registry out there which has both features, geo-replication and repository mirroring, many customers mix up a little what those features have been made for. They are different and complementary features; they do not conflict with each other. If you look at the high-level data flow into your environment, the recommended way to source content from various external sources, such as your suppliers, the Red Hat Container Catalog, or community images from Docker Hub or somewhere else, is to use one registry as the primary content entry point, the single source of truth. This is how you get content into the registry, and repository mirroring has been intentionally designed in a way that it only allows explicitly whitelisted content, which means you need to explicitly select the external content you want to mirror into your registry.

Starting from this primary registry, the entry point into your environment, you effectively have two or maybe even more options, if you also include the various combinations. One option is that the primary registry uses geo-replication to ensure that if, for example, some clients are running in North America and other clients are running in EMEA, there is one large, single, globally distributed Quay deployment. The content, the configuration, the users: everything is the same in both North America and EMEA. The only difference is that clients in EMEA pull the binary blobs from nearby storage in EMEA, while the clients in North America pull the binary blobs from the North American storage. That's the main purpose of geo-replication: it's supposed to speed up access for the clients. And it's an asynchronous replication, so if the replication hasn't completed successfully yet, the fallback is still that the client goes over the ocean and fetches the blob from the other side of the world.
So geo-replication means you run one large registry, and this is achieved by a shared database which is used on both sides. It has two distinct storage backends, but one big database shared across both sides. That's geo-replication, and it is one of the options if you have clients on both continents.

Another option would be a secondary registry. Basically, you source all the content into the primary registry, and then you deploy a second registry, which uses repository mirroring to mirror, again, an explicit subset of whitelisted content into the secondary. Which also means: say your requirement is that you initially source entire repositories, or a subset, or a huge list of repositories from Red Hat and the open source community into your development environment, and then you have a second environment where you clearly want to separate things. Okay, in EMEA, where, for example, no software development happens and only production workloads are running, I don't need all the content I originally sourced into my primary registry; I only need the very specific subset required to run my production clusters. Then a secondary registry makes sense, and the connection between the first and the second would be done via repository mirroring. Effectively, all the clients in EMEA in this example probably wouldn't even have access to the primary registry, because they can only connect to the nearby secondary registry running in the EMEA region.

So basically you have two options. And the nice thing is, you can even combine them: you can use geo-replication and repository mirroring side by side. There are plenty of customers doing this, because there are globally dispersed setups where you, for example, want to run a geo-replicated setup across North America and EMEA, but then you have other clusters, for example in APAC, which use a smaller, separate Quay deployment in the APAC region to serve content there. For the APAC region they use repository mirroring, and for EMEA and North America they use geo-replication. There will be a more detailed recording explaining those features in further detail, including how to configure and use them, so I don't need to dive too deep into the details here. It's also worth mentioning, as I already called out, that you can easily configure the clients in a way that explicitly defines which of those registries a client is allowed to talk to. And again, the client can live in an entirely air-gapped or disconnected environment. To summarize the key differences between geo-replication and repository mirroring, this is a slide I've used in the past; and again, I will do a dedicated recording on those two features with a couple of sample use cases to explain them a little better.
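For orientation, here is a sketch of how the two features differ at the configuration level: geo-replication is one registry with several storage locations in config.yaml, while repository mirroring is a registry-wide feature flag plus per-repository mirror rules you define in the UI or API. Location names, buckets, and keys below are placeholders:

```yaml
# Sketch: geo-replication, i.e. one registry, one shared database,
# two storage locations that are replicated asynchronously.
FEATURE_STORAGE_REPLICATION: true
DISTRIBUTED_STORAGE_CONFIG:
  us-east:
    - S3Storage
    - s3_bucket: quay-us-east        # placeholder
      storage_path: /quay
      s3_access_key: <key>
      s3_secret_key: <secret>
  eu-west:
    - S3Storage
    - s3_bucket: quay-eu-west        # placeholder
      storage_path: /quay
      s3_access_key: <key>
      s3_secret_key: <secret>
DISTRIBUTED_STORAGE_PREFERENCE: [us-east]
DISTRIBUTED_STORAGE_DEFAULT_LOCATIONS: [us-east, eu-west]

# Sketch: repository mirroring, enabled once, then configured per repository.
FEATURE_REPO_MIRROR: true
```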
Let's move on to the sizing recommendations. This is really a tough question. We get a lot of sizing questions, and they are really, really hard to answer. First of all, again: scalability is not the issue. There isn't any known limitation; we know when Quay reaches its scalability limits, because we run one of the biggest registries out there, and it's the same code base we ship as the on-prem product. It's exactly the same thing. There aren't any typical sizing recommendations, because it really depends on a multitude of factors: the number of users, the number of images, the number of concurrent pulls and pushes. All those data points have a significant impact on the performance requirements. There isn't even a simple formula.

One thing that is important to understand: since it's a containerized application, it's fairly easy to scale out Quay and Clair, but this will definitely cause more load on the backend services. Typically, the performance bottleneck is not the Quay or Clair container, and also not the repository mirroring worker; it's really the backend services. So if you invest in something, invest in the storage, the database, and the connection to those services from the Quay and Clair containers. Auto-scaling is something you can configure manually today; we will add it as a built-in capability, probably via the Quay operator, in future versions of Quay.

The minimum requirements are something we can specify. The minimum for Quay is four gigabytes of memory; we recommend six, and at least two or more virtual or physical CPUs. Clair is a little more relaxed; Clair is the scanning engine. From a data standpoint, keep in mind that we fetch the security metadata from various sources. It's not limited to Red Hat content: we cover a long list of different operating systems, and we just added Python, which means there is a lot of security metadata fetched and stored in the Clair database. At a minimum it's 200 megabytes, and it will probably be more: the more images you have, the more vulnerability scan reports you have, and the database quickly becomes bigger. Even for the storage, it really depends: how many images do you have, how many images do you source from, say, Red Hat, how many images do you create, and how many of those images have shared layers? It also depends on the way you build images, and especially how you build your binaries, because that's the question behind this: whether a layer is really shared or not.

On this slide, we just try to provide some guidance on a typical sizing, as we have seen it at our customers. The minimum setup of course works, so you can run only one Quay container, but typically mid- and large-size setups run three to five containers on average, and then scale out to eight or ten containers when heavy load hits the registry. Clair, as I mentioned, requires a little less resources, so three to six containers are perfectly fine. For the mirroring pods there was a dedicated slide on the mirroring sizing as well, and on how you can avoid having to run more mirroring pods: one of the most important recommendations there is to not run all the mirroring operations at the same time. If you mirror ten repositories every single day, then please spread them across the daily schedule so that not all ten run at the same time, things like that. The database, as I said, is the most critical backend. We recommend at least four to eight cores and between 6 and 32 gigabytes of memory. That's a huge range, correct, but again, this is typically the most critical backend bottleneck. And storage, as I already mentioned: the registry typically grows but never shrinks, so anything between 1 and 20 terabytes is perfectly normal there.

Redis is a little more relaxed. Redis only becomes really critical if you use the Quay build automation; if you don't, it only stores the tutorial, and then of course a very small sizing is fine. Typically the Redis cache runs somewhere on the database host, just as an additional workload on top of it. And on the infrastructure side, we recommend that the infrastructure nodes at the host level have at least four to six cores and between 12 and 16 gigabytes of memory.
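If you deploy the pods yourself on plain Kubernetes, those minimums translate into resource requests roughly like this. This is a sketch assuming a pod spec you manage yourself; on OpenShift the Quay operator takes care of this for you, and the image reference is a placeholder:

```yaml
# Sketch: Kubernetes resource requests reflecting the minimums above.
containers:
  - name: quay-app
    image: <your-quay-image>   # placeholder
    resources:
      requests:
        memory: "6Gi"          # 4 Gi minimum, 6 Gi recommended
        cpu: "2"               # two or more vCPUs
```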
On the right side you see nearly the same sizing as we use at Quay.io: you can see the number of pods and their sizing, except the storage, obviously. It's not as big as you probably would imagine. The main reason we added this is to explain: no, it doesn't make sense to run more than 15 Quay containers by default, because if we don't need more on the Quay.io side, then you probably don't need more Quay pods. If you still have performance issues, they are probably caused by something other than the number of Quay pods running there. Just as a consideration.

As the product manager, I of course also take care of the commercial aspects of the product. One of the key questions, over and over again, is about subscriptions: what are the different types of subscriptions, and how are they measured, and so on. Like nearly any other Red Hat product, we sell subscriptions with either standard or premium support, and the Quay subscription as of today is based on a deployment. A deployment effectively means one single Quay registry with a shared data backend, which means the database and the storage backend are the same. The easiest way to explain it: in the Quay config.yaml file, there is only a single entry for the database and a single entry for storage. If you need two entries, because you run, for example, two different storage backends in different data centers or regions, then it's probably two deployments. And if you run two distinct database backends, then it's no longer the same registry, because, as I said, besides the config file and the certificates, everything is stored in the database. Two different database entry points means two different data sets, and two different registries with different users and different configurations. So those are two deployments, and that requires two subscriptions.

The exception here is geo-replication. Geo-replication still means one database, because, as I mentioned earlier, it's a shared database used on both sides, but you have two distinct storage backends which are mirrored from one to the other, and that's why, as of today, it requires two subscriptions. And if you replicate even further (you can also replicate to a third or fourth region if you want), then it counts per replica: a geo-replicated setup with three regions needs three subscriptions. The number of pods you run, whether Quay, Clair, builders, or the repository mirroring worker, doesn't matter; it has no impact. Which also means it's the same price tag for a very small deployment and for a very large deployment spanning multiple data centers or regions.
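To make the "one deployment" definition concrete, this is what the counting rule looks like in config.yaml terms; a sketch with placeholder values:

```yaml
# Sketch: one subscribed deployment == exactly one database entry
# and one storage configuration shared by all Quay pods.
DB_URI: postgresql://quay:password@db.example.com:5432/quay   # single DB entry
DISTRIBUTED_STORAGE_CONFIG:
  default:                      # single storage backend
    - S3Storage
    - s3_bucket: quay-blobs
      storage_path: /quay
      s3_access_key: <key>
      s3_secret_key: <secret>
# A second, independent DB_URI elsewhere would mean a second registry,
# and therefore a second deployment and subscription.
```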
There are also no further subscriptions or costs associated with the operators which run on OpenShift as the destination target, such as the Container Security Operator or the Quay Bridge Operator. They can run on every OpenShift cluster, and they require a Quay subscription only in the sense that you can only use the Container Security Operator if you are using Quay, and Quay requires the subscription; the Container Security Operator itself, running on OpenShift, does not require an additional subscription. So if you use one Quay deployment to serve content to 1,000 OpenShift clusters, it's perfectly fine to install 1,000 Container Security Operators and 1,000 Bridge Operators on all those clusters without any additional costs. It also doesn't matter, as I already mentioned, how big the underlying infrastructure is, whether it's a standalone host or multiple ones, or whether it's in one data center or several. Again, the requirement is: one shared database and one shared storage backend, typically with a load balancer in front to ensure it's really one registry. It's worth calling out here that, as of today, we do not support replicas for the database; neither read-only nor active-active setups are currently supported. We are working on such things for future versions of Quay. And again, the number of destination targets doesn't matter; it really has no impact on the subscription.

One of the last points was HA, and this is probably a little more complicated as a topic. As I already mentioned a couple of times: the containers, the pods (Quay, Clair, mirroring, and the builders) are stateless components, and effectively the same applies to the Quay operator, which manages those different pods and takes care of the deployment. All of them are stateless. The only stateful components which belong to Quay, but are not part of, let's say, the shippable unit of the product, are the storage, the database, and the Redis cache. As I already mentioned, the Redis cache is stateful but less critical, so it doesn't require HA; you can of course still run it in HA mode, but you don't have to. Storage and database are the most critical pieces.

Then there are a couple of other things. If we run containers or pods, somebody needs to take care that those are highly available as well, but this is automatically done by Kubernetes or OpenShift. In front of those different pods you run a load balancer, and of course this load balancer needs to be highly available as well. In future versions of the operator we will do a better job managing all the different workloads, leveraging everything that already exists, such as the health checks and the readiness endpoints. Infrastructure HA is typically achieved by the Kubernetes platform: if a node goes down, Kubernetes or OpenShift automatically ensures the workload moves over to another node, and of course the same applies to the entire infrastructure, say a data center goes down. There is no difference here from all the other workloads you run somewhere. I already mentioned most of the points shown here; it's worth mentioning that we have a dedicated guide which covers the Quay HA (high availability) setup, and we will probably extend and improve this guide in the near future to incorporate a couple of changes and additions we want to get in there.
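As an example of those health checks: Quay exposes an instance health endpoint that a Kubernetes probe can point at. A sketch, assuming you manage the pod spec yourself and that Quay serves HTTP on its default container port:

```yaml
# Sketch: readiness probe against Quay's instance health endpoint.
readinessProbe:
  httpGet:
    path: /health/instance
    port: 8080          # assumed default HTTP container port
  initialDelaySeconds: 20
  periodSeconds: 10
```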
For the storage backends, again, the recommended way to consume storage, especially on OpenShift, is OpenShift Container Storage; we use the NooBaa Multi-Cloud Object Gateway. Plain OCS also brings a couple of great out-of-the-box capabilities that help achieve HA for the storage: by default there are three replicas which are automatically created, and node failover is handled automatically. There are a couple of features on the OCS side which ensure that the storage backend, again the second most critical backend for Quay, is always up, running, available, and accessible by Quay. The same applies to Ceph: there are a couple of options for plain Ceph to achieve HA, and there is a lot of documentation and guidance out there as well.

On the database side, again the most critical piece, it's important to know that the Red Hat-provided database images for PostgreSQL, MariaDB, and MySQL are not supported for production workloads. The support limitations, which are linked from this slide, make clear that this is not the recommended way to run the database on OpenShift in an HA fashion. That's why the recommendation is: typically, our customers already have an HA database service for PostgreSQL somewhere, operated by a professional DBA team, well maintained, and also used by many other applications and business-critical services the customer runs. In public cloud, again, you can use the database service provided by the public cloud provider, and typically the cloud provider takes care of its availability. Or, alternatively, you can use Red Hat partner offerings such as the Crunchy operator.

For the components themselves, so all the pods and images, this is more or less automatically done by Kubernetes and OpenShift and the auto-healing capabilities in there. We recommend running three pods each for Quay and Clair in HA setups; the Quay operator monitors the health of those pods via their health checks and respins them if needed. Multi-site setups should effectively run pods on both sites, which means you have two distinct Quay clusters; but since you are still using the same database, it's one Quay deployment from a subscription perspective.

Coming back to the database operator: I briefly mentioned the Crunchy operator. This is one of the ecosystem partners we've been working with closely, and on this slide Crunchy tries to explain why you want to run an operator for stateful applications such as a database. The other reason why we recommend partner offerings for the database operator is not only the HA capabilities, but also all the additional features those market leaders offer in this area, such as database backup, disaster recovery, failover, and monitoring: all those great features which are not included in the database images we ship.

And with that, I'm done with this day-zero session recording. I hope you enjoyed it, I hope you learned a lot, and I hope I answered all the questions you always wanted to ask. Many thanks for watching, and take care.