Welcome everyone to DevConf, the 12th annual conference that we've been doing here for way too long. I'm glad to see some old faces here, friends, and also a lot of new people who have never been to the conference. So let's start with a quick show of hands: for how many of you is this the first time attending DevConf? That's quite a lot of people, yeah. Okay, that's good, because I have a few things about DevConf prepared here and I can go through them now. Maybe I'll try one more question like this: how many of you have been to every single DevConf so far? Wow, er, some. Cool.

So first things first: DevConf, what is it about? You're probably here to listen to some great talks and interesting presentations and things like that, but I want to remind you of a few things. How much did you pay for the ticket? What's the price on the black market these days? Let's put it this way: if you paid zero for the ticket, you automatically become a volunteer here, helping with the conference. That's the whole notion of the conference. There's pretty much one person who's been paid for organizing this thing; the whole thing is organized by a group of roughly a hundred volunteers. You can recognize them because they have these nice black hoodies. Pavel, thanks for standing up and showing the hoodie to everyone. And that's one thing I want to ask of you: if you see something, if you have some questions, if you need some help, these people are here to help. So those are the black-hoodie people.

The grey-hoodie people are presenters, and I want to remind you that this is your opportunity to talk to them, give them feedback, socialize with them. Keep in mind that these people know a lot about their subject matter and you should definitely talk to them. There's one more important thing about the speakers. When we created the schedule, the agenda, we intentionally tried to mix first-time speakers with presenters who are well established and who have presented at every single DevConf so far. So keep that in mind, please: give these people feedback, talk to them after their presentations, about how they did and what they can improve, things like that. Again, for a lot of them this is the first time they're talking in front of an audience like this, and it's a brand new experience for them.

There are going to be roughly 1,700 attendees here, so that's quite a lot of people. For the keynotes, we have to apologize, but the streaming doesn't work; I won't mention why. So we only have this room; we were planning to have the rooms downstairs available as well. For the keynotes we're allowed to have this many people in this room, but for any other presentation we're only allowed to have people sitting in the room, not standing or sitting on the stairs. So please keep that in mind. The session chairs, the volunteers, will remind you when a room is already full and will guide you to other presentations that are not as full. Please respect them in that.

What else? I already mentioned giving us feedback, but there are also feedback forms; the QR codes are printed somewhere around this venue. So tell us what you think about the conference. There's a Telegram group for all the attendees, so you can share your feedback there as well. And there's a Twitter tag: devconf, underscore... what is the thing? It's underscore? Yeah: devconf_cz.
It's written somewhere around here. So use that for sharing pictures, sharing your thoughts. We're watching it and we'll be answering questions.

Talking about sharing stuff: this is still a university Wi-Fi, so I can guarantee you it's going to be a bit flaky. Keep that in mind, of course. What else? There'll be some interesting events happening. As you probably know, there's a party tomorrow. The tickets will be available tomorrow, right? We don't want to give you the tickets ahead of time; you still have to come to the conference and pay attention at the right time, when it's announced that the tickets are available. So keep that in mind. And the last session of the conference is already becoming a tradition: we'll be doing a trivia quiz and handing out some interesting prizes. So make sure you stay here until the very last session on Sunday. If you do, you'll get something for it.

What else? One thing I have to do for sure is thank the university here. We're using this beautiful venue again this year, and we promised the university that we'll return it in the same shape as we got it. So please help us: please don't bring any food into the rooms, and if you see something broken, something not working, let us know about it immediately, because we have to deal with it and solve it. Which reminds me: we're trying to be a little more eco-friendly this time, so we have these beautiful cups that you can get in front of the reception, basically where the coffee stand is. Please keep in mind that it's one cup per person; I've already seen some people getting more of them. There's just one per person, and reuse it as much as you can, so we don't produce a huge amount of waste here.

What else? I think that's pretty much it. Maybe one more thing: different from last year, we have the Ventana coffee shop open. This is across the street from the main entrance; you can go there as well. You've probably seen the food trucks that are already here on the main square. There's one in the back as well. There are four in the back? Something like that. I don't get your sign language, sorry. The point is that there are food trucks all over the place. If you use that small tunnel over there, there's a parking lot in the back; there are probably some more food trucks there, and you can always go and grab some food.

I think that's pretty much it. So enjoy the conference. Enjoy the great keynotes. I'm going to talk quickly about the keynotes. We've picked the keynotes by looking at some interesting trends and interesting technologies. Today it's going to be an interesting trend that is influencing most of IT and software development. This is going to be Paul, KB, and Jeremy. You guys are all distinguished engineers. Paul, you're not; you're a principal engineer. All right. The distinguished engineers, for me, are basically the people I usually sit and quietly listen to, because they have tons of experience. And I know Jeremy has a lot of experience with running cloud.redhat.com and OpenShift Online as well, right? And things like that. KB has a lot of experience with CentOS, running the CentOS infrastructure, and we've known each other for ages already. So I think this is going to be a great keynote. Tomorrow we'll be talking about some AI and machine learning topics. And then on Sunday we have an interesting presentation from Leslie, which is more on the soft-skills side. So interesting keynotes, all of them. Enjoy them.
And again, share your feedback, ask questions of these guys. So that's it; it's all up to you guys now. Thank you.

Hi, guys. Can you hear me? Yeah. So I think the only distinguished engineer here is actually Jeremy. I like to think of myself as more of an extinguished engineer. And Paul actually represents an engineer who does stuff. But Radek, again, has left the building. Thank you for having us here. It's great to come across. We're a very young organization. Let me see if this works. Yeah, there we go. Maybe? No. So while we are seeing that, there we go.

So, the service delivery organization. We're a young group; we're about a year and a half in. So I think we're maturing well, full-bodied, hopefully, and soon we'll taste well too. We execute as a part of an engineering group; we're a part of P&T. We're very much engineering-focused, full function: we do our own quality work, we do our own reliability work, we do our own development work, and we do our own service and support for our own infrastructure and for our tenants as well. But at the heart of the whole piece is that we're what we like to think of as a modern, practices-driven service organization. We thoroughly enjoy what we do. We're very excited about where we are going and where we've been able to help Red Hat come along with us as well. Over the course of the next 45 minutes or so, we'll try and share a very high-level view of a very small sliver of the interesting things that we're working on, and I hope you guys get interested. If you want to know more, we're going to be here for the next three days; there are about a dozen other people here from the service organization. Feel free to reach out. We even have stickers.

What we actually do, fundamentally, is that we're an engineering group with a primary directive, which is to figure out what running services actually means: how do you develop them? How do you plan them? How do you go to market with a service? How do you lifecycle a service? How do you support a customer in a service? How do you do very high-velocity turnover in cloud-native patterns, using modern systems expectations, and meet the customers where we think they are going as the next leapfrog, moving away from static, slow-moving product spaces?

The second part of what we're doing is that Red Hat has invested extensively in the model where we believe that a very large portion of our customers are invested in hybrid cloud patterns. They're going to look at multifaceted infrastructure, and there is no hybrid pattern unless you've got a managed service as a part of that. When a customer invests in the fact that, hey, I'm going to use a cloud, they're already investing in a situation where they've agreed that they don't have metal access, they don't have physical access, and they don't need that physical access in order to succeed at their business purpose. So we represent the Red Hat interface into what that managed service looks like, and how we deliver Red Hat's product portfolio as a service, so that we can complete the hybrid loop.

How is the organization set up? We execute across four distinct functions. We have our platform engineering group, which is traditionally the more operationally focused SRE group.
They're the guys who engage extensively with the OpenShift teams and the other platform teams, including storage, networking, and Linux engineering, and they deliver our OpenShift Dedicated product and the Azure Red Hat OpenShift product. The base, fundamental product: working across dozens of flavors, hundreds of clusters, thousands and thousands of nodes, and doing a really good job of it, exposing that as a customer-facing product. I'm sure most people here have heard of OpenShift Dedicated.

The second layer above that is our tenant operations, the application SRE team, the team that Paul's going to help talk through a little bit. They're invested in consuming OpenShift Dedicated, consuming Azure Red Hat OpenShift and Red Hat platforms, in order to deliver our application portfolio above that. So think of it this way: if Red Hat is running a SaaS, if Red Hat is running a managed service, it is the tenant operations team that facilitates that. They consume the platform just as a customer would, and then deliver above that.

The third layer within the organization, the third functional piece, is the service development group. They are the guys who are building out the OpenShift Cluster Manager, OCM. They are the people who are building out the self-service portal that we then use for all of our Tollbooth pieces, our subscription management, OpenShift Dedicated, et cetera. They are the guys who are basically invested in building out the tools required for other groups across Red Hat, and for service delivery, to succeed; the application development piece, as it were. And then the fourth part, the fourth group that complements the whole stack, is our emerging technologies and compliance group. So I'm going to hand over to Jeremy to talk us through what reliability engineering is and what our interpretation of it looks like.

Thanks, KB. Yeah. So I'm Jeremy Eder, one of several architects within the service delivery group. What I wanted to share with you today is two things: first, how we define the primary responsibilities of a service reliability engineer, and then, since KB mentioned that we're primarily an engineering group, some worked examples of cool stuff that we've built out of necessity to help operate the platforms that we're responsible for.

So, who's familiar with Maslow's hierarchy of needs? This is the SRE hierarchy of needs. Look at the foundation there. Last time I was at DevConf I did a five-minute lightning talk on Prometheus and observability, so connecting these two talks: fundamentally, we need to know. It is our job to know what is happening at all times on the platform. We call that observability, and that includes monitoring, logging, tracing, and so forth. All of our SRE teams are involved in incident response; Paul's going to take us through a worked example of incident response and post-mortems. Who's heard of the term garbage in, garbage out? So we have pipelines that ensure there is as little garbage ingested as possible (I say that lightly), so that the stuff going into our production environments is as qualified as possible, and I'll have some examples of how we do that shortly. And we're involved in capacity planning.
As you can imagine, if marketing wants to do a push for a managed service, we would need to know in advance, and we need to know what our services are capable of, so that we can budget for the increased load that might occur. Everybody on the team is a developer at some level; I think that's one of the key traits here. Not only do you need to know how to use the software, but you need to know how to fix the software or improve the software. And it all funnels up to our product, our managed services product. So this is what each of our SREs is tasked to understand and to actualize.

A couple of worked examples. Our primary cloud right now is AWS: if you sign up for an OpenShift Dedicated cluster, you will get it on AWS. If you sign up for an ARO cluster, that will be on Microsoft's cloud. We've written, I think, over eight or nine operators to run the platform; actually, I think there are more than that. The first example I've got here is the AWS account operator. What does this guy do? For security purposes, every OSD cluster is in its own VPC, a virtual private cloud; I think that's an AWS term. It's essentially a security bucket: all the resources in that VPC are scoped to just that account. We use it for isolation between our tenants. So how do we actualize the creation of those accounts, how do we tie them to users, and how do we stand up clusters within those accounts? Well, it turns out that AWS's APIs, while fantastic, can be slow, and they can be missing features that our team actually needs. For responsiveness, we want to make sure that clusters can be turned on as quickly as possible. Who's installed OpenShift before? Okay. Well, it takes about 35 minutes, sometimes longer, and even worse if the cloud provider's having a bad day. So we want to make sure our layer adds as little delay as possible. So we preallocate accounts. We actually pre-provision instances in those accounts to kind of warm them, so to speak. These are reactions to behaviors we've seen from AWS; for example, initial instance provisioning in a VPC can be delayed if you've got a brand-new, fresh VPC. So we prime those VPCs.

As a managed service, over the last couple of years we've gotten involved at a little bit too low a level, I would say, on each individual cluster. What we're trying to do now is bring everything up to a self-service capability. One of the things the AWS account operator is helping us with is allowing customers to peer (what's called VPC peering, which is like a VPN) between their OSD cluster and their internal networks, their internal workstations, for example, so they can do development against OSD without actually going over the internet.

Second example: if you've installed OpenShift, you've seen the self-signed certificates on the API server and on the console. That makes sense as a product default, because a lot of our larger customers have their own CAs internally, or want to own their own certificate lifecycles, and so forth. For a managed service, it makes a little less sense. We have an operator now that deals with Let's Encrypt, the certificate authority; I'm sure you've heard of it. What that guy will do is go out, request certificates, lay them down on the cluster, and, I think most importantly, lifecycle them so that we never have to worry about these things expiring.
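To make that lifecycling concrete, here is a minimal sketch, in Go, of the kind of expiry check such an operator has to keep running. This is not the actual operator; the host name and the 30-day renewal window are assumptions for illustration.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"time"
)

// needsRenewal dials the endpoint, inspects the serving certificate,
// and reports whether it expires within the given window.
func needsRenewal(addr string, window time.Duration) (bool, time.Time, error) {
	// Skip chain verification so we can inspect even a cert we
	// don't (yet) trust, e.g. a self-signed default.
	conn, err := tls.Dial("tcp", addr, &tls.Config{InsecureSkipVerify: true})
	if err != nil {
		return false, time.Time{}, err
	}
	defer conn.Close()

	// The leaf certificate is the first one presented by the server.
	cert := conn.ConnectionState().PeerCertificates[0]
	return time.Until(cert.NotAfter) < window, cert.NotAfter, nil
}

func main() {
	// Hypothetical cluster API endpoint; a real operator would read this
	// from the cluster's own state, not a hard-coded string.
	renew, expiry, err := needsRenewal("api.example-cluster.example.com:443", 30*24*time.Hour)
	if err != nil {
		fmt.Println("check failed:", err)
		return
	}
	fmt.Printf("expires %s, renew now: %v\n", expiry.Format(time.RFC3339), renew)
}
```

In a real operator this check would run in a reconcile loop, with the renewal path requesting a fresh certificate from the CA and laying it down on the cluster.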
So we'll install the certs and cycle the console operator so that the new certificates actually kick in, and then you have a real certificate when you get an OpenShift Dedicated cluster right now. So that's the second example of something we've done to improve the product experience, and I think it's basically what someone would expect out of a managed service.

Thirdly, app-interface. I mentioned garbage in, garbage out. Well, that pipeline is encapsulated in a GitOps-driven workflow we call app-interface. Paul's going to go into it in a lot more detail. I think this may be one of the coolest things we're doing right now; it's certainly very unique and forward-looking in terms of how modern it really is. We manage everything through this app-interface. If you want to bring on a managed service, you will end up in some kind of service delivery GitOps flow, and there may be many in the audience today who've already engaged with us on an integration in app-interface. When you put something in there, just like GitHub, there are PR checks, and we keep trying to standardize the PRs that are coming in, in terms of best practices that we learn of. Are you setting requests and limits? Things like that. I wish we could enforce pod disruption budgets, but certain applications aren't ready for those, so we can't really enforce them, even though they are indeed a known best practice. So that's app-interface, and again, Paul will say a lot more later.

The manifest validator: I was on a call with a customer, I don't know, three or four days ago, and they were worried about their deployment flow, and how they actually broke their production because a developer merged some code that didn't have the correct syntax in its Kubernetes YAML, the deployment YAML, I should say. So while I said we're working on tools, we have the same problem; everybody's got the same problem. And we have a tool for this. Every time a PR hits app-interface, it will go through a series of PR checks; the manifest validator is the mechanism by which those PR checks are executed. That's a deliverable from the application SRE team as well. Quite honestly, this was built for survival too: we want to make sure that our applications are supportable, the things that we're on the hook to SLA. Paul.

Thank you. Hi everyone, my name is Paul. I'm the team lead for the application SRE team, or the app SRE team for short. I'm based out of Oslo, so it's nice being able to travel to a warmer climate for the conference. As Jeremy pointed out, the service delivery org executes in a layered SRE model: we consume the platforms the way any customer would. This allows us to focus on what we need to be focusing on, the application experience, the application delivery. It also allows us to feed back into the platform and possibly predict customer requirements internally.

So we have a concept called the app contract. While we engage as a customer with the platform, the app contract defines how the application SRE team engages with the services, our tenants, our customers. We define this as a set of structured schemas with required fields and optional fields, defining the applications, the contacts, the metadata around them, but also runtime configuration and specifics concerning the applications themselves.
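Purely as an illustration of the shape of such a contract (these Go types and field names are invented, not the real app-interface schema), a required-fields check of the kind that would gate a merge request might look like this:

```go
package main

import (
	"errors"
	"fmt"
)

// App is a hypothetical app-contract entry: required identity and
// escalation fields, plus optional runtime references.
type App struct {
	Name         string   // required: the service's name
	Contacts     []string // required: who gets paged or notified
	SLOTarget    float64  // required: e.g. 0.999 availability
	Dependencies []string // optional: references to shared dependency objects
}

// Validate enforces the contract's required fields, the way a schema
// check would reject a merge request that omits them.
func (a App) Validate() error {
	if a.Name == "" {
		return errors.New("app: name is required")
	}
	if len(a.Contacts) == 0 {
		return errors.New("app: at least one contact is required")
	}
	if a.SLOTarget <= 0 || a.SLOTarget >= 1 {
		return fmt.Errorf("app %q: SLO target must be in (0, 1)", a.Name)
	}
	return nil
}

func main() {
	app := App{
		Name:         "single-sign-on",
		Contacts:     []string{"sso-team@example.com"},
		SLOTarget:    0.999,
		Dependencies: []string{"aws-rds", "smtp"},
	}
	if err := app.Validate(); err != nil {
		fmt.Println("rejected:", err)
		return
	}
	fmt.Println("contract accepted for", app.Name)
}
```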
This means that developers can focus on delivering the code while we handle day-two operations, which don't necessarily require developer interaction. We can help scale, we can cycle your TLS certificates; we can do a lot of these things that need to be done around the edges, primarily through automation. As I mentioned, it then defines the service relationship. There is a set of requirements; our model supports an opinionated, specific way of delivering an application. You get a stage environment, you get a production environment. The developers are on the button, but you have to do your due diligence before you push stuff out. And this is all enforced through a formalized schema.

So these schemas are not just metadata. The implementation, which I'll get to in a moment, is that the schemas are loaded into a GraphQL instance, which allows us to query the desired state of applications at any given time. We have a set of integrations running against this data which continuously reconcile the target state of the applications on the clusters themselves, or the backing databases, or all the other things that we deliver, like user access. And the key thing is that if you follow our way, we will take your SLOs, which we provide you a way to define, we will make them our SLOs, and we will help you work towards those goals.

A little bit more on app-interface. It's the app contract implementation. It enforces a schema, and it loads the state into GraphQL, which allows us to reconcile against it. And it's a GitOps-driven workflow: the schemas themselves live in Git, the developers offer their merge requests into there, we sign off, merge, and it's in production.

So I'll go back, actually. I'm going to run you through an incident that we experienced a little while back; I think it's one of many incidents. This particular one was in a gray area where the service was neither down nor up, so it wasn't a clear on/off situation. But it's not the incident itself I want to focus on; it's the process we went through, and how we worked through that incident to improve our flow. There's a quick slide here with the events that transpired. We were made aware of an incident affecting cloud.redhat.com, and it turned out that this was an issue with a subcomponent of the single sign-on service. We were able to identify this. Through the health metrics of the application, we were able to see that it was operating as it should, because it was serving existing tokens, but new logins were affected. By engaging with Red Hat IT through the formalized, normal channels, we were able to restore service pretty quickly. This all transpired in about, I put 90 minutes in there, with troubleshooting close to an hour; that's probably about right. And some would probably say, well, yes, okay: you notified the IT department, they corrected the error, the outage is mitigated, everything's over. But as we did the RCA work, we realized that there were a couple of faults in how we'd been approaching this outage. We had only been troubleshooting the dependency, the dependent service, SSO, through one of our applications. And we could have been warned earlier.

So here's an example of an app definition. It is edited; this is not complete, and it's not the full picture. I'm not going to go through it in detail. But what we had in there was the dependencies explicitly defined as app-interface objects.
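For a flavor of what "querying desired state" can look like in practice, here is a minimal sketch; the GraphQL endpoint, the query fields, and the schema are all invented for illustration, not the real app-interface ones:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// Ask a (hypothetical) GraphQL server for every app and its declared
// dependencies; a reverse lookup ("which apps depend on SSO?") is then
// just a filter over the same data.
const query = `{ apps { name dependencies { name } } }`

type graphResp struct {
	Data struct {
		Apps []struct {
			Name         string `json:"name"`
			Dependencies []struct {
				Name string `json:"name"`
			} `json:"dependencies"`
		} `json:"apps"`
	} `json:"data"`
}

func main() {
	body, _ := json.Marshal(map[string]string{"query": query})
	resp, err := http.Post("http://localhost:4000/graphql", "application/json", bytes.NewReader(body))
	if err != nil {
		fmt.Println("query failed:", err)
		return
	}
	defer resp.Body.Close()

	var out graphResp
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	// Reverse query: list every app that declares a dependency on SSO.
	for _, app := range out.Data.Apps {
		for _, dep := range app.Dependencies {
			if dep.Name == "single-sign-on" {
				fmt.Println(app.Name, "is impacted by an SSO outage")
			}
		}
	}
}
```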
We want to know what external dependencies your service has, to be able to map out where it integrates. We also realized that we had the monitoring stack, our observability stack, already deployed through app-interface, doing the monitoring for all our applications. And that simply meant that in the dependency definitions, which I unfortunately don't have a slide on, we could just add a monitoring section and then point the monitoring reconciliation loop to look at the dependencies as well. So what this means is that every application, from now on, that has any dependency can just define it with one line. Well, any one dependency: you point a reference at a commonly defined dependency, and it will be monitored. And that allows us to do the opposite, too. Instead of figuring out what dependencies a specific service has, we can do the reverse query: when we're alerted to an SSO outage, for example, or any other kind of outage in any other dependency, we can immediately identify which services across our stack are impacted. And coupling that with the metadata that we have about each service, we also know where to notify; we have all the contact points, and we have those in a couple of lines of code, or queries.

So I guess the key takeaway is how our technology stack, our Git server and workflow, and our abstraction layers allow us to rapidly evolve, and to evolve through the post-mortem culture. And that's the fun thing: when you realize that you have all the building blocks and we can just do this, that's a great place to be for me and my team. Thank you.

Thanks, both. Let me see if I can move forward. Yeah. It's actually pretty good that, as a developer, when you submit an application through app-interface, you get an opinionated flow. The app SRE team will want you to do things in a particular way: credentials have to be shipped in a particular way, your deliveries have to work in a particular way, your CI has to work in a particular way. But in return for that, they carry the front-line pager for you, and they can do the gatekeeping work to make sure that what you're pushing out there works, and works consistently.

So one of the questions that we've been working on over the last six months, and spending quite a lot of time on, is this. We do a lot of work on quality assurance, on reliability in service, on integration work, on all of the liaison through the stack, all the way from OpenShift engineering to quality delivery, et cetera, getting products out that we SLA. But what can we do now, as a next step, that enables other teams at Red Hat to come and consume these processes and use what we are doing for their own benefit, for their own wins? How can we scale ourselves in a way that other teams at Red Hat can also benefit from the work that we are doing?

Before we go down that path, I want to bring up one key consideration. There are typically two kinds of services that you run in a managed environment. The first is a SaaS: you have a multi-tenant infrastructure, you have a dedicated developer team behind it, you have a dedicated SRE team looking at it, you have dedicated infrastructure that you can scale on its own merit, you lifecycle it, you support it, but it's a single SaaS.
So think of it like, I don't know, hosted Bugzilla, for example; that's potentially a great example of a SaaS. The second model that we're trying to bring through here, that we're trying to crystallize and formalize, is the idea of a managed instance delivered by a single developer team. So the differentiation becomes: one developer team, one product, delivering a single SaaS, versus one developer team, one product, delivering many instances of that service, but in a managed environment. Think of it like OpenShift Dedicated versus OCP: OpenShift Dedicated is hundreds of instances coming from the OpenShift product. That's an example of one-to-many. You could think of a hosted, managed MySQL instance, for example, as a one-to-many kind of model as well.

A lot of the work that we do, like a lot of the work that Paul was talking about, the stuff that we've spoken about earlier today, was built out to essentially solve the problem of how Red Hat solves the SaaS issue: how do we get to a model where we can deliver a million accounts run through a single instance of the software? So we are now at a point where we're extending that model, to get to a point where any developer team, any product team at Red Hat, is able to deliver single instances, hosted multiple times, dedicated to their customer. Think of it like a dedicated Quay: you wake up one morning and you say, you know, my life is complete, but it would be really good if I had my own little pet Quay that nobody else could get to. And we could do that. You could have a hundred customers running dedicated Quay instances that are their own, coming from a single team.

So what we are doing in this space is something called the add-ons flow. It's a managed-tenants process, and we extend this model in a way that we don't interfere with anything that you do as developers, or from your business perspective, or from product management or program management. You build the software you need to build, you build it the way you want to build it, you deliver it the way you feel most confident, and then we meet you along that path of delivery. So come join us, come participate with us on the automation stack; come join us in the validation stack; come join us in a way that we can help you lifecycle the whole thing with confidence.

So why do it like this? What we can bring to the table is that we have a platform SRE team that can engage with you on best practices: how to consume OpenShift, how to do your resource allocation, how to do bin packing, how to do scale-out work, how to do your security work, what the infosec implications of certain pieces are. We might already have infosec approval for certain patterns that you can land on the back of, so you don't have to do that work. The tenant applications team, Paul's team, already runs applications; they've got a hundred-plus microservices in production today. A lot of the names that you would see on Red Hat's properties, cloud.redhat.com, for example, are run through the service delivery organization.
How do we get to a point where you could then use the work that the tenant team, the application SRE team, is doing, to scale and deliver your own applications? Use their CI, use their monitoring, use their pager infrastructure. We have a process that lets you do that today. Extending that model further down the road: when you deliver a managed instance of your software, we work with the support teams to bring the support interfaces down to a point where they are consistent with all of our other managed services. So if a customer has three or four or five different pieces, or you have a hundred different customers with, let's say, two each, the support experience, the engagement model, for each of those customers remains consistent. You don't have to talk to a different group at Red Hat if you're using a different product; you can expect that every group at Red Hat is using the same model to deliver the managed interface, so you can set expectations against it.

The other key part is that as the platform expands, for example ARO (ARO v4 is expected soon), if you're already invested with us in our processes and our policies, you will get ARO enablement for free when ARO comes up. OpenShift Dedicated goes to GCP, or to, let's say, the next big cloud, without naming names: you get that enablement for free, because you can assume that our interfaces are consistent, our models are consistent, and we've delivered an expectations-based, reliable service against that. This stack is available today. We have almost a dozen teams in different stages of maturity through the stack, and it is available today. So this is not something that we're projecting; this is implemented and in place, the one-to-many delivery model.

What does this mean for the customer? Looking through our self-service portal, from a customer standpoint, take the product that you're shipping; let's call it, I don't know, CodeReady Workspaces, for example. The SKU reconciliation behind the scenes happens automatically. So when a customer goes out and buys CodeReady Workspaces, a button will show up in their console that allows them to contextualize a cluster with that particular product, with no intervention, no feedback, nothing. It's an automated process, reconciliation happening behind the scenes. The delivery is guaranteed; we'll talk about that a little bit down the road, but we get up to a very high level of guarantee. The code is tested through our CI/CD processes. So as OpenShift Dedicated evolves, as ARO evolves, you know that you're going to get rapid feedback. I'll give an example: OCP going from 4.2.11 to 4.2.12 might potentially have an implication for your app. Typically, 72 hours before it goes to production, you will get a feedback loop to say, hey, we think your CI is going to fail, and we think your impact is going to look like X; forty clusters are potentially going to go down in this particular way, through our automation harness. And this is not something we have an option on doing; this is part of the reliability work that we invest in already.
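A toy sketch of the kind of pre-flight gate that could produce that "72 hours ahead" warning; the add-on metadata, version strings, and impact numbers here are entirely made up for illustration, not the real automation harness:

```go
package main

import "fmt"

// AddOn is a hypothetical record of which platform versions an add-on's
// CI has already passed against, and how many production clusters run it.
type AddOn struct {
	Name           string
	TestedVersions map[string]bool
	ClustersInProd int
}

// preflight reports, ahead of a platform rollout, which add-ons have no
// passing CI against the incoming version and what the blast radius is.
func preflight(incoming string, addons []AddOn) {
	for _, a := range addons {
		if !a.TestedVersions[incoming] {
			fmt.Printf("warning: %s has no passing CI for OCP %s; up to %d clusters potentially impacted\n",
				a.Name, incoming, a.ClustersInProd)
		}
	}
}

func main() {
	addons := []AddOn{
		{Name: "codeready-workspaces", TestedVersions: map[string]bool{"4.2.11": true}, ClustersInProd: 40},
		{Name: "rhmi", TestedVersions: map[string]bool{"4.2.11": true, "4.2.12": true}, ClustersInProd: 12},
	}
	preflight("4.2.12", addons) // run, say, 72 hours before the rollout
}
```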
The other part of this is that you could park a lot of that conversation and say: what if I just want to consume a white-labeled OpenShift Dedicated, or a white-labeled application SRE stack? You can. api.openshift.com exposes the full feature and function set. So if you want to write your own integration suites in Bash, or you want to integrate OpenShift Dedicated into your CI pipelines, or you want to integrate any of the managed components into software that you ship, services that you ship where you want to control the customer interface, there's an API that lets you do that today. Don't want to use an API? We have SDKs available that let you do the same thing. Don't want to use an SDK? There's a command-line interface that we ship, because I believe there's a Copr for it, and it's available in EPEL and in Fedora. That allows you to engage directly with api.openshift.com. Let me hand over to Jeremy, who will talk us through one of our implementations here.

Okay, cool. So it's dnf copr enable for OCM, then dnf install ocm-cli. Okay, so we've got all of these building blocks in place: all of that infrastructure that helps abstract AWS, that helps abstract OpenShift. Let's talk about the actual onboarding of an application. Which, by the way, it's official: Quay is pronounced "Quay," according to this shirt. I don't know where you're from, but it's "Quay," according to this shirt. So let's talk about how this went from ideation to production in 2019.

Before I get into that: a couple of things were already running. Some of these services you may recognize, some maybe not, but in addition to Quay there's a bunch of other stuff in prod already. Here's the timeline. May of 2019, actually a little bit before that, but let's call it May: we decided, as a team, that there would be a project that gets Quay.io, the SaaS (not the pet enterprise Quay that KB was mentioning; we're talking about the shared infrastructure you get when you go to Quay.io), eventually running on an OpenShift Dedicated 4 cluster. This timeline represents the path it took to get there. So the first step was to say: let's talk about people and process. We need to make sure we have the SREs in place to be able to do this. By June, all the teams were clicking and executing. They were making operators out of their stuff. They realized that their builders were running on plain Kubernetes; that's kind of a bad place to be. I mean, that's legacy Quay stuff from prior to the CoreOS acquisition. They moved that to OpenShift Container Platform, not OSD; that thing needs to run on bare metal, so it's running on packet.net. They found some other service dependencies, like wanting to move from SCD to one of AWS's managed in-memory databases, ElastiCache. That was June.

By November, we were ready to start doing the real migration. So the prep work is three, four, five months long, maybe a quarter. A couple of things needed to happen before they could do an OSD deployment. They needed to move from some older AWS infrastructure, the legacy setup I mentioned, over to the virtual-private-cloud side of AWS. And then the SRE management team handled some knowledge transfers. Quay has been running in prod for many years, right? For several years. They had all kinds of operational knowledge, and now they're attempting to have other people run this service for them, carry the pager. All of those checks in app-interface are in place.
There's also a human element, where we need to have a transition, and this timeframe is when we handled it. So there were, I don't know, a half-dozen knowledge-transfer meetings for our SRE team to take over from the Quay team, really the Quay founders, quite honestly.

Fast forward to December. We've got two environments set up: we've got Quay.io prod, the old stuff, and we've got Quay.io running on OpenShift Dedicated, which incidentally uses the add-ons flow, the thing that KB just described, as part of it. So, like any other production service, we're going to canary this guy: slowly, slowly move percentages of load over to the new service. 5%, watch the observability graphs; 10%, watch the observability graphs; all the way up through 50 to 100. Within a day or two, all of the load for the control plane was on OSD. I think that was December 18th, and it was quite an achievement to get it done right before the shutdown, but the team did it. The existing infrastructure was still there, just in case; as you can imagine, we weren't going to turn that off before we were ready. In the end, the folks never needed to migrate back. So that was the official cutover: end of December, OpenShift, sorry, Quay.io running on OpenShift Dedicated. And I can see engagements with other groups taking this basic flow: scoping, pre-work, knowledge transfer, and then the actual execution phase. They have a couple of things on their roadmap that they know they need to improve. They do builds, like I mentioned, on bare metal for security purposes; we need to figure out how to handle that in OpenShift Dedicated. A couple of other things need handling to round this out, and that'll happen in the fullness of time. But that's how it was onboarded.

I can't end a presentation without some graphs or I get docked salary, so here you go. This is what their observability dashboard currently looks like for Quay.io. This is what they're looking at now, and this is what they will look to improve over the year.

Last thing I will mention. Now that we have Quay.io on OpenShift Dedicated, we have a pretty full story for a release of OpenShift software, and we have some pretty frequent touch points with every OpenShift customer. I've got "almost" in parentheses because I was hoping to say we're done, but we're not: the origin CI migration is still ongoing. We are done with the Cincinnati migration. Has anybody heard of Cincinnati? Yeah. All this guy is, is a simple, stateless microservice: every OpenShift cluster goes out and hits this endpoint to figure out if there is an upgrade available for it. Nothing more. We host that on OpenShift Dedicated 4 as of, I think, last week. Quay, again, is already on OpenShift 4. Telemeter is another: it's the backhaul of performance and infrastructure metrics from every OpenShift cluster in the world (I think some folks here in the Brno area are working on Telemeter). If you haven't seen some of our Telemeter dashboards, they are fantastic. You can zoom into a particular cluster, you can slice and dice the data in really any way; as experienced as you are with PromQL is how much data you can fish out of this thing. We also handle the initial customer engagement through a service we call Tollbooth, and, as I mentioned, we do subscription management, so we also handle try.openshift.com. All of this together means that OpenShift itself depends on OpenShift, run in a managed way.
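Since Cincinnati is just a stateless graph service, you can see roughly what clusters see. Below is a minimal sketch of querying the public update-graph endpoint; the URL and response shape are as I understand the documented service, so treat the details as assumptions:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// The update graph: nodes are releases, edges are allowed upgrade
// paths expressed as pairs of node indexes.
type graph struct {
	Nodes []struct {
		Version string `json:"version"`
		Payload string `json:"payload"`
	} `json:"nodes"`
	Edges [][2]int `json:"edges"`
}

func main() {
	req, _ := http.NewRequest("GET",
		"https://api.openshift.com/api/upgrades_info/v1/graph?channel=stable-4.2", nil)
	req.Header.Set("Accept", "application/json") // the service negotiates on this header
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()

	var g graph
	if err := json.NewDecoder(resp.Body).Decode(&g); err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	// Print the upgrades reachable from 4.2.11, essentially the same
	// question every cluster asks this endpoint.
	for _, e := range g.Edges {
		if g.Nodes[e[0]].Version == "4.2.11" {
			fmt.Println("4.2.11 ->", g.Nodes[e[1]].Version)
		}
	}
}
```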
So I think this, OpenShift running on managed OpenShift, is the prime example of how Red Hat's SRE life needs to evolve, and it needs to continue on a path like this one. A couple of other teams, whose names you may recognize, are in flight right now; we're bringing all of these stacks on, likely through the add-ons flow, some further along than others. KB mentioned CodeReady, Insights Platform, API Management. Obviously we need storage: if anyone's working on OCS, we're going to figure out how to run that on OSD. And then our middleware folks as well, with a very popular offering called RHMI. We'll be able to run those as soon as they run through the same flow that Quay went through, and then we'll be able to check all of these boxes as well.

Thanks, Jeremy. So, anybody here working on OCS? What are you guys doing at a conference? Shouldn't you be back in the office, working? Something like 60% of our customers are asking for OCS. So in case you need any motivation: anybody working on OCS, please come find us.

So, in conclusion. We've been able to share with you a very small sliver of what we do. We're very excited about the work that we're undertaking right now, and we think we're making great progress. We're a small team; the SD org today is about 80 people and growing. But we remain committed to the process that we're working on. Now, from a developer perspective: if you're working in a team that has an execution path for managed services, there are a couple of fundamental things that you always want to be thinking about. We spoke about observability. It's very important that when you're writing your code, when you're designing your code, you should also be thinking about how you're going to monitor your code. That's the big paradigm shift that I know Red Hat is going through today; there's a whole emotional journey, and then there's a technical journey. It's getting away from delivering products as one-shot version deliveries, to actually keeping something running, where potentially your QE cycle only lasts five minutes: because something worked, you can only make a confidence judgment that, hey, I think it's going to work for at least another five minutes; after that, I don't know, I'll have to test again. That's a big paradigm shift that we are trying to drive through the different organizations. And so, when you're building the software, you are best positioned to make decisions on what you should be monitoring. What are the parameters that are interesting? What are the parameters that are critical, that somebody should be alerted on? If somebody is alerted, what would the actions look like?

And then, come and participate. Let me point out that we have office hours that we're running; there's a link that we can pass around next week. Please come join us, and then we'll try to make it a more regular thing. You can come and engage with us in an open forum, if you don't want to do it through official channels. And then a critical piece, something that I really enjoy: nothing that we talk about, nothing that we do, is future projections. All of our numbers, all of our metrics, are driven by quantified data.
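In that spirit of instrumenting as you write, a minimal sketch using the Prometheus Go client; the metric names and the handler are my own illustration of the "decide what to monitor while you design" point, not code from the service delivery teams:

```go
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// Decide up front what matters enough to graph and alert on:
// request counts by outcome, and the latency distribution.
var (
	requests = prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Name: "myservice_requests_total",
			Help: "Requests handled, labeled by HTTP status class.",
		},
		[]string{"status"},
	)
	latency = prometheus.NewHistogram(
		prometheus.HistogramOpts{
			Name:    "myservice_request_duration_seconds",
			Help:    "Request latency; alert when the tail burns the SLO budget.",
			Buckets: prometheus.DefBuckets,
		},
	)
)

func main() {
	prometheus.MustRegister(requests, latency)

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		timer := prometheus.NewTimer(latency)
		defer timer.ObserveDuration()
		w.Write([]byte("ok"))
		requests.WithLabelValues("2xx").Inc()
	})
	// Prometheus scrapes this endpoint; dashboards and alerts hang off it.
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":8080", nil)
}
```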
So if you were in the business of trying to get a managed service, or trying to run a service that you wanted some level of assurance on, would you want to run it on a platform that gets you potentially three nines? (Three nines, 99.9% availability, allows only about 8.8 hours of downtime across a whole year, roughly 43 minutes a month.) We had a little conversation earlier today, because we actually do hit three nines. But if you stretch it over the last year, we had a couple of bad days, and so we actually drop a couple of percentage points there. And so the question then becomes: do you want your service to hit the same numbers? If you do, then we can help. We have the processes, we have the people, and we're here to help; we'd really like to. So on that, let me say thank you.