Okay, no, you're not turned on. You need to reach back and turn your button on. Am I on? Am I coming into the room? Okay, good. Thank you. Okay. I can hear you much louder, though. Yeah, I don't think your mic is on, is it? Hello? Yeah, maybe you need it a little below your chin. Yeah, okay. One, two, one, two.

Let's do the introduction. Hi everybody, good afternoon. I'm Karsten Wade. I'm a principal community architect at Red Hat in the Open Source Program Office.

And my name is Marcel Hild. I'm a senior manager in the Red Hat Emerging Technologies group, and we do a lot of bleeding-edge engineering, looking forward to what's next on the horizon for two years, five years, ten years. And one of the subjects that we'll be talking about is opening up operations and other stuff. Yeah, another step.

And so, in order to talk about opening up operations and the reasons for doing that, we're going to talk about some examples that have actually been benefiting from this kind of open operations approach. This idea of operate first is a kind of corollary to upstream first: the idea of running your code in operations, in a real environment, before you release it as ready to go.

So first, we're going to take a moment to understand the two projects that are underneath the conversation here. One is OS-Climate and the other is Operate First. So OS-Climate is an approach to dealing with modeling greenhouse.
Okay, so there's a really long, boring financial premise. What I was thinking about was taking a page from my colleague and friend Eric, who got me to think, and I went back and looked at the travel app that I used to get here and found out that 804 pounds of CO2 were dumped into the atmosphere for my flight here. And another 804 will be there for me on the way back. So how many people in the audience have the ability to look at that kind of detail, or have ever looked and noticed that it's available for you to look at? Okay, so some of you. And for some of you that's because you're using the same travel app as me, so that's part of the reason why, right? So this is not available to everybody.

But the idea of being able to take a piece of information about the effect on climate change and roll that into financial models is something that's beneficial to us as individuals. But when it comes to the hundred-trillion-dollar, you know, ten-year capital markets, where you're looking at risk profiles over a long time, those haven't been including climate science as part of their modeling. And in figuring out how to get everybody together in order to do that, the OS-Climate model is basically to use the open source way of collaborating in the open and creating artifacts and works and materials and code, things you can use to scale and grow from, which is the open source community part, and using that as a way to drive the innovation. And then having a data commons, where you keep track of everything from the relationships between the different supply chains and companies, so that you can track back where something came from, to models and things that are helping to be implemented into your financial models. And all of this is around being able to work as a community of practice on, like, the predictive analysis and so forth, so they're
learning from each other as the processes are going.

So, jumping to Operate First. We have this goal: right now we're building this all open source community cloud, because we want to prove that the model we're talking about works. So what is the model? In open source development, you've got an upstream project that's doing some development. Everybody, thankfully, is buying into the CI gating model: if it doesn't pass your tests, it doesn't get released. But then what happens? You send it to your users, right? It's ready to go because it passed all your tests, but it hasn't done anything in the real world. All you've brought it up to is, like, day zero. Well, I shouldn't say zero-day, but anyway, day zero. Our model starts very similarly.

Oh, and let me just take a pause. Of course, when you've got that model, then users, whether they're looking at the code in advance or once it's in hand, can provide their feedback to the upstream. So it's a loop that requires that release process in order for things to be there. So it's more difficult for automation and so forth, I mean for machine learning, to come in and be able to gather more information from that experience, because there's a lot of human gating going on at the end.

But when we come down to the bottom model: just after the CI, if we deploy into a community cloud, into an environment that is an existing production environment, but one that doesn't carry the same expectation or agreement level as a full production environment for users, then they get their same return mechanism to go back to developers, but they also have a chance to influence the testing within the cloud environment itself and to test it against their own integrated environments. You've got that model you can be working back from, and then you can also bring in multiple open source projects. So, for example, OS-Climate, along
with Open Data Hub and all the dynamics there: there are multiple projects running as services and workloads in the Operate First community cloud, giving them an opportunity to cross-test and look at each other and get telemetry from each other and so forth. So let me know after the show if that makes sense. Any questions on that before I go on? I don't want anyone to get lost. Okay, cool.

So let's take a look at the OS-Climate workloads and what's been going on in this community. Essentially, what OS-Climate has been able to do is take advantage of the power of scaling that you can get by using open source innovation and open source development methodologies. And part of the innovation is that, as the end result, you want a durable artifact, an actual thing that you can look at. In the modern world that is a pull request or an issue in a code repository, a Git repository, and the context and conversations around that, and being able to go back and look at that forever.

So in order to scale the community, having processes and tooling helps make the community self-service and self-teaching. When you've read a document, it teaches you how to do something, but it will often also teach you how to teach somebody else how to do it, simply by passing along the link. So it has a slight viral nature.

Governance is a way of reflecting essentially human systems: how do I want to get the thing done? Who's in charge? How do I make it happen? What's the process? Is it fair? All those things.
They all go in there, and governance gives us a chance to reflect that in and out of the technology.

The software: open source software is the part that we've sort of proven already. We know that this model works, in that being able to produce these findable and durable artifacts, whether a piece of content or a piece of code, allows us to build and evolve technology more rapidly than if we're doing it in a closed environment. Just think of these durable artifacts as books in a library, a library that we're all constantly evolving and growing, and how valuable a resource that is, no matter where you are.

Okay, so then the last one is operations. The more you make the difficult parts of things easy: you look and find the most difficult thing, and you make that easier, and then you keep working down the line. And in open source development, the difficult thing is always getting access to running instances of things to make stuff go well. Our whole raison d'être, or whatever, is to be that running thing, right? Theoretically you should have a running thing, so what the heck, let's build a running thing you can actually use.

So I'm just going to go through some quick screenshots. You can look at the pull request directly if you want. So in this example, basically, the process was there to onboard a new service, and a community member used the process to start a new service. So it was self-teaching and, you know, self-service, basically. In this case it's Trino. Trino is a piece of software that's growing in popularity, which is the database, the SQL query stuff, right?
However, not everyone knows how to use it and deploy it. So simply by putting together this pull request, there's now a reference point for other people who want to learn, and people have been using this. We've got some stories about customers and partners of Red Hat, for example, benefiting from this. And you can imagine that if Trino were being packaged as part of a Red Hat release, this piece of configuration would go into the documentation as an example. I mean, it definitely has a feed-in that goes downstream and benefits.

So this is the next one here. This is an example of governance in effect. This was where, in a conversation with Kara, we were making some decisions and started to pull together the pull request. So this pull request was written while we were in a meeting, and at the end of the meeting she finished it up and submitted it. So then we had not just a record of the conversation that she and I had. That conversation couldn't scale, but our record could scale, and it could produce something that caused further conversations that pulled everybody else into it. And then being able to have that be that durable artifact in time.

And then this is another example. This is an example of operations benefiting by having access to... what was this one? This is the sandbox filter. Yeah, so these are both examples where a developer wanted to make something happen operationally and did a pull request, and for the most part it was all automated to have it happen there. It automatically grabs me and says, hey, take a look, and I take a look and say, yeah, looks good to me, and boom, it happens. So it's a much simpler, lower-process way to get things done. Yeah, so those were those two examples. Oh, I was going backwards. Pardon me. I know, buttons.
So let me jump over to the other half of this, which is the SIG SRE, a community of practice that we're just beginning to work with. The idea is that if we're going to be creating this open environment, part of it is the policies and the processes. And these days, when you do a cloud environment, it's all about site reliability engineering, because being reliable for your users, matching that service level, is the goal of what we're here for, right?

So in order to do that, since it doesn't exist beyond the great book and blog posts from Google engineers and so forth, we've decided that the very least we could do is start to gather things together by thinking about it as a community of practice and inviting that participation. So we have some participation from a couple of different groups so far, and one of them is a group of SREs inside of Red Hat who are beginning to put out some of the materials, including architectural decision records. An architectural decision record is a single record that explains a technical decision. So it's taking that durable artifact concept to another level, where you're creating a record for the artifact, the record is standardized, and the end result is the durable artifact, right?
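To make the idea of a standardized record concrete, here is a minimal sketch in Python. It is not the SIG SRE template: the section names (Status, Context, Decision, Consequences), the `DecisionRecord` class, and the file naming scheme are all assumptions chosen for illustration. The point it shows is only that every record has the same shape, so each one can be reviewed in a pull request and found again later.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DecisionRecord:
    """One technical decision captured in a fixed, repeatable layout.

    Hypothetical structure for illustration; not the SIG SRE template.
    """
    number: int
    title: str
    status: str  # assumed values: "proposed", "accepted", "superseded"
    context: str
    decision: str
    consequences: List[str] = field(default_factory=list)

    def filename(self) -> str:
        # Zero-padded number plus a slug keeps records sorted and linkable.
        slug = "-".join(self.title.lower().split())
        return f"{self.number:04d}-{slug}.md"

    def to_markdown(self) -> str:
        lines = [
            f"# ADR {self.number:04d}: {self.title}",
            "",
            f"Status: {self.status}",
            "",
            "## Context",
            "",
            self.context,
            "",
            "## Decision",
            "",
            self.decision,
            "",
            "## Consequences",
            "",
        ]
        lines += [f"- {item}" for item in self.consequences]
        return "\n".join(lines) + "\n"


if __name__ == "__main__":
    record = DecisionRecord(
        number=1,
        title="Deploy services through pull requests",
        status="accepted",
        context="Manual changes to the cluster leave no reviewable trail.",
        decision="Every operational change is proposed as a pull request.",
        consequences=["Changes are reviewable", "History is searchable"],
    )
    print(record.filename())
    print(record.to_markdown())
```

Because the output is plain text with a predictable file name, the record itself can live in the same repository as the configuration it describes.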
Another example is that we've started creating SRE training materials, and because, again, all this stuff doesn't exist, we started from the beginning level. So the two courses up right now are on open source basics and a general introduction to the project as a first piece. And we'll be producing our own pathway for how to be successful in GitHub, for example, because that's a barrier to a lot of people. When you say, oh, just make a pull request, half the audience just says, oh, that's not me, I've never done that before. And so being able to help make that accessible to everybody, picking the hardest thing to do and making it easy, is part of the plan.

And then we've also been working on long-form content, things that are being put out as blog posts, but ultimately they're evergreen, with a general URL, as you'll see in a moment. So just again, a couple of screenshots, and you can pull them up if you're interested in seeing more detail.

So this is an example of an architectural decision record. All the conversation about it can happen in the pull request. The end result is a record that both captures the meaning of the conversation and makes the conversation findable. And those are then repeatable.
I mean, obviously you can take this concept and apply it to a real environment, and these are concepts that we applied to our real environment, the Operate First community cloud, and they have been applied to other real environments already.

So this is the training materials, and this is an example, the open source software basics. When the OS-Climate community was coming together, not everyone who came into the room had the same understanding of open source. And so rather than just figuring they'd figure it out amongst themselves later, the community architects, very wisely in my opinion, got together and pulled materials and did some training. It was really beneficial. People learned a lot, and it built relationships in the community, with trust and common understanding across the training.

And this is another example. One of the first new long pieces is up. The URL, as you can see, is designed to be permanent. It's not dated, you know, it's just right there. Boom, here's the cast testing guide. It goes with ADR number 13, I think. It was in there somewhere. Number 12. And so this is a cast testing guide that goes with the architectural decision record. So creating that whole entire body of pieces is there. And these are kind of just at the beginning. We're looking to connect up with SREs and other people who are interested in learning about these practices and creating them from the beginning.

So now let's get surreal.

Okay, so what made open source software great was that you could dig into everything, from the premise where the project started right down into the code, into every single detail of the code. And the same thing we want to make happen for operations, for anything in the SRE world. So what I want you to get from this part of the talk is something really actionable, where you can put your hands on a keyboard and do something, right?
So what Karsten just showed you were some larger communities actually already practicing this model, like the OS-Climate community, which is a set of seasoned engineers doing really, really good stuff and benefiting from the Operate First community. But how do I approach this if I'm new to SRE practices? How do I approach this if I just want to learn how GitOps works and such? With software, it's easy: I go to Stack Overflow, I do some tutorials, and I start doing stuff on my laptop. Is it that easy with SRE? Obviously not, because you need a cloud environment to do so.

So in order to participate here, you need to have a GitHub handle. I removed the word "just" because obviously you also need to sign up for GitHub, you need to have some interest in participating, et cetera. But what you see now is all accessible to you with a GitHub handle. So you can go to your laptop after this talk, go to all the URLs, and see what's happening here on the screen by yourself, in a read-only fashion, obviously. We don't give you access to actually break the environment, but you can follow along. So you can see real GitOps practices in your browser and start really from the beginning: observing, looking, playing with it, doing some tweaks, and then eventually also doing your own stuff.

And the community should also be inclusive to all personas, not only on this scale of proficiency from beginners to real experts, but also really inclusive to every person who is involved in setting up such an environment. And it's not just developers. It's the people that operate the boxes. It's the people that develop workloads. It's the people that develop the components of the platform. It's obviously users.
Without users you have a really sad deployment, because then it's just your CI/CD test. But you want to have actual end users using your workloads, using your environment, so that you get that immediate feedback loop. You want to have people that support people, because a question asked is not a dumb question. It's just something that's maybe not explained well enough, or maybe it's a deficiency in your product because it's not so easy to discover. So I also want people that help people in that community, to see all those questions raised. And I want to have architects building out demo environments, building out use cases in a cloud environment where you use these Lego building blocks to stick something new together instead of writing it from scratch. So you can build upon the Lego blocks that others bring to that environment without setting it up yourself. And eventually, since my background is also in AIOps, hopefully at some point we also get all the machines in there to feed on that data and automate all the stuff, so that we can do more stuff.

So in one sentence, we want to build a hybrid cloud with full visibility into the operations center. So make it really open, from the API endpoints, from the user interface, but also back into the ops center. By definition, every deployment is somewhat proprietary, because you don't want to open up your logs: you have PII in there, you have customer data in it. So usually a cloud deployment stops at the user level or at the documentation level. Opening things up is what made software great, and now we want to do the same for operations and for services.

So a quick rundown of the actual environment that we have. We started out with a larger deployment at the Mass Open Cloud, hosted at the data center at Boston University. Then we have another deployment at Hetzner, which is a German Rackspace kind of provider.
So there's also a deployment. So boom, now we're multi-geo. We're in two geos, having deployments there. We have clusters running in AWS, so we're really hybrid. We have something on-premise running in two data centers, and we have something in the cloud environment. So that's pretty much a multi-geo hybrid cloud environment that we have here, which you can inspect in your browser and see how it's being set up with all the good GitOps principles. And looking into the future, we're trying to expand that into maybe IBM Cloud, Google Cloud, or other educational data center providers.

So going a little bit up the stack. Obviously, we have workloads running there, most prominently the Open Data Hub we just saw previously. We have Project Thoth running there, another project out of the emerging tech group. So we really use that environment to also do our own prototyping. We have communities out of the Java world and out of the Python world: Apicurio, Quarkus for Java, and Pulp as a Python index.

Then you also want to do some management and automation. ACM is Advanced Cluster Management, which we use to set up the Kubernetes and OpenShift clusters. We have Argo CD for continuous integration and continuous deployment. There's Prow from the Kubernetes community installed there and being used. We have Tekton pipelines for all the pipeline goodness. And we try to treat everything as a service. So we're open to installing alpha versions, beta versions of operators, as long as they are installed in a GitOps fashion, so that if we break the cloud we can roll it back, and expose some early bugs. But we also want other members of the environment to use these components as a service, so that we can get these integration benefits from testing several components of the cloud. How do they work?
Not in isolation, but in a real setup where they integrate with other services. And last but not least, there's all the operational data that we're creating, so that we can have traces, logs, and metrics from real production environments that are licensed under an open source, open data license, so that machine learning people and other ops folks can dig into them and inspect them and maybe take some information out of these for their own problems. So in the future, I imagine a Google search for a stack trace or something, and I'm not just ending up in a Stack Overflow issue, but I'm actually ending up in the Operate First community cloud, and I see how a problem appeared and how it's being solved, because all the logs, all the data that led to that problem, are still available for inspection.

So I will try to take you through a less complicated example of how we deployed a service on the Operate First community cloud, and then how you can replicate that yourself or how you can contribute to this. Peribolos is an application which does declarative GitHub org management. We all love Kubernetes because it's declarative: you just put out your configuration, and eventually the cluster will end up in the state that you declared. A lot of people are also using GitHub for their development environment. So how about also declaring the state of your GitHub organization in a declarative way? This is what Peribolos does, and it was released to the Kubernetes community at KubeCon in 2019. But it's a binary that you run on your computer. So instead of running it yourself, how about running it as a service?
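The declarative idea described above can be sketched in a few lines of Python. This is not Peribolos code and does not use its configuration format or the GitHub API; `OrgState`, `Action`, and `plan_org_changes` are hypothetical names invented for this illustration. It only shows the core technique: compare the declared state with the observed state and derive the list of changes needed to converge.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class OrgState:
    """Simplified picture of an organization: members and team rosters."""
    members: Set[str] = field(default_factory=set)
    teams: Dict[str, Set[str]] = field(default_factory=dict)


@dataclass(frozen=True)
class Action:
    kind: str       # "invite", "remove", "add_to_team", "remove_from_team"
    user: str
    team: str = ""  # only set for team-level actions


def plan_org_changes(desired: OrgState, actual: OrgState) -> List[Action]:
    """Return the actions that would move `actual` to `desired`."""
    actions: List[Action] = []
    # Organization membership: invite who is missing, remove who is extra.
    for user in sorted(desired.members - actual.members):
        actions.append(Action("invite", user))
    for user in sorted(actual.members - desired.members):
        actions.append(Action("remove", user))
    # Team membership, including teams that exist on only one side.
    for team in sorted(set(desired.teams) | set(actual.teams)):
        want = desired.teams.get(team, set())
        have = actual.teams.get(team, set())
        for user in sorted(want - have):
            actions.append(Action("add_to_team", user, team))
        for user in sorted(have - want):
            actions.append(Action("remove_from_team", user, team))
    return actions


if __name__ == "__main__":
    declared = OrgState({"alice", "bob"}, {"sre": {"alice"}})
    observed = OrgState({"bob", "carol"}, {"sre": {"bob"}})
    for action in plan_org_changes(declared, observed):
        print(action)
```

When the declared and observed states already match, the plan is empty, which is what makes it safe to run the same comparison repeatedly as a service.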
So we created a GitHub app to run the Peribolos executable and provided it to the community, provided it to everybody for use. And you could stop there and just use it, install it into your repository. But maybe you don't want to do this because you don't trust the people that are running the service. Maybe you want to improve how it's being run. Maybe you want to contribute to it. So this is how you would go about installing such an application on the Operate First community cloud.

How are we doing on time? Good.

So I laid out my architecture, I created my code, whatnot. Now what's next? How do I get access? You go to the Operate First website, go to the community cloud, and you have links to the clusters, to the environments that are provided. You can either go to this "get support" link, which leads to a support repository, and file an issue there to get onboarded to one of the clusters, or you could do it yourself, because how to do that is pretty much documented in these runbooks. That's the material that usually your back-end folks get, your operations folks, your SREs. They have documented how to operate your environment, and this is what we're doing completely in the open. So you could either request somebody else from the community to help guide you through it, or you run through the runbook yourself. So you go to cluster management and look at onboarding a project, and then it's all written down here how you would do that yourself.

So here are the pull requests created to onboard this project. You start with creating a namespace, then you would also want to deploy your application. And as you see, these pull requests were created by multiple people from our team.
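As a rough illustration of the first onboarding step, the sketch below builds a Kubernetes Namespace manifest in Python and prints it as JSON, which Kubernetes accepts alongside YAML. It does not reproduce the Operate First runbook or its repository layout; the function name, the example namespace name, and the `example.org/owner-team` label are assumptions made up for this example. The name check follows the Kubernetes rule that namespace names are DNS labels of at most 63 characters.

```python
import json
import re

# Kubernetes namespace names must be valid DNS labels (RFC 1123).
_DNS_LABEL = re.compile(r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")


def namespace_manifest(name: str, owner_team: str) -> dict:
    """Build a Namespace manifest that could be committed in a pull request."""
    if len(name) > 63 or not _DNS_LABEL.match(name):
        raise ValueError(f"not a valid namespace name: {name!r}")
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": name,
            # Hypothetical label key, used here only to record ownership.
            "labels": {"example.org/owner-team": owner_team},
        },
    }


if __name__ == "__main__":
    manifest = namespace_manifest("demo-project", "demo-team")
    print(json.dumps(manifest, indent=2))
```

Validating the name before the file is committed means an obvious mistake is caught in review, not at deploy time.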
So we really spread these tasks to people that were not so proficient in doing these things, because we thought: if we let it be done by somebody who's already proficient and experienced with it, nothing is gained. But let somebody else do it, and let him learn it and then create this pull request by himself or by herself, and we get a validation of the runbooks and the documentation and can improve them in this regard.

So now that we've created the pull requests, how do we get that deployed to the clusters? This is where we are following a GitOps procedure. In GitOps you want to have everything defined as code. As with Kubernetes, everything is declared, how your cluster should look, but you also want to declare in a Git repository how your application is being deployed. The good thing about it being tracked in version control is that you can roll back every change, but you also have a history of how it's being deployed for later inspection, so people can look at it and really track and follow along how this application, how this thing, was created and instantiated. So you write it down in a YAML file, and then you get it actually deployed on your cluster. We've chosen Argo CD as our continuous deployment tool because it's a cloud-native tool. It really fits our infrastructure.
That doesn't mean that we will always stick to Argo CD. If somebody has different needs for deploying something or wants to try out another continuous deployment tool, you can bring that to the community and deploy that tool there. But in the end, Argo CD is available for the community to use, and you can use it without setting up your own Argo CD environment, and you can inspect it and see how it's being used without setting something up yourself. So Argo monitors the cluster state, and if a change happens, it takes action and applies the change to the cluster so that it finally results in the desired state, and then it continues this loop on and on.

So going back to our browser, this would be our Peribolos application running here, and you see it's synced. Here are all the resources that are being deployed, like the service monitor, custom role bindings, et cetera. So you can log in with your GitHub account here and see how this application is being deployed. You can even go one step further and log into the back end of the OpenShift cluster and see all the tasks being executed to actually perform these Peribolos actions that you have been using as a service previously. So if I'm installing my app in my GitHub org, it will trigger some task runs, and instead of just relying on the service doing its thing and performing the required actions, I can go to the back end and see how it's doing these actions. And if they are failing, I can inspect why they failed and maybe open up a change or help fix this. So if I'm still logged in, I can go in here and click on the logs, and boom, I see what this thing did.
I can also go to a public Grafana dashboard and see the state of the clusters and work on the metrics that the Peribolos application exposes. And actually, that's something that you could contribute to in this project. Here are some issues that are open for community contribution. We will help you and guide you through the process of actually doing this kind of back-end work.

Good, and that's it. This is how you can either follow along with how a service is being deployed, or make changes to the service that's deployed, or you can fork the service and instantiate your own version of it and try out things, try out modifications to that service, or learn from that service and apply it to some other bot. So you don't really have to start from scratch. I mean, you still have to read the documentation, but you can actually inspect a running environment, a running instance of such a service, and learn from that instead of just doing it and inventing it all on your own. So here are the links to our website, the GitHub repository, and most of the applications that we have running here. And that's it. And if you have some questions...

Now we're fighting for questions. We have stickers, so if you have good questions, or if you have bad questions... there are no bad questions.

Good question. Issues... is there a way to... or come work with us and we'll figure out how to make that part work, because I know that there's a lot of that stuff out there. Does that answer the question that you were asking? Okay. Yeah, I mean, for a moment I thought you were going to be asking why we didn't use Jira instead or something, which is a whole other topic that I'm not going to answer. Okay. Thanks.

Yeah, and to add to that: we have projects that are tracking their issues and changes in Jira.
So issues.redhat.com is based on Jira, and some of the Open Data Hub folks are using Jira to track issues. And this is where you essentially link back to any other issues, to any other community. And it's essential to have that, for one, to have community metrics on how we are impacting other communities, so that we can count: we have these issues generated in the Operate First community which link back to another community. And then it's just a matter of linking back and forth.

But ultimately, I think you've connected in with one of the most important aspects of what we're doing. Because in an open source project's communication and lifecycle, you've got these synchronous things, like we're doing here, which don't really scale very well. Then the async, which has to be a forum or a mailing list, for that low barrier to entry. And then you've got your library, right, the durable artifacts. Everybody's got their own, but that's where we have to put our energy, into automating and cross-connecting things, because then it just makes everybody's life easier. I mean, the other ones we put energy into as well, but these are the ones we can fix with software best.

So, any other questions, here or online?

So my Operate First slide with the animation, that was the first time that I've presented that one. Today it's a fresh one. How did that go for anybody? Did that work? Okay. I get a plus one over here. It was good. Okay, thank you. That was my question for the audience. Do you have any questions for the audience? No? Okay. Well, yeah, we took your questions, you took our questions. I think that means we must be done then. Thank you very much, everybody. So we have stickers up here. Come forward and grab some stickers. And call to action: go to Operate First and click all the links, check out the new website. It tells you what to do. Thank you.