I can't just start. I can't do it. Okay. So good morning. It's 10 a.m. in the lovely Czech Republic. Everybody has a hangover from yesterday's party, I guess. No, we're all still at home. Maybe next year.

So, Operate First: how to open source cloud operations. That's something we touched on yesterday in the keynote, I don't know how many of you were there. I talked about it last year, and I talked about it at DevConf. It's something I've been working on for the last one and a half or two years, and it has evolved from one idea into something larger than that. Maybe I have confirmation bias, but I think it's actually a paradigm change, a way to re-envision open source in the cloud age. I will talk a little bit about the philosophical underpinnings, how we came to that conclusion, where it all started, and the mindset behind Operate First and what it actually means. Then I will go into the state of the systems, the state of the community cloud that we are running. So it's not just something we talk about, it's something we actually prototype and implement, which doesn't mean we should be the only ones implementing it. It's really a call to action, and fishing for new members and contributors for that community. And finally, I'll talk about community metrics: how the community is composed and what trajectory we are on.

This is my corporate picture. I'm a senior manager at Red Hat in the Office of the CTO. That's super nice, because we can play with things that are completely open and envision what the state of affairs will look like five years from now. So it's a little bit of a startup culture, playing with toys and such. I'm working on open services and how open source can impact services. I've worked on AI and on AIOps, and I think they all mesh very well together. Without AI, I don't think services will work in the future.
Anyway, this is my picture on the internet. I'm, what does it say, an old-school open source hacker and demon zombie slayer at b4mad and Red Hat's Office of the CTO. So that's to get some credibility with the cool kids.

Going back, I don't know how many of you saw this, what is it, 30 or 40 years ago? I certainly did not as a grownup. I did an internship at IBM when I was like 11 years old, and they took me into the mainframe room, and there were these cabinet-sized machines with only one terminal. That's what computers were like. You would order a large rack, a large set of machines, IBM would deliver them, and you had this one interface to the machines. You could pick up the phone and say, IBM, I need more compute power, and they would go into the machine and enable one more CPU or one more something. The only thing you got was the manual for interacting with the machine. And that basically sucks, because you are just a user. You cannot really innovate on top of it beyond the manual and beyond what the vendor allows you to do.

We all know this story. Then Linus Torvalds said, oh, I actually want to hack on that stuff. This manual has defined system calls, so I will just re-implement those system calls. And: hello, my name is Linus Torvalds, and I have announced Linux. So that happened, and it brought us this whole open source and free software community. People got together and innovated on top of it, because suddenly we had this contributor funnel. Let's say 100 people use the software, and some of them ask, how can I contribute? Maybe they write an issue, maybe they talk to the community and the people who actually implement stuff make those fixes, or maybe they contribute fixes themselves. So you have a funnel where, out of 100 people using the software, maybe ten engage, and maybe one person actually contributes code to the software.
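That funnel is simple arithmetic. A minimal sketch, where the 1-in-10 rates at each stage are illustrative assumptions from the talk, not measured data:

```python
# Back-of-the-envelope contributor funnel: out of 100 people using a
# piece of software, maybe ten engage at all (file issues, ask
# questions), and maybe one actually contributes code.
# The 1-in-10 rates at each stage are illustrative assumptions.

def contributor_funnel(users: int) -> dict:
    engaged = users // 10          # assumed: ~1 in 10 users engages
    contributors = engaged // 10   # assumed: ~1 in 10 of those sends code
    return {"users": users, "engaged": engaged, "contributors": contributors}

print(contributor_funnel(100))
# -> {'users': 100, 'engaged': 10, 'contributors': 1}
```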
I'm not saying the other contributions on top don't matter. They all matter, because you also need users as a feedback pool. But open source made it possible for contributions to flow right into the code.

Now, today we are in a similar situation. The age of cloud has dawned on us, the age of scale, and I think we're going back to mainframes. How many people can run an actual production setup on their own computers? I'm not using a local mail client anymore, I'm using a web-based mail client. I keep my documents in the cloud. Basically, I could use a tablet for all my day-to-day work. I still have a beefy machine to do some coding from time to time, but it's only a small part of the whole setup that I run locally before deploying into the cloud. You will never run a whole setup on your machine, and if you do, it's always a downscaled version of that software. The times when we just ran configure and make and had the full stack running locally are gone. And in order to develop against a cloud setup, you want that stuff always running. So suddenly availability, scale, performance, basically everything that touches operations, becomes equally important to users, if not more important. Because in the end it doesn't matter whether you're coding against a PostgreSQL database, a MySQL database, or a Redshift database from AWS. You care about a database that is available and that you can use.

This is a tweet from Matt Asay, who was working at AWS, I think he has moved on since. He said: what happens if you open source everything? That's exactly what Yugabyte, they're doing a database, I think, did when it dumped open core to instead release all of its code as open source software. So they were not selling just parts, and they were not distributing just parts of their software as open source, like the open core model, but everything as open source, which is great.
So suddenly you're not trying to make money from a reduced feature set, or from an expanded, enterprise-ready feature set, but you're basically giving everything to the community and offering customers a managed service. And if you look closely, is everything really everything? No. The important part, why and how they can make money from their software, and it's good that they make money from it, is not the code, not the features of the software itself. It's the availability of the software: they offer it as a managed service to customers and they operate it. And they certainly did not open source their operations. You cannot go into Yugabyte's back office and look at the issues there, why an API call didn't perform well. The only thing you can do is file an issue and say, hello, this is running slowly, or this is failing with a 404 or a 5xx error, can you please look into it?

So it essentially drains this nice funnel that we had previously. You can still use the software, you can still use the service, but you're stopped at the boundary of contribution. I would even bet that the issues we open against those nice services running in the cloud are not open themselves, because only the back office can see them. I cannot see which issues other people ran into with that cloud service, unless maybe they post it on Stack Overflow or someplace else. So I'm essentially limited with these hyperscalers, which is bad.

That's the bold statement I put out there: if the value in IT is in ops, and ops are proprietary, then open source has a problem. And that's how we started out with this Operate First paradigm. We needed to build operability into our software, and also to ship software to customers in such a way that customers don't run into issues that the community, or the people who built the software, might have run into first.
So basically, shift left in your development cycle: include operations, include QE in your development pipeline as early as possible, to understand and discover operational problems and fix them before you ship. Also, deliver software more in a service fashion, in a cloud fashion, with continuous updates and upgrades instead of release cycles every quarter or every year. You get a new update that fixes bugs; you want continuous integration and continuous deployment, maybe even for software that just works over the air. Operate your software in a cloud-native, hybrid cloud environment as people, as operators doing stuff, and then codify that operational knowledge and build it into the software. People from the cloud-native world know this term, operators. I think it's exactly that concept: how we codify operational knowledge.

To say it another way: Operate First is a concept, or an initiative, to operate software in a production-grade environment, bringing users, developers, and operators closer together. At Red Hat we try to establish Operate First as a basic tenet, just like we have upstream first as one of our tenets, one of our principles for how we operate and how we build software.

Now, one fun part of working in the Office of the CTO is that you can re-envision stuff, take a step back, and think about what exactly we are doing here. And I think it's really taking the power of open source and applying it to these new domains: to operations, to data, and maybe even higher up the stack.
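The operator pattern mentioned above boils down to a reconcile loop: observe the system, compare it with the declared desired state, and take the actions a human operator would otherwise perform. A minimal language-agnostic sketch, where all the type and field names are hypothetical, not a real Kubernetes client API:

```python
# Sketch of the "operator" pattern: codify what a human operator would
# do (observe, compare, act) as a reconcile function. DatabaseSpec and
# DatabaseStatus are illustrative stand-ins for a custom resource.
from dataclasses import dataclass

@dataclass
class DatabaseSpec:       # desired state, as a user declares it
    replicas: int

@dataclass
class DatabaseStatus:     # observed state of the running system
    ready_replicas: int

def reconcile(spec: DatabaseSpec, status: DatabaseStatus) -> list:
    """Return the corrective actions needed to converge on the spec."""
    actions = []
    if status.ready_replicas < spec.replicas:
        actions.append(f"scale up by {spec.replicas - status.ready_replicas}")
    elif status.ready_replicas > spec.replicas:
        actions.append(f"scale down by {status.ready_replicas - spec.replicas}")
    return actions  # empty list means: observed state already matches

print(reconcile(DatabaseSpec(replicas=3), DatabaseStatus(ready_replicas=1)))
# -> ['scale up by 2']
```

A real operator applies those actions against the cluster and is invoked again and again until the observed state converges on the desired one; that loop is where the codified operational knowledge lives.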
So in this case, open source applied to that context means you have read-only access to all the data: issues, metrics, logs, deployments, artifacts. Basically, you have access to all data and all artifacts regarding a software setup. For a software project, you can read the GitHub issues, you can read the code, you can see the build pipelines. For a cloud setup and its deployment, all the data expands to exactly this: logs, support issues, incidents. But it also means making it really easy to onboard people. Don't require really specific knowledge for people to join your community. Embrace plain users of your service, make it easy for them to grow into power users and then contributors, and have something for the power users already. So across that whole spectrum from beginners to experts, you need to embrace people coming to you, essentially to establish that contributor funnel: read, report, and resolve issues.

On the community side, which personas are we targeting? Certainly those that operate the platform and the workloads on it, the software stacks running there. The developers that develop parts of the platform, so a Kubernetes developer would be part of that community, but also somebody who develops a workload running in that community. People that use the software, because without users it would be a sad deployment, nobody would use it and you wouldn't run into those edge cases. You need to expose it to people, because they will click the buttons in unexpected ways. Product support, which also encompasses documentation, so you really learn something about the user experience; maybe it's not so obvious how you envisioned a feature being used. And architects, software architects.
I think architecture is more like LEGO bricks these days, where you build something new for a new vertical or a new use case: you take bits and pieces, stick them together differently, put some glue in there. Usually we do that with a dedicated demo environment for a use case, where the bottom 80% is already built out, and in that environment you can mix and match stuff and have it long-running in a community setting. And finally, because we love it, AIOps bots, so that at some point they do most of the tedious work and the chores and we can enjoy the more fun stuff. No, this stuff is also fun, but even greater stuff, I don't know.

So in one sense it's a hybrid cloud environment with full visibility into the operations. It's something you can really touch, and this is the URL where you should go: operate-first.cloud. That's your entry point into our systems.

So what are our systems? The environment started out with a bare-metal deployment at the Massachusetts Open Cloud, the MOC. That's a collection of universities in Boston, including Boston University, and it's, I think, 20 nodes with some vCPUs each, so 150 cores, or maybe 300 cores, I'm not 100% sure, but it's not small. We extended that with some bare-metal cloud deployments in Germany, and we have deployments running in AWS. So it's already hybrid, because there are bare-metal on-premise installs and cloud installs, and multi-geo, because there are US and EMEA deployments. That's pretty advanced, and we're extending it to other universities and other clouds. Now, it's already super cool to have such an environment operated in such a fashion.
You can see how people are running clusters, how they are deploying and managing them, but you also want to run stuff there. We started out with Open Data Hub, because of the shared history with the team that initially started this environment and because we had users there. So an Open Data Hub is being operated at scale in that environment. We always upgrade to the latest and greatest version, and we have users using it. Although it's a no-SLA environment that can go down at any time, we still have some nines of uptime, so people can actually work there. We have Project Thoth running there, which is AI-guided dependency management and software stack resolution, and which also contributes CI/CD pipelines and bots for your day-to-day software development needs. We have some stuff from the Python and Java worlds: Apicurio, Quarkus, and Pulp, which serves a Python index, all running in that environment and being used. So there is actual stuff there.

In order to deploy and manage those workloads, we obviously need management and automation, so we are creating and curating an environment where GitOps is lived and practiced, with a forward-looking mindset where we say, okay, let's do it right and not care about legacy systems. We can always scrap it and try out something new, in the direction we think the future is going and where best practices are leading. We're using Advanced Cluster Management for deploying clusters, Argo CD for deploying workloads, Prow for CI jobs, Tekton for CI pipelines, and so on. And we try to treat everything as a service and embrace the concept of operators, because that's where the future is going and that's how you would deliver parts of your project. In a more stable production environment, you would have a hard time convincing your operations team to run a beta version of an operator.
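As a side note on what "some nines of uptime" means in practice, the arithmetic is generic and easy to check; this sketch is not a measurement of the Operate First environment itself:

```python
# How much downtime per year each availability level ("nines") allows.
# Generic availability arithmetic, not measured data from any cluster.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def downtime_minutes_per_year(nines: int) -> float:
    unavailability = 10.0 ** -nines  # e.g. 3 nines -> 99.9% up -> 0.1% down
    return MINUTES_PER_YEAR * unavailability

for n in (2, 3, 4):
    print(f"{n} nines: ~{downtime_minutes_per_year(n):.0f} minutes/year")
```

So even two nines, about 5,256 minutes (3.7 days) of downtime per year, is already enough for people to get real work done on a no-SLA environment.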
Here, you can just contribute that operator, deploy it together with the ops team, and see if it breaks. Probably it doesn't break the whole setup, and otherwise you would roll back. You can offer that operator as a platform service to users or power users yourself and get immediate feedback.

Last but not least, and I think this is the most valuable part, because it encompasses this notion that we want to share knowledge: we want to act as a catalyst for a community where you come and see how stuff is being practiced. You would see blueprints and best practices documented: how to manage secrets in your setup, how to do alerting for a multi-cluster setup. All these decisions are noted down and captured in architectural decision records, to go back to and read up on, either for your future self or when you need to set up such an environment. The operational data captured there, all the metrics, all the logs, all the incidents, is also available, and people that want to train AI models can use it as a data set for doing AIOps. Or, if you run into a similar problem, you would google it and maybe be directed to a GitHub issue where we already debugged that problem, and you can follow along with the post-mortem of that issue, how we recovered, and maybe fix it yourself. I'm not aware of anything else like it, where you have a large set of actual operational data available for free, not just a snapshot of sanitized data from some setup, but long-running data from a real production environment, where we keep adding data and metrics on the current state of affairs.

I would have re-run these numbers, but I cannot execute Jupyter notebooks anymore, that's how much of a senior, or old, manager I have become. So these are the metrics from 2021. We have 26 repositories in our community, and the ratio of external contributors to internal contributors actually increased.
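The external-to-internal split he describes works out as simple arithmetic, using the contributor counts quoted in the talk (232 external, 28 internal):

```python
# Sanity check on the community numbers from the talk:
# 232 external plus 28 internal contributors.
external, internal = 232, 28
total = external + internal
external_share = round(100 * external / total)

print(total)           # 260 people in the community
print(external_share)  # 89 -> roughly the 90-to-10 split mentioned
```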
A couple of months back we had something like an 80-to-20 split, and now it's more like 90 to 10, which is great. So 232 external contributors plus 28 internal contributors makes a total of 260 people working in this community, and we have a steady, or rather increasing, line of issues. Luckily it's not a hockey-stick curve, which would become unsustainable at some point, but a nice, steady slope going up and to the right. We had a peak in January; that's probably people coming back from PTO and realizing, oh, my New Year's resolution is to open more issues, I don't know. But it's going up, so we have around 150 issues per month being created.

These are the numbers across repositories. Obviously the apps repository, which is where we store all the deployment manifests, basically the setup of the whole cloud environment, has the most churn, with 1,500 issues created over that time. The support repository, number three here, also has good numbers. The orange part is the external contributors, or was it the internal, let's see, the legend says orange is internal, which is wrong here: orange is the external part and blue is the internal part. Oh, two minutes? I will make it in two minutes. You see most external contributions in the apps repository, from people onboarding their stuff, and in the support repository, from people asking questions. Contributors by issues is a similar pie chart to the previous one, just a little bit in reverse: most of the people working on the issues are internal to the community. Here we have an 80-to-20 balance, 20% external people and 80% internal people working on these issues, which is normal, and which might also not be quite accurate, because at some point somebody external just gets invited to the
GitHub org and becomes an internal person, so that ratio will probably always stay about the same.

So now, the call to action: go to operate-first.cloud and just browse around. If you're familiar with exploring GitHub projects, this is no different. We have these GitHub orgs, you can see the issues and what's going on there, and you can deploy demos, because the only thing you need is a GitHub account to access all the clusters. You can go there right now with your GitHub account and get to the web console of an OpenShift cluster. This is the website, this is your entry point as a developer, and this is the entry point for an operator, an SRE person. Up there in the corner are the black-and-white icons, hard to see with bad eyesight, but the matrix-style one is the pop-up for the OpenShift clusters, then the GitHub icon, the Slack icon, and our YouTube channel, which also has a lot of content. Boom, that's it, thank you.

Thank you, Marcel, for your amazing talk. And I hope my mic is clear? Yes, your audio is clear now, that's much better. Thank you for listening and thank you for showing up so early. Yeah, weird, we had one question, but it was already answered in your slides, so I think we're good. Thanks again, have a nice day. You too, and I'm looking forward to seeing you in person next year. Bye-bye.