Good. Well, welcome to the session. We'll be talking about open operations, that is, some of the ideas that emerged during the work we were doing on Sovereign Cloud Stack. That's the project that both Felix and I are working on. We are both working for the Open Source Business Alliance, which is a nonprofit organization from the open source industry, mostly in Germany, which received some funding from the federal government in Germany to run this project. With that, let's just jump right in. This works? Yeah. You're probably aware of this: we will also publish the slides on our website if you want to have the PDFs, which we do with most of the presentations we give, and of course there's the recording, which can be watched as well.
Okay, and with that it's my pleasure to introduce my colleague Kurt. Kurt has a long history with open source. As you can see from all the logos, he's been a long-term contributor to various projects. He initially studied physics, and we both work on SCS; I joined that project in February. Kurt has worked at SUSE and was the one initially responsible for the Open Telekom Cloud, and built that. Aside from that, you see he has been active for a long time. He's very passionate about open source, and my personal best fact about Kurt is that he's one of those tech guys who actually think outside the box. So it's a lot of fun to discuss things with him that are not only technical but all sorts of other things, which is also why he qualifies to hold a talk like this about open operations: it's not purely technical.
Well, I guess some people call me opinionated, but that's fine.
That's what I like about you.
Okay, my pleasure to introduce Felix. He got infected with open source in '98-ish, I guess, and went to the big LinuxWorld Expo '99 in San Jose, which then finally got him. He was then a long-term contributor to OpenBSD and also OpenDarwin, until that somehow got a bit disrupted, spoiled by actions from an unnamed company. He then founded bytemine, an open source infrastructure company, which he ran for a number of years and stayed with, within the team, for quite some more time. He's been working for gridscale for a number of years now, and has also been on the board of the Open Source Business Alliance for many, many years. When we started this project and decided that we would do it inside of the Open Source Business Alliance, he was one of those people supporting it, and I'm happy he's now also working directly on Sovereign Cloud Stack as the product owner of the infrastructure and operations teams, which is great.
And a small side note: for me, this is the first summit of the OpenInfra Foundation. Kurt, I'm sure, many know, since he's also a member of the board at the OpenInfra Foundation. But for me all the OpenStack stuff is fairly new, since where I worked so far we did not use OpenStack. So I'm really happy to be here and also meet lots of faces from the community. Without further ado: Kurt, why don't you start?
Yeah.
Well, it definitely is a great summit. I've been to a number of them; 2012 was the first one, in San Francisco. So maybe quickly looking back at why we're talking about open operations: I want to look back at what we did with open source when we started. I started in '94, when I got this Linux infection, and then eventually got to work with some very smart people who were the ones that actually created Linux, which was fun. I think at the time a lot of us had this feeling like: wow, this IT thing is really becoming larger and bigger and more important, and it starts to control our life. We wanted to make sure that at least on our own machines we have some level of control over what's running there and what's happening there. And I guess some of us were hoping that maybe what we are doing does not just help ourselves, but that there are others who can find that stuff useful. But it was this David versus Goliath feeling that we certainly had, compared to the large proprietary operating systems out there that kind of defined the market. That was '95: hobbyists who tried to solve their own problems and learned how to collaborate, to do a bit more in a team than alone.
In the 2020s, of course, the world is different. We have open source everywhere. I think almost nothing of the IT infrastructure we use these days would work without open source: internet, TVs, smartphones, routers, cars. It's everywhere. So I guess we should celebrate and say everything's great. Unfortunately, it's not. I mean, there are a lot of great things, actually. There are a lot of companies, and we have a lot of people that are paid for working on open source. We have companies that have created business models that work. Most people these days who work on important projects actually get a paycheck for doing that, which is great. But not everything's great.
And there's one thing I really noticed: there was a time when every other week there was an announcement about some great new "open something" project. When I looked at that I really got annoyed, because they used this term "open", and when I looked closer the only thing I saw was a closed door. It was not open. Maybe some part of the code was open source. Maybe not even that. Maybe it was an open core model. Or maybe it was open according to some very strange definition of open, not even an OSI-compliant open source license. So I think we as an open source community still need to be very careful: when somebody is pushing something out as open, please look twice and make sure it really is open source.
It's a similar discussion with transparency. The picture I chose for this shows those blinds where you can easily look out but cannot look in from the outside. Transparency really always needs to go both ways to be valuable.
The community we're part of, the open infrastructure community, has of course seen this, and we talk a lot about the four opens in the open infrastructure community. I think that's really important, because in the end, when we say open source in the open infrastructure community, it is the license, and it's not just an open core model: we say all the software needs to be fully open source. And of course that alone is not as helpful if you have projects you would like to contribute to, because that's what open source projects are useful for after all, and then there's no way to actually contribute, to become part of that community, to insert your ideas and code contributions into the development process. So you really need to have an open development process in an open community, where the design is
worked on in an open way, so it can be contributed to and you can understand how decisions are being taken, and how you can maybe influence those decisions or become part of the decision-making process. That is what really open means to me, and I fully subscribe to the four opens that we have there. So I think we've got that one squared away.
So with that one squared away, we should have won. Or maybe we haven't. I mean, we should have won: we should have this wonderful world of everybody building their infrastructure based on open source software, having these nice little clouds out there that work together. When users have needs, they can find one of these clouds, one of these sheep on the picture, and get their needs serviced. The reality we see is that we have a small number of very large providers that dominate the market, and when you choose one of them, it's very hard to change, to get away from that and use another provider, or to start building your own and get control of your infrastructure. So that's the reality. How could that happen? That is the question I'm asking. We should have solved the problem: we have open source, which allows us to collaborate on software. But in the end, today's infrastructure is maybe a bit more difficult than the operating systems we built 20 years ago when we started with Linux. It's not like making one project accessible; we need to compose infrastructure. In order to have a modern IT platform with cloud and container infrastructure, you need to compose hundreds of projects in a consistent way to make sure things work. That's probably more difficult, and we probably need more collaboration than we needed before to create an operating system.
So we are working on that, trying to make sure that we have all the projects we need, that everybody observes the four opens, that we then establish the collaboration across them, and then obviously deal with the complexity of having distributed, dynamic infrastructure. That's something we just need to be aware of: it's a very, very difficult and hard problem. It needs a lot of collaboration and it needs a lot of skills. This picture I actually stole from a keynote in Texas, from the open infrastructure summit, where a company tried to understand the flow. It is five or six years old, but I still think it's a good illustration of the complexity of just OpenStack, and you need more than OpenStack in order to build modern infrastructure. And then, once you've built that, which is hard enough, you need to operate it. I've seen really good companies, large companies with skilled teams, that have failed to operate cloud infrastructure. There's a lot you need to learn before you can do that: you need to find the right skilled people, you need to build the right things. So the one thing we have observed, we have learned, is that complex systems are best run in a DevOps manner, where the engineers that develop the platform think about the operational aspects, and the operations people have development skills to automate and solve all those problems. Now, we have been very successful establishing this collaboration on the development side of things. Looking at the operations side, not so much, actually. I think we are lacking in building structures, methodologies, ways that we share knowledge and work together to collaborate on the operations side. So this is what we want to discuss with you today.
Here you see a quote from the SCS website where, when we started, we already stated where we want to go. I will quickly read it, although I usually hate reading from
slides. But I just really like that quote: "By sharing and documenting best practices for operating such cloud stacks, the difficulty to provide high quality cloud service internally or publicly is vastly reduced." And by sharing and documenting best practices we do not mean internally, but throughout the CSPs, collaborating on the CSP layer, so that the operators actually share best practices with each other. So we started to think that maybe it's time to propose a fifth open to add to the four fantastic opens we already have: the paradigm of open operations.
Of course, there are already many ways we address the operations challenge: by having tooling that we share and publish, Ansible playbooks, dashboards that are shared on sites and in communities. With the SCS reference implementation, for example, we ship a whole bunch of them. And for example, we have the OpenStack Health Monitor in the SCS project, where we started to monitor clouds that are based upon SCS in a behavior-based way from the outside, to make transparent how they perform. I just used the words "make transparent" because we actually want to make it visible to the users how the cloud environment of an operator
functions and performs, to make visible what the user and the customer get.
Yeah, maybe just adding to what you said: the tool can be used by the operations team of the cloud in the same way that tenant users without any privileges can use it to see the status of the cloud. So we can use that to monitor SCS environments, if those environments like to be monitored. We wouldn't even need to ask them for permission.
Exactly.
We do, though. We don't want to pay this, whatever, 50-euro resource cost.
Okay. There will also be a talk tomorrow by Matthias and me on observability on OpenStack, where we're going to dive into that a bit more. But now let's stick with open operations. So is open sourcing the tools enough? I say no. That's not what cuts the whole deal. If you look at modern cloud environments, and if you remember the keynotes from this morning, developing software and sharing software is only part of the game. What also goes into that whole game is people as well as processes. If you look at the image: while tools are important, and we all love tools, they are basically the icing on the cake. If you don't have the people and you don't have the culture, all tools will be useless. Well, mostly. On top of that come the processes. This is what I call the iceberg effect; thanks to Tasha for visualizing it in a nice way. Often, when we are among communities, we talk about tools, because it's much easier to obsess over tools than over culture, and to actually dive into problems of culture and be transparent about culture as well as processes.
I'm very sure some of you can already tell a story about how you tried to open up about your processes and were not allowed to because of some confidentiality issues or some other stuff, or maybe because the processes were just broken and one doesn't want to talk about broken processes.
With that I want to dive into the subject of psychological safety, basically in the people layer. There's a really, really good book called The Fearless Organization by Amy Edmondson of Harvard Business School. In the book she made a few case studies, not in IT but in other industries. One example I want to talk about is from a hospital, where she analyzed various emergency room teams, and it was apparent from her case study that teams that reported more errors were actually more effective. So what does that mean? It doesn't mean that better teams make more mistakes; they are just more likely to report them and learn from them. And that's one very important lesson: errors are the best source of learning. It's important to start developing a healthy error culture, because without a healthy error culture within the organization or the company you will not be able to do good root cause analysis. In order to have a good root cause analysis, you need to have the mindset in the organization to be open about errors, to be allowed to talk about errors and admit them. And once you internally have the right mindset, that is a good way to go to the next level and go for public root cause analysis. Kurt provided these two examples. Do you want to say a bit more about those?
Yeah. Just the idea you gave us about having the right error culture to learn from errors: what if you allow others to learn from your errors as well?
Then overall, as an industry or as a group, you get better. I was watching the IT space, obviously, always, and looked at companies and at what direction they take. I think it was the early 2010s when I was really writing Microsoft off. I saw their not very impressive Hyper-V technology, I saw the Azure stack, and I thought: well, okay, I don't think they are very competitive. And then maybe a year or two later I saw a root cause analysis report from Microsoft about an outage they had on February 29th of 2012. The exact date is not a coincidence, because it had to do with the leap day. They had really written down all the different things that went wrong in the infrastructure and just reported about it publicly: how they analyzed that, how they struggled, and how some safety measures they had in there didn't work the way they should have worked. I was deeply impressed. I was also thinking: oh shit, we need to take Microsoft seriously again. And I'm not sure I told this to my Open Telekom Cloud friends: when I wrote the public root cause analysis on our network outage in 2017, I was inspired by Microsoft.
So, sorry. But yes, I think in the end we as a team were stronger, and I think that customers that took the time to understand what went wrong, and also learned how we took this seriously and how we improved and learned from that, had more trust in us than they had before. So I think that is something that does help.
Yeah. Wherever I build up operations teams, I like to quote an OpenBSD developer, or former OpenBSD developer, Artur Grabowski from Sweden. He said in the early 2000s: "Only failures make us experts." That's a very simple statement that I've been carrying with me ever since, because it just tells everything. And one important note: as we go down the stack from tools to processes to people, the more controversial it gets.
Quickly, let's glance at the processes. In preparation for this talk we have been talking a lot about processes, and I just wanted to make sure to have one definition of processes in place. If I talk about processes, I don't mean railway tracks. I mean guard rails, like on an autobahn, that keep you in the lane but don't tie you down to exact behavior. Good processes allow for some flexibility but guide you through the process. I was searching for good examples of open processes, and one I found was GitLab, probably known to quite a few of you. They basically opened up almost everything in their handbook. It's publicly available, and I would just recommend: go read through it. It's an inspiration. And if we give this talk again, I would love to have an example from here that we can also link on the slide.
Another step in the right direction: I'm sure everyone is aware of the site reliability engineering movement that started a few years ago. Google put out a few books on site reliability engineering that are also publicly available, so follow that link. I've never worked at Google.
So I cannot judge whether the content they published matches what's lived; I assume to a fair degree. But that's also a very, very good example. What is important for us all to be successful is to share knowledge, share status, share challenges, and write good public root cause analyses. Take status pages, for example: in the SCS project, with a few CSPs, we just started a debate on what a good status page should feature and what it should offer, also for the operators. Having a status page that transparently reports all your stuff is really valuable, and it builds trust for the users: they know that when there's really something up, you are going to tell them. And that's not to be underestimated. So Kurt, do you want to say a few more words about SCS?
Yes. The reason we're talking about open operations is that we really need it for Sovereign Cloud Stack to be successful. If we just work with providers, existing ones and upcoming ones, to standardize and build the infrastructure, we still have too high a hurdle for them to be successful if we don't make progress on the operations topic. That's why we're really building this open operations community with the community that's working with us, with providers mostly, to overcome some of that, share the learning, and allow them to learn from each other and not start from scratch whenever a new community member joins the provider crowd. That's how we want to make it sustainable. We have a few folks that are part of this project, employed full-time by the Open Source Business Alliance. We're still looking for great talent, so if you're interested, please come talk to us.
We're hiring. I know it's small, Kurt.
Okay, maybe a "we are hiring" slide on this. But of course just participating in the community and contributing is also very valuable for us. Overall, working on digital sovereignty, having platforms that can be influenced and developed the way you need them, is what we want to make easier to achieve. I guess we should always close with a call to action. So please join us; that's the call to action. There are a few things you can do to become part of our community. I also want to make you aware of the Operate First initiative. That is not exactly the same as open operations, but we share a fair amount of common themes and discussions, so we're also closely linked with them. We'll have a forum session tomorrow morning on open operations, where we can definitely have a lot more discussion than we probably can in the last two minutes or so. And there are a few other presentations, like the observability topic that Felix will talk about, which kind of help with the operations topic. So yes, please join us and please have discussions. Do we have time for a few questions? I hope we do.
If nobody stops us, we do. Questions, feedback, remarks, whatever, absolutely: input.
I think we're supposed to bring them a microphone; otherwise, I'll just repeat the question. Okay, I'll repeat the question. It was the observation that there's been a lot of debate on root cause analysis, but maybe the wording is wrong, because there typically is not a single root cause that caused the whole issue, but really a chain of events.
Well, there's also the other term, blameless post-mortem, which is also not liked by everyone, because the term post-mortem is not really nice. I actually agree to a certain degree.
It's not always a single root cause. There is also a fantastic article on the net (I don't have it here, but I can look it up) on why using the five whys technique is not the best way to do root cause analysis. It also dives into exactly what you just said: there might not be a single root cause, and by focusing on trying to find a single root cause you might neglect other factors that played into it. So yes, I agree. We can just call it cause analysis. But the idea stays the same: you need to be open to your own mistakes and actually dive into those.
Yeah, I would say if a single thing caused all those failures, then your architecture probably was not very strong. So it's always a chain of events, and I guess we still call it root cause analysis because that's how it's always been called. But yes, if you do it well, you really work your way through that complete chain and make that complete chain transparent, and at every single step you check: how can I improve to avoid this repeating? If you look at the two examples, I think both are good examples where the complete chains have been made transparent.
Also another call to action: if you have any further input on these subjects, please hit me up outside and I will add it, because we're always looking for good examples. Another question there. Is it working? It is working. Okay, perfect.
You mentioned open operations. To what degree do you think this is more or less science fiction or wishful thinking? Because, for example, if you look at CSPs and they really share how they operate their networks, especially in Open RAN and so forth, they're giving away a key competitive element to other competitors. So, other than in standardization bodies, I would say, how would you envision open
operations to really work to the benefit of everybody?
Yeah, that's obviously the standard question that we get, and I would not claim that there's the one complete answer to that. I could of course also say that when we started talking about open source 20 years ago, we got the same question: why would companies start to share their software? I think we need to have that discussion. What do companies do in order to differentiate themselves from their competitors? Is that really the details of how you operate certain aspects of the infrastructure? Or is it other services, which are a lot closer to your customers, where you support your customers in bringing their applications to a cloud native platform and getting them to work better? So that's one question I would ask. The other point is that we are trying to establish an alternative to the large hyperscalers out there. We can only succeed if we do it well, and we will not do it well if a number of companies each try to be the single company that becomes the alternative. If we don't work together, we will never be strong enough to be viable as an alternative. So it's not a zero-sum game, actually. If you build a really open infrastructure, and you do that together with others that want to do the same, the ecosystem will grow a lot more than you might lose by sharing some information that might help your competitor to be a bit better. Because it's not really a competitor.
I mean we're building an ecosystem together And yes, some customers may move between you and that's an advantage for the customers because having that freedom to actually learn that It's better to be in an open ecosystem than to be bound to a single lock in infrastructure One ad well Well, I was just about to say that is Operations really is a secret sauce that would make the difference or whether those difference are not others light like for example Excellent customer support and things like those or Superior performance because you all use the same stuff to build it, but you Basically just use better hardware or whatever. So I think it's actually and and also in collaborating in these kind of things There's actually a chance to Go after the big fish that those are the hyperscales and actually get those customers There's another question In order to learn from arrows, you don't have to make them by yourself You can't that's also very true. You can learn from arrows which other ones already have made Yeah, yeah, that's that's what I tell my kids all the time You don't have to make every error by yourself That is one thing and about 20 years ago in Germany. We had such a funny thing I know the German word and I don't know how well to translate it. It was our vice guys Yeah, yeah between different companies and exactly that happens there You know you learn you did exchange the information and you learn from experience and there was other ones I would like to pick that up very quick then, you know in the time frame of outsourcing and cost Optimization and everything that was all cancelled and it was definitely wrong. 
Yeah, you just used the term Arbeitskreis, which is basically "working group". We are beyond the time limit anyway, but quickly: take the working groups we have in SCS, for example the special interest group monitoring that we are going to talk about tomorrow in that other talk. That's exactly a group comprised of several CSPs where we share how we do observability. Each and every one brings something in and takes something back, and all of us get better. That's exactly it.
I think what I want to say is: it's not a new invention.
No, we're not claiming to be new. So, last question here, I think.
Well, I think it's a change of culture. You must be more open about some things. In most cases, CSPs treat an incident as something special, or have dashboards that represent what issues there can be. I think there must be a change of culture to really communicate failures. And I think for open operations a good first point is to document the issues and how the operations work. In most cases every company has its own documentation, but why not documentation which is a standard work?
Well, Matthias, I think those are good closing words.
Yeah, indeed. If you don't have a culture where you can talk about mistakes, because the theory is that there are no mistakes possible, you will not have open operations.
Yes. Well, with that, thanks for listening and for being here.